It starts before you even bring your baby home. The unsolicited advice.

Make sure to feed her on a schedule. No, feed on demand. Swaddle him tight. Don't swaddle, he needs to stretch out! Let them self-soothe. NEVER let them cry. Introduce a pacifier. Avoid a pacifier. Sleep when the baby sleeps. Clean while the baby sleeps to stay ahead. Use this app. Follow this method. Read this book. My cousin did it this way. My pediatrician said something completely different...

Everyone has an opinion. Most of them contradict each other. Almost all of them are delivered with complete confidence.

For the first few weeks, you try to follow all of it. You read the books, download the apps, take notes on the schedules. And then somewhere in the fog of sleep deprivation and formula math, you realize something: your baby didn't read any of those books. What works for someone else's child in someone else's home in someone else's life is not a guarantee of anything in yours. You stop trying to find the universal answer and start paying attention to the actual baby in front of you.

That's when things start to click.


QE metrics have exactly the same problem.

Everyone has a framework. Everyone has a number you should be hitting. Aim for 80% automation coverage. Track your defect escape rate. Measure mean time to resolution. Report on test execution time. Monitor your flaky test percentage. The list is endless and the opinions are confident and almost none of it was designed with your team, your product, or your stage of growth in mind.

I've seen teams chase automation coverage percentages while automating the wrong things entirely. I've seen orgs report zero escaped defects and interpret it as a win, when really it just meant their escalation process was broken and bugs were getting lost. I've seen dashboards full of numbers that looked healthy and told you nothing useful.

The problem isn't that these metrics are wrong. It's that metrics without intention are just numbers. They don't tell you what to do, they don't drive behavior, and they don't move anything in the direction you actually need to go.


When I build a metrics framework for a QE team, I start with three questions. Not "what should we measure?" but something closer to first principles.

What behavior are we trying to drive?

Metrics shape behavior whether you intend them to or not. A team measured purely on test execution speed will find ways to go faster, and some of those ways will compromise coverage. A team measured on defect volume will file more defects, which may or may not reflect actual quality improvement. Before you decide what to track, decide what you want your team, and the broader engineering org, to actually do differently. The metric should make the right behavior visible and the wrong behavior obvious.

What needle do we need to move?

Every QE org has a specific problem it's trying to solve right now. Maybe releases are taking too long because testing is a bottleneck. Maybe production incidents keep clustering around the same features. Maybe the team is reactive and you're trying to shift them left. The right metrics for a team in one of those situations are different from the right metrics for a team in a different one. A metric that was exactly right six months ago might be irrelevant today because the needle has already moved. Your metrics should reflect where the org is, not where some other org was when they designed their dashboard.

If we track this metric, what does success look like, what does failure look like, and what are the levers we'll pull based on what we see?

This is the one that separates useful metrics from decorative ones. If you can't answer all three parts of this question for a metric you're tracking, you probably shouldn't be tracking it yet. Success should trigger a specific response. Failure should trigger a different specific response. And you should know in advance what those responses are. A metric you track but never act on isn't a metric. It's wallpaper.
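One way to make this discipline concrete is to treat each metric as a small record that forces you to fill in all three answers before you start tracking. Here's a minimal sketch of that idea; the metric name, threshold, and lever descriptions are all invented for illustration, not prescriptions:

```python
# Hypothetical sketch: a metric isn't "ready to track" until every field
# below can be filled in. All names, thresholds, and levers are invented.
from dataclasses import dataclass
from typing import Callable

@dataclass
class MetricSpec:
    name: str
    behavior_to_drive: str             # question 1: what behavior?
    success: Callable[[float], bool]   # question 3a: what does success look like?
    on_success: str                    # lever to pull when the metric is healthy
    on_failure: str                    # lever to pull when it isn't

    def review(self, value: float) -> str:
        """Return the pre-agreed response for an observed value."""
        return self.on_success if self.success(value) else self.on_failure

# Example: a team whose current needle is "testing is a release bottleneck".
cycle_time = MetricSpec(
    name="regression cycle time (hours)",
    behavior_to_drive="shrink and parallelize the regression suite",
    success=lambda hours: hours <= 4,
    on_success="tighten the target and reinvest saved time in exploratory testing",
    on_failure="profile the suite, cut redundant cases, parallelize the slowest jobs",
)

print(cycle_time.review(6.5))  # over threshold, so the failure lever comes back
```

The point isn't the code; it's that a metric you can't express this way, with a response wired to each outcome in advance, is the wallpaper described above.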


There is no universal scorecard for a healthy baby. There is no weight percentile, sleep schedule, or feeding frequency that applies to every child in every home. Pediatricians will give you ranges and guidelines, and those matter, but the work of figuring out what your specific baby needs is yours to do. You do it by paying attention, by noticing what works and what doesn't, by adjusting when something stops making sense.

There is no universal scorecard for a healthy QE org either. The metrics that transformed one team might be meaningless noise for another. The dashboard that looks impressive in a board meeting might be hiding the thing you actually need to know.

The work is the same in both cases. Stop looking for the universal answer. Start paying attention to what's actually in front of you.

What are you measuring right now, and do you know what you'll do when the number moves?