Vanity Testing Metrics

This is a preview of a topic that I will cover in my upcoming talk at STPCon, “Testing Metrics – Choose Wisely.”

Vanity metrics are popular in marketing. These are metrics that make you feel good but aren’t directly actionable and aren’t related to your true goals. Vanity metrics are also easily manipulated. An example would be a hit counter measuring page views on a website. What really matters for a business website is the conversion rate (how many visitors actually make a purchase) or revenue per customer.

I’ve seen marketing campaigns that add a lot of page views but actually decrease the conversion rate. The advertising may find more viewers, but if those people are less interested in your product, driving up traffic isn’t really useful (and who knows whether those viewers are really people and not bots). Measuring the impact of advertising by revenue, or by the number of visitors who become customers, is more powerful.

An example in software testing is measuring the average age of bugs. You might start a campaign to reduce the bug backlog or improve the velocity of fixing bugs, and choose average age as the measure. However, what you are really looking for is a quicker response to every bug, not to the average bug.

The average age of bugs chart from JIRA shows trends in the average age, over time.


This metric is often misleading in these efforts, because closing a few really old bugs, whether fixed or not, can dramatically reduce the average age. In the chart above, the dramatic downward swings came from closing only a couple of bugs. Those bugs weren’t fixed; they were closed as obsolete. But they had been open in the backlog for several years, so closing them had a dramatic impact on the average age. Closing them, however, told us nothing about our responsiveness to current bugs.
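To see how little it takes to move the average, here is a minimal Python sketch using made-up open dates (hypothetical data, not the data behind the JIRA chart). Closing two multi-year-old bugs collapses the average age even though nothing changed for the three recent bugs.

```python
from datetime import date
from statistics import mean

# Hypothetical backlog: bug id -> date the bug was opened (illustrative only).
TODAY = date(2024, 1, 31)
open_bugs = {
    "BUG-101": date(2019, 3, 1),   # years old; later closed as obsolete
    "BUG-102": date(2019, 6, 15),  # years old; later closed as obsolete
    "BUG-201": date(2023, 12, 1),
    "BUG-202": date(2024, 1, 5),
    "BUG-203": date(2024, 1, 20),
}

def average_age_days(bugs, today):
    """Mean age, in days, of the bugs that are currently open."""
    return mean((today - opened).days for opened in bugs.values())

before = average_age_days(open_bugs, TODAY)

# Close the two obsolete bugs; the recent bugs are untouched.
for bug_id in ("BUG-101", "BUG-102"):
    del open_bugs[bug_id]

after = average_age_days(open_bugs, TODAY)
print(f"average age before: {before:.0f} days, after: {after:.0f} days")
```

The average drops from roughly 717 days to roughly 33 days, yet the team responded no faster to any current bug.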

Instead of the average age, track the median age. The median is much less affected by really old bugs; medians keep outliers from having an outsized impact on your metrics. Even better, a more direct measure of the goal of improving velocity is to set a target timeframe, say 30 days, and then measure the percentage of bugs that are fixed within that target.
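Here is a minimal sketch of both alternatives, assuming you can export each fixed bug’s time-to-fix in days (the values below are hypothetical). The two outliers pull the mean far above the median, while the within-target percentage answers the velocity question directly. In this sketch the denominator is the set of fixed bugs; counting still-open bugs older than the target as misses would be a reasonable variation.

```python
from statistics import mean, median

# Hypothetical days from report to fix for recently fixed bugs.
# The last two values stand in for long-lived outliers.
days_to_fix = [3, 5, 8, 12, 20, 25, 40, 900, 1200]

TARGET_DAYS = 30  # example target timeframe; choose one that fits your team

def pct_within_target(durations, target):
    """Percentage of bugs fixed in `target` days or fewer."""
    if not durations:
        return 0.0  # no fixed bugs yet; avoid dividing by zero
    within = sum(1 for d in durations if d <= target)
    return 100.0 * within / len(durations)

print(f"mean:   {mean(days_to_fix):.1f} days")
print(f"median: {median(days_to_fix)} days")
print(f"fixed within {TARGET_DAYS} days: {pct_within_target(days_to_fix, TARGET_DAYS):.0f}%")
```

With this data the mean is about 246 days, the median is 20 days, and 67% of the bugs were fixed within the 30-day target.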

These views will more directly measure your goal (improved velocity) and be less susceptible to manipulation.


The Test Leader

Katrina Clokie has a really good blog post describing the difference between a test leader and a test manager. The test leader influences testing across the organization without direct positional authority. I’d suggest one tweak: the test manager role should also include the leadership qualities of the test leader.

Everyone tests, especially in Agile, but in any life cycle. Developers should be testing their code, and product managers (or product owners) should be participating in acceptance testing. Everyone has a role in building high-quality software, even if their activities are not directly tests. The test manager can play an influential role in these other groups, in addition to leading their direct team.

I’ve been advocating the title Quality Leader instead of Test Manager to stress the influential capabilities of test managers:


Leading from the front

Photo Credit: Olivier Carré-Delisle


Bill Gates on Automating Tests

OK, he wasn’t talking specifically about automating tests. But he did talk about automating the jobs that can be automated and redirecting human effort toward things “where human empathy and understanding are still very, very unique.”

His topic was taxing the output of robots and using those funds to train the displaced workers towards those new roles.

“So if you can take the labor that used to do the thing automation replaces, and financially and training-wise and fulfillment-wise have that person go off and do these other things, then you’re net ahead.”

Read the full article, and see the video here. 

Testing Efficiency – A Better View

The ISTQB defines Testing Efficiency as the number of defects resolved over the total number of defects reported.  This is meant to measure the test team by the relevance of the bugs they report.  A low efficiency would imply that the test team is reporting many bugs that are not worth fixing.

This view is pretty limited and simplistic: a single ratio says nothing about why the other reported bugs were not fixed.

A better approach is to measure the “resolution category” of the bugs that are closed. When resolved, each bug is marked with a category such as “Fixed”, “Cannot Duplicate”, or “Duplicate” (of another bug). The categories can be graphed on a pie chart:

Pie Chart showing Resolved Bugs by Category
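The numbers behind a chart like this are just a tally of resolution categories. A minimal Python sketch, assuming a hypothetical export of closed bugs as (id, resolution) pairs; the category names echo this post, but your tracker’s labels may differ.

```python
from collections import Counter

# Hypothetical export of closed bugs: (bug id, resolution category).
closed_bugs = [
    ("BUG-1", "Fixed"), ("BUG-2", "Duplicate"), ("BUG-3", "Fixed"),
    ("BUG-4", "Cannot Duplicate"), ("BUG-5", "Business Decision"),
    ("BUG-6", "Fixed"), ("BUG-7", "Duplicate"), ("BUG-8", "Fixed"),
]

def resolution_breakdown(bugs):
    """Map each resolution category to its share (%) of closed bugs, largest first."""
    counts = Counter(resolution for _, resolution in bugs)
    total = sum(counts.values())
    return {category: 100.0 * n / total for category, n in counts.most_common()}

for category, pct in resolution_breakdown(closed_bugs).items():
    print(f"{category:<18} {pct:5.1f}%")
```

Unlike a single efficiency ratio, the breakdown shows which non-fixed categories are large enough to be worth a conversation.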


Now you can have a conversation about the bugs being reported and whether improvements are warranted. We had this exact issue on a team that I led a while back. We made a few adjustments:

Duplicate – we upgraded the bug tracking system to improve its search function. This allowed testers to search for duplicates before submitting a new bug. If they found an existing report, they reviewed it to see whether they could add any new information.

Cannot Duplicate – for these, we held bug huddles with the developers, demonstrating the bug before writing it up and submitting it. This practice really helped get bugs fixed faster by eliminating the back-and-forth that sometimes happens.

Business Decision – Many of these were closed by the developers without involving the Product Manager in the decision. We added the PM as the person to “verify” bugs closed with this resolution to make sure they agreed.
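The Business Decision rule could also be audited after the fact. This is a hypothetical sketch: the `ClosedBug` record, its `verified_by` field, and the set of product managers are assumptions for illustration, not fields of any particular bug tracker.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ClosedBug:
    bug_id: str
    resolution: str
    verified_by: Optional[str]  # who signed off on the resolution, if anyone

# Hypothetical set of product manager accounts.
PRODUCT_MANAGERS = {"pat.pm"}

def unverified_business_decisions(bugs, product_managers):
    """Ids of bugs closed as 'Business Decision' that no product manager verified."""
    return [
        b.bug_id
        for b in bugs
        if b.resolution == "Business Decision" and b.verified_by not in product_managers
    ]

closed = [
    ClosedBug("BUG-10", "Business Decision", "pat.pm"),   # verified by the PM
    ClosedBug("BUG-11", "Business Decision", None),       # never verified
    ClosedBug("BUG-12", "Business Decision", "dev.lee"),  # closed by a developer only
    ClosedBug("BUG-13", "Fixed", None),                   # other resolutions are ignored
]
print(unverified_business_decisions(closed, PRODUCT_MANAGERS))
```

Here BUG-11 and BUG-12 would be flagged for the product manager to review.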

Pie chart after improvements.


Want to learn more about leadership in software testing? Check out the Software Leadership Academy.