Bugs: To Track or Not To Track

Every now and then, I hear a debate about whether we should track bugs or just focus on fixing them.

One point of view is that tracking bugs is a waste of time.  The focus should just be on fixing the issues, as quickly as they happen.  You don’t build up a large backlog of bugs if you just fix them as they are found. This is argued as a healthy mindset to keep quality high at all times.  Tracking bugs, and keeping a drag of legacy bugs around, is wasted effort because it costs time and money and does nothing itself to improve the customer experience.

On the flip side, tracking bugs is vital. You need to make sure that bugs don’t fall through the cracks, that bug fixes get proper verification and regression testing, and that you have data about bugs to drive improvements in the process.

My point of view: anytime there is a debate between doing “this” or “that”, the right answer is usually “both”. There are situations where simplicity and efficiency are most appropriate and situations where tracking and data collection are appropriate.

Testing and review are a feedback loop. Someone creates something and someone evaluates that work and provides feedback. Some of these loops are “inner loops”, where the feedback cycle is very quick and very direct. The “outer loops” are longer and involve more people.

Test Driven Development illustrates an example of an inner loop, where the cycle is “write a failing test”, “code until the test passes”, then “refactor”. It’s clearly inefficient to write bugs for the failing tests: the developer is using this method to develop the code and doesn’t need to track the (intentional) bugs.

The customer support process is an obvious example of an outer loop. When customers find bugs and report them to us, we should make sure those bugs are addressed with the proper priority and that we do a root cause analysis to learn from the mistakes.

This stylized diagram illustrates the relationship between the TDD inner loop and the customer support outer loop.

Stylized SDLC showing an inner loop of TDD and an outer loop of Customer Support


Here are some examples of practices used in development and testing, along with how I generally recommend we track the issues and how we capture the learning from the mistakes. Of course, your mileage may vary based on your industry, product, and any regulatory requirements.

Inner Loops

Practices:

·      Personal code review
·      Peer review/buddy check
·      Unit tests & debugging
·      Failing tests in TDD
·      Parallel testing with a buddy

Bug tracking: No formal tracking, just fix the bug.

Learning: Learning and improvement is a personal endeavor.

Medium Loops

Practices:

·      Failures in tests on feature branch (CI, Build verification, etc.)
·      Bugs found inside a sprint – on a story being implemented
·      Non-real-time code review (using a tool, email, etc.)

Bug tracking: Lightweight tracking. A simple list on a wiki/whiteboard, post-it notes, or some lightweight tracking with just open/closed state.

Learning: Learning and improvement happen as a team, usually using the sprint retrospective.

Outer Loops

Practices:

·      Failures in tests on trunk/main branch (CI, Build verification, etc.)
·      Bugs found after a sprint (regression testing, hardening tests)
·      In general, bugs found outside the immediate dev team for those features
·      Customer-reported bugs
·      Bugs found during certification tests
·      Bugs found by outside testers (crowdsource, off-shore, etc.)

Bug tracking: Bug tracking system that has a workflow and meta-data like priority, severity, state, and the normal fields. Capturing RCA information in the tracking system is useful.

Learning: Learning and improvement is part of the continuous improvement program. Root cause analysis for the important bugs (customer found, etc.)

These guidelines have been formed by my experiences, and are meant to balance the best quality, continuous learning, team empowerment, and efficiency.
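
As a rough illustration, the guidelines above can be written as a small lookup from where a bug was found to how heavily it is tracked. This is only a sketch: the source labels, the `Loop` enum, and the `recommended_tracking` function are hypothetical names invented for this example, and sending unknown sources to the outer loop is an assumption on my part, not a rule from the guidelines.

```python
from enum import Enum


class Loop(Enum):
    INNER = "inner"
    MEDIUM = "medium"
    OUTER = "outer"


# Hypothetical source labels, paraphrasing some of the practices listed above.
SOURCE_TO_LOOP = {
    "unit_test": Loop.INNER,
    "tdd_failing_test": Loop.INNER,
    "peer_review": Loop.INNER,
    "feature_branch_ci": Loop.MEDIUM,
    "in_sprint_story": Loop.MEDIUM,
    "tool_based_code_review": Loop.MEDIUM,
    "main_branch_ci": Loop.OUTER,
    "post_sprint_regression": Loop.OUTER,
    "customer_report": Loop.OUTER,
    "certification_test": Loop.OUTER,
}

TRACKING = {
    Loop.INNER: "no formal tracking, just fix it",
    Loop.MEDIUM: "lightweight list with open/closed state",
    Loop.OUTER: "bug tracking system with workflow and RCA fields",
}


def recommended_tracking(source: str) -> str:
    """Return the suggested tracking level for a bug found at `source`."""
    # Assumption: unknown sources are treated as outer loop,
    # because over-tracking is the cheaper mistake to recover from.
    loop = SOURCE_TO_LOOP.get(source, Loop.OUTER)
    return TRACKING[loop]
```

For example, `recommended_tracking("unit_test")` suggests no formal tracking, while `recommended_tracking("customer_report")` points to the full bug tracking system.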



The Software Quality Engineering Leader

4 P’s and a T

I frequently describe the role of a Quality Engineering Leader as 4 P’s and a T.  The 4 P’s are People, Product, Process, and Project, and the T stands for Technology. This is a good prompt to write it down.

I’ve developed this model leading quality engineering teams in several Silicon Valley organizations, which led to several common elements in the context.  We are building software, services, and products in a competitive market, servicing many customers. We use agile development methodologies and iterative releases. Our focus is delivering high quality, at speed. To accomplish these goals, we need to build quality in rather than test it in. We believe that prevention and finding issues early is better than finding them late.

The Quality Engineering Leader is a close partner with the development leader, the product owner, and the customer support team. The Quality Engineering Leader is someone who is passionate about delivering great quality outcomes to our customers. They will bring an engineering mindset, which means helping to build quality in at all stages of the Software Development Life-cycle. They will also be a strong leader, with the influence to paint a vision of quality which leads to change in teams that are not necessarily in their direct control.

The Quality Engineering leader demonstrates a balance across several dimensions:

People Leadership: Able to attract and recruit strong Quality Engineers, and help them be the best that they can be professionally. Help deploy the right people to the projects, so the projects are successful, while finding the right projects for each person to help them develop their career.

Product Advocacy: Understand how our products improve our customers’ lives & businesses – and help the team build the right offering in addition to building it right. Being the customer advocate in the development squads helps us build the products that our customers love.

Process Leadership: Be current on the latest quality engineering practices and be able to apply the right practice to our situation. Have a well thought out strategy for when to automate, how to automate, where to automate, and what is best left to the humans. Another aspect of process leadership is to help the engineering teams repeat success again and again instead of relying on heroics.

Project Management: Organize our work to focus on the most important items, and be transparent to our stakeholders. Help the wider team make the necessary trade-offs between time, features, and investment. Track progress of the work and resolve the inevitable issues that pop up in every project.

Technology Focus: Able to understand our technologies sufficiently to lead an engineering team, helping the team make the best decisions when it comes to technology, and ask the right questions. Stay current on the emerging technologies and platforms that are important to our products.

The typical front-line leader will be solid in 2-3 of these dimensions and developing/growing in the balance.

This is an edited version of my LinkedIn article with the same title.

Software Root Cause Analysis: 3 Questions to Answer

Image of explosion represents things going wrong on a project.

Sometimes, things don’t go as planned.

Here are three questions that I like to answer when performing a root cause analysis for escaped bugs:

  1. How was the bug introduced in the first place?
  2. How did we not catch it earlier?
  3. What are we doing to prevent this problem in the future?

For the first two questions, I have a handy template for performing root cause analysis.

For the third question (what are we doing to prevent the problem?), we generally have short-term and longer-term solutions.  In the short term, we should add the test or check that would have caught the problem in the first place.  That is the answer to the specific question for that particular issue.

For the longer term, we collect data about the causes of escaped bugs and the reasons they escaped.  We collect that data in the bug tracking system as two fields with categories.  When we have enough data, we can examine trends.  I usually start with a simple Pareto analysis, showing the top few causes/reasons, then work with the team to ask how we can improve our processes/practices.  It’s often useful to filter the Pareto analysis to the most painful bugs (those found by customers, high severity, etc.)
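
A simple Pareto analysis of this kind can be sketched in a few lines of Python. This is only an illustration: the cause categories and the sample data are made up, and in practice the list would come from an export of the category field in your bug tracking system.

```python
from collections import Counter


def pareto(causes):
    """Return (cause, count, cumulative_percent) rows, most frequent first."""
    counts = Counter(causes)
    total = sum(counts.values())
    rows = []
    running = 0
    for cause, count in counts.most_common():
        running += count
        rows.append((cause, count, round(100 * running / total, 1)))
    return rows


# Made-up sample data: one cause category per escaped bug.
escaped_bug_causes = [
    "missing requirement", "coding error", "coding error",
    "environment difference", "coding error", "missing requirement",
    "coding error", "test gap",
]

for cause, count, cumulative in pareto(escaped_bug_causes):
    print(f"{cause:<25}{count:>3}{cumulative:>7}%")
```

To focus on the most painful bugs, filter the input list (for example, to customer-found or high-severity bugs) before passing it to `pareto`. The same function works for the "reason for escape" field.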

Please drop a comment below and let me know what you do for root cause analysis.

By Photo courtesy of National Nuclear Security Administration / Nevada Site Office [Public domain], via Wikimedia Commons

When is cutting corners the right answer?

Glass shelf with a sharp corner and a label saying "caution sharp corner"

Cutting corners is the right answer when your problem is sharp corners

When is cutting corners the right answer?  When the problem is sharp corners.

A key concept in quality engineering is “fail-safe” design.  I’ve written about fail-safe design in the past, regarding software controlled rifles.  This example, a sharp corner, is a much simpler, and more visual, example.

In the US, we have lots of litigation. I’m sure this label was applied to the sharp corner to point out the danger to customers, and maybe also to protect against lawsuits if someone gets injured.  A better solution would be to grind down that corner so it isn’t a hazard.

Fail-safe design means building your systems so that, if they fail, they fail in a safe manner.  In this case, if someone bumps into this shelf, they shouldn’t get cut.

Coming back to software, what if you have a cron job that does some cleanup?  What happens if that job fails?  Does it leave data behind which might consume your storage?  Would any of that data be Personally Identifiable?
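
One way to make such a cleanup job fail more safely is to have it keep going past individual errors and then report loudly when anything is left behind. The sketch below is a minimal illustration under assumptions of my own: the job deletes old files from a single directory, and the `cleanup` and `main` functions and their parameters are hypothetical, not taken from any particular system.

```python
import sys
import time
from pathlib import Path


def cleanup(directory: Path, max_age_seconds: float) -> list[Path]:
    """Delete files older than max_age_seconds; return files that could not be processed."""
    cutoff = time.time() - max_age_seconds
    failed = []
    for path in directory.iterdir():
        if not path.is_file():
            continue
        try:
            if path.stat().st_mtime < cutoff:
                path.unlink()
        except OSError:
            # Keep going so one bad file does not block the rest of the cleanup.
            failed.append(path)
    return failed


def main(directory: str, max_age_seconds: float) -> int:
    """Run the cleanup and return a process exit code."""
    leftovers = cleanup(Path(directory), max_age_seconds)
    if leftovers:
        # Fail loudly: a non-zero exit code lets cron or a monitor raise an alert.
        print(f"cleanup incomplete, {len(leftovers)} file(s) remain", file=sys.stderr)
        return 1
    return 0
```

A cron wrapper would pass the result of `main(...)` to `sys.exit`, so that leftover data shows up as a failed job rather than silently accumulating.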

An FMEA (Failure Mode and Effects Analysis) is a good method to identify these potential failures and ask: does the system fail in the safest manner?
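
A common way to prioritize the failure modes in an FMEA is the Risk Priority Number (RPN): severity times occurrence times detection, each scored from 1 to 10. The sketch below applies it to the cleanup job example. The failure modes and their scores are invented for illustration, and the `FailureMode` class is a hypothetical helper, not part of any library.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    description: str
    severity: int    # 1 (negligible effect) to 10 (worst effect)
    occurrence: int  # 1 (rare) to 10 (almost certain)
    detection: int   # 1 (always detected) to 10 (never detected)

    @property
    def rpn(self) -> int:
        """Risk Priority Number: higher means the mode deserves attention sooner."""
        return self.severity * self.occurrence * self.detection


# Illustrative scores only; a real FMEA would agree on these as a team.
modes = [
    FailureMode("cleanup job crashes, temp files pile up", 4, 5, 3),
    FailureMode("cleanup job skips files containing PII", 9, 3, 8),
    FailureMode("cleanup job deletes files still in use", 7, 2, 4),
]

ranked = sorted(modes, key=lambda m: m.rpn, reverse=True)
for mode in ranked:
    print(f"{mode.rpn:>4}  {mode.description}")
```

With these made-up scores, the silent PII leak ranks highest, mostly because it is severe and hard to detect. That is the kind of failure a fail-safe design should address first.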