Monthly Archives: April 2008

Root cause for problem escape

As a test manager, I’ve had to explain more than once why the test team did not catch a problem that our customers found. Early in my career, I tended to defend the test team or the test plan. That is not a very effective response to the situation. At the end of the day, we should be focused on delivering the highest quality products for our customers. Escapes happen, and we should use each one as a learning opportunity.

In examining the escapes, I’ve rarely found a case where the answer was simple. It’s usually a “perfect storm”, where multiple factors pile up to cause a critical problem. It’s useful to spend a little effort examining the root causes.

To guide the investigation, I’ve developed the following set of questions, which lead to a comprehensive root cause analysis. They steer the review in a more productive direction than finger pointing.

Describe Problem: (QA)

Describe the circumstances and consequences of the problem. This description should be a summary of the information, with enough detail for readers not familiar with the issue. Include a reference to the CR number or other tracking information.

Where was the problem introduced? (dev)

Describe the phase in which this was introduced, i.e. requirements, design, code, build, etc.

Describe the root cause of the problem: (dev)

For the phase where the problem was introduced, describe what actually happened. For example, the requirements could be missing, incorrect, or unclear. Designs might not have provided for error handling or considered the performance requirements. Coding errors may be a missing table entry, incorrect logic (“and” instead of “or”), etc.

The root cause is often difficult to get to. One tool to use is the “5-whys” approach. This is where you ask the question “why” 5 times or until you get to the root cause.

What reviews were held for the phase? (dev)

For example, requirements reviews for missing/incorrect requirements. Code reviews for coding errors.

How and why did the problem escape testing? (QA)

Examples: a test case doesn’t exist, a test was not run, a defect was introduced after the test passed, etc. Also describe why (for example, why didn’t a test exist?).

What could be done to prevent this type of error in the future? (team)

What process improvement could have prevented this problem from occurring?

What could be done to find this type of problem in the future? (team)

What test or review should be implemented? Examples are adding a test case, implementing a new automated test, adding new exit criteria to peer review, etc.
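If you track these reviews in a tool rather than a document, the questions above could be captured as a simple record. The following is only a rough sketch: the key names, the `EscapeReview` class, and the `tracking_id` field are made up for illustration, and the owners are taken from the labels in parentheses above.

```python
from dataclasses import dataclass, field

# (key, owner) pairs. Keys are illustrative short names for the questions
# above; owners come from the labels in parentheses after each question.
QUESTIONS = [
    ("describe_problem", "QA"),
    ("where_introduced", "dev"),
    ("root_cause", "dev"),
    ("reviews_held", "dev"),
    ("how_escaped_testing", "QA"),
    ("prevention", "team"),
    ("detection", "team"),
]


@dataclass
class EscapeReview:
    """One root cause review for a single escaped problem (hypothetical)."""
    tracking_id: str  # CR number or other tracking reference
    answers: dict = field(default_factory=dict)

    def answer(self, question, text):
        # Reject keys that are not part of the question list.
        if question not in dict(QUESTIONS):
            raise KeyError(f"unknown question: {question}")
        self.answers[question] = text.strip()

    def open_items(self):
        """Return (question, owner) pairs that still lack an answer."""
        return [(q, owner) for q, owner in QUESTIONS if not self.answers.get(q)]
```

Calling `open_items()` shows at a glance which questions are still waiting on QA, dev, or the whole team, which keeps the review moving without singling anyone out.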

Answering these questions does take some effort. If you find yourself in the unfortunate position of having too many escapes to perform the full analysis on each, choose a subset driven by the severity of the problems. Start somewhere.
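Choosing that subset can be as simple as sorting by severity and taking as many as the team can handle. Here is a minimal sketch, assuming each escape is a dictionary with a `severity` field; the function name, the field names, and the "Minor" level are assumptions for illustration (this post only mentions Critical and Major).

```python
# Most severe first. "Minor" is an assumed third level.
SEVERITY_ORDER = ["Critical", "Major", "Minor"]


def select_for_analysis(escapes, capacity):
    """Pick up to `capacity` escapes for full analysis, most severe first.

    Escapes of equal severity keep their original order, because
    Python's sort is stable.
    """
    rank = {name: i for i, name in enumerate(SEVERITY_ORDER)}
    ordered = sorted(escapes, key=lambda e: rank[e["severity"]])
    return ordered[:capacity]
```

As the Critical count drops, the same capacity naturally starts to cover Major problems as well.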

One customer was very helpful, finding many of our defects for us. To get some control of the situation, I performed the root cause analysis on all Critical bugs. Within a few months, we had so few Critical bugs that I extended the analysis to Major bugs. That customer gave us a letter grade (like in school). We went from an F to an A, in part because of this analysis.