Category Archives: Software Leadership

What Pokémon Go Teaches Us About SaaS Quality

By far, the most popular mobile game this year has been Pokémon Go. As you would expect with a release of this magnitude, there have been some glitches. These issues, and the reaction from the player base, illustrate two core tenets of software quality for SaaS offerings:

  • Availability is vital; it is the most important thing to get right in SaaS
  • Regression bugs, which remove functionality, are especially painful for customers

Pokémon Go probably doesn’t require an overview, but just in case… Pokémon Go is a mobile augmented-reality game where players travel the real world to find creatures (called Pokémon), resources to help play the game (at Pokéstops), and places to compete with other players’ Pokémon (Gyms). Players are called trainers, and the objective is to train the most powerful Pokémon and to collect all of the different types.

Niantic, the company behind Pokémon Go, has rolled the game out in many countries very quickly. In terms of popularity, the launch was extremely successful. In the US, Pokémon Go was released on July 6th, and by July 11th, the number of daily active users surpassed Twitter. Take another look at those dates: it took only 5 days to go from 0 to Twitter scale.

Availability is the most important feature

Niantic was not prepared for that load, and the servers showed it. Customers responded in kind with reviews and very public complaints:

iTunes reviews for Pokémon Go: a wave of 1-star ratings

Even 5-star reviews complain about availability

Once the servers seemed stable in the US, Niantic released Pokémon Go in new countries, only to bring the instability back for existing users. As is usual these days, new Twitter accounts were even created to reflect the frustration:

Pokemon Go Servers Tweet

Players were frustrated because many of the resources in the game have timers associated with them. A player would activate a timed resource, only to have the servers go offline before it could be used.

Pokemon Go splash screen
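
The post doesn’t put numbers on availability, but targets are often discussed as “nines” of uptime. A quick sketch of what each common target allows in downtime (the targets are generic industry shorthand, not Niantic’s actual numbers):

```python
# Downtime allowed per 30-day month at common availability targets.
# These "nines" are generic industry shorthand, not Niantic's actual targets.
MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200 minutes

for target in (0.99, 0.999, 0.9999):
    downtime = MINUTES_PER_MONTH * (1 - target)
    print(f"{target:.2%} availability -> ~{downtime:.0f} minutes of downtime per month")
```

Even “three nines” allows only about 43 minutes of downtime per month, which puts the multi-hour outages players reported into perspective.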

Bugs are bad, regression bugs even worse

Many players are also complaining about the Nearby Pokémon feature that was available at the first launch. This feature gave players an indication of how close a Pokémon was to their current location, so they could hunt it down. Proximity was represented by the number of footprints next to the image: the fewer the footprints, the closer the creature:

The nearby feature shows the number of steps to the closest Pokemon


One of the updates introduced a bug, which caused all of the nearby Pokémon to be shown with 3 footprints, regardless of actual distance:

A bug shows all Pokemon as 3 steps away

In the latest update, Niantic removed the footprints entirely. I assume they followed the solid software principle that “it’s better to show no information than false information”; however, the players were livid. Players, rightly, were frustrated that a feature had been removed from the app.

Really? You deleted the footprints?

In hindsight, the players would have had a better experience if the footprint tracker had not been included in the early launch and had instead been added later, once the glitches were fixed. The feature was a relatively small part of the game, but removing it intensified the players’ reaction. Regression bugs that remove functionality are very painful in SaaS offerings.

Overall, Pokémon Go is so far a great success, and my guess is that Niantic will fix these issues. And once they are fixed, they will be forgotten. The players are very passionate about the game and they will continue to play. But, it will take some time for the players to forget, and the app store ratings will remain.

The popularity of Pokémon Go will allow Niantic to survive these glitches, but your app might not have as much of a passionate customer base. For any other app/game, how many of the players would have deleted it and never returned?


Girly Code: She++ Documentary

Today, I saw a great and inspiring documentary on women in software engineering. It’s called She++. The documentary focuses on the need for more software engineers in the near future, the lop-sided distribution between men and women in the field, and several societal constraints that women feel before going into computer science.

Enjoy…


she++: The Documentary from Ellora Israni on Vimeo.


The Value of Exploration in Software Testing

This is a story about how I came to value exploratory testing. But first, a story about this story: every Saturday morning, we go through the same ritual. I have a couple of cups of coffee, and Jake, my German Shepherd, watches my every move and follows me everywhere, waiting for me to put on my shoes. He knows we are going on a walk.

Today, on our favorite trail, he started to limp a little. Instead of the full 3.5-mile hike, I turned off on a side trail to loop around and cut it short. This trail (new to us) was great. We found a small stream and surprised (and were surprised by) a flock of wild turkeys along the ridgeline. If we had kept to the standard loop, we would never have found this cool area of the hill.

Exploring with Jake & Rose

While walking, I remembered another value of exploration: in software testing. Our test team had naturally evolved a style where we would perform targeted, purposeful “ad-hoc” testing on a new feature until we were comfortable with the functionality. Then we would run the official test cases for “score”. Once we had the score, we would fill the rest of the available time trying different ways to break the system. These practices worked for us, and I really didn’t think about them until one day, while presenting status to the executive team.

Weekly status: 30 new bugs, 5 test failures

The status (recreated here to protect the innocent) for this week showed 327 test cases passing with 5 failures. We had also opened 30 new bugs and received 43 bug fixes. A pretty average week. However, one of the directors asked a question: how could we have only 5 test failures and yet find 30 bugs? His point of view was that our written tests should find all of the bugs, and if we were finding most of the bugs through other means, this pointed to an inadequacy in our written tests. I explained how this happens: the written test cases actually find few issues. Most of the bugs are found with ad-hoc and negative tests, and especially in systems with multiple test endpoints (several APIs, several UIs), it’s prohibitively expensive to script all of the possibilities. This conversation piqued my curiosity, though. Were we behaving optimally? Should we invest more in test case definition?
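
The director’s concern can be framed numerically. A minimal sketch using the recreated figures above (the ratio names are my own, not part of our actual report):

```python
# Weekly status figures as recreated in the post (illustrative numbers).
passed = 327
failed = 5
new_bugs = 30

total = passed + failed
pass_rate = passed / total             # health of the scripted suite: ~98.5%
bugs_per_failure = new_bugs / failed   # bugs found per failing script: 6

print(f"pass rate: {pass_rate:.1%}, bugs per failed script: {bugs_per_failure:.0f}")
```

A 98.5% pass rate alongside six bugs per failed script is exactly the gap the director noticed: the scripted suite was healthy, but most discoveries came from outside it.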

One thing I came to realize was that our test cases were generated from the requirements and design documentation. The developers used those same requirements and design documents to create the code. Developer testing will generally ensure proper operation, at least for the happy path. So test cases generated by the test team from the same original source tended to have a high pass rate. Problems and bugs in the system had to be found through other means.

After some research, I found that our practice had a name, exploratory testing (ET), and that it is used by many organizations across industries and software types. The exploratory testing approach emphasizes the creative engagement of the tester (as opposed to following a test script) to contemporaneously design and execute tests. Our test team was not following test scripts but using their experience, creativity, and observations to find new tests to try.

I valued the time we spent on exploratory testing more than spending additional time on written test cases, for two reasons: exploratory testing was far more productive at finding bugs and errors in the code, and I had more confidence in release readiness based on the testers’ judgment than on the quantitative results of test case execution.

My research into exploratory testing led to several great resources:

These resources helped our team refine and improve our practices, by learning new techniques and attacks from the authors. Some of these methods are called tours. Our team got better at testing by studying these new tours.

We even found an approach to help manage exploratory testing sessions, called Session-Based Test Management (SBTM), which could help put a measurement (e.g., a count of chartered sessions) on the testing effort. This measurement could have eased some of the questions raised about the quality of our test cases.
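
Session-Based Test Management structures exploratory work into time-boxed, chartered sessions that can be counted and reported. A minimal sketch of what such a session record might look like (the field names, charters, and bugs below are my own invention, not from any specific SBTM tool):

```python
from dataclasses import dataclass, field

@dataclass
class TestSession:
    """One time-boxed exploratory testing session (SBTM-style)."""
    charter: str                          # mission for the session
    minutes: int                          # time-box actually used
    bugs: list = field(default_factory=list)

def summarize(sessions):
    """Roll sessions up into the kind of countable status an executive asks for."""
    total_minutes = sum(s.minutes for s in sessions)
    total_bugs = sum(len(s.bugs) for s in sessions)
    hours = total_minutes / 60
    return {
        "sessions": len(sessions),
        "hours": hours,
        "bugs": total_bugs,
        "bugs_per_hour": total_bugs / hours if hours else 0.0,
    }

# Hypothetical sessions for illustration.
sessions = [
    TestSession("Probe the import API with malformed payloads", 90,
                bugs=["crash on empty file", "wrong error code on oversize upload"]),
    TestSession("Tour the settings UI as a first-time user", 60,
                bugs=["stale cache after locale change"]),
]
print(summarize(sessions))
```

The point is that exploratory work becomes countable (sessions, hours, bugs per hour) without forcing testers back onto scripts.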

All in all, I’m glad that question about the quality of our test cases came up. Our team learned that we were following industry best practices and we learned how to improve our practices.


Root Cause Analysis for Software Problems

It happens. Despite the best efforts of your developers and test team, bugs sometimes escape to customers. As a quality or test leader, it’s important to handle these situations in a way that lets the team learn and improve. The following template for root cause analysis has worked very well to help teams learn about an escape and make improvements.

To help guide the investigation, I’ve developed the following set of questions, which support a comprehensive root cause analysis. They steer the review in a more productive direction than finger-pointing.

Describe Problem:

Describe the symptoms and consequences of the problem. This description should be a summary with enough detail for readers to become familiar with the issue. Include a reference to the trouble ticket or other documentation your organization uses to track such issues. Usually the quality team or leader prepares the description.

Where was the problem introduced?

Describe the phase in which the problem was introduced, i.e., requirements, design, code, build, etc.

Describe the root cause of the problem:

For the phase where the problem was introduced, describe what actually happened. For example, the requirements could be missing, incorrect, or unclear. A design might not have provided for error handling or considered the performance requirements. Coding errors may include faulty logic, a missing table entry, an incorrect condition (“and” instead of “or”), etc.

The root cause is often difficult to determine. One useful tool is the “5 whys” approach: ask the question “why” five times, or until you reach the root cause.
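
A worked example of the technique, using an entirely invented defect chain (the scenario below is hypothetical, for illustration only):

```python
# Hypothetical 5-whys chain for an invented escaped defect.
five_whys = [
    ("Why did checkout fail for some users?",
     "The tax service returned an error."),
    ("Why did the tax service error?",
     "It received a region code it didn't recognize."),
    ("Why was the region code unrecognized?",
     "A new region was added to the UI only."),
    ("Why was it added to the UI only?",
     "The requirement didn't mention the tax service."),
    ("Why didn't the requirement mention it?",
     "No checklist ties new regions to the services they affect."),
]

for i, (question, answer) in enumerate(five_whys, start=1):
    print(f"Why #{i}: {question}\n  -> {answer}")

# The final answer is the candidate root cause (a process gap, not a person).
print("Root cause:", five_whys[-1][1])
```

Notice that the chain ends at a process gap rather than an individual’s mistake, which is what keeps the review away from finger-pointing.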

In many cases there will be multiple causes that, combined, contributed to the escape to production. In these more complex scenarios, the fishbone (Ishikawa) diagram is a useful tool.
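
A fishbone diagram groups candidate causes by category along the “bones” leading to the problem. In text form it can be approximated with a simple mapping (the categories and causes below are illustrative, not a prescribed taxonomy):

```python
# Illustrative fishbone: candidate causes for one escape, grouped by phase.
fishbone = {
    "Requirements": ["new region not mentioned for downstream services"],
    "Design": ["no fallback when a dependent service fails"],
    "Code": ["region table hard-coded in two places"],
    "Test": ["no cross-service regression suite for region changes"],
}

# Render a text-only view of the diagram.
for category, causes in fishbone.items():
    print(category)
    for cause in causes:
        print(f"  - {cause}")
```

Walking each category in turn helps the team surface contributing causes it would otherwise skip once the first plausible cause is found.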

Determining the root cause is the hardest part of this process. Often, the people involved are defensive or may not be oriented towards digging for root cause (especially if the problem is already fixed).

Continue reading