
Learning From Customer Reported Defects


Investigating the root causes of customer reported defects can have a great impact on your organization. One of the best ways to ensure customer satisfaction, lower costs, and increase employee engagement is to look inside — you already have the data. In the end, it’s all about continuous improvement.
Software defects found by customers are the most expensive ones. Many people get involved in debugging (hard to do in production), fixing, and testing them. All of those people need to get paid, and time and resources have to be pulled away from new feature development. Customer reported defects are also an embarrassment for an organization — after all, they have bypassed all the internal defenses. It’s no wonder that software maintenance costs are typically between 40% and 80% of the project cost (according to some studies they may reach up to 90%: How to save on software maintenance costs), and a big chunk of those expenses is directly related to fixing defects. It’s easy to calculate the exact cost of a bug fix, but one thing that is hard to measure is the reputation loss. Customers will not recommend an app, or will downright trash it, because of its bad quality.
Most companies do not investigate the root cause of any defect (even the most expensive ones), and at Komfo we were no different. We accepted defects as the cost of doing business — never questioning or trying to improve. Since we couldn’t find any industry data on customer reported defects to benchmark against, initially we just wanted to see where we stood.
Here is an example. Our customers can report a defect through a number of channels: email, phone calls, and social media. Of all the reports we get, only 12% end up as actual bug fixes in the code base. The other 88% are also interesting, but for other reasons: maybe our product is not intuitive to use, or maybe our customers need more training. Is a 12% bug-fix rate good or bad? Until other companies start publishing such data, there is no way to know.
A while ago, I read a book called “The Toyota Way to Lean Leadership”. In it, there is a story about how Toyota North America lowered its warranty costs by 60% by investigating the causes and the fixes of vehicle breakages within the warranty period. Inspired to do something similar, we started gathering data to investigate how we could improve.
Data Collection
All of our defects are logged in Jira. Each defect is also tagged depending on the phase in which it was found: in-house or reported by a customer. We gathered all the defects in the second group, ignoring those that were marked as “will not fix” or were considered improvements. We were interested purely in the defects. We then searched the git log for their Jira IDs (we already had a policy of putting the Jira ID in the commit message).
In the end, we found 189 defects and their fixes in the code base, spanning a period of two and a half years. For each defect we gathered more than 40 statistics: when it was reported and fixed, in which part of the application it was found, what kind of test or technique could have detected it earlier, the size and complexity of the method/function where the defect was located, and so on (you can check the sanitized version of the stats we collected and use them as a guideline here: goo.gl/3Gdsnm).
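As a rough illustration only (the field names below are invented and do not match the actual columns in the linked sheet), a single row of the collected data could be modeled in Java like this:

import java.time.LocalDate;

// Illustrative subset of the 40+ per-defect statistics; names are hypothetical.
public record DefectRecord(
        String jiraId,             // the Jira issue key referenced in the commit message
        LocalDate reported,        // when the customer reported the defect
        LocalDate fixed,           // when the fix landed in the code base
        String component,          // frontend, PHP backend, Java backend, ...
        String earliestDetection,  // test type or technique that could have caught it earlier
        int methodLinesOfCode,     // size of the method/function containing the fix
        int cyclomaticComplexity   // complexity of that method/function
) {}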
The data collection process was slow, as we were gathering everything by hand. We already had our daily work to do, and investigating 189 defects and gathering 40+ stats for each of them took us more than six months. Now that we know exactly what we’re looking for, we’re automating the tedious data collection.
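To give an idea of what that automation can look like, here is a minimal sketch that shells out to git and lists the commits referencing a given Jira ID. It assumes the “Jira ID in the commit message” policy holds; the project key and repository path in main are placeholders, not our real ones.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

// Sketch: list the commits whose message mentions a given Jira ID.
public class DefectCommitFinder {

    static List<String> commitsFor(String jiraId, String repoPath) throws Exception {
        // Equivalent to running: git -C <repoPath> log --oneline --grep=<jiraId>
        Process git = new ProcessBuilder(
                "git", "-C", repoPath, "log", "--oneline", "--grep=" + jiraId)
                .redirectErrorStream(true)
                .start();

        List<String> commits = new ArrayList<>();
        try (BufferedReader out = new BufferedReader(
                new InputStreamReader(git.getInputStream()))) {
            String line;
            while ((line = out.readLine()) != null) {
                commits.add(line);
            }
        }
        git.waitFor();
        return commits;
    }

    public static void main(String[] args) throws Exception {
        // "PROJ-123" and "." are placeholders for a real Jira key and repository path.
        commitsFor("PROJ-123", ".").forEach(System.out::println);
    }
}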
One of the first things we noticed was that 10% of the defects we were interested in were actually not caused by our developers. Our product is SaaS that collects lots of data from the biggest social networks (we make more than 10 million requests a day to the Facebook API alone). Sometimes the social networks change their APIs with no prior notification, and then our customers notice a defect. All we can do is react and patch our product. We excluded those defects from further analysis, as there is no way to catch them early.
The frontend to backend defect ratio was almost 50/50, so we have to pay close attention to both. The backend distribution was interesting, though. Almost two thirds of those defects were in the PHP code, and one third in the Java code. We had a PHP backend from the beginning, and about a year and a half ago we started rewriting parts of it in Java. So PHP had been around for a long time, accumulating most of the defects in the two and a half year period we investigated.
There are lots of discussions about which programming language causes fewer defects (e.g. What programming languages generally produce the least buggy code?). We decided to find out empirically for our application: PHP or Java. Only 6% of the defects could have been avoided if the code they were found in had been written in Java in the first place instead of PHP. In the PHP codebase we have lots of places where we don’t know the type of a variable, so there are extra checks for whether it is a string, a date, a number, or an object. In the Java codebase we know the variable type, and those extra checks (a potential source of defects) are not needed.
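To make the difference concrete, here is a simplified, invented example (not taken from our code base): in the Java service the compiler pins down the parameter types, so the defensive runtime checks the equivalent PHP code needs are simply never written and cannot be gotten wrong.

import java.time.LocalDate;

// Invented example: the compiler guarantees 'scheduledFor' is a date.
// In the PHP equivalent this value could arrive as a string, an int timestamp,
// or null, and every call site would have to re-check and convert it.
public class PostScheduler {

    public void schedule(String postId, LocalDate scheduledFor) {
        if (scheduledFor.isBefore(LocalDate.now())) {
            throw new IllegalArgumentException("Cannot schedule a post in the past");
        }
        // ... enqueue the post for publishing (omitted)
    }
}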
However, the 6% ‘Java advantage’ is reduced by the fact that, when rewriting parts of the backend from PHP to Java, we simply forgot to include some functionality, and this resulted in defects. Also (and we have only anecdotal evidence about this), the developers feel that they are ‘slower’ developing in Java compared to PHP.
We started investigating customer reported defects two and a half years ago. Back then, our backend (no pun intended) was written 100% in PHP. One year after that, we started rewriting parts of it in Java, and the new backend went live six months later. We did not immediately see a decrease in incoming defects (see screenshot_9). Switching from PHP to Java did not automatically mean fewer defects. We started implementing the various other improvements described below, and we had to wait six more months until the defects started to decrease. The rewrite was done by the same developers (we have very little turnover).
What all this means, according to our data, is that in the long run, the quality of a product depends primarily on the developers involved and the process used. Quality depends to a lesser degree on the programming language or frameworks used.
There were three main findings that we did not expect and that were quite surprising to us.
The first one was that 38% of all the customer reported defects were actually regressions. This means that a feature was working fine, then we made a change in the codebase (for a fix or new functionality), and then customers reported that the feature they were using had stopped working. We did not detect this in-house. We knew that we had regressions, but not that their number was that high. It also means that we didn’t have any sort of automated tests acting as a detection mechanism to tell us: “that cool new code you just added is working, but it broke an old feature, make sure you go back and fix it before release”. By writing automated tests you essentially cement the feature logic in two places. This is a double-edged sword, though. It is effective at catching regressions, but it may hinder your ability to move fast: too many failing tests after a commit slow you down, because you have to fix them too before continuing. It’s a fine balancing act, but for us the pendulum had swung too far toward fast development, so we had to reverse the direction.
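As an illustration of such a detection mechanism, here is a minimal regression test sketch (JUnit 5 assumed; the class and the scenario are invented, not one of our real defects). Once a customer-reported regression is fixed, a test like this pins the old behaviour so that a later change breaking it again fails the build instead of reaching customers.

import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;

// Invented example of a regression test written after a customer-reported fix.
class ReportTotalsRegressionTest {

    @Test
    void totalsStillIncludeArchivedCampaigns() {
        ReportTotals totals = new ReportTotals();
        totals.add("active-campaign", 10);
        totals.add("archived-campaign", 5);

        // The (invented) regression: archived campaigns silently dropped out of the sum.
        assertEquals(15, totals.sum());
    }

    // Tiny stand-in for the real production class, to keep the sketch self-contained.
    static class ReportTotals {
        private int sum = 0;
        void add(String campaignId, int value) { sum += value; }
        int sum() { return sum; }
    }
}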
The second surprise was that the test automation pyramid guidelines were not helping us catch more defects early. Only 13% of the customer reported defects could have been detected early if unit tests had been written. Compare this to the 36% ‘yield’ of API-level tests and the 21% ‘yield’ of UI-level tests. A diamond shape (where the majority of the tests are API-level) works better for us than a pyramid. This is due to the nature of our software. It’s SaaS, and the bulk of what we do is gathering lots of data from the internet and putting it in different databases for later analysis. Most of the defects lie somewhere in the seams of the software. We have more than 19 different services, all talking over the network constantly, with the code for those services living in different repositories. It is impossible to test this efficiently with unit tests only (we consider unit tests to be ones that run only in memory, don’t touch the network, the database, or the filesystem, and use test doubles if they have to). And we think that with the rise of microservices and lambda functions, high-level integration tests executed on fully deployed apps will be far more effective at detecting defects than simple unit tests.
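The sketch below compresses that distinction (JUnit 5 and Java’s built-in HttpClient are assumed; the endpoint and class names are invented): the unit test stays in memory behind a test double, while the API-level test exercises a fully deployed service over the network, which is where most of the “seams between services” defects actually live.

import static org.junit.jupiter.api.Assertions.assertEquals;

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

import org.junit.jupiter.api.Test;

class PostCountTests {

    interface PostStore {                  // the seam we fake in the unit test
        int countFor(String accountId);
    }

    static class PostCounter {
        private final PostStore store;
        PostCounter(PostStore store) { this.store = store; }
        int count(String accountId) { return store.countFor(accountId); }
    }

    @Test
    void unitTest_inMemoryWithTestDouble() {
        PostStore fake = accountId -> 42;  // test double: no network, database, or filesystem
        assertEquals(42, new PostCounter(fake).count("acct-1"));
    }

    @Test
    void apiLevelTest_againstDeployedService() throws Exception {
        // Hypothetical endpoint on a fully deployed test environment.
        HttpResponse<String> response = HttpClient.newHttpClient().send(
                HttpRequest.newBuilder(
                        URI.create("http://staging.example.com/api/posts/count?account=acct-1"))
                        .GET()
                        .build(),
                HttpResponse.BodyHandlers.ofString());

        assertEquals(200, response.statusCode());
    }
}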
