Wednesday, November 2, 2011

On The Limited Value of Software Testing

In a recent article, Michael Bolton discussed how testing problems like too few testers, too little time for testing, or unstable test builds are really test results. And, while I agree with him, this crystallized a question that I've been dancing around for some time now. I've been involved with software testing for almost 30 years, and the problems Michael mentioned are the same ones we were complaining about 30 years ago. How can this be? It's not as if we don't know how to do better.


Unlike Dilbert, I can't simply lay all problems at the feet of pointy-haired bosses. If the same result happens year after year, across companies, across industries, and across countries, there must be some deeper principle at work. Economics, another of my interests, suggests that the answer lies in looking at how scarce resources are allocated to produce value. If we consistently fail to apply resources to an activity, perhaps it doesn't really have the value we think it does.



What is the value of software testing? As I have previously discussed, testing has many meanings. While some of these are pretty thought-provoking, I believe that the people who hire testers expect their primary value to come from executing the software with the goal of finding bugs. (That many of the alternate meanings appear to have been created by testers to convince management that they provide value in other ways may well prove my point.) A lot of testing is done by a lot of people, but I'm talking about the kind of testing done by testers: testing at the system level with the goal of finding bugs, testing that is traditionally done at the end of the release cycle, although agile groups do a better job of distributing it throughout the cycle.


And what is the value of finding bugs? In theory, finding bugs should allow us to deliver better products to our customers, or at least make better decisions about those products. In practice, however, we fail to accomplish these goals, and even when we do, it is generally not by using the bugs we find through testing. And if we don't really benefit from the effort spent finding bugs, perhaps managers are acting rationally, if not always consciously, by not investing more in testing.


How is it that finding bugs fails to result in better products or better decisions? Here lies the heart of the issue.

First, I hope that we all understand that you can't find all the bugs. There is simply no such thing as complete testing. We build complex systems that interact with complex environments. You can't even find all the interesting bugs. To believe otherwise is hubris. The resource-constrained environments in which we work force us to make decisions about the testing that we won't do and the bugs that we won't find. If you are in the software business, you are, at least in part, in the business of delivering bugs to your customers.
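The impossibility of complete testing is easy to see with a little arithmetic. Here is a minimal sketch, assuming a hypothetical function that takes just two 32-bit integers and an optimistic million test executions per second (both the function and the throughput are my illustrative assumptions, not anything from a real system):

```python
# Back-of-the-envelope arithmetic: even a trivially small interface
# has an astronomically large input space.

inputs_per_arg = 2 ** 32                  # one 32-bit integer argument
total_cases = inputs_per_arg ** 2         # every pair of values: 2**64
print(f"total input combinations: {total_cases:.3e}")   # ~1.845e+19

# At an (optimistic) million test executions per second:
seconds = total_cases / 1_000_000
years = seconds / (60 * 60 * 24 * 365)
print(f"years to run them all: {years:,.0f}")            # ~584,942 years
```

And that is before the function interacts with an operating system, a network, or other software. Exhaustive testing isn't merely impractical; it is off the table by many orders of magnitude.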


Your ability to find bugs is even affected by the quality of the software you're testing. When the software is buggy, testing is hard. Builds fail. You start to explore one failure and get sidetracked on a completely different failure. The time spent investigating and reporting bugs eats into the time that was supposed to be spent testing. Even worse, software bugs have the strange property that the more bugs you find, the more bugs remain to be found. Buggy software ends up being tested less effectively simply because it is buggy software.
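One reason found bugs predict remaining bugs is the well-documented tendency of defects to cluster in a small fraction of the code. The sketch below simulates that under assumptions I have made up purely for illustration (100 modules, 500 bugs, a rough 80/20 split); it is not measured data:

```python
import random

random.seed(1)

# Illustrative assumption: defects cluster, so roughly 80% of the bugs
# land in 20% of the modules.
modules = list(range(100))
hot = random.sample(modules, 20)          # the bug-prone 20%
bugs = [random.choice(hot) if random.random() < 0.8 else random.choice(modules)
        for _ in range(500)]              # the module containing each bug

# "Test" by finding half the bugs at random, then ask where the rest live.
found_idx = set(random.sample(range(len(bugs)), len(bugs) // 2))
found_modules = {bugs[i] for i in found_idx}
remaining = [bugs[i] for i in range(len(bugs)) if i not in found_idx]

overlap = sum(1 for m in remaining if m in found_modules)
print(f"remaining bugs: {len(remaining)}, "
      f"located in modules that already produced bugs: {overlap}")
```

Under clustering, most of the bugs you haven't found yet sit in exactly the modules that have already produced bug reports, which is why a rising bug count signals more bugs to come rather than fewer.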


Finding bugs doesn't actually improve the software; you also have to fix them. And the kinds of teams that write buggy software are the kinds of teams that have problems fixing them. Finishing the functionality that didn't get completed on schedule takes precedence over fixing bugs. Deeper design and architectural problems are patched over because they would take too long to fix. As the release date nears and the number of bugs mounts, the time spent debugging gets condensed, making fixes less likely to work. Eventually, the whole process comes to a halt as we give up and deliver the remaining bugs to our customers.

You can't get working software by starting with buggy software and removing the bugs one by one.

In the end, the quality of the software after testing is much the same as the quality of the software before testing started. Yes, some bugs get fixed before being found by customers, and that is a good thing. This explains why we invest in testing at all. But software that was buggy when we started testing is still buggy when we finish. Good software needs less testing, and the testing finds fewer bugs that need to be fixed. For buggy software, testing is less effective, and the kinds of teams that write buggy software are the kinds of teams that can't get the bugs fixed. Whether you find only a small number of bugs or you can't fix the ones you do, the value of finding bugs is limited.


The lack of real improvement in the quality of software as a result of testing was recognized long ago. So testers changed the goal. Instead of making the software better, testing would allow managers to make better decisions about whether and when the software should be released. Unfortunately, it turns out that decisions aren't really made this way. Personal and organizational considerations trump data in any complex corporate decision, particularly in the types of organizations that produce software that should be shelved or substantially delayed because of quality problems.


In the typical organization, testing happens near the end of the release cycle. By that time, expectations about the delivery of the software have been set. Managers are rewarded for meeting commitments and making dates. As the testing progresses, the cost to the decision maker of missing the date escalates. Loss aversion and the sunk cost fallacy kick in. It becomes almost impossible to choose not to deliver the software, at least not without significant personal risk. Even a significant delay has a cost. The inertia of a release date makes it really hard to miss. It's easier to just make the date and plan the maintenance release. And there are significantly fewer consequences.


The whole notion that testing can tell you when software is ready to be released is flawed. It depends on the software improving through the testing process and, as a result, on being able to project when it will reach some level of quality at which we can release it. However, software doesn't really get better through the process of testing. Buggy software never reaches an improved level of quality. The question can't be when the software will reach the level of quality we want to release, but whether we are willing to release the software with the level of quality it has.

We could increase the value of the bugs we find by using them to improve how we develop software. That is usually how teams learn to develop good software. This is one of the key advantages of effective agile teams. It is my experience, however, that the quality of the software directly reflects the well-being of the team that creates it. Teams that create lots of bugs turn out to be the kinds of teams that are unable to learn from them.

In the final analysis, the value we get from finding bugs is limited. Teams that create good software get good value from the bugs they find by removing them and learning from them. But since there are few bugs to be found, finding them is expensive, and so the value is limited. Teams that create buggy software face the opposite problem: it's easy (and cheap) to find bugs, but these kinds of teams are incapable of using them effectively. So the value is limited.


Spending more money on testing won't change that. If you want to deliver good software, you have to write good software. Yes, testers contribute to the quality of the software in more ways than just finding bugs. But it turns out that finding bugs has only a limited impact on quality. If I asked you to invest in improving the quality of your software, you would almost certainly get better results by improving how you write the software, not how you test it.


Tom Gilb describes testing as "a last, desperate attempt to assure quality." Viewed this way, we can reconcile why organizations both invest and under-invest in testing. We can set expectations that are appropriate and can be met. And we can develop ourselves and our profession to best meet those expectations.

