The paradox of testing is this: if testing has such limited value, why do companies that produce great software do so much of it? If testing can't really improve quality and doesn't really provide data to decision makers, why do we bother? What is the value of testing?
In my experience, testing is best understood as a finishing activity, a polishing activity. It is the last few coats of clear paint that make a paint job shine. It is running your hand across the surface of a finished product to find any remaining small burrs that need to be sanded over. It is the attention to detail that differentiates value from junk.
This understanding of the purpose of testing unlocks the current trends and debates in the test community. It explains why the traditional practice of testing as a planned, designed, and scripted activity is being replaced by rapid testing techniques. If you have lots of the kinds of bugs that are uncovered by traditional "systemic" testing techniques, then testing can do nothing for you. But, if you have done your development well, the kind of intelligent investigation that characterizes exploratory testing can uncover the problems you have missed.
Is testing dead? If your business model does not require polished products, it may be. When Alberto Savoia or James Whittaker talk about how testing can be replaced by pretotyping, dogfooding, or beta testing, they are really just saying that Google's business model doesn't require polished products. For Google, it is more important to get customer feedback on the software's functionality than it is to get the software just right. And that may be true for some software on the web just as it is surely not for software that, for example, is embedded in a device.
This does suggest that as companies begin to understand the real value of testing, they will have fewer but better testers. And this explains why outsourcing testing has not produced the expected benefits. If you are trying to replace an intelligent exploratory tester with a phalanx of testers for whom testing is the script and nothing but the script, you get little value from it.
Testing, at least the kind of system level testing done by testers after development is "done," is part of the zen in the art of software development. It is the attention to detail that differentiates software that users love from software that merely meets needs.
Tuesday, November 15, 2011
Friday, November 4, 2011
On Testing Will Not Solve Your Quality Problems
I remember a conversation that I had with my Linear Algebra professor when I was in college about his class. He said it was a classic example of a double humped class. For half the class, he could teach the material twice as fast and it wouldn't be a problem. For the other half of the class, however, he could spend twice the amount of time and they still wouldn't succeed.
In economics, they refer to this as multiple equilibria. I believe that software quality has two equilibria. We have understood for decades now the types of practices that are needed to develop working software. Whether it takes the form of code reviews, pair programming, or a gatekeeping committer, every line of code needs to be seen by multiple pairs of eyes. Whether by convention or test driven means, you need comprehensive unit tests. Since code evolves, you need automated tests that cover the code to tell you when changes have had unexpected effects.
Creating working software requires the disciplined and diligent use of these techniques. It requires both intention and effort. It is, as a result, subject to Broken Windows effects. When you start to loosen your practices, you send the message that quality is not a goal and create an environment where the number of quality problems begins to escalate.
When managers try to solve quality problems by doing more testing, they are really trying to avoid the cost of reaching the "working software" equilibrium. Unfortunately, the same lack of attention to quality that created the problems sabotages our efforts to fix them as well. Trying to take the cheap way out reinforces the social norm that quality is not really that important for developers. We are simply unable to realize a specific level of quality by dialing up or down our quality practices. And testing alone will not enable us to reach the working software equilibrium.
Testing will not solve your quality problems.
Wednesday, November 2, 2011
On The Limited Value of Software Testing
In a recent article, Michael Bolton discussed how testing problems like too few testers, too little time for testing, or unstable test builds are really test results. And, while I agree with him, this crystallized a question that I’ve been dancing around for some time now. I’ve been involved with software testing for almost 30 years and the problems Michael mentioned are the same ones we were complaining about 30 years ago. How can this be? It’s not as if we don’t know how to do better.
Unlike Dilbert, I can't simply lay all problems at the feet of pointy haired bosses. If the same result happens year after year, across companies, across industries, and across countries, there must be some deeper principle at work. Economics, another of my interests, suggests that the answer lies in looking at how scarce resources are allocated to produce value. If we consistently fail to apply resources to an activity, perhaps it doesn't really have the value we think it does.
What is the value of software testing? As I have previously discussed, testing has many meanings. While some of these are pretty thought provoking, I believe that the people who hire testers expect their primary value to come from executing the software with the goal of finding bugs. (That many of the alternate meanings appear to have been created by testers to convince management that they provide value in other ways may well prove my point.) A lot of testing is done by a lot of people, but I'm talking about the kind of testing done by testers. Testing at the system level with the goal of finding bugs. Testing that is traditionally done at the end of the release cycle, although agile groups do a better job distributing it throughout the cycle.
And what is the value of finding bugs? In theory, finding bugs should allow us to deliver better products to our customers or at least make better decisions about those products. In practice, however, we fail to accomplish these goals, and even when we do, it is generally not by using the bugs we find in testing. And if we don't really benefit from the effort spent finding bugs, perhaps managers are acting rationally, if not always consciously, by not investing more in testing.
How is it that finding bugs fails to result in better products or better decisions? Here lies the heart of the issue.
First, I hope that we all understand that you can't find all the bugs. There is simply no such thing as complete testing. We build complex systems that interact with complex environments. You can't even find all the interesting bugs. To believe otherwise is hubris. The resource constrained environments in which we work force us to make decisions about the testing that we won't do and the bugs that we won't find. If you are in the software business, you are, at least in part, in the business of delivering bugs to your customers.
Your ability to find bugs is even affected by the quality of the software you're testing. When the software is buggy, testing is hard. Builds fail. You start to explore one failure and get sidetracked on a completely different failure. The time spent investigating and reporting bugs eats into the time that was supposed to be spent testing. Even worse, software bugs have the strange property that the more bugs you find, the more bugs remain to be found. Buggy software ends up being tested less effectively simply because it is buggy software.
Finding the bugs doesn't actually improve the software; you also have to fix them. And the kinds of teams that write buggy software are the kinds of teams that have problems fixing them. Finishing the functionality that didn't get completed on schedule takes precedence over fixing bugs. Deeper design and architectural problems are patched over because they would take too long to fix. As the release date nears and the number of bugs mounts, the time spent debugging gets condensed, making fixes less likely to work. Eventually, the whole process comes to a halt as we give up and deliver the remaining bugs to our customers.
You can't get working software by starting with buggy software and removing the bugs one by one.
In the end, the quality of the software after testing is much the same as the quality of the software before testing started. Yes, some bugs get fixed before being found by customers and that is a good thing. This explains why we invest in testing at all. But software that was buggy when we started testing is still buggy when we finish. Good software needs less testing and the testing finds fewer bugs that need to be fixed. For buggy software, testing is less effective and the kinds of teams that write buggy software are the kinds of teams that can't get the bugs fixed. Whether you find only a small number of bugs or you can't fix the ones you do, the value of finding bugs is limited.
The lack of real improvement in the quality of software as a result of testing was recognized long ago. So testers changed the goal. Instead of making the software better, testing would allow managers to make better decisions about whether and when the software would be released. Unfortunately, it turns out that decisions aren't really made this way. Personal and organizational considerations trump data in any complex corporate decision. This is particularly true for the types of organizations that produce software that should be shelved or substantially delayed because of quality problems.
In the typical organization, testing happens near the end of the release cycle. By that time, expectations about the delivery of the software have been set. Managers are rewarded for meeting commitments and making dates. As the testing progresses, the cost to the decision maker of missing the date escalates. Loss aversion and the sunk cost fallacy kick in. It becomes almost impossible to choose not to deliver the software, at least not without significant personal risk. Even significant delay has a cost. The inertia of a release date makes it really hard to miss. It's easier to just make the date and plan the maintenance release. And there are significantly fewer consequences.
The whole notion that testing can tell you when software is ready to be released is flawed. It depends on software improving through the testing process and, as a result, being able to project when it will reach some level of quality that we can release. However, software doesn’t really get better through the process of testing. Buggy software never reaches an improved level of quality. The question can’t be when software will reach the level of quality we want to release, but whether we are willing to release the software with the level of quality it has.
We could increase the value of the bugs we find by using them to improve how we develop software. That is usually how teams learn to develop good software. This is one of the key advantages of effective agile teams. It is my experience, however, that the quality of the software directly reflects the well-being of the team that creates it. Teams that create lots of bugs turn out to be the kinds of teams that are unable to learn from them.
In the final analysis, the value we get from finding bugs is limited. Teams that create good software get good value from the bugs they find by removing them and learning from them. But, since there are few bugs to be found, finding them is expensive and the value is limited. Teams that create buggy software face the opposite problem. It's easy (and cheap) to find bugs, but these kinds of teams are incapable of using them effectively. So the value is limited.
Spending more money on testing won't change that. If you want to deliver good software, you have to write good software. Yes, testers contribute to the quality of the software in more ways than just finding bugs. But, it turns out that finding bugs has only a limited impact on quality. If I asked you to invest in improving the quality of your software, you would almost certainly get better results by improving how you write the software, not how you test it.
Tom Gilb describes testing "as a last, desperate attempt to assure quality." Viewed this way, we can reconcile why organizations both invest and under-invest in testing. We can set expectations that are appropriate and can be met. And we can develop ourselves and our profession to best meet those expectations.