The paradox of testing is this: if testing has such limited value, why do companies that produce great software do so much of it? If testing can't really improve quality and doesn't really provide data to decision makers, why do we bother? What is the value of testing?
In my experience, testing is best understood as a finishing activity, a polishing activity. It is the last few coats of clear paint that make a paint job shine. It is running your hand across the surface of a finished product to find any remaining small burrs that need to be sanded down. It is the attention to detail that differentiates value from junk.
This understanding of the purpose of testing unlocks the current trends and debates in the test community. It explains why the traditional practice of testing as a planned, designed, and scripted activity is being replaced by rapid testing techniques. If you have lots of the kinds of bugs that are uncovered by traditional "systemic" testing techniques, then testing can do nothing for you. But, if you have done your development well, the kind of intelligent investigation that characterizes exploratory testing can uncover the problems you have missed.
Is testing dead? If your business model does not require polished products, it may be. When Alberto Savoia or James Whittaker talk about how testing can be replaced by pretotyping, dogfooding, or beta testing, they are really just saying that Google's business model doesn't require polished products. For Google, it is more important to get customer feedback on the software's functionality than it is to get the software just right. And that may be true for some software on the web, just as it is surely not true for software that is, for example, embedded in a device.
This does suggest that as companies begin to understand the real value of testing, they will have fewer but better testers. And this explains why outsourcing testing has not produced the expected benefits. If you are trying to replace an intelligent exploratory tester with a phalanx of testers for whom testing is the script and nothing but the script, you will get little value from the exchange.
Testing, at least the kind of system level testing done by testers after development is "done," is part of the zen in the art of software development. It is the attention to detail that differentiates software that users love from software that merely meets needs.
Friday, November 4, 2011
On Testing Will Not Solve Your Quality Problems
I remember a conversation I had in college with my Linear Algebra professor about his class. He said it was a classic example of a double-humped class. For half the class, he could teach the material twice as fast and it wouldn't be a problem. For the other half of the class, however, he could spend twice the amount of time and they still wouldn't succeed.
In economics, they refer to this as multiple equilibria. I believe that software quality has two equilibria. We have understood for decades now the types of practices that are needed to develop working software. Whether it takes the form of code reviews, pair programming, or a gatekeeping committer, every line of code needs to be seen by multiple pairs of eyes. Whether by convention or by test-driven means, you need comprehensive unit tests. Since code evolves, you need automated tests that cover the code to tell you when changes have had unexpected effects.
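To make "comprehensive unit tests" concrete, here is a minimal sketch using only Python's standard unittest module. The apply_discount function is hypothetical, invented for this illustration; the point is that tests like these, run automatically on every change, are what tell you when a change has had an unexpected effect.

```python
# A minimal, hypothetical sketch of the unit tests meant above,
# using only Python's standard library.
import unittest

def apply_discount(price, percent):
    """Return price reduced by the given percentage (hypothetical example)."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)

class ApplyDiscountTest(unittest.TestCase):
    def test_typical_discount(self):
        self.assertEqual(apply_discount(100.00, 25), 75.00)

    def test_zero_discount_leaves_price_unchanged(self):
        self.assertEqual(apply_discount(19.99, 0), 19.99)

    def test_out_of_range_percent_is_rejected(self):
        with self.assertRaises(ValueError):
            apply_discount(100.00, 150)

if __name__ == "__main__":
    unittest.main()
```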
Creating working software requires the disciplined and diligent use of these techniques. It requires both intention and effort. It is, as a result, subject to Broken Windows effects. When you start to loosen your practices, you send the message that quality is not a goal and create an environment where the number of quality problems begins to escalate.
When managers try to solve quality problems by doing more testing, they are really trying to avoid the cost of reaching the "working software" equilibrium. Unfortunately, the same lack of attention to quality that created the problems sabotages our efforts to fix them as well. Trying to take the cheap way out reinforces the social norm that quality is not really that important for developers. We are simply unable to realize a specific level of quality by dialing our quality practices up or down. And testing alone will not enable us to reach the working software equilibrium.
Testing will not solve your quality problems.
Wednesday, November 2, 2011
On The Limited Value of Software Testing
In a recent article, Michael Bolton discussed how testing problems like too few testers, too little time for testing, or unstable test builds are really test results. And, while I agree with him, this crystallized a question that I've been dancing around for some time now. I've been involved with software testing for almost 30 years, and the problems Michael mentioned are the same ones we were complaining about 30 years ago. How can this be? It's not as if we don't know how to do better.
Unlike Dilbert, I can't simply lay all problems at the feet of pointy haired bosses. If the same result happens year after year, across companies, across industries, and across countries, there must be some deeper principle at work. Economics, another of my interests, suggests that the answer lies in looking at how scarce resources are allocated to produce value. If we consistently fail to apply resources to an activity, perhaps it doesn't really have the value we think it does.
What is the value of software testing? As I have previously discussed, testing has many meanings. While some of these are pretty thought provoking, I believe that the people who hire testers expect their primary value to come from executing the software with the goal of finding bugs. (That many of the alternate meanings appear to have been created by testers to convince management that they provide value in other ways may well prove my point.) A lot of testing is done by a lot of people, but I'm talking about the kind of testing done by testers: testing at the system level with the goal of finding bugs, testing that is traditionally done at the end of the release cycle, although agile groups do a better job of distributing it throughout the cycle.
And what is the value of finding bugs? In theory, finding bugs should allow us to deliver better products to our customers, or at least make better decisions about those products. In practice, however, we fail to accomplish these goals, and even when we do, it is generally not by using the bugs we find through testing. And if we don't really benefit from the effort spent finding bugs, perhaps managers are acting rationally, if not always consciously, by not investing more in testing.
How is it that finding bugs fails to result in better products or better decisions? Here lies the heart of the issue.
First, I hope that we all understand that you can't find all the bugs. There is simply no such thing as complete testing. We build complex systems that interact with complex environments. You can't even find all the interesting bugs. To believe otherwise is hubris. The resource-constrained environments in which we work force us to make decisions about the testing that we won't do and the bugs that we won't find. If you are in the software business, you are, at least in part, in the business of delivering bugs to your customers.
Your ability to find bugs is even affected by the quality of the software you're testing. When the software is buggy, testing is hard. Builds fail. You start to explore one failure and get sidetracked by a completely different failure. The time spent investigating and reporting bugs eats into the time that was supposed to be spent testing. Even worse, software bugs have the strange property that the more bugs you find, the more bugs remain to be found. Buggy software ends up being tested less effectively simply because it is buggy software.
Finding bugs doesn't actually improve the software; you also have to fix them. And the kinds of teams that write buggy software are the kinds of teams that have problems fixing them. Finishing the functionality that didn't get completed on schedule takes precedence over fixing bugs. Deeper design and architectural problems are patched over because they would take too long to fix. As the release date nears and the number of bugs mounts, the time spent debugging gets compressed, making fixes less likely to work. Eventually, the whole process comes to a halt as we give up and deliver the remaining bugs to our customers.
You can't get working software by starting with buggy software and removing the bugs one by one.
In the end, the quality of the software after testing is much the same as the quality of the software before testing started. Yes, some bugs get fixed before being found by customers, and that is a good thing; it explains why we invest in testing at all. But software that was buggy when we started testing is still buggy when we finish. Good software needs less testing, and the testing finds fewer bugs that need to be fixed. For buggy software, testing is less effective, and the kinds of teams that write buggy software are the kinds of teams that can't get the bugs fixed. Whether you find only a small number of bugs or you can't fix the ones you do find, the value of finding bugs is limited.
The lack of real improvement in the quality of software as a result of testing was recognized long ago. So testers changed the goal. Instead of making the software better, testing would allow managers to make better decisions about whether and when the software would be released. Unfortunately, it turns out that decisions aren't really made this way. Personal and organizational considerations trump data in any complex corporate decision, particularly in the types of organizations that produce software that should be shelved or substantially delayed because of quality problems.
In the typical organization, testing happens near the end of the release cycle. By that time, expectations about the delivery of the software have been set. Managers are rewarded for meeting commitments and making dates. As the testing progresses, the cost to the decision maker of missing the date escalates. Loss aversion and the sunk cost fallacy kick in. It becomes almost impossible to choose not to deliver the software, at least not without significant personal risk. Even a significant delay has a cost. The inertia of a release date makes it really hard to miss. It's easier to just make the date and plan the maintenance release. And there are significantly fewer consequences.
The whole notion that testing can tell you when software is ready to be released is flawed. It depends on software improving through the testing process and, as a result, on being able to project when it will reach some level of quality at which we can release it. However, software doesn't really get better through the process of testing. Buggy software never reaches an improved level of quality. The question can't be when the software will reach the level of quality we want to release, but whether we are willing to release the software with the level of quality it has.
We could increase the value of the bugs we find by using them to improve how we develop software. That is usually how teams learn to develop good software. This is one of the key advantages of effective agile teams. It is my experience, however, that the quality of the software directly reflects the well-being of the team that creates it. Teams that create lots of bugs turn out to be the kinds of teams that are unable to learn from them.
In the final analysis, the value we get from finding bugs is limited. Teams that create good software get good value from the bugs they find by removing them and learning from them. But, since there are few bugs to be found, finding them is expensive and the value is limited. Teams that create buggy software face the opposite problem. It's easy (and cheap) to find bugs, but these kinds of teams are incapable of using them effectively. So the value is limited.
Spending more money on testing won't change that. If you want to deliver good software, you have to write good software. Yes, testers contribute to the quality of the software in more ways than just finding bugs. But, it turns out, finding bugs has only a limited impact on quality. If you were to invest in improving the quality of your software, you would almost certainly get better results by improving how you write the software, not how you test it.
Tom Gilb describes testing as "a last, desperate attempt to assure quality." Viewed this way, we can reconcile why organizations both invest and under-invest in testing. We can set expectations that are appropriate and can be met. And we can develop ourselves and our profession to best meet those expectations.
Tuesday, October 25, 2011
On "Why didn't you find this bug?"
The question "Why didn't you find this bug?" is an organizational smell. It reflects fundamental problems with way your organization thinks about testing and quality.
First, as Dawn Haynes points out, this isn't a "you" question, it is a "we" question. The "you" divides the team. A team that is functioning well has collective ownership of delivering code that works and of the techniques used to remove bugs. When you ask "Why didn't we find this bug?", you open up the range of possible answers.
The question is usually asked in the context of "Why didn't you (the tester) find this bug (when doing the feature/system/acceptance testing of the software)?" Notice the embedded assumption: that regardless of how the software is written, by testing at the end we should be able to remove all bugs. If you expect to remove all problems by testing, you will be sadly mistaken. Even when the test you planned should have caught the bug, the execution often does not go as planned because the software was delivered late or wasn't really ready to test. It's a classic case of an unrealistic expectation that we usually undermine anyway.
But the real problem is that the question "Why didn't you find this bug?" is usually asked to avoid having to answer the question that really matters: "Why did we write the bug in the first place?" The answer to that question requires reflection and creates the responsibility to learn, which is, perhaps, why organizations choose to avoid it. Better to imply blame and make you-know-who get better.
Until we really understand how we created the bug in the first place, we can't answer the question that matters most: "What is the cheapest thing we can do to prevent us from delivering software with this bug again?" Since testing is one of the most expensive and least reliable of the techniques we use to remove bugs, the answer often lies elsewhere. If you don't take steps to prevent the bug from being written in the future, you are giving permission to make it again. And since we shouldn't count on testing to find it in all its future occurrences, we are giving permission to deliver it again as well.
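To illustrate how the cheapest answer can lie outside of testing, here is one hedged sketch in Python. The order-status scenario is hypothetical, not from any project discussed here: instead of writing ever more tests to catch a recurring misspelled-status bug, the bug class is made impossible to write by construction.

```python
# Hypothetical sketch: preventing a recurring bug class by construction
# rather than catching each instance through testing.
from enum import Enum

class OrderStatus(Enum):
    PENDING = "pending"
    SHIPPED = "shipped"
    CANCELLED = "cancelled"

def order_is_closed(status: OrderStatus) -> bool:
    # With an Enum, a misspelled status string like "cancelld" can no
    # longer sneak deep into the system: OrderStatus("cancelld") fails
    # loudly at the boundary, once, instead of becoming a latent bug.
    return status in (OrderStatus.SHIPPED, OrderStatus.CANCELLED)

print(order_is_closed(OrderStatus.SHIPPED))   # True
print(order_is_closed(OrderStatus.PENDING))   # False
```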
Friday, October 14, 2011
On Bugs
Another debate in the test community is what the appropriate response should be to bugs found during testing. One camp believes that not all bugs should necessarily be fixed. As bugs are filed, some representative of the business (product owner, product manager, etc.) prioritizes them and determines which bugs should be fixed. In many organizations, this is accompanied by test exit criteria expressed in terms of the bugs that remain open: no high priority bugs, and some number of lower priority bugs. It is also often accompanied by a queue of bugs waiting to be fixed, due to the delay introduced by prioritizing them.
A different school believes that all bugs should be fixed. Immediately. When a bug comes in, work on some new feature stops and a developer is assigned to fix the bug. In this way, the software is ready to release when the functionality is done. This paragraph is shorter because the rule is simpler, even though it usually produces the response that you can't possibly fix all your bugs when you develop software.
Finally, the agilists seem to straddle both camps: produce no bugs, but let the business prioritize the ones you do.
Personally, I believe that no bugs is the most responsible position. First, let me be clear: when I talk about bugs, I mean the kinds of issues where the software does not do what it is supposed to do, or fails in ways that the user would consider an error - like going down or destroying data. Software has lots of other kinds of issues. Sometimes it does what it is supposed to do, but what it is supposed to do is not actually useful. Sometimes it does what it is supposed to do, but in too complicated a way. Bugs, however, are developer errors that should be fixed.
When you hand the responsibility for determining whether a bug gets fixed over to the business, you assume that the incorrect behavior is the only reason to fix a bug. It's not. One thing we know about bugs is that they cluster. Finding one increases the likelihood that there are others we haven't found yet. And these other bugs may be worse than the one we found. Counting on further testing to find them? That's just playing Russian roulette: I've tried one chamber and it's empty, and then another and it's empty, so all the chambers must be empty, right? You can't prioritize bugs by their symptoms; you need to understand their causes. Which means you need to do the hard bit anyway: you need to debug the bug. Every bug.
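The clustering argument can be made concrete with a toy Bayesian calculation. All of the numbers below are invented for illustration, not empirical defect data:

```python
# Toy Bayesian sketch of bug clustering; the probabilities are made up.
# Hypothesis: "this module is bug-dense" (it hides many latent bugs).
prior_dense = 0.2             # prior belief that the module is bug-dense
p_find_given_dense = 0.9      # chance a test session finds a bug if it is
p_find_given_sparse = 0.2     # chance a test session finds a bug if not

# Bayes' rule, applied after one bug is actually found.
evidence = (p_find_given_dense * prior_dense
            + p_find_given_sparse * (1 - prior_dense))
posterior_dense = p_find_given_dense * prior_dense / evidence

print(f"P(bug-dense) before finding a bug: {prior_dense:.2f}")
print(f"P(bug-dense) after finding one:    {posterior_dense:.2f}")
# 0.20 -> ~0.53: a single found bug more than doubles the probability
# that worse bugs are still hiding nearby.
```

With these made-up numbers, one found bug raises the probability that the module hides more from 0.20 to about 0.53, which is exactly why a found bug is evidence about its neighbors and not just an item to triage.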
Debugging bugs has another benefit: it teaches developers how to stop coding them. It is a strange property of software development that while we try to avoid coding the same solution twice, we use the same programming patterns over and over again. The consequence is that any mistake a programmer makes is likely to be repeated over and over again. We found this bug, so won't testing find all those others? That's just another game of Russian roulette. You can't find all the bugs; you can't even find all the important ones. So you had better learn to stop creating them.
Finally, a bug is a broken window. When you don't fix it, when you have other bugs that you also don't fix, you are creating a social norm that bugs aren't all that important. That working software is not all that important. And when you tell developers that working software is not important, you damage their morale and their effectiveness as a team. I do not believe that this is a choice the business gets to make. Agile, at least Scrum and XP, makes the distinction between decisions that the business gets to make and decisions that engineering gets to make. The business does not get to make decisions about the engineering process, and the question of whether or not to fix bugs is an engineering process decision. It makes no more sense to have the business choose which bugs to fix than it does to have the business choose whether to do test driven design or which code review issues to address.
That's fine in theory, you may say, but it can't work in practice. Let me ask you this. I often hear the complaint that there aren't enough test resources. Most companies have one tester for every 3, 4, or more developers. How is it that one tester is finding more bugs than those 3, 4, or more developers can fix? Could it be because the software contains too many bugs that are too easily found? Not being able to fix all the bugs that testers find reflects deeper problems on the team. Prioritizing bugs won't fix those problems, although it may allow them to linger. It is a crutch that teams use to avoid having to improve.
You may think that you can't commit to fixing all the bugs. Others in the industry would disagree. Jeff McKenna, one of the founders of Scrum, discussed fixing all bugs in a recent video interview. Joel on Software discussed Microsoft adopting a zero defects methodology in his 12 Steps to Better Code. In "The Art of Agile Development," James Shore discusses several XP projects that adopted a no-bugs philosophy. If these teams can do it, so can yours.
Monday, October 10, 2011
On The Many Meanings of Testing
I have long been frustrated by the many different meanings that people have for the word testing. The past few days have added several more. These differences in understanding add to the adamance of our positions and the occasional rancor of our discussions. So, for discussions on this blog at least, I wanted to set down the definition of testing that I use.
One of the definitions of testing that I learned recently was that (and I'm paraphrasing here because I don't remember the exact wording) testing includes any inquiry we make that gives us information about the product. For clarification, I asked, "Does that include code reviews?" "Yes" was the answer. "OK, how about attending a staff meeting?" "If it tells you something about the product." While I appreciate the attention to larger quality issues, I think there is value in distinguishing the act and impact of inquiring through the execution of the software from other forms of inquiry. The power of techniques like exploratory testing comes from running the software and looking at, and being affected by, the results.
If testing involves executing software, does that mean that any time you are executing software (before release, at least) you are testing? In one sense, certainly. But, again, I think this serves only to muddy the issue. Some characteristics of a system simply cannot be engineered without executing the system. All the modeling in the world won't identify all the bottlenecks in your system. Usability, too, can only be achieved through a process of trial and improvement. There are many organizations where these kinds of efforts are done by specialist engineers and not by those having the role of tester.
The testing that we increasingly do to prevent errors falls into this category as well. Test Driven Design and Acceptance Test Driven Design are great methods for engineering software (in the small and in the large) that does what it is supposed to do. But when you introduce these topics to testers, you meet a great deal of resistance. Certainly not because the methods don't improve quality; we can all agree that they do. It seems to me that the more likely reason is that these methods simply don't accomplish the ends that testers believe they are responsible for.
Enough already. The definition of testing I use is this: testing is the act of executing software in order to find bugs. This is essentially the definition that Glenford Myers gave us many years ago, and it is the definition that I believe would find the greatest degree of acceptance among test practitioners. We test to find bugs, and those bugs are a big part of our value.
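To ground that definition, here is a minimal, hypothetical sketch of testing in exactly this sense: executing the software with inputs chosen to provoke failures. The word_count function is invented for this illustration and deliberately buggy.

```python
# Hypothetical sketch of "executing software in order to find bugs".
def word_count(text):
    """Count the words in a string (deliberately buggy example)."""
    return len(text.split(" "))  # bug: empty and multi-space inputs miscount

# Myers-style testing: run the software on inputs chosen to expose
# failures, then observe the results.
for case in ["two words", "one", "", "  spaced   out  "]:
    print(repr(case), "->", word_count(case))
# "" -> 1 (should be 0) and "  spaced   out  " -> 8 (should be 2):
# the execution itself, not reading the source, surfaces the bugs.
```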
That's not to say that bugs are our only value. In a previous post, I discussed the notion of contrarianism. Every team needs a skeptic to puncture the generally optimistic groupthink that infects teams. We also provide additional perspective on the value of the functionality being implemented and on the usability of that implementation. We make teams think about what they are doing before they rush headlong into implementation. We do all these things, but they are not the act of testing that defines us.
I hope this helps to make sense of my ramblings and establishes a foundation for future conversations.
Sunday, October 9, 2011
On Buts
con·trar·i·an
[kuhn-trair-ee-uhn]
noun
a person who takes an opposing view, especially one who rejects the majority opinion
[dictionary.com]
'But' is the contrarian word in the English language. No matter what has come before, you know that once you hear 'but', you are about to hear an opposing view. And that opposing view is even more important than the original view that it stands in opposition to.
I recognize the power of 'but' because I have a wide streak of contrarianism. I don't accept ideas, I wrestle with them. If Jacob got a cool new name by wrestling with God, I always feel like I should get a cool new name when I wrestle with an idea. I recognize this streak in others as well. Have you ever had a discussion with someone where it seemed that no matter what you said, their response was to find some point to challenge or some nit to pick? Bingo, contrarian. Progress results from contrarians. They make ideas stronger, more resilient.
Testers are contrarians. We get paid for it. When a developer says "this works," we say "but what about?" This process of applied contrarianism makes software better. It hardens the software, makes it more resilient. Without contrarians on the team, the software would never reach its full potential.
But (there's that word), if you love your children and want them to grow up happy and socially well adjusted, don't let them be contrarians. Teach them the word 'and' instead. 'And' is the ultimate social word. 'And' allows us to work together in harmony to construct a shared idea. So remember to teach your children: don't be a 'but', be an 'and' instead.