Tuesday, November 15, 2011
In my experience, testing is best understood as a finishing activity, a polishing activity. It is the last few coats of clear paint that make a paint job shine. It is running your hand across the surface of a finished product to find any remaining small burrs that need to be sanded over. It is the attention to detail that differentiates value from junk.
This understanding of the purpose of testing unlocks the current trends and debates in the test community. It explains why the traditional practice of testing as a planned, designed, and scripted activity is being replaced by rapid testing techniques. If you have lots of the kinds of bugs that are uncovered by traditional "systemic" testing techniques, then testing can do nothing for you. But, if you have done your development well, the kind of intelligent investigation that characterizes exploratory testing can uncover the problems you have missed.
Is testing dead? If your business model does not require polished products, it may be. When Alberto Savoia or James Whittaker talk about how testing can be replaced by pretotyping, dogfooding, or beta testing, they are really just saying that Google's business model doesn't require polished products. For Google, it is more important to get customer feedback on the software's functionality than it is to get the software just right. And that may be true for some software on the web, just as it is surely not true for software that, for example, is embedded in a device.
This does suggest that as companies begin to understand the real value of testing, they will have fewer but better testers. And this explains why outsourcing testing has not produced the expected benefits. If you are trying to replace an intelligent exploratory tester with a phalanx of testers for whom testing is the script and nothing but the script, you get little value from it.
Testing, at least the kind of system level testing done by testers after development is "done," is part of the zen in the art of software development. It is the attention to detail that differentiates software that users love from software that merely meets needs.
Friday, November 4, 2011
In economics, they refer to this as multiple equilibria. I believe that software quality has two equilibria. We have understood for decades now the types of practices that are needed to develop working software. Whether it takes the form of code reviews, pair programming, or a gatekeeping committer, every line of code needs to be seen by multiple pairs of eyes. Whether by convention or test-driven means, you need comprehensive unit tests. Since code evolves, you need automated tests that cover the code to tell you when changes have had unexpected effects.
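To make the "comprehensive unit tests" practice concrete, here is a minimal sketch of what such a safety net looks like. The `slugify` function and its test cases are invented for illustration; the point is that every behavior the code promises gets a test that runs automatically and catches unexpected effects of later changes.

```python
def slugify(title: str) -> str:
    """Turn a title into a lowercase, hyphen-separated slug."""
    # Replace every non-alphanumeric character with a space, then
    # split on whitespace and rejoin with hyphens.
    words = "".join(c if c.isalnum() else " " for c in title).split()
    return "-".join(w.lower() for w in words)

def test_basic_title():
    assert slugify("Working Software") == "working-software"

def test_punctuation_is_stripped():
    assert slugify("Bugs, Bugs, Bugs!") == "bugs-bugs-bugs"

def test_empty_title():
    assert slugify("") == ""

if __name__ == "__main__":
    # A test runner like pytest would discover these automatically;
    # calling them directly keeps the sketch self-contained.
    test_basic_title()
    test_punctuation_is_stripped()
    test_empty_title()
    print("all unit tests pass")
```

Run on every change, a suite like this is what tells you that an edit over here broke a behavior over there, before a tester or a customer does.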
Creating working software requires the disciplined and diligent use of these techniques. It requires both intention and effort. It is, as a result, subject to Broken Windows effects. When you start to loosen your practices, you send the message that quality is not a goal and create an environment where the number of quality problems begins to escalate.
When managers try to solve quality problems by doing more testing, they are really trying to avoid the cost of reaching the "working software" equilibrium. Unfortunately, the same lack of attention to quality that created the problems sabotages our efforts to fix them as well. Trying to take the cheap way out reinforces the social norm that quality is not really that important for developers. We are simply unable to realize a specific level of quality by dialing our quality practices up or down. And testing alone will not enable us to reach the working software equilibrium.
Testing will not solve your quality problems.
Wednesday, November 2, 2011
Unlike Dilbert, I can't simply lay all problems at the feet of pointy-haired bosses. If the same result happens year after year, across companies, across industries, and across countries, there must be some deeper principle at work. Economics, another of my interests, suggests that the answer lies in looking at how scarce resources are allocated to produce value. If we consistently fail to apply resources to an activity, perhaps it doesn't really have the value we think it does.
What is the value of software testing? As I have previously discussed, testing has many meanings. While some of these are pretty thought-provoking, I believe that the people who hire testers expect their primary value to come from executing the software with the goal of finding bugs. (That many of the alternate meanings appear to have been created by testers to convince management that they provide value in other ways may well prove my point.) A lot of testing is done by a lot of people, but I'm talking about the kind of testing done by testers: testing at the system level with the goal of finding bugs, testing that is traditionally done at the end of the release cycle, although agile groups do a better job distributing it throughout the cycle.
And what is the value of finding bugs? In theory, finding bugs should allow us to deliver better products to our customers or at least make better decisions about those products. In practice, however, we fail to accomplish these goals, and even when we do, it is generally not by using the bugs we find through testing. And if we don't really benefit from the effort spent finding bugs, perhaps managers are acting rationally, if not always consciously, by not investing more in testing.
How is it that finding bugs fails to result in better products or better decisions? Here lies the heart of the issue.
First, I hope that we all understand that you can't find all the bugs. There is simply no such thing as complete testing. We build complex systems that interact with complex environments. You can't even find all the interesting bugs. To believe otherwise is hubris. The resource constrained environments in which we work force us to make decisions about the testing that we won't do and the bugs that we won't find. If you are in the software business, you are, at least in part, in the business of delivering bugs to your customers.
Your ability to find bugs is even affected by the quality of the software you're testing. When the software is buggy, testing is hard. Builds fail. You start to explore one failure and get sidetracked on a completely different failure. The time spent investigating and reporting bugs eats into the time that was supposed to be spent testing. Even worse, software bugs have the strange property that the more bugs you find, the more bugs remain to be found. Buggy software ends up being tested less effectively simply because it is buggy software.
Finding bugs doesn't actually improve the software; you also have to fix them. And the kinds of teams that write buggy software are the kinds of teams that have problems fixing them. Finishing the functionality that didn't get completed on schedule takes precedence over fixing bugs. Deeper design and architectural problems are patched over because they would take too long to fix. As the release date nears and the number of bugs mounts, the time spent debugging gets condensed, making fixes less likely to work. Eventually, the whole process comes to a halt as we give up and deliver the remaining bugs to our customers.
You can't get working software by starting with buggy software and removing the bugs one by one.
In the end, the quality of the software after testing is much the same as the quality of the software before testing started. Yes, some bugs get fixed before being found by customers and that is a good thing. This explains why we invest in testing at all. But software that was buggy when we started testing is still buggy when we finish. Good software needs less testing and the testing finds fewer bugs that need to be fixed. For buggy software, testing is less effective and the kinds of teams that write buggy software are the kinds of teams that can't get the bugs fixed. Whether you find only a small number of bugs or you can't fix the ones you do, the value of finding bugs is limited.
The lack of real improvement in the quality of software as a result of testing was recognized long ago. So testers changed the goal. Instead of making the software better, testing would allow managers to make better decisions about whether and when the software would be released. Unfortunately, it turns out that decisions aren't really made this way. Personal and organizational considerations trump data in any complex corporate decision. Particularly for the types of organizations that produce software that should be shelved or substantially delayed because of quality problems.
In the typical organization, testing happens near the end of the release cycle. By that time, expectations about the delivery of the software have been set. Managers are rewarded for meeting commitments and making dates. As the testing progresses, the cost to the decision maker of missing the date escalates. Loss aversion and the sunk cost fallacy kick in. It becomes almost impossible to choose not to deliver the software, at least not without significant personal risk. Even significant delay has a cost. The inertia of a release date makes it really hard to miss. It's easier to just make the date and plan the maintenance release. And there are significantly fewer consequences.
The whole notion that testing can tell you when software is ready to be released is flawed. It depends on software improving through the testing process and, as a result, being able to project when it will reach some level of quality that we can release. However, software doesn’t really get better through the process of testing. Buggy software never reaches an improved level of quality. The question can’t be when software will reach the level of quality we want to release, but whether we are willing to release the software with the level of quality it has.
We could increase the value of the bugs we find by using them to improve how we develop software. That is usually how teams learn to develop good software. This is one of the key advantages of effective agile teams. It is my experience, however, that the quality of the software directly reflects the well-being of the team that creates it. Teams that create lots of bugs turn out to be the kinds of teams that are unable to learn from them.
In the final analysis, the value we get from finding bugs is limited. Teams that create good software get good value from the bugs they find by removing them and learning from them. But, since there are few bugs to be found, finding them is expensive and the value is limited. Teams that create buggy software face the opposite problem. It's easy (and cheap) to find bugs, but these kinds of teams are incapable of using them effectively. So the value is limited.
Spending more money on testing won't change that. If you want to deliver good software, you have to write good software. Yes, testers contribute to the quality of the software in more ways than just finding bugs. But it turns out that finding bugs has only a limited impact on quality. If I asked you to invest in improving the quality of your software, you would almost certainly get better results by improving how you write the software, not how you test it.
Tom Gilb describes testing as "a last, desperate attempt to assure quality." Viewed this way, we can reconcile why organizations both invest and under-invest in testing. We can set expectations that are appropriate and can be met. And we can develop ourselves and our profession to best meet those expectations.
Tuesday, October 25, 2011
First, as Dawn Haynes points out, this isn't a "you" question, it is a "we" question. The "you" divides the team. A team that is functioning well has collective ownership for delivering code that works and for the techniques used to remove bugs. When you ask "Why didn't we find this bug?", you open up the range of possible answers.
The question is usually asked in the context of "Why didn't you (tester) find this bug (when doing the feature/system/acceptance testing of the software)?" Notice the embedded assumption that regardless of how the software is written, by testing at the end, we should be able to remove all bugs. If you expect to remove all problems by testing, you will be sadly mistaken. Even when the test you planned should have caught the bug, the execution often does not go as planned because the software was delivered late or wasn't really ready to test. It's a classic case of an unrealistic expectation that we usually undermine anyway.
But the real problem is that the question "Why didn't you find this bug?" is usually asked to avoid having to answer the question that really matters, "Why did we write the bug in the first place?" The answer to this question requires reflection and creates the responsibility to learn, which is, perhaps, why organizations would choose to avoid it. Better to imply blame and make you-know-who get better.
Until we really understand how we created the bug in the first place, we can't answer the question that really matters, "What is the cheapest thing we can do to prevent us from delivering software with this bug again?" Since testing is one of the most expensive and least reliable of the techniques we use to remove bugs, the answer often lies elsewhere. If you don't take steps to prevent the bug from being written in the future, you are giving permission to make it again. And since we shouldn't count on testing to find it in all its future occurrences, we are giving permission to deliver it again as well.
Friday, October 14, 2011
A different school believes that all bugs should be fixed. Immediately. When a bug comes in, work on some new feature stops and a developer is assigned to fix the bug. In this way, software is ready to release when the functionality is done. This paragraph is shorter because the rule is simpler, even though it usually produces the response that you can't possibly fix all your bugs when you develop software.
Finally, the agilists seem to straddle both camps. Produce no bugs, but let the business prioritize the ones you do.
Personally, I believe that no bugs is the most responsible position. First, let me be clear, when I talk about bugs, I mean the kinds of issues where the software does not do what it is supposed to do or fails in ways that the user would consider an error - like going down or destroying data. Software has lots of other kinds of issues. Sometimes it does what it is supposed to do but what it is supposed to do is not actually useful. Sometimes, it does what it is supposed to do, but in too complicated a way. Bugs, however, are developer errors that should be fixed.
When you hand over the responsibility for determining whether a bug gets fixed to the business, you assume that the incorrect behavior is the only reason to fix a bug. It's not. One thing we know about bugs is that they cluster. Finding one increases the likelihood that there are others that we haven't found yet. And these other bugs may be worse than the one we found. Counting on further testing to find them? That's just playing Russian roulette. I've tried one chamber and it's empty, and then another and it's empty, so that must mean all the chambers are empty, right? You can't prioritize bugs by their symptoms; you need to understand their causes. Which means you need to do the hard bit anyway: you need to debug the bug. Every bug.
Debugging bugs has another benefit, it teaches developers how to stop coding them. It is a strange property of software development that while we try to avoid coding the same solution twice, we use the same programming patterns over and over again. The consequence of this is that any mistake a programmer makes is likely to be repeated over and over again. We found this bug, won't testing find all those others? That's just another game of Russian roulette. You can't find all the bugs, you can't even find all the important ones. So you had better learn to stop creating them.
Finally, a bug is a broken window. When you don't fix it, when you have other bugs that you also don't fix, you are creating a social norm that bugs aren't all that important. That working software is not all that important. And when you tell developers that working software is not important, you damage their morale and effectiveness as a team. I do not believe that this is a choice that the business gets to make. Agile, at least Scrum and XP, makes the distinction between decisions that the business gets to make and decisions that engineering gets to make. The business does not get to make decisions about the engineering process, and the question of whether or not to fix bugs is an engineering process decision. It makes no more sense to have the business choose which bugs to fix than it does to have the business choose whether to do test driven design or choose which code review issues to address.
That's fine in theory, you may say, but it can't work in practice. Let me ask you this. I often hear the complaint that there aren't enough test resources. Most companies have one tester for every 3, 4, or more developers. How is it that one tester is finding more bugs than those 3, 4, or more developers can fix? Could it be because the software contains too many bugs that are too easily found? Not being able to fix all the bugs that testers find reflects deeper problems on the team. Prioritizing bugs won't fix those problems, although it may allow them to linger. It is a crutch that teams use to avoid having to improve.
You may think that you can't commit to fixing all the bugs. Others in the industry would disagree. Jeff McKenna, one of the founders of Scrum discussed fixing all bugs in a recent video interview. Joel on Software discussed Microsoft adopting a zero defects methodology in his 12 Steps to Better Code. In "The Art of Agile Development," James Shore discusses several XP projects that adopted a no bugs philosophy. If these teams can do it, so can yours.
Monday, October 10, 2011
One of the definitions of testing that I learned recently was that (and I'm paraphrasing here because I don't remember the exact wording) testing includes any inquiry that we make that gives us information about the product. For clarification, I asked, "Does that include code reviews?" "Yes" was the answer. "OK, how about attending a staff meeting?" "If it tells you something about the product." While I appreciate the attention on larger quality issues, I think there is value in distinguishing between the act and impact of inquiring through the execution of the software and other forms of inquiry. The power of techniques like exploratory testing comes from running the software and looking at and being affected by the results.
If testing involves executing software, does that mean that any time you are executing software (before release at least) that you are testing? In one sense, certainly. But, again, I think this serves only to muddy the issue. Some characteristics of a system simply cannot be engineered without executing the system. All the modeling in the world won't identify all the bottlenecks in your system. Usability, too, can only be achieved through a process of trial and improvement. There are many organizations where these kinds of efforts are done by specialist engineers and not by those having the role of tester.
The testing that we increasingly do to prevent errors falls into this category as well. Test Driven Design and Acceptance Test Driven Design are great methods for engineering software (in the small and large) that does what it is supposed to do. But when you introduce these topics to testers, you meet a great deal of resistance. Certainly not because the methods don't improve quality. We can all agree that they do. It seems to me that the more likely reason is that these methods simply don't accomplish the ends that testers believe they are responsible for.
Enough already. The definition of testing I use is this: testing is the act of executing software in order to find bugs. This is essentially the definition that Glenford Myers gave us many years ago and is the definition that I believe would find the greatest degree of acceptance among test practitioners. We test to find bugs, and these bugs are a big part of our value.
That's not to say that bugs are our only value. In a previous post, I discussed the notion of contrarianism. Every team needs a skeptic to puncture the generally optimistic group think that infects teams. We also provide additional perspective on the value of the functionality that is being implemented and the usability of that implementation. We make teams think about what they are doing before they rush headlong into implementation. We do all these things, but they are not the act of testing that defines us.
I hope this helps to make sense of my ramblings and establishes a foundation for future conversations.
Sunday, October 9, 2011
If bugs were distributed randomly, the likelihood of a risk occurring would simply reflect the usage model of the software. But we know that bugs are not distributed randomly. Instead they cluster. They cluster in the code of a particular developer. They cluster in a design that doesn't fully meet its need. They cluster in how an environmental issue is handled across different areas of the software. Bugs cluster for all sorts of reasons. And we can't identify where these clusters are until we have already tested and found the bugs.
Difficult as the problem of likelihood is, predicting impact is much harder. The severity of a bug may well be random and this randomness thwarts us. The severity of the bug is not related to the importance of the functionality the code implements. Even bugs in remote byways of a system can cause it to crash or corrupt data. The honest answer to the question of what can take the system down is any line of code in it.
Sure, we find bugs using risk based techniques. We find bugs using any testing technique. Does it really result, though, in the identification and removal of our riskiest bugs? The evidence strongly suggests not.
Saturday, October 8, 2011
"Agile" is everywhere. Agile, on the other hand, continues to be adopted only fitfully. For many attendees, agile meant the addition of iterations and standups and not much else. Practitioners seem to understand that this is not actually working, so sessions about making agile work were very popular.
The executives who presented during the leadership summit seemed to be really out of touch with agile. You really can't claim to be agile when you have weekly status reports delivered to a central PMO and you had better be making your weekly progress "or else". The next speaker's contempt for agile was almost dripping. It should come as no surprise that executives are clueless... I mean, out of step with the current state of practice.
Pockets of traditional development and test remain. There were several tutorials from SQE instructors that have probably not changed since the early 1990s. This was much less true in the sessions, where only a couple came from the Military/Industrial/SEI faction.
Acceptance Test Driven Development (ATDD) was a hot topic. I think its supporters fail to do it justice. ATDD changes how we do specification and provides a mechanism for demonstrating that the software has implemented the intended functionality. But fulfilling requirements is not the same as not having bugs. ATDD is a technique for preventing bugs, not a technique for detecting them. Practitioners intuitively understand this, which accounts for the resistance of many test professionals to ATDD. It is unfortunate because it is unnecessary.
Another source of resistance to ATDD is its often incorrect implementation. Done correctly, ATDD means specifying the story's tests, preferably in an executable framework, and then developing the code to make the tests run one by one. It is TDD in the large. You know what still needs to be completed on a story by the tests that are still failing. In fact, the percentage of passing acceptance tests may be a more accurate measure of progress during a sprint than the hours remaining in the sprint burndown. (I think it would be interesting to put both measures on a single, highly visible, graph.)
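The "TDD in the large" idea above can be sketched in a few lines. The story, its `total_price` function, and its acceptance rules are all invented for illustration; the point is that the story's acceptance tests exist up front as executable checks, and the fraction currently passing is the progress measure.

```python
# A hypothetical story: price a shopping cart, with a 10% member discount.
def total_price(items, member=False):
    """Implementation under development for the story."""
    subtotal = sum(items)
    return subtotal * 0.9 if member else subtotal

# The story's acceptance tests, written before the code. Each pairs a
# readable name with an executable check; in practice these would live
# in a framework like FitNesse, Cucumber, or pytest.
acceptance_tests = [
    ("empty cart costs nothing", lambda: total_price([]) == 0),
    ("prices are summed",        lambda: total_price([3, 7]) == 10),
    ("members get 10% off",      lambda: total_price([100], member=True) == 90),
]

def progress():
    """How done is the story? Count the acceptance tests that pass."""
    passed = sum(1 for _, test in acceptance_tests if test())
    return passed, len(acceptance_tests)

if __name__ == "__main__":
    passed, total = progress()
    print(f"story progress: {passed}/{total} acceptance tests passing")
```

Until the discount logic is written, the third test fails and the story reads as two-thirds done; when all three pass, the story is complete. That passing percentage is the sprint-progress number suggested above.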
Exploratory testing was also popular. It seems fair to say that one theme of the conference is that test cases are out (or at least limited to the things that can be automated) and exploratory testing is in. It is not hard to understand why exploratory testing is so popular. It encourages testers to do more of the testing they most enjoy and do best. It also helps that the practice has charismatic champions and is amenable to easy demonstration. The various exploratory demonstrations in the test lab were surprisingly popular given the limited space and that attending meant missing prepared sessions. Practitioners resistant to ATDD should consider exploratory testing as a complementary practice.
Test automation is something that everyone seems to do but everyone also seems to be dissatisfied with the results and wonders whether they should do more. Tool vendors keep pushing the concept that we can make this easy enough for the business analysts to do so testers won't have to. (Why you would try to sell that message to a room full of testers is beyond me.) Perhaps they should get together with the ATDD folks since their pitches now seem more directed at demonstrating that the functionality was implemented rather than finding bugs. I suspect they spend too much time talking to CIOs. As the vendors work harder and harder to make test automation accessible to non-technical testers, technical testers like me will have more incentive to find other solutions.
I wonder if some of the uneasiness around automation reflects a recognition that we are simply not finding and fixing enough bugs with automated tests. Sure, we find regression issues but these are, or should be, only a small proportion of the bugs. For years, I have talked about the role serendipity plays in finding bugs. You are running a manual test and notice something wrong completely unrelated to the purpose of the test. Automated tests completely fail to find these kinds of problems. (To which the exploratory testing folks would reply, "Of course.")
Mobile devices and cloud computing are new software platforms that were also program topics. Cloud computing offers some real opportunities for testing. Building environments to support realistic performance testing has always been expensive, and these environments usually end up underutilized. Cloud computing offers a solution: environments that can be built to scale when needed and released when not. Cloud computing also presents unique testing challenges. Services built in the cloud use highly parallel architectures that are susceptible to all sorts of concurrency issues which are difficult to test. Testing software for mobile devices seems more familiar. It's really just a configuration problem, but the rapid improvement of these platforms means that the number of configurations and the difference in their capabilities is larger than most of us are used to.
Controversy came in the form of James Whittaker's keynote "All That Testing Is Getting in the Way of Quality." In it, he claimed testing was dead. Or, in more subtle terms, that the practice of testing to find and fix bugs had not resulted in significant improvements in software and could, in any case, be done more effectively, thoroughly, and cheaply by crowd-sourcing it. I think we can safely say, by the hisses that accompanied any further mention of Google and the topics of table conversations over the rest of the week, that he struck some sort of nerve. Since I had entered the conference with that very point of view, it was fascinating to watch the reaction. On this, there will be much more in subsequent posts.
The other question that seemed to be on many minds is "what's new here?" The answer from my part of the world is not much. I believe that James Bach did his original presentation on good enough software and rapid testing in the mid 90's. Agile, too, has passed its 15th birthday. ATDD is new but (perhaps justifiably) is not fully accepted as a test technique by practitioners. The software platforms are new, but the testing challenges are not.
I last attended a STAR conference in 1993. The differences were striking. Sessions had an air of discovery about them. Concepts like risk based testing or model based testing were new and promising. The internet had not yet happened, so new information traveled more slowly. Conferences were the conduits where concepts transitioned from IEEE and ACM papers to practitioners. There were more sessions given by researchers from either academic or industrial labs. (Do those kinds of labs even exist anymore - outside of Google, that is?) Everything was open to investigation. I remember attending one very interesting session on regression test selection. Now we just run 'em all. Another session suggested that we all needed to be doing mutation testing to prove our test sets. And another promised automated test generation from program analysis.
There is a deeper reality here too, one that Dr. Whittaker alluded to. The practice of testing hasn't really changed all that much since the mid-90s. Sure, we've repackaged some of the things that we all knew in, hopefully, more effective forms. Long before ATDD, testers understood that tests were a much better specification than the specs we usually got. I remember suggesting that we just write the specs as tests years ago. Exploratory testing? Have we ever actually not done exploratory testing, whether we told our bosses or not? We use the same tools, models, and strategies. The design strategies in Lee Copeland's tutorial "Key Test Design Techniques" are the same ones that I learned from Dr. Boris Beizer at STAR in 1992. And the simple fact is that the application of these techniques has not, on the whole, resulted in software that works all that much better. As Dr. Whittaker suggests, the drivers of whatever improvements we've had in software have not come from improvements in testing. They could not, since there haven't really been any improvements in testing. And I believe that the industry is (or should be) entering a period of evolution to diminish the dependence on testing to develop working software. And that is the jumping off point for the posts to come.
Wednesday, October 5, 2011
Dr. James Whittaker's keynote has certainly been the hot topic of conversation at StarWest. Since I may be the only person here who agrees with him, let me share my take.
First, let's be clear that he was talking about "testing" as in the kind of testing done by testers in a testing department. He is not saying that no one will ever again run software to evaluate some characteristic of it. Just that it won't be done by testers as they exist today doing testing as we understand it today.
Second, the improvements in quality that have been made over the past decade have not involved testers testing software. He mentioned several improvements. Some, like better programming practices, include using testing in new ways (like TDD). Some, like conformance to standards, involve testing not at all. But the improvement in quality has not been due to testers using new testing practices. I last attended a STAR in the early 90s, and the testing practices we had back then are by and large the same ones we use today. Even if they have newfangled names.
Third, he contends that the act of executing software to find bugs can be done more cheaply and effectively through crowd-sourcing than by dedicated testers. Yes, I'm sure he knows that not all software is on the web and not all software testing can be replaced by crowd-sourcing. But, to echo a comment in a previous session, if you have a software product and you don't think that someone, somewhere is trying to build something on the web to replace it, then you aren't paying attention. (Those are not the embedded systems you are looking for....)
As a result, if your only value comes from running tests to find bugs, you are on the path to obsolescence. Both because people are finding ways to build software that works well enough without doing that kind of testing, and because even those companies that continue to use finding and fixing bugs as a development strategy will find cheaper ways to let users do it. Sooner or later they will come take your job away.
The only place I differ is in the rate of change. Googlers tend to see the world changing at the speed of light. I look around and see the continued reliance on obsolete management practices, obsolete software systems and languages, and obsolete development practices, and come to the conclusion that inertia is much more resistant to change than we think. Of course, that doesn't mean that I want to be trapped there.