Commentaries: October 2011

Tuesday, October 25, 2011

On "Why didn't you find this bug?"

The question "Why didn't you find this bug?" is an organizational smell. It reflects fundamental problems with the way your organization thinks about testing and quality.

First, as Dawn Haynes points out, this isn't a "you" question, it is a "we" question. The "you" divides the team. A team that is functioning well has collective ownership for delivering code that works and for the techniques used to remove bugs. When you ask "Why didn't we find this bug?", you open up the range of possible answers.

The question is usually asked in the context of "Why didn't you (tester) find this bug (when doing the feature/system/acceptance testing of the software)?" Notice the embedded assumption that, regardless of how the software is written, by testing at the end we should be able to remove all bugs. If you expect to remove all problems by testing, you will be sadly mistaken. Even when the test you planned should have caught the bug, the execution often does not go as planned because the software was delivered late or wasn't really ready to test. It's a classic case of an unrealistic expectation that we usually undermine anyway.

But the real problem is that the question "Why didn't you find this bug?" is usually asked to avoid having to answer the question that really matters, "Why did we write the bug in the first place?" The answer to this question requires reflection and creates the responsibility to learn, which is, perhaps, why organizations would choose to avoid it. Better to imply blame and make you-know-who get better.

Until we really understand how we created the bug in the first place, we can't answer the question that really matters, "What is the cheapest thing we can do to prevent us from delivering software with this bug again?" Since testing is one of the most expensive and least reliable of the techniques we use to remove bugs, the answer often lies elsewhere. If you don't take steps to prevent the bug from being written in the future, you are giving permission to make it again. And since we shouldn't count on testing to find it in all its future occurrences, we are giving permission to deliver it again as well.

Friday, October 14, 2011

On Bugs

Another debate in the Test community is what the appropriate response should be to bugs found during testing. One camp believes that not all bugs should necessarily be fixed. As bugs are filed, some representative of the business (product owner, product manager, etc.) prioritizes them and determines which bugs should be fixed. In many organizations, this is accompanied by test exit criteria expressed in terms of bugs that remain open: no high priority bugs, and some number of lower priority bugs. This is also often accompanied by a queue of bugs waiting to be fixed, because of the delay that prioritizing them introduces.

A different school believes that all bugs should be fixed. Immediately. When a bug comes in, work on some new feature stops and a developer is assigned to fix the bug. In this way, software is ready to release when the functionality is done. This paragraph is shorter because the rule is simpler, even though it usually produces the response that you can't possibly fix all your bugs when you develop software.

Finally, the agilists seem to straddle both camps. Produce no bugs, but let the business prioritize the ones you do produce.

Personally, I believe that no bugs is the most responsible position. First, let me be clear, when I talk about bugs, I mean the kinds of issues where the software does not do what it is supposed to do or fails in ways that the user would consider an error - like going down or destroying data. Software has lots of other kinds of issues. Sometimes it does what it is supposed to do but what it is supposed to do is not actually useful. Sometimes, it does what it is supposed to do, but in too complicated a way. Bugs, however, are developer errors that should be fixed.

When you hand over the responsibility for determining whether a bug gets fixed to the business, you assume that the incorrect behavior is the only reason to fix a bug. It's not. One thing we know about bugs is that they cluster. Finding one increases the likelihood that there are others that we haven't found yet. And these other bugs may be worse than the one we found. Counting on further testing to find them? That's just playing Russian roulette. I've tried one chamber and it's empty and then another and it's empty, that must mean all the chambers are empty, right? You can't prioritize bugs by their symptoms, you need to understand their causes. Which means you need to do the hard bit anyway, you need to debug the bug. Every bug.

Debugging bugs has another benefit: it teaches developers how to stop coding them. It is a strange property of software development that while we try to avoid coding the same solution twice, we use the same programming patterns over and over again. The consequence of this is that any mistake a programmer makes is likely to be repeated over and over again. We found this bug, won't testing find all those others? That's just another game of Russian roulette. You can't find all the bugs, you can't even find all the important ones. So you had better learn to stop creating them.

Finally, a bug is a broken window. When you don't fix it, when you have other bugs that you also don't fix, you are creating a social norm that bugs aren't all that important. That working software is not all that important. And when you tell developers that working software is not important, you damage their morale and effectiveness as a team. I do not believe that this is a choice that the business gets to make. Agile, at least Scrum and XP, makes the distinction between decisions that the business gets to make and decisions that engineering gets to make. The business does not get to make decisions about the engineering process, and the question of whether or not to fix bugs is an engineering process decision. It makes no more sense to have the business choose which bugs to fix than it does to have the business choose whether to do test driven design or choose which code review issues to address.

That's fine in theory, you may say, but it can't work in practice. Let me ask you this. I often hear the complaint that there aren't enough test resources. Most companies have one tester for every 3, 4, or more developers. How is it that one tester is finding more bugs than those 3, 4, or more developers can fix? Could it be because the software contains too many bugs that are too easily found? Not being able to fix all the bugs that testers find reflects deeper problems on the team. Prioritizing bugs won't fix those problems, although it may allow them to linger. It is a crutch that teams use to avoid having to improve.

You may think that you can't commit to fixing all the bugs. Others in the industry would disagree. Jeff McKenna, one of the founders of Scrum, discussed fixing all bugs in a recent video interview. Joel Spolsky of Joel on Software discussed Microsoft adopting a zero defects methodology in his "12 Steps to Better Code." In "The Art of Agile Development," James Shore discusses several XP projects that adopted a no bugs philosophy. If these teams can do it, so can yours.

Monday, October 10, 2011

On The Many Meanings of Testing

I have long been frustrated by the many different meanings that people have for the word testing. The past few days have added several more. These differences in understanding add to the adamance of our positions and occasional rancor of our discussions. So, for discussions on this blog at least, I wanted to set down the definition of testing that I use.

One of the definitions of testing that I learned recently was that (and I'm paraphrasing here because I don't remember the exact wording) testing includes any inquiry that we make that gives us information about the product. For clarification, I asked, "Does that include code reviews?" "Yes" was the answer. "OK, how about attending a staff meeting?" "If it tells you something about the product." While I appreciate the attention on larger quality issues, I think there is value in distinguishing between the act and impact of inquiring through the execution of the software and other forms of inquiry. The power of techniques like exploratory testing comes from running the software and looking at and being affected by the results.

If testing involves executing software, does that mean that any time you are executing software (before release at least) that you are testing? In one sense, certainly. But, again, I think this serves only to muddy the issue. Some characteristics of a system simply cannot be engineered without executing the system. All the modeling in the world won't identify all the bottlenecks in your system. Usability, too, can only be achieved through a process of trial and improvement. There are many organizations where these kinds of efforts are done by specialist engineers and not by those having the role of tester.

The testing that we increasingly do to prevent errors falls into this category as well. Test Driven Design and Acceptance Test Driven Design are great methods for engineering software (in the small and large) that does what it is supposed to do. But when you introduce these topics to testers, you meet a great deal of resistance. Certainly not because the methods don't improve quality. We can all agree that they do. It seems to me that the more likely reason is that these methods simply don't accomplish the ends that testers believe they are responsible for.

Enough already. The definition of testing I use is this: testing is the act of executing software in order to find bugs. This is essentially the definition that Glen Myers gave us many years ago and is the definition that I believe would find the greatest degree of acceptance among test practitioners. We test to find bugs and these bugs are a big part of our value.

That's not to say that bugs are our only value. In a previous post, I discussed the notion of contrarianism. Every team needs a skeptic to puncture the generally optimistic group think that infects teams. We also provide additional perspective on the value of the functionality that is being implemented and the usability of that implementation. We make teams think about what they are doing before they rush headlong into implementation. We do all these things, but they are not the act of testing that defines us.

I hope this helps to make sense of my ramblings and establishes a foundation for future conversations.

Sunday, October 9, 2011

On Buts

contrarian

noun

a person who takes an opposing view, especially one who rejects the majority opinion
[dictionary.com]

'But' is the contrarian word in the English language. No matter what has come before, you know that once you hear 'but', you are about to hear an opposing view. And that opposing view is even more important than the original view that it stands in opposition to.

I recognize the power of 'but' because I have a wide streak of contrarianism. I don't accept ideas, I wrestle with them. If Jacob got a cool new name by wrestling with God, I always feel like I should get a cool new name when I wrestle with an idea. I recognize this streak in others as well. Have you ever had a discussion with someone where it seemed that no matter what you said, their response was to find some point to challenge or some nit to pick? Bingo, contrarian. Progress results from contrarians. They make ideas stronger, more resilient.

Testers are contrarians. We get paid for it. When a developer says "this works," we say "but what about?" This process of applied contrarianism makes software better. It hardens the software, makes it more resilient. Without contrarians on the team, the software would never reach its full potential.

But (there's that word), if you love your children and want them to grow up happy and socially well adjusted, don't let them be contrarians. Teach them the word 'and' instead. 'And' is the ultimate social word. 'And' allows us to work together in harmony to construct a shared idea. So remember to teach your children: don't be a 'but', be an 'and' instead.

On Risk Based Testing

Risk based testing is one of the many testing methodologies that has not delivered promised returns. It's not really hard to understand why. Ordering risks requires knowing two things about each risk: the likelihood of the risk occurring and the impact if it does. For software, neither of these is knowable.
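To make concrete what that ordering demands, here is a minimal sketch in Python. It assumes the common convention of ranking by exposure, the product of likelihood and impact, which is my illustration and not something any particular methodology prescribes. The `Risk` class, the feature names, and every number are hypothetical guesses, which is rather the point.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Risk:
    name: str
    likelihood: float  # guessed probability of failure, 0.0 to 1.0
    impact: float      # guessed cost of a failure, arbitrary units

    @property
    def exposure(self) -> float:
        # Assumed ranking rule: exposure = likelihood * impact.
        return self.likelihood * self.impact


def order_by_exposure(risks):
    """Return the risks sorted from highest to lowest exposure."""
    return sorted(risks, key=lambda r: r.exposure, reverse=True)


# Hypothetical estimates for three areas of a product.
risks = [
    Risk("checkout", likelihood=0.2, impact=100.0),
    Risk("report export", likelihood=0.5, impact=10.0),
    Risk("login", likelihood=0.1, impact=80.0),
]
ordered = [r.name for r in order_by_exposure(risks)]

# Change a single guess (the impact of a failure in report export)
# and the whole ordering changes with it.
revised = [
    replace(r, impact=100.0) if r.name == "report export" else r
    for r in risks
]
reordered = [r.name for r in order_by_exposure(revised)]
```

With the first set of guesses, report export is tested last. Revise one impact estimate and it moves to the front. The arithmetic is trivial. The inputs are the problem.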

If bugs were distributed randomly, the likelihood of a risk occurring would simply reflect the usage model of the software. But we know that bugs are not distributed randomly. Instead they cluster. They cluster in the code of a particular developer. They cluster in a design that doesn't fully meet its need. They cluster in how an environmental issue is handled across different areas of the software. Bugs cluster for all sorts of reasons. And we can't identify where these clusters are until we have already tested and found the bugs.

Difficult as the problem of likelihood is, predicting impact is much harder. The severity of a bug may well be random and this randomness thwarts us. The severity of the bug is not related to the importance of the functionality the code implements. Even bugs in remote byways of a system can cause it to crash or corrupt data. The honest answer to the question of what can take the system down is any line of code in it.

Sure, we find bugs using risk based techniques. We find bugs using any testing technique. Does it really result, though, in the identification and removal of our riskiest bugs? The evidence strongly suggests not.

Saturday, October 8, 2011

On Blogging

The last post is a really good example of why I don't really do this whole blogging thing. Short as that essay is, it still took almost 5 hours of a Saturday to write and I had started planning it in my head as early as last Tuesday. The people who blog in this industry must be smarter, write better, or have less of a life than even I do. It seems that four or five months after I escape a bad situation, I feel compelled to get my thoughts out on paper as it were. Perhaps it is my way to learn from the experience or just have a catharsis. So get ready for a flurry of activity.

On STARWest 2011

"Agile" is everywhere. Agile, on the other hand, is continues to be adopted only fitfully. For many attendees, agile meant the addition of iterations and standups and not much else. Practitioners seem to understand that this is not actually working so sessions about making agile work were very popular.

The executives who presented during the leadership summit seemed to be really out of touch with agile. You really can't claim to be agile when you have weekly status reports delivered to a central PMO and you had better be making your weekly progress "or else". The contempt for agile by the next speaker was almost dripping. It should come as no surprise that executives are clueless... I mean out of step with the current state of practice.

Pockets of traditional development and test remain. There were several tutorials from SQE instructors that have probably not changed since the early 1990s. This was much less true in the sessions, where there were only a couple of sessions from the Military/Industrial/SEI faction.

Acceptance Test Driven Development (ATDD) was a hot topic. I think its supporters fail to do it justice. ATDD changes how we do specification and provides a mechanism for demonstrating that the software has implemented the intended functionality. But fulfilling requirements is not the same as not having bugs. ATDD is a technique for preventing bugs, not a technique for detecting them. Practitioners intuitively understand this, which accounts for the resistance of many test professionals to ATDD. It is unfortunate because it is unnecessary.

Another source of resistance to ATDD is its often incorrect implementation. Done correctly, ATDD means specifying the story's tests, preferably in an executable framework, and then developing the code to make the tests run one by one. It is TDD in the large. You know what still needs to be completed on a story by the tests that are still failing. In fact, the percentage of passing acceptance tests may be a more accurate measure of progress during a sprint than the hours remaining in the sprint burndown. (I think it would be interesting to put both measures on a single, highly visible, graph.)
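As a rough illustration of using acceptance tests as the progress measure, here is a minimal sketch in Python. The function names and the sample story are hypothetical, and it assumes the test framework can report each acceptance test as a simple pass or fail.

```python
def percent_passing(results):
    """Percentage of acceptance tests currently passing.

    `results` maps a test name to True (passing) or False (failing).
    A story with no tests yet counts as 0% done.
    """
    if not results:
        return 0.0
    passed = sum(1 for ok in results.values() if ok)
    return 100.0 * passed / len(results)


def remaining_work(results):
    """The failing tests, i.e. what still needs to be completed."""
    return sorted(name for name, ok in results.items() if not ok)


# Hypothetical acceptance tests for one story, mid-sprint.
story_tests = {
    "adds item to cart": True,
    "applies valid coupon": True,
    "rejects expired coupon": False,
    "shows empty cart message": True,
}

progress = percent_passing(story_tests)
todo = remaining_work(story_tests)
```

Recording `progress` once a day would give the series to plot next to the hours-remaining burndown.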

Exploratory testing was also popular. It seems fair to say that one theme of the conference is that test cases are out (or at least limited to the things that can be automated) and exploratory testing is in. It is not hard to understand why exploratory testing is so popular. It encourages testers to do more of the testing they most enjoy and do best. It also helps that the practice has charismatic champions and is amenable to easy demonstration. The various exploratory demonstrations in the test lab were surprisingly popular given the limited space and that attending meant missing prepared sessions. Practitioners resistant to ATDD should consider exploratory testing as a complementary practice.

Test automation is something that everyone seems to do but everyone also seems to be dissatisfied with the results and wonders whether they should do more. Tool vendors keep pushing the concept that we can make this easy enough for the business analysts to do so testers won't have to. (Why you would try to sell that message to a room full of testers is beyond me.) Perhaps they should get together with the ATDD folks since their pitches now seem more directed at demonstrating that the functionality was implemented rather than finding bugs. I suspect they spend too much time talking to CIOs. As the vendors work harder and harder to make test automation accessible to non-technical testers, technical testers like me will have more incentive to find other solutions.

I wonder if some of the uneasiness around automation reflects a recognition that we are simply not finding and fixing enough bugs with automated tests. Sure, we find regression issues but these are, or should be, only a small proportion of the bugs. For years, I have talked about the role serendipity plays in finding bugs. You are running a manual test and notice something wrong completely unrelated to the purpose of the test. Automated tests completely fail to find these kinds of problems. (To which the exploratory testing folks would reply, "Of course.")

Mobile devices and cloud computing are new software platforms that were also program topics. Cloud computing offers some real opportunities for testing. Building environments to support realistic performance testing has always been expensive and these environments usually end up underutilized. Cloud computing offers a solution: environments that can be built to scale when needed and released when not. Cloud computing also presents unique testing challenges. Services built in the cloud use highly parallel architectures that are susceptible to all sorts of concurrency issues which are difficult to test. Testing software for mobile devices seems more familiar. It's really just a configuration problem, but the rapid improvement of these platforms means that the number of configurations and the difference in their capabilities is larger than most of us are used to.

Controversy came in the form of James Whittaker's keynote "All That Testing Is Getting in the Way of Quality." In it, he claimed testing was dead. Or, in more subtle terms, that the practice of testing to find and fix bugs had not resulted in significant improvements in software and could, in any case, be done more effectively, thoroughly, and cheaply by crowd-sourcing it. I think we can safely say, by the hisses that accompanied any further mention of Google and the topics of table conversations over the rest of the week, that he struck some sort of nerve. Since I had entered the conference with that very point of view, it was fascinating to watch the reaction. On this, there will be much more in subsequent posts.

The other question that seemed to be on many minds is "what's new here?" The answer from my part of the world is not much. I believe that James Bach did his original presentation on good enough software and rapid testing in the mid-90s. Agile, too, has passed its 15th birthday. ATDD is new but (perhaps justifiably) is not fully accepted as a test technique by practitioners. The software platforms are new, but the testing challenges are not.

I last attended a STAR conference in 1993. The differences were striking. Sessions had an air of discovery about them. Concepts like risk based testing or model based testing were new and promising. The internet had not yet happened so new information traveled more slowly. Conferences were the conduits where concepts transitioned from IEEE and ACM papers to practitioners. There were more sessions given by researchers from either academic or industrial labs. (Do those kinds of labs even exist anymore - outside of Google that is?) Everything was open to investigation. I remember attending one very interesting session on regression test selection. Now we just run 'em all. Another session suggested that we all needed to be doing mutation testing to prove our test sets. And another promised automated test generation from program analysis.

There is a deeper reality here too, one that Dr. Whittaker alluded to. The practice of testing hasn't really changed all that much since the mid-90s. Sure, we've repackaged some of the things that we all knew in, hopefully, more effective forms. Long before ATDD, testers understood that tests were a much better specification than the specs we usually got. I remember suggesting that we just write the specs as tests years ago. Exploratory testing? Have we ever actually not done exploratory testing whether we told our bosses or not? We use the same tools, models, strategies. The design strategies in Lee Copeland's tutorial "Key Test Design Techniques" are the same ones that I learned from Dr. Boris Beizer at STAR in 1992. And the simple fact is that the application of these techniques has not, on the whole, resulted in software that works all that much better. As Dr. Whittaker suggests, the drivers of whatever improvements we've had in software have not come from improvements in testing. They could not, since there haven't really been any improvements in testing. And I believe that the industry is (or should be) entering a period of evolution to diminish the dependence on testing to develop working software. And that is the jumping off point for the posts to come.

Wednesday, October 5, 2011

On "All That Testing Is Getting in the Way of Quality"

Dr. James Whittaker's keynote has certainly been the hot topic of conversation at STARWest. Since I may be the only person here who agrees with him, let me share my take.

First, let's be clear that he was talking about "testing" as in the kind of testing done by testers in a testing department. He is not saying that no one will ever again run software to evaluate some characteristic of it. Just that it won't be done by testers as they exist today doing testing as we understand it today.

Second, the improvements in quality that have been made over the past decade did not involve testers testing software. He mentioned several improvements. Some, like better programming practices, include using testing in new ways (like TDD). Some, like conformance to standards, involve testing not at all. But the improvement in quality has not been due to testers using new testing practices. I last attended a STAR in the early 90s and the testing practices we had back then are by-and-large the same ones we use today. Even if they have newfangled names.

Third, he contends that the act of executing software to find bugs can be done more cheaply and effectively through crowd-sourcing than by dedicated testers. Yes, I'm sure he knows that not all software is on the web and not all software testing can be replaced by crowd-sourcing. But, to echo a comment in a previous session, if you have a software product and you don't think that someone, somewhere is trying to build something on the web to replace it, then you aren't paying attention. (Those are not the embedded systems you are looking for....)

As a result, if your only value comes from running tests to find bugs, you are on the path to obsolescence. Both because people are finding ways to build software that works well enough without doing that kind of testing and because even those companies that continue to use finding and fixing bugs as a development strategy will find cheaper ways to let users do it. Sooner or later they will come take your job away.

The only place I differ is in the rate of change. Googlers tend to see the world changing at the speed of light. I look around and see the continued reliance on obsolete management practices, obsolete software systems and languages, and obsolete development practices, and come to the conclusion that inertia is much more resistant in the face of change than we think. Of course, that doesn't mean that I want to be trapped there.