If a test suite is used to judge something about the goodness of a software product, surely the suite itself must be good. And surely its goodness must be evaluated before its results can be trusted.
This paper is about how to do that.
We will first discuss how one might statically evaluate a finished test suite. As a part of that evaluation, we will begin to answer the question "what does it mean for a test suite to be good?" We'll illustrate the analysis with a specific example. The static evaluation procedure will include these elements (not all of which would be used in any given situation). Note: the final paper's list will doubtless differ in some ways.
1. Is the suite's purpose clearly understood? A test suite is created to serve some person, often a project manager wanting to know if the software is ready to ship, sometimes a developer wanting to know if it's time to move on. The customer of the suite has certain criteria. A suite whose creator doesn't know those criteria cannot be a good one.
2. In practice, the customer may have only vague criteria. Even in the best case, the suite's creator should have refined those criteria through a risk analysis. Does the suite's creator have a good understanding of:
* The different classes of end users and their relative importance to the project?
* What kinds of failures matter most to those different end users?
* What kinds of failure-producing situations the product is likely to encounter?
* What characteristics of the product are shakiest?
A suite whose creator cannot answer these and related questions will be misdirected.
3. Is the suite's approach appropriate to the risks? Can the suite's creator describe (or has she already described) how the decision about what, where, and how to test was guided by the risks? Has the approach chosen been compared to any checklists of common test practices?
4. Were the right tests retained? Here, we distinguish between two different definitions of "the test suite". One definition is "all the tests run against the product up to the present time". The other is "all the tests that will be run again in the future". It often makes sense to discard tests that are unlikely to find bugs in the future. Can the test suite creator articulate the policy by which tests were discarded or retained? Does it seem to sort tests into the right categories?
5. Are the retained tests maintainable? For those tests that are automated, do they avoid the automation trap: tests that are intended to be rerun, but are so sensitive to product change that testers will spend all their time keeping old tests working and none of it creating new tests? Do manual tests use elaborate scripts? If so, do they combine the worst aspect of automated tests (poor maintainability) with the labor-intensiveness of manual testing?
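The automation trap can be made concrete with a small sketch. The `report` function below is invented for illustration; it stands in for any feature whose output mixes essential content with incidental presentation.

```python
# Hypothetical example of the automation trap. report() is an invented
# function standing in for any feature whose output has both essential
# content (the facts being verified) and incidental detail (a version
# banner that changes every release).

def report(user, version="2.0"):
    return f"Quality Report v{version}\nPrepared for: {user}\nStatus: PASS"

# Brittle: pinned to the entire output, version banner included.
# It breaks on every cosmetic release, even when the behavior under
# test is fine -- maintenance that buys no new bug-finding power.
def brittle_test():
    assert report("alice") == "Quality Report v2.0\nPrepared for: alice\nStatus: PASS"

# Maintainable: checks only the facts this test exists to verify.
def maintainable_test():
    output = report("alice")
    assert "Prepared for: alice" in output
    assert "Status: PASS" in output

brittle_test()
maintainable_test()

# After a version bump, the maintainable test still passes unchanged;
# the brittle one would need repair.
output = report("alice", version="2.1")
assert "Prepared for: alice" in output and "Status: PASS" in output
```

A suite full of tests in the first style consumes its maintainers; a suite in the second style survives product change at far lower cost.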
6. What does the suite not do? Here, we evaluate the suite against various types of coverage. Code coverage is one example, but there can be many others (such as checking whether every statement in the user documentation has been tested). More than one type of coverage is appropriate. A larger set will compensate for the weaknesses of any given type.
In all uses of coverage, we must take care to match the coverage against the test approach. Branch coverage, for example, says little about the goodness of load testing of multi-user systems.
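To illustrate that mismatch, here is a deliberately toy example (the `Account` class and `deposit` function are invented for this sketch): single-user tests achieve full branch coverage of a deposit operation, yet an interleaved multi-user scenario still loses an update. Branch coverage has nothing to say about the second kind of failure.

```python
# Hypothetical illustration: deposit() has only one branch (the amount
# check), and the single-user tests below exercise both sides of it --
# 100% branch coverage. The interleaved scenario at the end fails anyway.

class Account:
    def __init__(self, balance=0):
        self.balance = balance

    def read(self):
        return self.balance

    def write(self, value):
        self.balance = value


def deposit(account, amount):
    if amount <= 0:                       # branch 1: invalid amount
        raise ValueError("amount must be positive")
    current = account.read()              # branch 2: read-modify-write
    account.write(current + amount)


# Single-user tests: both branches covered, all assertions pass.
acct = Account()
deposit(acct, 50)
assert acct.balance == 50
try:
    deposit(acct, 0)
    assert False, "should have rejected a zero deposit"
except ValueError:
    pass

# A two-session interleaving (simulated deterministically) loses an update:
acct = Account(100)
a = acct.read()          # session A reads 100
b = acct.read()          # session B reads 100
acct.write(a + 10)       # session A writes 110
acct.write(b + 20)       # session B writes 120, clobbering A's deposit
assert acct.balance == 120   # should be 130: one deposit silently lost
```

The coverage number is perfect; the product is broken under concurrent use. The coverage measure must match the test approach, and the approach must match the risks.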
You will note that this approach is highly subjective. We will not duck this issue. We will in fact confront the tendency to retreat to those few measures that are completely objective (such as code coverage), as we believe they do more harm than good.
James Bach (http://www.satisfice.com) is founder and principal consultant of Satisfice, Inc. James cut his teeth as a programmer, tester, and SQA manager in Silicon Valley and the world of market-driven software development. He has worked at Apple, Borland, a couple of startups, and a couple of consulting companies. Through his models of Good Enough quality, exploratory testing, and heuristic test design, he focuses on helping individual software testers cope with the pressures of life in the trenches and answer the questions "What am I doing here? What should I do now?"
Brian Marick has worked in testing since 1981. A consultant since 1992, he concentrates on developer testing, the interface between developers and independent testers, the criteria for test evaluation, and helping teams and projects understand and manage the tradeoffs inherent in software assurance. He is the author of The Craft of Software Testing (1995) and was the first editor of Software Testing and Quality Engineering Magazine.