Archive for May, 2010
Next week, I’ll be in Germany for the Testing and Finance conference, giving a keynote speech on how testing professionals and teams can satisfy their stakeholders. One of the key themes of that presentation is the following:
There are a wide variety of groups with an interest in testing and quality on each project; these are testing stakeholders. Each stakeholder group has objectives and expectations for the testing work that will occur.
When we do test assessments for our clients, we often find that test teams are not satisfying their stakeholders.
Why? Well, many times, what the testers think the stakeholders need and expect from testing differs from what the stakeholders actually need and expect. In order to understand the stakeholders’ true objectives and expectations, testers need to talk to each stakeholder group. Since in many cases the stakeholders have not thought about this issue before, these talks often take the form of an iterative, brainstorming discussion between testers and stakeholders to articulate and define these objectives and expectations.
To truly satisfy these stakeholders, the test team needs to achieve these objectives effectively, efficiently, and elegantly.
- Effectiveness: satisfying objectives and expectations to some reasonable degree.
- Efficiency: maximizing the value delivered for the resources invested.
- Elegance: achieving effectiveness and efficiency in a graceful, well-executed fashion.
The next step of defining these objectives, and what it means to achieve them effectively, efficiently, and elegantly, is often to define a set of metrics, along with goals for those metrics. These metrics and their goals allow the test team to demonstrate the value they are delivering. With goals achieved, testers and stakeholders can be confident that testing is delivering satisfying services to the organization.
Are you satisfying your stakeholders? Catch me in Bad Homburg, Germany, on June 8, to discuss the topic with me directly. Or, you can e-mail email@example.com to find out when we will post the recorded webinar on the RBCS Digital Library.
When I talk to senior project and product stakeholders outside of test teams, confidence in the system—especially, confidence that it will have a sufficient level of quality—is one benefit they want from a test team involved in system and system integration testing. Another key benefit such stakeholders commonly mention is providing timely, credible information about quality, including our level of confidence in system quality.
Reporting their level of confidence in system quality often proves difficult for many testers. Some testers resort to reporting confidence in terms of their gut feel. Next to major functional areas, they draw smiley faces and frowny faces on a whiteboard, and say things like, “I’ve got a bad feeling about function XYZ.” When management decides to release the product anyway, the hapless testers either suffer the Curse of Cassandra if function XYZ fails in production, or watch their credibility evaporate if there are no problems with function XYZ in production.
If you’ve been through those unpleasant experiences a few times, you’re probably looking for a better option. In the next 500 words, you’ll find that better option. That option is using multi-dimensional coverage metrics as a way to establish and measure confidence. While not every coverage dimension applies to all systems, you should consider the following:
- Risk coverage: One or more tests (depending on the level of risk) for each quality risk item identified during quality risk analysis. You can only have confidence that the residual level of quality risk is acceptable if you test the risks. The percentage of risks with passing tests measures the residual level of risk.
- Requirements coverage: One or more tests for each requirements specification element. You can only have confidence that the system will “conform to requirements as specified” (to use Crosby’s definition of quality) if you test the requirements. The percentage of requirements with passing tests measures the extent to which the system conforms.
- Design coverage: One or more tests for each design specification element. You can only have confidence that the design is effective if you test the design. The percentage of design elements with passing tests measures design effectiveness.
- Environment coverage: Appropriate environment-sensitive tests run in each supported environment. You can only have confidence that the system is “fit for use” (to use Juran’s definition of quality) if you test the supported environments. The percentage of environments with passing tests measures environment support.
- Use case, user profile, and/or user story coverage: Proper test cases for each use case, user profile, and/or user story. Again, you can only have confidence that the system is “fit for use” if you test the way the user will use the system. The percentage of use cases, user profiles, and/or user stories with passing tests measures user readiness.
Notice that I talked about “passing tests” in my metrics above. If the associated tests fail, then you have confidence that you know of—and can meaningfully describe, in terms non-test stakeholders will understand—problems in dimensions of the system. Instead of talking about “bad feelings” or drawing frowny faces on whiteboards, you can talk specifically about how tests have revealed unmitigated risks, unmet requirements, failing designs, inoperable environments, and unfulfilled use cases.
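To make the arithmetic behind these metrics concrete, here is a minimal sketch in Python. The class name, the sample items, and the rule that an item counts only when it has at least one test and all of its tests pass are my illustrative assumptions, not a prescribed tool or format. The sketch reports, for each coverage dimension, the percentage of items with passing tests and the specific items that still need attention.

```python
from dataclasses import dataclass, field


@dataclass
class CoverageItem:
    """One thing to cover: a risk, requirement, design element, environment, or use case."""
    name: str
    test_results: list = field(default_factory=list)  # e.g. ["pass", "fail"]

    def is_covered_and_passing(self) -> bool:
        # Assumption: an item counts only if it has at least one test
        # and every one of its tests passes.
        return bool(self.test_results) and all(r == "pass" for r in self.test_results)


def coverage_report(dimensions):
    """Per dimension: (percentage of items with passing tests, names of items that fall short)."""
    report = {}
    for dimension, items in dimensions.items():
        shortfalls = [i.name for i in items if not i.is_covered_and_passing()]
        passing = len(items) - len(shortfalls)
        percent = 100.0 * passing / len(items) if items else 0.0
        report[dimension] = (round(percent, 1), shortfalls)
    return report


# Hypothetical sample data for two of the dimensions.
dimensions = {
    "risk": [
        CoverageItem("data loss on save", ["pass", "pass"]),
        CoverageItem("slow search response", ["fail"]),
        CoverageItem("login lockout", []),  # identified, but no tests yet
        CoverageItem("report totals wrong", ["pass"]),
    ],
    "requirements": [
        CoverageItem("REQ-1 export to PDF", ["pass"]),
        CoverageItem("REQ-2 audit trail", ["pass", "fail"]),
    ],
}

report = coverage_report(dimensions)
for dimension, (percent, shortfalls) in report.items():
    print(f"{dimension}: {percent}% with passing tests; needs attention: {shortfalls}")
```

The list of shortfalls is the useful part for reporting: it replaces the frowny face with the names of the specific risks or requirements that are failing or untested.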
What about code coverage? Code coverage measures the extent to which tests exercise statements, branches, and loops in the software. Where untested statements, branches, and loops exist, that should reduce our confidence that we have learned everything we need to learn about the quality of the software. Any code that is uncovered is also unmeasured from a quality perspective.
If you manage a system test or system integration test team, it’s a useful exercise to measure the code coverage of your team’s tests. This can identify important holes in the tests. I and many other test professionals have used code coverage this way for over 20 years. However, in terms of designing tests specifically to achieve a particular level of code coverage, I believe that responsibility resides with the programmers during unit testing. At the system test and system integration test levels, code coverage is a useful tactic for finding testing gaps, but not a useful strategy for building confidence.
The other dimensions of coverage measurement do offer useful strategies for building confidence in the quality of the system and the meaningfulness of the test results. As professional test engineers and test analysts, we should design and execute tests along the applicable coverage dimensions. As professional test managers, our test results reports should describe how thoroughly we’ve addressed each applicable coverage dimension. Test teams that do so can deliver confidence, both in terms of the credibility and meaningfulness of their test results, and, ultimately, in the quality of the system.
In a later post, I’ll talk about what software testing is, and what it can do. However, in this post, I’d like to talk about what software testing isn’t and what it can’t do.
In some organizations, when I talk to people outside of the testing team, they say they want testing to demonstrate that the software has no bugs, or to find all the bugs in it. Either is an impossible mission, for four main reasons:
- The combination of software execution paths (control flows) in any non-trivial software is either infinite or so close to infinite that attempting to test all of the paths is impossible, even with sophisticated test automation.
- Software exists to manage data, and these large dataflows are separated across space (in terms of the features) and time (in terms of static data such as database records). This creates an infinite or near-infinite set of possible dataflows.
- Even if you could test all control flows and dataflows, slight changes in the software can cause regressions which are not proportional to the size of the change.
- There are myriad usage profiles and field configurations, some unknown (especially in mass-market, web, and enterprise software) and some unknowable (given the fact that interoperating and cohabiting software can change without notice).
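A bit of arithmetic shows why the first reason alone rules out exhaustive testing. The sketch below is illustrative only: it assumes a simplified model in which each independent two-way decision doubles the number of paths, and it assumes a very optimistic rate of one million tests per second. Loops, which make the real count larger still, are ignored.

```python
def path_count(independent_decisions: int) -> int:
    # Simplified model: each independent two-way decision doubles the path count.
    return 2 ** independent_decisions


SECONDS_PER_YEAR = 365 * 24 * 60 * 60
TESTS_PER_SECOND = 1_000_000  # assumed, and far faster than most real test automation

for decisions in (10, 40, 80):
    paths = path_count(decisions)
    years = paths / TESTS_PER_SECOND / SECONDS_PER_YEAR
    print(f"{decisions} decisions -> {paths:,} paths, about {years:.3g} years to run them all")
```

Under these assumptions, a mere 80 independent decisions already puts the run time in the tens of billions of years, and real systems contain far more than 80 decisions.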
It’s important to understand and explain these limits on what software testing can do. Recently, the CEO of Toyota said that software problems couldn’t be behind the problems with their cars, because “we tested the software.” As long as non-testers think that testers can test exhaustively, those of us who are professional testers will not measure up to expectations.
As regular readers of my posts, books, and/or articles know, I like risk based testing. That said, it’s not without its own risks. One key project risk in risk based testing is missing some key quality risks. If you don’t identify the risk, you can’t assess the level of risk, and, of course, you won’t cover the risk with tests–even if you really should.
How to mitigate this risk? Well, one part is getting the right stakeholders involved, and I have thoughts on doing that in a previous blog post. Another part is to use the right approach to the analysis, as discussed in this blog post.
However, another key part of getting as thorough a list of risks as possible is to use a framework or checklist to structure and suggest quality risks. I’ve seen four common approaches to this, two of which work and two of which don’t work.
- A generic list of quality risk categories (such as the one available in the RBCS Basic Library here). Such a list is easy to learn and use, which is important, because all the participants in the risk analysis need to understand the framework. It is very informal, and needs tailoring for each organization.
- ISO 9126 quality characteristics (for an example of ISO 9126, see here). This is very structured and designed to ensure that software teams are aware of all aspects of the system that are important for quality. It is harder to learn, which can create problems with some participants. It also doesn’t inherently address hardware-related risks, which is a problem for testing hardware/software systems.
- Major functional areas (e.g., formatting, file operations, etc. in a word processor). I do not recommend this for higher-level testing such as system test, system integration test, or integration test, unless the list of major functional areas is integrated into a larger generic quality risk categories list that includes non-functional categories. By themselves, lists of major functional areas focus testing on fine-grained functionality only, omitting important use cases and non-functional attributes such as performance or reliability.
- Major subsystems (e.g., edit engine, user interface, file subsystem, etc. in a word processor). This approach does work for hardware, and in fact is described in some books on formal risk analysis techniques like failure mode and effect analysis such as Stamatis’s classic. However, as with the functional areas, risk lists generated from subsystems tend to miss emergent behaviors in software systems, such as–once again–end-to-end use cases, performance, reliability, and so forth.
Here’s my recommendation for most clients getting started with risk based testing. Start with the general list of quality risk categories I mentioned above. Customize the risk categories for your product, if needed, but beware of dropping any risk categories unless everyone agrees nothing bad could happen in that category. If you find you need a more structured framework after a couple projects, move to ISO 9126.
I want to depart a bit from the usual theme to share a cautionary tale about reliability that has lessons for system design, system testing, cloud computing, and public communication. Regular readers will have noticed that we had only one post last week, down from the usual two posts. The reason is that Friday’s post was pre-empted by a thunderstorm that knocked out the high-speed internet to our offices. We get our internet from a company called GVTC. In fact, the storm appears to have affected hundreds of customers, because we are only now (a full three days after the failure) finding out that we won’t have a GVTC service person here until Thursday.
Shame on RBCS for not having backup, you might think. But we did have backup. In addition to GVTC’s fiber-based wired connection, we had a failover router (a Junxion Box) with an AT&T 3G wireless card in it. However, when the fiber-to-ethernet adapter failed, it created a surge in the ethernet connection (which ran through the router). That surge completely destroyed the router. So, no backup internet. Worse yet, because the router was also acting as the DHCP server, the entire local area network was now inaccessible.
Chaos ensued, as you might imagine, and we’re still recovering from it. I’ll spare you the details of what we have done and are still doing, and jump to the lessons.
- Testing lesson: Yes, we had tested the failover to the 3G router. We did it by disconnecting the ethernet connection to the GVTC fiber-to-ethernet hardware. That tested the “what happens if the connection goes dark” condition. It didn’t test the “what happens if the hardware is damaged and starts sending dangerous signals” condition. The lesson here for testers is, when doing risk analysis for reliability testing, make sure to consider all possible risks. Murphy’s Law says that the one risk you forget is the one that’ll get you.
- Design lesson: When you’re designing for reliability, don’t assume that single-points-of-failure can be eliminated simply by adding a failover resource. If the failover resource is connected in some way to the primary resource, there may well be a path for failure of the primary to propagate to the failover. Our particular problem is the kind of design flaw that the iterative application of hazard analysis could have revealed. I’ll be more careful with choice of contract support personnel as I rebuild this network.
- Cloud lesson: Cloud computing and software as a service (SaaS) are the latest thing, and gaining popularity by leaps and bounds. RBCS doesn’t rely much on the cloud, other than having our e-learning systems remotely hosted. That hosting of e-learning was a good decision, it turns out, because the loss of connectivity to our offices did not affect our e-learning customers. However, had we relied on our e-learning system for internal training over the weekend, we’d have been out of luck. A key takeaway here–especially if you run a small business like I do–is that, if you rely on the cloud or SaaS, those applications are no more reliable than your high speed internet access.
- Public communication lesson: For those who communicate to the public, GVTC’s handling of this problem is a textbook example of how not to communicate. They did not issue any e-mail or phone information about what to expect. My business partner spent over five hours on the phone with them in the last 72 hours, and it wasn’t until today that we got even the remotest promise of resolution. She was told conflicting stories on each call. The IVR system at one point instructed her to “dial 9 for technical support,” and, when she did, it replied, “9 is not a supported option.” Clear communication to affected customers when a service fails will have a big impact on the customers’ experience of quality. Conversely, failing to communicate sends a clear message, too: “We don’t care about you.”
Enough ruminations on the lessons learned. Later this week, we’ll be back to our regularly-scheduled programming. In the meantime, give a thought to reliability–before circumstances force you to do so.
Let’s suppose you’ve succeeded in convincing your project team to adopt risk based testing (e.g., using the pitch outlined in this previous blog post). Quality risk analysis is the initial step in risk based testing, where we identify the quality risks that exist for the software or system we want to test, and assess the level of risk associated with each risk item (see previous blog post here for more on risk factors). Obviously, it’s important to get this step right, since everything else will follow from the risks you identify and your assessment of them. Here are five tips to help you do the best possible job of it.
- Use a cross-functional brainstorming team to identify and assess the risks, ensuring good representation of both business and technical stakeholder groups. This is absolutely the most critical of these five tips. The better the quality risk analysis team represents the various perspectives that exist in the project, the more complete the list of quality risk items and the more accurate the assessment of risk associated with each item.
- Identify the risk items first, then assign the level of risk. This two-pass approach ensures that people consider risks in relationship to each other before trying to decide likelihood and impact, which helps reduce the risk rating inflation or deflation that can occur when each risk is considered in isolation.
- Only separate risk items when necessary to distinguish between different levels of risk. In other words, you typically want to keep the risk items as “coarse grained” as possible, to keep the list shorter and more manageable. Remember, this is test analysis, not test design. You’re trying to decide what to test, not how to test it. You’ll identify specific test cases for each risk item once you get into test design.
- Consider risk from both a technical and a business perspective. Risk items can arise from technical attributes of the system and from the nature of the business problem the system solves. Technical considerations determine the likelihood of a potential problem and the impact of that problem on the system should it occur. Business considerations determine the likelihood of usage of a given feature and the impact of potential problems in that feature on users.
- Follow up and re-align your risk analysis with testing activities and project realities at key project milestones. No matter how well you do the risk analysis during the initial step, you won’t get it exactly right the first time. Fine-tuning and course-correcting are required.
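The two-pass approach in the second tip can be sketched in a few lines of Python. Everything specific here is an assumption for illustration: the 1-to-5 scales, the sample risk items, and the choice to combine likelihood and impact by multiplying them. Your team may prefer a different scale or a different way of combining the two factors.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RiskItem:
    description: str
    likelihood: Optional[int] = None  # assumed scale: 1 (very unlikely) to 5 (very likely)
    impact: Optional[int] = None      # assumed scale: 1 (minor) to 5 (severe)

    def priority(self) -> int:
        # Assumed scoring rule: likelihood times impact, higher means test it more.
        if self.likelihood is None or self.impact is None:
            raise ValueError(f"risk not yet assessed: {self.description}")
        return self.likelihood * self.impact


# Pass 1: identify the risk items only, with no ratings yet.
risks = [
    RiskItem("help text typos"),
    RiskItem("orders lost under load"),
    RiskItem("wrong tax calculation"),
]

# Pass 2: with the whole list visible, the team assigns likelihood and impact.
assessments = {
    "help text typos": (4, 1),
    "orders lost under load": (3, 5),
    "wrong tax calculation": (2, 5),
}
for risk in risks:
    risk.likelihood, risk.impact = assessments[risk.description]

ranked = sorted(risks, key=lambda r: r.priority(), reverse=True)
for risk in ranked:
    print(f"{risk.priority():>2}  {risk.description}")
```

Keeping the ratings empty during the first pass is deliberate: asking for a priority before the second pass raises an error, which mirrors the advice not to rate any risk in isolation.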
If you apply these five tips to your quality risk analysis activities, you’ll be well on your way to doing good risk based testing. You might consider some of the other suggestions I have in the video on risk based testing available on our Digital Library here.
As I’ve mentioned before in this blog (and elsewhere), we work with many clients to help them implement risk based software and system testing. Two key steps in this process are the identification and assessment of risks to the quality of the system. In my 15 years of experience doing risk based software testing, I’ve found that the only reliable way to do this is by including the appropriate business and technical stakeholders.
When I explain this to people getting started, I sometimes hear the objection, “Oh, our project team members are all very busy, so I can’t imagine they’ll want to spend all the time required to do this.” Fortunately, this is an easily resolved concern.
First of all, to answer the “why would they want to participate?” implication of that objection, I’d refer you to my previous blog post on this topic here. Next, let’s look at the “how much time are we talking about” implication.
For most stakeholders, risk based testing involves only a little bit of their time to collect their thoughts. Risk identification via one-on-one or small team interviews requires about 90-120 minutes each, with risk assessment interviews either being separate follow up discussions of about the same length or even sometimes included in the risk identification interview. There’s typically a subsequent meeting to review, finalize, and approve the risk assessment.
It’s true that project team brainstorming sessions put a higher workload on the stakeholders than just 2-3 hours total. Risk identification and assessment via brainstorming sessions requires a single, typically one-day meeting. Most of our clients choose the “sequence of interviews” approach, because of the difficulty of scheduling all-day meetings.
Either way, the interview or session participants need to think about three questions during these steps in the process:
- What are the potential quality problems with the system (i.e., what are the quality risks)?
- How likely is each potential problem (i.e., how often do we find such problems during testing or in production)?
- How bad is each potential problem (i.e., what is the business and customer pain associated with such problems)?
By including the right selection of technical and business stakeholders, and thinking realistically (i.e., with neither excessive pessimism nor optimism) about these three questions, the stakeholder team can produce a realistic and practical quality risk assessment.
If you’re interested in more information on risk based testing, you might want to take a look at the videos and other resources available here.
Good bug reports are important. For many test teams, they are the primary deliverable, the most frequent and common touchpoint between the test team and the rest of the project. So, we’d better do them well.
Here are ten steps I’ve used to help my testers learn how to write better bug reports:
- Structure: test carefully, whether following scripts, software attacks, or exploratory testing.
- Reproduce: test the failure again, to determine whether the problem is intermittent or reproducible.
- Isolate: test the failure differently, to see what variables affect the failure.
- Generalize: test the same feature elsewhere in the product, or in different configurations.
- Compare: review similar test results, to see if the failure has occurred in the past.
- Summarize: relate the test and its failure to customers, users, and their needs.
- Condense: trim unnecessary information from the report.
- Disambiguate: use clear, unambiguous words and phrases, and avoid words like disambiguate.
- Neutralize: express the failure impartially, so as not to offend people.
- Review: have someone look over the failure report, to be sure, before you submit it.
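A few of these steps lend themselves to a simple pre-submission check. The sketch below is a hypothetical helper, not a feature of any bug tracking tool: the field names, the word list, and the thresholds are all my assumptions. It covers only the Reproduce, Summarize, Neutralize, and Review steps, since the others depend on a tester’s judgment.

```python
from dataclasses import dataclass
from typing import List, Optional

# Illustrative only: words that tend to read as blame rather than observation.
LOADED_WORDS = {"stupid", "careless", "obviously", "sloppy"}


@dataclass
class BugReport:
    summary: str
    steps: List[str]
    attempts: int = 0                  # how many times the tester tried the failing steps
    failures: int = 0                  # how many of those attempts failed
    reviewed_by: Optional[str] = None  # name of the peer reviewer, if any


def checklist_gaps(report: BugReport) -> List[str]:
    """Return the steps from the list above that this draft has not yet satisfied."""
    gaps = []
    if report.attempts < 2:
        gaps.append("Reproduce: retry the failure at least once")
    if not report.summary.strip():
        gaps.append("Summarize: say what the failure means for users")
    words = {w.strip(".,!?").lower() for w in report.summary.split()}
    if words & LOADED_WORDS:
        gaps.append("Neutralize: remove loaded words from the summary")
    if report.reviewed_by is None:
        gaps.append("Review: ask a peer to read the report")
    return gaps


draft = BugReport(
    summary="Obviously careless rounding in invoice totals",
    steps=["Create an invoice with three line items", "Open the totals tab"],
    attempts=1,
    failures=1,
)
gaps = checklist_gaps(draft)
for gap in gaps:
    print(gap)
```

Recording both attempts and failures also lets the report state reproducibility as a ratio, such as “failed 3 of 5 tries,” for intermittent problems.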
If you apply these steps to your daily work as a tester, you’ll find that bugs get fixed quicker, more bugs get fixed, and programmers will appreciate your attention to detail.