 Software testing training, consulting, and outsourcing from the experts: Rex Black Consulting Services (RBCS)

Archive for the ‘software test metrics’ Category

Testing Metrics: Useful or Not?

Some of you who follow me on Twitter (@RBCS, by the way) may have seen the ongoing debate between me on one side and what seems to be the entirety of the “context driven school of testing” on the other side.  I’ve been saying that testing metrics, while imperfect, are useful when used properly.  The other side seems to be saying…well, I’m not clear exactly what they are saying, and I’ll let them say it themselves on their own blogs.  Suffice it to say they don’t like test metrics.

If you have missed the Twittersphere brouhaha and you want to get more details on what I think about metrics, you can listen to my recorded webinar on the topic.  You can ask the other folks about what their objections are. 

Once you do, I’d be interested in knowing your thoughts. Are test metrics useful to you? What problems have you had?  What benefits have you received?  What different situations have you used metrics in (e.g., Agile vs. waterfall, browser-based apps, level of risk, etc.), and how did that context affect the metrics?  Let me know…

Estimation of Review Effort

I had an interesting question from a reader, Stas Milev:

Hi Rex

I hope you are well. I wanted to ask you a question about test estimation. I am sure you have been asked many of these before but the one I have is not really about the estimation techniques themselves (such as usage of historical data, dev effort, etc)

There is one area of test estimates which is always arguable, hard to estimate and finally explain to sponsors no matter how well you are prepared. This is a test analysis and design task which is vague by definition. If we quickly decompose it into smaller pieces we would end up with the following simplified list of activities:

1. Analyse the test basis (if exist).
2. Get ambiguities, inconsistencies and gaps in the test basis resolved.
3. Apply the test techniques to create the test cases.

While you can more or less quantify 1 and 3 in terms of the effort (let’s assume we at least have something to work with in terms of test basis), the issue is obviously with 2 where we are dependant on many people (Business Analysts, system Analysts, dev team, end-users, etc).

There are two obvious options we can choose from:

Option 1: Assume there will be no gaps, issues or they will be resolved immediately and all our questions will get answers with no delays and thus, simply estimate 1 and 3. Of course, we will make the assumptions documented to highlight the risk if they arise. The problem with this one is that we know straightaway we will overbudget and we will have to come back to business sponsors and ask for more money. Nobody likes doing this, especially if we start asking for an additional amount every single time our inadequate estimates deviate with the reality. Moreover, on some of the project the budget is fixed straight away once it has been confirmed.

Option 2: Get this slippage time or time spent on requirements clarification somehow estimated based on previous experience. The issue here is that this is an ‘unexplained’ effort to an extent it can’t be justified by a statement: “but we know there will be issues or something will not be ready”. Pretty valid scenario in this case would be: “Hey, we have just two requirements here. Why the hell it takes two weeks and not two days to create the tests for these two?”

To me, getting the right questions raised, asked and answered is a part of test analysis, and this activity is extremely important as it prevents defects. To a certain extent this is a very informal static testing or QA activity which needs to be built into the process, but nobody is willing to pay for it explicitly. On the other hand, ethics do not allow you to simply ignore problems and test that buggy software is buggy. In the latter case, I normally still try to squeeze in the static test and get decision makers to accept the risk that problems with the requirements may arise very late.

I wanted to hear your recommendation on test analysis and design effort estimates and test effort negotiation with business sponsors and project managers.  It would also be great to hear your comments on both options, or perhaps an option 3 if it exists.

Thanks
Stas Milev
ISTQB Certified Advanced Test Manager (CTAL)

Hi Stas–

A good question.  What I would suggest is that the estimation for activities 1, 2, and 3 should be based on historical data.  So, if you know the average number of test cases per identified quality risk, per specified requirement, per supported configuration, etc., you should be able to estimate activities 1 and 3 based on the average number of hours of effort per test case.  For activity 2, once again, if you have historical data on the average number of defects typically found per test basis document page, you should be able to estimate the number of defects you’ll find.  If you know the average time from discovery to resolution of such defects, and the average amount of effort for each such defect, you can then estimate the delay and effort.

The metrics gathered about test basis defects could be used not only for estimation, but also for process improvement.
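
As a rough illustration of the estimation approach described above, here is a minimal Python sketch. The `History` fields, the `estimate` function, and all of the numbers are hypothetical; your own historical averages would replace them.

```python
from dataclasses import dataclass


@dataclass
class History:
    """Averages from past projects (all values used below are hypothetical)."""
    tests_per_requirement: float  # test cases per specified requirement
    hours_per_test: float         # analysis and design effort per test case
    defects_per_page: float       # test basis defects found per document page
    hours_per_defect: float       # effort to report and follow up one defect
    days_to_resolve: float        # average days from discovery to resolution


def estimate(history, requirements, basis_pages):
    """Estimate test analysis and design effort from historical averages."""
    test_cases = requirements * history.tests_per_requirement
    design_hours = test_cases * history.hours_per_test          # activities 1 and 3
    expected_defects = basis_pages * history.defects_per_page   # activity 2
    clarification_hours = expected_defects * history.hours_per_defect
    return {
        "test_cases": test_cases,
        "design_hours": design_hours,
        "expected_defects": expected_defects,
        "clarification_hours": clarification_hours,
        "total_hours": design_hours + clarification_hours,
        # Resolutions overlap in time, so this is the typical wait per defect,
        # not a sum across defects.
        "typical_wait_days": history.days_to_resolve,
    }


history = History(5, 1.5, 0.25, 2.0, 3.0)
for name, value in estimate(history, requirements=20, basis_pages=40).items():
    print(f"{name}: {value}")
```

Keeping the clarification effort as its own line item, backed by the historical defect rate, gives sponsors something concrete to look at when they ask why two requirements take more than two days.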

Defect Metrics

I had another interesting question from a reader that I’m getting to in the blog today:

I was reading some of your excellent webinars, especially the one about defect metrics. So my question is: Is there any benchmark data about defect severity, and also about reopened defects?

I would really appreciate it if you can guide me to where I can find this data.

Note: the purpose of this data is to measure and compare our results against the benchmark.

Tomas Gotes

Thanks for the question, Tomas.  In terms of defect severity, unfortunately there are no standard, common definitions for severity in the industry yet.  So, any metrics that showed aggregated data from multiple organizations would probably be rather questionable.  That said, you might check out Capers Jones’ excellent book, The Economics of Software Quality, which, as I recall from reviewing it, had some interesting analysis here.

In terms of defect report re-open rates, our target during assessments is 5% or less.  Some amount of defect report re-open seems inevitable, since test environments and data are generally more representative of production than are the developer’s environments.  However, there is a significant risk of schedule delay, as well as significant inefficiency (due to an additional layer of rework), when the defect re-open rate gets too high.
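
For those who want to track this, a minimal sketch of the check follows. It assumes the re-open rate is the number of re-opened defect reports divided by the number of reports that were closed as fixed; the function names and the sample counts are hypothetical.

```python
REOPEN_TARGET = 0.05  # the 5% target mentioned above


def reopen_rate(reports_closed, reports_reopened):
    """Fraction of closed defect reports that were later re-opened (assumed definition)."""
    if reports_closed == 0:
        return 0.0
    return reports_reopened / reports_closed


def within_target(reports_closed, reports_reopened, target=REOPEN_TARGET):
    """True when the re-open rate is at or below the target."""
    return reopen_rate(reports_closed, reports_reopened) <= target


# Hypothetical counts for one release.
print(f"{reopen_rate(200, 8):.1%}", within_target(200, 8))
print(f"{reopen_rate(200, 30):.1%}", within_target(200, 30))
```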

Measuring the Value of Unit Tests

I received an interesting, detailed query from a reader of my new e-book on testing metrics:

Dear Rex,

My name is Marcus Milanez and I’m a software developer who lives in São Paulo, Brasil. I recently bought a copy of your “Testing Metrics” ebook and found it really valuable for improving my understanding on testing processes, as well as hearing more words on the importance that metrics play on my daily tasks. Thanks for sharing all your studies and results, I appreciate it.

However, I still have some questions that I still couldn’t get any reasonable answers – maybe they’ve been presented to me already, but my limited comprehension is failing to absorb them. In order to exercise these questions correctly, I would like to add some general contexts, just to make sure that I’m using my words properly.

Forgetting a little bit about all the nitty-gritty details that are observed during the creation or maintenance of a software system, in general, in a regular software development process we have:

1) Customers require features that must be properly implemented

2) Developers along with specification teams, stakeholders and other parties, define the scope of that feature and write a couple of functional/acceptance tests that validate the analysis of the customer request

3) In a planned sprint, developers implement the feature and get feedback from specs team, stakeholders and customers, to make sure that everything is fine.

4) Quality team verifies the quality of any version delivered by developers. Bugs are filed and later fixed by developers in a sprint.

I clearly understand that the list above is incomplete, could be better and is definitely subject to change, but I’m just trying to give an overall context. Now, by reading your book, I could clearly see all the value that metrics and proper testing strategies add to a software project, and I’m glad I was introduced to things like DDE and all those sample charts – that information is really gold. While reading the book though, I tried hard to think and use the same logic that you presented in your DDE formula, for somehow measuring the value of unit tests created by developers during development time. The thing is that I couldn’t get to any conclusion because it looks like the value added by unit tests can’t be measured in the same way that you presented in your book. My reasons for that are:

1) Looks like unit tests are not making verifications in the same sense that exploratory or automated tests are – rather, we regularly use them for getting something built according to a set of premises, and later on, these set of premises can be verified.

2) Developers don’t usually count, or indicate, bugs found during development time. I don’t even think that this would make any sense.

3) They are not testing a use case, they are testing smaller pieces of it. Although these smaller pieces can directly affect a use case, looks like these observations are quite different from those observed by an automated functional/acceptance test.

So, my question is: can we measure the value/cost benefit added by unit tests, using metrics that indicate that they are definitely effective in a software project, or is the value of unit tests not related at all to the number of bugs filed by a validation team, for example? Can the value added (or not) by unit tests be measured, qualified and improved? Is it possible to observe unit tests, alone, with a cost perspective?

I exercised this question on my own, and so far the following seem to be valid for me:

Looks like unit tests main intention is not to find bugs. Unit tests seem to be closer to a development tool (like an IDE is) than a tool for finding problems.

Since unit tests are part of the development effort, it is difficult to say how many hours have been spent during codification of unit tests and codification of production code.

Although a project with high code coverage by unit tests tends to have fewer bugs filed, they are not a guarantee of use-case bug-free code. Still thinking about the number of bugs, I believe that unit tests tend to prevent run time errors more than use case errors.

Apparently, the real value of unit test comes in terms of maintenance, ability of adding or removing features without fear, and mainly as a means of communication with other developers working on the same code base.

If these first items were valid, it is likely that unit tests will certainly prevent the creation of a given number of bugs during validation phase. However, in order to assert that, we would need to collect all the metrics related to a given project that don’t have unit tests (like DDE, cost per bug fix, time per bug fix and others), properly implement unit tests for this project, and then get the very same metrics again for comparison. Is this rationale correct? If so, can we confidently say that unit tests, alone, were responsible for getting these numbers changed? Are there any study published that analyses these points?

Thank you so much for your attention and sorry for this long email. The thing is that I found your book extremely useful and I definitely want to improve my understanding of many aspects of my profession.

yours,

Marcus Milanez

Thanks for your kind words about my book.  I’m glad it’s proving interesting and useful to you, and that it provoked such a good and reflective set of questions.

Before I get into a detailed response, let me set up the background for what I’m about to write:

  • First, I agree with your brief, general summary of the software process as it applies in Agile development, and I’ll refine it further in a moment.  I would add, at this point, that in many organizations there is an additional test level that occurs, sometimes called “system integration test,” where the software produced in the Agile sprints is tested with other software which interoperates or cohabitates with it.  This level can occur once after the last sprint, but it’s better if it happens iteratively as each sprint concludes.
  • Second, since you mention sprints and since your description of the process seems very Agile-oriented, I assume that we are talking about unit tests in the context of Agile development.
  • Third, I assume that, when you talk about unit tests, since you say that developing the unit tests is “part of development,” you are talking about unit tests that are used for test-driven development (whether formally or informally).  That is, the unit tests are developed at the same time the code is being developed, and that coding, developing tests, and executing the tests proceeds in a tight loop until the unit is complete. 
  • Fourth, I assume, since you talk about unit tests allowing us to make changes without fear of regression, that you are talking about automated unit tests, ideally incorporated in a continuous integration framework such as Hudson.

With those assumptions in mind, here are my thoughts on the many good comments and points that you raise.  Let me start by reviewing the development process:

  1. Test-driven development is a great idea.  Indeed, I did that myself years ago when I was a programmer.  However, it’s a misnomer to call that process “test driven.”  The “tests” used in this process are actually more like executable low-level design specifications.  I know you are already aware of this, because of your comment about these tests “not testing use cases.”  What can be somewhat confusing about this is that the work product produced by test-driven development includes a set of automated unit tests that subsequently can be used for something properly referred to as testing, specifically regression testing. Since test-driven development, when it’s happening, is not really testing, the objective of the developer as he or she creates the code and the unit tests is not really to find defects.  (Finding defects is one of the typical objectives of most test levels, including unit testing.)  The main objective is to build confidence, as the coding proceeds, that the code being built works properly. So, if we want to measure the level of confidence that we should have, that is something better measured using coverage metrics.  In this case, the most relevant coverage metrics would be code coverage and, in some cases, data flow coverage. 
  2. Once the unit is completed, typically there is a missing step in most organizations.  Ideally the developer would apply structured test design techniques to augment the unit tests in order to make them true unit tests, and to find any defects that remain in the completed unit.  If developers were to do this systematically, the quality of code delivered for testing would dramatically improve.  Capers Jones’ studies show that, typically, unit testing has a defect removal effectiveness of only 25%, but best practices in unit testing boost that to 50%. 
  3. In fact, there is another missing step in many organizations at this point.  The best practice would be for the programmer to take the code, the unit tests, and the unit test results to a code review.  A study by Motorola showed that, to maximize effectiveness, this review should include at least two people other than the author, so pair programming or simply having a lead developer review the code are not sufficient.  The unit tests might be augmented based on this review, and, if so, re-run.
  4. At this point in the process, the unit tests created in step 1 (and ideally steps 2 and 3) of the process are incorporated into a continuous integration framework, as mentioned earlier, and serve as ongoing guards against regression of the unit.  If the unit is refactored, the tests may have to be updated, as you mentioned.

So, let’s get back to metrics and measuring the value of unit testing, starting with not logging defects found by developers.  This is a mistake.  I agree that we don’t need to log failures that occur during step 1 above, because–as I said–we’re not really testing, we are developing the software.  However, once we get into steps 2, 3, and 4, I believe those defects should be logged.  Often when I say this, people have a horrified reaction and say, “Oh, no no no, we can’t have programmers interrupting themselves and breaking their flow to log defects in some cumbersome bug tracking tool.”

To which I reply:  “I didn’t say they had to use a bug tracking tool.  I said they should log the defects.”  Here’s the distinction. I agree that there’s no need for a tracking tool here, because the bug is going to be immediately removed.  Its lifecycle will start and end on the same day, and so all the state-based stuff that the tracking tool does is unnecessary.  But we do need to log information about the defect, especially classification information, phase of introduction, etc., and we should use the same classifications as are used during formal testing.  This information can be captured in a simple spreadsheet. That way, we can do analysis of the defects found during unit testing (step 2), code reviews (step 3), and unit regression testing (step 4).

What can this analysis do for us?   A number of things.  First, it can help developers and the broader organization learn how to write better code.  Patterns in types of defects found during steps 2, 3, and 4 can show that training and mentoring is needed to reduce bad coding practices or habits that some developers might have. 
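
To make the idea of a lightweight defect log concrete, here is a small sketch that reads a spreadsheet export and counts defects by type. The column names (`found_in`, `defect_type`, `phase_introduced`) and the sample rows are assumptions for illustration only; the real log should reuse whatever classifications formal testing already uses.

```python
import csv
import io
from collections import Counter

# Hypothetical CSV export of the developer defect log.
LOG_CSV = """found_in,defect_type,phase_introduced
unit test,boundary error,coding
code review,missing null check,coding
unit test,boundary error,coding
unit regression,interface mismatch,design
code review,boundary error,coding
"""


def defect_patterns(csv_text):
    """Count logged defects by type so recurring coding problems stand out."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["defect_type"] for row in rows)


def defects_by_step(csv_text):
    """Count logged defects by the step that found them (steps 2, 3, and 4)."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["found_in"] for row in rows)


for defect_type, count in defect_patterns(LOG_CSV).most_common():
    print(f"{defect_type}: {count}")
```

A type that keeps topping this list is a reasonable candidate for the training and mentoring mentioned above.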

Second, to address your main question, it can allow us to directly measure the value of unit tests.  To do so, we need to do one more thing, which is to estimate the effort associated with unit testing and unit test defect removal as well as estimating the effort associated with other levels of testing (e.g., acceptance testing, system integration testing, etc.), removal of defects found in those levels of testing, and the effort associated with failures in production.  This is actually easier than you might think, and sufficiently accurate numbers can be obtained by surveying the people involved. 

The differences are often quite dramatic.  One client found that a defect found and removed in unit testing took three person-hours of effort, while a defect found and removed in system testing took 18 hours, a defect found and removed in system integration testing took 37 hours, and a defect found and removed in production took 64 hours.  So, for this client, each defect found in unit testing saved at least 15 hours of effort, and possibly as much as 61 hours of effort!
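
The arithmetic behind those savings can be sketched in a few lines of Python. The per-defect hours are the client figures quoted above; the function names and the count of 100 defects are hypothetical.

```python
# Person-hours to find and remove one defect, by where it was found
# (figures from the client example above).
HOURS_PER_DEFECT = {
    "unit testing": 3,
    "system testing": 18,
    "system integration testing": 37,
    "production": 64,
}


def hours_saved_per_defect(found_in="unit testing"):
    """Hours saved per defect compared with finding it in each later phase."""
    base = HOURS_PER_DEFECT[found_in]
    return {
        phase: hours - base
        for phase, hours in HOURS_PER_DEFECT.items()
        if hours > base
    }


def total_savings_range(defects_found, found_in="unit testing"):
    """Return (minimum, maximum) person-hours saved for a number of defects."""
    saved = hours_saved_per_defect(found_in)
    return defects_found * min(saved.values()), defects_found * max(saved.values())


print(hours_saved_per_defect())
print(total_savings_range(100))  # 100 is a hypothetical defect count
```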

Notice that this approach allows us to avoid the “double blind study” approach that you mentioned earlier.  We don’t have to find two almost-identical projects, where the only difference is whether unit testing happened or not.  I expect that would prove impossible, so it’s good that we don’t need to do it.

Now, as you can see, we can measure part of the value delivered by unit testing, in terms of avoided downstream costs of failure.  (You might want to read more about cost of quality, which is the technique used above; check out my article here.)  However, I also agree with your earlier comments about unit testing having other objectives, such as easing the maintenance of code, and reducing the risk of breaking something when we do, as well as helping developers communicate about the code.  (That last benefit is especially applicable if you follow step 3 mentioned above.)  It is possible to develop metrics that allow you to measure these benefits as well, but, if the main objective of measuring the value of unit testing is to convince managers to continue to invest in it, then the “effort saved” metric mentioned above should be sufficient.

Measuring Technical Risk and Spotting Bug Clusters with Weighted Failure

A reader, Stas Milev, sent me a question which may be interesting to readers of this blog:

Hi Rex

First, let me thank you for the huge effort you put in writing the books and educating people on various testing topics – the materials are just excellent. I have been a test manager for 7 years already and still find a lot of useful things.

Thanks very much, I’m glad they are useful to you. 

I hope you don’t mind if I ask you a question related to the weighted failure metric you mentioned in your Advanced Test Manager book. You did not focus too much on this in your book and just mentioned it measures technical risk and the likelihood of finding problems.  As such, can you please expand a bit more on how to analyse this metric, the value and meaning of it?

The weighted failure can be calculated both on a per-test basis and a per-test suite basis.  (A test suite is a logical collection of test cases, such as a functional test suite, a performance test suite, etc.)  The weighted failure counts the number of bugs found (either by test or across all the tests in the test suite), but each bug report is weighted based on the priority and severity of the bug.  In other words, a test suite that finds a moderate number of high priority, high severity bugs will probably score higher than a test suite that finds a large number of low priority, low severity bugs.
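
Here is one way the idea could be sketched in Python. The weighting used, 1 divided by (severity times priority) on a 1-to-5 scale where 1 is most serious, is an assumption chosen for illustration and may differ from the formula in the spreadsheet; the suite names and bug data are hypothetical.

```python
def bug_weight(severity, priority):
    """Weight one bug report; 1 is most severe/urgent, 5 is least (assumed scale)."""
    return 1.0 / (severity * priority)


def weighted_failure(bug_reports):
    """Sum the weights of all bug reports found by a test or a test suite."""
    return sum(bug_weight(severity, priority) for severity, priority in bug_reports)


# Hypothetical suites: each tuple is (severity, priority) for one bug report.
functional_suite = [(1, 1), (1, 2), (2, 1)]  # a few serious bugs
usability_suite = [(4, 5)] * 8               # many minor bugs

print(weighted_failure(functional_suite))
print(weighted_failure(usability_suite))
```

With this weighting, the suite with three serious bugs scores higher than the suite with eight minor ones, which is the behavior described above.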

Probably the best way to learn more about this metric is to download and experiment with an Excel test tracking spreadsheet with weighted failure included.  Feel free to work with this one a bit, and I think the concept will make more sense.

Many thanks
Stas Milev
ISTQB Certified Advanced Test Manager

You’re welcome. I hope this is useful.

Misusing Software Process Metrics as People Metrics

I had a query come in about a sample exam question in our Advanced Test Manager course.  Shukti asked me to confirm the answer to the following:

A given organization is using reviews for development work products like code, requirements and design specifications; test work products like test plans, quality risk analyses, and test design specifications; and, documentation and help screens. The review processes have been in place for two years and are delivering excellent financial, quality, and schedule benefits.

You are attending a management team meeting.  A senior executive raises the need to update the objectives by which the individual contributors are measured on their yearly performance evaluations.  He suggests using defect counts from review meetings.  He circulates a draft plan.  Under the plan, people will be rewarded based on the number of defects they find in reviews.  Further, people will be penalized if items they have produced incur too many defects during reviews.

Which of the following is a psychological factor affecting review success and failure that is likely to cause such an initiative to undermine the current success of the reviews?

  1. Scrutinize the document and not the author.
  2. Focus all participants on delivering high-quality items. [Correct]
  3. Try to find as many defects as possible.
  4. Assemble the right team of reviewers.

Here’s why that answer is correct.  Bonuses and other financial incentives/disincentives are based on the assumption that people are basically rational economic actors who will behave in a way to maximize their financial situation.  (Now, we can have a whole separate discussion about whether this assumption holds perfectly in all situations, but it really doesn’t need to be perfect, as long as more often than not it is true.)

So, in this scenario, what will the reviewers be focused on?  Not on delivering the highest quality items, but on finding the maximum number of bugs in each item and “claiming” those bugs for themselves (i.e., squabbling with other reviewers over who should get credit.)  What will the authors be focused on?  Arguing about every single bug that reviewers report, trying to insist that the document is perfect.  None of these behaviors is supportive of increased quality, and the authors’ behaviors are directly contrary to that goal.

In short, a really bad idea.  I wish I could say that I never saw organizations make this kind of mistake with process metrics (i.e., mistaking a software process metric for a people metric), but unfortunately it is all too common.

Effective, Efficient, and Elegant Software Testing

Reader Patricia Osorio asked another interesting question:

Hi Rex

In chapter 8 [of the Advanced syllabus], I was reading about Standards and Test process improvements. There you talk about Efficiency and Effectiveness. I have one definition about these terms, but what do I have in mind about these terms in order to be prepared for the advanced exam?

Thank you

Regards

Patricia Osorio Aristizabal

To answer, I’ll quote my explanation of the difference from Chapter 2 of Beautiful Testing:

Each stakeholder has a set of objectives and expectations related to testing.  They want these carried out effectively, efficiently, and elegantly.  What does that mean?

Effectiveness means satisfying these objectives and expectations.  Unfortunately, the objectives and expectations are not always clearly defined or articulated. So, to achieve effectiveness, testers must work with the stakeholder groups to determine their objectives and expectations.  We often see a wide range of objectives and expectations held by stakeholders for testers.  Sometimes stakeholders have unrealistic objectives and expectations.  You must know what people expect from you, and resolve unrealistic expectations, to achieve beautiful testing.

Efficiency means satisfying objectives and expectations in a way that maximizes the value received for the resources invested.  Different stakeholders have different views on invested resources, which might not include money.  For example, a business executive will often consider a corporate jet an efficient way to travel, because it maximizes her productive time and convenience.  A vacationing family will often choose out-of-the-way airports and circuitous routings, because it maximizes the money available to spend on the vacation itself.  You must find a way to maximize value—as defined by your stakeholders—within your resource constraints to achieve beautiful testing.

Elegance means achieving effectiveness and efficiency in a graceful, well-executed fashion.  You and your work should impress the stakeholders as fitting well with the overall project.  You should never appear surprised—or worse yet dumbfounded—by circumstances that stakeholders consider foreseeable.  Elegant testers exhibit what Ernest Hemingway called “grace under pressure,” and there’s certainly plenty of pressure involved in testing. You and your work should resonate as professional, experienced, and competent.  To achieve beautiful testing, you cannot simply create a superficial appearance of elegance—that is a con man’s job—but rather you prove yourself elegant over time in results, behavior, and demeanor.

In an upcoming newsletter, I’ll include this chapter as the featured article.

Measuring Software Test Processes (and Software Testers)

On the heels of the webinar last week, listeners have had lots of comments (all good) and some questions.  Here’s an interesting set of questions from listener Stephen Ho. I’ve interspersed my answers in his e-mail, with “RB:” in front to make it easier to follow:

Rex,

Thanks for such Webinar. This webinar did not talk about how to organize and build up a good testing Metrics.

RB:  There was a general discussion early on about how to go from an objective to a specific metric and specific targets for that metric.  Perhaps you were missing the part about implementing the metrics with specific tools? 

However, it provided some interesting points to measure the successful of a testing project, such as: BFE. Just for more realistic, how can we know that a testing metrics is good?

RB: One attribute of a good metric is that it is traceable back to some specific objective. That objective should relate to a process (e.g., finding defects is an objective for the test process), to a project (e.g., reaching 100% completion of all scheduled tests is an objective for many projects), or to a product (e.g., reaching 100% coverage of requirements with passing tests is an objective for some products).  Another important attribute is that the metric supports smart decision-making and, if necessary, guides corrective action.  Yet another important attribute is that the metric has a realistic target.

At here, I have another topic for your interest.

“What is effective & efficiency testing?”

RB:  We have to be more specific than this.  What are the objectives for testing, as you mean it here?  Once you have defined those objectives, you can then discuss effectively and efficiently meeting them.  For example, if you define finding defects as an objective, then you can use the DDP metric (discussed in the presentation) as a metric of effectiveness.  Cost of quality (which is discussed in various articles in the RBCS web site, such as this one) can serve as a metric of efficiency.

-how to narrow it down to know that our existing testing job is in effective & efficiency ways.

RB:  You might want to read the chapter I wrote (Chapter 2) on this topic in the book Beautiful Testing.  That’s a book worth reading, anyway, because there are a number of other good chapters in it.

-What are the right way to measure the performance of a QA?

RB:  I assume you’re talking about an individual tester here. If we can define specific objectives for the tester, then we can use the same method to define metrics.  Keep in mind the rule about objectives needing to be SMART.

-How can we know that a QA is in competence level?

RB: Check out my book, Managing the Testing Process, 3e, for a discussion about how to use skills inventories to manage the skills of your test team.

-How to increase the productivity of a QA?

RB: This question is too general, I’m afraid.  Productive at what?  I suggest that you define specific objectives for the test team, and then measure the current efficiency with which those objectives are achieved.  At that point, you can make realistic (and measurable) goals for improvement of productivity.

You may have existing webinar or article regarding this topic. If yes, I am thirsty to study your material. Would you direct me how to access to this information. I would definitely provide my feedback to you.

RB:  Follow this link for another article on metrics that you might find useful.

Thanks,

Stephen

RB: You’re welcome.

Quantifying Testing Effectiveness with the Defect Detection Percentage

After the test metrics webinar held yesterday–link to recorded webinar coming soon in the Digital Library–we had an attendee ask a good question by e-mail (info@rbcs-us.com).  Linda Li wrote to ask,

Hello Rex,

 I just attended your free webinar about test metrics, you mentioned :

DDP=Bugs Detected/Bugs Present.

 So I want to know how can I get ‘Bugs Present’?  what’s included in Bugs Present?  Thank you very much.

You delivered a great presentation, that is really help me much.

Thanks for the kind words about the presentation, Linda. I do hope it provides useful ideas. 

This metric, which is variously called defect detection percentage (DDP) or defect detection effectiveness (DDE), is mathematically defined as Linda mentions:

DDP = bugs found/bugs present

When we’re talking about testing at the end of the software development or maintenance process, we can say that:

bugs present = bugs found by testing + bugs subsequently found in production

So, to calculate DDP for testing, use this formula:

DDP = bugs found by testing/(bugs found by testing + bugs subsequently found in production)

In this equation, test bugs are the unique, true bugs found by the test team. This number excludes duplicates, non-problems, test and tester errors, and other spurious bug reports, but includes any bugs found but not fixed due to deferral or other management prioritization decisions. Production bugs are the unique, true bugs found by users or customers after release that were reported to technical support and for which a fix was released; in other words, bugs that represented real quality problems. Again, this number excludes duplicates, non-problems, customer, user, and configuration errors, and other spurious field problem reports, and excludes any bugs found but not fixed due to deferral or other management prioritization decisions. In this case, excluding duplicates means that production bugs do not include bugs found by users or customers that were previously detected by testers, developers, or other prerelease activities but were deferred or otherwise deprioritized, because that would be double-counting the bug. In other words, there is only one bug, no matter how many times it is found and reported.
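As a rough illustration of these counting rules, here is a minimal Python sketch. The Report record, its field names, and the sample data are all hypothetical; your bug tracking and technical-support systems will store this information differently, so treat it as one possible way to apply the rules rather than a prescribed implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    bug_id: str         # hypothetical ID of the underlying bug; duplicates share it
    source: str         # "test" or "production"
    valid: bool         # False for non-problems, tester/user/configuration errors, etc.
    fix_released: bool  # only consulted for production reports


def count_bugs(reports):
    """Return (test bugs, production bugs) under the counting rules above."""
    # Unique, true bugs found by testing, including deferred (unfixed) ones.
    test_ids = {r.bug_id for r in reports if r.source == "test" and r.valid}
    # Unique, true production bugs for which a fix was released,
    # minus anything testing had already found (no double-counting).
    prod_ids = {
        r.bug_id
        for r in reports
        if r.source == "production" and r.valid and r.fix_released
    } - test_ids
    return len(test_ids), len(prod_ids)


def ddp(test_bugs, production_bugs):
    """Defect detection percentage, expressed as a percentage."""
    total = test_bugs + production_bugs
    if total == 0:
        raise ValueError("DDP is undefined when no bugs have been found")
    return 100.0 * test_bugs / total


reports = [
    Report("B1", "test", True, True),
    Report("B1", "test", True, True),           # duplicate report of B1
    Report("B2", "test", True, False),          # deferred, still counts for testing
    Report("B3", "test", True, True),
    Report("B4", "test", False, False),         # tester error, excluded
    Report("B2", "production", True, True),     # deferred bug rediscovered by a customer
    Report("B5", "production", True, True),
    Report("B5", "production", True, True),     # duplicate report of B5
    Report("B6", "production", True, False),    # no fix released, excluded
    Report("B7", "production", False, False),   # configuration error, excluded
]

test_bugs, production_bugs = count_bugs(reports)
print(test_bugs, production_bugs, ddp(test_bugs, production_bugs))  # prints: 3 1 75.0
```

In this made-up data set, testing found three unique bugs and production added one new one, so DDP is 3/(3 + 1), or 75 percent.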

To calculate this metric, you need to have a bug tracking system for all the bugs found in testing (and for those using Agile methods, yes, I do mean bugs found during the sprints even if fixed during the sprints).  You also need a way to track bugs found in production. Most help-desk or technical-support organizations have such data, so it’s usually just a matter of figuring out how to sort and collate the information from the two (often) distinct databases. You also have to decide on a time window. That depends on how long it takes for your customers or users to find 80 percent or so of the bugs they will find over the entire post-release life cycle of the system. For consumer electronics, for example, the rate of customer encounters with new bugs (unrelated to new releases, patches, and so forth) in a release tends to fall off very close to zero after the first three to six months. Therefore, if you perform the calculation at three months and adjust the production bug count upward by some historical factor (say 10 to 20 percent), you should have a fairly accurate estimate of production bugs, and furthermore one for which you can, based on your historical metrics, predict the statistical accuracy if need be.
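To make the time-window adjustment concrete, here is a small Python sketch. The function name, the uplift parameter, and the bug counts are assumptions chosen purely for illustration; the right adjustment factor has to come from your own historical data.

```python
def projected_ddp(test_bugs, production_bugs_so_far, uplift):
    """Estimate DDP from an early production-bug count.

    uplift is the historical upward adjustment applied to the
    production-bug count, e.g. 0.10 for 10 percent (an assumed value).
    """
    if uplift < 0:
        raise ValueError("uplift must not be negative")
    # Project the production bugs expected over the whole post-release life cycle.
    projected_production = production_bugs_so_far * (1 + uplift)
    total = test_bugs + projected_production
    if total == 0:
        raise ValueError("DDP is undefined when no bugs have been found")
    return 100.0 * test_bugs / total


# Hypothetical counts: 180 test bugs, 20 production bugs at three months.
for uplift in (0.0, 0.10, 0.20):
    print(f"uplift {uplift:.0%}: DDP is about {projected_ddp(180, 20, uplift):.1f}%")
```

With these invented numbers, the unadjusted DDP is 90 percent, and the adjusted estimates come out at roughly 89.1 and 88.2 percent, which shows how little a 10 to 20 percent adjustment moves the result when production bugs are a small share of the total.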

Note that this is a measure of the test process’s effectiveness as a bug-finding filter. Finding bugs and giving the project team an opportunity to fix them before release is typically one of the major objectives for a test team, as I mentioned in my presentation.

Advanced Software Testing: Bug Isolation

Reader Gianni Pucciani has a good question about a question in the Advanced Software Testing: Volume 2 book:

I have another doubt for a question in Advanced Software Testing Vol. 2. It is about the first question in Chapter 7, Incident Management. The book says that the correct answer is C “Insufficient Isolation”. What does it mean? I had chosen B “Inadequate classification information”, because all the rest was not making sense to me. For B, I could justify it saying that more information could be added to the incident report, e.g the error message displayed by the application.

Here is the question from the book:

Assume you are a test manager working on a project to create a programmable thermostat for home use to control central heating, ventilation, and air conditioning (HVAC) systems. In addition to the normal HVAC control functions, the thermostat also has the ability to download data to a browser-based application that runs on PCs for further analysis.

During quality risk analysis, you identify compatibility problems between the browser-based application and the different PC configurations that can host that application as a quality risk item with a high level of likelihood.

Your test team is currently executing compatibility tests. Consider the following excerpt from the failure description of a compatibility bug report:

1. Connect the thermostat to a Windows Vista PC.

2. Start the thermostat analysis application on the PC. Application starts normally and recognizes connected thermostat.

3. Attempt to download the data from the thermostat.

4. Data does not download.

5. Attempt to download the data three times. Data will not download.

Based on this information alone, which of the following is a problem that exists with this bug report?

A. Lack of structured testing

B. Inadequate classification information

C. Insufficient isolation

D. Poorly documented steps to reproduce

The answer is “C” because we don’t see any evidence of the tester trying different scenarios (for example, other PC configurations) to see whether the data downloads properly under other conditions.  The testing is clearly well-structured and carefully thought out, and the steps to reproduce are well-described.  The classifications are not given, so we have no way of saying, based on this information alone, whether those classifications are correct.



 