software testing services
Rex Black Consulting Services | software testing experts providing consulting, outsourcing, and software training

Archive for the ‘software test management’ Category

Misusing Software Process Metrics as People Metrics

I had a query come in about a sample exam question in our Advanced Test Manager course.  Shukti asked me to confirm the answer to the following:

A given organization is using reviews for development work products like code, requirements and design specifications; test work products like test plans, quality risk analyses, and test design specifications; and, documentation and help screens. The review processes have been in place for two years and are delivering excellent financial, quality, and schedule benefits.

You are attending a management team meeting.  A senior executive raises the need to update the objectives by which the individual contributors are measured on their yearly performance evaluations.  He suggests using defect counts from review meetings.  He circulates a draft plan.  Under the plan, people will be rewarded based on the number of defects they find in reviews.  Further, people will be penalized if items they have produced incur too many defects during reviews.

Which of the following is a psychological factor affecting review success and failure that is likely to cause such an initiative to undermine the current success of the reviews?

  1. Scrutinize the document and not the author.
  2. Focus all participants on delivering high-quality items. [Correct]
  3. Try to find as many defects as possible.
  4. Assemble the right team of reviewers.

Here’s why that answer is correct.  Bonuses and other financial incentives/disincentives are based on the assumption that people are basically rational economic actors who will behave in a way to maximize their financial situation.  (Now, we can have a whole separate discussion about whether this assumption holds perfectly in all situations, but it really doesn’t need to be perfect, as long as more often than not it is true.)

So, in this scenario, what will the reviewers be focused on?  Not on delivering the highest quality items, but on finding the maximum number of bugs in each item and “claiming” those bugs for themselves (i.e., squabbling with other reviewers over who should get credit.)  What will the authors be focused on?  Arguing about every single bug that reviewers report, trying to insist that the document is perfect.  None of these behaviors is supportive of increased quality, and the authors’ behaviors are directly contrary to that goal.

In short, a really bad idea.  I wish I could say that I never saw organizations make this kind of mistake with process metrics (i.e., mistaking a software process metric for a people metric), but unfortunately it is all too common.

Independent Testing in an Agile World

My Agile Testing Opportunities webinar continues to stir up discussion.  A listener, Rana Zoghbi, commented:

Hello,

I have a small question concerning this presentation:

You have talked about the opportunities of agile in testing, but what about the pitfalls that a tester can encounter in an agile environment? I am pursuing the Professional tester course – Foundation level, and it mentions that usually an agile mode is hostile towards independent testing. Can you please elaborate further on this?

 Thanks,

Rana  

Thanks for the opportunity to clarify some points, Rana. First, in terms of pitfalls, yes, there are testing pitfalls associated with any lifecycle model, and Agile is no exception.  My presentation yesterday was focused on how Agile lifecycles offer testers certain opportunities (also present with any lifecycle model), but, if you want to hear about the pitfalls, you can listen to my previous webinar, Agile Testing Challenges.

Second, whether Agile teams are “hostile” to independent test teams is something that varies widely.  We have a number of clients that are using Agile methods and have independent test teams.  In a number of those situations, the independent test team is well-respected and well-established as a complement to the Agile approach.  The way in which the independent test team interacts with the Agile teams tends to vary, so I’d recommend that you check out my previous webinar on Test Organization Options for more details.

Historically, there certainly was some bad blood between Agile methodologists and professional testers. One early sign of trouble occurred when Kent Beck, the originator of Extreme Programming, gave a keynote at the now-defunct Quality Week conference in San Francisco in 1999.  In his talk, he was reported to have said that the entire concept of independent testing was going to fade away, since it would be made irrelevant by the Agile approach to creating software. 

Thirteen or so years later, we (the professional testers) are still here and still relevant.  There are still some dogmatists on the extreme fringes of the Agile world who reject the concept of professional, independent testers, sure.  However, my sense is that pragmatic software practitioners who are adopting Agile–and often adapting it in the process–see the value of independent test teams, staffed with professional testers, and providing testing services to their Agile teams.  I certainly see a lot of clients that are successfully doing so.

Bug, Fault, Defect, Failure, Error, Mistake: What’s It All Mean?

We have lots of terms in software testing, but we’re not always clear what they mean.  A common confusion is about the terms bug, defect, fault, failure, anomaly, incident, false positive, error, and mistake. Reader Rodrigo Cursino asks:

I read [Foundations of Software Testing]… You defined bug as a synonymous with fault or defect. If we exercise the part of software that contains that bug we’ll have a failure.

Now I’m reading your article “The Bug Reporting Processes” and I’m a little confused. Should the correct [title] be “The Failure Reporting Processes”. It’s because we report failures not bugs. The developers, based on the steps to reproduce the failure will be able to debug it and find the bug.

Rodrigo, you are right.  This is a case of common usage prevailing over terminological rigor, I suppose.  Many people talk about “bug reporting,” so I used that term in the article and in my book Critical Testing Processes.

If we want to be rigorous in our terminology, here’s the way to think about the sequence of events:

  • The programmer makes a mistake (also called an error).  This can be a misunderstanding of the internal state of the software, an oversight in terms of memory management, confusion about the proper way to calculate a value, etc.
  • The programmer introduces a bug (also called a defect) into the code.  This is the programmatical manifestation of the mistake.
  • The tester executes the part of the software that contains the bug.
  • If the test was properly designed to reveal the bug, the test can cause the buggy software to execute in such a way that the behavior of the software is not what the tester–who is closely observing the behavior–would expect.  This difference between expected behavior and actual behavior is called an anomaly.
  • The tester then investigates the anomaly to determine the exact failure. The failure may go beyond the obvious, immediately observable misbehaviors associated with the anomaly.  For example, data might have been corrupted, another process improperly terminated, etc.
  • The results of that investigation are a report, which is commonly referred to in the software business as a bug report, an incident report, a defect report, an issue, a problem report, or various other names. 
  • Whatever we call it, that report gets prioritized and (in some cases) ultimately routed to a programmer.  The programmer then debugs the program in order to repair the underlying bug.
  • Ideally, the bug fix (typically as part of a larger test release) comes back to the tester who reported the problem in the first place for a confirmation test.  If the confirmation test passes, the report can be closed as fixed.  If the confirmation test fails, the report should be re-opened.
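The tail end of that sequence (report opened, fix delivered, confirmation test, then closed or re-opened) can be sketched as a small state machine. This is only an illustration: the state names and the Report class are hypothetical, not taken from any particular bug tracking tool, and real workflows usually have more states (deferred, rejected as a false positive, and so on).

```python
from enum import Enum


class State(Enum):
    OPEN = "open"                    # reported, awaiting prioritization and repair
    FIX_DELIVERED = "fix delivered"  # fix has come back in a test release
    CLOSED = "closed"                # confirmation test passed
    REOPENED = "reopened"            # confirmation test failed


class Report:
    """Hypothetical bug (incident) report that tracks only its lifecycle state."""

    def __init__(self, summary):
        self.summary = summary
        self.state = State.OPEN

    def deliver_fix(self):
        # A fix can arrive for a report that is open or was re-opened.
        if self.state not in (State.OPEN, State.REOPENED):
            raise ValueError(f"cannot deliver a fix while {self.state.value}")
        self.state = State.FIX_DELIVERED

    def confirm(self, test_passed):
        # The confirmation test is run once the fix comes back to the tester.
        if self.state is not State.FIX_DELIVERED:
            raise ValueError("no fix is waiting for a confirmation test")
        self.state = State.CLOSED if test_passed else State.REOPENED


report = Report("Totals are wrong after currency conversion")
report.deliver_fix()
report.confirm(test_passed=False)  # fix did not work: report is re-opened
report.deliver_fix()
report.confirm(test_passed=True)   # fix confirmed: report is closed
print(report.state.value)
```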

Now, a few additional points are worth making:

  • In some cases, when a tester runs a test, she observes an anomaly, but not due to a failure.   This happens when the anomaly results from a bad test, bad test data, an improperly configured test environment, or simply a misunderstanding on the tester’s part.  This situation is referred to as a false positive.  Because some reports will inevitably be false positives–testers being human, they will also make mistakes–some people like to refer to them as incident reports.  An incident is some situation that requires further investigation, and in this case the programmer will investigate whether the incident was really caused by a failure.
  • Bugs (or defects) can be introduced into work products other than software.  For example, a business analyst can put a bug into a requirements specification.  Bugs in requirements specifications and design specifications (and code, for that matter) are ideally detected by reviews.  When a bug is detected by a review (or by static analysis), notice that the bug is what is actually detected; the software is not executing, so no failure occurs.
  • Some people use the word fault instead of bug or defect.  I don’t like that term, and avoid it.  When I talk about a fault, perhaps it sounds like I’m talking about something that is someone’s fault; i.e., implications of blame can arise.  Bugs happen for various reasons, and individual carelessness is not at the top of the list.  We should see bugs (and bug reports) as a way to understand the quality capability of the software process, not as a way to apportion blame.

So, back to Rodrigo’s bigger point:  Does it really matter what we call the report?  If you want to be 100% correct in your terminology, then incident report is probably the best name.  However, I think that failure report is fine, too.  I also think that, because these terms are so widely used, defect report and bug report are also acceptable.  However, when using the terms defect report or bug report, it’s important that people keep in mind the sequence of events laid out above, and that the report actually describes the symptom of the bug, not the bug itself.

Questions about Advanced Test Manager Course

We have some interesting questions about our Advanced Test Manager e-learning course, which also apply to the Advanced Software Testing: Volume 2 book (since it is the main source for the course), from Patricia Osorio Aristizabal. She asks the following questions, with my answers interspersed below.

Hi Rex

Could you please help me with more questions about chapter 3? I understand, this chapter is quite important for a test manager (for the certification exam and for his/her job) as you tell us:

RB:  Yes, this really is a critical chapter in the book, the course, and the syllabus.

 ·         Does the exam include questions about test effort estimation using test point analysis (TPA)? I mean, It is possible a question in which I have to calculate a test effort estimation using TPA? I would like if I have to do more practice about that (more exercises)

RB:  I make it a rule never to speculate about what specific questions will be on the exams.  I will say this: It would be very wise to make sure, before taking any Advanced exam, that you are ready to answer any question on any of the learning objectives defined for that module or for the Foundation syllabus.  That includes combinations of learning objectives (i.e., one question covering multiple learning objectives), cross-section questions (i.e., questions that cover material and learning objectives from two or more sections, including sections in the Foundation), and Foundation review questions (i.e., questions about any of the six chapters of the Foundation syllabus). 

For the specific technique you are concerned about, TPA, that is in Chapter 3 section 4.  There are two learning objectives:

·  (K3) Estimate the testing effort for a small sample system using a metrics based and an experience-based approach considering the factors that influence cost effort and duration

·  (K2) Understand and give examples to the factors listed in the syllabus which may lead to inaccuracies in estimates

TPA would fall under the first learning objective. 

·         In the slide 188 you talk about colors: green, may be red but it is in just in black

RB:  That is very strange if you are not seeing colors.  I can understand that the hardcopy would not show colors, but the web should.  I’d suggest checking your browser settings.  If that still doesn’t work, please send a screen shot of the offending slide to info@rbcs-us.com and we’ll open a defect report for it.

·         In the slides 190, the rolling closure period value. It is not clear for me, could you please tell me where I could find more information about this kind of charts? They are not familiar for me. It the same case for the graph in the slide 198

RB:  Rolling closure period is the average time from reporting to resolution for all defects resolved on and before the date shown.  The daily closure period is the average time from reporting to resolution for all defects resolved on the date shown.  This metric is described further in Managing the Testing Process, 3e.  You can download the bug analysis charts from the RBCS Basic Library and Digital Library to see how exactly this is calculated.
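To make the two definitions concrete, here is a minimal Python sketch that computes both averages from a list of (reported, resolved) date pairs. It is one reading of the definitions above, not the calculation used in the RBCS charts. The function name, the use of calendar days, and the sample dates are all assumptions made for illustration.

```python
from datetime import date


def closure_periods(defects):
    """Return {resolution_date: (daily_avg_days, rolling_avg_days)}.

    `defects` is a list of (reported, resolved) date pairs, one per
    resolved defect. Durations are counted in calendar days (an assumption).
    """
    by_day = {}
    for reported, resolved in defects:
        by_day.setdefault(resolved, []).append((resolved - reported).days)

    result = {}
    total_days = 0
    total_count = 0
    for day in sorted(by_day):
        durations = by_day[day]
        total_days += sum(durations)
        total_count += len(durations)
        daily = sum(durations) / len(durations)   # resolved on this date only
        rolling = total_days / total_count        # resolved on or before this date
        result[day] = (daily, rolling)
    return result


# Invented sample data: three defects resolved on two different dates.
sample = [
    (date(2011, 6, 1), date(2011, 6, 3)),  # 2 days to resolve
    (date(2011, 6, 2), date(2011, 6, 3)),  # 1 day to resolve
    (date(2011, 6, 1), date(2011, 6, 7)),  # 6 days to resolve
]
for day, (daily, rolling) in closure_periods(sample).items():
    print(day, f"daily={daily:.1f}", f"rolling={rolling:.1f}")
```

In this sample the daily figure jumps from 1.5 to 6.0 days on the second date, while the rolling figure moves only from 1.5 to 3.0. A single slow fix produces a spike in the daily figure and only a modest shift in the rolling one.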

·         In the slide 195, I found the following definitions, could you please help me finding the difference between each of them?

·         Plan Effort (or Planned Effort): The number of person-hours planned for this test. That might be more than the test effort, providing time for additional exploration in this area, or it might be less, meaning the tester is supposed to triage the conditions covered during testing.

·         Actual Effort: The number of person-hours ultimately expended on the test. This might not match the planned effort, particularly if the test failed and the tester needed to expend significant effort to isolate and report the problems observed.

RB:  Planned Effort is the number of person-hours of effort planned by the test manager for a test.  It should be based on an estimate by the tester who designed and implemented the test or by actual effort from the last time the test was run.  Planned Effort is an estimated target that the tester should try to stay within when executing the test.  Actual Effort is the actual amount of person-hours of effort that was spent by the tester who executed the test.  For various reasons, especially test failure, the Actual Effort might exceed the Planned Effort.

There are a lot of questions, sorry about that

Regards, Patricia Osorio Aristizabal

 No problem. I hope these answers were helpful.

Psychology and Politics In Software Test Management

Last week, as some of you will know, I gave a webinar on the psychological and political aspects of software test management.  You can find the recorded webinar here (http://www.rbcs-us.com/software-testing-resources/163) and the PDF version of the slides here (http://www.rbcs-us.com/images/documents/July-6-2011-Psychopolitics-of-Testing.pdf).

Patricia Ensworth wrote me to comment on the webinar:

Just wanted to let you know I thought your webinar was superb. As someone who has experienced many of the pressures and dilemmas you described, I found your analysis insightful and your advice on target. Well done!

With all due respect, there is one point you made about which I’d like to offer an alternative perspective. I completely agree with you that when test managers are labeled Quality Assurance Managers and put in charge of enforcing standard practices in a vague quest for product quality it’s a one-way ticket to nowhere (except maybe martyrdom). However, I have occasionally seen situations where to strengthen the organizational position of the testing group and to leverage the testers’ holistic perspective senior IT management has given the test manager other kinds of Quality Assurance responsibilities. For example, the so-called QA manager might be aligned with Compliance/Security initiatives mandated by regulators, or with Business Analysis projects to re-engineer processes or services. With strong enough support from matrixed senior managers, it can sometimes be a workable arrangement.

In any event, thanks for a thought-provoking, useful session.

I agree with Patricia’s point. I have indeed seen that work successfully.  Thanks for mentioning it, Patricia.

I’d be interested in hearing from other readers and listeners to the webinar.  What is your experience with psychology and politics in software test management?

Upgrade Reveals Regression Bug of the Week

I almost titled this blog post, “Why Maintenance Testing Is Good (and Why Pair Networks Should Do More of It),” but decided it was the confluence of events that was the real story here.

Last week was a week for RBCS to get hit with a bunch of costs of external failure, foisted on us by various vendors and service providers.  First, we had the problem with the dying audio driver during the Q&A session of our metrics webinar (see here for details).  Next, and literally just as I was finishing the blog post describing that reliability bug, the consequences of an apparently-not-well-tested upgrade by Pair Networks (hosts of our website) hit our entire site like a grenade. 

At first, rbcs-us.com was completely taken out by the upgrade.  With some heroic efforts by our web team, we got most of the site back up and running within a couple hours.  However, the store was damaged more severely, due to database hooks apparently. The store is still down, and won’t be back up for a few days.  The irony of this situation is that, while we intend to send store discount codes to people inconvenienced by the webinar audio crash, there’s no point doing that until the store is back up.  Compounding costs of external failure.

If you’re thinking, “Well, ho-hum, these things happen,” consider this thought experiment.  Imagine Ford Motor Company released an upgrade to engine control software and sent it to all of their dealers to install in cars during their next scheduled maintenance.  Imagine that this software caused thousands of cars to stop working completely, and to only be partly repairable with minor efforts, with complete restoration of function requiring a major service.  Who do you think would be paying for the service?  The customer?  Or Ford?

The answer, of course, is Ford.  Software and software services, however, remain among the few businesses that are allowed to transfer their costs of external failure, which is a cost-of-quality way of saying “deliver crap products and services to their customers without having to face the consequences.”  This isn’t the first time I’ve made this point on this blog (see here), and I’m guessing it won’t be the last. 

Anyway, a big Bronx cheer out to Pair Networks for their failure to properly test this update before putting it out there, and an even bigger Bronx cheer out to Pair Networks for their utter failure to even bother to contact us (or presumably anyone else) to express regret for the cost and inconvenience inflicted by their negligence.

P.S.  For those of you unfamiliar with the phrase, dictionary.com defines “Bronx cheer” as “a loud, abrasive, spluttering noise made with the lips and tongue to express contempt.”

Measuring Software Test Processes (and Software Testers)

On the heels of the webinar last week, listeners have had lots of comments (all good) and some questions.  Here’s an interesting set of questions from listener Stephen Ho. I’ve interspersed my answers in his e-mail, with “RB:” in front to make it easier to follow:

Rex,

Thanks for such Webinar. This webinar did not talk about how to organize and build up a good testing Metrics.

RB:  There was a general discussion early on about how to go from an objective to a specific metric and specific targets for that metric.  Perhaps you were missing the part about implementing the metrics with specific tools? 

However, it provided some interesting points to measure the successful of a testing project, such as: BFE. Just for more realistic, how can we know that a testing metrics is good?

RB: One attribute of a good metric is that it is traceable back to some specific objective. That objective should relate to a process (e.g., finding defects is an objective for the test process), to a project (e.g., reaching 100% completion of all scheduled tests is an objective for many projects), or to a product (e.g., reaching 100% coverage of requirements with passing tests is an objective for some products).  Another important attribute is that the metric supports smart decision-making and, if necessary, guides corrective action.  Yet another important attribute is that the metric has a realistic target.

At here, I have another topic for your interest.

“What is effective & efficiency testing?”

RB:  We have to be more specific than this.  What are the objectives for testing, as you mean it here?  Once you have defined those objectives, you can then discuss effectively and efficiently meeting them.  For example, if you define finding defects as an objective, then you can use the DDP metric (discussed in the presentation) as a metric of effectiveness.  Cost of quality (which is discussed in various articles in the RBCS web site, such as this one) can serve as a metric of efficiency.

-how to narrow it down to know that our existing testing job is in effective & efficiency ways.

RB:  You might want to read the chapter I wrote (Chapter 2) on this topic in the book Beautiful Software.  That’s a book worth reading, anyway, because there are a number of other good chapters in it.

-What are the right way to measure the performance of a QA?

RB:  I assume you’re talking about an individual tester here. If we can define specific objectives for the tester, then we can use the same method to define metrics.  Keep in mind the rule about objectives needing to be SMART.

-How can we know that a QA is in competence level?

RB: Check out my book, Managing the Testing Process, 3e, for a discussion about how to use skills inventories to manage the skills of your test team.

-How to increase the productivity of a QA?

RB: This question is too general, I’m afraid.  Productive at what?  I suggest that you define specific objectives for the test team, and then measure the current efficiency with which those objectives are achieved.  At that point, you can make realistic (and measurable) goals for improvement of productivity.

You may have existing webinar or article regarding this topic. If yes, I am thirsty to study your material. Would you direct me how to access to this information. I would definitely provide my feedback to you.

RB:  Follow this link for another article on metrics that you might find useful.

Thanks,

Stephen

RB: You’re welcome.

Quantifying Testing Effectiveness with the Defect Detection Percentage

After the test metrics webinar held yesterday–link to recorded webinar coming soon in the Digital Library–we had an attendee ask a good question by e-mail (mailto:info@rbcs-us.com).  Linda Li wrote to ask,

Hello Rex,

 I just attended your free webinar about test metrics, you mentioned :

DDP=Bugs Detected/Bugs Present.

 So I want to know how can I get ‘Bugs Present’?  what’s included in Bugs Present?  Thank you very much.

You delivered a great presentation, that is really help me much.

Thanks for the kind words about the presentation, Linda. I do hope it provides useful ideas. 

As for this metric, which is variously called defect detection percentage (DDP) or defect detection effectiveness (DDE), it is mathematically defined as Linda mentions:

DDP = bugs found/bugs present

When we’re talking about testing at the end of the software development or maintenance process, we can say that:

bugs present = bugs found by testing + bugs subsequently found in production

So, to calculate DDP for testing, use this formula:

DDP = bugs found by testing/(bugs found by testing + bugs subsequently found in production)

In this equation, test bugs are the unique, true bugs found by the test team. This number excludes duplicates, non-problems, test and tester errors, and other spurious bug reports, but includes any bugs found but not fixed due to deferral or other management prioritization decisions. Production bugs are the unique, true bugs found by users or customers after release that were reported to technical support and for which a fix was released; in other words, bugs that represented real quality problems. Again, this number excludes duplicates, non-problems, customer, user, and configuration errors, and other spurious field problem reports, and excludes any bugs found but not fixed due to deferral or other management prioritization decisions. In this case, excluding duplicates means that production bugs do not include bugs found by users or customers that were previously detected by testers, developers, or other prerelease activities but were deferred or otherwise deprioritized, because that would be double-counting the bug. In other words, there is only one bug, no matter how many times it is found and reported.
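The counting rules in that paragraph are easy to get wrong, so here is a minimal Python sketch of them. The record layout (dicts with the keys bug_id, is_true, and fixed) and the sample reports are hypothetical. A real calculation would pull these fields from your bug tracking and technical support databases. For simplicity, the sketch treats the test reports as the only source of prerelease detections.

```python
def ddp(test_reports, production_reports):
    """Defect detection percentage, returned as a fraction between 0 and 1.

    Each report is a dict with hypothetical keys:
      "bug_id"  - identifier of the underlying bug (duplicates share one id)
      "is_true" - False for non-problems and test, tester, user, or
                  configuration errors
      "fixed"   - production reports only: True if a fix was released
    """
    # Test bugs: unique, true bugs, including deferred (unfixed) ones.
    test_bugs = {r["bug_id"] for r in test_reports if r["is_true"]}

    # Production bugs: unique, true bugs for which a fix was released,
    # minus anything already found before release (no double-counting).
    production_bugs = {
        r["bug_id"]
        for r in production_reports
        if r["is_true"] and r["fixed"]
    } - test_bugs

    present = len(test_bugs) + len(production_bugs)
    if present == 0:
        raise ValueError("no bugs found in testing or production")
    return len(test_bugs) / present


# Invented sample reports.
test_reports = [
    {"bug_id": "B1", "is_true": True},
    {"bug_id": "B1", "is_true": True},   # duplicate of B1
    {"bug_id": "B2", "is_true": True},
    {"bug_id": "B3", "is_true": False},  # tester error
    {"bug_id": "B4", "is_true": True},   # found in testing, then deferred
]
production_reports = [
    {"bug_id": "B4", "is_true": True, "fixed": True},   # already found in testing
    {"bug_id": "B5", "is_true": True, "fixed": True},
    {"bug_id": "B6", "is_true": True, "fixed": False},  # never fixed: excluded
    {"bug_id": "B7", "is_true": False, "fixed": False}, # user error
]
print(ddp(test_reports, production_reports))  # 3 test bugs, 1 production bug
```

Here the test team is credited with B1, B2, and B4, and production contributes only B5, so the result is 3/(3 + 1) = 0.75.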

To calculate this metric, you need to have a bug tracking system for all the bugs found in testing (and for those using Agile methods, yes, I do mean bugs found during the sprints even if fixed during the sprints).  You also need a way to track bugs found in production. Most help-desk or technical-support organizations have such data, so it’s usually just a matter of figuring out how to sort and collate the information from the two (often) distinct databases. You also have to decide on a time window. That depends on how long it takes for your customers or users to find 80 percent or so of the bugs they will find over the entire post-release life cycle of the system. For consumer electronics, for example, the rate of customer encounters with new bugs (unrelated to new releases, patches, and so forth) in a release tends to fall off very close to zero after the first three to six months. Therefore, if you perform the calculation at three months and adjust the production bug count upward by some historical factor (say, 10 to 20 percent), you should have a fairly accurate estimate of production bugs, and furthermore one for which you can, based on your historical metrics, predict the statistical accuracy if need be.
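As a rough worked example of that adjustment, the following Python sketch scales the production bug count seen so far by a historical uplift factor before computing DDP. The function name, the 15 percent default, and the sample counts are assumptions chosen for illustration. Your own historical data should supply the factor.

```python
def estimated_ddp(test_bugs, production_bugs_so_far, uplift=0.15):
    """Estimate DDP before all production bugs have surfaced.

    `uplift` is the historical fraction by which the production bug count
    is expected to grow after the measurement date (assumed value: 15%).
    """
    estimated_production = production_bugs_so_far * (1 + uplift)
    return test_bugs / (test_bugs + estimated_production)


# Invented counts: 180 unique, true bugs found in testing, and 20 found
# in production during the first three months after release.
estimate = estimated_ddp(180, 20)
print(f"{estimate:.1%}")
```

With these numbers the unadjusted figure would be 180/200 = 90 percent. Scaling the 20 production bugs up to an estimated 23 lowers the estimate to 180/203, or just under 89 percent.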

Note that this is a measure of the test process’s effectiveness as a bug-finding filter. Finding bugs and giving the project team an opportunity to fix them before release is typically one of the major objectives for a test team, as I mentioned in my presentation.

What is Test Control?

I received an interesting question from a colleague in Malaysia, Dhiauddin Suffian.  He wrote:

Hi Rex, I have one simple question with regard to Fundamental Test Process. As we aware, the process involves Planning & Control, Analysis & Design, Implementation & Execution, Evaluating Exit Criteria & Reporting and Test Closure. My concern is on the Test Planning and Control, since it goes along the way of the whole process. I have no issue on the “Planning” portion. My question is directed to the “Control” part. What are “Control” activities involved in subsequent phases, i.e. “Control” activities that happen in Analysis & Design, Implementation & Execution, Evaluating Exit Criteria & Reporting and Test Closure, respectively. Thanks. Regards, -Din (CTFL, CTAL-TM)-

Test control can be thought of as the test management tasks required throughout the test process in order to keep the testing aligned with the software development process, the needs of the project, and the needs of the organization.  These tasks occur as needed, based on the judgement of the test manager or other members of the project team, and can also occur on a planned basis.

For example, we might plan to regularly check our risk analysis to see if we have discovered new risks, or uncovered information that tells us we should revise the risk levels for the existing risks.  As another example, if we find that a key piece of testing hardware will be available earlier than we expected, we might re-work our test execution schedule to accelerate the tests that use that hardware. 

Yet another example could be if we discovered, during test execution, that a key test staff member will be leaving the team.  In this case, if we did a thorough job during test planning, we might have identified a contingency plan for loss of a key staff member.  This is a classic project risk, after all, and a good manager should consider all such risks.  If we do have a contingency plan, triggering that contingency plan would be an act of test control.

Here’s an analogy:  Think of the test plan as a roadmap, with the starting location and the final destination clearly indicated.  This roadmap will help you drive to your chosen destination.  However, throughout your drive, you should plan to stop at traffic lights, mind your lane and speed, adapt to unexpected events (such as pedestrians stepping into a crosswalk), and even adaptively overcome errors in the roadmap (such as discovering a planned route is closed due to roadwork).  While a good test plan makes test control easier–just as a good roadmap makes driving easier–the smart manager remains ever alert to the possible need for test control.

Calculating Defect Detection Effectiveness

Reader Gianni Pucciani has another good question in his review of the Advanced Software Testing: Volume 2 book.  He asks:

My doubt is on question 4 of chapter 10 (People Skills and Team Composition).

The correct answer is B, and I had chosen B by excluding all the others which were for sure wrong.

However, my question is: how do you know that your team found 90% of defects by the time you need to give bonuses?

You know for sure the number of defects found prior to release, but how do you know the total number of defects if not after an agreed period (1 year?) of production use?

How would you implement this approach in a real life situation?

Here’s the question from the book:

You are a test manager in charge of system testing on a project to update a cruise-control module for a new model of a car. The goal of the cruise-control software update is to make the car more fuel efficient. Assume that management has granted you the time, people, and resources required for your test effort, based on your estimate. Which of the following is an example of a motivational technique for testers that will work properly and is based on the concept of adequate rewards as discussed in the Advanced syllabus?

A. Bonuses for the test team based on improving fuel efficiency by 20% or more

B. Bonuses for the test team based on detecting 90% of defects prior to release

C. Bonuses for individual testers based on finding the largest number of defects

D. Criticism of individual testers at team meetings when someone makes a mistake 

Gianni is of course right: the answer is B.  He is also right that there is some lag time after release required to calculate the defect detection effectiveness.  Defect detection effectiveness is calculated as

DDE = (defects detected)/(defects present).

In the case of the final stage of testing, you can calculate this as

DDE = (defects detected in testing)/(defects detected in testing + defects detected in production).

The bottom side of that equation (the denominator) is a reasonably good approximation for “defects present” if you wait long enough. 

So, how long is “long enough”?  Most of our clients find that they can determine the typical period of time in which 90% of the defects will be reported on a given release, usually through analysis of the field failure information.  In some organizations, this is as short as 30 days, though 90 days seems a more typical number.
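One way to find that period from field failure data is sketched below in Python. It assumes you have, for a past release, the number of days after release on which each unique field defect was first reported. The function name and the sample history are invented for illustration, not drawn from client data.

```python
def days_to_reach(report_days, percent=90):
    """Smallest number of days after release by which `percent` percent of
    a past release's field defects had been reported.

    `report_days` holds one entry per unique field defect: the number of
    days between release and the first report of that defect.
    """
    if not report_days:
        raise ValueError("need at least one field defect report")
    ordered = sorted(report_days)
    # Integer ceiling of percent% of the defect count.
    needed = (percent * len(ordered) + 99) // 100
    return ordered[max(needed, 1) - 1]


# Invented history for one past release: ten field defects.
history = [2, 5, 9, 14, 20, 31, 45, 60, 88, 210]
print(days_to_reach(history))
```

For this invented history, nine of the ten defects had been reported by day 88, so a 90-day window would capture about 90% of them. Repeating the analysis across several past releases shows how stable the window is before bonuses are tied to it.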



 