How to measure Defect Detection Efficiency/Rate?

Hi All! First post here so be gentle! :)

I’ve been working on some quality metrics and was advised to focus on defect containment numbers. One recommendation was to provide formulas for the “Defect Detection Efficiency” and “Defect Detection Rate” metrics.

The initial formulas I’ve come up with are below, and I would like to ask this group whether they are correct:

  1. Defect Detection Efficiency = [No. of defects found by test team / (No. of defects found by test team + No. of defects found by client)] * 100
    Target: The higher the percentage, the more effective the test team is, meaning the test team found more defects than the client.

  2. Defect Detection Rate = (No. of defects found by test team / No. of Test Cases executed) *100
    Target: The higher the percentage, the more effective the test team is, goal is to find 1 defect per test case.
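For reference, the two formulas as written above can be sketched in Python. This is only a minimal illustration: the function names and the sample numbers are made up for the example, and it says nothing about whether the metrics themselves are a good idea.

```python
def defect_detection_efficiency(found_by_test_team: int, found_by_client: int) -> float:
    """Formula 1: share of all known defects that the test team found, as a percentage."""
    total = found_by_test_team + found_by_client
    if total == 0:
        raise ValueError("no defects recorded, so the ratio is undefined")
    return found_by_test_team / total * 100


def defect_detection_rate(found_by_test_team: int, test_cases_executed: int) -> float:
    """Formula 2: defects found per executed test case, as a percentage."""
    if test_cases_executed == 0:
        raise ValueError("no test cases executed, so the ratio is undefined")
    return found_by_test_team / test_cases_executed * 100


# Made-up sample numbers, purely for illustration.
print(defect_detection_efficiency(45, 5))   # test team found 45, client found 5
print(defect_detection_rate(45, 300))       # 45 defects across 300 executed test cases
```

Note that both functions have to special-case a zero denominator, which already hints that the numbers are undefined in some perfectly ordinary situations.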

Honestly, I don’t feel confident in the formulas above, especially no. 2, as there is little chance of getting a high percentage if only a few defects are found across a large number of test cases.

Hope you can shed some light on whether I am on the correct path, or suggest a better formula to use.

Welcome. I suppose the first useful thing to know is how these metrics are going to be used, and what whoever asked for them actually wants to know. Also, what kind of working environment is this for? Do testers and devs work closely together?

The issue with metric 1 is that it encourages testers to raise lots of issues and disincentivizes working closely with devs to sort out issues before they get to test, which probably isn’t what you want.

Again, metric 2 encourages me to raise lots of bugs (regardless of value) and write fewer tests. I would also question why the goal is 1 bug per test.

First of all, welcome. I shall try to be as zen as I can.

You have come with a technical question. You shall leave with a moral one.

I know those algorithms to be wrong. Not because the calculations are incorrect, but because they describe the unknowable. Worse, they describe a value that will be weaponized against our fellow humans. To discover peace, we must understand not what we measure, but what measuring does to our world. When we shine light upon a particle to measure it we change what we came to measure, and so it is with metrics.

Here is a smattering of issues:

  • The number of defects your client finds is not necessarily the number they report
  • A defect report is not a defect
  • A defect is subjective. It is a problem in the eyes of a human. Which humans found the defect? Is it a defect?
  • Letting the test team know the algorithm will change their behaviour. They could report more defects that do not exist or are minor problems to increase their efficiency. This will distract them from finding problems that really matter to people that matter
  • One test case is not like another. They are non-fungible. Counting them is silly - it is like counting clouds; it is not the quantity of clouds that matter but our memory of their effects, good and bad, before they left us. Test cases will have different coverage. They will have different sizes. They will have different run times. They will be run differently depending on who is running them. They may be dependent on something else in the project. They may be written at different levels of quality. They are but leaves of paper on which we inexpertly and naively engrave the predicted actions and observations of the full rainbow of human emotion and action. They are abstractions of artifacts, nothing more. See through the abstraction and a test case evaporates, and we are left catching clouds with a butterfly net.
  • When you submit these metrics you will be creating a game. A game that you will encourage your test team to play, and win. If winning that game means doing poor work or upsetting them without cause then that is what will happen. You must be sure that the apparent innocence of such an action does not blindfold you to cruelty.

Let us turn to mathematics to find insight into the language of our creation. One defect is not like another. Let us say that the test team report 100 minor problems - perhaps typos, UI alignment issues, things that the test team thinks a problem but the client doesn’t, and so on. The client reports one incident where installing the software wipes their hard drive. Your defect detection efficiency is 100 / (100+1) * 100 ~= 99%. Do you feel that this number was helpful? Did it give insight into the quality of your team? What will you do with your 99%?
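The arithmetic above can be replayed as a small Python sketch. The `Defect` record and its fields are hypothetical, invented only to make the point that the formula sees counts and nothing else.

```python
from dataclasses import dataclass


@dataclass
class Defect:
    found_by: str      # "test team" or "client"
    description: str   # free text; the formula never looks at this


# 100 minor reports from the test team, 1 catastrophic report from the client.
defects = [Defect("test team", "minor issue (typo, UI alignment, ...)") for _ in range(100)]
defects.append(Defect("client", "installing the software wipes the hard drive"))

by_team = sum(1 for d in defects if d.found_by == "test team")
by_client = sum(1 for d in defects if d.found_by == "client")

# Defect detection efficiency as defined in the original question.
dde = by_team / (by_team + by_client) * 100
print(f"{dde:.1f}%")  # prints 99.0%
```

Swap the descriptions around however you like and the percentage does not move, which is exactly the problem.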

You are setting a goal to find 1 defect per case. Here is how to achieve your goal: have the programmers write in exactly one defect, then execute one test case you know will find that defect. You have a 100% defect detection rate. If your software actually has no defects, so that no client ever sees one and your client is very happy, then by this metric you should punish your test team, because they failed to find any problems. You could argue that your developers are worthy of your ire for failing to write any defects.
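A minimal Python sketch of that game, using the defect detection rate formula from the original question. The figure of 500 test cases for the defect-free product is an arbitrary number picked for illustration.

```python
def defect_detection_rate(defects_found: int, test_cases_executed: int) -> float:
    """Defects found per executed test case, as a percentage."""
    return defects_found / test_cases_executed * 100


# One planted defect, one test case written to hit it: a "perfect" score.
seeded_score = defect_detection_rate(1, 1)

# A genuinely defect-free product, tested with 500 cases (arbitrary number):
# the worst possible score.
clean_score = defect_detection_rate(0, 500)

print(seeded_score, clean_score)  # prints 100.0 0.0
```

The rigged run scores perfectly and the happy-client run scores zero, so the number rewards the wrong team.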

You must find a way to realise the harmony of your team. Humans are not built for measurement and judgement and serfdom under a game of numbers. They must work together to build what they can. Let us not measure so, let us explore and play! Let us dance in the spirit of our work as testers and ask questions of our software, our processes and ourselves.

Metrics bring worry. Perverse incentives, inexpert measurement, inaccurate judgement of our fellow companions on this earth, the crippling constraints of working under numbers that we have picked from the air and to which we build idols, looking down upon us like Gods on the mountain as we fear their wrath and permit them to play with us for their sport.

Be as a pebble in a stream, and let the worries of metrics wash away.

Take the human approach. Let your testers explore. Take the locks off the cages of test cases and allow them to roam, free range, across the software. Give them enough guidance to achieve the sorts of coverage that matter to you, and then let them express their humanity and intelligence to work with your programmers to not only find more important problems faster, but problems not covered by the attempted repetition of identical tasks, and to find problems before they are even coded or conceived of by design.

Is it the flag that moves? The wind that moves? Or the mind?

Wind, flag, mind moves,
The same understanding.
When the mouth opens
All are wrong.

Why focus on bugs?
The purpose of testing isn’t to find bugs. It’s to discover information.

You should measure the quality of your testing…

Here’s another way to look at it: If you and your team did some amazing testing, but didn’t find any bugs, the metrics you suggest for focusing on bugs would show a picture of failure… But you know that you did great testing and have built a picture of confidence that the software had high quality due to your good testing…
So why not focus on that with your measurements?

Doing this will encourage the testers to focus more on doing great testing, rather than being fixated on bug count and praying that the developers make mistakes so the testers don’t look “bad” by not finding bugs.

If you prefer a ‘prevention rather than cure’ approach then defect metrics are meaningless. If you clarified ambiguous requirements with business representatives and developers before they were written as bugs in code, then they don’t even get recorded. And the software quality is much greater. Of course developers can make coding mistakes (it’s one of the reasons why we test), but my experience is that many more bugs are caused by misunderstandings than by poorly constructed software.

For me, these metrics are a thing of the past and a measure used largely by testing services or consultancies by way of measuring something tangible to wave in front of the client. They do not reflect quality or happiness of the client with the final outcome.

Hi Rogel,

Thanks for posting to the forum. The formulas are similar to things I’ve seen other companies use to measure defects. One measurement I like using, if a company is particularly keen on getting defect metrics, is a “How much money did it cost to fix pre-production vs post-production” metric. Showing defects based on dollar amounts and the approximate time it took to resolve the defect, if it was resolved at all, are really good things to show. This allows numbers to be framed in something everyone can understand and also moves the conversation away from defect numbers, which can be a bit ambiguous depending on the defects being reported, or the defects the company really cares about.
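As a rough sketch of what such a cost comparison could look like in Python: everything here is an assumption made for illustration, including the `FixRecord` structure, the phase labels, the hourly rate and the sample hours. It also only covers defects that were actually fixed; unresolved ones would need separate handling.

```python
from dataclasses import dataclass

HOURLY_RATE = 80.0  # hypothetical blended cost per hour, in dollars


@dataclass
class FixRecord:
    phase: str           # "pre-production" or "post-production"
    hours_to_fix: float  # approximate effort spent resolving the defect


def cost_by_phase(records, hourly_rate=HOURLY_RATE):
    """Total approximate fix cost in dollars, grouped by phase."""
    totals = {}
    for record in records:
        cost = record.hours_to_fix * hourly_rate
        totals[record.phase] = totals.get(record.phase, 0.0) + cost
    return totals


# Made-up sample data.
records = [
    FixRecord("pre-production", 2.0),
    FixRecord("pre-production", 1.5),
    FixRecord("post-production", 12.0),
]
costs = cost_by_phase(records)
for phase, dollars in costs.items():
    print(f"{phase}: ${dollars:,.2f}")
```

Framing it this way puts the conversation on dollars and time rather than on raw defect counts.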

The narrative also moves to: How can we work to make defects cheaper to fix? This question can allow for conversations around staffing, tooling, and automation to take place.

Another thing you can do is give them the numbers but try to isolate the concern the numbers are trying to address. Do folks need better equipment? Better automation? Better pipelines? Are customer reports spiking for some reason? Can you do usability testing and get more customer feedback? - Those are some ideas/questions I’d ask besides only handing over data which might not give a clear picture of the issue.

Welcome.
I don’t mean to be unhelpful and hope my suggestions are more helpful than the advice you were originally given.
Quality metrics, as the previous replies mention, should never include defect counts, unless you want to challenge the inventiveness of your teams. Potentially your biggest problem is that whoever initially advised you to count defects thinks that they know about quality and doesn’t. Perhaps you could help them learn by suggesting they read ‘Perfect Software and Other Illusions’ and/or ‘Quality is Free’.

@kinofrost raises some good points but I think a lot of it comes from the assumption that having a metric means that your goal should be getting that metric to 100%. It shouldn’t be.

Let’s say you do have a way of measuring a defect detection rate (after taking into account some of the caveats and warnings from previous posts), and you’re consistently around 80%, then you should only endeavour to move that number up if your customer is unhappy with the number of defects they’re encountering. A rate of 10% might be ok if you’re catching the right things! If your customers or users keep saying “why is this software so buggy!?” then yeah, aiming to raise that number might be a way of measuring progress towards resolving that, but it would just be a proxy for customer sentiment that has to be checked against reality. Once defects in production are at an acceptable level, then it can serve as an early warning system if things start to lag again.

There’s little point in gaming the numbers if you don’t make the number the goal.

I disagree with that. It’s like saying if you prefer a prevention rather than cure approach, then measuring disease rates is meaningless. Of course it’s not; how do you know whether your prevention efforts are working?

I think I’d have to disagree here too, in part. You’d likely be doing a poor job if you said “yeah ok I didn’t find any bugs and the customers have been complaining every day since we pushed, but look at all this information I compiled!” “Information” is too vague for me, it has to be useful information. Like Maslow’s hierarchy of needs, we all want to be “self-actualized” testers, but we can only do that if we are addressing basic survival first. Maybe I would say “the higher purpose of testing isn’t to find bugs, but it is a prerequisite.”

On the other hand, I do like the idea of measuring the quality of your testing, but that is awfully similar to measuring the quality delivered to customers. I wonder if there’s anything you could label under the former that isn’t also the latter.

Surely you measure the quality of the product, not the testing effort? You could have 0.01% defect slippage rate but the one that got through was a howler (think o-rings on Challenger for example).

I see your point regarding measuring disease/bugs, but all it tells you is that the measure has improved/stayed the same/reduced. How do you correlate that change to anything specifically preventative?

I stopped producing these sorts of metrics for several clients and engaged them in the build/Test process much more. I believe they felt much more informed of the product quality than reading metrics.

But doesn’t that reflect the testing effort? What was missing from the testing effort that let that get through?

I agree that defect slippage rate (especially one that doesn’t take into account the difference between typos and explosions) does not stand alone as a good measure of “quality of testing”. I think that was @danashby’s point when saying finding zero bugs looks like failure if all you care about is counting defects found.

That makes sense. While I like metrics and find a lot of value in them, they get less useful (and more prone to misuse) the further away from the development team they get.

Well said @kinofrost! I was floored about a week back when a tech lead said we don’t have enough defects. I knew better than to ask what “enough” meant. The question betrays some opportunities for learning of advancements in testing over the last few years. In my opinion, insights like @kinofrost’s lead to the questioning of old practices and their benefit, and open the door to people working with people to deliver cool stuff.

Indeed, remove the shackles of process, the banality of scripted execution, the stigma of defects and let your testers rise with the information they learn to inform your teams! Go forth together in product construction and evaluation and deliver quality products at a good pace! In your collaboration, ye shall discover the nirvana of team contributions and accomplishments.

I’m 100% on board with collecting useful information in a reliable way as an informed human trained in metrology, statistics and the scientific method. Anyone who’s been involved with a double blind medical trial knows how incredibly difficult and expensive it is to collect even the simplest data without deliberately or accidentally skewing them or misinterpreting or over-stretching the results. We have to collect accurate data in a fair way and interpret them with cautious humility. Well-informed people who do it for a living struggle with it, especially if the Cochrane group and CEBM are to be believed. I understand that comparing bug rates and medical trials may seem a little dramatic, but sometimes these numbers are used to affect people’s lives and useless numbers can be counterproductive.

A metric is fine, if it means something and is accurate and has known causes and we understand the limitations (such as error margins) and we use it responsibly. Without that we could see the number increase and misattribute it to our actions when it was caused by something else, and then we’re just a pigeon in a Skinner box twirling in a circle for the food pellet. It’s very simple to invent a number and then give it some power or introduce it into our process without due care - but that is superstition, not statistics. Whatever the number is, or if it goes up or goes down, all we have is attributable correlation. I think if that spurs us onto useful action we might defend the metric, but only in the same way that we might defend a doctor lying to a patient to get them to exercise. We must remain vigilant to the idea that the number actually represents the name we’ve given to it, just like we treat the name of an automated check - as a somewhat helpful lie. “Test_Login_Failure” - yeah, maybe.

I don’t know how you’d invent a defect detection rate that is sensible. We must discard the idea that bugs are bug reports and that either is fungible. When we know the variance in these artifacts that represent subjective ideas we can no longer count them with a straight face. In the imaginary world where defect detection rate is possible to measure objectively and accurately enough for real world use then it would seem to be, essentially (and as permissively as I can describe), a measure of how much perceived badness we find per unit time. So it’s still dependent on how much time it takes to look, the size of the team, the nature of their process, if they’re working on something new or otherwise of high risk (third party systems, for example), the testability of the product, the nature of the design, the skills and teamwork and knowledge of the programmers, what’s considered a bug, if bugs get fixed, how tired people are… it’s not really pointing to anything specific and it relies on many things that are changing. In your example the indicator of a buggy product is an unhappy customer - so the number didn’t indicate a problem, it was communication from the client. Working on getting a less-buggy product is a matter of tackling the problems the number might represent, and success can be the same as the indicator: a less-buggy product in the eyes of the client. The number didn’t help me to do any of that. In fact, I’d say that if we find no defects and the client is complaining we probably have a communication issue with the client that goes a lot deeper than testers finding bugs.

Moreover, the closer the team works, and the tighter the feedback loop, and the faster things get done and investigated and repaired, the less they tend to be written down and the harder it is to collect this information. The ideal situation is to find bugs before they’re in trunk, and when a tester pairs with a programmer and the problems are resolved quickly without reports they become invisible.

Also I’d say that finding bugs is discovering information of a specific sort. I’m going to guess that Dan means “The purpose of testing isn’t just to find bugs. It’s to discover information.” I’d say that finding fault, in testing, is a byproduct of investigation. As we close the epistemic risk gap we also happen to discover things that may be problems to someone that matters that may need attention.

Again, I’m happy with a good metric that is reliable enough for real-world use (not just perceived to be) and that has enough value to offset its cost. It’s a rare find, in my experience.

In our imaginary world, I don’t see anything wrong with it being dependent on many things. That just means many ways you might endeavour to change it, and many leads to look more carefully at what’s driving it.

The trouble with success being the same as the indicator — although ideally true — is that it depends on the assumption that the client’s opinion isn’t also subjective. You might deliver a less buggy product but they’re still just as unhappy with it because they’re now unhappy with other things (the colour scheme, a new competitor with better features, that jerk you hired in sales, whatever), or because they’ve adjusted their own expectations. It’s the same as feeling just as poor no matter how much you make because your lifestyle inflates to match your salary, or thinking that crime is so terrible now even though it’s actually gone down relative to previous decades. Qualitative and quantitative metrics both have caveats, but they can help keep each other grounded.

The other aspect I’ll point out is that it’s ok for metrics to have limited lifespan. I can definitely imagine being in an environment where some flavour of defect detection rate is easily measurable and the effort to decrease it ends up making it impossible to measure any more. (Yet another reason not to let metrics like these float too high up the management chain, lest they get addicted to it.)
