I want to implement some test metrics to measure and monitor the quality of the system and productivity. But I haven’t done this before. Could anyone tell me what test metrics your company uses and how you use them, please?
I’m about to start looking at this as well, Bonnie. My starting point is going to be to measure defects found against releases as a measure of our test strategy. I plan to find a way to determine whether we should have captured those defects in our testing (e.g. would it have cost too much, etc.).
We track issues found in production and categorize them by severity. They are also categorized by the group that owns and maintains the application. In this manner, individual groups review the information and decide the direction for their teams.
I think this, more than anything, demonstrates that quality is a team sport and NOT the responsibility of any one individual or team. Testers, engage your teams to write clear and testable acceptance criteria, advocate for testability, collaborate on creating and evaluating great products! With a focus like that, perhaps ground level metrics are not needed.
When you measure defects found against releases, do you measure only the newly introduced bugs, or all bugs (existing/known bugs, design bugs, and introduced bugs)?
In my case, I can see that the number of bugs raised against each release has dropped significantly. There are still bugs coming in after each release, but most of them are existing/known bugs and design bugs.
I categorise bugs by grouping them per module and severity. But I don’t know how to measure them. I can see the number of bugs raised has dropped with each release.
I understand quality is a team sport. Unfortunately, I am the only tester in my company, and the developers aren’t doing any testing. I constantly put new testing strategies in place and try to improve the product quality. Therefore, I want to create some metrics and monitor whether the strategy is working or not.
Sorry for the delay in responding! Great questions!
I categorise bugs by grouping them per module and severity.
If I understand correctly, bugs are grouped by module and sorted by severity. Perhaps like this:
            Most Severe   Severe   Least Severe
Module A         4           5           2
Module B         2           8           0
Module C         1           4           4
If this is correct, it seems like a first draft of your measurement. I would conclude that the team on Module A may have opportunities for improvement. An analysis of where and how the bugs occur could be good feedback. Have you taken the BBST course called Bug Advocacy? It was my favorite of the four and immediately applicable.
The team on Module B is doing better and may have lessons for the Module A team. Module C has many little bugs which may mean little details are missed during product definition, construction, or evaluation.
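If it helps to produce that kind of table automatically, here is a minimal Python sketch (purely illustrative; the "module" and "severity" field names are assumptions, so substitute whatever your bug tracker actually exports):

```python
from collections import Counter

# Hypothetical bug records, e.g. exported from your tracker as CSV/JSON.
# The "module" and "severity" field names are assumptions.
bugs = [
    {"module": "Module A", "severity": "Most Severe"},
    {"module": "Module A", "severity": "Severe"},
    {"module": "Module B", "severity": "Least Severe"},
    # ... the rest of your export
]

# Count bugs per (module, severity) pair.
counts = Counter((b["module"], b["severity"]) for b in bugs)

severities = ["Most Severe", "Severe", "Least Severe"]
modules = sorted({b["module"] for b in bugs})

# Print a simple module-by-severity table like the one above.
print(f"{'':12}" + "".join(f"{s:>14}" for s in severities))
for m in modules:
    row = "".join(f"{counts.get((m, s), 0):>14}" for s in severities)
    print(f"{m:12}" + row)
```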
I reviewed the document in your post. Frankly, I’ve never been a big believer in many of those metrics. Many can be gamed. You might search the Club for other posts on metrics. I recall a healthy discussion a few months back.
Being the only tester is certainly a challenge, and you are asking the right questions. Frequent changes in strategy may not present a consistent message from you to your team members. I wonder if there might be a list of items you would like to see improve. From that list, prioritize them and write down your suggestions. Get feedback from the Club or from your peers, and then pitch your improvements to your manager or managers. This is easier to write than to execute, and it may take some time. Perhaps you could, over time, maintain a table as shown above to help determine effectiveness.
I have found that testing and test leadership requires a lot of experiments. Starting somewhere, even a small start, is better than not starting at all. Choose something small, get feedback on it, and build on small successes.
Why do you want test metrics? This is the most important question to answer before you start.
Metrics are a complicated and sometimes dangerous game. Quality is subjective - my quality is not your quality. With productivity, value often does not match the number of artifacts made. Systems can be gamed by teams. Some metrics reflect the value of teams - and when the metrics are wrong or misleading, you are attributing human value to some number that someone made up.
The suggestions you have so far are perfectly sound, if approached reasonably. Knowing what sort of bugs are being reported by customers, or found after release, categorised by some measure of severity or by what part of the product they affect, is a good thing to know. You want to know what your customers value, what parts of your product are considered buggy, and what testing you should be doing that you’re not doing. This can go wrong in several ways, such as “severity” being gamed - as a customer reporting a bug you may call everything severity 1 in the hope that it gets looked at before other people’s reports. You might find one problem and over-react, introducing expensive processes to prevent it happening again when it wasn’t likely in the first place (sometimes bad things just happen and you must remain cost-effective).
When facts really matter we use secure, multiple, randomised, double-blind, peer-reviewed, controlled trials subject to statistical proof and of sufficient sample size. That’s the effort we go to to ensure we don’t game the system or affect the outcome or find some way to lie to ourselves. I’m not saying “don’t measure”, but I am saying “measure with humility and humanity”. You are measuring numbers, not human worth. You are finding suggestions for further investigations, not crime scene evidence. You are measuring inexpertly, so leave room for the error bars.
So know why you want to measure, and answer that question first. Maybe you don’t need hard metrics to achieve your true goal. You’ll be much happier if you don’t; metrics, and especially communication of their findings, are really hard work.
The information reminded me of what I found in the book “How to Measure Anything”. The author (Hubbard) has the same cautions and some cool suggestions for putting some objectivity into measurements.
When I started my new position as Test Manager, I was faced with the same question: what do I want to know, and about which items or processes? So what should I measure?
I started with:
How many defects exist for which component/module in a certain release? (quality of the system)
Tools like Jira allow you to report how many defects were opened and how many were fixed or closed in a certain period of time (see the sketch after this list). In my opinion, there should be regular defect fixes, not only implementation of new fancy features. (quality of the system?)
How many Requirements and how many Test Cases exist? - Just to get a feeling for it, not as a regular metric.
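As a rough, hedged example of the Jira reporting mentioned above, here is a Python sketch against Jira’s REST search endpoint. The base URL, project key, and auth are placeholders, and the JQL should be checked against your own instance and workflow:

```python
import requests

# Sketch only: URL, project key and credentials are placeholders.
JIRA_URL = "https://your-jira.example.com"
AUTH = ("user", "api-token")  # or whatever auth your instance uses

def count(jql: str) -> int:
    """Return how many issues match a JQL query (maxResults=0 fetches only the total)."""
    resp = requests.get(
        f"{JIRA_URL}/rest/api/2/search",
        params={"jql": jql, "maxResults": 0},
        auth=AUTH,
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["total"]

# Bugs opened vs. fixed/closed last month (field and function names are standard JQL,
# but verify them against your own Jira version and workflow).
opened = count('project = ABC AND issuetype = Bug AND created >= startOfMonth(-1) AND created < startOfMonth()')
closed = count('project = ABC AND issuetype = Bug AND resolved >= startOfMonth(-1) AND resolved < startOfMonth()')
print(f"Opened last month: {opened}, fixed/closed last month: {closed}")
```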
It could be nice to know:
How many Test Cases were executed and how many defects (of which severity) did they find? - Did we use the right TCs? (quality of the system & productivity)
How many TCs exist for which test level (unit test, component test, system test; testing pyramid)? (productivity and quality of test strategy)
How many automated TCs run regularly? (productivity) How many of them break regularly and have to be fixed? (productivity)
Ratio of manual TCs to automated TCs (maybe per test level); see the sketch after this list. (productivity)
How many TCs per requirement?
How long does it take to execute a TC? - We have TCs taking one minute (if the SUT is already installed). Others take 20 minutes or more. Useful information for test effort estimations.
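A small sketch of how some of these could be tallied from a test case export (the "level", "automated", and "duration_min" field names are assumptions; map them to whatever your test management tool provides):

```python
from collections import defaultdict

# Hypothetical test case export; field names are assumptions.
test_cases = [
    {"id": "TC-1", "level": "unit", "automated": True, "duration_min": 1},
    {"id": "TC-2", "level": "system", "automated": False, "duration_min": 20},
    {"id": "TC-3", "level": "system", "automated": True, "duration_min": 5},
    # ... the rest of your export
]

# Group test cases by test level (unit, component, system, ...).
per_level = defaultdict(list)
for tc in test_cases:
    per_level[tc["level"]].append(tc)

# For each level: count of TCs, automation ratio, and average duration.
for level, tcs in per_level.items():
    automated = sum(1 for tc in tcs if tc["automated"])
    ratio = automated / len(tcs)
    avg_duration = sum(tc["duration_min"] for tc in tcs) / len(tcs)
    print(f"{level}: {len(tcs)} TCs, {ratio:.0%} automated, avg {avg_duration:.1f} min per TC")
```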
There are a lot of nice metrics. And as devtotest and kinofrost mentioned: the metrics to be used depend on the questions you want answered.
It also depends on the available information. Maybe you need to enhance documentation first.
Start small and add more metrics if needed and reasonable.
Bonnie: I would like to read about your experiences so far.