How should we measure Test Coverage?

I’m a stats nut. I love putting together objective measures that engineers buy into, measures that show we’re improving or highlight areas where we could do better.
However, there is one stat that has eluded me throughout my career… regression test coverage. The restriction is always the data I can get my hands on, and most of the time the result ends up more quantitative than qualitative. So, how do you measure your regression test coverage?

3 Likes

When it’s the more generic testing coverage rather than the very specific regression risk coverage, I like to talk about risk coverage.

Even something at traffic-light level can help: a taxonomy of risks combined with lists of features, views, and user flows often works. The light colour is often a guided qualitative opinion.

Regression risk coverage sort of follows that, but it’s usually scripted and automated, so there’s more data on what is covered, and an extra layer of scenarios can make sense.

In reality, we normally decide up front what we want our regression coverage to be and then measure against that desired coverage. And yes, doing this you can hit 100% of the agreed desired coverage, with moving goalposts accepted as and when you decide to expand or reduce the desired scope.
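As a rough sketch of what measuring against agreed desired coverage can look like (the scenario names and tagging scheme here are invented for illustration):

```python
# Hedged sketch: "desired" is whatever regression scope the team
# agreed on; "covered" would be derived from tags on the automated
# suite. All names are made up.
desired = {"login", "checkout", "refund", "search", "profile-edit"}
covered = {"login", "checkout", "search"}

coverage = len(desired & covered) / len(desired)
print(f"Regression coverage vs agreed scope: {coverage:.0%}")  # 60%
print(f"Gaps: {sorted(desired - covered)}")
```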

As for the other idea, covering regression risk in its entirety and measuring coverage against that entirety: I’ve never been able to establish what that entirety is in the first place, so I’ve never found much value in it.

One flag: general coverage targets should also consider how and where in the stack that coverage is achieved. I’ve seen teams boast about high coverage at the UI layer whilst blind to the massive waste, inefficiency, and maintenance cost, and to the fact that the risk could often be covered far better elsewhere in the stack.

4 Likes

The way I know it, coverage is the number of requirements covered by tests. If you have 100% test coverage, you have covered all your requirements with tests (whether or not that is actually useful). It can also mean the number of functions (in lieu of requirements) covered by unit tests.
What I’ve never heard of is regression test coverage. But going from the examples above, you would need a requirement in some form, e.g. the number of tests rated critical, failed, or simply important to the client vs. the number of tests that you actually ran?

2 Likes

This is a complicated topic. I generally talk about coverage as two things: code vs test coverage.

Most tools measure code coverage: which lines of code and branches were hit during execution.

Test coverage is what percentage of your requirements plus parameterized scenarios is exercised. As test scope (how much implementation code is being called) increases, test coverage becomes harder to achieve.
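To illustrate the scenario counting (a hedged pytest sketch; the requirement and all names are invented):

```python
import pytest

# Hypothetical implementation under test.
def discount_for(tier: str) -> float:
    if tier == "gold":
        return 0.10
    if tier == "silver":
        return 0.05
    return 0.0

# One requirement ("discount depends on customer tier") expands into
# three parameterized scenarios. Test coverage counts all three;
# exercising only one of them leaves the requirement under-covered.
@pytest.mark.parametrize(
    "tier, expected",
    [("bronze", 0.0), ("silver", 0.05), ("gold", 0.10)],
)
def test_discount_for_tier(tier: str, expected: float) -> None:
    assert discount_for(tier) == expected
```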

I think this runs counter to how a lot of folks talk about coverage, as many devs want to write higher-level tests thinking they’re getting higher coverage. All those possible logic branch paths must still be tested, though! Not writing those tests is missing test coverage.

Another challenge is that test cases don’t map directly to lines and branches of code. Encoding logic into maps/dictionaries is a very common pattern in interpreted languages, and code coverage is of little use there.
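A minimal sketch of that blind spot (invented example, not from a real codebase):

```python
# Business rules encoded as data rather than branches: the lookup
# is effectively a single executable line.
SHIPPING_RATES = {
    "standard": 4.99,
    "express": 9.99,
    "overnight": 24.99,  # suppose this entry is wrong
}

def shipping_rate(method: str) -> float:
    return SHIPPING_RATES[method]

# This one test executes every line of the module, so line coverage
# reports 100% -- yet "express" and "overnight" are never checked.
# The missing test coverage is invisible to code coverage.
def test_standard_rate() -> None:
    assert shipping_rate("standard") == 4.99
```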

Of course, code coverage says absolutely nothing about the quality of the test itself other than what implementation code it called. Tests with no assertions will produce the same code coverage as tests with assertions. Tests can have bad assertions. Tests can be wrong because our knowledge of the desired behavior is wrong.

There are tools to help here. One of them is mutation testing, but even that’s not foolproof. I wrote about an example where it missed some test coverage but reported 100% code coverage with all mutants “killed”: @testingrequired.com on Bluesky

This is also why you should have tests with different scopes. Unit, integration, and end-to-end are just points on a spectrum of test scopes. Bigger test scopes identify when issues happen. Smaller test scopes identify where issues are happening. Tests are like slices of Swiss cheese: you have to layer them together to cover the holes. Every sandwich will be different.

I’ll finish with this thought on code coverage. It might seem counterintuitive, but as soon as you define a minimum code coverage threshold, code quality will start to drop. The requirement changes how you implement code. As soon as there’s any kind of gamification, you’re writing tests to pass the coverage requirement, even if you’re not aware of it. It’s like how people walk in big circles without a map or compass.

Code and test coverage are topics very near to my heart. I didn’t mean to drop a short talk on it :sweat_smile: :face_with_spiral_eyes: I’d love to hear others’ thoughts, please!

3 Likes

@ghawkes First, it’s important to know what 100% test coverage would look like…

Shallow testing aims to find every easy bug, providing coverage in specific product areas. It’s quick, inexpensive, and helps developers make rapid progress by finding bugs early.

Deep testing maximizes the chance of finding every elusive bug that matters. It is more expensive and time-consuming than shallow testing. Deep testing is valuable when the effects of a feature aren’t well known, when there is substantial risk, or when shallow testing isn’t enough.

The decision to use shallow or deep testing depends on the context and the risk gap…

It’s a tradeoff.

You can read more about it here:

3 Likes

Loved the short talk :grin: I think the “spectrum of test scopes” is definitely a concept I’ll pursue a little deeper with the wider team :metal:

1 Like

Thanks for all your responses. It’s all feeding my next steps to see what I can do to get a better read on the situation. My specific problem right now may be “regression testing”, but I really appreciate the responses casting a wider net. :+1:

2 Likes

I think this is a good question, but I do have a question for you, @ghawkes: can you define what you mean by regression?

Because unit tests and integration tests are a form of regression testing, and we can use tools like SonarQube, for example.

However, if you’re wanting E2E coverage reports, it can be tough to quantify them as “we’ve done 70% regression coverage”. I could see myself pushing back on that for a couple of reasons. First of all, what value is this going to bring us? If we have 100% coverage, what does that actually mean? What actionable item can come from it being at 90% or 50%, and how do we fix it?

I think a way to approach this is to let the business partner decide what needs to be covered. Ask them what the critical user journeys (CUJs) are and what is or isn’t important to cover. They hopefully know their product/business well enough to give you a list of things that should be covered. Then, from that list of CUJs, we can say “we’re xx% covered”.
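A hedged sketch of what that could look like in practice (the CUJs and test names are made up):

```python
# The business supplies the CUJ list; the team maintains the mapping
# from each CUJ to the automated tests that exercise it.
cuj_to_tests = {
    "sign up and verify email": ["test_signup", "test_email_verify"],
    "add to basket and pay": ["test_checkout_happy_path"],
    "cancel an order": [],  # agreed CUJ, no regression test yet
}

covered = [cuj for cuj, tests in cuj_to_tests.items() if tests]
print(f"{len(covered)}/{len(cuj_to_tests)} CUJs covered "
      f"({len(covered) / len(cuj_to_tests):.0%})")
for cuj, tests in cuj_to_tests.items():
    if not tests:
        print(f"Actionable gap: no test yet for '{cuj}'")
```

This also gives you something actionable: each uncovered CUJ is a concrete gap to close, rather than an abstract percentage.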

2 Likes

My view is that it is always hard to get simple measurements for something complex. And when you do set up simple measurements, there is a big risk of those measurements having negative effects instead, because people tend to focus on what they are being measured on.

If you are measured on the number of test cases you execute, then you can make sure that you only execute short/small test cases so that you can execute as many as possible, or even restructure your test cases so you break up big test cases into many small ones. If you are measured on bugs reported, then you can start reporting even the smallest “bugs” that you know will not add any value, and so on.

That is why I think that while you can have many quantitative measures, in the end, you need to have a final qualitative assessment of the coverage. I wrote an article about it a while back:

Best regards,

Johan

1 Like