Code coverage. Is it worth measuring at all?

I’ve done all sorts of test automation - integration, unit and end to end - and have at one point or another generated code coverage reports for all of them.

With unit tests it is often quite easy: one switch can give you detailed reports and a % number. This often functions as some sort of target. You’re not supposed to hit 100%. You’re supposed to hit 80%. Why 80? Finger in the air.

On teams where we have done this, somebody often had the bright idea of putting in a quality gate: if your pull request slid below the magic “80%” mark, it would fail until you checked in a test. This is not uncommon, I think.

This had a curious effect. The devs who had just completed their task would start writing two new types of test:

  • Does the bare minimum to run the new code without asserting anything.

  • Asserts that the result of the calculation is 13443234.821. Why 13443234.821? It’s the number that came out of the code when it was run with a bunch of arbitrary numbers. Is it correct? No clue. Was the code coverage threshold met? Absolutely. Was it better than no test at all? Eh, probably not.
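A minimal sketch of those two kinds of test, assuming Python's `unittest`. The function `order_total` and every number in it are made up for illustration; the pinned value is simply what this code returns, which is exactly the problem being described.

```python
import unittest


def order_total(unit_price_cents: int, quantity: int, discount_percent: int) -> int:
    """Hypothetical code under test: order total in cents after a discount."""
    subtotal = unit_price_cents * quantity
    if discount_percent:
        subtotal -= subtotal * discount_percent // 100
    return subtotal


class CoverageDrivenTests(unittest.TestCase):
    def test_runs_the_code_without_asserting(self):
        # Type 1: executes both branches, checks nothing.
        order_total(1999, 3, 10)
        order_total(500, 1, 0)

    def test_pins_whatever_number_came_out(self):
        # Type 2: 5398 was copied from a run, not derived from any requirement.
        # Nobody knows whether rounding the discount down is the intended rule.
        self.assertEqual(order_total(1999, 3, 10), 5398)
```

Both tests pass and every line of `order_total` is executed, so a coverage gate would be satisfied. With coverage.py this could be checked with `coverage run -m unittest` followed by `coverage report`, assuming that tool is what the team uses.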

The % number also didn’t really tell us anything about how fragile the code base was. Worse, once it was used as a target its value as a measurement dropped even lower - thanks to Goodhart’s law and these “mimetic” tests.

In another situation I was writing and running a mountain of end to end tests on an app that was riddled with technical debt. Somebody had the idea to generate code coverage reports on it.

I thought that this was not a bad idea because it could tell us which areas were missing test coverage and hence where the bugs might be. We could then use it to write test scenarios.

The results were interesting. It told us that about 50% of the code base was not covered. When I looked at the incoming bugs and the bugs that were recently fixed, though, and which areas of the code base they were in, I discovered something odd. About 90% were in the half of the code that was already covered by tests.
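That kind of cross-check can be sketched in a few lines. This is only an illustration of the idea, not the analysis that was actually run: the function name, the file paths and the bug list are all hypothetical, and it assumes each bug can be attributed to a single file.

```python
from __future__ import annotations

from collections import Counter


def bug_share_by_coverage(covered_files: set[str], bug_files: list[str]) -> dict[str, float]:
    """Return the fraction of bugs that landed in covered vs uncovered files."""
    if not bug_files:
        return {"covered": 0.0, "uncovered": 0.0}
    counts = Counter(
        "covered" if path in covered_files else "uncovered" for path in bug_files
    )
    return {key: counts[key] / len(bug_files) for key in ("covered", "uncovered")}


# Made-up inputs: files a coverage report marks as covered, and the file each
# recent bug fix touched (taken from a bug tracker or commit history).
covered = {"billing/invoice.py", "billing/tax.py"}
recent_bugs = [
    "billing/invoice.py",
    "billing/invoice.py",
    "billing/tax.py",
    "admin/export.py",
]
shares = bug_share_by_coverage(covered, recent_bugs)  # 3 of 4 bugs in covered code
```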

These were, incidentally, tests that were doing really well. They caught tons of bugs.

This was, as far as I could tell, thanks to some Pareto rule of code importance: the criticality of the code wasn’t evenly distributed. 10% of the code base needed an incredibly high density of tests to prevent bugs slipping through - 100% coverage was not enough. Meanwhile 50% of the code base apparently didn’t necessarily need even one test.

Back to code coverage.

What do we measure it for?

Is it to drive good practices? Because the behaviour I’ve seen it drive is all bad.

Is it to help find bugs? Because from what I can see code coverage reports can give a very misleading view into where they are.

Or (and I really hope this isn’t the answer) is it so we can give upper management a number they can put in a spreadsheet to measure our “performance”?

We experimented with something else that did well for a while. 100% unit test consideration.

Code coverage reports always read as 100%, because everything is either tested or explicitly marked as intentionally not tested, with the latter class being regularly scrutinised at PR reviews and other times.
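The post doesn't say which tooling was used. As one possible sketch in Python: coverage.py excludes lines carrying a `# pragma: no cover` comment by default, so "intentionally not tested" code can be marked in the source. The helper below is hypothetical (as is its simplified regex and the example source); it just lists those markers so they can be scrutinised at review time.

```python
from __future__ import annotations

import re

# Simplified pattern for the marker; an assumption, not coverage.py's own regex.
NO_COVER = re.compile(r"#\s*pragma:\s*no cover")


def intentional_exclusions(source: str) -> list[tuple[int, str]]:
    """Return (line number, line text) for every line marked as not tested."""
    return [
        (number, line.strip())
        for number, line in enumerate(source.splitlines(), start=1)
        if NO_COVER.search(line)
    ]


# Made-up module: one function is tested, one is explicitly excluded with a reason.
EXAMPLE_SOURCE = '''\
def load_config(path):
    return open(path).read()

def main():  # pragma: no cover - thin CLI wrapper, exercised end to end
    print(load_config("app.ini"))
'''

for number, text in intentional_exclusions(EXAMPLE_SOURCE):
    print(f"line {number}: {text}")
```

Keeping the reason next to the marker is a design choice: it gives the reviewer something to agree or disagree with.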

It fell by the wayside when “important project with daft deadlines” was happy to accrue tech debt to achieve said deadline, but it was a different approach that I’d try again.

Not a direct answer to your question, but perhaps a different lens?

Your description of unit test coverage vs bugs resonates highly with me. We naturally prioritise exploratory testing in three areas:

  • that which can never fail
  • that which fails often
  • that which failed recently

We don’t often translate that back up the chain into unit test rigour though :thinking:

I always say code coverage is bullsh*t :stuck_out_tongue:

Basically, how high or low the code coverage is doesn’t matter to me. What I want to see is the mutation coverage.

People know you can increase your code coverage just by writing a simple test that does nothing. But when you add mutation testing to your project… Oh boy!

Mutation testing is basically a way to test whether you have good unit tests: it makes small changes to your code and checks that your tests fail as a result. And if you have good unit tests, you’ll also have a decent code coverage bar.
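A hand-rolled sketch of the idea, assuming Python. Real mutation testing tools generate the mutants automatically; here a single mutant is written by hand, and all names (`is_adult`, `weak_test`, `strong_test`, `kills_mutant`) are made up for illustration.

```python
def is_adult(age: int) -> bool:
    """Original code under test."""
    return age >= 18


def is_adult_mutant(age: int) -> bool:
    """Mutant: the comparison '>=' has been changed to '>'."""
    return age > 18


def weak_test(fn) -> bool:
    # Covers every line of fn, but never probes the boundary.
    return fn(30) is True and fn(5) is False


def strong_test(fn) -> bool:
    # Same coverage, plus the boundary values on either side of 18.
    return weak_test(fn) and fn(18) is True and fn(17) is False


def kills_mutant(test, original, mutant) -> bool:
    """A test kills a mutant if it passes on the original and fails on the mutant."""
    return test(original) and not test(mutant)


weak_result = kills_mutant(weak_test, is_adult, is_adult_mutant)      # False: mutant survives
strong_result = kills_mutant(strong_test, is_adult, is_adult_mutant)  # True: mutant is killed
```

Both tests give the same line coverage of `is_adult`; only the surviving mutant shows that the first one isn't really checking anything at the boundary.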

What is Mutation Testing? | Definition from TechTarget.

Example of Mutation testing:

I measure mutation testing because I want my developers to write decent unit tests.

Why do we measure code coverage? I don’t know :stuck_out_tongue:

With code coverage you find untested code. Nothing more, nothing less.
It is not a judgement of quality of the coverage.

It depends a lot on how much spare resource you have, code coverage is not evil, it’s just a maintenance drain that smaller teams cannot afford to have - by smaller teams, I mean 5-8 devs per team or when there are fewer than 8-12 teams working on one product. It has a lot to do with maturity, and let’s face it, today’s maturity of SDLC is very different to what maturity looked like 20 years ago when code coverage was more desirable. I mean, what’s the point of code coverage if you are re-writing your Unity based game in some other framework now?

Code coverage as a way to locate parts of the system that may need more tests: perfectly acceptable. Code coverage as a quality gate: bad move.

You can get 100% code coverage by asserting that the unit does what it does, without testing the code logic. It won’t test any of the paths through the software - and those are infinite.
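A small sketch of the gap between covered lines and covered paths, assuming Python. The function is deliberately contrived and its name is made up: two tests execute every line and take every branch in both directions, yet one of the four paths through it still crashes.

```python
def contrived_fee(amount: float, is_member: bool, has_coupon: bool) -> float:
    """Hypothetical code under test with two independent branches."""
    divisor = 2
    if is_member:
        divisor -= 1
    if has_coupon:
        divisor -= 1
    return amount / divisor


# These two calls together give 100% line and branch coverage.
member_only = contrived_fee(100, True, False)   # divisor is 1
coupon_only = contrived_fee(100, False, True)   # divisor is 1

# The untested path: both conditions true, so divisor becomes 0.
try:
    contrived_fee(100, True, True)
    crashed = False
except ZeroDivisionError:
    crashed = True
```

With only two flags there are four paths; every extra independent condition doubles the count, which is why covering lines says so little about covering journeys.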

As an analogy, consider a small town. You can map every street. You can plot a path from any one address to any other address. That doesn’t mean you know every journey that can be taken within the town: you don’t know how many people are going to circle around a given block while they wait for someone they’re picking up. Or how many kids are going to cruise part of the town for kicks (okay, less common in the days of widespread internet, but it used to be a prime form of entertainment in small country towns). Or who will stop at half a dozen places in a trip.

Most software is like that. You can move back and forward any number of times. Open and close a dialog any number of times. Use keyboard shortcuts or not use them, or a mix. Code coverage, with tests that actually test, is an indicator of software modules that probably won’t cause problems when used with other modules, not an indicator that it’s all working.

Like any other metric, it can be useful in the right context to give part of the picture. Relying solely on any metric is a bad idea.

No, that’s exactly what I was looking for - stories about how code coverage reports and metrics were used. I’m skeptical of this approach, but it’s definitely an interesting example and I’d like to hear how it pans out if you try it again.

It depends a lot on how much spare resource you have, code coverage is not evil.

I don’t think it is evil but I’m surprised at just how popular and yet simultaneously useless it has ended up being for driving any kind of decision in my experience.

The trigger for me making this post was when I realized that I was setting it up for the 20th time and I wasn’t doing it because I thought it would help. It was purely because I figured that people expected or wanted to see it.

I’m still curious to hear stories about good decisions people have made with either code coverage reports or metrics, and ways that they have worked them into their process which resulted in a good outcome. @dancaseley cited one example of a process that uses it and might yield some good outcomes, but I’d count it as a potential for now.

If you want a discussion to be active and lively, sometimes all you have to do is start with a wrong answer. I like that thinking. I’ve only ever seen code coverage help when it’s used as a help, not as a metric. It’s far too much work to maintain it properly long term.

This topic comes up once in a while. I think I saw this being discussed on Reddit recently.
I think, people being people, some will always game the system, making it ineffective.

Back to the question, it is worth measuring! The key is what the organization does with the information.

Based on my personal experience, it’s essential to incorporate code coverage analysis as a part of your testing strategy. Code coverage provides a valuable measure of the effectiveness of your testing efforts. When working with a regression test suite, understanding the depth and breadth of your testing becomes crucial, and this is where test coverage analysis comes into play. While it may not always guarantee precise results, it does provide insight into which application code your test cases covered.

For instance, when dealing with API microservices and aiming to create automated regression tests, it’s advisable to instrument your API services. By doing so, you can execute your automation scripts and subsequently analyze the code coverage they achieve.

I thought I might have something to add on this topic, but everyone seems to have a very similar stance on this issue :grin:

Whilst it isn’t pointless, I view it as one of the less useful metrics (and I’m not a fan of metrics).

I think it is more useful if you know the areas of code that aren’t covered and their riskiness. For example if one project had 90% coverage but the brittle code wasn’t touched and another had 40% focused on high risk areas, I’d be more confident dealing with the latter.

A key thing for me is why do the numbers matter? If it’s part of an exercise where you are looking to solve tech debt, great. If you have an arbitrary target, I don’t see merit.

Well, the tools don’t just spit out numbers; they can also spit out reports highlighting which lines of code are under test and which aren’t.

Although I personally don’t value those either, after I witnessed bug density clustering around code already under test according to a kind of “bug pareto rule”.

I prefer to write tests that 1) replicate customer/exploratory tester reported bugs, 2) exercise features under development (e.g. TDD), or 3) exercise “surprise features”, and to simply ignore coverage reports entirely.

Others might have different experiences though.

Of course 100% test coverage doesn’t tell you something is tested well, but 0% test coverage does tell you that it isn’t. Coverage is a necessary condition, but not sufficient. That doesn’t mean it’s useless.

That said, this is a good argument for taking coverage more seriously than most do: https://www.youtube.com/watch?v=hJ4XHQHhmw8&list=PLZ66c9_z3umMtAboEKsHXQWB0YMJje7Tl&index=6

This is my answer too! We experimented with code coverage (and still do), but after utilizing mutation testing I found unit tests were being written to pass and not to actually test anything.

Mock.Verify => ThisMethod.IsCalled(Once) … cool but does the method do what we want it to do?
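The same point in runnable form, as a sketch using Python's `unittest.mock` rather than the mocking library quoted above. The functions `notify_overdue` and `notify_overdue_buggy`, the invoice data and both checks are hypothetical.

```python
from unittest.mock import Mock

LATE = {"email": "late@example.com", "days_late": 3}
ON_TIME = {"email": "ontime@example.com", "days_late": 0}


def notify_overdue(invoices, mailer):
    """Hypothetical code under test: email only customers whose invoice is late."""
    for invoice in invoices:
        if invoice["days_late"] > 0:
            mailer.send(invoice["email"], "Your invoice is overdue")


def notify_overdue_buggy(invoices, mailer):
    """Buggy variant: forgets the days_late check and emails everyone."""
    for invoice in invoices:
        mailer.send(invoice["email"], "Your invoice is overdue")


def interaction_check(notify):
    # "The method was called once": true for both implementations.
    mailer = Mock()
    notify([LATE], mailer)
    mailer.send.assert_called_once()


def behaviour_check(notify):
    # Checks who was emailed, and that the on-time customer was left alone.
    mailer = Mock()
    notify([LATE, ON_TIME], mailer)
    mailer.send.assert_called_once_with("late@example.com", "Your invoice is overdue")


interaction_check(notify_overdue)
interaction_check(notify_overdue_buggy)  # passes despite the bug
behaviour_check(notify_overdue)
```

Calling `behaviour_check(notify_overdue_buggy)` raises `AssertionError`, because the mock was called twice. The interaction-only check gives the buggy version the same coverage and the same green result as the correct one.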

Using that data we were able to start the initiative of implementing Mutations across the board. This will give us so much better data to push for a quality culture change. So our idea is to have 80% code coverage and ‘80’ Mutation Score. That is our ideal. It’s been a great learning tool to see how and where we can improve our development.

Welcome back @gpaciga :slight_smile:
A really good clip there, one that explains what a lot of us are thinking. It also explains why, at the same time, most of us don’t think unit testing is really that useful. Where I work, the big bugs that cost the most money (something Isaac speaks about in his clip) sit across components, at interface boundaries. Tools that let us cover those interfaces often have to handle disparate software languages all at once, which makes decent coverage expensive unless we can use the right tools to help us find dark corners in our software. But there are other ways to find dark corners, and it’s probably too easy to use those other, cheaper-to-run tools.

I still think a lot of teams are copping out, but guilt over a lack of full code coverage needs to be taken with a pinch of salt. Just think of all the devs who used Unity to build their game and are now much more focused on swapping the engine than fixing code coverage. So there is that: emergencies… Or is this really just telling us that we need to design our code for testability and portability? It’s a seatbelt, augment your brain.

I also find boundaries between modules, code bases, teams, etc. to be the biggest source of bugs.

On the project where we had 50% code coverage and all of the bugs in the tested code, I also found a weird phenomenon: the majority of bugs would cluster at boundaries where the testing framework couldn’t mock properly. Once we amended it to be able to do so, it’s like the bugs moved on to colonize some other boundary.

For example:

  1. We couldn’t test email sending. Tons of bugs there. Then we could and those bugs dried up.
  2. There was a particular API that we called that was highly specific and we couldn’t test that except with the staging server. Then after building a mock of that API and building a few scenarios those bugs dried up.
  3. The testing framework ran all tests with Firefox. Guess what?

We never did get it running with other browsers, but I’ll bet if we did we could have squashed all those bugs too.

And so on…