Mutation Testing: Getting Started And Tools To Use


(Heather) #1

I’ve started to see mutation testing mentioned more and more in meetup and Slack spaces. It’s not something I’ve had the chance to try myself yet, so I have a few questions about it.

Q1. How did you get started with mutation testing? Did a developer help you or did you use some online resources?

Q2. Is it something you have kept doing? i.e. are you seeing the benefits of doing it and if so, would you be able to explain them?

Q3. When it comes up, I see tools like Stryker and pitest mentioned. What tools have you found useful in your own situation for doing this? Are there any you’d avoid? If so, why?


(Gerald) #2

Although it’s often called Mutation Testing, it’s not actually testing; it’s more of an analysis, which is why it’s sometimes referred to as “Mutation Analysis”. It analyzes whether your unit test suite is capable of finding bugs: it injects modifications into your product code based on a ruleset (the mutation operators), then runs your unit test suite against each modified version (a mutant) to see whether the suite detects the modification (= killing the mutant).
It’s completely automated, so you don’t have to do anything to run it, but you need to interpret the results and, where appropriate, derive actions from them.
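
To make "killing a mutant" concrete, here is a minimal hand-written sketch. It is not what any tool generates; the class and method names (MutantDemo, isAdult, and so on) are made up for illustration. It places one mutant (a changed comparison boundary) next to the original code and runs two tiny "suites" against both:

```java
import java.util.function.IntPredicate;

// Hypothetical example: all names here are illustrative only.
public class MutantDemo {
    // Original production code.
    public static boolean isAdult(int age) {
        return age >= 18;
    }

    // A mutant: the boundary operator ">=" has been changed to ">".
    public static boolean isAdultMutant(int age) {
        return age > 18;
    }

    // Weak suite: only checks values far away from the boundary.
    public static boolean weakSuitePasses(IntPredicate impl) {
        return impl.test(30) && !impl.test(5);
    }

    // Stronger suite: additionally checks both sides of the boundary.
    public static boolean strongSuitePasses(IntPredicate impl) {
        return weakSuitePasses(impl) && impl.test(18) && !impl.test(17);
    }

    public static void main(String[] args) {
        // A mutant is "killed" when the suite fails against it.
        System.out.println("weak suite kills mutant:   "
                + !weakSuitePasses(MutantDemo::isAdultMutant));
        System.out.println("strong suite kills mutant: "
                + !strongSuitePasses(MutantDemo::isAdultMutant));
    }
}
```

Both suites pass against the original code, but only the stronger one fails against the mutant. A surviving mutant like the one the weak suite lets through is the kind of finding you then have to interpret.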

There are two statistics,

The first sets the goal for the Unit Test foundation of every test automation pyramid. The second validates that mutation testing is a sound way of verifying how close we are to that goal (if a test suite is capable of killing a mutant, it’s also capable of catching a bug coupled to that mutant).

Q1

If you’re on Java, pitest.org is certainly the way to go: mature tooling, good documentation, and an active community. Applying it to your project is pretty straightforward:

And you’ll get a nice report showing which mutations your unit test suite was not able to kill. You also get a “Mutation Score”, which is the percentage of mutants that got killed. This is a far better indicator of the quality of your tests than line or branch coverage. If you achieve a score in the high 90s, you’re pretty close to the 77% goal mentioned above.
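
As a worked illustration of that percentage, here is a small sketch. The class name MutationScore and the numbers are invented for the example, and a real tool may apply extra rules (for instance for mutants that no test covers), so treat it as the basic arithmetic only:

```java
// Hypothetical helper: illustrates the basic "killed / total" arithmetic only.
public class MutationScore {
    // Returns the share of killed mutants as a percentage.
    public static double score(int killed, int total) {
        if (total <= 0 || killed < 0 || killed > total) {
            throw new IllegalArgumentException("need 0 <= killed <= total and total > 0");
        }
        return 100.0 * killed / total;
    }

    public static void main(String[] args) {
        // Example numbers: 50 mutants generated, 45 of them killed.
        System.out.println(score(45, 50)); // prints 90.0
    }
}
```

The five surviving mutants in this example are the interesting part of the report: each one points at behaviour that no test currently pins down.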

In case your project is big (e.g. > 20 kLOC), you should consider limiting the scope (e.g. by applying a package filter), because Mutation Testing requires a lot of processing power, and the cost grows quickly with the size of the code base: more code means more mutants, and each mutant needs its own test run.

If you’re on JavaScript or TypeScript, https://github.com/stryker-mutator/stryker-mutator.github.io is the tool, though I don’t have much practical experience with it (I don’t do much JS dev)

Q2

Yes. Since I started with mutation testing, I’ve used it in all my (Java) projects, for several reasons:

  • It’s conducting an automated review of my unit tests. No human could provide me the same information in the same amount of time.
  • Because it’s automated, it’s very easy to apply (at least for Java, I don’t know about the other languages). Apart from a strong CPU, it doesn’t need much effort from my side.
  • It totally changed the way I create my Unit Test Suites and how I write those tests. I do it in a much more systematic way than I did before.
  • It effectively helps me close the gaps in my Unit Test Suites. I use Line/Branch Coverage only as weak indicators: once I’m at ~70-80% Line or Branch Coverage, I start mutation testing, sometimes even a bit earlier, especially if I have some unit tests for more complicated code.
  • In some cases it found a gap in my unit test suite that had let a real bug go unnoticed, even though I had Line/Branch Coverage on that part.
  • When I’m in a tester role, it gives me much better confidence in the quality of the unit test suite the programmers wrote, so I can concentrate on more interesting matters to test.
  • I don’t do code reviews for unit tests anymore. If I do, I only look at style aspects, not the functionality (brains are not made to run the code, we have computers for that).
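
The difference between coverage and mutation analysis mentioned in the list above can be sketched as follows. This is a made-up example (CoverageGapDemo and its methods are hypothetical): a check that executes every line of a method, and so counts fully towards line coverage, can still let an arithmetic mutant survive if it asserts too little.

```java
import java.util.function.IntBinaryOperator;

// Hypothetical example: all names here are illustrative only.
public class CoverageGapDemo {
    // Original production code: price after a percentage discount.
    public static int discounted(int price, int percent) {
        return price - price * percent / 100;
    }

    // A mutant: "-" replaced by "+".
    public static int discountedMutant(int price, int percent) {
        return price + price * percent / 100;
    }

    // Executes the whole method (full line coverage) but barely asserts anything.
    public static boolean coverageOnlyCheckPasses(IntBinaryOperator impl) {
        return impl.applyAsInt(200, 10) >= 0;
    }

    // Asserts the actual expected value.
    public static boolean valueCheckPasses(IntBinaryOperator impl) {
        return impl.applyAsInt(200, 10) == 180;
    }

    public static void main(String[] args) {
        System.out.println("coverage-only check kills mutant: "
                + !coverageOnlyCheckPasses(CoverageGapDemo::discountedMutant));
        System.out.println("value check kills mutant:         "
                + !valueCheckPasses(CoverageGapDemo::discountedMutant));
    }
}
```

Both checks give the method the same line coverage, but only the second one would notice if the subtraction were accidentally an addition.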

Q3

See Q1: Stryker and pitest are the tools to conduct mutation testing in JavaScript and Java, respectively.

My (incomplete) List of Mutation Testing tools

Java

  • Pitest http://pitest.org/ <-- use this one, best you can get for free, also for Scala & Kotlin

For Java, I would avoid the following ones, because they come more from academic research and most of them are no longer maintained:

  • muJava
  • Jester
  • Judy
  • Javalanche
  • Jumble
  • Major

JavaScript

Ruby

.Net/Mono/C#

PHP

Haskell

(and sorry for the link obfuscations, as a newbie I’m only allowed to add 2 links)


(Heather) #3

Awesome thank you so much! I edited it to remove the link obfuscation, really appreciate this :grinning:


(Gregory Paciga) #4

Thanks @gmuecke, I’m definitely putting Stryker on my todo list for this month as things slow down for the holidays and I have time to play around.

It helps that there’s lots of fun imagery in killing mutants and finding survivors.


(Dave) #5

Awesome post and source of info! Thanks


(Del) #6

I got involved in mutation testing about 18 months ago when I was asked to try and convince a group of developers that unit testing was actually a good thing (that’s another story). The language in question here was Python 2.7.x, and the most popular MT tool (cosmic-ray) only catered for Python 3.x, so I wrote my own.

It’s a basic, command-line-driven tool that uses selected mutations (as opposed to doing a ‘full whammy’) to cut down on time, and for the most part it worked fairly well in changing their views on unit testing in general.

You can find it here - https://github.com/deefex/mutate

Interestingly, while I worked at NCR, I was sitting beside Henry Coles, the author of pitest. Good guy, very smart…


(Douglas) #7

I’m using pitest on a large Java project at work. I find its output interesting but ultimately difficult to structure any action around.

The reason, I think, is that I have not yet seen how it fits nicely into a pipeline. It takes too long (4-5 hours in our case) to run on every check-in, meaning the only way to consume its output is for interested people to go and examine it periodically.

My ideal use case for it would be if mutation testing could be run only on the code and tests affected in each commit, providing direct feedback about how effectively that code has been tested. However, the tooling for that does not exist, and it isn’t high up on my list of potential personal projects.
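
One small piece of that use case could look roughly like this. It is only a sketch, assuming a standard Maven layout (src/main/java/) and a list of changed file paths obtained elsewhere (for example from the version control system). The class ChangedClasses and its method are hypothetical and not part of any existing tool. It turns the changed paths into fully qualified class names that could then be used to narrow the scope of a mutation run:

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: maps changed source files to class names.
public class ChangedClasses {
    // Assumption: production sources live under the standard Maven root.
    static final String MAIN_ROOT = "src/main/java/";
    static final String SUFFIX = ".java";

    public static List<String> toClassNames(List<String> changedPaths) {
        return changedPaths.stream()
                // Keep only production Java sources; skip tests, docs, etc.
                .filter(p -> p.startsWith(MAIN_ROOT) && p.endsWith(SUFFIX))
                // Strip root and suffix, then turn the path into a package name.
                .map(p -> p.substring(MAIN_ROOT.length(), p.length() - SUFFIX.length())
                           .replace('/', '.'))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> changed = List.of(
                "src/main/java/com/example/Foo.java",
                "src/test/java/com/example/FooTest.java",
                "README.md");
        System.out.println(toClassNames(changed)); // prints [com.example.Foo]
    }
}
```

Working out which tests are affected by a commit is the harder part and is not covered by this sketch.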