How do you measure the quality of automated tests?

I saw @louisegibbs ask this on LinkedIn, and was curious about it too. :slight_smile:

Question for people who develop Automated Tests
How do you measure the quality of the tests?

Or, another way of putting it, what makes a good automated test?

5 Likes

what makes a good automated test?

It’s really mostly about what the tester is trying to achieve. If you asked “what makes a good microscope?”, you’d ask the people who use them and get different answers, but they’d all be trying to use that tool to serve their purposes. “Test automation” is two words that are both lies: it’s just a fact-checking tool, and you want it to check useful facts that merit the cost of it existing.

A good automated test serves the test strategy. Otherwise it’s a very limited check that carries a cost with little or no reason to exist.

4 Likes

It’s generally good to define goals for automation and assess against those goals; however, at times I’d like to see it relate more closely to what makes a good test.

I think I’m loosely remembering some of Cem Kaner’s thoughts here, but it could be from somewhere else.

A good test reveals something new of value about the product.
A good test has a high likelihood of discovering a specific type of issue, should an issue of that type exist.

With automation, tests tend to be more focused on known things, so the “something new of value” element of a good test often gets diminished.

The second part, say “a high likelihood of discovering a specific regression issue should that specific regression issue exist”, still seems valid.

So is it actually “catching things of value that would not be caught otherwise, balanced against missing things of value that are later revealed”? That would be my abbreviated measure.

Then you have your goals: we could list, say, 10 goals and measure against those. Examples: rapid feedback loops, health checks for deeper testing, big data, broadened coverage, machine strengths, combinatorics, device/OS coverage, etc.

The third part is the more technical element: is your automation code of high quality, well structured, readable, optimized and reusable? This is more aligned with how we measure the quality of product code.

I’d also throw in the stack-optimised consideration; perhaps this should even be the first question when considering the quality of your automation.

3 Likes

This is really difficult, especially for someone who loves their measures.

I kind of see the key objective of automated testing as “proving something works”.

So you do have to get into “how many issues does it find per execution” to get some kind of defect hit rate. I inherited an entire framework with 300 tests that found 2 bugs in 3 years, but production issues were still coming in, so it was clear the framework was focusing on the wrong areas. Based on that measure we started again with a clean slate, because we were losing nothing. 10 good automated tests in the right areas are better than 300 “safe” tests.

I would also look at the success rate of the test execution itself: how often does it complete without needing maintenance? With good automated tests there is a balance to be struck between successful execution and maintenance effort.
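
To make those two measures concrete, here is a minimal sketch (hypothetical record format and numbers, not taken from any real framework) of computing a defect hit rate and a maintenance-free execution rate from a history of suite runs:

```python
from dataclasses import dataclass

@dataclass
class SuiteRun:
    """One execution of the automated suite (hypothetical record)."""
    bugs_found: int            # real product defects this run surfaced
    needed_maintenance: bool   # run only completed after fixing the tests themselves

def defect_hit_rate(runs: list[SuiteRun]) -> float:
    """Average number of real defects found per execution."""
    return sum(r.bugs_found for r in runs) / len(runs) if runs else 0.0

def maintenance_free_rate(runs: list[SuiteRun]) -> float:
    """Share of executions that completed without any test maintenance."""
    return sum(not r.needed_maintenance for r in runs) / len(runs) if runs else 0.0

# Illustrative history: many runs, very few real bugs found.
history = [SuiteRun(bugs_found=0, needed_maintenance=False) for _ in range(150)]
history[10].bugs_found = 1
history[90].bugs_found = 1

print(f"defect hit rate: {defect_hit_rate(history):.3f} bugs per run")
print(f"maintenance-free executions: {maintenance_free_rate(history):.0%}")
```

A suite whose hit rate stays near zero while production issues keep arriving is exactly the “300 safe tests” situation described above.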

I think the fundamental problem with answering the question is that everything I’ve said is a reactive measure, but maybe that is the only way of knowing whether your automated tests are good. The only proactive ones are around good code management practice and peer review, so maybe all of those combined could give an insight.

Bottom line is there’s no silver-bullet answer - that’s why we’re still here :flexed_biceps:

3 Likes

A factor to measure is the length of the test process. The automated test run needs to be short: if it takes too long, developers will complain, may check in their code less often, and may even ignore the results.
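
As a rough illustration (the time budget and paths are hypothetical, and it assumes a pytest suite under `tests/`), you can make that feedback-loop length visible by timing the run against a budget:

```python
import subprocess
import sys
import time

TIME_BUDGET_SECONDS = 10 * 60  # hypothetical budget: 10 minutes of wall-clock time

start = time.monotonic()
result = subprocess.run([sys.executable, "-m", "pytest", "tests/"])  # run the suite
elapsed = time.monotonic() - start

print(f"suite finished in {elapsed:.0f}s (budget {TIME_BUDGET_SECONDS}s)")
if elapsed > TIME_BUDGET_SECONDS:
    print("WARNING: feedback is too slow - developers will start ignoring this suite")

sys.exit(result.returncode)
```

The exact number matters less than the trend: if the run time creeps up release after release, the complaints described above usually follow.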

2 Likes

I think a good automated test needs to be reliable and valuable.

If the tests are so flaky that you just keep clicking run until they pass, they’re neither.
If the failure of a test sends up an immediate flag that something is wrong with the code base, it’s both.

This becomes most important when picking which UI tests to automate.
Automating that a modal has buttons or that a static field exists might be reliable, but isn’t valuable.
Similarly, if there is a part of the app that isn’t really used or is in the process of being rewritten, spending time writing automation for the old code is a waste of time.

When selecting UI tests for automation, I always ask if it’s something that we would test with every release from that point on. If the answer is no, then I don’t recommend it.

2 Likes

We do business-driven test automation (BDTA). If the most critical business use cases are robustly automated to validate that normal business could run with the new delivery, we are happy.

2 Likes

It depends on how many true-positive issues it finds… and how few false positives it raises, meaning tests failing even without code changes.

3 Likes

My metrics of test automation effectiveness:

  • Quicker feedback about build stability
  • Time saved by functional test engineers on repetitive, automatable tasks
  • Number of actual issues uncovered in the functional areas automated
  • Less time spent maintaining the test automation suite

1 Like

At least when it comes to automated UI testing, I think a measure of a good test is “does it do exactly what a human would do if they were testing it manually?” and “does it do this consistently?”
Our automated test suite is consistent unless there are (1) bugs or (2) environmental blips. The latter can be massively reduced with good testing infrastructure.

1 Like

There is no single answer to this. So I asked my team this very question to see what quality means for them in the context of check automation, and this is what I gathered (in no particular order):

  • Zero false positives (always re-run and verify failures before flagging them, to maintain trust in the suite; see the sketch after this list)
  • Result consistency (a check should yield the same results unless the app behavior changes)
  • Fast feedback (reduces context switching and speeds up response time)
  • Coverage depth (prioritizing high-value business cases over broad, shallow checks)
  • Check logging and readability (clear intent and useful logs make debugging easier and faster)
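
To illustrate the first point, here is a minimal sketch of a re-run-then-flag policy (the `check` callable, retry count and delay are invented for illustration, not our actual framework):

```python
import time
from typing import Callable

def verified_failure(check: Callable[[], bool], retries: int = 2, delay_s: float = 5.0) -> str:
    """Run a check and re-run any failure before flagging it.

    Returns "pass", "flaky" (failed, then passed on retry - track it, don't flag it),
    or "fail" (failed consistently - flag it, it is probably a real product issue).
    """
    if check():
        return "pass"
    for _ in range(retries):
        time.sleep(delay_s)  # give transient environment blips a chance to clear
        if check():
            return "flaky"
    return "fail"

# Hypothetical usage with a check function from your own suite:
# outcome = verified_failure(lambda: login_page_loads(), retries=2)
```

Anything that keeps landing in the “flaky” bucket still needs attention, otherwise the zero-false-positives goal quietly turns into “ignore red builds”.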
1 Like