How do you measure automation success?

(Alastair) #1

I was wondering how you guys measure the success of your automated test scripts.

In my current role we’re measuring ‘% of test cases automated’, and one of our goals is to have 80% of our API test cases automated (a mixture of checking data integrity and checking that HTTP response codes are correct).
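For context, each of those API test cases is a small check roughly along the lines of the sketch below (a made-up illustration in Python using requests under pytest; the endpoint and fields are placeholders, not our real API):

    import requests

    BASE_URL = "https://api.example.com"  # placeholder, not the real endpoint

    def test_get_user_returns_expected_record():
        response = requests.get(f"{BASE_URL}/users/42", timeout=10)

        # HTTP response code is correct
        assert response.status_code == 200

        # Data integrity: the payload contains the fields and values we expect
        body = response.json()
        assert body["id"] == 42
        assert "email" in body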

Other ways I can think of are:

  • defects found by test scripts
  • time saved by running test scripts

Anyone got any other ideas?

(Gabe Newcomb) #2
  • How fragile is it? (Time spent fixing/updating it, whether due to test developer bugs or changes in the AUT.)
  • Is there enough coverage that I have confidence in it as a regression suite? (Guard rails to protect against regressions when the AUT is updated – not sure there’s an easy way to measure this.)
  • Time saved, as you noted, is definitely major. It’s particularly easy to see for tests that you would run all the time.

(Shawn) #3

Time saved is probably one of the main metrics used for justifying automation efforts because it is easy to turn it into “money saved”. You should adjust the time based on maintenance, as Gabe suggested in the fragility metric. I prefer to stay away from “# of tests” and “defects found” metrics because they are deceiving. Having a large number of tests doesn’t mean that the right things are being tested (a similar issue to code coverage). “Defects found” can be even more discouraging: if there is a large number, someone looks bad (the devs for bad code or the automator for false negatives), but if there are none, you don’t know whether that is because there weren’t any or because the tests don’t address the needed scenarios.

If the tests directly represent business paths, I would consider the number completed to be more relevant. The difference is that, in those cases, the tests can be mapped directly to the business. For example: 80% of the common paths through the feature are covered by automation, along with 5 high-risk edge cases.

In general, I suggest avoiding metrics for the sake of metrics; try to find something that is an indicator of why you are automating and its importance to the business…

(Beren) #4

Thank you, Ali. This is a question I realise I hadn’t pondered enough.

Test Automation can take many different forms. One form is preparing test data, another is putting the system into a certain state, or you could have your automation scan for broken links. These can be immensely useful, and their worth is rather easy to measure.

An automated regression set (on any level) is something different though.
This usually takes a huge amount of time to decide on strategy, implementation and maintenance, so the cost vs. benefit exercise is quite important to do multiple times during its lifecycle, readjusting where necessary.

At first, measuring the success is as binary as the results of the automation.
Red/fail: a problem is found. The automation successfully prevented an issue from moving on to the next step.
–> Measure: Number of bugs found.
Green/pass: The application passed with no problems found. You now have some added confidence in the stability of the product.
–> Measure: Number of checks passed.

However, this completely ignores the hazard of False Positives (there is a problem, but it wasn’t picked up) and False Alarms (a problem was flagged, but it turned out not to be a problem).

Though, when we think further about both measures, they actually give us more of an indication of how stable our development/environment/… is, NOT how good our automation is. The goal of the automation is not to find problems or to give confidence. It is to be faster and more reliable at menial tasks. The aforementioned are a by-product and should not be the main focus. Treating them as such may be counterproductive at the least.

To know how successful the automation is, one should consider coverage, reliability and mitigated risk.
Add to that the time invested in creating, maintaining and running the scripts (and chasing false alarms/false positives) and you’ll have a rather useful ‘measure’ of the success of your regression suite.

None of these parameters can be expressed in meaningful numbers though, except for time.
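To make the time part concrete, a back-of-the-envelope tally might look like the sketch below (in Python, with numbers invented purely for illustration):

    # All figures invented, just to show the shape of the time tally.
    hours_creating = 120
    hours_maintaining_per_month = 10
    hours_running_per_month = 2            # even 'unattended' runs need some babysitting
    hours_chasing_false_alarms_per_month = 6

    months = 12
    total_hours = hours_creating + months * (
        hours_maintaining_per_month
        + hours_running_per_month
        + hours_chasing_false_alarms_per_month
    )
    print(f"Time invested over {months} months: {total_hours} hours")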

(And this completely leaves aside the psychological effects that ‘having a regression set’ has on a team & project.)

Therefore, the recurring question of “should we invest/keep investing in a rigorous automated regression set?” is often very hard to answer. Many other questions should go along with it, such as “How many times do we build or release to production?”, “What is the root cause of most of our issues?” or “What kind of risk should we tackle?”.

I’m often very grateful for having a good automated regression set and love discussing the strategy it should be part of.
However, I’d be very wary of any measure of success, even one that looks incredibly clever and sensible.

Hope that helps give you some insight.
At least, writing it out helped me get a clearer idea. :wink:

(Suresh) #5

I have around 8 years of experience and have worked on automation across different platforms (Mobile/Web; Windows/Mac) and SaaS apps.

The following are the benefits of successful automation:

  • One key benefit of automation is that it is less time consuming than manual testing
  • Second, we can cover more scenarios
  • Third, we can concentrate on failed test cases and raise bugs, avoiding unnecessary retesting of working scenarios
  • Fourth, even if we forget a scenario it is fine; we can read the script and understand the flow (if it is well documented)

(Chris) #6

Fine, give all the good answers first, see if I care.

Ali, I’d strongly recommend pondering the details in Beren’s already excellent reply, especially those about mitigated risk. My automated test scripts (“automatic check scripts” as I call them) serve the purpose of checking the probable validity of a fact because it serves some part of my test mission. If I know why I’m running the script, and I know that the script reasonably serves its purpose of helping me find out something valuable to my mission about the product under test at a reasonable cost, then I consider it successful. The problem is that the nature of those scripts changes depending on what they are and why you’re running them. Some of them I write in half an hour, run for a test session, then pretty much throw away. Some of them are in a CI build to help reduce regression risk - a different purpose for a different mission that needs different reasons to justify its cost. So you need to know what those scripts are trying to achieve in order to establish their success.

A useful heuristic here is to consider how measurements of automation fail to describe the purpose of your automation. You could measure the number of lines of code, for example, which would never be far from pointless. The “number of bugs found” hides the nature of those bugs. The number of times a script does not find a problem is equal to the number of times a regression (one that would have been detected by that script) has not occurred. The time saved by running scripts has to be balanced against the loss of more powerful testing that makes better use of the unique powers of inquisitive humans.

(Chris) #7

There’s an interesting point to be made that the way to maximise time saved would be to do no testing at all, yielding the maximum money saved. If you mean time saved relative to humans doing the work, then I think it’s really difficult to translate that into money saved - there are all the costs of training for, running and maintaining a code project, not to mention that automation doesn’t replace human testing, so you have the costs of the problems not found by checks that would have been discovered by a person. You’d have to quantify the yet-unknown abstraction failures.

So I think the avoiding metrics for the sake of metrics point you made is extremely important. I’m also very much with you on the “why you are automating” point here, and I think it’s absolutely key - it’s hard to translate the cost-efficient but suitably thorough search for problems in an infinite space into quantifiable terms. I think it’s easier to present the case that your automation serves a purpose that justifies its cost.

(Shawn) #8

“Time saved” was intended as a comparison between the different methods by which the tests could be run. The assumption is that if the discussion is happening, not testing is already off the table. As for estimating monetary value, the practice I have seen used is quite simple: calculate the time difference between the automated and manual runs, then multiply that by an hourly dollar amount considered to be the approximate cost of a manual tester. That value is then reported as a saving on manual testing costs. When viewing testing as a whole, the same process is performed for the creation and maintenance of the automation, and that amount is deducted from the overall savings. If you are viewing the short term, manual testing will always appear cheaper. When you extend the time-frame viewed, you will see that in the long term automation provides better value. If being unable to quantify the unknown were a valid reason for not creating a metric or estimate, then we wouldn’t be having this discussion at all.
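With made-up figures, that arithmetic looks roughly like the sketch below (the hours and hourly rate are invented, purely to show the shape of the calculation):

    # Invented example figures for the calculation described above.
    manual_run_hours = 8.0               # time for a tester to run the suite by hand
    automated_run_hours = 0.5            # attention needed for the automated run
    hourly_rate = 40.0                   # approximate cost of a manual tester, per hour
    runs_per_year = 50

    creation_hours = 120.0               # one-off cost to build the suite
    maintenance_hours_per_year = 60.0

    # Savings on manual execution
    execution_savings = (manual_run_hours - automated_run_hours) * hourly_rate * runs_per_year

    # Deduct the cost of creating and maintaining the automation
    automation_cost = (creation_hours + maintenance_hours_per_year) * hourly_rate

    net_savings_year_one = execution_savings - automation_cost
    print(f"Year-one net saving: ${net_savings_year_one:,.2f}")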

(Dave) #9

Hi Ali,

Wow, this is a big one :slight_smile: One thought I have on this is to ask yourself the question of why you are creating automated scripts. What is their purpose? The metric you mentioned (% of test cases automated) is a metric that doesn’t seem to look at their value. It measures how successful you are at creating automated test cases, but it doesn’t measure how much value they are adding to your company.

I will say that measuring stuff like this is extremely hard (and sometimes dangerous) as metrics are slippery things. We usually can’t actually measure the things we really care about (improved quality for example) and so we measure things that approximate them.

Some things that I have used and/or am experimenting with using are:

  • Maintenance time vs. number of defects found
  • Speed with which we can get code from a developer’s machine to a released branch without hurting quality
  • How quickly we can track down and fix defects found by automation (a good indicator of how well targeted the tests are)
  • Run time of the tests
  • etc.

(Chris) #10

“The assumption is that if the discussion is happening, not testing is already off the table”

Yes, my point was a reductio ad absurdum to show that measuring success by reducing the time taken means that the pinnacle of success would be to do no testing. Therefore success cannot be a product only of time reduction.

That is a simple practice, but I question its utility, because automated checks and manual testing (testing without large check suites) are not fungible. Measuring time is useful when we look at automated check suites as a way to support a wider testing effort, because we want to understand the costs of running repeated checks, but not as a way to replace one with the other or compare them. A coded check is not the same as a human performing a check as part of their testing, which is not the same as a human doing testing in general. You can measure the amount of time taken in each case, but it doesn’t tell us anything about what we’ve learned about the product, what problems we found, what risks we uncovered, how we told our story, how we informed our test clients, how we helped them to make better decisions, or the perceived quality of the product. It doesn’t take into account how each thing serves our test mission. All we know is that it might take less time - so we’ve saved money (maybe) but potentially lost value. Time estimation only shows us which is quicker, not which is better or more successful… not that it really matters, because you can’t automate testing anyway. That’s what I was trying to get at there.

(monica paul) #11

Well, to measure automation success we can take into account the different benefits of using automation. Apart from all those, the two biggest factors I consider for automation to be successful are: :slight_smile:

  • Tracking how much time it takes to execute - how much time it saves
  • Quality results delivered consistently, release over release.

(Matt) #12

One hand-wavy metric I use to measure success is whether automation allows me to do things that would be infeasible to do manually. It’s not the usual quantitative metric that would look good in a status report, but it definitely scores some points if you can demonstrate it.

For example, my team was able to measure the runtime performance cost of different debug info formats for the Linux kernel. It required running a slew of benchmarks and tests across a wide variety of hardware and then analysing the results. If we had been unable to automate all of that, we never would have even attempted it.

(Augusto) #13

Interesting topic. To me, the measure of success of test automation is: how often can you release? How long does it take for a piece of code to go from check-in to production?

If we’re talking minutes, then it is a success.
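One rough way to put a number on that, assuming you can pull timestamps from your version control and deployment tooling (the values below are invented), is sketched here:

    from datetime import datetime

    # Invented timestamps; in practice these come from your VCS and deploy logs.
    commit_time = datetime.fromisoformat("2019-05-14T10:02:00")
    deploy_time = datetime.fromisoformat("2019-05-14T10:38:00")

    lead_time_minutes = (deploy_time - commit_time).total_seconds() / 60
    print(f"Check-in to production: {lead_time_minutes:.0f} minutes")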