How do you measure automation success?

I was wondering how you guys measure the success of your automated test scripts.

In my current role we’re measuring ‘% of test cases automated’ and it’s in our goals to have 80% of our API test cases automated (a mixture of checking data integrity and checking that HTTP response codes are correct).
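
For context, one of the checks being counted looks roughly like this (a minimal sketch using Python and the requests library; the endpoint and fields are made up for illustration):

```python
# Rough shape of one counted API check: HTTP response code plus a basic
# data-integrity assertion. Endpoint and fields below are illustrative only.
import requests

BASE_URL = "https://example.test"  # placeholder host


def check_get_user():
    response = requests.get(f"{BASE_URL}/api/users/42", timeout=10)

    # HTTP response code is correct
    assert response.status_code == 200

    # Data integrity: the payload contains the fields we expect
    user = response.json()
    assert user["id"] == 42
    assert "@" in user.get("email", "")


if __name__ == "__main__":
    check_get_user()
    print("check passed")
```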

Other ways I can think of are:

  • defects found by test scripts
  • time saved by running test scripts

Anyone got any other ideas?

5 Likes
  • How fragile is it? (Time spent fixing/updating it whether due to test developer bugs or changes in AUT)
  • Is there enough coverage so that I have confidence in it as regression? (Guard rails to protect against updates to the AUT – not sure there’s an easy way to measure this)
  • Definitely time saved, as you noted, is major. Particularly easy to see if it’s for tests that you would run all the time.
6 Likes

Time saved is probably one of the main metrics used for justifying automation efforts because it is easy to turn that into “money saved”. You should adjust the time based on maintenance as Gabe suggested in the fragility metric. I prefer to stay away from “# of tests” and “defects found” metrics because they are deceiving. Having a large number of tests doesn’t mean that the right things are being tested (similar issue with code coverage). “Defects found” can be even more discouraging because if there is a large number, someone looks bad (devs for bad code or automator for false negatives) but if there are none, you don’t know if that is because there weren’t any or if the tests don’t address the needed scenario.

If the tests directly represent business paths, I would consider the number completed to be more relevant. The difference is that the tests can be mapped directly to the business in those cases. For example, 80% of the common paths through the feature are covered by automation along with 5 high risk edge cases.

In general, I suggest avoiding metrics for the sake of metrics and trying to find something that is an indicator of why you are automating and its importance to the business…

3 Likes

Thank you, Ali. This is a question I realise I hadn’t pondered on enough.

Test Automation can take many different forms. One form is to prepare test data, another is to put the system into a certain state, or you could have your automation scan for broken links. These can be immensely useful and their worth is rather easy to measure.
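
A broken-link scan, for instance, can be tiny (a rough sketch with Python’s requests and the standard library HTML parser; the start URL is a placeholder):

```python
# Minimal broken-link scan: fetch one page, follow each link once, and
# report any that error out or return a 4xx/5xx status. Illustrative only.
from html.parser import HTMLParser
from urllib.parse import urljoin

import requests


class LinkCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def scan(start_url):
    page = requests.get(start_url, timeout=10)
    collector = LinkCollector()
    collector.feed(page.text)

    broken = []
    for href in collector.links:
        url = urljoin(start_url, href)
        try:
            status = requests.head(url, allow_redirects=True, timeout=10).status_code
        except requests.RequestException:
            status = None
        if status is None or status >= 400:
            broken.append((url, status))
    return broken


if __name__ == "__main__":
    for url, status in scan("https://example.test"):  # placeholder URL
        print(f"BROKEN ({status}): {url}")
```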

An automated regression set (on any level) is something different though.
This usually takes a huge amount of time to decide on strategy, implementation and maintenance, so the cost vs. benefit exercise is quite important to do multiple times during its lifecycle. And readjust where necessary.

At first, measuring the success is as binary as the results of the automation.
Red/fail: a problem is found. The automation successfully prevented an issue from moving on to the next step.
–> Measure: Number of bugs found.
Green/pass: The application passed with no problems found. You now have some added confidence in the stability of the product.
–> Measure: Number of checks passed.

However, this completely ignores the hazard of False Positives (there is a problem, but it wasn’t picked up) and False Alarms (a problem was found, but it turned out not to be a problem).

Though, when we think further about both measures: they both actually give us more of an indication of how stable our development/environment/… is, NOT how good our automation is. The goal of the automation is not to find problems or to give confidence. It is to be faster and more reliable at menial tasks. The aforementioned are a by-product and should not be the main focus. Treating them as such may be counterproductive at the least.

To know how successful the automation is one should consider coverage, reliability and mitigated risk.
Add to that the time invested in creating, maintaining and running the scripts (and chasing false alarms/false positives) and you’ll have a rather useful ‘measure’ of the success of your regression suite.

None of these parameters can be expressed in meaningful numbers though, except for time.

(This completely leaves aside the psychological effects that ‘having a regression set’ has on a team & project.)

Therefore, the repeating question of “should we invest/keep investing in a rigorous automation regression set?” is often very hard to answer. Many different questions should go along, such as “How many times do we build or release to production?” or “What is the root cause of most of our issues?” or “What kind of risk should we tackle?”.

I’m often very grateful for having a good automated regression set and love discussing the strategy it should be part of.
However, I’d be very wary of any measure of success even if it looks incredibly crafty and sensible.

Hope that helps give you some insight.
At least, writing it out helped me get a clearer idea. :wink:

5 Likes

I have around 8 years of experience, and have worked on automation across different platforms (Mobile/Web; Windows/Mac) and SaaS apps.

Following are some of the benefits of successful automation:

  • One key benefit of automation is that it is less time-consuming than manual testing
  • Second, we can cover more scenarios
  • Third, we can concentrate on failed test cases and raise bugs, which avoids unnecessary retesting of scenarios that are working
  • Fourth, even if we forget a scenario, that is fine; we can read the script and understand the flow (if it is well documented)

Fine, give all the good answers first, see if I care.

Ali, I’d strongly recommend pondering the details in Beren’s already excellent reply, especially those about mitigated risk. My automated test scripts (“automatic check scripts” as I call them) serve the purpose of checking the probable validity of a fact because it serves some part of my test mission. If I know why I’m running the script and I know that the script reasonably serves its purpose to help me find out something about the product under test that is valuable to my mission within a reasonable cost then I consider it successful. The problem is the nature of those scripts changes depending on what they are and why you’re running them. Some of them I write in half an hour, run them for a test session, then pretty much throw them all away. Some of them are in a CI build to help reduce regression risk - a different purpose for a different mission that needs different reasons to justify its cost. So you need to know what those cases are trying to achieve in order to establish their success.

A useful heuristic here is to consider how measurements of automation fail to describe the purpose of your automation. You could measure the number of lines of code, for example, which would be never too far away from being pointless. The “number of bugs found” hides the nature of those bugs. The number of times a script does not find a problem is equal to the number of times a regression (that would have been detected by that script) has not occurred. The time saved by running scripts is balanced by the lack of more powerful testing that makes better use of the unique powers of inquisitive humans.

4 Likes

There’s an interesting point to be made that the way to maximise time saved would be to do no testing at all, yielding the maximum money saved. If you mean time saved compared with humans doing the work, then I think it’s really difficult to translate that into money saved - there are all the costs of training, running and maintaining a code project, not to mention that automation doesn’t replace human testing, so you have the costs of the problems not found by checks that would have been discovered by a person. You’d have to quantify the yet-unknown abstraction failures.

So I think the avoiding metrics for the sake of metrics point you made is extremely important. I’m also very much with you on the “why you are automating” point here, and I think it’s absolutely key - it’s hard to translate the cost-efficient but suitably thorough search for problems in an infinite space into quantifiable terms. I think it’s easier to present the case that your automation serves a purpose that justifies its cost.

4 Likes

“Time saved” was intended as a reference to the different methods by which the tests could be run. The assumption is that if the discussion is happening, not testing is already off the table. As for estimating monetary value, the practice I have seen used is quite simple: calculate the difference between the automated and manual runs, multiply that by an hourly dollar amount considered to be the approximate cost for a manual tester. That value is then reported as a savings on manual testing costs. When viewing testing as a whole, the same process is performed for the creation and maintenance of the automation and that amount is deducted from the overall savings. If you are viewing the short term, manual testing will always appear cheaper. When you extend the time-frame viewed you will see that in the long term automation will provide better value. If being unable to quantify the unknown was a valid reason for not creating a metric or estimate then we wouldn’t be having this discussion at all.
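
To make that concrete, the back-of-the-envelope version of the calculation looks like this (every number below is an invented placeholder, not a benchmark):

```python
# Rough savings estimate for an automated regression suite, following the
# practice described above. All figures are invented placeholders.
manual_hours_per_run = 16         # time for a person to run the same checks
automated_hours_per_run = 0.5     # wall-clock time of the automated run
runs_per_year = 100               # roughly one run per build/release cycle
hourly_rate = 50                  # approximate cost of a manual tester per hour

build_hours = 200                 # one-off creation effort
maintenance_hours_per_year = 120  # fixing scripts, chasing false alarms

gross_saving = (manual_hours_per_run - automated_hours_per_run) * runs_per_year * hourly_rate
investment = (build_hours + maintenance_hours_per_year) * hourly_rate

print(f"gross saving per year:   ${gross_saving:,.0f}")
print(f"automation cost, year 1: ${investment:,.0f}")
print(f"net saving, year 1:      ${gross_saving - investment:,.0f}")
```

With these made-up numbers the suite pays for itself in the first year, but the same arithmetic can easily come out negative over a short time-frame, which is why the period you view matters.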

Hi Ali,

Wow this is a big one :slight_smile: One thought I have on this is to ask yourself the question of why you are creating automated scripts. What is their purpose? The metric you mentioned (% test cases automated) is a metric that doesn’t seem to look at their value. It measures how successful you are at creating automated test cases, but it doesn’t measure how much value they are adding to your company.

I will say that measuring stuff like this is extremely hard (and sometimes dangerous) as metrics are slippery things. We usually can’t actually measure the things we really care about (improved quality for example) and so we measure things that approximate them.

Some things that I have used and/or am experimenting with using are:

  • Maintenance time vs. number of defects found
  • Speed with which we can get code from a developers machine to a released branch without hurting quality
  • How quickly we can track down and fix defects found by automation (a good indicator of how well targeted the tests are)
  • Run time of the tests (see the sketch after this list)
  • etc.
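
For the run-time one, a tiny script over the CI’s JUnit-style XML reports is usually enough (a sketch; the reports/ path and attribute names follow the common JUnit XML convention, so adjust for whatever your runner actually emits):

```python
# Sum run time and failure counts across JUnit-style XML reports.
# Path and attribute names assume the common JUnit XML convention.
import glob
import xml.etree.ElementTree as ET

total_time = 0.0
total_tests = 0
total_failures = 0

for report in glob.glob("reports/*.xml"):
    root = ET.parse(report).getroot()
    # iter() finds <testsuite> whether or not there is a <testsuites> wrapper
    for suite in root.iter("testsuite"):
        total_time += float(suite.get("time", 0))
        total_tests += int(suite.get("tests", 0))
        total_failures += int(suite.get("failures", 0)) + int(suite.get("errors", 0))

print(f"{total_tests} tests, {total_failures} failed/errored, {total_time / 60:.1f} minutes total")
```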
4 Likes

The assumption is that if the discussion is happening, not testing is already off the table

Yes, my point was a reductio ad absurdum to show that measuring success by reducing the time taken means that the pinnacle of success would be to do no testing. Therefore success cannot be a product only of time reduction.

That is a simple practice, but I question its utility, because automated checks and manual testing (testing without large check suites) are not fungible. Measuring time is useful when we look at automated check suites as a way to support a wider testing effort, because we want to understand the costs of running repeated checks, but not as a way to replace or compare one with the other. A coded check is not the same as a human doing testing attempting a check which is not the same as a human doing testing in general. You can measure the amount of time taken in each case, but it doesn’t tell us anything about what we’ve learned about the product, what problems we find, what risks we uncovered, how we told our story, how we informed our test clients, how it helped them to make better decisions and the perceived quality of the product. It doesn’t take into account how each thing serves our test mission. All we know is that it might take less time - so we’ve saved money (maybe) but potentially lost value. Time estimation only shows us which is quicker, not which is better or more successful… not that it really matters because you can’t automate testing anyway. That’s what I was trying to get at there.

3 Likes

Well, to measure automation success we can take into account the different benefits of using automation. Apart from all those, the two biggest factors which I consider for automation to be successful are :slight_smile:

  • Tracking how much time it takes to execute - how much time it is saving
  • Consistent quality results, release over release.
1 Like

One hand-wavy metric I use to measure success is whether automation allows me to do things that would be infeasible to do manually. It’s not the usual quantitative metric that would look good in a status report, but it definitely scores some points if you can demonstrate it.

For example, my team was able to measure the runtime performance cost of different debug info formats for the Linux kernel. It required running a slew of benchmarks and tests across a wide variety of hardware and then analysing the results. If we had been unable to automate all of that, we never would have even attempted it.

4 Likes

Interesting topic. To me the measure of success of test automation is: how often can you release? How long does it take for a piece of code to go from check-in to production?

If we’re talking minutes, then it is a success.
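
Putting a number on that can be as simple as diffing timestamps per change (a sketch; the timestamps would come from your VCS and deployment logs, and the values here are made up):

```python
# Lead time per change: time from check-in to running in production.
# These timestamp pairs are made up; real ones would come from the
# version control system and the deployment pipeline's logs.
from datetime import datetime
from statistics import median

changes = [
    # (commit time, deployed-to-production time)
    (datetime(2019, 5, 1, 9, 12), datetime(2019, 5, 1, 9, 41)),
    (datetime(2019, 5, 1, 13, 5), datetime(2019, 5, 1, 13, 27)),
    (datetime(2019, 5, 2, 10, 30), datetime(2019, 5, 2, 11, 55)),
]

lead_times = [(deployed - committed).total_seconds() / 60
              for committed, deployed in changes]

print(f"median lead time: {median(lead_times):.0f} minutes")
```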

3 Likes

The SUT appears to be some form of web-based system, so I would ask what platforms and operating system versions are used/tested?

Personally I would never automate to find defects; I would automate only for regression testing and for data injection for performance purposes.

In the case of web services, on the user experience side you need eyes on it, i.e. a tester checking that all is correct; you can then move on to automating this to cover all platforms and operating systems/browsers.

Great topic!

To me, successful automation is when everyone in the team has confidence that requirements of the product are fulfilled. When developers get fast feedback after their valuable code changes, when regression doesn’t reach environments where real testing happens, when testing is not about verifying that something that used to work still works. Simply put, when everyone in the team spends their time only on what is valuable and requires their skills and knowledge.

I agree that metrics can be a slippery slope. What is important to stress here is that once you start providing certain metrics, even if they are not the best ones, your stakeholders/managers may cling to them, so it’s better to avoid metrics rather than provide something that becomes a burden and does not really reflect the situation. And all of what you provide, you should find valuable yourself.

What is automation success for you?
The usual case is that manual tasks were replaced. Then, time gets into the picture. For teams working in Continuous Delivery this leads to the question: how long does it take to run the tests before releasing? We need fast feedback in agile, so it would be great to do frequent deployments and get answers immediately.

If all the tests passed, this does not make me think that automation is successful. What I have heard before is that if a test is passing all the time, it’s rather useless. If it passes every time for 5 years, do we really need it? Is it really passing, or does it not work anymore? We forget. A test which is always green is nice to look at and may add to nice numbers, but what does it actually test, and do we need it? That should be the question. Some extra logging with test results could be useful, and then analysis of the logs.
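
A small pass over the result history is enough to flag those always-green tests for review (a sketch; the history structure below is invented, in practice it would be loaded from archived CI reports or a results database):

```python
# Flag tests that have never failed across the recorded history, as
# candidates for a "do we still need this?" review. The data is invented.
history = {
    "test_login":          ["pass", "pass", "fail", "pass"] * 50,
    "test_export_report":  ["pass"] * 250,   # green for every recorded run
    "test_currency_round": ["pass", "fail"] * 80,
}

MIN_RUNS = 100  # only consider tests with a long enough record

always_green = [
    name for name, results in history.items()
    if len(results) >= MIN_RUNS and all(r == "pass" for r in results)
]

for name in always_green:
    print(f"{name}: never failed in {len(history[name])} runs - still needed?")
```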

Automation’s value is in being a helper for repetitive tasks. There was a saying somewhere that if you need to do something more than once in testing - you should automate it. I believe that is the biggest success.

Also, looking at the situation described in the original post, what could be useful is checking whether the tests are created at the right levels. Is there a test pyramid? We don’t need exact numbers, but we need balanced ratios, and this could help with fast feedback for CD.

In summary, metrics are a tricky topic and they may not answer automation success, but a few which I would consider:

  • How long does it take to run the tests in order to confidently deploy ready code to production/staging? (Or get feedback)
  • How stable is it? (# of false negatives, for example)
  • Are the results of test runs clear and their reporting automated? How long does it take from the issue trigger to actually get the issue solved? (If it’s CD, it should be triggered with each build run, but if the metric for first question is long - it may not be in the pipeline)
  • What tasks did the automation ease, and how much easier are they with it? (May not be a number; more of a show and tell)
1 Like

Hi Ali,
This is a good question when doing automation testing.
Below are some criteria to measure automation success, based on my experience:

  1. The framework is clear/logical/easy to use, powerful enough to capture logs/results/reports, and can integrate with other CI tools, e.g. Jenkins…
  2. The stability of the written scripts
  3. The scope of automation testing (what percentage of test cases is automated? how much do the automated test cases cover?)
  4. Performance of the automation test suite (how long does it take to run? how much time do we need to get the final result after a run?)
  5. The code can be released with high quality at the committed time.

Hi all,

This is a regular question, and one that clients ask frequently.
Below are some ideas to measure automation success based on my experience:

  1. We should analyse and estimate the difficulty of automation and the risks of your project. After that, you will know how many test cases can be covered by automation. I have worked on some projects where the coverage was about 50%, and others where it was 90% or 100%, but all were successful. It depends on the project you implement automation for.

  2. The framework should be clear and have full support for logs/results/reports/debugging/data-driven testing. The framework should also be clean code, have a low cost of maintenance, be easy for others to understand and maintain, and be able to integrate with CI/CD tools.

  3. Track the time taken when running the regression tests through automation, to make sure the automation framework decreases the time compared with manual testing. If the time does not decrease much, stays the same, or even increases, then we should reconsider whether to automate: the framework and scripts that you implemented are not successful.

  4. The framework team should design the framework so that it is clean and easy to use for manual QA or scripting QA.

  5. Scripters should be trained in clean code and in writing stable scripts. We must run the automated regression tests weekly or daily, so stable scripts will help us decrease the time invested in debugging and fixing the scripts.

My immediate thought after reading this question is that I would measure it this way:

a. Value the tests bring.
b. Reliability of test results.
c. Ease of use.

1 Like

We can check automation success against several parameters if we look at how different software testing companies approach it. Most projects now rely heavily on automation, as it saves a lot of time and effort.

  • One way to check automation success is by running the automated tests on a daily basis with the help of Jenkins scheduled jobs, which we call the CI/CD/CM (Continuous Integration/Continuous Development/Continuous Monitoring) process. On the basis of these results in the Jenkins pipeline, the success of automation can be measured. If the pipelines consistently appear green, that means the code is stable, because most of the time the same code is run on different environments.
  • The next parameter we can take into consideration is regression testing, where we execute the automated code on different deployed instances/builds to verify existing defects. If the code is able to reproduce/check the defects, that again can be counted as automation success. As this automation process also saves time and reduces manual effort, the priority should be that the code is written with sound logic and is robust, so that automation brings correct results. With such code we will always be able to find new defects in the application, and finding defects with the help of automation is considered a great thing.
  • Another important parameter is change of functionality in the application. For a small change in functionality, a manual tester needs to execute the whole set of test cases just to make sure the functionality is working fine; on the other hand, if we have an adaptive framework, we just need to change a few methods and locators and rerun the code. This will bring up the results and save a lot of time.

Hope this is helpful for you.