We are close to a 95% pass rate for our nightly regression test runs. However, we always get a 4%-5% failure rate due to timeout and environmental issues. The other problem is that there is no fixed pattern to which 4% will fail. For example, today a particular set of tests fails; tomorrow some other tests fail due to timeout issues.
Is anyone else feeling the same pain?
Any resolutions or suggestions are highly appreciated!
Simple as that: I DISLIKE false positives and flaky tests in my pipelines.
I’d rather pull the false positives and flaky tests out and put them in the manual regression suite.
I don’t want to spend my time debugging why something failed, only to find that the cause was an unstable environment or a timeout.
That’s why we remove such flaky tests (or rewrite them) and test those scenarios manually.
Remove them
I’m not saying you need to remove just one flaky test; if 200 of your 1,500 tests are flaky, remove them and test them manually, or fix them so they are no longer flaky.
It’s not worth the time and effort to check them daily to work out whether each failure is a false positive or not. I’d rather remove 200 flaky tests and test them manually.
Initially, you’re not testing; you’re spending your time coding a separate product (UI automation) that you expect will tell you nothing. See Cem Kaner’s definition of a good test, which goes something like: it reveals new information about the product. If you didn’t find anything new, your test has failed.
Then, instead of stopping there and doing some testing, or building automation that will aid your testing, you check the UI automation daily, going through the failures. More time is spent there again.
Then you don’t stop there either: you want to ‘fix’ the failures and iterate through potential solutions. In the meantime, no testing happens because you’re busy.
Then you want to cover more and automate more, because it’s fast. The coding time increases, maintenance increases, time spent looking at failures increases. But then, is anyone left in the team actually finding problems in the product?
And I’m assuming you’re not spending much time on the 95% of automated UI checks that pass. If no failure occurs, can you be certain you’re even looking for the appropriate things, the ones that could fail and that are important?
I believe the 95% should get more attention. As has been recommended here, delete the flaky stuff. Reduce the time you’re wasting on not doing good testing.
1 - Isolate the component
E.g. you have a scenario about a search bar, but you end up rendering the footer component too?
By rendering only the search bar, you remove the fragility of your search bar scenario with respect to the footer.
2 - Isolate the UI
Are you executing code that is not your target, e.g. business logic? By isolating the UI, you remove the fragility of your UI scenarios with respect to other pieces of code.
3 - TDD
By moving in tiny steps, each driven by a failing check, you will tend to always have a stable suite of checks in a green state. If you put something into the production code that makes the suite unstable, that change will be tiny, easy to identify and easy to revert. The whole problem happens when you make big moves and only validate their stability after putting in all that work - do the contrary: move small and check continuously. (A small sketch of points 2 and 3 follows below.)
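To make points 2 and 3 a little more concrete, here is a minimal pytest-style sketch in Python. Everything in it (`SearchBar`, `StubCatalogue`, the behaviour being checked) is invented for illustration; the point is that the check exercises only the UI-facing code, with the business logic replaced by a stub, and that the check is written before the tiny production step that makes it pass.

```python
# test_search_bar.py - written first (red), then the production code below turns it green.
class StubCatalogue:
    """Stands in for the real business logic so the check exercises only the UI layer."""
    def search(self, term):
        return ["Blue mug", "Blue teapot"]

def test_search_bar_shows_one_line_per_result():
    bar = SearchBar(catalogue=StubCatalogue())
    assert bar.render_results("blue") == "Blue mug\nBlue teapot"


# search_bar.py - the tiny production step that turns the check green.
class SearchBar:
    def __init__(self, catalogue):
        self._catalogue = catalogue  # injected, so checks never touch the real business logic

    def render_results(self, term):
        return "\n".join(self._catalogue.search(term))
```

Because the only collaborator is an in-memory stub, there is nothing in this check that can time out or depend on the environment.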
I would question whether you need that many end-to-end tests. You can check out this blog post from Google where they talk about the need for E2E tests but show how and why you should limit them. If the tests are flaky, I personally would remove them. They aren’t doing much for you if you’re having to go back in and check them constantly anyway, which ruins the point of automation.
If you do need them, then I would still remove them for now, work on isolating the issues and problems, fix them, and re-introduce them to the nightly builds.
Yes, I inherited an automated test suite with this very problem. Reducing the failure rate of these tests was not easy.
First, we rerun any tests that fail - even if a test has a history of failing for no reason. That way we know whether the failures are hiding a bug that may have been introduced.
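If the suite happens to run on pytest, one way to automate that rerun policy is the pytest-rerunfailures plugin; a minimal sketch, with an invented test name:

```python
# Requires: pip install pytest pytest-rerunfailures
import pytest

# Rerun this known-flaky check up to 2 extra times, pausing 5 seconds between attempts,
# so a one-off environment blip does not fail the whole nightly run on its own.
@pytest.mark.flaky(reruns=2, reruns_delay=5)
def test_checkout_banner_is_shown():
    ...  # drive the UI and assert on the banner here (placeholder)

# Or rerun every failing test in the run from the command line:
#   pytest --reruns 2 --reruns-delay 5
```

The run output also flags which tests only passed after a rerun, which is essentially the ‘flaky’ list described below.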
If the tests pass on the rerun, I mark them as ‘flaky’ and aim to reduce their flakiness in the future. Here are a few things I do to reduce flakiness:
Run the test multiple times, at least 20, to get an idea of how often it will fail.
See if there is a common step within the flaky tests where the failure occurs. It’s usually a step within the test that is flaky, rather than the test itself. Fix the step and you may reduce flakiness across multiple tests.
For timing issues, consider introducing waits into the test (not sleeps) - for example, wait for a button to load. A common cause of a flaky test is a page or an element on the page not loading quickly enough, so have the test wait for the element it is about to interact with to finish loading before continuing.
(Check out the section on explicit waits here: Waiting Strategies | Selenium. A minimal sketch follows below.)
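For anyone using Selenium through the Python bindings, swapping a sleep for an explicit wait looks roughly like this; the URL and locator are made up for the example:

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://example.test/search")  # placeholder URL

# Instead of time.sleep(5), poll for up to 10 seconds and continue as soon as the
# button is actually clickable; a TimeoutException is raised only if it never is.
search_button = WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.ID, "search-button"))  # placeholder locator
)
search_button.click()

driver.quit()
```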
Without more information about the specific issues, I can’t provide any more pointers on reducing flakiness in tests. All I can say is that reducing the failure rate takes time. Take it one test at a time: investigate, try to improve that test’s pass rate, then move on to the next test. Over time, you will eventually get 99%+ of tests passing each time.
At the moment, on my automated test runs, I get a 100% pass rate about once in every 4 test runs, which doesn’t sound brilliant, but when I first joined the company we were lucky if we had all tests pass once a month. Things are gradually getting better.
What is the functional coverage? Having some tests that are reliable is one thing; however, you have only truly arrived when you can use just those results, and no other acceptance or UX testing results, to authorise a release.
1. Automatically re-running a failed test really only tells you one thing: your product is hard to test. Address that.
2. UI testing is cool and important, but does it cover, end to end, all the little things users hit when they first ever see your platform - that initial, out-of-the-box experience? Turning off new users is not a business win. Aim for coverage, not pass rate.
3. Don’t beat yourself up over flaky tests. They are a weakness, but also a strength: they teach you how to deal with hard-to-test features and hard-to-automate environments. Most testers gradually dumb down their environments to make their flaky tests more stable, until eventually the test environment no longer resembles the real world. Be careful of bubble-wrapping flaky tests. Flaky tests are your Davy lamp (Davy lamp - Wikipedia).
Thank you all for the help. Current test coverage is about 70%, and we will move the flaky tests into a separate test run and stabilise them by redesigning them or converting them into API or unit tests.
1. Make the tests fully hermetic. Is there a network call going out over a network? Run it against a mock API instead. Is it using a database? Set the database up locally with fixed data and tear it down after each test.
In practice I think almost nobody makes end to end tests hermetic. It’s very, very hard. It is a worthwhile goal though, for more reasons than just flakiness.
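A rough sketch of what a hermetic test can look like in a pytest suite; the `app` module, its `get_exchange_rate` function and the tiny schema are all invented stand-ins for whatever your code actually calls:

```python
# Requires: pip install pytest. The `app` module and its functions are hypothetical.
import sqlite3

import pytest

import app  # the code under test (hypothetical)


@pytest.fixture
def db():
    # Local, throwaway database with fixed data: no shared environment to drift or flake.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
    conn.execute("INSERT INTO orders VALUES (1, 9.99)")
    yield conn
    conn.close()  # torn down after every test


def test_order_total_in_euros(db, monkeypatch):
    # Replace the outbound network call with a canned answer instead of hitting a real API.
    monkeypatch.setattr(app, "get_exchange_rate", lambda currency: 0.9)
    assert app.order_total_in(db, order_id=1, currency="EUR") == pytest.approx(8.99, abs=0.01)
```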
2. Remove everything resembling a sleep and replace it with an explicit wait. In practice this is fairly easy to do, but sleeps are very common.
Even after you do all of this, however, you will probably still see flakiness. This is where the third step comes in.
3. Identify sources of non-determinism in the code and fix or eliminate them.
Step 3 is really tricky because you either need to be a dev (like me) or you need support from devs to fix these things, and there are always a lot of them. They will include things deep down in the code like:
Looping through a data structure without a deterministic order like a hashmap.
SELECT queries buried deep in the code that don’t have an ORDER BY.
Usage of time (this can often be fixed by mocking time).
Deliberate use of random numbers (this can be fixed by either fixing a seed on test runs or mocking the RNG); a short sketch of the time and RNG fixes follows below.
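To illustrate those last two items, a minimal sketch; `lottery.draw` is an invented function that internally calls `time.time()` and `random.random()`:

```python
import random
from unittest import mock

import lottery  # hypothetical module whose draw() uses time.time() and random.random()


def test_draw_is_deterministic():
    # Pin the clock: every call to time.time() during the test returns the same instant.
    with mock.patch("time.time", return_value=1_700_000_000.0):
        # Pin the RNG: seeding the global generator makes random.random() repeatable.
        random.seed(1234)
        first = lottery.draw()
        random.seed(1234)
        second = lottery.draw()
    assert first == second
```

One caveat: patching "time.time" covers code that does `import time` and calls `time.time()`; if the code does `from time import time`, you have to patch the name where it is used instead.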
The 2nd worst thing is when these things are buried in an open source library. The worst thing is when they are buried in a closed source library.
I’ve managed to achieve 1, 2 and 3 on small projects. I’ve rarely achieved it all on large projects - though that was mostly due to lack of time.