What makes automated tests/checks flakey?

mwinteringham · 2 December 2023 12:49

Hello all,

I’m back again with another question that I’d love to get people’s thoughts on. Specifically:

What makes automated tests/checks flakey?

I think there are many reasons why tests/checks become flakey, but I’d love to hear individual perspectives and experiences. Do you have any horror stories you could tell about facing flakey tests/chats.

powercoder · 3 December 2023 06:11

When I worked for Elastic, we had automated “functional tests” but these were not functional tests but rather e2e integration tests. We had a custom test harness that spun up elasticsearch, spun up the application under test (Kibana), added test data and did some setup, and then started the Selenium style tests. As you can imagine, tests could fail at any point during any of these processes.

Unexpected latency starting the server.
Server crash.
Unexpected latency setting up test data.
Using hard coded sleep values to ensure state.
Trying to automate tests that are poor candidates.
And much much more!

seasidetesting · 3 December 2023 12:47

An oldie but goodie: Processing dates.

I saw an interesting one not too long ago. Some tests failed when using date differences, even though the server was in the same times time zone as the user at all times. – For some ‘date pairs’ the result was OK, for other combination of a start date and end date they were wrong.

While the user was in the same time zone as the server at all times, it turned out that both changed the time zone now and then – even without moving geographically. Thanks daylight savings time!

That meant an hour might be ‘missing’ in the difference, and therefore a day wasn’t completed. And that caused a month to be considered as not completely passed.

Processing dates and times is harder that you think, even if you think your took that fact into consideration.

frederickv · 3 December 2023 13:30

As our team finally got our automated test suites under control and quite stable, we merged with another team and “inherited” a bunch of tests which turned out to be flaky.

The flakiness is caused by various reasons:

Poorly written tests with static delays etc … which would always succeed on one system but not always on another
Regression test systems with wildly varying specs and performance, which exposed problems with the robustness of the tests
Issues due to running tests in parallel where one test does seem to impact another
Hard to catch and trigger software issues that occasionally generate test failures
…

The various sources of flakiness are gradually being tackled, but until we get there, this introduces a lot of time wasted trying to identify which failures are genuine, and which ones are flakiness.

brgibb · 3 December 2023 17:31

Latency and other networking/environmental issues are the biggest problem we face when it comes to test flakiness, so agree with you there.

We use Browserstack’s automate mobile product… which works well in the main but if there is an issue their side then you can get flakey results even though the tests are robust.

Poor candidate tests such that the test data or content on test changes unexpectedly is something we’ve had to dwal with also.

I’d hope we were past the use of thread.sleep() in this day in age. Those are always a recipe for disaster.

kristof · 3 December 2023 19:20

Nothing, there are no flakey tests, only flakey people implementing bad tests

powercoder · 3 December 2023 19:43

For the latency issue, I know that Google Chrome has a setting where you can artificially add latency to your tests to simulate possible latency in CI. We set that to an environmental flag and add some latency when we are working with tests locally to help simulate a barebones CI machine. It helps us to catch some tests that would have flaked in CI. Unfortunately, I think only chrome has this setting.

The code base has changed quite a bit since I worked there but I found it.

This is where we mapped it.

github.com

elastic/kibana/blob/main/test/functional/services/remote/webdriver.ts#L324C15-L324C15


      
          session.setNetworkConditions(networkProfileOptions);

and also:

github.com

elastic/kibana/blob/main/test/functional/services/remote/network_profiles.ts

/*
 * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
 * or more contributor license agreements. Licensed under the Elastic License
 * 2.0 and the Server Side Public License, v 1; you may not use this file except
 * in compliance with, at your election, the Elastic License 2.0 or the Server
 * Side Public License, v 1.
 */

export type NetworkProfile = 'NO_THROTTLING' | 'FAST_3G' | 'SLOW_3G' | 'OFFLINE' | 'CLOUD_USER';

export interface NetworkOptions {
  offline: boolean;
  latency: number;
  download_throughput: number;
  upload_throughput: number;
}

const sec = 10 ** 3;
const MBps = 10 ** 6 / 8; // megabyte per second (MB/s) (can be abbreviated as MBps)

This file has been truncated. show original

This is an article showing how to set the latency manually while running tests.

conrad.connected · 4 December 2023 07:59

Flakey tests, are perhaps as fundamental a time suck as you let them be. In my experience they stem from testing code-paths that our dev team did not write. Which is roughly what everyone else has just described above. And are the reason I continuously warn people for example not to test things like kaptchas in a end-to-end test.
Perhaps it’s unfair to say testing other peoples code builds flakey tests, but generally for mee it’s indicated by time.sleep(5) statements I see in our python scripts. I love these during code reviews for that reason. Delays. Which may have been added to allow an Android system setting to ripple through the device, or just for an element to appear after a javascript animation on a web page. Or even for a web service to talk to another before tha API call you make will work! I turned off implicit waits in our selenium connection start code as well, implicit waits have their place, but they hide the other cause of flakey tests, being temporally sensitive without being context sensitive. Never replace uncertain state, with uncertain time. Get the devs to expose the system state via a secure and debug-hidden API, you will be amazed at how much it will speed up and additionally how much it will stabilize your tests to change to using system state. I still have dozens of flakes, there is no silver bullet.

deament · 4 December 2023 09:54

Unstable test environments
Poorly written tests eg. Utilising the wrong waits
SUT in the wrong state at the start of the test. (Eg. Test Setup done wrong or another test’s cleanup isn’t done properly)

ipstefan · 4 December 2023 10:30

flaky = “(of a device) prone to break down or fail.”

People:

by mistake - when designing or coding it;
by choice - there’s a high chance of getting into unstable things, but the direction is kept;
by listening to or obeying others - someone else leads what and/or how some automation should be implemented;
by pride - wanting to keep some number high, or thinking one can fix a problem: not maintaining, cleaning, rewriting checks that make sense and patching or leaving flakey code exist;
by indifference - it’s accepted and acceptable for the company to have this.
by selfishness - not working with others or not being supported by others to increase stability, testability, infrastructure, code, approach, etc…

jon_thompson · 4 December 2023 10:44

Assumptions and change.

And Angular

cristiano.cunha · 4 December 2023 11:43

There are no flaky tests!

We have flaky systems and tests uncover that flakiness, I always wonder why we were so fast in calling tests flaky but I never heard of a flaky environment/product/App?

The exception is poorly written tests but again why do we allow that code to execute, we should have the same criteria for production code and for tests, at the end of the day if we are going to produce code we should do it with the same standards.

katepaulk · 4 December 2023 13:46

In my experience, it’s mostly due to various conditions external to the actual software. Things like latency, variable times to display web application information, and so on - these are almost always an issues in end-to-end automation and are almost always the result of factors outside the control of the developers or testers.

Unfortunately, sometimes there’s no choice about where to automate. Older software can be impossible to unit test because UI and business logic are intertwined. Web services may not expose a testable API - I’ve dealt with a web service where testing was a matter of dropping prepared files into the designated directory and watching what happened - automation consisted of building the files on the fly and parsing the XML dropped into the results directory.

In my experience almost anything in-app can be handled. Interactions with the computer running the app, the network, the internet, updates deciding to happen at an awkward time… These are usually the cause of flakiness in my experience.

ashish_te · 11 December 2023 11:29

IMHO and in no particular order, the obvious and probably most common candidates would be amongst these:

poorly performing test environments
poor choice of element locator strategy
limitless dynamic element state waiting - never materialising
equally troublesome - hard coded sleep waiting
excessive or encouraged use of test failure retries
tests that are not sufficiently independent, relying on the output or result of others before them
relying on external services and not mocking when appropriate
multi threading tests without considering ‘user’ overlap or session invalidation etc
application state not being set up or torn down correctly prior to test execution.

At one time or another, I’ve been bitten by all of these.

mwinteringham · 11 December 2023 11:30

Thank you everyone for your thoughts. Flakey checks are definitely an annoying issue for all of us and I feel that frustration in your comments

One small request, would any of you be able to expand your thoughts around this conversation in this question as well?

conrad.connected · 12 December 2023 08:49

Welcome to the community @ashish_te . Very insightful way to start off and actually nail it, where have you been all this time I wonder.
I’m not a web app tester, but element locator choices are a long running point of debate and stem from developers writing “untestable code” for web. On Native platforms, this problem is less common, but still causes test code confusion anyway. But your point about poorly performing test environments is still nailing it. I was pricing up chromebooks for testing yesterday and I’m totally choosing slightly higher spec devices for just this very reason, testing on a slow system is good for finding bugs of all kinds, but skimping on resources is not good for automated tests at all, ever. Cheers.

sebastian_solidwork · 18 December 2023 12:57

Often very basically: Using mostly GUIs for automation.

GraphicalUserIntefaces, as they are made for humans, can be very unreliable for automation.
2 technically different (under the hood) versions of a GUI can look the same for humans. No problem for a human, but a machine.

I created this picture to show that:

I find “End 2 End” being done by GUI very wasteful. I know that is often a escape path for check automaters without much support for testability of the product.
Welcome to automation hell if you lack time to adapt the automation to the changes in the UI … I have been there.

In my current project we check most business logic via API. And the GUI part is more about “Does the UI works at all”, which includes the connection to the server.
The API parts needs around 1,5 hours and is quite stable in terms of check code calling the application. The GUI parts runs in around 10 minutes.

What makes automated checks flakey?
Often the lack of testability of the product and problematic approaches to automation.
Which finally also relates to a lack of support by the organisation.

insert here a rant about letting one progam interact with another in the role of simulated users unsupervised for hours

insert here a plea for “semi-automation”, using it as tool to support exploration

bautistaeric · 2 April 2024 00:27

Visibility of those elements the test script will interact with, particularly in mobile automation. Due to all different screen sizes, sometime the expected elements show up since the beginning, some other times don’t.
The appearance of random modals or dialogs (ads, marketing campaigns, etc) that interrupts the script flow.

dank8 · 3 April 2024 23:39

are there rocksolid techniques for identifying the click listener triggered from a specific click-event?

ive tried manually checking dom events with minimal success. and frequently the tree of events is very deep.

conrad.braam · 4 April 2024 07:10

The moment the tester resorts to inspecting the implementation, they build flakiness into a test. So yeah, Daniel , try not to go down the rabbit hole, it only ends up making the test more flakey. Nothing is ever rocksolid, there is no silver bullet that bypasses a human actually talking to another human. Get the developers of the app to make the app easier to test instead, via hooks or other means.

Usually flakey tests are flakey because the tester has no way of asking the devs to make the app more “testable” and as a result both the system developer and the tester suffer the time-costs of not being on the same “team” to make this work better together.

Topic		Replies	Views
Flaky Tests - What Causes Them? 🗄️ Archive automation	5	595	22 October 2020
How to achieve 100% pass rate for UI automation in the nightly build? 🙋 Questions learning , automation	13	1444	19 October 2023
How do you "Maintain Automated Tests?" - Automation in Testing Curriculum 🗄️ Archive automation , career-development , curriculum-automation	1	3777	13 July 2022
I've Made Huge Mistakes in Test Automation, so You Don't Have To with Butch Mayhew 🗄️ Archive automation , testbash-new-zealand	6	630	19 November 2020
How do you “Investigate Failed Automated Tests”? - Automation in Testing Curriculum 🗄️ Archive automation , career-development , curriculum-automation	3	4112	28 January 2023

What makes automated tests/checks flakey?

Related Topics