What makes automated tests/checks flakey?

Hello all,

I'm back again with another question that I'd love to get people's thoughts on. Specifically:

What makes automated tests/checks flakey?

I think there are many reasons why tests/checks become flakey, but I'd love to hear individual perspectives and experiences. Do you have any horror stories you could tell about facing flakey tests/checks?

5 Likes

When I worked for Elastic, we had automated "functional tests", but these weren't really functional tests so much as e2e integration tests. We had a custom test harness that spun up Elasticsearch, spun up the application under test (Kibana), added test data and did some setup, and then started the Selenium-style tests. As you can imagine, tests could fail at any point during any of these processes.

Unexpected latency starting the server.
Server crash.
Unexpected latency setting up test data.
Using hard coded sleep values to ensure state.
Trying to automate tests that are poor candidates.
And much much more!

2 Likes

An oldie but goodie: Processing dates.

I saw an interesting one not too long ago. Some tests failed when computing date differences, even though the server was in the same time zone as the user at all times. For some 'date pairs' the result was OK; for other combinations of a start date and end date it was wrong.

While the user was in the same time zone as the server at all times, it turned out that both changed time zone now and then, even without moving geographically. Thanks, daylight saving time! :exploding_head:

That meant an hour might be 'missing' in the difference, and therefore a day wasn't completed. And that caused a month to be considered as not completely passed. :crazy_face:

Processing dates and times is harder than you think, even if you think you took that fact into consideration.
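To make that concrete, here is a minimal sketch in Python (the zone and dates are invented for illustration, not taken from the actual incident) of how one spring-forward transition leaves an hour "missing" and makes whole-day logic decide the month never completed:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

tz = ZoneInfo("Europe/Berlin")  # any zone with DST will do; chosen arbitrarily

start = datetime(2023, 3, 25, 12, 0, tzinfo=tz)  # the day before clocks spring forward
end = datetime(2023, 4, 25, 12, 0, tzinfo=tz)    # same wall-clock time, one month later

diff = end - start          # aware subtraction gives real elapsed time
print(diff)                 # 30 days, 23:00:00 -- the DST hour is "missing"
print(diff.days)            # 30, so "has a full month (31 days) passed?" says no
print((end.date() - start.date()).days)  # 31 -- naive date maths disagrees
```

Any check along the lines of `diff.days >= 31` will pass for most date pairs and fail only for those spanning the transition, which is exactly the "some pairs OK, some wrong" flakiness described above.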

Just as our team finally got our automated test suites under control and quite stable, we merged with another team and "inherited" a bunch of tests which turned out to be flaky.

The flakiness has various causes:

  • Poorly written tests with static delays etc., which would always succeed on one system but not on another
  • Regression test systems with wildly varying specs and performance, which exposed problems with the robustness of the tests
  • Issues due to running tests in parallel, where one test impacts another
  • Hard to catch and trigger software issues that occasionally generate test failures
  • ā€¦

The various sources of flakiness are gradually being tackled, but until we get there, a lot of time is wasted trying to identify which failures are genuine and which are just flakiness.

2 Likes

Latency and other networking/environmental issues are the biggest problem we face when it comes to test flakiness, so I agree with you there.

We use BrowserStack's Automate mobile product, which works well in the main, but if there is an issue on their side then you can get flakey results even though the tests are robust.

Poor candidate tests, where the test data or the content under test changes unexpectedly, are something we've had to deal with as well.

I'd hope we were past the use of thread.sleep() in this day and age. Those are always a recipe for disaster.
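For anyone still wrestling with leftover sleeps, here's a minimal sketch of the usual replacement in Selenium's Python bindings: an explicit wait on the condition you actually care about, with an upper bound. The URL and element id are invented for illustration.

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://example.org/login")   # placeholder URL

# Instead of time.sleep(5) and hoping the page is ready, wait (up to 10s)
# for the specific condition the next step depends on.
button = WebDriverWait(driver, timeout=10).until(
    EC.element_to_be_clickable((By.ID, "submit"))   # hypothetical element id
)
button.click()
```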

1 Like

Nothing, there are no flakey tests, only flakey people implementing bad tests :wink:

3 Likes

For the latency issue, I know that Google Chrome has a setting where you can artificially add latency to your tests to simulate possible latency in CI. We tie that to an environment flag and add some latency when we are working with tests locally, to help simulate a bare-bones CI machine. It helps us catch some tests that would have flaked in CI. Unfortunately, I think only Chrome has this setting.

The code base has changed quite a bit since I worked there but I found it.

This is where we mapped it.

and also:

This is an article showing how to set the latency manually while running tests.
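A minimal sketch of the same idea, using Selenium's Chromium-only network-conditions command in the Python bindings; the environment variable name and the numbers are invented, and this isn't necessarily how the Kibana suite wires it up:

```python
import os
from selenium import webdriver

driver = webdriver.Chrome()

# Only add artificial latency when the (hypothetical) flag is set, e.g. for
# local runs that should behave more like a slow CI machine.
if os.environ.get("SIMULATE_CI_LATENCY"):
    driver.set_network_conditions(          # Chromium-only; no Firefox/Safari equivalent
        offline=False,
        latency=200,                        # extra milliseconds per request
        download_throughput=500 * 1024,     # bytes per second
        upload_throughput=500 * 1024,
    )

driver.get("https://example.org")
```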

Flakey tests are perhaps as fundamental a time suck as you let them be. In my experience they stem from testing code paths that our dev team did not write, which is roughly what everyone else has described above, and they are the reason I continuously warn people, for example, not to test things like captchas in an end-to-end test.
Perhaps it's unfair to say that testing other people's code builds flakey tests, but for me it's generally indicated by the time.sleep(5) statements I see in our Python scripts. I love spotting these during code reviews for that reason. Delays. They may have been added to allow an Android system setting to ripple through the device, or just for an element to appear after a JavaScript animation on a web page, or even for one web service to talk to another before the API call you make will work!

I turned off implicit waits in our Selenium connection start-up code as well. Implicit waits have their place, but they hide the other cause of flakey tests: being temporally sensitive without being context sensitive. Never replace uncertain state with uncertain time. Get the devs to expose the system state via a secure, debug-hidden API; you will be amazed at how much it speeds up your tests and how much it stabilises them to switch to using system state. I still have dozens of flakes, though; there is no silver bullet.
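As a rough illustration of "state over time", here is a minimal sketch that polls a state endpoint with a deadline instead of sleeping. The /internal/debug/state URL and the sync_complete field are hypothetical; the shape of any real debug API will differ.

```python
import time
import requests


def wait_for_state(url, predicate, timeout=30.0, interval=0.5):
    """Poll `url` until `predicate(json_body)` is true or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        response = requests.get(url, timeout=5)
        if response.ok and predicate(response.json()):
            return
        time.sleep(interval)   # short poll interval, not a guess at readiness
    raise TimeoutError(f"State at {url} not ready within {timeout}s")


# Example use in a test setup step:
wait_for_state(
    "http://localhost:8080/internal/debug/state",      # hypothetical debug-only API
    lambda state: state.get("sync_complete") is True,   # hypothetical readiness field
)
```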

  • Unstable test environments
  • Poorly written tests, e.g. utilising the wrong waits
  • SUT in the wrong state at the start of the test (e.g. test setup done wrong, or another test's cleanup wasn't done properly)
1 Like

flaky = "(of a device) prone to break down or fail."

People:

  • by mistake - when designing or coding it;
  • by choice - knowing there's a high chance of ending up with unstable things, but keeping that direction anyway;
  • by listening to or obeying others - someone else dictates what and/or how some automation should be implemented;
  • by pride - wanting to keep some number high, or thinking one can fix a problem: not maintaining, cleaning or rewriting the checks that make sense, and patching or leaving flakey code in place;
  • by indifference - it's accepted and acceptable for the company to have this;
  • by selfishness - not working with others, or not being supported by others, to improve stability, testability, infrastructure, code, approach, etc.
1 Like

Assumptions and change.

And Angular :smiley:

1 Like

There are no flaky tests! :slight_smile:

We have flaky systems, and tests uncover that flakiness. I always wonder why we are so quick to call tests flaky, yet I've never heard anyone talk about a flaky environment/product/app.

The exception is poorly written tests, but again, why do we allow that code to execute? We should have the same criteria for production code and for tests; at the end of the day, if we are going to produce code, we should do it to the same standards.

2 Likes

In my experience, it's mostly due to various conditions external to the actual software. Things like latency, variable times to display web application information, and so on - these are almost always an issue in end-to-end automation and are almost always the result of factors outside the control of the developers or testers.

Unfortunately, sometimes there's no choice about where to automate. Older software can be impossible to unit test because UI and business logic are intertwined. Web services may not expose a testable API - I've dealt with a web service where testing was a matter of dropping prepared files into the designated directory and watching what happened - automation consisted of building the files on the fly and parsing the XML dropped into the results directory.

In my experience almost anything in-app can be handled. Interactions with the computer running the app, the network, the internet, updates deciding to happen at an awkward time… these are usually the cause of flakiness.

1 Like

IMHO and in no particular order, the obvious and probably most common candidates would be amongst these:

  • poorly performing test environments
  • poor choice of element locator strategy
  • waiting without a limit for a dynamic element state that never materialises
  • equally troublesome - hard coded sleep waiting
  • excessive or encouraged use of test failure retries
  • tests that are not sufficiently independent, relying on the output or result of others before them
  • relying on external services and not mocking when appropriate (see the sketch below)
  • multi-threading tests without considering 'user' overlap, session invalidation, etc.
  • application state not being set up or torn down correctly prior to test execution.

At one time or another, Iā€™ve been bitten by all of these.
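On the mocking bullet in particular, here is a minimal sketch of stubbing out an external service with Python's unittest.mock, so the check no longer depends on the network being healthy. The module, function, and exchange-rate rule are all invented for illustration.

```python
from unittest import mock

import myapp.pricing  # hypothetical module under test


def test_quote_uses_exchange_rate_without_hitting_the_real_service():
    fake_response = mock.Mock(status_code=200, ok=True)
    fake_response.json.return_value = {"rate": 1.25}

    # Replace the outbound HTTP call made inside myapp.pricing with a stub.
    with mock.patch("myapp.pricing.requests.get", return_value=fake_response):
        quote = myapp.pricing.quote_in_usd(amount_gbp=100)

    assert quote == 125.0
```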

1 Like

Thank you everyone for your thoughts. Flakey checks are definitely an annoying issue for all of us and I feel that frustration in your comments :smiley:

One small request: would any of you be able to expand on your thoughts from this conversation in this question as well?

Welcome to the community, @ashish_te. A very insightful way to start off, and you've actually nailed it; where have you been all this time, I wonder?
I'm not a web app tester, but element locator choices are a long-running point of debate and stem from developers writing "untestable code" for the web. On native platforms this problem is less common, but it still causes test code confusion anyway. Your point about poorly performing test environments is also spot on. I was pricing up Chromebooks for testing yesterday and I'm deliberately choosing slightly higher-spec devices for this very reason: testing on a slow system is good for finding bugs of all kinds, but skimping on resources is never good for automated tests. Cheers.

1 Like

Often it's something very basic: using mostly GUIs for automation.

Graphical user interfaces, as they are made for humans, can be very unreliable for automation.
Two technically different (under the hood) versions of a GUI can look the same to a human. No problem for a human, but a problem for a machine.

I created this picture to show that:

I find "End 2 End" done via the GUI very wasteful. I know it is often an escape path for check automators who don't get much support for testability of the product.
Welcome to automation hell if you lack the time to adapt the automation to changes in the UI … I have been there.

In my current project we check most business logic via the API, and the GUI part is more about "Does the UI work at all?", which includes the connection to the server.
The API part takes around 1.5 hours and is quite stable in terms of the check code calling the application. The GUI part runs in around 10 minutes.
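As a rough picture of what those API-level business-logic checks can look like, here is a minimal sketch in Python; the endpoint, payload, and discount rule are invented for illustration and are not from the actual project.

```python
import requests

BASE_URL = "http://localhost:8080"   # assumed local test deployment


def test_bulk_order_gets_discount():
    order = {"items": [{"sku": "ABC-1", "quantity": 100, "unit_price": 2.50}]}
    response = requests.post(f"{BASE_URL}/api/orders/quote", json=order, timeout=10)
    assert response.status_code == 200
    # Hypothetical rule: orders of 100+ units get a 10% discount.
    assert response.json()["total"] == 225.0
```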

What makes automated checks flakey?
Often a lack of testability in the product and problematic approaches to automation.
Which ultimately also comes down to a lack of support from the organisation.

insert here a rant about letting one program interact with another in the role of simulated users, unsupervised, for hours

insert here a plea for "semi-automation", using it as a tool to support exploration

4 Likes
  • Visibility of the elements the test script will interact with, particularly in mobile automation. Due to all the different screen sizes, sometimes the expected elements show up right from the start, and other times they don't.
  • The appearance of random modals or dialogs (ads, marketing campaigns, etc.) that interrupt the script flow.
3 Likes

Are there rock-solid techniques for identifying the click listener triggered by a specific click event?

I've tried manually checking DOM events with minimal success, and frequently the tree of events is very deep.

The moment the tester resorts to inspecting the implementation, they build flakiness into the test. So yeah, Daniel, try not to go down the rabbit hole; it only ends up making the test more flakey. Nothing is ever rock-solid, and there is no silver bullet that bypasses a human actually talking to another human. Get the developers of the app to make it easier to test instead, via hooks or other means.

Usually flakey tests are flakey because the tester has no way of asking the devs to make the app more "testable", and as a result both the system developer and the tester suffer the time cost of not being on the same "team" working to make this better together.

2 Likes