There are no flaky tests! Only flaky people who implement bad tests
But for real, if a test is somewhat flaky or unstable for some reason, either:
refactor it so it’s stable
remove it from CI/CD and test it manually.
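For example, here’s a minimal sketch of the second option, assuming a pytest suite; the marker name (manual_only) and the example test are invented for illustration:

```python
# A minimal sketch, assuming pytest. The "manual_only" marker name and the
# test below are hypothetical, just to show the pattern.

# Register the marker once, e.g. in pytest.ini:
# [pytest]
# markers =
#     manual_only: unstable checks pulled out of CI and run by hand on demand

import pytest


@pytest.mark.manual_only
def test_third_party_sync():
    # Hypothetical check against an async third-party system that has proven
    # unstable in the pipeline; it still exists, it just no longer blocks CI.
    ...


# CI pipeline:      pytest -m "not manual_only"
# Manual/local run: pytest -m manual_only
```

The equivalent works with tags or categories in most other test frameworks and CI setups.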
It’s not some unavoidable challenge; having flakiness is just bad practice, and people tend not to address it, instead re-checking by hand every single time the pipeline fails.
Refactor or Remove, it’s super simple
I think that the concept of “flaky tests” suggests that the goal of automation tools is to have all the checks pass.
I think that the goal of automation is to provide us with information that we can use. When you treat checks that way, a “flaky test” is showing you a valuable gap in your understanding. That may be understanding of risk, the product, the automation tool, the intent of the check, your mental models, the result, third party systems, async relationships, whatever - but if the check is worth having (which it may not be), then it’s worth having it provide useful information that a responsible tester can make use of.
If we do not understand why a check isn’t working the way we want it to, then how do we know what value it is providing? The goal of a stable automation suite is fine, but not at the expense of the assistance to testing that it provides. Otherwise we could just not run it at all and say that we did, for the same value at lower cost. Automation is expensive, and checks need a good reason to exist. I don’t really know why we spend so much time and money writing automation suites and then treat them merely as an obstacle to overcome to get the product out. If a tool or artefact isn’t assisting us (or is only pretending to), then get rid of it. Automation is not a gatekeeper for release; it’s a warning system that helps us know where to look to make valuable qualitative evaluations about the product.
Because not all software is equal, and a check’s purpose can vary a lot, I’d say it’s a tester’s responsibility to evaluate checks with unusual output, and to put in effort and take action based on the perceived value of that check to the strategy in context. A weird check is saying “there’s something you don’t know” - and it’s up to you whether knowing that thing is worth the effort of learning it. Maybe the investigation will reveal bugs in other checks in the automation suite. If we have a good idea of why a check is failing and it isn’t providing value for its cost, then get rid of it.
Interestingly, it also suggests that there may be a whole bunch of other checks with problems that we haven’t looked at, because those problems don’t result in a failure - a warning light where the bulb needs replacing. How do we know that those checks are providing value, too?