Should False-Fail Automated Tests Be Marked "Ignore" or Remain In Your Face?

We have an automated regression test suite that consistently fails about 400 tests out of 1600. These are false fails, failing because of problems with the tests and/or the environment: the tests have dependencies that are missing. The obvious problem is that it takes humans several hours to analyze the results and determine whether any of the failures indicate problems in the product under test.

A prior solution was to stand up a tiger team of three test automation engineers to fix those 400 tests. After about a year, that effort has not moved the needle much.

I think the next step is to stop the bleeding and mark those 400 tests “ignore”. That would bring us to a state where a test failure is likely something we care about. Those 400 can be brought back online as they are fixed. An argument against this is that if the 400 failures are not in your face, you will never deal with your environment or test tech debt.
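To illustrate what I mean by “ignore” (a minimal sketch only - I’m assuming a pytest-style suite here, and the test names and ticket reference are made up), the quarantined tests could be marked as expected failures or skips with a reason attached, so they stay visible without turning every run red:

```python
import pytest

# Quarantine a known false-failing test instead of deleting it.
# xfail keeps it running and reporting, but it no longer fails the build;
# strict=False means an unexpected pass is reported as XPASS, not an error.
@pytest.mark.xfail(reason="TICKET-123: missing test-environment dependency", strict=False)
def test_checkout_applies_discount():
    ...

# Or skip it entirely until the environment dependency is restored.
@pytest.mark.skip(reason="TICKET-123: blocked on missing service stub")
def test_checkout_sends_receipt_email():
    ...
```

Running with `pytest -rxs` then lists every xfailed/skipped test and its reason in the summary, so the 400 remain visible even though they no longer drown out real failures.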

What can you suggest to solve this problem?

4 Likes

These regression suites grow because it’s very cheap to manufacture and execute an extremely specific check at the expense of long-term costs, and people want to profit from that saving without dealing with the debt. If nobody can figure out what the benefits are then I don’t see a problem with binning the whole thing.

These kinds of issues are also symptoms of other sicknesses. How do you know that the tests that are consistently passing are even doing anything? After all, by ignoring the passing tests you’re essentially using natural selection to end up with a suite that only passes. There has to be more to testing than superstition, or you might as well not run the suite and say you did.

I’d say this is a context-sensitive decision based on levels of understanding of the software project, the other software project (the automated suite), how the team(s) are organised and how much money you want to hurl at it. There are lots of questions to ask here - which tests are failing, why, and who wrote them? What do we think these tests do - provide coverage, make managers sleep better, appease the customers, follow policy, abide by laws, etc.? How are they serving the testing mission?

Marking them as ignored will make everyone do just that - ignore the problem entirely. If those checks were providing less value than the cost of dealing with them, that’s a fine idea, but you need to understand their purpose (not just the abstraction loss in their stated purpose). Perhaps even delete them. Nobody’s fixing them, nobody’s running them, get the hard drive space back. This has an added emotional impact - it’s more meaningful to say you’re deleting code. Imagine the difference between putting your possessions in a box in storage, even though you know you’re never going to look at them again, and choosing what to put immediately in the bin. Ooh, y’know, having to throw this stuff out, maybe I will take up watercolours again.

Leaving everything as it is might be a way to provide enough pain to cause people to do something to stop it hurting, provided the pain is sufficient and the right people are feeling it. You could also delete all the problematic tests and get those who wrote them to write them again. You could mandate that each team is responsible for its own coverage and breakages, and that they have to investigate any problems each time. The tiger team is a short-term solution, but it shifts the consequences of the problem away from the people who created it, and you can only clean up after people for so long before they have to learn to do it themselves.

One option might be to shred the whole endeavour. I think that helps to frame the ideas nicely - well, what are we actually destroying? Doesn’t it have value over making me feel all comfy inside? What problems are getting into production, out through the users and back to us? What actually is our coverage? Do we even need to run these any more? Who’s paid to make this work, and can we ask them why it’s not working? What would it cost to fix? What would it cost to replace? Really? Ooh, maybe I will take up watercolours again.

Edit: To be more solution-oriented I’ll say that looking at purpose can be a great way to facilitate change. If it’s just there because it’s cheap to run, in a “hey, who knows?” kinda way, then any cost is important. If it’s there because our testers know what they’re doing (including if those testers aren’t called testers), then that person holds the purpose. Sometimes tests go in because “eh”, sometimes because “ooh better check”, sometimes because “if this fails again we lose our biggest customer”, sometimes “if this goes wrong people die”. Purpose gives you a sort of nexus to make business decisions about projects like automation suites.

3 Likes

Maybe I asked in a confusing way. I’ll reframe the question.

If you find a bug in an automated check, do you prevent that check from executing until it is fixed?

1 Like

Without knowing your situation, I have a bunch of questions that may be worth considering.

  • When you say that those 400 failing tests are false fails, have you actually checked that all of them are, or are you assuming they are false fails when perhaps half of them are real fails?
  • If you’ve got 1/4 of your tests false failing, have you checked that the other 3/4 are really passing and not false passing? Maybe some of the problems with your tests and environments are making things erroneously look like they’re working.

Answering your reframed question: no, I wouldn’t advise preventing that check from executing until it was fixed unless there was a very good reason; I’d get people to invest in fixing the test before work can continue on the product. And if I was going to prevent it from running, I would go through and make careful decisions and get a commitment either to get it passing by a defined point in the future or to just delete it. But… saying all this, I am a strong proponent of “don’t have things alerting that you wouldn’t fix”, so I would go for “don’t have red tests that you know are red, because you’ll ignore the reds that you shouldn’t be ignoring”.

1 Like

Context will define the value of that choice, so without it the question is impossible to answer. I can give examples where I would and examples where I wouldn’t. All I can do is give some heuristics to help evaluate the problem.

2 Likes

How long would it take you to fix those tests?

It’s one thing to disable tests for a few days… it’s another thing if it takes months…

I’ve disabled tests before while I fix them, but usually that’s a day or so - max a week and a half if there are a fair few and I have other tasks I need to prioritise as well.

1 Like

I would say yes, if that is possible.

The tests do not add value if they’re reporting false fails. This noise builds an air of complacency - “oh, it’s just another false fail” - and the one time you ignore it will be the one time it’s real.

Get the tests parked up, fixed, and put back into the suite.

At the same time I’d analyse whether the tests have any value - you’ve got 1600 tests, and 400 are currently ‘not working’, so would you really miss them if they were gone for good? A good way to find out is this exercise - you might open a test to fix it and find, hey, we don’t even need this any more - or - this is a bad test, too flaky, let’s just chuck it.

Good luck!

2 Likes

At first glance a failure seems to be a false fail, but until you analyse it you won’t be able to decide whether it really is one - and that analysis is definitely time-consuming and defeats the purpose of getting ROI from automation.

I am also not sure whether, out of the 1600, it is the same 400 that false-fail every time - it may be that one set of roughly 400 fails in iteration 1 and a different set fails in iteration 2.

I would suggest reporting that groups failures: I have worked with reports where, for example, if 50 test cases fail in the login area, all 50 are grouped into one bucket, which makes it easy to analyse and to decide quickly whether you can skip those test cases or they need a fix.
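To illustrate the idea (a rough sketch only - I’m assuming the suite emits a JUnit-style XML report, and the file name is made up), grouping failures by test class takes only a few lines of standard-library Python:

```python
# Group failed test cases by their class/module from a JUnit-style XML report,
# so 50 login failures show up as one bucket instead of 50 separate lines.
from collections import Counter
import xml.etree.ElementTree as ET

def failure_buckets(report_path: str) -> Counter:
    root = ET.parse(report_path).getroot()
    buckets = Counter()
    for case in root.iter("testcase"):
        # A <testcase> with a <failure> or <error> child did not pass.
        if case.find("failure") is not None or case.find("error") is not None:
            buckets[case.get("classname", "unknown")] += 1
    return buckets

if __name__ == "__main__":
    for classname, count in failure_buckets("results.xml").most_common():
        print(f"{count:4d}  {classname}")
```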

Other than that, if these test cases are flaky, then I would suggest implementing a re-run feature so that failed test cases are retried automatically - you will have fewer failures to analyse and less debugging.
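For example, if the suite happens to be pytest-based, the pytest-rerunfailures plugin provides this without custom code (a sketch under that assumption - your runner may have its own equivalent; the test name is made up):

```python
# Requires the pytest-rerunfailures plugin: pip install pytest-rerunfailures
# Either rerun every failure from the command line:
#   pytest --reruns 2 --reruns-delay 5
# or mark only the known-flaky tests:
import pytest

@pytest.mark.flaky(reruns=2, reruns_delay=5)
def test_search_results_load():
    ...
```

Tests that pass on retry are still reported as reruns in the run summary, so the flakiness stays measurable rather than silently disappearing.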

@allcapstester, yes, there was/is an analysis effort after each test suite execution run. That analysis examines the fails and either applies tacit knowledge or manually re-runs the check to determine that the bug is in the test rather than in the product under test.

I like your comment “don’t have things alerting that you won’t fix”.

My current situation is probably a-frog-in-the-boiling-pot-of-water situation. Fixing a false-fail test (and blocking other flow) when it first arises sounds smart and feasible. Fixing 400 false-fails (and blocking flow for months) does not sound smart or feasible.

And what @whitenoise said resonates with me, the recurring false-fails are having a bit of the-boy-who-cried-wolf effect. “Oh no, a test failed!..probably just a false-fail”

1 Like

Pretty much summarizes why you should delete ignored tests. Chris is always so clear and direct. If skipped tests hold no unique knowledge or value today, they are unlikely to still be valuable tomorrow in a world where things become obsolete at an ever greater pace. Failing or ignored tests only point to maintenance culprits when you play blame games or metrics games, and I’m no fan of those either. Let the boat sail; it will sink in the right spot if you stop bailing.

Unless you need them for legal or compliance coverage. In that case I suggest a small kick-off meeting to find resources, and a Jira epic to go and fix them. Then ensure that your project manager gives all teams enough time in the project budget.

1 Like