On 12th February, João Proença will spend an exciting hour on The Club, tapping away at his keyboard to answer any of your questions about when you should delete that failing automated test. Ask your question to get it answered.
You can ask any question on the following areas:
Understanding why people are resistant to deleting automated tests.
The perils of not deleting any tests.
Criteria to use when deciding to delete a test (or not).
What we can do to feel more confident in deleting tests.
Get your questions in here before 12th February at 7pm UK Time, and I’ll answer as many as I can in the Power Hour. Join me at TestBash Brighton 2020 to hear my talk: Should We Just… Delete It?
I hate disabled tests, but they do tell a better story than deleted tests that nobody in the office this week even knows existed. An automated test can often be better than a requirements document that nobody can find. I assume we mean “skipping” in most cases? I’m also not going into test duplication; “kill all duplicates” is my motto.
When you move to a new code repo, all history of a deleted test is gone (assuming we even knew where to look). So I am a fan of sometimes deleting, sometimes disabling a test using a comment block or by renaming the files (while still keeping it searchable in my code), and sometimes disabling/skipping the test using config. It is effectively just archiving it.
Skipping a test implies you intend to fix it, but making it invisible to the test system means you don’t want to fix it but do want to leave a record. I think you can tell that this week my team just unfroze a test case that we buried 3 years ago. Without these deep-frozen bits of code, a newbie tester has no way of knowing what things don’t work.
I’m ignoring the archiving of unstable tests, tests that never really were stable in all environments. Those ones require a “bazooka”, early on, in my opinion.
(Edit: to clarify, I’m only talking about automated checking here.)
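The skip-vs-bury distinction above can be sketched in pytest (assumed here; the test bodies, the ticket reference, and the “archived” naming convention are all invented for illustration):

```python
import pytest

# Skipping: the test stays visible in every report, signalling "we intend to fix this".
@pytest.mark.skip(reason="Flaky on staging; tracked in JIRA-1234 (hypothetical ticket)")
def test_checkout_totals():
    assert 2 + 2 == 4  # placeholder body

# "Deep freezing": rename so the runner no longer collects it, but grep still finds it.
# pytest only collects functions named test_*, so this is invisible to the suite
# while remaining a searchable record in the code.
def archived_test_checkout_totals_2017():
    """Buried 2017: checkout totals flow replaced; kept as a record of what didn't work."""
    assert 2 + 2 == 4
```

The renamed function leaves exactly the kind of record the post describes: a newbie tester can still search the repo and find what was frozen and why.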
Shouldn’t each check in an automated suite be explicit in its purpose and significance, so you can determine its worth and tell what it’s doing just by reading its name, the name of its group, classification, etc.?
I’m asking this in the context of inheriting a large suite of 1300+ checks, where we as the testers are not sure of the value they’re bringing, since it’s hard to determine what they’re actually checking, and sometimes we’ve found they’re not actually checking what their name implies they’re checking.
Why might that situation happen?
What problems arise which might cause a suite to get that way?
Is this problem relatively common in your (and others, if you care to chime in) experiences?
Is the first paragraph in my comment a realistic, feasible goal?
For situations like the suite of 1300 checks where it’s hard to determine purpose/value, how do you go about determining a solution to the problem? What questions do you ask yourself? Do you start over and implement a new suite aiming to avoid the same problems? Do you wade through the muck and try to reorganize, reclassify, rename, and potentially thin out the existing suite of checks?
I would use different criteria for deleting manual and automated checks. Manual checks cost much more time, so I would be aggressively refactoring manual test cases into “mega-cases” that track a user journey or task. Automated tests are cheaper, but can suffer from hidden duplication, and often will age badly, creating high triage costs. Nothing new there.
Your test management system for manual test cases should let you delete tests and still track the entire process of editing and deleting (I assume you don’t use Excel). There are a lot of people who don’t like test cases being captured at all, but if you have more than 2 or 3 exclusive environments to cover, tracking the cost per environment gives you some metrics that you can use to manage resource allocation.
Automated test names are super critical, but I suggest using namespaces and not allowing anyone to check in a test called test_login_purchase_logout(); that’s 3 things at once in my book! I would rather have purchasing.test_login(), and have a built-in “assumption” as a teardown fixture so that every login test will assert on a logout failure anyway. Once again, no rocket science.
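The one-concern-per-test-plus-teardown idea could look like this in pytest (assumed here; FakeSession and all the names are hypothetical stand-ins for the real application):

```python
import pytest

class FakeSession:
    """Hypothetical stand-in for the application under test."""
    def __init__(self):
        self.logged_in = False
    def login(self):
        self.logged_in = True
    def logout(self):
        self.logged_in = False

@pytest.fixture
def session():
    s = FakeSession()
    yield s
    # Teardown: every test using this fixture implicitly asserts that logout works,
    # so no test needs to be "about" login AND purchase AND logout at once.
    s.logout()
    assert not s.logged_in, "logout failed during teardown"

# Lives in purchasing/test_login.py: the namespace says what area it covers,
# the name says the single thing it checks.
def test_login(session):
    session.login()
    assert session.logged_in
```

When this fixture-based teardown fails, the report points straight at logout, instead of leaving you to untangle which of three bundled steps broke.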
Hi everyone and welcome to this Power Hour on deleting automated tests!
Over the years I’ve thought a lot about this topic and my experience tells me that the context each one of us comes from can really change the way we see things. I’ll be sharing my ideas through the answers to your questions and hopefully you can apply them in your professional day-to-day. Just note that sometimes what makes a lot of sense in a specific context may change completely in another.
One more thing: I’m focusing on test automation mostly, even though some of the ideas may make some sense even when we’re talking about deleting scripted manual tests.
Deciding to delete tests can be hard and the discussions around the topic with stakeholders can be equally hard.
First, I believe we must acknowledge that we, engineering professionals, are naturally resistant to deleting tests. This is highly tied to cognitive biases that most humans exhibit, such as loss aversion and the endowment effect. Understanding these biases can help us better deal with our natural behavior (and also the behavior of others) when going about deleting tests, and not fall prey to fallacious ways of thinking.
Then, if a team overly relies on automation and is really averse to the idea of deleting tests, maybe they are experiencing a false sense of security over their automation set that we should put into question. So if you have a test or a suite that you believe should be deleted, start by laying out the risks that are being covered by that test or suite. What is the worst that can happen if we delete the test and a bug is consequently released into production? What’s the impact for our users? How probable is it?
The risk being covered is just one side of the coin. In my experience it’s easy for teams to believe that the more automated tests the better (so why even think about deleting some, right?), but that’s not necessarily true. Each test has a total cost of ownership which includes the resources you use to run the tests (servers, network, etc.) as well as a “people cost” if there’s any sort of maintenance involved (mostly due to test failures that do not imply a bug being discovered).
Also, we should keep our feedback loops short so that tests are providing their value to the team in the best way possible. The tendency when we have more and more tests is usually to make the feedback loop from automated tests longer (even if we’re doing parallel test runs in our CI/CD pipelines to some extent).
So if you are able to quantify these costs in some way, use that information to further justify your decision to delete a test.
I believe it is worth reviewing already existing tests (disabled or enabled)!
Assuming we have been doing things right and creating tests for the right reasons, that will mean that they are mitigating risks that are important to us. But let’s think about “risk”: it can be expressed as a possible negative impact that may occur with a certain probability in the future.
The key thing here is “probability” - it doesn’t necessarily stay the same over time. Various factors may make probability change. Some examples:
Technology evolves and some things become less relevant to test. For instance: nowadays browsers are much more alike than they were a few years ago, so should you perform a UI test over all of them?
Features go through their own lifecycle. Maybe they have been slowly replaced by other features or they have become really stable.
The likelihood of something bad happening may be estimated by us at the beginning, but then as time goes by we start having evidence that we misjudged that likelihood in the first place (it may be higher, it may be lower, but that tells us something about the risk itself).
So we should analyze the reasons (risks) why we created the test in the first place and think about what has changed since then, in our context and our assumptions, to decide whether it still makes sense to run the test or not.
About the “delete vs disable” questions, I believe that what’s key here is what “disable” means for each one of us in our specific context.
For me, what makes sense in a lot of situations is our ability to recover old tests in the future, if we decide to delete them in the present. If those tests live in source code and you have source control in place, then you can just delete them because it will be easy to go back in the revisions to recover them. If tests do not live in source control but rather in an automated testing tool that doesn’t have source control, then disabling them in that context may make sense instead, because deleting them will be a permanent thing.
However I have seen several situations in which “disabling” and not “deleting” has its own problems. In a lot of places disabled tests have this tendency to come back to life because someone without context sees them and decides to re-enable them. Also, if your ability to search and visualize your test set as a whole becomes worse as you grow the number of tests you have, then having disabled tests in the mix can hurt you on a daily basis.
I’d also like to point out, though, that disabling tests can be useful to make us more confident about deleting them. If we are facing a tough decision about whether to delete a test or not, the team can decide to disable it for a few releases and see what happens. After a while, when we look back and realize that we didn’t really miss having the test run, then it’s clear that we can just… delete it (or not)!
To be honest, given my experience, I’ve mostly thought about the topic of deleting automated tests. I believe that for manual tests some of the ideas may be the same, but probably there will be some differences in some areas.
As for the criteria, for me it’s about revisiting the “value vs cost” ratio of the test. In other words: the risk being covered by the test vs what it costs us to own that test (especially what it costs to run and maintain the test).
If we are able to somehow quantify these and evaluate the test as being high / low risk and high / low cost, then deciding becomes more obvious (or at least it becomes easier to discuss with others):
If a test has a low cost and covers a high risk, then it’s probably a keeper.
If, on the contrary, it’s high cost and low risk, then we should definitely consider deleting it!
If it’s high cost and high risk, then it’s time for us to figure out what we can do to move the test into the “low cost” area, or what alternatives to testing we may have in order to manage the same risk.
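Those three rules can be condensed into a tiny triage helper; the quadrant labels and recommendations are just illustrative, not a prescribed process:

```python
def triage(risk, cost):
    """Hypothetical triage of a test by the risk it covers vs its cost of ownership.
    Both arguments are "high" or "low"."""
    if risk == "high" and cost == "low":
        return "keep"                 # cheap protection against a real risk
    if risk == "low" and cost == "high":
        return "consider deleting"    # paying a lot for very little coverage
    if risk == "high" and cost == "high":
        return "reduce cost, or find another way to manage the risk"
    return "judgement call"           # low risk, low cost: cheap, but earning its keep?
```

Even this crude two-by-two makes the conversation with stakeholders concrete: disagreement shifts from “should we delete it?” to “is this really low risk?”.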
An automated test (and suite) should absolutely be explicit in its purpose and significance! I totally agree with what you’re saying and in my experience a lot of organizations suffer with unclear automated tests!
A test may be unclear in its purpose because the person who created it wasn’t very experienced or because there was a greater concern regarding shortening the time it takes to build the test, at the expense of the time it takes to maintain the test.
Test maintenance requires more effort when the goal of a test is not clear. People who are addressing a failing automated test will usually have to, first and foremost, understand what is being tested! If just that basic task is hard to accomplish, then that is a test that for sure will require lots of maintenance effort in the long run.
Another important aspect of tests not being clear in their purpose is that they’re usually hard to delete - we are all much more afraid to delete something we don’t fully understand than something that is clear in its purpose.
In these situations I believe we should first investigate and define the goal of the test. Once we have that clarified, then we must decide if it makes sense to delete the test or not, given its goal. But even then, sometimes the purpose of a test doesn’t make much sense, yet it has still found us bugs in the past (so we feel it is still valuable). These sorts of tests tend to be “end-to-end” in nature, exercising a lot of areas of the software and thus being likely to find problems along the way, even ones completely unrelated to their purpose. When we face such a test, it’s really useful to have historical data (usually obtained from test management solutions) that allows us to understand how many times the test has failed in the past and which of those failures led us to identify a bug in the software. If we cover all of those bugs with smaller, more focused tests that are clear in their purpose, then we will feel more confident about deleting the original test.
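As a sketch of that historical analysis, assuming you can export failure records from your test management tool (the records and bug IDs below are entirely invented):

```python
# Hypothetical run history for one unclear end-to-end test: each failure record
# notes whether triage traced it to a real bug, or to noise (environment, flakiness).
failures = [
    {"date": "2019-03-02", "bug": "BUG-101"},   # real bug found
    {"date": "2019-05-17", "bug": None},        # flaky environment
    {"date": "2019-08-09", "bug": "BUG-207"},   # real bug found
    {"date": "2019-11-23", "bug": None},        # noise again
]

bugs_found = {f["bug"] for f in failures if f["bug"]}
signal_ratio = len(bugs_found) / len(failures)

print(f"{len(failures)} failures, {len(bugs_found)} real bugs, signal {signal_ratio:.0%}")
# If every bug in bugs_found is now covered by a smaller, focused test,
# deleting the big end-to-end test becomes much less scary.
```

The bug list doubles as a to-do list: write one focused test per entry, then revisit the deletion decision with the triage effort saved in hand.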
I would like to add to your idea about automated test names. Another problem with a test that’s testing 3 things at a time is that, when it fails, it usually won’t be immediately clear which part of the test is failing, which means more time spent by someone just understanding that. That extra overhead of figuring out test failures, multiplied by the number of times the test fails, can really add up!
I would also like to ask you something. You talked about automated checks aging badly; do you believe automated checks age differently than manual checks (better or worse)?
OK everyone! Thank you so much for your questions!
Don’t forget that I will be delivering my talk “Should we just… delete it?!” at TestBash Brighton 2020, where I will be going into a lot of these ideas and more! It’s going to be an awesome event, so I hope to see you all there!