Are we running more regressions than required?

Most teams keep adding tests to their regression suite, which results in thousands of test cases over time. We rarely remove tests. Eventually, regression cycles take hours to run and require constant maintenance.

This situation raises an interesting question: are we actually improving quality, or just accumulating more tests?

Do we really need such large regression suites, or is there a better approach to regression testing?

You’re not improving the quality, unless those cases are generating useful information which is then acted upon by those with more direct influence over quality.

My guess is that a long regression suite will provide some value. The question really is: Is the regression suite worth the cost?

So you can look at the costs of regression, which are mainly in terms of time: the time taken to decide what cases to add; the time taken to write, add, debug and test them; the time taken to run each case multiplied by the number of times you run it; the time taken to review the results; the time to update them when anything changes with the software, the suite or the tooling; and all the other conversations, decisions, meetings and training that a software project (like automation) takes.

If the suite is run by people then, well, I have what I hope to be well-known opinions about explicit test cases. I think they are not good. The costs of that are astronomical in modern software; one simply cannot justify the monetary, emotional and morale expense of asking a person to pretend to sort of follow some instructions for reasons not best explained. It would take an extreme situation for me to think otherwise.

Either way the costs will also depend on how often the regression suite has to change, and so on. If the software and market don’t change then the tooling won’t have to change much either.

There’s also the cost of not knowing what it does. Does anyone really know how each of these checks serves the greater test strategy, given how limited they are?

Then you have the benefits. That really depends. If your software simply has to perform some basic, logical steps, or people get hurt, then you will likely need a regression suite to help cover them.

That being said I think there are many bad reasons that people have large regression suites, and the chief amongst them is fear. People are scared that if they don’t run the suite they’ll miss something. This tends to come about because those people do not understand how hideously limited a check is. What actually happens is that some system simulates the data and state and interaction with a piece of software in very limited ways (ways that often don’t reflect real world use), and makes extremely limited “observations” that are processed in a rigid, logical way. Then we call that “test_login_success” and pretend that the name describes what happened - that we’ve tested successful logins. Which we have not. We’ve done some things and simulated a click and looked for one or two signs of success, while ignoring any other issues or bugs.

So people go around thinking “I’m sure glad there aren’t any problems! Look at all the green!”, while not realising that doesn’t mean the software is tested, or, gods help us, “100%” tested. Often they don’t know what the suite says it does. If they know what it says it does they don’t know what it actually does. And if they know what it actually does they don’t know why it does that. Not always, of course, but it’s pretty common. And if the expense of doing something nobody understands is getting that high without someone doing something about it, that says to me that it’s serving a purpose, even if that purpose is mainly ceremonial in nature.

If you’d like to sort this out and you have the political power to influence such things, then I recommend going through the suite and asking “what if we just threw this bit away?” Often suites are covered by unit tests, or repeated in other testing, or pieces are added “just because” and nobody puts an expiry date on them. Sometimes they are important. Sometimes they’re important and not covered by the check - and may require actual investigation of the release candidates. You could consider taking anything that’s lived there a long time and deciding on some sort of category - a golden check that will always live on because it’s important and unchanging and well-written, all the way down to “why is this still here?”. Purpose is a great nexus for this - what risks are these checks actually mitigating? Are they thought-through, or just added like another teaspoon on a mountain of washing up? Do they actually check anything close to what they claim to check? Are they defensible in light of their cost? Do they actually find important problems?

Those will be the real answers to your questions. Regression suites can be very valuable. They are frequently not.

You might also solve the problem with grids and parallel tests and so on. But you might just be laying expensive offerings at a shrine to a force that doesn’t exist.

Still, and either way, best of luck.


@kinofrost Your point about Cost Vs Value is a critical consideration that I missed.

Many teams assume that a growing regression suite automatically means higher quality, but they fail to evaluate whether each of these checks is still serving the purpose it was meant to serve.

In real-world projects, I have seen regression suites grow because tests are easy to add but hard to remove. Over time, suites become a mixture of critical validations, duplicated checks, and legacy tests that no one is confident enough to remove. The result is longer cycles, higher maintenance effort, and sometimes a false sense of coverage, costing not just time and effort but a significant amount of money.

What I find interesting is how few teams actively manage regression suites as living assets. Questions like these are rarely asked: Which tests actually catch defects? Which ones are redundant with unit or API tests? Which ones protect high-risk areas of the system? Which ones are simply historical artifacts? This is where I think approaches like risk-based regression, intelligent test selection, and observability of test outcomes become important. Instead of running everything every time, teams can prioritize tests based on change impact, failure history, and system risk. In other words, the goal should probably shift from “more tests” to “more signal.”
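As a rough illustration of prioritizing by change impact, failure history, and system risk, here is a minimal Python sketch. The `CheckInfo` fields, the weights, and the threshold are all hypothetical values chosen for the example; a real team would need to calibrate them against its own data.

```python
from dataclasses import dataclass


@dataclass
class CheckInfo:
    name: str
    touches_changed_code: bool  # change impact: exercises code changed in this release
    recent_failure_rate: float  # failure history: share of recent runs that found a real problem (0..1)
    area_risk: float            # system risk: the team's own rating of the area (0..1)


# Hypothetical weights, not taken from the thread; tune them for your context.
WEIGHTS = {"impact": 0.5, "history": 0.2, "risk": 0.3}


def priority(check: CheckInfo) -> float:
    """Combine the three signals into one score between 0 and 1."""
    impact = 1.0 if check.touches_changed_code else 0.0
    return (WEIGHTS["impact"] * impact
            + WEIGHTS["history"] * check.recent_failure_rate
            + WEIGHTS["risk"] * check.area_risk)


def select(checks, threshold=0.2):
    """Return the names of checks scoring at or above the threshold, highest first."""
    ranked = sorted(checks, key=priority, reverse=True)
    return [c.name for c in ranked if priority(c) >= threshold]


checks = [
    CheckInfo("checkout_payment", True, 0.10, 0.9),
    CheckInfo("profile_avatar_upload", False, 0.00, 0.2),
    CheckInfo("login_basic", False, 0.05, 0.8),
]
selected = select(checks)
print(selected)
```

With these made-up numbers, the low-risk, untouched avatar check is skipped for this run while the other two are kept. The scoring itself is deliberately crude; the useful part is that the selection rule is written down and can be argued with.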

Curious how others approach this. Do you actively prune your regression suites, or do they mostly grow over time?

Interesting. I’d like to hear more perspectives from other people.

For us, we don’t sit still with our regression testing. We continually use risk-based testing and include the automated tests in that assessment. So for each release we ask the same question, “What do we need to test?”, and are selective.

The benefit of doing that is you start to accumulate very interesting information about your test packs. What tests haven’t been run for more than x releases? Which tests require the most regular maintenance? Over time you can start archiving (not deleting) test cases that were once very important but that we haven’t needed for the last 10 releases, and where the roadmap isn’t showing any changes in the area in question.
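The bookkeeping described above could look something like the following Python sketch. The record fields, release numbers, and the `roadmap_areas` set are assumptions made up for the example; the "10 releases" default simply mirrors the figure mentioned above.

```python
from dataclasses import dataclass


@dataclass
class CheckRecord:
    name: str
    area: str               # product area the check covers
    last_run_release: int   # release number in which it was last selected and run
    maintenance_edits: int  # how many times the check itself had to be fixed or updated


def archive_candidates(records, current_release, roadmap_areas, max_idle_releases=10):
    """Checks idle for at least max_idle_releases whose area has no planned changes."""
    return [
        r.name for r in records
        if current_release - r.last_run_release >= max_idle_releases
        and r.area not in roadmap_areas
    ]


def most_maintained(records):
    """The check that has needed the most upkeep so far."""
    return max(records, key=lambda r: r.maintenance_edits).name


records = [
    CheckRecord("legacy_export_csv", "export", last_run_release=30, maintenance_edits=1),
    CheckRecord("invoice_rounding", "billing", last_run_release=41, maintenance_edits=7),
    CheckRecord("old_sso_flow", "auth", last_run_release=25, maintenance_edits=2),
]

# "auth" is on the (hypothetical) roadmap, so its idle check is kept for now.
candidates = archive_candidates(records, current_release=42, roadmap_areas={"auth"})
print(candidates, most_maintained(records))
```

Note that the function only proposes candidates. Whether to archive is still a decision for whoever owns the test pack.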

It’s quite liberating, as you feel true ownership of your testing assets and that you are evolving them with the product.

I think this is a very mature way of managing regression testing: you are treating test cases as an evolving asset. It is a practical way to keep your suite relevant while preserving historical coverage.

One challenge I’ve seen in larger systems is that maintaining this level of visibility across test packs becomes difficult when tests are spread across different tools and frameworks. It becomes harder to track things like test usage, maintenance effort, and actual value over time.

Curious how you are tracking those insights today. Is it something you manage through your test management system, or through custom reporting?

Picking just one angle of this.

I’ve encountered a lot of teams running all their tests on all browsers.

The alternative to this is to research the risks associated with browser differences and create a smaller specific set of tests to cover these risks.

The risk focus can be a big part of this. When a change is made, do you actively consider the specific regression risks of that change, or do you run everything every time? The latter carries a lot of waste in some respects, but it’s often a valid choice when teams opt not to spend that time on risk analysis.

Quick question worth asking your team: when did you last check which tests actually caught real bugs?

Most teams have this data somewhere, in pass/fail history or defect logs, but never really look at it. If you’re running 2,000 checks and the same 40 keep finding actual problems, that’s worth a conversation. It doesn’t mean you axe the other 1,960 overnight, but it moves the discussion from gut feeling to something concrete.
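To make that concrete, here is a small Python sketch of the kind of query you could run against a defect log. The log format (`found_by`, `confirmed`) and the check names are invented for the example; real trackers will store this differently.

```python
from collections import Counter


def defect_finders(defect_log):
    """Count confirmed defects per check; unconfirmed reports (e.g. flaky failures) are ignored."""
    return Counter(e["found_by"] for e in defect_log if e.get("confirmed"))


# Hypothetical suite and defect log.
suite = ["checkout_payment", "login_basic", "profile_avatar_upload", "legacy_export_csv"]
defect_log = [
    {"id": "BUG-1", "found_by": "checkout_payment", "confirmed": True},
    {"id": "BUG-2", "found_by": "checkout_payment", "confirmed": True},
    {"id": "BUG-3", "found_by": "login_basic", "confirmed": True},
    {"id": "BUG-4", "found_by": "profile_avatar_upload", "confirmed": False},
]

finders = defect_finders(defect_log)
never_found_anything = sorted(set(suite) - set(finders))
print(finders.most_common())
print(never_found_anything)

# The figures from the post above: 40 productive checks out of 2,000.
productive_share = 40 / 2000
print(f"{productive_share:.0%} of the suite is finding the problems")
```

In the 2,000-check example that share comes out at 2%. A check that has never been credited with a defect is not automatically worthless, but it is a reasonable place to start asking what it is for.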

The point probably isn’t to have fewer tests. It’s to have tests you can actually justify, where each one has a clear reason to exist and someone who genuinely owns it. Most suites are nowhere near that bar, and that’s the real issue.

Systems thinker Russ Ackoff helped me think about which tests to automate. I wrote this blog about what I learned: https://testandanalysis.home.blog/2024/06/18/how-do-you-decide-which-tests-to-automate/

@KavithaR now, with AI assistance, we could use it to summarize the risks in each PR, identify the impacted code and components, and smartly pick those tests for regression rather than running the entire regression suite, which takes a lot of time and effort.
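The selection step itself does not have to be AI-driven. As a simpler stand-in for the AI-assisted analysis suggested above, here is a Python sketch that maps changed files to components using a hand-maintained table and picks only the tests registered for those components. The paths, component names, and test names are all hypothetical.

```python
from fnmatch import fnmatch

# Hypothetical, hand-maintained mappings.
COMPONENT_PATTERNS = {
    "billing": ["src/billing/*"],
    "auth": ["src/auth/*", "src/sessions/*"],
}
TESTS_BY_COMPONENT = {
    "billing": ["invoice_rounding", "checkout_payment"],
    "auth": ["login_basic", "password_reset"],
}


def impacted_components(changed_files):
    """Return (components hit, files that matched no component)."""
    hit, unmapped = set(), []
    for path in changed_files:
        matches = [name for name, patterns in COMPONENT_PATTERNS.items()
                   if any(fnmatch(path, p) for p in patterns)]
        if matches:
            hit.update(matches)
        else:
            unmapped.append(path)
    return hit, unmapped


def select_tests(changed_files):
    """Pick tests for impacted components; fall back to everything if a file is unmapped."""
    hit, unmapped = impacted_components(changed_files)
    if unmapped:
        # We cannot reason about this change, so run the whole suite.
        hit = set(TESTS_BY_COMPONENT)
    return sorted({t for c in hit for t in TESTS_BY_COMPONENT[c]})


targeted = select_tests(["src/billing/invoice.py"])
fallback = select_tests(["src/core/db.py"])
print(targeted)
print(fallback)
```

Falling back to the full suite for unmapped files is a deliberately conservative choice: the selection only narrows the run when the mapping actually covers the change.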

I came across this thread a while back and it stuck with me — @kinofrost’s framing of the “fear of large suites” was the part that landed hardest, and it eventually pushed me to write a longer reflection in Mandarin. Bringing the core framing back here as a small contribution to the conversation.

What I ended up calling it: **the Sieve Theory**.

Imagine bugs as gravel and sand falling through a funnel:

- **Unit tests** — finest mesh. Should catch ~80% of small logic errors. Cheap, fast, caught at dev time.

- **Integration tests** — medium mesh. Catches the cracks between modules.

- **UI / regression tests** — coarsest mesh. *Should* only catch the “fish that slipped past” on critical paths.

Most bloated regression suites I’ve seen are bloated because the earlier sieves had holes. We didn’t trust our unit/integration coverage, so we built a steel-wire net at the most expensive, slowest, and most fragile layer to compensate. We end up paying for the failures of every prior stage at the UI layer.

A second cost worth naming directly: **a bloated suite erodes RD’s trust in QA**. Flaky failures from over-tested UI scenarios get reported as bugs, can’t be reproduced, and slowly QA’s credibility drains. That 1% irreproducible UI failure you keep filing — it’s costing more than the suite costs to maintain.

I now ask myself four questions before adding any regression script:

1. **Diagnosability** — If this fails, can I locate the broken layer within 1 minute?

2. **Redundancy** — Is this function point covered at API or unit level?

3. **Maintainability** — Am I confident I can fix it quickly when the UI changes, without it becoming flaky?

4. **Significance** — If this test disappeared tomorrow, would anyone feel the quality drop?

The hardest part isn’t the analysis — it’s the **accountability problem**. Deleting a test is technically easy, psychologically much harder. Whoever proposes the cleanup tends to inherit the cleanup work, and if anything goes wrong post-deletion, they own the consequence. Without explicit ownership of “this test exists because X”, every test eventually becomes technical debt no one can responsibly remove.

Full piece here: 我們真的跑了太多「無意義」的回歸測試嗎?從 MoT 論壇看測試的精簡與價值 (“Are we really running too many ‘meaningless’ regression tests? Test pruning and value, as seen from the MoT forum”) | Elijah’s Quality Lab. Happy to dig into any of these threads further.

One angle I would add is that “too many regressions” is partly a question of what signal each run produces.

A lot of regression suites only answer: did this scripted path still pass?

That can be useful, but it is a narrow signal. If we already pay the cost of running an important E2E or API flow, I would rather get more information from that run: what changed in the API responses, which endpoints were touched, whether the response shape drifted, whether the same user action now causes extra calls, etc.
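As one example of getting extra signal from a run you are already paying for, here is a minimal Python sketch that compares the shape (keys and value types) of a recorded baseline response with the current one. The payloads are invented, and the sketch only inspects the first element of each list, which is an assumption that will not suit every API.

```python
def shape(value):
    """Reduce a JSON-like value to its structure: keys and type names only."""
    if isinstance(value, dict):
        return {k: shape(v) for k, v in value.items()}
    if isinstance(value, list):
        # Assumption: list items are homogeneous, so the first one is representative.
        return [shape(value[0])] if value else []
    return type(value).__name__


def _diff(base, cur, path):
    changes = []
    if isinstance(base, dict) and isinstance(cur, dict):
        for key in sorted(set(base) | set(cur)):
            child = f"{path}.{key}" if path else key
            if key not in cur:
                changes.append(f"removed: {child}")
            elif key not in base:
                changes.append(f"added: {child}")
            else:
                changes.extend(_diff(base[key], cur[key], child))
    elif isinstance(base, list) and isinstance(cur, list):
        if base and cur:  # an empty list gives nothing to compare
            changes.extend(_diff(base[0], cur[0], f"{path}[]"))
    elif base != cur:
        changes.append(f"changed: {path} ({base} -> {cur})")
    return changes


def shape_drift(baseline, current):
    """List structural differences between two responses, ignoring the values themselves."""
    return _diff(shape(baseline), shape(current), "")


baseline = {"id": 1, "total": 9.99, "items": [{"sku": "A1", "qty": 2}]}
current = {"id": "1", "total": 9.99, "currency": "EUR",
           "items": [{"sku": "A1", "qty": 2, "discount": 0.0}]}

drift = shape_drift(baseline, current)
print(drift)
```

A scripted check on this flow might still pass, since the order total is unchanged, while the drift report shows that `id` changed type and two fields appeared. That is the kind of change information a plain pass/fail result hides.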

So for me the goal is not just fewer tests or more tests. It is better gates.

Some checks should probably be archived. Some should run only when the impacted area changes. But the few flows we do keep in the core regression pack should be treated as high-signal probes of system behavior, not just pass/fail scripts.

That also makes pruning easier. If a check has no clear risk, no recent signal, and no owner, it is hard to justify. If it protects an important behavior boundary and gives useful change information, it earns its place.

Are we talking E2E?

Because we have to talk costs first. Regression should happen closer to the code than the UI. That said, regression management should be a recurring activity, an audit, you could say. Tech debt is bad.

But then, like everything else in software, it’s all context-driven. Not every bug should be added to regression. I’d actually go further and dig into the root causes of issues, so that the process itself handles regression.