One challenge we are facing is finding the right balance between test stability and pipeline speed in our CI/CD workflows. As we have added more automated tests, the pipeline has slowed significantly, to the point that developers hesitate to push frequently.
When we trim tests to speed things up, we risk letting bugs slip through. We have tried test parallelization and smarter triggers, but flakiness still sneaks in, especially under load.
We’re currently reviewing how DevOps and QA can better collaborate to prioritize which tests truly belong in the CI/CD flow and which can be offloaded.
Is anyone here using adaptive testing strategies, like running different test suites based on commit type or test history? Any success with AI-based test selectors or smoke-test-only pipelines that scale well with agile delivery? I have already read The Community's Guide to DevOps and Software Testing on Ministry of Testing and found it quite informative.
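To make that concrete, the kind of commit-based selection I have in mind would be something along these lines. This is only a rough sketch: the script name, suite names, directory layout and "docs-only" rule are all made up, and a real setup would need to fit your own repo and runner.

```python
# select_tests.py - hypothetical helper a CI job could call to decide which
# test suites to run for a given push. Suite names and paths are assumptions.
import subprocess


def changed_files(base_ref: str = "origin/main") -> list[str]:
    # List files changed since the base branch.
    out = subprocess.run(
        ["git", "diff", "--name-only", base_ref, "HEAD"],
        capture_output=True, text=True, check=True,
    )
    return [line for line in out.stdout.splitlines() if line]


def pick_suites(files: list[str]) -> list[str]:
    # Docs-only commits: run just a quick smoke suite.
    if files and all(f.endswith((".md", ".rst")) for f in files):
        return ["smoke"]
    suites = {"smoke", "unit"}
    if any(f.startswith("frontend/") for f in files):
        suites.add("ui-e2e")
    if any(f.startswith("api/") for f in files):
        suites.add("api-integration")
    return sorted(suites)


if __name__ == "__main__":
    # Print the chosen suites so the pipeline can pass them to the test runner.
    print(" ".join(pick_suites(changed_files())))
```

Feature branches would then run only the printed suites, while develop/main would still run everything.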
Interestingly, when onboarding a new team member, the question came up of what a DevOps Engineer actually is, and this exact trade-off between velocity and reliability turned out to be a core part of the role.
I would love to hear how others are tackling this balance without compromising quality or developer productivity.
Do you know which tests are the most flaky ones, or the slowest?
Consider fixing those first.
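If you don't already have that data, most test runners can emit a JUnit XML report, and a small script can rank tests by duration; flakiness needs history across several runs (e.g. counting pass/fail flips per test over the last N reports). A minimal sketch for the "slowest" half, assuming your report lands at the path shown:

```python
# slowest_tests.py - rank test cases by duration from a JUnit XML report.
# The report path is an assumption; point it at wherever your runner writes it.
import xml.etree.ElementTree as ET


def slowest(report_path: str, top_n: int = 20) -> list[tuple[str, float]]:
    root = ET.parse(report_path).getroot()
    cases = [
        (f"{tc.get('classname')}::{tc.get('name')}", float(tc.get("time", 0)))
        for tc in root.iter("testcase")
    ]
    return sorted(cases, key=lambda c: c[1], reverse=True)[:top_n]


if __name__ == "__main__":
    for name, seconds in slowest("results/junit.xml"):
        print(f"{seconds:7.2f}s  {name}")
```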
Do individual tests fail at random, or do all tests pass in some runs while many tests fail together in other runs? If you have runs that are both 1) slow and 2) have a lot of failing tests, you may have reached the capacity limits of your test server.
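One quick way to check is to export per-run duration and failure counts and look at them together. A minimal sketch, assuming you can get that data into a CSV with `duration_s` and `failures` columns (the file, columns and thresholds are all assumptions):

```python
# run_health.py - flag CI runs that are both slow and failure-heavy, which can
# hint at a saturated test server. CSV layout and thresholds are assumptions.
import csv

SLOW_S = 1800      # assumed threshold: runs longer than 30 minutes
MANY_FAILS = 10    # assumed threshold: more than 10 failing tests

with open("ci_runs.csv", newline="") as fh:
    rows = list(csv.DictReader(fh))

suspect = [
    r for r in rows
    if float(r["duration_s"]) > SLOW_S and int(r["failures"]) > MANY_FAILS
]
print(f"{len(suspect)} of {len(rows)} runs are both slow and failure-heavy")
```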
If you have a lot of e2e tests: how do you do your data setup? How time-consuming is that? Is there room for optimization there, for example doing login or data preparation via the API rather than through the UI?
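As an illustration of what API-level setup could look like, here is a minimal pytest fixture sketch; the base URL, endpoints, payloads and token handling are all assumptions about your product, not a real API.

```python
# conftest.py - set up a logged-in API session and test data before UI tests.
# Base URL, endpoints and payloads are placeholders, not a real product API.
import pytest
import requests

BASE_URL = "https://staging.example.test"


@pytest.fixture
def api_session() -> requests.Session:
    # Log in once via the API instead of clicking through the login screen.
    session = requests.Session()
    resp = session.post(f"{BASE_URL}/api/login",
                        json={"user": "qa-bot", "password": "secret"})
    resp.raise_for_status()
    session.headers["Authorization"] = f"Bearer {resp.json()['token']}"
    return session


@pytest.fixture
def seeded_order(api_session):
    # Create the data the UI test needs, then clean it up afterwards.
    order = api_session.post(f"{BASE_URL}/api/orders",
                             json={"sku": "DEMO-1", "qty": 1}).json()
    yield order
    api_session.delete(f"{BASE_URL}/api/orders/{order['id']}")
```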
It can be that your product, or your test environment, is unstable under load. What happens if you put your environment under load on purpose (with a load testing tool)? What do you see when you play around with the product while it is under load? Is there instability now? In that case, there might be issues with the product itself.
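If you want to try that deliberately, a tiny Locust script is enough to keep steady background load on the environment while you explore the product manually. The host and endpoints below are placeholders for your own environment.

```python
# locustfile.py - minimal background load while you explore the app manually.
# Run with: locust -f locustfile.py --host https://staging.example.test
# Host and endpoints are placeholders for your own environment.
from locust import HttpUser, task, between


class BrowsingUser(HttpUser):
    wait_time = between(1, 3)  # seconds between simulated user actions

    @task(3)
    def view_home(self):
        self.client.get("/")

    @task(1)
    def search(self):
        self.client.get("/search", params={"q": "test"})
```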
I myself run all the tests on every push and on the develop branch; I don’t differentiate, and I have never needed to. However, we are considering a separate pipeline for performance measurement.
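If we go that route, the kind of thing that separate job might run is a simple timing check like the sketch below; the endpoints and budgets are made up, and a real setup would average many samples or use a proper performance tool rather than single requests.

```python
# perf_smoke.py - a simple timing check a separate performance job could run.
# Endpoints and budgets are assumptions; average many samples in a real setup.
import sys
import time
import requests

BUDGETS = {  # URL -> maximum acceptable response time in seconds (assumed)
    "https://staging.example.test/": 1.0,
    "https://staging.example.test/api/orders": 0.5,
}

failures = []
for url, budget in BUDGETS.items():
    start = time.perf_counter()
    requests.get(url, timeout=10)
    elapsed = time.perf_counter() - start
    status = "OK  " if elapsed <= budget else "SLOW"
    print(f"{status} {elapsed:.2f}s (budget {budget:.2f}s) {url}")
    if elapsed > budget:
        failures.append(url)

sys.exit(1 if failures else 0)
```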
In a different team, the e2e tests ran much more slowly and only ran against the develop branch during the night, but then they didn’t give developers any feedback at the moment they introduced a bug. I would not recommend that.
What is happening when you increase parallelism and load that is causing flakiness?
Does your test infrastructure scale out or are you using a single machine?
In your CI/CD pipeline, which environment are your tests running against? Is it a fully integrated environment, or one with a partially or fully mocked backend? (There is a sketch after these questions of what partial mocking can look like.)
Is there a configuration where the tests are stable? Up to which point does that hold? (Related to the first question above.)
Are you writing too many integrated tests? Could you be more efficient in your automated testing approach? (Assuming you are talking about front-end web here, btw.)
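On the mocked-backend question: a rough sketch of what partial mocking could look like in a front-end web e2e test, using Playwright route interception. The URL pattern, payload and page are all made up; only the one intercepted call is faked, everything else still hits the real backend.

```python
# A sketch of partially mocking the backend in a front-end e2e test with
# Playwright's route interception; URL pattern, payload and page are made up.
import json
from playwright.sync_api import sync_playwright

FAKE_ORDERS = [{"id": 1, "sku": "DEMO-1", "status": "shipped"}]

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()

    # Answer this one API call from the test itself; all other requests
    # still go to the real (integrated) backend.
    page.route(
        "**/api/orders*",
        lambda route: route.fulfill(
            status=200,
            content_type="application/json",
            body=json.dumps(FAKE_ORDERS),
        ),
    )

    page.goto("https://staging.example.test/orders")
    page.wait_for_selector("text=DEMO-1")
    browser.close()
```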