A rapidly and ever-changing UI: is it worth having a UI-based E2E test?

Quick background. I have been tasked with solving the issue of our UI E2E Selenium test failing constantly and paging managers every morning upon failure. The issue is, there aren’t any real product failures; it’s just that the UI changed, or the Selenium test broke somewhere unrelated. With 1000+ developers making changes every day, it seems like a never-ending battle. So how would someone approach this?
Also, a quick note: the E2E regression suite is managed by another team outside the product, and it will always be behind the product teams.

My first thought is: if we can’t get a solidified UI, why even have an E2E test? Just keep using our API integration tests; they’re much more stable and aren’t tied to the UI.

My second thought is to make the case that the other team shouldn’t own this, as they’re always going to lag behind by a good margin. But then someone on one of my teams will have to manage this full time, given how rapidly it’s changing. While that would mean faster turnaround times and more stability, I feel it’s a band-aid fix for a long-running issue.

Thanks for any and all the help!


If the UI keeps changing, then yeah, there is no point in having UI automation.

I’d say if the E2E automation is a must, have the devs run E2E automation on their branches before merging changes to main and fix any broken tests.
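One lightweight way to enforce that is a merge gate that blocks the branch until its E2E run is green. A minimal sketch of the gating decision in Python (the result shape, the `may_merge` name, and the flakiness threshold are all invented for illustration, not tied to any CI product):

```python
# Sketch of a pre-merge gate: given the results of an E2E run on a
# feature branch, decide whether the branch may merge to main.
from dataclasses import dataclass

@dataclass
class TestResult:
    name: str
    passed: bool
    flaky_retries: int = 0  # how many retries it took before passing

def may_merge(results: list[TestResult], max_flaky: int = 2) -> bool:
    """Block the merge on any hard failure, and on excessive flakiness."""
    if any(not r.passed for r in results):
        return False
    # A test that only passes after many retries signals maintenance debt.
    if any(r.flaky_retries > max_flaky for r in results):
        return False
    return True

run = [
    TestResult("login_flow", passed=True),
    TestResult("checkout_flow", passed=False),
]
print(may_merge(run))  # False: a failing E2E test blocks the merge
```

The point is that the decision lives with the branch owner, not with a separate team paged the next morning.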


Just my opinion here, Sean… but you probably have a cultural disconnect, born of pain. The pain you have, and the way API tests are seen as more useful than UI tests, are common experiences in any large product testing effort. But we all know that unless most customers use the API rather than the UI, those tests are not giving you confidence in the thing you actually deliver. If managers are being paged, they clearly care, but they clearly do not care about how quality is delivered, because if they did, teams would abandon trunk-based development and the silo thinking that get you into exactly this kind of pain.

You are struggling with a friction problem, because all those dev teams are doing is shifting responsibility. Teams are also struggling to deliver because they do not own the entire pipeline. They get blocked regularly, and they are probably often stopping to wonder, “hey, what if we did not have to wait for team X to fix the @%$£ tests? Then we could merge this code to master today, but now we have to wait.” And then nothing happens. It’s a basic abuse of the SDLC not to make every coder responsible for their tests.

But wait, Conrad, you might be saying: this is easy to say, but our teams also depend on other teams for changes. It’s not just the web pages that change; the back-end and services are dependencies that have to move in lock-step too. And that is probably why you need to look at stream-based development as a solution. But this does mean you will need to spin up a much larger test farm: its capacity will have to grow to roughly match the number of features or branches your company has in flight at one time. Additionally, test code will have to live in the same repo or stream as the product code it covers.
So yeah, it is painful, but if repairing tests is done by the people who break them, you will discover that two things happen:

  1. UI changes that break the tests will actually occur less often, because the coders will think before they hack. Devs will also help make sure the tests test the correct thing more often.
  2. Releases will speed up, because each team can “fix” their UI changes without having to wait for the test team to phone them up and waste time on the phone.

You will still need testers who maintain tests overall, but the friction will be reduced a lot. I suspect you will need to second testers to the dev teams as part of this change.


A couple brief points:

Failures are good. Failures are information. If a UI element was changed and the regression test that validated it didn’t fail, I would be concerned.

End-to-end UI tests are notoriously finicky and prone to failure. (But… wait… didn’t you just say…?) Yes I did, but frequent failures can result in a “cry wolf” scenario where the test is just ignored.

So… fail some but not always? Yeah… kinda.

A couple suggestions:
Have a look at process. Is there a way to acknowledge the delay in test updates so that the failures stay useful? Say, for a given UI story, part of the story refinement might involve listing the tests that are expected to fail. Then whoever examines the report can understand which failures are actionable and which are expected. (Yes, it would be more valuable to actually regression-verify the change, but… that’s some other team.)

This might be outside your purview, but can those tests be reduced in scope? Is there mocking in place to allow for a more targeted UI test? I had a protracted fight about some automation where one test’s success set up the data for another test. I never did succeed in convincing anyone it was a bad idea, so we would get cascade failures. Sigh.
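On the cascade-failure point: the root cause is tests sharing state. A hedged sketch of the alternative, where each test provisions its own data through a helper (`FakeStore` here is a hypothetical stand-in for whatever API client or fixture layer the suite actually has):

```python
# Each test provisions its own data, so one failure cannot cascade.
class FakeStore:
    """Stand-in for the real backend; in practice this would be an API
    client or a fixture layer that seeds the test database."""
    def __init__(self):
        self.users = {}

    def create_user(self, name):
        self.users[name] = {"name": name, "orders": []}
        return self.users[name]

    def place_order(self, name, item):
        self.users[name]["orders"].append(item)

def test_user_creation():
    store = FakeStore()        # fresh state, no dependency on other tests
    user = store.create_user("alice")
    assert user["name"] == "alice"

def test_order_placement():
    store = FakeStore()        # sets up its own user instead of reusing
    store.create_user("bob")   # data left behind by test_user_creation
    store.place_order("bob", "hammer")
    assert store.users["bob"]["orders"] == ["hammer"]

test_user_creation()
test_order_placement()
```

Either test can now run, fail, or be skipped without taking the other down with it.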

I think there is more efficiency to be gained organizationally, but I’m guessing you are limited in your ability to influence that sort of change.


I re-read this and something off topic jumped out at me.

With containerization and manageable services, why would you need a big server farm? Unless there is a need for all that horsepower, which will probably sit idle any time testing or development isn’t being done directly against it, why not invest in on-demand environments? We had a system known as “OPRAH” (as in “everyone gets an environment!”) that would create the minimal services and resources in Azure required to run the product. Each environment was tagged with the user that created it. Then everyone was free to bash away without getting elbows in each other’s way. The configuration was set to self-destruct at the end of the day unless someone changed the flag, greatly reducing the cost of zombie resources.
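For what it’s worth, the self-destruct behaviour can be as simple as an owner tag, a creation timestamp, and a scheduled reaper. A toy sketch of the reaping logic modelled on the description above (the field names and one-day lifetime are assumptions; a real version would call the cloud provider’s API to actually tear things down):

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Environment:
    name: str
    owner: str                # tag: who created it
    created: datetime
    keep: bool = False        # the flag a user flips to opt out of cleanup

def environments_to_destroy(envs, now, max_age=timedelta(days=1)):
    """Return environments past their lifetime whose keep flag is unset."""
    return [e for e in envs if not e.keep and now - e.created > max_age]

now = datetime(2024, 1, 2, 18, 0)
envs = [
    Environment("env-alice", "alice", datetime(2024, 1, 1, 9, 0)),
    Environment("env-bob", "bob", datetime(2024, 1, 2, 9, 0)),
    Environment("env-carol", "carol", datetime(2024, 1, 1, 9, 0), keep=True),
]
print([e.name for e in environments_to_destroy(envs, now)])  # ['env-alice']
```

A nightly job over this list is all the “self-destruct” really is; the owner tag makes the leftover costs attributable.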

You also noted that the test repo had to live with the product code. I may be jumping to a conclusion, but how do you keep the test code out of prod? I get the sweats about test code making it to prod (see also Knight Capital). Does it get held back by the CI/CD pipeline?


Good points, Michael. Everyone’s context is king, and there is never an easy road; if there were, your competitor would usually beat you to it, and you would have to jump quickly. Everything revolves around managers taking a regular bird’s-eye look at how efficient their teams are. Yes, the tester is often powerless, but if you do your job well, people will ask you for “suggestions”. These two suggestions, ramping up test capacity and making testing the responsibility of the person committing product code into trunk/main, take years to implement in a large company, where everything becomes harder. I once worked to scale up using streams; we had testers running up $2K/month AWS bills, so the company wanted us to move the environments in-house and dockerise them, which was not an overnight job. We pretty much built “OPRAH”, just like you did @msh .

A bigger test farm: yes, just using containers and cloud will give you a bigger farm, but all those containers use up more compute resource, and the extra logging they create, which testers and devs need access to, will use up much more storage. And of course we all know that test farms are not just VMs or containers; they include large, fully populated test databases, which will need replicating in some scenarios. More machines require a proper dashboard and more ops discipline.

Test code in the same repo: test code has to somehow track product code. In reality it can never live in the same repo; only tiny, tiny projects can do that. That is why “streams” were invented: a configuration management concept which creates branches for all the impacted consumable repos, only for a release train or feature, and then keeps them in sync, mostly manually. Not all repos branch either; this branch data needs to be kept in your CI/CD system. When it comes time to merge, the fun starts, but the testers at least have working tests at that point if they branched their tests along with the product-code feature branch. This does mean that testers will be switching between branches when repairing and writing tests, so all your testers will have to know version control pretty intimately. So no, you don’t ever ship test code, which is another configuration management topic of its own, and you don’t really have to put test code into the same “version-control project”.

Thanks for forcing me to clarify Michael. :sunglasses:


Thanks for clarifying that Conrad!
Is there a place where I can learn more about this “Streams” concept? I suspect it’s something I was trying to implement at my former position (until layoffs… boo!), but in isolation and unaware that someone might have codified it into something formal.


(I actually struggle to find anything via Google these days; search has been degraded by AI and ad-revenue biases.) I’m casting my mind back to a previous job here, but basically, streams are a completely different workflow idea from trunk-based and component-based development. Basically it asks you to fork every repo that will be affected by a feature. You then feed this list of branches into your CI/CD instead of hard-coding repos and branches for each stage of your CI/CD pipeline. The collection of branches is called a “stream”. Everyone must work with the stream, not with random branches anymore; this requires writing a few scripts to automate much of the stuff you probably did manually before. It’s a change in mindset, and it only works if you automate your entire pipeline properly. Hence the process changes take months to implement.

This article about the built-in streams development support in Helix kind of explains it: https://www.perforce.com/blog/vcs/parallel-development . Note, though, that this was implemented in git at a large corp I worked for a few years back. You will want to start by having a pipeline (see Pipeline (software) - Wikipedia); then creating development streams is simple: you just clone the pipeline and give it a configuration blob with all the branches. ALL the branches.

When a set of branches (called a stream now, because it’s a collection of branches) gets merged into the develop branch (which in reality is itself a stream too now), all the branches in that stream get deleted at once to remove clutter. You will also want to “bookmark” them all so that you can rebuild the exact build again. Remember I said you will have to automate a lot of your manual processes: creating a stream, merging, bookmarking and deleting it, in fact all the version control tasks, now need to be pre-scripted. So a stream is a list of every branch and the changes on them. Merging is done in the same way; the difference is you need to do all the merges in a stream on the same day… which in reality is not unlike the pain you would have with component-based development.
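To make the idea concrete: in data terms a stream is tiny, just a named mapping of repo to branch that the pipeline consumes instead of hard-coded repo/branch pairs; the real work is the automation around it. A small illustrative sketch (the repo names and bookmark format are invented):

```python
# A stream is a named collection of branches, one per affected repo.
FEATURE = "checkout-redesign"

def create_stream(feature, repos):
    """Fork a same-named branch in every repo the feature touches."""
    return {repo: f"feature/{feature}" for repo in repos}

def bookmark(stream, build_id):
    """Record the exact branch set so the build can be reproduced
    even after the branches are deleted post-merge."""
    return {"build": build_id, "branches": dict(stream)}

stream = create_stream(FEATURE, ["web-ui", "order-service", "e2e-tests"])
# Note the test repo branches right along with the product code.
print(stream["e2e-tests"])   # feature/checkout-redesign
record = bookmark(stream, "build-1042")
```

The CI/CD pipeline would read a blob like `stream` for every stage, which is what makes the create/merge/delete scripting mandatory.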


I have three options to consider:

  1. If your E2E tests are failing regularly for each release, look at what is being tested. Is the test checking too many things? Consider reducing the extent of what the E2E test checks. Focus on the critical areas/paths.
  2. Start including the automation team (if you’re not already) in the planning phase, and have them call out that change X will break the automation, so that the time to recode the automation is included in the planning. I have also had success in the past with getting an environment with early development code (pre-release to testing) where I can exercise the automation and start making changes to it before I get the code to actually test. This is usually a developer’s branch, not the main branch.
  3. Get agreement that the development team’s DONE criteria include a successful run of the E2E test by them.

Wow, so much information. Thank you all; I see I need to add a couple of links to my reading list! Thank you.

Quick update: as I looked into this, things didn’t add up, and I learned I had been armed with only half the information when I was tasked with this issue. It actually isn’t failing that often. In the last 150 days the E2E suite failed on 16 days… which is actually a lot less than I expected. However, those 16 days occurred in chunks (4 days in a row, then 3 days, then 5 days in a row, etc.), which gave the perception of it failing all the time. I also had to set the expectation that these periods coincided with deadlines and major changes/updates. So we should expect that E2E suite to fail and need updating.

Currently we have shifted a lot of quality automation onto the devs, and it has been working well. Since we have a lot of APIs, we leverage integration tests a lot more than E2E (which makes sense following the test pyramid). So that’s where our focus has been, since we had zero integration tests.

To the other points @conrad.connected , we definitely have a disconnect where quality equals pain and heartbreak, but everyone recognized that. It’s become a goal for IT in 2024, a Quality First Approach: we’re building metrics and concrete things to track for 2024 to see how we can improve or change. So I agree a culture change is needed, and we’re actually building out roadmaps to figure out how we’re going to meet our goals. Part of my new role is to help identify problem areas and skill up others in quality processes and mindsets.

@msh I love the OPRAH service. I’ve actually scheduled a meeting to do something similar, so it’s awesome to hear that others have done this. That really helps, and it should help further as we’re in the process of containerizing our systems.


I think of it from an engineering perspective and would get an engineering lead/manager to figure it out.
When building automation code, you are building a product. This product depends on the main product.

It’s no different from using the service of another team/department when they make breaking changes before the team using that service has integrated the new changes.

Ideas that were considered in the past where I worked:

  • reduce reliance on UI checks
  • pause for a while and wait for the UI to stabilize
  • code only the scenarios that rely on stable UI
  • trash the UI automation and increase API automation
  • make devs do the integration fixes, either by fixing the main product and re-adding the IDs, or by fixing the automation product
  • let the automation engineers go (move, fire, transition to a different role) and use better testers who can identify problems quickly and efficiently by exploring, experimenting, and investigating in the app
  • add more automation engineers to work on it to try and keep up with changes
  • integrate the developers and automation people into the same process and make them communicate and work together toward similar goals

Thank you very much for the detailed reply. I’ll use that article to launch my own exploration of this concept.

Excellent! I love getting devs involved. One experience I had was that developers think with a particular mindset about solving the coding problem in front of them. Testing that solution with the mindset of a quality engineer doesn’t come naturally, and that’s to be expected. So one activity that needs to happen is reviewing the tests they create and iterating on best practices. I started out small and worked with development leads to make a successful smoke test a requirement for issuing a pull request. Then, as they were adding unit tests, we implemented a “positive, negative, null” paradigm for tests: that is, there had to be a success-expected test, a fail-expected test, and a null-input-properly-handled test at a minimum for any tested code block. This was practice for, and carried over into, other tests: integration, API, etc.
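The “positive, negative, null” paradigm in miniature, using a hypothetical price-parsing helper invented for the example: one success-expected test, one fail-expected test, and one null-input test.

```python
def parse_price(text):
    """Parse a price string like '19.99' into cents; reject bad input."""
    if text is None:
        raise ValueError("price is required")
    try:
        value = float(text)
    except ValueError:
        raise ValueError(f"not a number: {text!r}")
    if value < 0:
        raise ValueError("price cannot be negative")
    return round(value * 100)

def test_positive():                   # success expected
    assert parse_price("19.99") == 1999

def test_negative():                   # failure expected
    try:
        parse_price("-1.00")
    except ValueError:
        pass
    else:
        raise AssertionError("negative price should be rejected")

def test_null():                       # null input handled properly
    try:
        parse_price(None)
    except ValueError:
        pass
    else:
        raise AssertionError("missing price should be rejected")

test_positive(); test_negative(); test_null()
```

Three small tests per code block is a floor, not a ceiling, but it is an easy minimum for devs to internalize.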

Well, if the devs are testing, why did they keep me around? Because like many QA folks, I knew where the bodies were buried, and having become a serious SME on the product, I could identify likely issues just from backlog refinement of stories. Knowing all of this stuff and being that SME with a quality mindset is where QA expertise will always be needed.


@msh Totally agree! That is what I did with my last team. I got the devs all spun up on quality thinking and approaches and trained my replacement, who will keep the push going with automation. I’ve since been moved to a new team that supports multiple teams, and the expectation of me is that I should be working to get multiple teams onto that approach and mindset.


I think since the UI keeps changing and causing false alarms, it might make sense to focus more on the stable API integration tests. But we should also talk about who takes charge, and think about a smart way to automate that isn’t too complicated. We can look at testing just the important parts to find a good balance.


Your team is working backwards.

One should first change the test code in order to drive the production code, not baselessly change the production code first and then change the test code to validate changes you already made.

The latter approach doesn’t give you a basis to drive development and creates work that doesn’t deliver any value (since the product behavior and design were already implemented).


FYI, I shared this post on Twitter/X and it generated some discussion.

I support this in relation to literal “end to end” testing, and I’ll quote myself from this other thread:

Think of it like this: your automation checks the application BY (and including) the GUI. The automation takes the GUI as its interface, while the very nature of a Graphical User Interface is to be used by humans.
Surely you should check the GUI itself and its integration with the server. But for business logic you could use APIs as well.

I suggest thinking about what you are trying to achieve with which automation.
What is the current automation used for, and what could potentially change in the future?
(Sometimes automation is used to counterbalance bad testing / bad test processes, and ultimately distrust.)

I find this hard at the GUI level. How do you do that?
In my experience, (most parts of) automation at the GUI level (especially element locators) can be written only when the application is done for that part. You need an existing GUI to write the automation for it.
It’s not just the locators alone, but also the structure of the GUI: which elements are placed where, what the DOM hierarchy is, and which elements you need to wait on.
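Much of that waiting logic can live in one helper so that locator churn stays localized. A framework-agnostic polling sketch (the predicate stands in for whatever “element is present” check your driver provides; the Selenium call in the comment is only an example):

```python
import time

def wait_for(predicate, timeout=5.0, interval=0.1):
    """Poll until predicate() is truthy or the timeout expires.
    With Selenium the predicate might be something like
    lambda: driver.find_elements(By.CSS_SELECTOR, "[data-testid=submit]")."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = predicate()
        if result:
            return result
        time.sleep(interval)
    raise TimeoutError("condition not met within %.1fs" % timeout)

# Simulate an element that only "appears" on the third poll.
calls = {"n": 0}
def element_present():
    calls["n"] += 1
    return calls["n"] >= 3

print(wait_for(element_present, timeout=1.0))  # True
```

When the DOM structure changes, you then update locators and timings in one place instead of in every test.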

Important, in my opinion, is having fast-running GUI automation (as proposed in my previous comment), and in general fast feedback cycles by different means, so you can identify the changes fast.
E.g. the application developer changes the application code, and minutes later the automation developer (often referred to as the tester) can run the related parts of the automation against it (e.g. locally), find the deviations, and fix them.
I often change automation code on the same feature branch on which the developers change the application.

I’m with Seb here too. We all use web apps ourselves and find bugs in them every day. We then ask, “Did you test this app?”
Imagine if the answer was “Yes, we tested the API underneath it.” That would make your company a laughing stock. Testers should test what the customer receives, first and foremost. It’s like being the tester in the hammer factory who checks that all the handles are made from the finest hickory and have a straight grain, but does not check that they are wedged into the heads properly.