Masterclass Further Discussion: The UI Test That Cried Wolf

On Tuesday, @alyssaruth will be joining us for a masterclass on what we can learn from flaky automation and what we can do about it.

If we don’t get to your questions on the night, we’ll add them to the thread below for Alex to answer later. If you’d like to continue the conversation from the webinar, this thread is an excellent place to do that :grin: Share resources and follow-up success stories here!

Of course, the recording of the masterclass will be available to everyone in the Masterclass section of our website afterwards.

The recording is live!

Questions we didn’t get to

  1. Is there ever a time (exception) where a sleep would be a good way to handle a flaky test? (you wrote generally it’s a bad idea, but when is it a good idea?)
  2. What was the alternative for Cypress?
  3. ES and Kibana look very helpful - any tips on how to start integrating your automation with these tools?
  4. What is the average single test execution time?
  5. My team has considered moving off of our current UI testing framework and moving to another. You mentioned your team moved to Cypress a few years ago. Which tool were you using before and what made you make the switch? What was your process in considering making the switch?
  6. How much trust should we have in those flaky tests?

Not a question from me. Just wanted to say I enjoyed your session @alyssaruth. Lots of similar issues and ideas I’ve encountered before, and some that I think will really help me. Thanks for sharing your story. And a great host as usual @gwendiagram

Is there ever a time (exception) where a sleep would be a good way to handle a flaky test? (you wrote generally it’s a bad idea, but when is it a good idea?)

Yes, I think the reality is that there will be some circumstances where a sleep is unavoidable - which is fine, provided they’re the exception and you’ve exhausted other options first! In general, it’s always better to wait for an explicit condition to be met (e.g. wait for the page to have loaded by checking for the presence/absence of a particular element) than to wait for an arbitrary length of time.
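
As a rough Cypress illustration (the selectors here are hypothetical, not from our codebase):

```typescript
// Arbitrary sleep: always costs the full five seconds, and still flakes
// whenever the page takes longer than that to load.
cy.wait(5000);
cy.get('[data-test=results-table]').should('be.visible');

// Explicit condition: Cypress retries each assertion until it passes
// (or times out), so the test proceeds the moment the spinner is gone.
cy.get('[data-test=loading-spinner]').should('not.exist');
cy.get('[data-test=results-table]').should('be.visible');
```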

So, the scenarios where a sleep is a good idea are exactly those where you have been left no other choice. For us, so far this has only applied to a certain third party integration around processing payment, which pops up a window (an iframe) to interact with. Cypress doesn’t deal with iframes particularly well in general, and this particular iframe also doesn’t expose as much as we’d like in terms of CSS classes to signal state changes. This means we currently don’t have a way to concretely tell the difference between when the iframe is in a “loading” state vs when the form within it is ready for us to type into - and so we’re stuck with a wait that’s long enough to “practically guarantee” that it’s ready. We’ve pulled this wait out into its own support method, with a descriptive name so the intention behind the wait is clear.
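
For illustration, here’s a sketch of how that support method might look - the command name and duration are my inventions, not the real code:

```typescript
// In cypress/support/commands.ts (name and duration are illustrative)
Cypress.Commands.add('waitForPaymentIframe', () => {
  // The third-party iframe gives us no CSS hook to distinguish "loading"
  // from "ready", so wait long enough to practically guarantee readiness.
  // If the integration ever exposes a proper signal, replace this with an
  // explicit condition and delete the sleep.
  cy.wait(8000);
});

// Type declaration so TypeScript knows about the custom command
declare global {
  namespace Cypress {
    interface Chainable {
      waitForPaymentIframe(): Chainable<void>;
    }
  }
}

export {};
```

Tests then just call `cy.waitForPaymentIframe()`, so the intention behind the wait stays visible at the call site.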

But if that were to change - maybe after upgrading the third party version or Cypress - then we’d rip those waits out right away!

What was the alternative for Cypress?

My team has considered moving off of our current UI testing framework and moving to another. You mentioned your team moved to Cypress a few years ago. Which tool were you using before and what made you make the switch? What was your process in considering making the switch?

Hi both, thanks for the questions! I’m going to tackle these two together.

The project that we’re using Cypress in was actually greenfield, so Cypress was our first choice (we didn’t switch to it from another framework). We had a few devs on our team who’d had bad experiences with Selenium WebDriver in the past, so we decided to give something else a try for comparison. We found it to be a really good developer experience - things we liked were:

  • Writing tests was straightforward - for example, Cypress has built-in waiting for your assertions, so you don’t have to think about that much (see the sketch after this list).
  • The documentation is pretty extensive and useful.
  • Tests run quickly and easily locally.
  • The UI when running the tests is really clean/nice.
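
To illustrate that first point with a minimal (hypothetical) example - there’s no explicit wait here, because Cypress keeps retrying the assertion until it passes or the default 4-second command timeout elapses:

```typescript
cy.get('[data-test=save-button]').click();
// No sleep needed: the assertion below is retried automatically until
// the banner shows the expected text.
cy.get('[data-test=status-banner]').should('contain.text', 'Saved');
```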

It wasn’t all smooth sailing by any means - we did have to maintain a fork of it for a little while due to a Service Worker-specific issue that took a while to get fixed. But in general they’ve been responsive to issues and it’s been a good fit for us - we’re back on the main trunk now!

I guess if we were to look into a replacement for some reason, the sorts of things that would be top of my shopping list would be:

  • Built-in videos/screenshots for when tests fail.
  • Ability to easily run tests locally.
  • Ability to write the tests in your desired language (we write in TypeScript, which is nice as it’s the same as what we use for the regular frontend code).
  • Support for multiple browsers / the ones you care about.
  • An active community, regular updates, issues being responded to, etc.

Back when our E2E flakes were really bad, we thought a couple of times about spiking TestCafe because we’d heard good things about it. At this point I’m kinda glad we didn’t, though, as I think it would have been a bit of a distraction - it turned out most of our problems were our own doing and just needed better investigation. I think it’s easy to fall into a “grass is always greener” mentality when it comes to tooling - you feel the pain points of what you’re currently using, but don’t necessarily know what new ones you might inherit if you switch…

Sorry - a bit of a long answer! I guess the TL;DR is, there are a bunch of tools out there and probably multiple that will do the job you want well. Before considering switching tools (potentially a long and painful process) you want to be really sure that the problems you’re facing are because of your current tool and not a symptom of something else.

What is the average single test execution time?

This question comes at a good time as I’ve just finished going over all our Cypress tests and pruning/speeding them up so I have the answer to hand!

As things stand, we have 139 Cypress tests across our two webapps, and adding up the runtimes of each (across a few builds) gives a total execution time of around 17 minutes. So that would mean approximately 7.5s per test on average.

We parallelise the tests per build too, though - those 139 tests are effectively divided into four roughly even chunks taking around 4 minutes each. So, allowing for a bit of variation here and there (waiting for build agents and suchlike) we’re currently in a position where the “end to end testing” stage of our CI pipeline takes around 5 minutes per branch.

Hi @heather_reid, do you know when this will be available in the Masterclass? I missed the start yesterday. :worried:

ES and Kibana look very helpful - any tips on how to start integrating your automation with these tools?

They are pretty powerful tools, although I’m sure others would do the job just fine as well. The important thing is that you have the logs somewhere so you can diagnose what’s going on when your tests fail.

In terms of getting started with ES/Kibana, there are various hosted solutions (be it in AWS, or Elastic’s own Elastic Cloud) that get you up and running quickly - and they usually include Kibana out of the box too. Once you’ve got that up and running, you just need a way to ship logs to your ES instance - since our services run in Kubernetes we’re using a gadget called Filebeat for that purpose, but again there are other things we could have gone with.

As I showed in the presentation, the way to get data on which tests are failing is some kind of “on failure” hook which does an HTTP POST to your Elasticsearch instance with the relevant info.
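
A rough sketch of such a hook, assuming a hypothetical Elasticsearch URL and index name (this goes in the Cypress support file, and uses Mocha’s `this.currentTest` to detect failure):

```typescript
// In cypress/support/e2e.ts - the ES endpoint and index are hypothetical
afterEach(function () {
  // A regular function (not an arrow) so Mocha's `this` context is available
  if (this.currentTest?.state === 'failed') {
    cy.request({
      method: 'POST',
      url: 'https://your-es-instance:9200/test-failures/_doc',
      body: {
        test: this.currentTest.fullTitle(),
        error: this.currentTest.err?.message,
        timestamp: new Date().toISOString(),
      },
      failOnStatusCode: false, // don't fail the suite if logging itself fails
    });
  }
});
```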

How much trust should we have in those flaky tests?

Flaky tests definitely erode trust, which is why it’s so important to fix them before there are too many. With any flaky test, though, it’s also worth remembering that:

  • The original intent behind it was good - it’s attempting to add coverage to some flow in your system.
  • At the time it was written, it presumably worked well enough to make it into main - so it should be salvageable now it’s (slightly) broken.

Tempting as it might be, just removing the flaky tests is not the way forward!

Hi @stephennb it should be available early to middle of next week depending on how long captions take :slight_smile:

Great, thanks @heather_reid :slight_smile:

The recording is now live! Link in the original post :grin:

Woop! It’s available already! Way fast :grin:
