How do you e2e test an asynchronous app behavior that takes several minutes?

My QA team has been asked to create an end-to-end test to cover a new behavior in an app. Everything is pretty normal except for one thing: the scenario covers a step that takes several minutes in the app.

For context, this is a SaaS webapp. Nothing fancy, you just log into our website and do stuff via your web browser. And we already have some e2e tests that run perfectly and automatically on a regular basis. We use python+pytest+playwright for the e2e tests.

The scenario is basically:

  • Given user is on page X
  • And user has clicked on “Start this computation”
  • And user sees a message “Computation is running it can take several minutes, please wait”.
  • When calculation is done
  • Then the page is updated with the new results
  • And the user can see value Y in box Z

The computation phase now takes ~5 minutes. But it could take longer in the future.

And also, there are other processes in the app that we haven’t covered with e2e tests yet, but we want to, and they have this same “wait” step. It could be downloading files, sending an email, whatever. So I’m looking for a solution that could be applied to other similar, but different, cases.

My questions are:

  • Have you already encountered this particular challenge?
  • How did you approach this?
  • Were you satisfied with your solution?
  • What were its advantages and drawbacks?

For information, so far the more technical people in the team were thinking of putting this test in a separate thread (“parallelizing” it) and hoping that the waiting period would be shorter than the execution of the rest of the tests: launch everything and hope this one finishes before everything else.

But I was more thinking of a “splitting the test” approach. Something along the lines of: one test that launches the computation and only checks that the user is informed it has started; and another test that checks the proper display of the results (once the computation is done). I can already see some advantages, but also obvious drawbacks, such as: when do you launch the second test? How do you know the computation is finished?

So far, none of these approaches satisfy me. So I was hoping for other solutions from the community.

Thank you very much!

2 Likes

Long polling, for example getting Selenium to refresh the page every 5 seconds. It does suck: checking your email inbox every 5 seconds is extreme, but it is an end-to-end test and very fragile as a result. Basically your test just has to wait and not change anything that could affect the outcome; some things work this way. However, you are effectively testing the user experience here, so ask yourself: part of the experience is that you wait, go make some coffee, then come back. Is that a valuable test?
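A minimal sketch of that polling loop in this thread's Python stack (the locator name and timings are made up); the `check` callback is where the page reload and result lookup would go:

```python
import time

def poll_until(check, timeout_s=10 * 60, interval_s=5):
    """Re-run `check` every `interval_s` seconds until it returns something
    truthy, or raise once `timeout_s` has elapsed."""
    deadline = time.monotonic() + timeout_s
    while True:
        result = check()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout_s}s")
        time.sleep(interval_s)

# In a Playwright test the check would reload and look for the result, e.g.:
# def result_visible():
#     page.reload()
#     return page.locator("#box-z").is_visible()   # hypothetical selector
# poll_until(result_visible, timeout_s=10 * 60, interval_s=5)
```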

So if you can fake the triggers and generate a faster response, by having the developers give you hooks into the system or just some flags that set the outcome up without waiting, then that’s a good way to not only test positive but also test negative end results, and do so much faster. Talk to the devs. Remember your test is not to test the computation result, but to test that the result is of a certain kind, so a pre-made result, a small number of “golden” images or answers, is a great piece of data you could feed into the system by using hooks to fake a quicker test outcome.

One way to generate these fakes is by mocking; developers use mocks in tests all the time. You just need to mock the computation to return one of your pre-made answers, and to do so immediately. A mock is basically where you take the “real” component and swap it out for your fake one, so it will involve talking to the devs and writing a special version/variation of a component. This is the main reason I heavily advocate for every tester having some (very basic, mind you) coding skills in the same coding language(s) as the system is written in.
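To make the “golden answers” idea concrete, here is a hedged sketch in the asker’s Python+Playwright stack. The endpoint pattern, payload shapes, and scenario names are all assumptions; the real hook point would be agreed with the devs:

```python
import json

# Hypothetical pre-made "golden" results the stub returns instead of waiting
# several minutes for the real computation.
GOLDEN = {
    "happy_path": {"status": "done", "box_z": "Y"},
    "failed_run": {"status": "error", "message": "computation failed"},
}

def golden_handler(name):
    """Build a Playwright route handler that immediately fulfills the request
    with the chosen golden result."""
    def handler(route):
        route.fulfill(status=200, content_type="application/json",
                      body=json.dumps(GOLDEN[name]))
    return handler

# Registered in the test before clicking "Start this computation":
# page.route("**/api/computation/status", golden_handler("happy_path"))
```

Having both a happy-path and a failure payload is what lets you cover the negative outcome just as quickly as the positive one.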

Speeding it up is much more valuable than testing the end-to-end (E2E) waiting experience.

2 Likes

If it’s possible, I would try hard to create a dataset which is faster to compute. Even if only for the QA process, it might be beneficial to verify that the system works end-to-end, even if not with production data. This might not be the best approach in your situation, though.

On the other hand, if there is no way to speed up the process, I would consider splitting the e2e test suite into two parts: one for regular tests, and the other for asynchronous, long-running ones. It would probably be simpler to run two suites in parallel than to run a bunch of tests within one suite in the background. For example, GitHub Actions runs jobs in parallel by default. By the time your current suite finishes, your few new async, long-running tests should too. And if they don’t, the job will just take a bit longer; GHA won’t care.
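Sketched as a GitHub Actions workflow fragment (job names, test paths, and the `slow` pytest marker are illustrative assumptions, not the asker’s real setup):

```yaml
jobs:
  e2e-fast:                  # the existing, quick suite
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pytest -m "not slow" tests/e2e
  e2e-slow:                  # the few multi-minute async scenarios
    runs-on: ubuntu-latest   # runs in parallel with e2e-fast by default
    steps:
      - uses: actions/checkout@v4
      - run: pytest -m slow tests/e2e
```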

3 Likes

You are on the right track, and so are the comments above. If mocking is not an option, split the test and tag it so that you can run it in a separate job. Then remember: the wait time is a metric to be measured, so make sure you keep track of it; after a few runs you will be able to average it. This now becomes your baseline, from which you can give feedback to the team as an improvement target. Say that after monitoring, the step takes 7 minutes on average. Runs that take over 7 minutes could indicate an issue, so mark them as failed; anything less is a pass. Also, this is now measurable: ask the devs to reduce this time by 30% in the next iteration and update the tests accordingly. Happy testing!
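A small sketch of treating the wait as a measured metric. The 7-minute baseline and the margin are example numbers, and `poll_for_results` is a hypothetical helper, not a real API:

```python
import time

BASELINE_S = 7 * 60   # rolling average from previously monitored runs (example)
MARGIN = 1.2          # tolerated jitter before the wait counts as a regression

def measure_wait(wait_fn):
    """Run the waiting step and return how long it took, so the duration can
    be logged as a metric and compared against the baseline."""
    start = time.monotonic()
    wait_fn()
    return time.monotonic() - start

# In the long-running test:
# elapsed = measure_wait(lambda: poll_for_results(page))   # hypothetical helper
# assert elapsed <= BASELINE_S * MARGIN, f"wait regressed: {elapsed:.0f}s"
```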

1 Like

I’ve run into similar issues a few times (both myself and in coaching other testers).

Generally my first question would be: do you really need to see this end-to-end? Or can you get away with setting up a Playwright route that intercepts the request and returns predetermined results after (for example) a 30-second wait?
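For instance, a hedged sketch of such a route (the URL pattern and payload are assumptions; `page.route` and `route.fulfill` are real Playwright APIs):

```python
import json
import time

def delayed_stub(payload, delay_s=30.0):
    """Build a Playwright route handler that waits `delay_s` seconds and then
    answers with a canned JSON payload, standing in for the real computation."""
    def handler(route):
        time.sleep(delay_s)  # keeps the "please wait" UI state visible briefly
        route.fulfill(status=200, content_type="application/json",
                      body=json.dumps(payload))
    return handler

# Registered before the click that starts the computation:
# page.route("**/api/computation/status", delayed_stub({"status": "done"}, 30))
```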

If it really is important to have a full end-to-end run including calculations, then my next option would be the one already mentioned by @puck: try to get an extremely limited dataset to knock down the compute time as much as possible. This too comes with a caveat: if the important thing is testing the complex/long-running dataset end-to-end, that’s not going to do you any good either.

So if having the long-running big dataset in full end-to-end is really necessary (either technically, or because of past bugs which mean nothing else will be trusted) we end up with parallelization and/or splitting.

Again there is an ‘is this important’ line of thinking here. If it’s important to see the browser get live-updated from ‘waiting’ to ‘done’ (some sort of websocket push, I imagine), you’re stuck keeping the browser open for five minutes. If all that matters is ‘user sees “computation running” and can check back later’, a split test might work. I wouldn’t favour the split myself unless you have a functional reason to test the two parts separately; if you do, then split, but maybe even see if you can have the second test trigger its own computation via the API and only open a browser once it’s time to check the results.
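A sketch of that API-first shape; the `api` object and its `start`/`status` methods are hypothetical stand-ins for whatever the app’s backend exposes, and a browser is only opened at the end:

```python
import time

def run_computation_via_api(api, timeout_s=15 * 60, interval_s=10):
    """Kick off the computation through the backend API and block until it
    reports done. `api` is any object exposing start() -> id and
    status(id) -> str (e.g. a thin wrapper over the app's HTTP API)."""
    comp_id = api.start()
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if api.status(comp_id) == "done":
            return comp_id
        time.sleep(interval_s)
    raise TimeoutError(f"computation {comp_id} not done after {timeout_s}s")

# Only now open the browser and assert the UI (names are illustrative):
# comp_id = run_computation_via_api(api)
# page.goto(f"/results/{comp_id}")
# expect(page.locator("#box-z")).to_have_text("Y")
```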

As you get more tests with long waits, separate suites run in parallel, or higher numbers of threads/workers in Playwright, might be the answer; but that depends on the resources you have available on your CI platform (and their cost).

As a last resort, sometimes it’s been necessary to limit when certain tests run to keep things manageable. For example: all feature-branch builds run a version with a Playwright route instantly returning a preset result, and only release builds (or feature builds given a special flag) run the full e2e version and take the hit of waiting X minutes for the backend to finish.
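One way to wire that flag in Python; the environment variable name is an assumption, and with pytest (this thread’s runner) the result would feed into `pytest.mark.skipif`:

```python
import os

def full_e2e_enabled(env=None):
    """True when the pipeline opts in to the slow, unmocked run, e.g. release
    builds exporting RUN_FULL_E2E=1 (variable name is illustrative)."""
    env = os.environ if env is None else env
    return env.get("RUN_FULL_E2E") == "1"

# In the test module:
# @pytest.mark.skipif(not full_e2e_enabled(),
#                     reason="full e2e only on release/flagged builds")
# def test_computation_full_e2e(page): ...
```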

1 Like