Visual regression test frameworks for Windows & MacOS applications?

Say a team releases Windows & MacOS applications and on occasion there’s an undesired visual change like a button shifting position or a font size increase affecting the wrong text labels.

Ideally I would not want to spend human brainpower on preventing these types of issues. I also would rather not use the less than brilliant cross-platform desktop test automation frameworks to write test scenarios against a large variety of potential minor visual issues.

For the web, tools like Percy or Applitools Eyes exist for quick UI diffs, but does something similar exist which supports Windows & MacOS applications and in a way you’ve found to be reliable and not too costly to maintain?

And if not, what alternative approach have you used?


I do not know these tools very well, so I will describe what I have used.

I once worked on a Java client-server application where data charts were drawn on the server and then sent to the client. (Each chart required a lot of data and various calculations, and no live updates were needed.)
Users could choose from multiple chart types and set different options per type. This resulted in a combinatorial explosion of charts a user could create.

We had to make code changes that should not change these charts: “works as before”.

Therefore we developed checks which were able to call the server API like a real client and compare the received picture with a reference picture (which we took from the old version).
We did the comparison with ImageMagick. This is a library which can compute diffs between pictures, as we did not want to rely on comparing by eye to spot differences. I don’t fully remember, but I think we also had reasons not to do a 1:1 comparison and to use a threshold instead (somehow the pictures were slightly different with each generation). ImageMagick is able to do that.
To reduce the number of charts to create, we used either pairwise or orthogonal array testing for the types and parameters. We had a library which calculated the test cases with the necessary combinations.
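
The pairwise reduction can be sketched roughly like this. It is a minimal greedy all-pairs generator, not the library we used back then, and the chart parameters are made-up examples. It builds the full cartesian product first, so it only suits small parameter sets.

```python
from itertools import combinations, product


def all_pairs(parameters):
    """Greedily pick cases until every value pair of every two parameters is covered."""
    names = list(parameters)
    # Every (param_a, value_a, param_b, value_b) combination that must appear once.
    uncovered = {
        (a, va, b, vb)
        for a, b in combinations(names, 2)
        for va in parameters[a]
        for vb in parameters[b]
    }
    candidates = [dict(zip(names, values)) for values in product(*parameters.values())]

    def pairs_of(case):
        return {(a, case[a], b, case[b]) for a, b in combinations(names, 2)}

    suite = []
    while uncovered:
        # Take the candidate that covers the most still-uncovered pairs.
        best = max(candidates, key=lambda case: len(pairs_of(case) & uncovered))
        suite.append(best)
        uncovered -= pairs_of(best)
    return suite


# Hypothetical chart options, only for illustration.
chart_parameters = {
    "chart_type": ["line", "bar", "pie"],
    "legend": ["on", "off"],
    "scale": ["linear", "log"],
    "palette": ["default", "mono", "high_contrast"],
}

suite = all_pairs(chart_parameters)
print(f"{len(suite)} cases instead of {3 * 2 * 2 * 3}")
```

Each resulting case is one chart to request from the server and compare against its reference picture.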

I would not totally reject the human brainpower and would suggest a combination of both approaches.
Have automation which navigates through the application and takes screenshots of every important screen. This is not about asserting any output. Maybe use specific test data or shortcuts for this to make sure the different use cases succeed. Maybe even mocking.
Afterwards humans flip through them and assess them.
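
To make the flipping-through step quicker, a small script could put each new screenshot next to its previous version on one page. This is only a sketch: it assumes the automation stores PNG files with the same name in two folders, and the function name and folder layout are hypothetical.

```python
from html import escape
from pathlib import Path


def build_review_page(baseline_dir, current_dir, output_file):
    """Write an HTML page showing previous and current screenshots side by side."""
    baseline_dir, current_dir = Path(baseline_dir), Path(current_dir)
    rows = []
    for current in sorted(current_dir.glob("*.png")):
        baseline = baseline_dir / current.name
        if baseline.exists():
            old_cell = f'<img src="{escape(baseline.resolve().as_uri())}" width="600">'
        else:
            old_cell = "no previous screenshot"  # a new screen, nothing to compare with
        new_cell = f'<img src="{escape(current.resolve().as_uri())}" width="600">'
        rows.append(
            f"<tr><td>{escape(current.name)}</td><td>{old_cell}</td><td>{new_cell}</td></tr>"
        )
    html = (
        "<table border='1'><tr><th>Screen</th><th>Previous</th><th>Current</th></tr>"
        + "".join(rows)
        + "</table>"
    )
    Path(output_file).write_text(html, encoding="utf-8")
    return len(rows)  # number of screens a human has to look at
```

A call such as `build_review_page("screenshots/previous", "screenshots/current", "review.html")` would produce one page per run. The page asserts nothing; the judgement stays with the person reviewing it.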

It might still feel a bit laborious, but this approach should take away the effort of lengthy navigation.

Caveat for the following paragraph: I don’t know the capabilities of the tools you named.
Every comparison tool is limited to what its developers coded it to compare. (Which might be a lot for these tools.)
A human being, specifically the human brain, is a very good general comparison tool. It can recognize nearly any change without explicitly being told to look for it.
Sure, such a person needs to be familiar with the application and know what it normally looks like. It takes some training to be able to spot unintended differences. But team members who use the application nearly every day should already be in good shape. :)

My question would be where these potential changes could come from.
Is the typical automation maintenance hell worth the effort? (I do not know if it applies to these tools.)
What are the reasons your team fears unexpected changes? The typical, vague “Hell, I do not know”?
“We want to catch changes coming ‘out of the blue’” is, in my opinion, a problematic motivation for automated checks.
In my experience every automation comes with a maintenance cost.

So the question is when your team expects visual changes, aside from ‘out of the blue’.
If your team knows their changes well enough, you should be able to guess where potential changes could occur.
It is often far less effort to either check them fully manually (even if you have to do it a couple of times) or at least use the combined approach of computer-takes-pictures-humans-assess-them.

That these visual issues are “minor” is an assumption I would also treat with care.
Minor-looking visual issues (which typical E2E automation does not notice) can ruin your user experience and can make you lose money in different ways.
The UI is typically the first thing your users and customers see, and the thing they see most. It is THE impression they get of your product.
Even if they can accept that the UI is not that beautiful, a visual issue can make the application unusable or lead the user to do the wrong thing, e.g. think about wrongly placed texts on buttons (Save <> Cancel), hidden options or buttons, etc.

So investing effort into checking your visuals regularly can be key to your business success.

This is my advice, not an order.
Do whatever you want with it.

Kind Regards :)


If you have a UI that was designed using a cross-platform framework, then you need to use that framework. If the UI differs between the two platforms, then yes, native apps are hard and time-consuming. To date I have not seen anyone get this right using a UI automation tool. We have a homegrown UI framework, and even that has very little smarts, but it does let us, for example, change or “stuff” all the labels on elements by adding uppercase characters, leetifying them, and appending a small number of garbage characters to the end of every text string (it uses a special “monkey” language translation). Hope that’s a useful tip: just use your normal language compiler flow to “munge” a new set of language files and build with those. The advantage is that the munging still leaves you with an app that an English user can use, but makes it easier to detect with the human eye when overflows occur.
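
The munging step might look roughly like this. It is only a sketch of the idea, not the homegrown tool itself: the substitution table, the padding ratio, the pad character and the sample strings are all assumptions, and it does not protect format placeholders inside strings.

```python
# Assumed leet-style substitutions; the result stays readable for an English speaker.
LEET = str.maketrans({"a": "4", "e": "3", "i": "1", "o": "0", "s": "5"})


def munge(text, padding_ratio=0.3, pad_char="~"):
    """Return a readable but longer and visually distinct version of a UI string."""
    # Substitute lowercase letters first, then uppercase whatever is left.
    munged = text.translate(LEET).upper()
    # Append at least one garbage character so overflows show up earlier.
    extra = max(1, round(len(text) * padding_ratio))
    return munged + pad_char * extra


def munge_catalog(strings):
    """Munge every value of a key -> text language file, keeping the keys."""
    return {key: munge(value) for key, value in strings.items()}


english = {"button.save": "Save", "button.cancel": "Cancel"}
print(munge_catalog(english))
# {'button.save': 'S4V3~', 'button.cancel': 'C4NC3L~~'}
```

Building the app with the munged catalog makes any untranslated (hard-coded) string stand out too, because it is the only text that still looks normal.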

We use this in our tests, which, because we have our own UI framework, can actually detect when text overflows a control. But it still fails to pick up on loads of other problem areas. Hope that helps at least confirm that this is pretty hard stuff.


We also created a homegrown Visual UI tool at my last job which had mixed results. Yes, there is Applitools, Percy, and even BitBar for paid tools. TBH, if I had a requirement to do this again, I would probably use Playwright which has Visual UI testing capabilities.

Thanks Sebastian, all of that is great input.

I love the ImageMagick approach. Perhaps I’ll be able to trigger ImageMagick CLI commands while a desktop UI automation tool navigates through the UI, taking screenshots and comparing them against a baseline. If that ends up working and providing enough value, I’ll post about it.
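
What I have in mind is roughly the following, as an untried sketch that wraps ImageMagick's `compare` command. It assumes ImageMagick is installed and on the PATH (ImageMagick 7 setups may need `magick compare` instead). The function names, the 2% fuzz value and the allowed pixel count are placeholders I would have to tune.

```python
import subprocess


def build_compare_command(baseline, current, diff, fuzz_percent=2):
    """Build an ImageMagick call that counts pixels differing beyond the fuzz factor."""
    return [
        "compare",
        "-metric", "AE",              # AE = absolute error: number of differing pixels
        "-fuzz", f"{fuzz_percent}%",  # colors within this distance count as equal
        str(baseline), str(current), str(diff),
    ]


def parse_differing_pixels(stderr_text):
    """compare prints the metric to stderr; some versions append a value in parentheses."""
    return int(float(stderr_text.split()[0]))


def screenshots_match(baseline, current, diff, max_differing_pixels=0):
    """Run the comparison and apply a pixel-count threshold to the result."""
    result = subprocess.run(
        build_compare_command(baseline, current, diff),
        capture_output=True, text=True,
    )
    if result.returncode == 2:  # documented exit status: 0 similar, 1 dissimilar, 2 error
        raise RuntimeError(result.stderr.strip())
    return parse_differing_pixels(result.stderr) <= max_differing_pixels
```

The automation tool would save a screenshot after each navigation step and call `screenshots_match("baseline/login.png", "current/login.png", "diff/login.png")`. The diff image that ImageMagick writes is what a human would look at when the check fails.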

It may indeed be the case that a human runthrough will end up being more efficient. If I can get a prototype up and running, I will run some experiments with a real team.