Why do screen-readers behave so differently from one another?

The product I work on is required to be WCAG compliant, and our team of 3 testers do all we can to ensure the WCAG criteria are met.

We always test our development with NVDA and JAWS screen-readers (usually only against Chrome and not all supported browsers) but often find that some components are only read correctly in one of the 2 screen-readers.

We’ve had some feedback this week that iOS screen-readers (presumably VoiceOver) and SuperNova do not work with our Web App at all. My understanding initially was that screen-readers should behave in similar ways, but this seems less and less the case. They also appear to behave differently with different browser combinations.

I’m interested in how other testers approach screen-reader testing to ensure their users have the best possible experience. In an ideal world we would test many more screen-reader/browser combinations, but I just don’t think it’s feasible with the amount of resource that we have.

Furthermore, does it count as a WCAG fail if certain screen-readers work and others do not? And how do your companies deal with licenses for screen-readers? NVDA is free, but do you pay to use others for testing?

Many thanks! 🙂


Hi @claudia.davey, I’m by no means an expert. I believe that engineering needs to follow certain guidelines to make the app ADA/WCAG-friendly. For the web, that often means using aria- attributes such as aria-label on interactive elements like buttons and inputs. Also, it might be helpful to automate WCAG testing, which can be easily done with a tool like testRigor. Disclaimer: I’m affiliated with testRigor.
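
For example, here is a minimal sketch (the element, class and label names are just assumptions for illustration) of giving an icon-only button an accessible name with aria-label:

```typescript
// Minimal sketch (hypothetical names): an icon-only button has no visible text,
// so aria-label supplies the name a screen reader announces.
const closeButton = document.createElement('button');
closeButton.type = 'button';
closeButton.className = 'icon icon-close'; // icon drawn by CSS, so no text content
closeButton.setAttribute('aria-label', 'Close dialog');
document.querySelector('dialog')?.appendChild(closeButton);
```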

It’s great that you’re not just seeing accessibility as a compliance checkbox, but that you’re thinking of how to give users the best possible experience!

In terms of how many screen reader and browser combinations to cover: at Understanding Conformance | WAI | W3C, under the section “Understanding Accessibility Support”, the W3C Accessibility Guidelines Working Group says that:

the Success Criteria require that something be done in the Web content that would make it possible for assistive technologies to successfully present the content’s information to the user.

So the developers you work with don’t have the responsibility to make everything work for all possible assistive technology in all circumstances, but rather they have the responsibility to use standards-compliant techniques that allow assistive technologies to interpret a page correctly.

You could use that to justify only testing against one screen reader, but since we know that screen readers unfortunately do behave differently across different OSes, target technologies, and so on, we’d do well to catch and address those differences where we can. In your case I suspect you’d catch the vast majority of the problems your users will have through NVDA and VoiceOver, but you are better placed to judge whether testing beyond those two is worth the time and effort, compared with how feasible it is to then make concrete improvements to the user experience.

Here is what WebAIM (you’ve probably come across them before, but they’re a non-profit organisation trying to make the web more accessible) has to say about screen reader testing under the question “So should I test my content with all the screen readers?”:

You certainly could, and you may learn quite a bit if you do. This would be especially true of JavaScript or PDF content. It would be wise to test it in as many technologies as possible, including a wide range of screen readers. For less complex content, though, testing in one or two screen readers is usually enough. If you follow guidelines, you can be reasonably assured that your content will be screen reader friendly.

If you look at the nature of your web application and the compatibility issues you’ve found thus far, you could make a judgment of the value of testing with additional (potentially more commonly used) tools versus the cost, and that’s the point where you might have a business case for either purchasing any required licences or testing only with the most commonly used free tools.

Two more things about reducing workload:

First I’d like to mention that there are organisations that offer testing by actual people with motor and visual issues. A while ago I came across one that provided videos of the testers showing how they use the application and what specific problems they come across, but I don’t remember which organisation that was. Just now I found Fable | Digital accessibility, powered by people with disabilities (makeitfable.com). I don’t know the cost, but perhaps it’s lower than the (real and opportunity) cost of testing internally, though it wouldn’t be able to replace it completely.

Second is automation. Accessibility very much needs human engagement, but there are many basic things that can be checked through tools. Take, for example, GitHub - dequelabs/axe-core: Accessibility engine for automated Web UI testing, developed by Deque Labs, who are also involved in the development of WCAG. Their own claim is:

With axe-core, you can find on average 57% of WCAG issues automatically. Additionally, axe-core will return elements as “incomplete” where axe-core could not be certain, and manual review is needed.
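
In case it helps, here is a minimal sketch of how such a check could be wired up with Playwright and the @axe-core/playwright package (the URL and test name are placeholders, and the tag list assumes you’re targeting WCAG 2.1 A/AA); it also surfaces the “incomplete” results mentioned above for manual review:

```typescript
import { test, expect } from '@playwright/test';
import AxeBuilder from '@axe-core/playwright';

test('home page has no detectable WCAG 2.1 A/AA violations', async ({ page }) => {
  await page.goto('https://example.com'); // placeholder URL

  // Run axe-core against the rendered page, limited to WCAG A/AA rules.
  const results = await new AxeBuilder({ page })
    .withTags(['wcag2a', 'wcag2aa', 'wcag21a', 'wcag21aa'])
    .analyze();

  // "incomplete" items are the ones axe could not decide; flag them for a human.
  for (const item of results.incomplete) {
    console.warn(`Needs manual review: ${item.id} - ${item.description}`);
  }

  // Fail the test if axe found definite violations.
  expect(results.violations).toEqual([]);
});
```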

Hopefully some of this is useful to your team!

Your post raises a lot of issues. I have been doing accessibility testing for more than 20 years, so I have wrestled with them for a long time.

Firstly, depending on the website’s functionality, it may not be necessary to test with a screen reader at all when doing a WCAG audit. All the WCAG success criteria are written such that they can (indeed, must) be tested by inspection of the user interface and the source code. In practice, it is good to test a couple of the success criteria (such as SC 4.1.3: Status Messages) with a screen reader, but it is literally just a few and even then you must still check the code.
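
To illustrate what that code inspection looks for in the case of SC 4.1.3, here is a minimal sketch (hypothetical ids and message text) of a status message exposed in a live region, so a screen reader can announce it without a focus change:

```typescript
// Minimal sketch (hypothetical id and text): the live region exists in the DOM
// up front, so text injected later is announced without moving focus.
const status = document.createElement('div');
status.id = 'search-status';
status.setAttribute('role', 'status'); // equivalent to aria-live="polite"
document.body.appendChild(status);

// Later, when a search completes, updating the text triggers an announcement.
status.textContent = '12 results found';
```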

Browse mode
One of the reasons screen reader behaviour varies is that some have a “virtual cursor” or “browse” mode and others don’t. JAWS and NVDA do, and you will notice that their behaviour is very similar. VoiceOver on macOS does not have a “browse” mode, but it has a weird concept of left-to-right navigation that most people find incomprehensible. Mobile screen readers also do not have a “browse” mode.

Heuristics
Another major reason is that screen readers use heuristics to improve the user experience when websites are coded badly. JAWS does this a lot, whereas NVDA uses very few heuristics, if any. NVDA therefore gives you a more “true” user experience (which is why we use it for testing), whereas JAWS tends to give a better user experience.
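
As a purely hypothetical illustration of the kind of markup where heuristics come into play, consider a clickable element that exposes no role:

```typescript
// Hypothetical illustration. A clickable <div> exposes no button role and no
// keyboard behaviour; a screen reader that applies heuristics may attempt a
// repair, while one that reports the accessibility tree as-is will not.
const fakeSave = document.createElement('div');
fakeSave.textContent = 'Save';
fakeSave.onclick = () => console.log('saved');

// A native <button> needs no guessing from any screen reader.
const realSave = document.createElement('button');
realSave.type = 'button';
realSave.textContent = 'Save';
realSave.addEventListener('click', () => console.log('saved'));

document.body.append(fakeSave, realSave);
```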

Specifications
Yet another reason is that the HTML, CSS, ARIA and JavaScript specifications change continuously, but the browsers and screen readers don’t all adopt the new features at the same time. Some never adopt certain features. 20 years ago, the HTML specification got updated perhaps once every five years. Now it’s updated every five days.

Accessibility support
Then there’s the difficulty of “accessibility supported technologies”. WCAG says that your conformance claim must only rely on accessibility supported technologies. However, WCAG explicitly avoids stating how many assistive technologies must support a particular technology in order for it to be considered to be supported. Is it sufficient that a website only works with JAWS and NVDA? There is no way to know.

Despite all this, I am not saying you should not test with screen readers. I am just saying you don’t need to if you are only doing a WCAG audit. If you’ve got time, definitely test with assistive technologies and do user testing with disabled participants.

Licensing
Regarding licensing, we spend thousands of pounds a year. All our testers have licenses for JAWS, ZoomText and Dragon in addition to all the free products. We also have a smaller number of licenses for products like Read&Write. My view is that if you’re a professional, you do things professionally, so you just pay what it costs. Or tell your management they need to pay - do they really want to be regarded as amateurs?

Automation
Automated accessibility testing is useful, but there are a lot of gotchas. Firstly, let me say unequivocally that the claim that “axe-core can find on average 57% of WCAG issues automatically” is absolute bull. The true figure is anywhere between 0% and 100% depending on what the specific issues are on your website. We test upwards of 100 different websites every year and we use axe as a “safety net” after doing the manual testing. I estimate axe finds 20% to 30% of the issues, but increasingly it only finds the least important issues.

Furthermore, you can’t take the results of most tools at face value because they report false positives. They also find genuine issues and report the wrong cause and/or make the wrong recommendation for fixing it. Analysis of the results can take a long time. I could talk about this all day, but I had better stop here.

Steve Green
Managing Director
Test Partners Ltd

@wilcovanesch has covered things very well. I’d ask a few questions for clarity.
Do you only test on Windows machines?

  • It is important to cover both Windows and iOS operating systems as they have subtle differences

Do you test on mobiles at all?

  • It can be just as important to test on mobiles and tablets, as the interactions with screen readers are again subtly different

What version and level of WCAG are you testing against?

  • There are significant differences between 1.0 and 2.1, and between A, AA and AAA.

Do you commission a Voluntary Product Accessibility Template (VPAT) for your product or use people with disabilities to test?

  • Getting experts to test and help you understand how different people use software is invaluable to gaining insights for your product.

If you have any questions about the above please let me know.

We decided on using NVDA, as it’s rather popular here. The company does have a JAWS license for products in other countries.
I’ve been running into screen reader issues more often as well. If the code is correct, then it will not be changed. It’s sad to know that a reader’s odd behaviour is going to hurt the experience for some users, but at the same time we can’t just break the rules to make it work in that one reader while potentially breaking it in another.

There is also an Accessibility Slack for these kinds of questions, with a11y experts and users.

Thank you for all of your detailed replies! @wilcovanesch @adystokes @testrigor @sles12 @steve.green

You have given me some excellent points to consider. We are required to support WCAG 2.1 to AA standard, and we do currently run axe DevTools against new pages and components when they are created, but this is a manual run and is not currently integrated into our deployment pipelines. There are quite a number of accessibility-related bug tickets in our backlog that I wish were picked up and resolved more quickly. Some of these tickets, relating to missing ARIA labels, page landmarks, etc., could potentially be causing some of the vastly different behaviour between screen-readers.

As several of you have pointed out, I think we are currently making a mistake by only testing on Windows machines, and I’ve just used VoiceOver on an iPhone to test our web app for the first time. To get some more ideas on how to improve our screen-reader test strategy and to better understand where the differences are currently occurring, I’m going to run a test bash at my company with various devices/browsers/screen-readers and have people from around the business explore our product and record any successes/challenges that they face.

Furthermore, we have just had some user testing completed over the last 2 weeks with real screen-reader users. I’m not sure which company our UX researcher used to find the participants, but sitting in on these sessions was a really good experience that I would recommend to others! Seeing how real people interacted with these tools was invaluable.