2026-05-17 · 6 min read

Five Screen Reader Tests axe-core Cannot Run (And Every Dev Should)

We audit websites for accessibility for a living. We run axe-core on hundreds of projects a year. We also pair NVDA and JAWS with real user testing. And we see the same gap every single time: axe-core passes, and the site fails in a screen reader.

This is not a failure of axe-core. Axe-core does what it was built to do — verify that ARIA attributes exist, that heading hierarchy is logical, that buttons have accessible names. Automated testing tools catch approximately 30–40% of accessibility issues ^[1]. The automated scanner inspects markup. It does not experience the site the way a screen reader user does.

Axe-core catches the low-hanging fruit with zero human effort. That is worth running in CI. The remaining 60–70% of issues, the share that automated tools miss ^[1], require manual testing. You can focus that effort on the five tests below rather than on exhaustive page audits. Here are the tests that no automated scanner can replicate, and why each one matters.

1. Verify dynamic content updates announce correctly

We set up a notification system. A button triggers a request. The server responds. New content appears on the page. Axe-core confirms the div has aria-live="polite". The test passes.

Now we open NVDA. We hit the button. We listen. If nothing announces, the user has no idea the request succeeded. The notification might be green. It might say "Saved." The screen reader user hears nothing.

Static analysis cannot test ARIA live regions or interact with controls the way a real screen reader does ^[3]. Automated tools do not wait for network responses. They do not hear what gets announced and what stays silent. We must run this test manually: trigger the dynamic update, unmute the screen reader, and listen for the announcement. If the user hears nothing, the notification is broken, even though axe-core marked it compliant.
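A frequently cited cause of a silent live region is that the region is added to the page at the same moment as its message. The sketch below shows one way to structure the update, assuming the region (for example a div with aria-live="polite") is already in the DOM before the request completes. The names `announce`, `LiveRegionLike`, and the `schedule` parameter are hypothetical and exist only for this illustration. It is a starting point to verify by ear in NVDA, not a guarantee that anything is announced.

```typescript
// Minimal shape shared by a real HTMLElement and a test double.
interface LiveRegionLike {
  textContent: string | null;
}

// Hypothetical helper: writes a message into a live region that already
// exists on the page, so the text change happens after the region is
// known to assistive technology.
function announce(
  region: LiveRegionLike,
  message: string,
  schedule: (update: () => void) => void,
): void {
  // Clear first so that repeating the same message is still a text change.
  region.textContent = "";
  // Defer the real message; how long to defer is an assumption to tune by ear.
  schedule(() => {
    region.textContent = message;
  });
}

// Demo with a fake region and a manual queue standing in for a timer.
const demoRegion: LiveRegionLike = { textContent: "Saved." };
const pendingUpdates: Array<() => void> = [];
announce(demoRegion, "Saved.", (update) => {
  pendingUpdates.push(update);
});
const textBeforeFlush = demoRegion.textContent; // cleared, message still pending
pendingUpdates.forEach((update) => update());
const textAfterFlush = demoRegion.textContent; // message written
```

In a browser, the `schedule` argument would typically wrap `setTimeout`, and `region` would be the live region element itself. Whether the message is actually spoken still has to be confirmed with the screen reader running.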

2. Test modal focus trap and escape behavior

We see modals all the time. They have role="dialog". They have aria-modal="true". Axe flags zero issues.

But when we test with JAWS, we tab through the dialog. Tab reaches the close button. We press Tab again. Focus jumps outside the modal. Now we are focused on a button behind the modal. The user can interact with the page while the modal is still visible. The modal does not trap focus.

Axe-core does not test keyboard interaction or focus flow the way a manual tester can. It sees the attributes. It does not see the escape hatch. We must tab through the entire modal ourselves, forward and backward, and confirm that Tab and Shift+Tab cycle only within the dialog. We must press Escape and confirm that the dialog closes and focus returns to the trigger button. Automation cannot replicate this test.
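The cycling behavior described above comes down to wrap-around arithmetic over the dialog's focusable elements. The following is a minimal sketch under stated assumptions: `nextTrapIndex`, `handleTrapKeydown`, `FocusableLike`, and `TabKeyEventLike` are hypothetical names, and the caller is assumed to supply the ordered list of focusable elements and the index of the one that currently has focus. Escape handling and returning focus to the trigger are left out.

```typescript
// Hypothetical helper: given the index of the focused element among a
// dialog's focusable elements, return the index that Tab or Shift+Tab
// should move to so that focus wraps inside the dialog.
function nextTrapIndex(current: number, count: number, shiftKey: boolean): number {
  if (count <= 0) return -1; // nothing focusable inside the dialog
  if (current < 0 || current >= count) {
    // Focus is not on a known element: enter at the end for Shift+Tab,
    // otherwise at the start.
    return shiftKey ? count - 1 : 0;
  }
  const step = shiftKey ? -1 : 1;
  return (current + step + count) % count;
}

// Minimal shapes so the handler accepts real DOM objects or test doubles.
interface FocusableLike {
  focus(): void;
}
interface TabKeyEventLike {
  key: string;
  shiftKey: boolean;
  preventDefault(): void;
}

// Sketch of a keydown handler body. Returns the new active index, or the
// unchanged index when the key is not Tab.
function handleTrapKeydown(
  ev: TabKeyEventLike,
  focusables: FocusableLike[],
  activeIndex: number,
): number {
  if (ev.key !== "Tab") return activeIndex;
  ev.preventDefault(); // keep the browser from moving focus out of the dialog
  const target = nextTrapIndex(activeIndex, focusables.length, ev.shiftKey);
  const element = focusables[target];
  if (element) element.focus();
  return target;
}
```

This version intercepts every Tab press rather than only the first and last elements, which keeps the logic easy to test. Either way, the result still needs a manual pass with Tab, Shift+Tab, and Escape while the screen reader is running.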

3. Listen for heading and landmark announcements

We run axe-core on a page with proper heading hierarchy: h1, then h2s, then h3s. The scan passes. The page structure is valid.

We open NVDA. We press H to jump to the next heading. We hear "Heading 1." We hear "Heading 2." Then we hear a heading with no text: just silence, followed by the next one. The heading exists in the DOM, but it gives the screen reader nothing to announce. In our audits, we find such headings on pages that passed the hierarchy check ^[6]. A hierarchy check confirms the order of heading levels. Listening confirms that each heading actually says something.

The same issue surfaces with landmarks. We mark sections with role="main" and role="navigation". Axe confirms they exist. Navigating by landmark in a real screen reader shows us how two main regions on one page, or a navigation landmark that announces as empty, actually come across to the user. If a landmark is silent or redundant, the user cannot navigate efficiently. A scan result cannot stand in for this check.

This is where NVDA and JAWS diverge meaningfully. NVDA's stricter fidelity to markup often surfaces failures ^[4] that JAWS can mask, because JAWS uses heuristic methods to interpret content when markup is incomplete, giving users additional context where needed ^[2]. Testing both readers is not redundant. It reveals where the experience depends on the reader's tolerance rather than the code's correctness.
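One lightweight way to make that divergence visible is to log what each reader did for each check and then list the checks where the two disagree. The sketch below is an illustration only: the type and function names are hypothetical, and the sample notes are placeholders, not transcripts of real NVDA or JAWS output.

```typescript
type ScreenReader = "NVDA" | "JAWS";
type CheckOutcome = "pass" | "fail";

// One manual observation: what a tester noted for a check in one reader.
interface ReaderObservation {
  check: string; // short name of the manual check
  reader: ScreenReader;
  outcome: CheckOutcome;
  note: string; // tester's note of what was heard
}

// Return the names of checks where the recorded outcomes disagree.
function findDivergentChecks(observations: ReaderObservation[]): string[] {
  const outcomesByCheck: Record<string, CheckOutcome[]> = {};
  for (const obs of observations) {
    const seen = outcomesByCheck[obs.check] ?? [];
    if (seen.indexOf(obs.outcome) === -1) seen.push(obs.outcome);
    outcomesByCheck[obs.check] = seen;
  }
  return Object.keys(outcomesByCheck).filter(
    (check) => (outcomesByCheck[check] ?? []).length > 1,
  );
}

// Placeholder log for illustration; the notes are invented examples.
const sampleLog: ReaderObservation[] = [
  { check: "heading text announced", reader: "NVDA", outcome: "fail", note: "silence on one heading" },
  { check: "heading text announced", reader: "JAWS", outcome: "pass", note: "all headings had text" },
  { check: "single main landmark", reader: "NVDA", outcome: "pass", note: "one main region" },
  { check: "single main landmark", reader: "JAWS", outcome: "pass", note: "one main region" },
];
const divergentChecks = findDivergentChecks(sampleLog);
```

A check that appears in `divergentChecks` is a candidate for markup that works only because one reader is compensating for it.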

4. Confirm form error messages associate to fields

A form has validation. A text input has aria-describedby="error-message-1". The error message is hidden until the user submits an empty field. Axe-core confirms the association exists. The test passes.

We test with NVDA. We submit the form. We focus on the field again. We listen. If the error message is announced, the user knows what went wrong and can fix it. If the message is not announced, the user hears only the field label. They have no idea why the form rejected their input — even though the markup is technically correct.

Axe-core verifies the attribute. It does not verify that the announcement is audible and contextual. We must fail the form, refocus on the field, and listen for the error in NVDA's reading. If we have to dig into the page source to find the error, so does the user. No scanner can execute this verification ^[6].
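To make the association concrete, here is a minimal sketch of one way to switch a field's error state on and off so that the message text, aria-invalid, and aria-describedby stay in sync. The names `setFieldError`, `AttrTargetLike`, `MessageLike`, and `makeFakeField`, along with the `hint-1` id and the sample message, are assumptions for illustration; only `error-message-1` comes from the example above. It does not prove that the error is announced. That still requires refocusing the field and listening.

```typescript
// Minimal shapes so the helper accepts real DOM elements or test doubles.
interface AttrTargetLike {
  getAttribute(name: string): string | null;
  setAttribute(name: string, value: string): void;
  removeAttribute(name: string): void;
}
interface MessageLike {
  id: string;
  textContent: string | null;
  hidden: boolean;
}

// Hypothetical helper: pass a message to show an error, or null to clear it.
function setFieldError(field: AttrTargetLike, messageEl: MessageLike, message: string | null): void {
  const ids = (field.getAttribute("aria-describedby") ?? "")
    .split(/\s+/)
    .filter((id) => id.length > 0);
  if (message === null) {
    messageEl.textContent = "";
    messageEl.hidden = true;
    field.removeAttribute("aria-invalid");
    // Keep any other descriptions (such as a hint) that were already referenced.
    const remaining = ids.filter((id) => id !== messageEl.id);
    if (remaining.length > 0) field.setAttribute("aria-describedby", remaining.join(" "));
    else field.removeAttribute("aria-describedby");
    return;
  }
  messageEl.textContent = message;
  messageEl.hidden = false;
  field.setAttribute("aria-invalid", "true");
  if (ids.indexOf(messageEl.id) === -1) ids.push(messageEl.id);
  field.setAttribute("aria-describedby", ids.join(" "));
}

// Test double that stores attributes in a plain object.
function makeFakeField(): AttrTargetLike {
  const attrs: Record<string, string> = {};
  return {
    getAttribute: (name) => attrs[name] ?? null,
    setAttribute: (name, value) => { attrs[name] = value; },
    removeAttribute: (name) => { delete attrs[name]; },
  };
}

const demoField = makeFakeField();
demoField.setAttribute("aria-describedby", "hint-1");
const demoMessage: MessageLike = { id: "error-message-1", textContent: "", hidden: true };

setFieldError(demoField, demoMessage, "Email is required.");
const describedByWithError = demoField.getAttribute("aria-describedby");
const invalidWithError = demoField.getAttribute("aria-invalid");

setFieldError(demoField, demoMessage, null);
const describedByCleared = demoField.getAttribute("aria-describedby");
const invalidCleared = demoField.getAttribute("aria-invalid");
```

Keeping the existing `aria-describedby` ids intact matters when a field also has a hint, since overwriting the attribute would silently drop the hint from the field's description.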

5. Test list semantics and nesting

We see navigation built with divs styled as lists. Each div has role="menuitem". Axe-core flags zero issues. The roles are present. On paper, the structure looks accessible.

We open NVDA, whose stricter fidelity to markup tends to expose what a more forgiving reader hides ^[4]. When NVDA reaches the navigation, it announces neither a list nor the number of items. The user hears only a series of menu items. Without the semantic <ul> or <ol>, the structure is opaque.

We then test with JAWS. JAWS applies heuristics to enhance usability, sometimes inferring structure from context ^[2]. A user on JAWS may get acceptable navigation. A user on NVDA will not. The divergence tells us the code is relying on JAWS's forgiveness rather than correct markup. We fix the markup so it works for both readers — and for any new reader that ships with stricter semantics.
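As an illustration of the markup fix, the sketch below renders navigation links as a native list inside a labeled nav element, which is the structure that lets a screen reader expose list semantics without role attributes on divs. `renderNavList`, `escapeHtml`, `NavLink`, and the sample labels and paths are hypothetical. How each reader announces the result still needs to be checked by listening.

```typescript
interface NavLink {
  label: string;
  href: string;
}

// Escape text for safe use in HTML content and double-quoted attributes.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical renderer: native <ul>/<li> elements instead of role-bearing divs.
function renderNavList(navLabel: string, links: NavLink[]): string {
  const items = links
    .map((link) => `<li><a href="${escapeHtml(link.href)}">${escapeHtml(link.label)}</a></li>`)
    .join("");
  return `<nav aria-label="${escapeHtml(navLabel)}"><ul>${items}</ul></nav>`;
}

// Sample data for illustration only.
const navMarkup = renderNavList("Main", [
  { label: "Home", href: "/" },
  { label: "Audits & Reports", href: "/audits" },
]);
```

The same structure applies whatever templating approach a project uses. The point is that the list semantics come from the elements themselves, so no reader has to infer them.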

What automated testing cannot replace

Automated testing is a first-pass filter: it catches markup errors, but a passing scan can be a false negative for experience failures. Manual screen reader testing is the gate that confirms the user actually benefits.

The W3C is explicit: evaluation tools can produce false or misleading results, and human judgment is required alongside automated checks ^[5]. Axe-core is a baseline, not a ceiling. It confirms that attributes exist. It does not confirm that the experience is coherent. The five tests above are the ones we see fail most often across the broadest range of codebases. Form-heavy UIs, single-page applications, and component libraries are the highest-risk areas.

Prioritize these five tests on form-heavy pages, dynamic UIs, and navigation patterns. That is where the experience failures concentrate, and where manual investment pays off fastest.

Start with one test today: open NVDA, trigger a dynamic update on your site, and listen for the announcement. If you hear nothing, you have found a gap that no automated scanner was designed to catch. This is the work we do in every accessibility audit at Morton Digital. If your team wants to run the same manual screen reader tests without building the process from scratch, Parallax wraps NVDA and JAWS testing into an audit workflow that fits your code review cycle — so your team catches the failures automation cannot.

Sources

[1] TestParty — Screen Reader Testing Guide: NVDA, JAWS, and VoiceOver for Developers — "While automated testing tools catch approximately 30-40% of accessibility issues, screen reader testing uncovers the semantic and structural problems that only manifest during real assistive technolog"
[2] UXPin — NVDA vs. JAWS: Screen Reader Testing Comparison — "it uses heuristic methods to interpret content when accessibility markup is incomplete, giving users additional context where needed"
[3] AssistivLabs — Automating Screen Readers for Accessibility Testing — "It also can't test tricky ARIA live regions or interact with controls like a screen reader will"
[4] Equally AI — JAWS vs NVDA: Which Is Better for Accessibility Audits? — "NVDA's stricter fidelity to markup often makes it a more precise choice for flagging WCAG failures"
[5] W3C WAI — Selecting Web Accessibility Evaluation Tools — "Sometimes evaluation tools can produce false or misleading results"
[6] Bird Eats Bug — How to Use NVDA Screen Reader for Accessibility Testing — "NVDA helps uncover accessibility issues that automated tools may miss, such as missing focus states, incorrect ARIA usage, or unlabeled dynamic components"

Morton Technology Consulting LLC: WCAG 2.1 AA audits for Florida government agencies.