What the Research Actually Says About Accessibility Overlays and Automated Scanners
A review of the peer-reviewed evidence on accessibility overlays, automated WCAG scanners, and what ADA Title III defendants are actually relying on.
If you operate an e-commerce store in the United States, you've probably been pitched one of two things in the last year. Either an accessibility overlay — a widget that bolts onto your site and promises to fix WCAG issues on the fly — or an automated scanner that audits your site and hands you a compliance report.
Both product categories market themselves as an answer to ADA Title III exposure. Both are sold on the premise that they do something meaningful to reduce legal risk or improve accessibility for users with disabilities.
The peer-reviewed research is clearer than the marketing. Automated scanners miss most of the WCAG success criteria. Overlays don't reliably help users, and the most recent study on them concludes they cannot mitigate legal risk. The gap between what these tools claim and what independent researchers have documented is the subject of this post.
This is not a product pitch. It's a walk through what the academic literature — ACM conference papers, peer-reviewed journals, and systematic reviews — actually says, so that merchants, counsel, and anyone else trying to make sense of the accessibility compliance market has a clean read of the evidence.
What Overlays Actually Do
An accessibility overlay is a third-party JavaScript widget embedded on your site. Vendors in this space — accessiBe, UserWay, AudioEye, EqualWeb, and others — market the overlay as an automated remediation layer that scans and modifies the DOM to address accessibility barriers without requiring code changes on your end.
The promise is attractive because the alternative is expensive. Manual WCAG remediation of a mid-sized e-commerce site is a multi-month engineering project. A widget that bolts on in ten minutes and produces a compliance badge is a radically different economic proposition.
The question is whether the widget works. Two peer-reviewed studies, both published at ACM ASSETS — the premier venue for accessibility research — have tested that question directly.
Study One: Overlays and Actual Users
Makati and colleagues (ASSETS '24) ran a mixed-methods study combining a survey and interviews with blind and low-vision users who had encountered overlays in the wild. Their finding: overlays "often fail to deliver on their promises and, in many cases, increase existing challenges." Users reported that overlay widgets conflicted with their existing assistive technology (screen readers, magnifiers, keyboard navigation tools) and left them with less functionality than they had before the overlay was installed.
The mechanism is worth understanding. Users with disabilities typically arrive at a website with their own configured stack of assistive technology. An overlay that tries to modify the DOM or inject its own accessibility layer frequently ends up double-handling events, overriding ARIA attributes, or interfering with the user's already-tuned setup. The widget solves for a user who doesn't exist — someone who arrives without their own AT configured — and actively degrades the experience of users who do.
Study Two: Overlays and Legal Risk
Hartman and colleagues (ASSETS '25) went further. They built purpose-built test websites with known, well-characterized accessibility errors, installed three AI-powered overlay products, and then tested the resulting sites using both automated testing and manual keyboard and screen-reader evaluation.
The automated testing, run by the overlays themselves and by third-party scanners, reported that the sites had become more accessible after overlay installation. The manual evaluation told a different story. Major errors remained. The overlays had addressed the kind of issues that show up in automated reports — and left the issues that actually block users in place.
The authors' conclusion is the single most direct statement on this subject in the peer-reviewed literature: "automated overlays alone cannot mitigate legal risk for inaccessibility; manual human testing and solution development is still needed." That sentence was peer-reviewed by researchers who study accessibility for a living. It is not a competitor's marketing claim. It is a published finding.
What Automated Scanners Actually Catch
The other common response to ADA exposure is running an automated WCAG scanner — axe-core, WAVE, Lighthouse, IBM Equal Access, or one of the commercial products built on these engines. A scanner runs over your site, flags violations, and produces a report.
These tools are genuinely useful. This post does not argue that scanners are worthless. Its claim is narrower: a passing scanner report is not a conformance claim, and it is not a defense.
The Coverage Ceiling
Abduganiev (International Journal of Information Technology and Computer Science, 2017) ran an empirical comparison of eight automated evaluation tools against the WCAG 2.0 success criteria. The maximum coverage achieved by any single tool was 32.4 percent. Inter-tool reliability — the degree to which two scanners agree about whether a given issue exists — ran as low as 1.56 percent. The paper's own summary: relying on automated tools alone is "a great mistake."
Iniesto and colleagues (Journal of Universal Computer Science, 2024) extended the picture with a two-year study of WCAG applicability. Their finding: 62 of 72 WCAG success criteria — roughly 86 percent — are not correctly addressed by automated tools. The remaining 14 percent that scanners handle well tend to be the mechanical, structural criteria: things like image alt attributes, form label presence, and heading hierarchy. The criteria that matter most for actual user experience — meaningful sequence, focus management, error identification, sufficient context — require human judgment.
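To make the split between mechanical and judgment-based criteria concrete, here is a minimal sketch of the kind of check a scanner can automate: whether an image carries an alt attribute at all. The class name, helper, and sample markup are hypothetical illustrations, not code from any scanner named in this post.

```python
from html.parser import HTMLParser


class AltPresenceChecker(HTMLParser):
    """Flags <img> tags that have no alt attribute at all (hypothetical sketch)."""

    def __init__(self):
        super().__init__()
        self.missing_alt = []  # src values of images with no alt attribute
        self.seen_alts = []    # alt values that pass the mechanical check

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        if "alt" not in attr_map:
            self.missing_alt.append(attr_map.get("src", ""))
        else:
            self.seen_alts.append(attr_map["alt"])


def check_alt_presence(html):
    """Run the presence check over an HTML string and return the checker."""
    checker = AltPresenceChecker()
    checker.feed(html)
    checker.close()
    return checker


# Illustrative markup, not taken from any real site.
SAMPLE_HTML = """
<img src="shoe-1.jpg">
<img src="shoe-2.jpg" alt="image">
<img src="shoe-3.jpg" alt="Blue running shoe, side view">
"""

result = check_alt_presence(SAMPLE_HTML)
print("Missing alt:", result.missing_alt)
print("Passed the mechanical check:", result.seen_alts)
```

The second image passes this check even though alt="image" tells a screen-reader user nothing. Deciding whether alt text is meaningful is the kind of judgment the studies above say still requires a human.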
Doush and colleagues (CCF Transactions on Pervasive Computing and Interaction, 2023) reached a similar conclusion through a different method. Their systematic analysis of which WCAG 2.1 success criteria can be automatically tested found that most cannot, and require either manual review or technologies not present in production scanners.
Stacking Scanners Does Not Close the Gap
One reasonable response to the coverage ceiling is to run multiple scanners and take the union of their findings. Pool (ACM W4A, 2023) tested exactly this strategy. The study ran nine accessibility testing tools against 121 web pages at CVS Health, comparing their outputs head-to-head.
The result: each tool only fractionally duplicated any other, and each discovered numerous issue instances that all eight other tools missed. Stacking scanners broadens coverage but does not close the gap — every tool in the stack has blind spots, and those blind spots are not aligned.
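The shape of that result can be sketched with plain set arithmetic. The tool names and issue identifiers below are invented for illustration and are not data from Pool (2023); the point is only how "take the union" can broaden coverage while still leaving issues no tool reports.

```python
# Hypothetical ground truth: every issue actually present on a page.
ALL_ISSUES = {
    "contrast", "missing-alt", "empty-link", "duplicate-id",
    "unlabeled-input", "focus-order", "keyboard-trap",
}

# Hypothetical findings per tool (invented, not from any study).
FINDINGS = {
    "tool_a": {"contrast", "missing-alt", "duplicate-id"},
    "tool_b": {"missing-alt", "empty-link"},
    "tool_c": {"contrast", "unlabeled-input"},
}


def unique_to_each(findings):
    """Return, per tool, the issues that no other tool reported."""
    unique = {}
    for tool, issues in findings.items():
        others = set().union(*(v for k, v in findings.items() if k != tool))
        unique[tool] = issues - others
    return unique


combined = set().union(*FINDINGS.values())   # what the full stack reports
missed_by_all = ALL_ISSUES - combined        # blind spots shared by every tool

print("Unique per tool:", unique_to_each(FINDINGS))
print("Union of findings:", sorted(combined))
print("Missed by every tool:", sorted(missed_by_all))
```

In this toy data every tool contributes something the others miss, yet two issues (focus order and a keyboard trap) never appear in any report. A clean union is still only a statement about what the stack can see.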
Why This Matters for Defendants
Put the three findings together: scanners individually cover about a third of WCAG at best, eighty-six percent of WCAG criteria are not correctly addressed by automated tools, and stacking multiple scanners still leaves blind spots, because each tool finds issues the others miss. A clean report from any scanner, or even from nine scanners, is not evidence of WCAG conformance. It is evidence that the scanners you ran did not flag issues within their limited coverage range.
What Live E-Commerce Sites Actually Look Like
Acosta-Vargas and colleagues (PeerJ Computer Science, 2022) scanned the top 50 ranked e-commerce websites. Of the accessibility barriers identified, 83.1 percent fell under the WCAG "perceivable" principle — color contrast failures, missing alt text, insufficient visual indicators. "Operable" issues came in at 13.7 percent, "robust" at 1.7 percent, "understandable" at 1.5 percent.
The practical implication is specific: the most common accessibility issues on live e-commerce sites are also the most common items cited in demand letters. Contrast ratios and alt text are the easiest things to detect, the easiest things to prove, and therefore the first things a plaintiff's counsel will screenshot.
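Contrast is easy to prove because it reduces to arithmetic. The sketch below follows the relative-luminance and contrast-ratio definitions published in WCAG 2.x and checks the 4.5:1 threshold that success criterion 1.4.3 sets for normal-size text. The function names are my own, and the sketch assumes six-digit hex colors with no transparency.

```python
def _linearize(channel_8bit):
    """Convert an 8-bit sRGB channel to linear light, per the WCAG 2.x definition."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color):
    """Relative luminance of a color given as '#rrggbb' (assumed format)."""
    hex_color = hex_color.lstrip("#")
    r, g, b = (int(hex_color[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground, background):
    """Contrast ratio between two colors, from 1.0 up to 21.0."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa_normal_text(foreground, background):
    """True if the pair meets the 4.5:1 minimum for normal-size text."""
    return contrast_ratio(foreground, background) >= 4.5


for fg in ("#000000", "#767676", "#777777"):
    ratio = contrast_ratio(fg, "#ffffff")
    print(fg, "on white:", round(ratio, 2), "pass" if ratio >= 4.5 else "fail")
```

Two grays that look nearly identical, #767676 and #777777, land on opposite sides of the threshold against a white background. That is why a contrast failure is simple to demonstrate from a screenshot and a color picker.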
Martins and colleagues (Universal Access in the Information Society, 2023) ran a much larger automated evaluation, covering approximately three million pages. Average: thirty errors per page. Most frequent issues: inadequate text contrast and missing accessible names.
Parthasarathy and colleagues (CHI '25 Extended Abstracts), citing WebAIM 2024 data, reported that only 4.1 percent of the world's top one million homepages had no detectable WCAG failures, and that the detectable WCAG failure rate has decreased only 1.9 percentage points over five years. The market is not self-correcting.
Gonçalves and colleagues (Universal Access in the Information Society, 2017) evaluated a major e-commerce site using three methods — automated scanning, heuristic expert review, and actual blind-user testing. The site passed automated scans. It failed blind-user testing on efficiency, effectiveness, and satisfaction. The pattern the authors documented is the pattern this entire post is about: scanners produce a clean report on a site that users cannot actually use.
The Landscape as It Actually Is
Pulling the research together: accessibility overlays, as empirically tested at the leading peer-reviewed venue, do not reliably help users with disabilities and do not mitigate legal risk. Automated WCAG scanners cover a minority of the WCAG success criteria, disagree with each other substantially, and miss issues that human reviewers catch immediately. Running more of them helps at the margins but does not close the gap. Live e-commerce sites, across prevalence studies, average dozens of accessibility errors per page, concentrated in the categories — contrast and alt text — that are easiest for plaintiffs to document.
None of the research says automated tools are useless. Scanners are a reasonable first pass. Overlays, more narrowly, may produce marginal improvements on the specific issues their own detection catches. But neither category delivers what its marketing promises, and neither category is a defense.
The practical consequence for e-commerce operators is that the compliance story your tooling tells you is almost certainly not the compliance story a plaintiff's attorney will tell a court. The gap between those two stories is where ADA Title III exposure lives.
What Actually Works
The research consensus on what moves the needle is undramatic, especially next to how confidently the opposite is marketed. Meaningful WCAG conformance requires manual human testing by people who understand how users with disabilities actually navigate the web, combined with remediation that addresses the failures those tests surface. There is no automated shortcut that has survived peer review.
For defendants specifically, the second element is documentation. Mateus and colleagues (Journal of the Brazilian Computer Society, 2024), comparing 124 US federal digital-accessibility cases with Brazilian case law, found that all US cases analyzed resulted in settlement agreements requiring ongoing conformance monitoring, and in many cases specialist inspection and user testing. That's the shape of the US settlement environment. Contemporaneous records of remediation effort — what was tested, what was found, what was fixed, when — are the artifact those settlements converge on. Producing that artifact before a demand letter arrives is a different conversation than producing it under discovery pressure.
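As a rough illustration of what such a contemporaneous record could look like in practice, here is a minimal sketch. The field names, the sample entries, and the JSON export are assumptions made for this example; they do not reflect any settlement's required format or any product's schema.

```python
import json
from dataclasses import asdict, dataclass
from datetime import date
from typing import List, Optional


@dataclass
class RemediationEntry:
    """One record: what was tested, what was found, what was fixed, and when."""
    page: str
    wcag_criterion: str              # e.g. "1.4.3 Contrast (Minimum)"
    method: str                      # "automated", "manual", or "user-testing"
    finding: str
    tested_on: date
    fixed_on: Optional[date] = None  # None means the issue is still open


def open_entries(log: List[RemediationEntry]) -> List[RemediationEntry]:
    """Entries that were found but not yet fixed."""
    return [entry for entry in log if entry.fixed_on is None]


def export_log(log: List[RemediationEntry]) -> str:
    """Serialize the log to JSON; dates become ISO strings via str()."""
    return json.dumps([asdict(entry) for entry in log], default=str, indent=2)


# Hypothetical entries for illustration only.
log = [
    RemediationEntry(
        page="/checkout",
        wcag_criterion="1.4.3 Contrast (Minimum)",
        method="automated",
        finding="Placeholder text below 4.5:1 against the field background",
        tested_on=date(2025, 1, 15),
        fixed_on=date(2025, 1, 22),
    ),
    RemediationEntry(
        page="/checkout",
        wcag_criterion="2.4.3 Focus Order",
        method="manual",
        finding="Focus jumps from the coupon field to the page footer",
        tested_on=date(2025, 1, 15),
    ),
]

print(len(open_entries(log)), "open issue(s)")
print(export_log(log))
```

Recording the test method alongside each finding matters here: it shows which issues were caught by a scanner and which only surfaced through manual review or user testing.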
A Note on What This Post Is
This is a review of the peer-reviewed literature, not a legal opinion. The studies cited are publicly available. ACM ASSETS is the leading venue for this category of research. Where this post quotes findings, it quotes them in the authors' own language.
BadgerTrace builds tooling to help e-commerce operators produce and maintain the kind of remediation record the research literature and US case law converge on. If that's useful to you, our product exists. If you take nothing from this post except a better map of what the research actually says about overlays and scanners, that's a good outcome too.
References
Abduganiev, S. G. (2017). Towards automated web accessibility evaluation: A comparative study. International Journal of Information Technology and Computer Science.
Acosta-Vargas, P., et al. (2022). Accessibility challenges of e-commerce websites. PeerJ Computer Science.
Doush, I. A., et al. (2023). Web accessibility automatic evaluation tools: To what extent can they be automated? CCF Transactions on Pervasive Computing and Interaction.
Gonçalves, R., et al. (2017). Evaluation of e-commerce websites accessibility and usability. Universal Access in the Information Society.
Hartman, A., et al. (2025). Evaluating AI-powered website accessibility overlays. Proceedings of ACM ASSETS '25.
Iniesto, F., et al. (2024). The use of WCAG and automatic tools by computer science students. Journal of Universal Computer Science.
Makati, A., et al. (2024). The promise and pitfalls of web accessibility overlays for blind and low vision users. Proceedings of ACM ASSETS '24.
Martins, J., et al. (2023). A large-scale web accessibility analysis considering technology adoption. Universal Access in the Information Society.
Mateus, D., et al. (2024). The legal handling of digital accessibility: A comparison of evaluation and policy approaches. Journal of the Brazilian Computer Society.
Parthasarathy, S., et al. (2025). Skill, will, or both? CHI '25 Extended Abstracts.
Pool, J. (2023). Accessibility metatesting: Comparing nine testing tools. Proceedings of the 20th W4A Conference.