I Built a Free WCAG Accessibility Scanner — Here's What I Learned

  • MyrinNew
    Senior Member
    • Feb 2024
    • 5168


    As a solo developer building in public, I recently launched AccessiGuard — a free WCAG accessibility scanner. What started as a side project to help developers catch accessibility issues early has taught me more about web standards, automated testing, and edge cases than I ever expected.


    Here's the technical journey, the challenges I faced, and what I learned along the way.


    Why Another Accessibility Tool?

    The accessibility landscape is changing fast. The EU's European Accessibility Act (EAA) is already in effect, and larger US state and local governments face an April 2026 deadline for WCAG 2.1 AA compliance under the ADA Title II rule. Just last year, the FTC ordered accessiBe to pay $1 million over misleading accessibility claims.


    I wanted to build something honest: a tool that tells you what it actually checks, doesn't make inflated promises, and remains free for developers who want to catch issues before they become lawsuits.


    The Tech Stack

    I kept it intentionally simple:
    • Next.js 14 (App Router) — for the web interface and API routes
    • Cheerio — for fast, server-side HTML parsing
    • TypeScript — because accessibility checks require precision
    • React — for the frontend dashboard
    • Vercel — for hosting and edge functions


    The beauty of Cheerio over a full browser automation tool like Puppeteer is speed. We can parse and analyze HTML in milliseconds rather than seconds. The tradeoff? We can't check everything that requires JavaScript execution or visual rendering.


    What We Actually Check

    I'll be honest: automated tools can only catch about 30-40% of WCAG issues. The rest require human judgment. But that 30-40% matters — those are the low-hanging fruit that plague most websites.


    Here's what AccessiGuard scans for:


    1. Missing Alt Text on Images

    This is the most common issue. Here's a simplified version of the check:






    function checkImageAlt($) {
      const issues = [];

      $('img').each((i, elem) => {
        const $img = $(elem);
        const alt = $img.attr('alt');
        const role = $img.attr('role');

        // Decorative images should have empty alt or role="presentation"
        if (role === 'presentation' || role === 'none') {
          return; // Skip
        }

        // Non-decorative images must have alt text.
        // alt="" is defined (not undefined), so it is not flagged here.
        if (alt === undefined) {
          issues.push({
            type: 'missing-alt',
            element: $.html($img),
            wcag: '1.1.1',
            level: 'A'
          });
        }
      });

      return issues;
    }







    Challenge faced: Distinguishing between truly missing alt attributes and intentionally empty ones (alt=""). Empty alt is valid for decorative images, but missing alt is always an error.
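
    The missing-versus-empty distinction can be made explicit with a small classifier. This is a minimal sketch, not AccessiGuard's actual code: `classifyAlt` and `AltState` are hypothetical names, and treating a whitespace-only value like an empty one is an assumption of the sketch.

```typescript
// Hypothetical helper: classify the three states an img alt attribute can be in.
type AltState = "missing" | "decorative" | "described";

function classifyAlt(alt: string | undefined): AltState {
  // Attribute absent entirely: this is the case the check reports.
  if (alt === undefined) return "missing";
  // alt="" marks the image as decorative.
  // Assumption: a whitespace-only value is treated the same way.
  if (alt.trim() === "") return "decorative";
  // Any other value counts as a text alternative.
  return "described";
}
```

    The input is meant to be whatever an attribute lookup returns, so `undefined` stands for "attribute not present" and `""` for an intentionally empty value.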


    2. Form Labels

    Forms are critical for accessibility. Every input needs a label:






    function checkFormLabels($) {
      const issues = [];

      $('input, select, textarea').each((i, elem) => {
        const $input = $(elem);
        const id = $input.attr('id');
        const ariaLabel = $input.attr('aria-label');
        const ariaLabelledby = $input.attr('aria-labelledby');
        const type = $input.attr('type');

        // Skip hidden inputs and buttons
        if (type === 'hidden' || type === 'submit' || type === 'button' || type === 'reset') {
          return;
        }

        // Explicit association: <label for="id">
        const hasLabelFor = id && $(`label[for="${id}"]`).length > 0;
        // Implicit association: input nested inside a <label>
        const hasWrappingLabel = $input.closest('label').length > 0;
        const hasAriaLabel = ariaLabel || ariaLabelledby;

        if (!hasLabelFor && !hasWrappingLabel && !hasAriaLabel) {
          issues.push({
            type: 'missing-form-label',
            element: $.html($input),
            wcag: '3.3.2',
            level: 'A'
          });
        }
      });

      return issues;
    }







    Challenge faced: Modern frameworks like React often use aria-label or wrap inputs in labels without for attributes. I had to account for multiple valid labeling patterns.


    3. Heading Hierarchy

    Headings should follow a logical order (h1 → h2 → h3), not skip levels:






    function checkHeadingOrder($) {
      const issues = [];
      const headings = [];

      $('h1, h2, h3, h4, h5, h6').each((i, elem) => {
        // 'h2' -> 2
        const level = parseInt(elem.name.substring(1), 10);
        headings.push({ level, text: $(elem).text().trim() });
      });

      for (let i = 1; i < headings.length; i++) {
        const current = headings[i].level;
        const previous = headings[i - 1].level;

        // Check if we skip levels (e.g., h2 → h4).
        // Moving back up (e.g., h4 → h2) is fine.
        if (current - previous > 1) {
          issues.push({
            type: 'heading-skip',
            message: `Heading level ${current} appears after level ${previous}`,
            wcag: '1.3.1',
            level: 'A'
          });
        }
      }

      return issues;
    }







    Challenge faced: Some modern designs intentionally use CSS to style headings differently than their semantic level. I had to decide whether to flag semantic issues or trust the developer's intent.


    4. Color Contrast

    This is where it gets tricky. Without rendering the page, we can't truly measure contrast. So AccessiGuard:
    • Parses inline styles and <style> tags
    • Flags suspicious color combinations
    • Recommends manual testing with browser DevTools




    function checkColorContrast($) {
      const warnings = [];

      $('[style*="color"]').each((i, elem) => {
        const $elem = $(elem);
        const style = $elem.attr('style');

        // Simple regex to extract colors (not production-ready).
        // The (?:^|;) anchor keeps "color" from matching inside "background-color".
        const colorMatch = style.match(/(?:^|;)\s*color:\s*([^;]+)/i);
        const bgMatch = style.match(/background(?:-color)?:\s*([^;]+)/i);

        if (colorMatch && bgMatch) {
          warnings.push({
            type: 'contrast-warning',
            message: 'Manual contrast check recommended',
            element: $.html($elem),
            wcag: '1.4.3',
            level: 'AA'
          });
        }
      });

      return warnings;
    }







    Challenge faced: Accurate contrast calculation requires computed styles from a rendered page. I chose to flag potential issues and recommend tools like WAVE or Axe DevTools for final verification.
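
    For reference, once two concrete colors are known, the check itself is plain arithmetic. The sketch below follows the WCAG 2.x definitions of relative luminance and contrast ratio. It is not part of AccessiGuard: the function names are hypothetical, and it assumes six-digit hex colors only (no named colors, `rgb()`, alpha, or inherited values).

```typescript
type RGB = [number, number, number];

// Parse "#rrggbb" into [r, g, b] in the range 0..255.
// Assumption: only six-digit hex input is supported in this sketch.
function parseHex(hex: string): RGB {
  const m = /^#([0-9a-f]{6})$/i.exec(hex.trim());
  if (!m) throw new Error("Unsupported color format: " + hex);
  const n = parseInt(m[1], 16);
  return [(n >> 16) & 0xff, (n >> 8) & 0xff, n & 0xff];
}

// WCAG 2.x relative luminance of an sRGB color.
function relativeLuminance(rgb: RGB): number {
  const linearize = (channel: number): number => {
    const s = channel / 255;
    return s <= 0.03928 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
  };
  return (
    0.2126 * linearize(rgb[0]) +
    0.7152 * linearize(rgb[1]) +
    0.0722 * linearize(rgb[2])
  );
}

// Contrast ratio (lighter + 0.05) / (darker + 0.05), from 1 to 21.
function contrastRatio(foreground: string, background: string): number {
  const l1 = relativeLuminance(parseHex(foreground));
  const l2 = relativeLuminance(parseHex(background));
  return (Math.max(l1, l2) + 0.05) / (Math.min(l1, l2) + 0.05);
}

// WCAG 1.4.3 (AA): 4.5:1 for normal text, 3:1 for large text.
function passesAA(foreground: string, background: string, largeText = false): boolean {
  return contrastRatio(foreground, background) >= (largeText ? 3 : 4.5);
}
```

    With these definitions, black on white gives the maximum ratio of 21:1, and `#767676` on white sits just above the 4.5:1 threshold while `#777777` falls just below it. The hard part remains what the paragraph above describes: knowing which two colors actually end up rendered.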


    5. Language Declaration

    Simple but critical:






    function checkLanguage($) {
      const issues = [];
      const htmlLang = $('html').attr('lang');

      // Missing or empty lang attribute on the root element
      if (!htmlLang) {
        issues.push({
          type: 'missing-lang',
          message: 'HTML element missing lang attribute',
          wcag: '3.1.1',
          level: 'A'
        });
      }

      return issues;
    }







    The Architecture

    Here's how a scan works:

    1. User submits URL via the Next.js frontend
    2. API route (/api/scan) receives the request
    3. Fetch HTML using native fetch() with a 10-second timeout
    4. Parse with Cheerio — convert HTML string to queryable DOM
    5. Run checks — all check functions execute in parallel
    6. Aggregate results: group issues by WCAG conformance level (A, AA, AAA)
    7. Return JSON — frontend displays results


    The entire scan typically takes 500ms to 2 seconds, depending on page size.
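
    Steps 5 and 6 can be sketched as a small runner that takes an already parsed document and a list of check functions. This is an illustration under stated assumptions, not the actual `/api/scan` handler: `Issue`, `Check`, `ScanReport`, and `runScan` are hypothetical names, the document type is left generic so any parser can be plugged in, and the checks run one after another for simplicity.

```typescript
type Level = "A" | "AA" | "AAA";

interface Issue {
  type: string;
  wcag: string;
  level: Level;
}

// A check receives the parsed document and returns the issues it found.
type Check<Doc> = (doc: Doc) => Issue[];

interface ScanReport {
  total: number;
  byLevel: Record<Level, Issue[]>;
}

// Run every check against the same document and group results by WCAG level.
function runScan<Doc>(doc: Doc, checks: Check<Doc>[]): ScanReport {
  const byLevel: Record<Level, Issue[]> = { A: [], AA: [], AAA: [] };
  let total = 0;
  for (const check of checks) {
    for (const issue of check(doc)) {
      byLevel[issue.level].push(issue);
      total += 1;
    }
  }
  return { total, byLevel };
}

// Usage with stub checks that inspect a raw HTML string instead of a parsed DOM.
const stubChecks: Check<string>[] = [
  (html) =>
    html.indexOf("<img>") !== -1
      ? [{ type: "missing-alt", wcag: "1.1.1", level: "A" }]
      : [],
  (html) =>
    html.indexOf("style=") !== -1
      ? [{ type: "contrast-warning", wcag: "1.4.3", level: "AA" }]
      : [],
];

const report = runScan('<img><p style="color:#777">hi</p>', stubChecks);
```

    Keeping each check as a pure function from document to issues means new checks can be added to the list without touching the aggregation step.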


    Edge Cases and Gotchas

    SVGs with <title> elements

    Screen readers handle SVG accessibility differently across browsers. I initially flagged SVGs without aria-label, but missed that <title> elements inside SVGs provide valid accessible names.


    Dynamic Content

    Single-page apps (SPAs) often render content client-side. Cheerio only sees the initial HTML. Solution: I added a notice recommending browser-based tools (Axe DevTools, Lighthouse) for SPA testing.


    Iframe Content

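    Iframes are separate documents. I can detect their presence, but their content isn't part of the parent page's HTML, so it isn't scanned (and in a browser, the same-origin policy would block access to cross-origin frames anyway). I flag this limitation in the report.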


    ARIA Overrides

    If an element has aria-hidden="true", it's invisible to screen readers — even if it has other accessibility issues. I had to adjust checks to respect ARIA states.
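
    One way to respect this state is to walk up from an element and skip it if it or any ancestor carries aria-hidden="true", since the attribute hides the whole subtree. A minimal sketch follows; `NodeLike` is a hypothetical stand-in for a parsed DOM node and `isHiddenFromAssistiveTech` is an illustrative name, not AccessiGuard's code.

```typescript
// Hypothetical minimal node shape: attributes plus a link to the parent node.
interface NodeLike {
  attribs: Record<string, string>;
  parent: NodeLike | null;
}

// True if the node or any of its ancestors has aria-hidden="true".
function isHiddenFromAssistiveTech(node: NodeLike): boolean {
  for (let cur: NodeLike | null = node; cur !== null; cur = cur.parent) {
    if (cur.attribs["aria-hidden"] === "true") return true;
  }
  return false;
}

// Example tree: <div aria-hidden="true"><img></div> next to a visible <img>.
const hiddenWrapper: NodeLike = { attribs: { "aria-hidden": "true" }, parent: null };
const hiddenImage: NodeLike = { attribs: {}, parent: hiddenWrapper };
const visibleImage: NodeLike = { attribs: {}, parent: null };
```

    A check such as the alt-text one could call this first and return early for hidden elements, so that decorative icons inside a hidden wrapper don't produce noise in the report.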


    What I'd Do Differently

    Use a headless browser for premium scans. Cheerio is fast but limited. For a paid tier, I'd add Playwright or Puppeteer to check rendered styles, computed contrast, and JavaScript-generated content.


    Add axe-core integration. The axe-core library is battle-tested and catches issues I haven't coded for yet. I wanted to build the core myself first to learn, but I'll likely integrate it soon.


    More granular reporting. Right now, results are grouped by WCAG level. I should add filtering by issue type, element, and page section.


    Lessons Learned

    1. Accessibility is hard. Even automated checks require nuance. There's no one-size-fits-all rule.
    2. Be honest about limitations. Users trust tools that admit what they can't do.
    3. Speed matters. Developers won't use a slow tool. Cheerio's simplicity pays off.
    4. Edge cases are infinite. Every new scan reveals a pattern I didn't anticipate.


    What's Already Live

    Since the initial build, I've shipped:
    • Continuous monitoring — scheduled scans with email alerts (paid tiers at $29/$79/$199/month)
    • Historical tracking — see how your accessibility score changes over time
    • Multi-page scans — crawl entire sites, not just one page
    • AI-powered fix suggestions — actionable code snippets to resolve each issue


    What's Next

    What I'm working on now:
    • PDF compliance reports — downloadable, shareable with clients or legal
    • CI/CD integration — GitHub Action to catch accessibility regressions before deploy
    • EU localization — Czech/German landing pages for the European market


    Try It Yourself

    AccessiGuard is free and always will be for single scans. No signup required.


    👉 accessiguard.app


    Scan your site, get actionable feedback, and fix issues before they become problems.





    Building in public as a solo founder. If you have feedback, questions, or want to discuss accessibility testing, drop a comment below. I read and respond to everything.





    Want more technical deep-dives? Follow me here on Dev.to. Next up: "How I Built Continuous Accessibility Monitoring with Cron Jobs and Serverless Functions."



