Mastering a Difficult API: Building a Secure OAuth 2.1 + PKCE Integration in Production

  • MyrinNew
    Senior Member
    • Feb 2024
    • 5168

    #1



    Why this article?

    OAuth looks simple in diagrams. In production, it is not. After implementing OAuth integrations across multiple products (consumer apps, B2B dashboards, browser extensions), I’ve learned that most failures don’t come from misunderstanding the spec; they come from edge cases the spec technically allows but your app cannot survive. This post is a deep, practical walkthrough of a real OAuth 2.1 + PKCE implementation, including the parts most tutorials skip.


    This article is intentionally long, detailed, and opinionated. If you’re looking for a copy‑paste snippet, this is not it. If you want an implementation that survives production traffic, browser quirks, and security reviews — keep reading.


    The problem OAuth tutorials don’t solve

    Most guides assume:
    • a single frontend
    • a single backend
    • no extensions
    • no mobile clients
    • no concurrent logins
    • no token rotation
    • no compromised browser state


    Reality:
    • You may have web + mobile + extension clients
    • Users can click “Login” multiple times
    • Browsers suspend tabs
    • Redirects fail
    • Tokens leak into logs
    • Refresh tokens get replayed


    OAuth still works — but only if you design around these realities.


    Architecture we’re actually building






    Client (SPA / Extension)
        │  1. Authorization Request (PKCE)
        ▼
    Authorization Server (OAuth Provider)
        │  2. Authorization Code
        ▼
    Backend (Token Broker)
        │  3. Token Exchange + Rotation
        ▼
    Secure API Access







    Key decision:


    The frontend NEVER talks to the token endpoint directly.


    In my experience, this single decision eliminates roughly 70% of real-world OAuth bugs.


    Step 1: Correct PKCE (the part people subtly break)

    Generate PKCE correctly


    Common mistakes:
    • using Math.random() ❌
    • reusing the verifier ❌
    • storing it in localStorage ❌


    Correct approach (browser):






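    // Encode bytes (Uint8Array or ArrayBuffer) as unpadded base64url
    const base64UrlEncode = (bytes) =>
      btoa(String.fromCharCode(...new Uint8Array(bytes)))
        .replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '')

    // 32 random bytes give a 43-character verifier (PKCE allows 43 to 128)
    const verifier = base64UrlEncode(crypto.getRandomValues(new Uint8Array(32)))

    // S256 challenge = base64url(SHA-256(verifier)); run this inside an async function
    const challenge = base64UrlEncode(
      await crypto.subtle.digest('SHA-256', new TextEncoder().encode(verifier))
    )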







    Where to store it?
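    • Memory only (not persistent). This assumes the login runs in a popup or extension flow; a full-page redirect wipes in-memory state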
    • Indexed by state




    const pkceStore = new Map() // lives in memory only; one entry per login attempt
    pkceStore.set(state, { verifier, createdAt: Date.now() })







    This allows parallel login attempts without collisions.
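
    To make the "indexed by state" idea concrete, here is a minimal sketch of such a store. The names (createPkceStore, put, take, sweep) and the 10-minute lifetime are my own assumptions for illustration; tune both to your flow.

```javascript
// Hypothetical in-memory PKCE store: one entry per login attempt, keyed by state.
// The TTL and all names are illustrative assumptions, not part of any OAuth library.
function createPkceStore(ttlMs = 10 * 60 * 1000, now = () => Date.now()) {
  const entries = new Map()

  return {
    // Remember the verifier for this login attempt.
    put(state, verifier) {
      entries.set(state, { verifier, createdAt: now() })
    },

    // Return the verifier at most once; unknown or expired states yield null.
    take(state) {
      const entry = entries.get(state)
      if (!entry) return null
      entries.delete(state) // single use, even if the entry turns out to be expired
      return now() - entry.createdAt <= ttlMs ? entry.verifier : null
    },

    // Drop attempts the user abandoned (closed tab, cancelled login).
    sweep() {
      for (const [state, entry] of entries) {
        if (now() - entry.createdAt > ttlMs) entries.delete(state)
      }
    },

    size: () => entries.size,
  }
}
```

    Two clicks on "Login" produce two states and two independent entries, so whichever callback arrives first finds its own verifier and cannot consume the other one.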


    Step 2: State is not only CSRF protection

    Most articles say: “Use state to prevent CSRF.”


    That’s incomplete.


    We use state as:
    • CSRF protection
    • PKCE lookup key
    • Login attempt correlation ID
    • Replay protection




    State structure
    {
    "id": "uuid",
    "origin": "extension",
    "ts": 1730000000
    }







    Encoded + signed (HMAC) by the backend.


    Why sign it?






    Because the frontend cannot be trusted to preserve meaning — only bytes.
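
    A minimal sketch of signing and verifying that state structure, assuming a Node.js backend. The function names, the "body.signature" wire format, and the 600-second lifetime are assumptions for illustration; only the HMAC idea comes from the design above.

```javascript
const crypto = require('node:crypto')

// Sign a state payload as base64url(JSON) + "." + base64url(HMAC-SHA256).
// `secret` is a key that only the backend knows.
function signState(payload, secret) {
  const body = Buffer.from(JSON.stringify(payload)).toString('base64url')
  const mac = crypto.createHmac('sha256', secret).update(body).digest('base64url')
  return `${body}.${mac}`
}

// Return the payload if the signature is valid and the state is fresh, else null.
function verifyState(token, secret, maxAgeSec = 600, nowSec = Math.floor(Date.now() / 1000)) {
  const [body, mac, ...rest] = String(token).split('.')
  if (!body || !mac || rest.length > 0) return null

  const expected = crypto.createHmac('sha256', secret).update(body).digest()
  const given = Buffer.from(mac, 'base64url')
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  if (given.length !== expected.length || !crypto.timingSafeEqual(given, expected)) return null

  let payload
  try {
    payload = JSON.parse(Buffer.from(body, 'base64url').toString('utf8'))
  } catch {
    return null
  }
  // `ts` is in seconds, matching the state structure shown earlier.
  if (!payload || typeof payload.ts !== 'number' || nowSec - payload.ts > maxAgeSec) return null
  return payload
}
```

    The signature only proves that the backend issued the state. It does not stop the same state from being presented twice, which is why single-use tracking is still needed on top of it.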







    Step 3: Authorization redirect (with real constraints)

    Never dynamically build redirect URIs in the client.


    Instead:
    • Frontend requests a login intent from backend
    • Backend returns:
      • authorization URL
      • signed state


    This prevents:
    • open redirect vulnerabilities
    • misconfigured environments
    • extension spoofing
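
    Here is a rough sketch of what the login-intent endpoint could compute. Everything in CONFIG is a placeholder, and I am assuming the client sends its PKCE challenge along with its client type, since the verifier is generated in the browser in Step 1. The state signer is passed in as a function to keep the sketch self-contained.

```javascript
const crypto = require('node:crypto')

// Hypothetical server-side configuration; every value here is a placeholder.
const CONFIG = {
  authorizeEndpoint: 'https://auth.example.com/oauth2/authorize',
  clientId: 'example-client-id',
  scope: 'openid profile',
  // Redirect URIs are fixed per client type and never taken from the request.
  redirectUris: new Map([
    ['web', 'https://app.example.com/oauth/callback'],
    ['extension', 'https://app.example.com/oauth/extension-callback'],
  ]),
}

// Build the authorization URL and signed state for one login attempt.
// `signState` is whatever function the backend uses to sign state payloads.
function createLoginIntent({ origin, codeChallenge }, signState, nowSec = Math.floor(Date.now() / 1000)) {
  const redirectUri = CONFIG.redirectUris.get(origin)
  if (!redirectUri) throw new Error(`Unknown client origin: ${origin}`)

  // An S256 challenge is always 43 base64url characters.
  if (!/^[A-Za-z0-9_-]{43}$/.test(codeChallenge)) throw new Error('Malformed code_challenge')

  const state = signState({ id: crypto.randomUUID(), origin, ts: nowSec })

  const url = new URL(CONFIG.authorizeEndpoint)
  url.search = new URLSearchParams({
    response_type: 'code',
    client_id: CONFIG.clientId,
    redirect_uri: redirectUri,
    scope: CONFIG.scope,
    state,
    code_challenge: codeChallenge,
    code_challenge_method: 'S256',
  }).toString()

  return { authorizationUrl: url.toString(), state }
}
```

    The client never chooses the redirect URI; it only says which kind of client it is, and an unknown kind is rejected before any URL is built.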


    Step 4: Backend token exchange (the critical section)

    This is where most implementations are technically correct and practically unsafe.


    What we do differently

    1. Validate state signature
    2. Enforce single-use authorization codes
    3. Immediately rotate refresh tokens
    4. Bind tokens to a logical session




    if (state.used) {
      throw new Error('Replay detected')
    }
    state.used = true // mark it before exchanging the code, so a second callback fails
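
    A slightly fuller sketch of the same idea: consume the state exactly once, then build the token request. The Set stands in for shared storage, and I am assuming the client forwards the code together with its PKCE verifier. The helper names are hypothetical; the form fields are the standard ones for the authorization_code grant with PKCE.

```javascript
// In-memory stand-in for a shared store (a database or Redis in a real deployment).
const consumedStates = new Set()

// Mark a state id as used. Returns false if it was already consumed.
// Inside one Node.js process this check-and-set cannot interleave; with
// several backend instances it needs an atomic operation in shared storage.
function consumeState(stateId) {
  if (consumedStates.has(stateId)) return false
  consumedStates.add(stateId)
  return true
}

// Build the form body for the authorization_code grant with PKCE.
function buildTokenRequest({ code, codeVerifier, redirectUri, clientId }) {
  return new URLSearchParams({
    grant_type: 'authorization_code',
    code,
    redirect_uri: redirectUri,
    client_id: clientId,
    code_verifier: codeVerifier,
  })
}

// Handle one callback. `statePayload` must already have passed the signature check.
function prepareExchange(statePayload, callback, client) {
  if (!consumeState(statePayload.id)) throw new Error('Replay detected')
  return buildTokenRequest({
    code: callback.code,
    codeVerifier: callback.codeVerifier,
    redirectUri: client.redirectUri,
    clientId: client.clientId,
  })
}
```

    The returned body would be POSTed to the provider's token endpoint as application/x-www-form-urlencoded. The state is consumed before that call on purpose: a failed exchange forces a fresh login instead of a retry with the same state.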







    Step 5: Refresh token rotation (non-optional)

    If your OAuth provider supports refresh token rotation and you’re not using it, treat your system as already compromised: a leaked refresh token can be replayed until it expires, and nothing will flag it.


    Correct rotation logic






    BEGIN TRANSACTION

    // a revoked token showing up again means it was replayed
    if (oldRefreshToken.isRevoked) fail()

    revoke(oldRefreshToken)
    store(newRefreshToken)

    COMMIT








    If any step fails, the session is invalidated.


    No retries. No grace period.
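
    For illustration, here is the same logic as a small in-memory sketch. The Map stands in for a database table and the names are hypothetical. A real store would also run the swap inside a transaction and would probably keep a hash of each token rather than the raw value.

```javascript
// In-memory stand-in for the refresh token table. Assumption: the caller has
// already obtained newToken before calling rotate().
function createTokenStore() {
  const tokens = new Map()       // refresh token -> { sessionId, revoked }
  const deadSessions = new Set() // sessions invalidated after a failure

  // Kill the session and every token that belongs to it.
  function invalidate(sessionId) {
    deadSessions.add(sessionId)
    for (const record of tokens.values()) {
      if (record.sessionId === sessionId) record.revoked = true
    }
  }

  return {
    store(sessionId, token) {
      tokens.set(token, { sessionId, revoked: false })
    },

    isSessionValid: (sessionId) => !deadSessions.has(sessionId),

    // Swap oldToken for newToken. A replayed token invalidates the whole session.
    rotate(oldToken, newToken) {
      const record = tokens.get(oldToken)
      if (!record) throw new Error('Unknown refresh token')
      if (record.revoked || deadSessions.has(record.sessionId)) {
        // A rotated token came back: treat it as a replay, no grace period.
        invalidate(record.sessionId)
        throw new Error('Refresh token replay detected')
      }
      record.revoked = true
      tokens.set(newToken, { sessionId: record.sessionId, revoked: false })
      return record.sessionId
    },
  }
}
```

    Note what happens after a replay: the newest token dies too. An attacker and the legitimate user both hold tokens from the same chain, and the store cannot tell them apart, so the only safe move is to end the session and force a fresh login.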


    Step 6: Handling silent failures

    Things that will happen:
    • user closes tab mid-login
    • provider redirects twice
    • browser restores old state


    Solution:
    • expire login intents aggressively
    • allow safe re-entry
    • never assume linear flow


    OAuth is a distributed system problem, not an auth problem.
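
    One way to avoid assuming a linear flow is to classify every callback instead of trusting it to arrive once and in order. A minimal sketch, with hypothetical names and a 5-minute lifetime picked only for illustration:

```javascript
// Hypothetical login-intent tracker. Every callback gets one of three answers:
// 'exchange' (first valid arrival), 'ignore' (duplicate), 'restart' (unknown or expired).
function createIntentTracker(ttlMs = 5 * 60 * 1000, now = () => Date.now()) {
  const intents = new Map() // intentId -> { createdAt, status }

  return {
    // Called when the backend hands out a login intent.
    begin(intentId) {
      intents.set(intentId, { createdAt: now(), status: 'pending' })
    },

    // Called for every callback; tells the caller what to do with it.
    onCallback(intentId) {
      const intent = intents.get(intentId)
      if (!intent) return 'restart' // never issued, or the browser restored old state

      if (now() - intent.createdAt > ttlMs) {
        intents.delete(intentId)
        return 'restart' // expired: send the user through a fresh login
      }

      if (intent.status === 'done') return 'ignore' // the provider redirected twice

      intent.status = 'done'
      return 'exchange'
    },
  }
}
```

    A 'restart' is the safe re-entry path: the user lands back on a working login instead of an error page, and the stale intent is never honored.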


    Production checklist (the part I wish I had)

    If you miss any one of these, your system will fail — quietly.


    Final thoughts

    OAuth isn’t hard because the spec is complex.


    It’s hard because:
    • browsers are unreliable
    • users are unpredictable
    • attackers are patient


    Once you design for that reality, OAuth becomes boring.


    And boring authentication is the highest compliment.


    If this helped you avoid even one production incident, it did its job.


    💬 Questions, edge cases, or disagreements? Let’s discuss in the comments.



