I built a 20 kB React hook that doesn't care which AI you use — here's how streaming actually works

    MyrinNew · Senior Member · Feb 2024 · 5168

    Most React AI chat libraries are secretly backend libraries.


    They stream directly from OpenAI, or through their own cloud, or via a framework-specific server adapter. The React hook is just a thin client on top of one particular provider. Switch from Claude to GPT-4? Rewrite the frontend. Migrate off Vercel? Rewrite the frontend. Add Groq for a faster path? Rewrite the frontend.


    But here's the thing: streaming AI chat boils down to three event types. A typical response stream looks like this:

```text
data: {"type":"text","text":"Hello"}

data: {"type":"text","text":", world"}

data: {"type":"done"}
```

    That's it: text, done, and error (not shown above; it reports a failure). Your React component shouldn't need to know anything more than that.


    So I built react-ai-stream (https://github.com/trimooo/react-ai-stream), a backend-agnostic streaming hook that speaks this protocol. Any server that produces those three events works, regardless of which LLM is behind it.
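    For reference, here is one way to type those events on the client, as a minimal sketch. The `text` and `done` shapes come from the examples above; the `error` payload (`{ type: 'error', error: string }`) is an assumption since the post doesn't show it, and `parseEvent` is a hypothetical helper, not the library's API.

```typescript
// The three event types. The `error` field name is an assumption.
type StreamChunk =
  | { type: 'text'; text: string }
  | { type: 'done' }
  | { type: 'error'; error: string }

// Parse one raw SSE event (the text between two blank lines) into a StreamChunk.
// Returns null for comments, keep-alives, or payloads that don't match the protocol.
function parseEvent(raw: string): StreamChunk | null {
  // Per the SSE spec, an event may carry several `data:` lines, joined with '\n',
  // and a single space after the colon is stripped.
  const data = raw
    .split('\n')
    .filter((line) => line.startsWith('data:'))
    .map((line) => line.slice(5).replace(/^ /, ''))
    .join('\n')
  if (data === '') return null

  let parsed: unknown
  try {
    parsed = JSON.parse(data)
  } catch {
    return null
  }
  if (typeof parsed !== 'object' || parsed === null) return null
  const obj = parsed as Record<string, unknown>

  switch (obj.type) {
    case 'text':
      return typeof obj.text === 'string' ? { type: 'text', text: obj.text } : null
    case 'done':
      return { type: 'done' }
    case 'error':
      return { type: 'error', error: typeof obj.error === 'string' ? obj.error : 'Unknown error' }
    default:
      return null
  }
}
```

    A discriminated union also means a `switch (chunk.type)` in the UI layer is exhaustively type-checked.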





    The architecture


    Here's the full picture:


```text
React UI
    │
useAIChat() hook (useSyncExternalStore)
    ├── Zustand store            messages · loading · error
    └── SSE parser + normalizer  ReadableStream → StreamChunk
              │
              │  HTTP POST + SSE stream
              ▼
Your server (/api/chat)          Next.js · Express · FastAPI · Go · Rails
              │
              ▼
Anthropic · OpenAI · Groq · Custom
```


    The boundary in the middle is everything. The React layer speaks {type, text} over SSE. The server speaks whatever the LLM provider requires. Neither knows about the other's implementation.
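    To make the boundary concrete, here's a sketch of the server-side translation step. It assumes the publicly documented streaming payloads: OpenAI Chat Completions chunks carry text in `choices[0].delta.content` and end with a `[DONE]` sentinel, while Anthropic Messages streams carry text in `content_block_delta` events and end with `message_stop`. The function names are hypothetical, real payloads have more fields than shown, and the `error` shape on our side is an assumption; check each provider's current docs before relying on this.

```typescript
type StreamChunk =
  | { type: 'text'; text: string }
  | { type: 'done' }
  | { type: 'error'; error: string } // error shape is an assumption

// OpenAI Chat Completions: each `data:` payload is a JSON chunk, and the stream
// ends with the literal sentinel `[DONE]`. Throws on malformed JSON.
function fromOpenAI(data: string): StreamChunk | null {
  if (data.trim() === '[DONE]') return { type: 'done' }
  const chunk = JSON.parse(data)
  const text = chunk?.choices?.[0]?.delta?.content
  return typeof text === 'string' && text.length > 0 ? { type: 'text', text } : null
}

// Anthropic Messages: text arrives in `content_block_delta` events, the stream
// ends with `message_stop`, and failures arrive as `error` events.
function fromAnthropic(data: string): StreamChunk | null {
  const event = JSON.parse(data)
  switch (event?.type) {
    case 'content_block_delta':
      return event.delta?.type === 'text_delta' ? { type: 'text', text: event.delta.text } : null
    case 'message_stop':
      return { type: 'done' }
    case 'error':
      return { type: 'error', error: event.error?.message ?? 'Provider error' }
    default:
      return null // message_start, ping, content_block_stop, ... carry no text
  }
}

// Re-emit a normalized chunk in the wire format the React hook reads.
function toSSE(chunk: StreamChunk): string {
  return `data: ${JSON.stringify(chunk)}\n\n`
}
```

    Adding a provider means adding one more `fromX` function on the server; nothing on the React side changes.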





    How streaming actually works


    Most tutorials skip the networking part. Here's what's actually happening.


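    Server-Sent Events (SSE) is a one-directional, HTTP-based protocol: the client makes a request, and the server holds the response open and keeps writing events to it as they become available. The browser's built-in EventSource only issues GET requests, so a chat client that POSTs the conversation reads the same format from the fetch() response body instead. A response looks like this: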


```http
HTTP/1.1 200 OK
Content-Type: text/event-stream
Cache-Control: no-cache

data: {"type":"text","text":"Hello"}

data: {"type":"text","text":", world"}

data: {"type":"done"}
```


    The double newline (\n\n), i.e. a blank line, is the event delimiter. Your API route receives the user's messages, calls the LLM, and re-emits tokens in this format.
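    Here's a minimal sketch of that server side using only Web-standard APIs (`ReadableStream`, `TextEncoder`, `Response`), which Node 18+ and fetch-style route handlers support. `tokens` stands in for whatever your provider SDK yields; `sseFromTokens` is a hypothetical helper, and the error payload shape is an assumption.

```typescript
// Turn any async stream of text tokens into the text/done/error SSE protocol.
function sseFromTokens(tokens: AsyncIterable<string>): Response {
  const encoder = new TextEncoder()
  const iterator = tokens[Symbol.asyncIterator]()
  const frame = (payload: object) => encoder.encode(`data: ${JSON.stringify(payload)}\n\n`)

  const body = new ReadableStream<Uint8Array>({
    // pull() runs only when the consumer wants more data, so a slow client
    // applies backpressure to the upstream token source.
    async pull(controller) {
      try {
        const { done, value } = await iterator.next()
        if (done) {
          controller.enqueue(frame({ type: 'done' }))
          controller.close()
        } else {
          controller.enqueue(frame({ type: 'text', text: value }))
        }
      } catch (err) {
        // Assumed error shape: { type: 'error', error: string }
        controller.enqueue(frame({ type: 'error', error: err instanceof Error ? err.message : String(err) }))
        controller.close()
      }
    },
    // The client went away: stop the upstream token source too.
    async cancel() {
      await iterator.return?.()
    },
  })

  return new Response(body, {
    headers: { 'Content-Type': 'text/event-stream', 'Cache-Control': 'no-cache' },
  })
}
```

    Using `pull` instead of draining everything in `start` keeps memory bounded when the client reads slower than the model writes.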


    The buffering problem nobody talks about


    Here's where most implementations have a subtle bug. Network chunks don't align with SSE event boundaries. One reader.read() call might return half an event. The next call might return three events and the beginning of a fourth.


    The correct pattern:


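```ts
// `response` is the fetch() Response from your chat endpoint.
const reader = response.body!.getReader()
const decoder = new TextDecoder()

let buf = ''
while (true) {
  const { done, value } = await reader.read()
  if (done) break
  // stream: true keeps multi-byte characters split across chunks intact
  buf += decoder.decode(value, { stream: true })
  const parts = buf.split('\n\n')
  buf = parts.pop() ?? '' // ← preserve the incomplete tail
  for (const part of parts) {
    // process complete events
  }
}
```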


    The critical invariant: buf = parts.pop() keeps the incomplete trailing event for the next read. If you reset buf = '' after each read instead (I've seen this in production code), you silently drop every event that straddles two chunks. No error. The message just comes out with words missing, or ends mid-sentence, some of the time.
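    Putting the buffering rule together with event parsing, a complete reader might look like the sketch below. It's a hypothetical async generator, not the library's exported parser; it assumes LF or CRLF line endings and yields each complete event's `data:` payload as a string for the caller to JSON-parse.

```typescript
// Yield the `data:` payload of each complete SSE event from a byte stream.
async function* readSSEData(stream: ReadableStream<Uint8Array>): AsyncGenerator<string> {
  const reader = stream.getReader()
  const decoder = new TextDecoder()
  let buf = ''
  try {
    while (true) {
      const { done, value } = await reader.read()
      if (done) break
      // stream: true holds back a multi-byte character split across chunks.
      buf += decoder.decode(value, { stream: true })
      buf = buf.replace(/\r\n/g, '\n') // normalize CRLF so '\n\n' finds every boundary
      const parts = buf.split('\n\n')
      buf = parts.pop() ?? '' // keep the incomplete tail for the next read
      for (const part of parts) {
        const data = part
          .split('\n')
          .filter((line) => line.startsWith('data:'))
          .map((line) => line.slice(5).replace(/^ /, ''))
          .join('\n')
        if (data !== '') yield data
      }
    }
    // Anything still in buf is an unterminated event; the SSE spec discards it.
  } finally {
    reader.releaseLock()
  }
}
```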





    10 lines to a streaming chat


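```tsx
'use client'

import { useAIChat } from '@react-ai-stream/react'
import { Chat } from '@react-ai-stream/ui'
import '@react-ai-stream/ui/styles'

export default function Page() {
  const { messages, sendMessage, loading, stop } = useAIChat({
    endpoint: '/api/chat', // any streaming endpoint
  })

  // Render the prebuilt <Chat /> wired to messages, sendMessage, loading and stop
  // (the component's exact props are not reproduced here).
  return <Chat /* … */ />
}
```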


    The hook has no dependency on the UI package. You can wire messages to any component: Tailwind, shadcn/ui, a floating widget, a sidebar panel. The @react-ai-stream/ui package and its Chat component are opt-in.





    Why "backend-agnostic" is the right abstraction


    Compare these two approaches:


    Coupled approach — OpenAI SDK in the browser:


```ts
// Your LLM choice is now in your bundle.
// Your API key is exposed: NEXT_PUBLIC_* variables are inlined into client code.
// Switching providers requires a frontend deploy.
import OpenAI from 'openai'

const client = new OpenAI({ apiKey: process.env.NEXT_PUBLIC_KEY, dangerouslyAllowBrowser: true })
```


    Decoupled approach — hook speaks HTTP:


```ts
// The frontend doesn't know or care what's behind this endpoint.
// It could be GPT-4 today, Claude tomorrow, a local Llama next week.
const chat = useAIChat({ endpoint: '/api/chat' })
```


    The server-side API route handles provider selection. It might route to Anthropic by default, fall back to Groq during an outage, and serve EU traffic to a region-compliant endpoint, all without touching the React component.
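    As a sketch of the fallback part, assume a hypothetical `Provider` interface where each vendor SDK is wrapped as an async generator of text tokens. The route tries providers in order and falls back only if a provider fails before emitting anything: once tokens have reached the client, switching providers mid-answer would splice two different replies together.

```typescript
type Message = { role: string; content: string }

// Hypothetical provider abstraction. stream() is assumed to be an async
// generator, so startup failures (auth, 429, 5xx) surface on the first next().
interface Provider {
  name: string
  stream(messages: Message[]): AsyncIterable<string>
}

// Yield tokens from the first provider that starts successfully.
async function* streamWithFallback(providers: Provider[], messages: Message[]): AsyncGenerator<string> {
  let lastError: unknown = new Error('No providers configured')
  for (const provider of providers) {
    const iterator = provider.stream(messages)[Symbol.asyncIterator]()
    const first = await iterator.next().catch((err: unknown) => {
      lastError = err
      return null
    })
    if (first === null) continue // nothing sent yet, so it's safe to try the next one
    if (first.done) return
    yield first.value
    // Committed to this provider: later errors propagate to the caller.
    while (true) {
      const next = await iterator.next()
      if (next.done) return
      yield next.value
    }
  }
  throw lastError
}
```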


    This also means you can run three providers simultaneously in complete isolation:


```ts
const claude = useAIChat({ endpoint: '/api/chat?provider=anthropic' })
const gpt = useAIChat({ endpoint: '/api/chat?provider=openai' })
const groq = useAIChat({ endpoint: '/api/chat?provider=groq' })
```


    Each instance has its own message history, loading state, and abort controller. No shared context required.
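    On the server, `?provider=` is user input, so the dispatch step should check it against an allowlist rather than pass it through. A small sketch; the provider names mirror the example above, while `pickProvider` and the default are hypothetical.

```typescript
const PROVIDERS = ['anthropic', 'openai', 'groq'] as const
type ProviderName = (typeof PROVIDERS)[number]

// Map `?provider=` to a known provider, falling back to a default for
// missing or unrecognized values.
function pickProvider(requestUrl: string, fallback: ProviderName = 'anthropic'): ProviderName {
  const requested = new URL(requestUrl).searchParams.get('provider')
  return (PROVIDERS as readonly string[]).includes(requested ?? '')
    ? (requested as ProviderName)
    : fallback
}
```

    It expects an absolute URL, such as the `url` of a standard `Request`.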





    The React rendering challenge


    The naive implementation of streaming into React state has a real performance problem:


```ts
// This fires a state update (and a re-render) for every token.
// At 50 tokens/second, that's 50 re-renders/second.
setResponse(prev => prev + token)
```


    React 18 batches updates that land in the same task, but each reader.read() typically resolves in a task of its own, so during fast streaming you still get roughly one render per network chunk: tens of renders per second from a single useAIChat call.


    The library addresses this by using Zustand's createStore (the vanilla, framework-agnostic version) combined with useSyncExternalStore:


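```ts
import { useState, useSyncExternalStore } from 'react'
import { createStore } from 'zustand/vanilla'

type ChatState = { text: string; loading: boolean; error: string | null } // simplified

function useChatStore() {
  // The store lives outside React and mutates at whatever rate tokens arrive.
  // useState's lazy initializer creates it once per hook call; useRef(createStore())
  // would evaluate the factory again on every render.
  const [store] = useState(() =>
    createStore<ChatState>()(() => ({ text: '', loading: false, error: null })),
  )
  // React re-renders when the store notifies and getState() returns a new snapshot.
  const state = useSyncExternalStore(store.subscribe, store.getState)
  return { store, state }
}
```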


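    The store owns the state and React only observes it. One caveat: useSyncExternalStore doesn't throttle by itself. It re-renders whenever the store notifies subscribers and getState() returns a new snapshot, so whether a 100 tokens/second mutation rate turns into 100 renders/second depends on how often the store publishes those updates.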
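    If you want to cap the render rate explicitly, one option is to coalesce tokens before they reach the store. Here's a minimal sketch, not the library's implementation: `createTokenBatcher` is a hypothetical helper that buffers tokens and flushes them at most once per scheduled tick. In a browser you'd pass `requestAnimationFrame` as the scheduler; the default here is a 16 ms timer so it also runs in Node.

```typescript
type Schedule = (callback: () => void) => void

// Buffer tokens and hand them to `flush` at most once per scheduled tick, so
// many tokens arriving within one frame cause one store update (and one render).
function createTokenBatcher(
  flush: (text: string) => void,
  schedule: Schedule = (cb) => {
    setTimeout(cb, 16)
  },
) {
  let pending = ''
  let scheduled = false

  const run = () => {
    scheduled = false
    if (pending === '') return
    const text = pending
    pending = ''
    flush(text)
  }

  return {
    push(token: string) {
      pending += token
      if (!scheduled) {
        scheduled = true
        schedule(run)
      }
    },
    // Call when the stream ends or is aborted so no buffered text is lost.
    drain: run,
  }
}
```

    The parse loop calls `push` for each text event, and `flush` performs the single store update for that frame.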


    This also enables true isolation. Each useAIChat() call creates its own store instance when it first mounts. Three hook calls → three completely independent stores → three isolated chat instances. No wrapping needed, no cross-component re-renders.





    How abort propagates end-to-end


    The stop button works through a chain of signals most people don't trace all the way:


```text
user clicks Stop
  → abortController.abort()
  → fetch (or the pending reader.read()) rejects with an AbortError
  → the stream loop's catch sees isAbortError(err) === true
  → loading → false, no error surfaced
  → partial response preserved in messages
```
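    The client half of that chain can be sketched as follows. `looksLikeAbortError` is a hypothetical stand-in for the core package's abort helper, and the loop checks `signal.aborted` itself so the example runs without a network; with a real `fetch`, the pending `reader.read()` rejects with an `AbortError` instead.

```typescript
interface ChatState {
  text: string
  loading: boolean
  error: string | null
}

// True for the DOMException that fetch / reader.read() reject with after abort().
function looksLikeAbortError(err: unknown): boolean {
  return typeof err === 'object' && err !== null && (err as { name?: unknown }).name === 'AbortError'
}

async function consume(tokens: AsyncIterable<string>, signal: AbortSignal, state: ChatState) {
  state.loading = true
  state.error = null
  try {
    for await (const token of tokens) {
      // Stand-in for fetch's behaviour: bail out as soon as the signal fires.
      if (signal.aborted) throw new DOMException('The operation was aborted.', 'AbortError')
      state.text += token // partial text accumulates and is never rolled back
    }
  } catch (err) {
    // A user-initiated stop is not an error; anything else is surfaced.
    if (!looksLikeAbortError(err)) state.error = err instanceof Error ? err.message : String(err)
  } finally {
    state.loading = false
  }
}
```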


    On the server side, the abort can travel further. In runtimes where req.signal fires when the client disconnects, forwarding it to the upstream LLM call cancels token generation before it completes:


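```ts
// Inside a route handler such as `export async function POST(req: Request)`:
const upstream = await fetch(LLM_API_URL, {
  method: 'POST', // fetch defaults to GET, which can't carry a body
  headers: { 'Content-Type': 'application/json' }, // plus your provider's auth header
  signal: req.signal, // ← the user stopping the stream cancels the LLM call
  body: JSON.stringify({ messages, stream: true }),
})
```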


    The provider stops generating tokens nobody will read. That's waste reduction at the infrastructure level, not just UI polish.





    What's in the library


    Three packages, all MIT, ~20 kB total:


    • @react-ai-stream/core: SSE parser, chunk normalizer, Zustand store factory, abort utils (no React dependency)
    • @react-ai-stream/react: useAIChat hook, AIChatProvider context
    • @react-ai-stream/ui: prebuilt chat components (including Chat) with syntax highlighting


    Built with: TypeScript strict mode, tsup (ESM + CJS), Vitest (34 tests), Turborepo monorepo.





    Try it


```sh
npm install @react-ai-stream/react
```

    The architecture page (https://react-ai-stream-docs.vercel.app/architecture) and How streaming works (https://react-ai-stream-docs.vercel....ming-explained) have the full technical detail if you want to go deeper.





    What I'd like to hear


    If you've built AI chat in React, I'm curious: what was the hardest part? Provider coupling, streaming reliability, render performance, something else? The answer will probably shape what this library focuses on next.

