I built a 20 kB React hook that doesn't care which AI you use — here's how streaming actually works

**MyrinNew** · 05-09-2026, 06:33 PM

`---

Most React AI chat libraries are secretly backend libraries.

They stream directly from OpenAI, or through their own cloud, or via a framework-specific server

adapter. The React hook is just a thin client on top of one particular provider. Switch from Claude

to GPT-4? Rewrite the frontend. Migrate off Vercel? Rewrite the frontend. Add Groq for a faster path?

Rewrite the frontend.

But here's the thing: streaming AI chat is fundamentally just three event types. A successful stream looks like this:

data: {"type":"text","text":"Hello"}

data: {"type":"text","text":", world"}

data: {"type":"done"}

That's it. text, done, error. Your React component shouldn't need to know anything more than that.

So I built react-ai-stream (https://github.com/trimooo/react-ai-stream) — a backend-agnostic

streaming hook that speaks this protocol. Any server that produces those three events works,

regardless of which LLM is behind it.
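To make the protocol concrete, here is a minimal TypeScript sketch of the three chunk types and a parser for a single event. StreamChunk is the name used in the architecture diagram, but the field layout of the error chunk (a message string) and the parseEvent helper are assumptions for illustration and may not match what the library ships.

```typescript
// Hypothetical shapes for the three events; fields other than `type` and
// `text` are assumptions for illustration.
type StreamChunk =
  | { type: 'text'; text: string }
  | { type: 'done' }
  | { type: 'error'; message: string }

// Parse one complete SSE event (the text between two blank lines).
// Returns null for events that carry no recognizable chunk.
function parseEvent(rawEvent: string): StreamChunk | null {
  // Only `data:` lines carry the payload; comment lines start with ':'.
  const payload = rawEvent
    .split('\n')
    .filter((line) => line.startsWith('data:'))
    .map((line) => line.slice(5).trimStart()) // drop field name and leading whitespace
    .join('\n')
  if (payload === '') return null

  let parsed: unknown
  try {
    parsed = JSON.parse(payload)
  } catch {
    return null // malformed JSON: skip the event rather than crash the stream
  }
  if (typeof parsed !== 'object' || parsed === null) return null

  const chunk = parsed as { type?: unknown; text?: unknown; message?: unknown }
  if (chunk.type === 'text' && typeof chunk.text === 'string') {
    return { type: 'text', text: chunk.text }
  }
  if (chunk.type === 'done') return { type: 'done' }
  if (chunk.type === 'error') {
    const message = typeof chunk.message === 'string' ? chunk.message : 'unknown error'
    return { type: 'error', message }
  }
  return null
}
```

Because the union is discriminated on type, a switch over chunk.type in the consuming code gets full type narrowing for free.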

The architecture

Here's the full picture:

                     React UI
                         │
                 useAIChat() hook
               useSyncExternalStore
                         │
      ┌──────────────────┴──────────────────┐
      │                                     │
Zustand store                    SSE parser + normalizer
messages · loading · error    ReadableStream → StreamChunk
      │                                     │
      └──────────────────┬──────────────────┘
                         │
              HTTP POST + SSE stream
                         │
              Your server (/api/chat)
     Next.js · Express · FastAPI · Go · Rails
                         │
        Anthropic · OpenAI · Groq · Custom

The boundary in the middle is everything. The React layer speaks {type, text} over SSE. The server

speaks whatever the LLM provider requires. Neither knows about the other's implementation.

How streaming actually works

Most tutorials skip the networking part. Here's what's actually happening.

Server-Sent Events (SSE) is a one-directional format layered on plain HTTP: the client makes a request, and the server holds the response open and keeps sending data:

HTTP/1.1 200 OK

Content-Type: text/event-stream

Cache-Control: no-cache

data: {"type":"text","text":"Hello"}

data: {"type":"text","text":", world"}

data: {"type":"done"}

The double newline (\n\n) is the event delimiter. Your API route receives the user's messages, calls

the LLM, and re-emits tokens in this format.
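On the server, re-emitting tokens in this format comes down to a small formatting step. The sketch below is one way it could be written, not code from the library: formatSSE and toSSE are hypothetical helper names, the message field on the error chunk is an assumption, and the token source stands in for whatever async iterable your provider SDK returns.

```typescript
type StreamChunk =
  | { type: 'text'; text: string }
  | { type: 'done' }
  | { type: 'error'; message: string } // `message` is an assumed field name

// Serialize one chunk as an SSE event: a `data:` line followed by a blank line.
// JSON.stringify escapes newlines inside the text, so the payload stays on one line.
function formatSSE(chunk: StreamChunk): string {
  return `data: ${JSON.stringify(chunk)}\n\n`
}

// Turn a provider's token stream into SSE frames, always finishing with
// either a `done` or an `error` event so the client knows when to stop.
async function* toSSE(tokens: AsyncIterable<string>): AsyncGenerator<string> {
  try {
    for await (const token of tokens) {
      yield formatSSE({ type: 'text', text: token })
    }
    yield formatSSE({ type: 'done' })
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err)
    yield formatSSE({ type: 'error', message })
  }
}
```

In a Fetch-style route handler you would encode each frame with TextEncoder, enqueue it on a ReadableStream, and return that stream with Content-Type: text/event-stream.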

The buffering problem nobody talks about

Here's where most implementations have a subtle bug. Network chunks don't align with SSE event

boundaries. One reader.read() call might return half an event. The next call might return three

events and the beginning of a fourth.

The correct pattern:

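// `response` is the fetch() Response returned by your streaming endpoint
const reader = response.body.getReader()
const decoder = new TextDecoder()
let buf = ''

while (true) {
  const { done, value } = await reader.read()
  if (done) break
  // stream: true keeps multi-byte characters intact when they straddle two chunks
  buf += decoder.decode(value, { stream: true })
  const parts = buf.split('\n\n')
  buf = parts.pop() ?? '' // ← preserve the incomplete tail
  for (const part of parts) {
    // process complete events
  }
}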

The critical invariant: buf = parts.pop() keeps the incomplete trailing event. If you write buf = ''

inside the loop (I've seen this in production code), you silently drop buffered content. No error.

The message just ends mid-sentence sometimes.
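The buffering rule is easy to check in isolation if you pull it out of the read loop. This is a minimal sketch under that assumption; createEventSplitter is a hypothetical helper, not part of the library, and the two feed calls simulate network chunks that cut events in awkward places.

```typescript
// Hypothetical helper: accumulates decoded text and returns the payload of
// every `data:` line belonging to events that are complete so far.
function createEventSplitter(): (text: string) => string[] {
  let buf = ''
  return (text: string): string[] => {
    buf += text
    const parts = buf.split('\n\n')
    buf = parts.pop() ?? '' // keep the incomplete tail for the next call
    const payloads: string[] = []
    for (const part of parts) {
      for (const line of part.split('\n')) {
        if (line.startsWith('data:')) payloads.push(line.slice(5).trimStart())
      }
    }
    return payloads
  }
}

const feed = createEventSplitter()

// Chunk 1 ends in the middle of an event: nothing is complete yet.
const first = feed('data: {"type":"text","te')

// Chunk 2 completes that event, carries a whole second one,
// and starts a third that stays buffered.
const second = feed('xt":"Hello"}\n\ndata: {"type":"done"}\n\nda')

console.log(first)  // []
console.log(second) // [ '{"type":"text","text":"Hello"}', '{"type":"done"}' ]
```

Swap the parts.pop() line for buf = '' and the first call's half-event disappears, which is exactly the silent truncation described above.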

10 lines to a streaming chat

'use client'

import { useAIChat } from '@react-ai-stream/react'
import { Chat } from '@react-ai-stream/ui'
import '@react-ai-stream/ui/styles'

export default function Page() {
  const { messages, sendMessage, loading, stop } = useAIChat({
    endpoint: '/api/chat', // any streaming endpoint
  })

  // Pass messages, sendMessage, loading and stop to <Chat />; see the docs for its exact props.
  return <Chat />
}

The hook has no dependency on the UI package. You can wire messages to any component: Tailwind, shadcn/ui, a floating widget, a sidebar panel. <Chat /> is opt-in.

Why "backend-agnostic" is the right abstraction

Compare these two approaches:

Coupled approach — OpenAI SDK in the browser:

// Your LLM choice is now in your bundle.

// Your API key is exposed.

// Switching providers requires a frontend deploy.

import OpenAI from 'openai'

const client = new OpenAI({ apiKey: process.env.NEXT_PUBLIC_KEY, dangerouslyAllowBrowser: true })

Decoupled approach — hook speaks HTTP:

// The frontend doesn't know or care what's behind this endpoint.

// It could be GPT-4 today, Claude tomorrow, a local Llama next week.

const chat = useAIChat({ endpoint: '/api/chat' })

The server-side API route handles provider selection. It might route to Anthropic by default, fall

back to Groq during an outage, and serve EU traffic to a region-compliant endpoint — all without

touching the React component.
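As a rough illustration of that server-side routing, here is a minimal sketch of a selection function. Everything in it is an assumption for the sake of the example: the pickProvider name, the preference order, and the idea that health and region compliance arrive as precomputed sets from your own checks.

```typescript
type Provider = 'anthropic' | 'openai' | 'groq'

interface RoutingInput {
  requested: string | null       // e.g. the ?provider= query parameter
  healthy: ReadonlySet<Provider> // assumed to come from your own health checks
  allowed: ReadonlySet<Provider> // assumed: providers permitted for this user's region
}

// Assumed default order: first usable entry wins.
const PREFERENCE: readonly Provider[] = ['anthropic', 'groq', 'openai']

function isProvider(value: string): value is Provider {
  return (PREFERENCE as readonly string[]).includes(value)
}

// Honor an explicit request when possible, otherwise fall back down the list.
// Returns null when no provider is both healthy and allowed.
function pickProvider(input: RoutingInput): Provider | null {
  const usable = (p: Provider): boolean => input.healthy.has(p) && input.allowed.has(p)
  if (input.requested !== null && isProvider(input.requested) && usable(input.requested)) {
    return input.requested
  }
  return PREFERENCE.find(usable) ?? null
}
```

The React side never sees any of this: it keeps posting to the same endpoint while the route handler decides where the request goes.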

This also means you can run three providers simultaneously in complete isolation:

const claude = useAIChat({ endpoint: '/api/chat?provider=anthropic' })

const gpt = useAIChat({ endpoint: '/api/chat?provider=openai' })

const groq = useAIChat({ endpoint: '/api/chat?provider=groq' })

Each instance has its own message history, loading state, and abort controller. No shared context

required.

The React rendering challenge

The naive implementation of streaming into React state has a real performance problem:

// This fires a state update — and a re-render — for every token.

// At 50 tokens/second, that's 50 re-renders/second.

setResponse(prev => prev + token)

React 18 batches updates that happen in the same tick, but each await reader.read() resumes in its own tick, so tokens from separate reads can each trigger their own render. During fast streaming, that means tens of renders per second for a single chat.

The library solves this by using Zustand's createStore (the vanilla, framework-agnostic version)

combined with useSyncExternalStore:

// The store lives outside React.
// It mutates at whatever rate tokens arrive.
// useSyncExternalStore decides when React re-renders.
const storeRef = useRef(null)
if (storeRef.current === null) {
  storeRef.current = createStore() // create the store once, not on every render
}
const state = useSyncExternalStore(
  storeRef.current.subscribe,
  storeRef.current.getState
)

The mutation rate and the render rate are decoupled. The store accepts tokens at whatever rate they arrive (say, 100 tokens/second), and React re-renders only when the snapshot returned by getState changes.
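To show what useSyncExternalStore needs from a store, here is a hand-rolled stand-in with the same subscribe/getState contract. It is a sketch only: the library uses Zustand's vanilla store, and createChatStore, appendToken and finish are hypothetical names invented for this example.

```typescript
interface ChatState {
  text: string
  loading: boolean
}

type Listener = () => void

// Hand-rolled stand-in for a vanilla store; not the library's implementation.
function createChatStore() {
  let state: ChatState = { text: '', loading: false }
  const listeners = new Set<Listener>()

  const setState = (patch: Partial<ChatState>): void => {
    state = { ...state, ...patch } // new object, so the snapshot identity changes
    listeners.forEach((listener) => listener())
  }

  return {
    // The shape useSyncExternalStore expects: subscribe returns an unsubscribe function.
    subscribe(listener: Listener): () => void {
      listeners.add(listener)
      return () => {
        listeners.delete(listener)
      }
    },
    // Returns the same object until the state actually changes.
    getState: (): ChatState => state,
    appendToken: (token: string): void => setState({ text: state.text + token, loading: true }),
    finish: (): void => setState({ loading: false }),
  }
}
```

In a component you would pass store.subscribe and store.getState to useSyncExternalStore, just like the snippet above does with the Zustand store. The stable snapshot identity matters: if getState returned a fresh object on every call, React would re-render in a loop.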

This also enables true isolation. Each useAIChat() call creates its own store instance via a ref. Three hook calls → three completely independent stores → three isolated chat instances. No provider wrapping needed, no cross-component re-renders.

How abort propagates end-to-end

The stop button works through a chain of signals most people don't trace all the way:

user clicks Stop

→ abortController.abort()

→ fetch rejects (AbortError)

→ stream loop catches isAbortError() — true

→ loading → false, no error surfaced

→ partial response preserved in messages
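The last three steps of that chain can be sketched as a pure function, which makes the "abort is not an error" rule easy to test. This is an illustrative re-implementation under assumed names: the ChatStatus shape and settleAfterFailure are hypothetical, and the library's own isAbortError may be written differently.

```typescript
interface ChatStatus {
  partial: string // text streamed so far
  loading: boolean
  error: string | null
}

// Hypothetical re-implementation; the library's own isAbortError may differ.
// Checks the name rather than the class so it works for DOMException and Error alike.
function isAbortError(err: unknown): boolean {
  return typeof err === 'object' && err !== null && (err as { name?: unknown }).name === 'AbortError'
}

// Decide the final state once the stream loop exits with an exception.
function settleAfterFailure(status: ChatStatus, err: unknown): ChatStatus {
  if (isAbortError(err)) {
    // User pressed Stop: keep the partial text and surface no error.
    return { ...status, loading: false, error: null }
  }
  const message = err instanceof Error ? err.message : String(err)
  return { ...status, loading: false, error: message }
}
```

Either way the partial text survives; the only difference is whether the UI shows an error next to it.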

On the server side, req.signal reflects this abort too (in runtimes that hand your route a Fetch-style Request). Forwarding it to the upstream LLM call cancels token generation before it completes:

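const upstream = await fetch(LLM_API_URL, {
  method: 'POST', // fetch defaults to GET, which cannot carry a body
  headers: { 'Content-Type': 'application/json' },
  signal: req.signal, // ← the user stopping the stream cancels the LLM call
  body: JSON.stringify({ messages, stream: true }),
})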

That's waste reduction at the infrastructure level, not just UI polish.

What's in the library

Three packages, all MIT, ~20 kB total:

Package: @react-ai-stream/core

What it does: SSE parser, chunk normalizer, Zustand store factory, abort utils — no React dep

────────────────────────────────────────

Package: @react-ai-stream/react

What it does: useAIChat hook, AIChatProvider context

────────────────────────────────────────

Package: @react-ai-stream/ui

What it does: prebuilt UI components, including <Chat />, with syntax highlighting

Built with: TypeScript strict mode, tsup (ESM + CJS), Vitest (34 tests), Turborepo monorepo.

Try it

npm install @react-ai-stream/react

Live demo (https://react-ai-stream-example.vercel.app): three models streaming in parallel via Groq
Docs (https://react-ai-stream-docs.vercel.app): quickstart, provider setup, API reference
GitHub (https://github.com/trimooo/react-ai-stream): source, examples, architecture deep-dive

The architecture page (https://react-ai-stream-docs.vercel.app/architecture) and How streaming works

(https://react-ai-stream-docs.vercel....ming-explained) have the full technical detail

if you want to go deeper.

What I'd like to hear

If you've built AI chat in React, I'm curious: what was the hardest part? Provider coupling,

streaming reliability, render performance, something else? The answer will probably shape what this

library focuses on next.

---`
