AI Chatbot UI and Streaming Response Patterns
Real-World Problem Context
Every frontend team is now building some form of AI chat interface — whether it's a customer support chatbot, a code assistant panel, a documentation Q&A widget, or a full ChatGPT-like experience embedded in their product. These UIs have unique frontend challenges: streaming responses token-by-token (not waiting for complete responses), rendering markdown and code blocks in real-time as they arrive, managing conversation state across sessions, handling tool calls and multi-step AI actions, building responsive message bubbles that handle images/code/tables, and creating the "typing indicator" experience that feels natural. This post covers the internal architecture of building production-quality AI chat UIs — the streaming protocol, incremental markdown parsing, message state management, and the subtle UX patterns that make chat interfaces feel good.
Problem Statements
- Streaming Protocol: How does the frontend receive and process tokens as they stream from an LLM API, handle connection drops and retries, and render partial text that might be mid-word or mid-markdown-syntax?
- Incremental Rendering: How do you parse and render markdown (with code blocks, tables, links) incrementally as text streams in — without re-parsing the entire message on every token?
- Complex Message Types: Beyond plain text, how do you handle messages containing code with syntax highlighting, tool call results, images, structured data, citations, and interactive elements?
Deep Dive: Internal Mechanisms
1. Chat UI Architecture Overview
/*
* AI Chat UI component tree:
*
* <ChatContainer>
* │
* ├── <MessageList>
* │ ├── <Message role="user">
* │ │ └── <UserBubble text="..." />
* │ │
* │ ├── <Message role="assistant">
* │ │ ├── <AssistantBubble>
* │ │ │ ├── <StreamingMarkdown text="..." />
* │ │ │ ├── <CodeBlock lang="tsx" code="..." />
* │ │ │ └── <Citation sources=[...] />
* │ │ └── <MessageActions> (copy, regenerate, thumbs)
* │ │
* │ ├── <Message role="tool">
* │ │ └── <ToolCallResult name="search" result={...} />
* │ │
* │ └── <TypingIndicator /> (while streaming)
* │
* ├── <InputArea>
* │ ├── <textarea /> (auto-resize)
* │ ├── <AttachmentPicker />
* │ ├── <ModelSelector />
* │ └── <SendButton /> / <StopButton />
* │
* └── <ConversationSidebar>
* └── <ConversationList>
*
* State management:
* messages: Message[] (conversation history)
* isStreaming: boolean (show stop button, disable send)
* streamingContent: string (accumulating assistant reply)
* abortController: ref (cancel in-flight request)
*/
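The rest of the post passes Message objects through all of these layers. A minimal shape, assembled from the fields actually used below (illustrative, not from any particular SDK):
/**
 * @typedef {Object} Message
 * @property {string} id - stable key for list rendering and targeted updates
 * @property {'user'|'assistant'|'system'|'tool'} role
 * @property {string} content - accumulates token-by-token while streaming
 * @property {number} timestamp
 * @property {boolean} [isStreaming] - true while tokens are still arriving
 * @property {boolean} [stopped] - user aborted mid-stream
 * @property {string} [error] - failure message, if the request failed
 */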
2. Server-Sent Events (SSE) Streaming Protocol
/*
* Most LLM APIs use Server-Sent Events (SSE) for streaming:
*
* HTTP Request:
* POST /api/chat
* Content-Type: application/json
* { "messages": [...], "stream": true }
*
* HTTP Response (SSE):
* Content-Type: text/event-stream
*
* data: {"id":"chatcmpl-1","choices":[{"delta":{"content":"Hello"}}]}
*
* data: {"id":"chatcmpl-1","choices":[{"delta":{"content":" world"}}]}
*
* data: {"id":"chatcmpl-1","choices":[{"delta":{"content":"!"}}]}
*
* data: [DONE]
*
* Each "data:" line is one chunk (usually 1-3 tokens).
* The frontend must parse this stream incrementally.
*/
async function streamChat(messages, onChunk, onDone, signal) {
const response = await fetch('/api/chat', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ messages, stream: true }),
signal, // AbortController signal for cancellation
});
if (!response.ok) {
throw new Error(`Chat API error: ${response.status}`);
}
const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';
while (true) {
const { done, value } = await reader.read();
if (done) break;
// Decode bytes to string and add to buffer:
buffer += decoder.decode(value, { stream: true });
// Process complete SSE lines:
const lines = buffer.split('\n');
buffer = lines.pop() || ''; // Keep incomplete last line
for (const line of lines) {
if (line.startsWith('data: ')) {
const data = line.slice(6);
if (data === '[DONE]') {
onDone();
return;
}
try {
const parsed = JSON.parse(data);
const content = parsed.choices?.[0]?.delta?.content;
if (content) {
onChunk(content);
}
// Handle tool calls in the stream:
const toolCall = parsed.choices?.[0]?.delta?.tool_calls?.[0];
if (toolCall) {
onChunk({ type: 'tool_call', data: toolCall });
}
} catch (e) {
// Malformed JSON chunk — skip
}
}
}
}
// Stream ended without an explicit [DONE] sentinel; still signal completion:
onDone();
}
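The problem statements above also ask about connection drops. streamChat surfaces them as thrown errors, so retry logic can live in a thin wrapper. A minimal sketch (streamChatWithRetry is not from any API above; it re-issues the whole request, so the caller should reset any partially rendered content before each attempt):
async function streamChatWithRetry(messages, onChunk, onDone, signal, maxRetries = 2) {
for (let attempt = 0; attempt <= maxRetries; attempt++) {
try {
await streamChat(messages, onChunk, onDone, signal);
return;
} catch (error) {
if (error.name === 'AbortError') throw error; // user cancelled: never retry
if (attempt === maxRetries) throw error; // out of retries: surface it
// Exponential backoff before reconnecting (1s, 2s, 4s, ...):
await new Promise(resolve => setTimeout(resolve, 1000 * 2 ** attempt));
}
}
}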
3. Message State Management
/*
* Chat state is surprisingly complex:
*
* Message lifecycle:
* user types → optimistic add → API call → streaming → complete
* │
* tool call detected
* │
* execute tool → add result
* │
* continue streaming
*/
function useChatState() {
const [messages, setMessages] = useState([]);
const [isStreaming, setIsStreaming] = useState(false);
const streamingContentRef = useRef('');
const abortControllerRef = useRef(null);
const sendMessage = useCallback(async (userInput) => {
// 1. Add user message immediately (optimistic):
const userMessage = {
id: crypto.randomUUID(),
role: 'user',
content: userInput,
timestamp: Date.now(),
};
// 2. Add placeholder for assistant response:
const assistantMessage = {
id: crypto.randomUUID(),
role: 'assistant',
content: '',
timestamp: Date.now(),
isStreaming: true,
};
setMessages(prev => [...prev, userMessage, assistantMessage]);
setIsStreaming(true);
streamingContentRef.current = '';
// 3. Create abort controller for cancellation:
abortControllerRef.current = new AbortController();
try {
await streamChat(
// Send full conversation history:
[...messages, userMessage].map(m => ({
role: m.role,
content: m.content,
})),
// On each chunk:
(chunk) => {
if (typeof chunk === 'string') {
streamingContentRef.current += chunk;
// Update the assistant message content:
setMessages(prev => prev.map(m =>
m.id === assistantMessage.id
? { ...m, content: streamingContentRef.current }
: m
));
}
},
// On done:
() => {
setMessages(prev => prev.map(m =>
m.id === assistantMessage.id
? { ...m, isStreaming: false }
: m
));
setIsStreaming(false);
},
abortControllerRef.current.signal
);
} catch (error) {
if (error.name === 'AbortError') {
// User clicked stop — keep partial response:
setMessages(prev => prev.map(m =>
m.id === assistantMessage.id
? { ...m, isStreaming: false, stopped: true }
: m
));
} else {
// Real error — show error state:
setMessages(prev => prev.map(m =>
m.id === assistantMessage.id
? { ...m, isStreaming: false, error: error.message }
: m
));
}
setIsStreaming(false);
}
}, [messages]);
const stopGeneration = useCallback(() => {
abortControllerRef.current?.abort();
}, []);
return { messages, isStreaming, sendMessage, stopGeneration };
}
4. Incremental Markdown Rendering
/*
* Challenge: Markdown arrives incomplete.
*
* Stream progress:
* "Here's a " → plain text
* "Here's a **bo" → incomplete bold
* "Here's a **bold**" → complete bold
* "Here's a **bold** `co" → incomplete code
* "Here's a **bold** `code`" → complete code
* "```\nconst" → incomplete code block
* "```\nconst x = 1\n```" → complete code block
*
* Naive approach: re-parse full text on every chunk.
* Problem: Expensive for long messages. Causes flicker.
*
* Better: Incremental parsing with a streaming-aware parser.
*/
function useStreamingMarkdown(content, isStreaming) {
const [rendered, setRendered] = useState([]);
useEffect(() => {
if (!content) {
setRendered([]);
return;
}
// Once streaming completes, parse the full content (no plain-text tail):
if (!isStreaming) {
setRendered(parseMarkdown(content));
return;
}
// Split into completed blocks and in-progress tail:
const { completedBlocks, inProgressTail } = splitAtLastSafePoint(content);
// Parse completed blocks (stable, won't change):
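// (parseMarkdown is an assumed helper, e.g. wrapping a library like marked or remark)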
const parsedBlocks = parseMarkdown(completedBlocks);
// Render in-progress tail as plain text (might be mid-syntax):
const tailElement = inProgressTail
? { type: 'text', content: inProgressTail }
: null;
setRendered(tailElement ? [...parsedBlocks, tailElement] : parsedBlocks);
}, [content, isStreaming]);
return rendered;
}
function splitAtLastSafePoint(text) {
/*
* Find the last "safe" split point where we know
* the markdown syntax is complete:
*
* Safe points:
* - End of a complete paragraph (double newline)
* - End of a complete code block (closing ```)
* - End of a complete list item
*
* The text after the last safe point is the "tail"
* that might be mid-syntax (e.g., "**bo" or "```\ncon").
*/
// Check if we're inside a code block:
const codeBlockOpens = (text.match(/```/g) || []).length;
const insideCodeBlock = codeBlockOpens % 2 === 1;
if (insideCodeBlock) {
// Everything after the last ``` is the tail:
const lastOpen = text.lastIndexOf('```');
return {
completedBlocks: text.substring(0, lastOpen),
inProgressTail: text.substring(lastOpen),
};
}
// Find the last double newline (paragraph break):
const lastParagraphBreak = text.lastIndexOf('\n\n');
if (lastParagraphBreak > 0) {
return {
completedBlocks: text.substring(0, lastParagraphBreak),
inProgressTail: text.substring(lastParagraphBreak),
};
}
// No safe split point — treat all as in-progress:
return {
completedBlocks: '',
inProgressTail: text,
};
}
/*
* For code blocks specifically, use streaming syntax highlighting:
*/
function StreamingCodeBlock({ language, code, isStreaming }) {
// Only re-highlight when a complete new line arrives:
const [highlightedLines, setHighlightedLines] = useState([]);
const lastLineCountRef = useRef(0);
useEffect(() => {
const lines = code.split('\n');
const completeLines = isStreaming ? lines.slice(0, -1) : lines;
const inProgressLine = isStreaming ? lines[lines.length - 1] : null;
// Only highlight new lines (don't re-highlight everything):
if (completeLines.length > lastLineCountRef.current) {
const newLines = completeLines.slice(lastLineCountRef.current);
const highlighted = highlightCode(newLines.join('\n'), language);
setHighlightedLines(prev => [
...prev,
...highlighted.split('\n').map(line => ({ html: line, complete: true })),
]);
lastLineCountRef.current = completeLines.length;
}
// Replace any previous in-progress line with the current one (appended
// without highlighting); always run so a just-completed line isn't duplicated:
setHighlightedLines(prev => [
...prev.filter(l => l.complete),
...(inProgressLine ? [{ html: escapeHtml(inProgressLine), complete: false }] : []),
]);
}, [code, isStreaming, language]);
return (
<pre className="bg-gray-900 rounded-lg p-4 overflow-x-auto">
<code>
{highlightedLines.map((line, i) => (
<div key={i} dangerouslySetInnerHTML={{ __html: line.html }} />
))}
{isStreaming && <span className="animate-pulse">▋</span>}
</code>
</pre>
);
}
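StreamingCodeBlock leans on two helpers that aren't defined here: highlightCode (presumably wrapping a highlighter such as highlight.js or Shiki) and escapeHtml. Since the in-progress line goes through dangerouslySetInnerHTML, escaping it is mandatory. A minimal sketch:
// Minimal escapeHtml for the un-highlighted in-progress line; required
// because that line is injected via dangerouslySetInnerHTML:
function escapeHtml(str) {
return str
.replace(/&/g, '&amp;')
.replace(/</g, '&lt;')
.replace(/>/g, '&gt;')
.replace(/"/g, '&quot;')
.replace(/'/g, '&#39;');
}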
5. Auto-Scrolling Behavior
/*
* Auto-scroll is deceptively tricky:
*
* Rules:
* 1. While streaming, auto-scroll to bottom (follow new content)
* 2. If user manually scrolls UP, STOP auto-scrolling
* 3. If user scrolls back to bottom, RESUME auto-scrolling
* 4. New user message: always scroll to bottom
*
* The "scroll anchor" pattern:
*/
function useAutoScroll(containerRef, isStreaming) {
const isUserScrolledUp = useRef(false);
const lastScrollTop = useRef(0);
// Detect if user scrolled up manually:
useEffect(() => {
const container = containerRef.current;
if (!container) return;
const handleScroll = () => {
const { scrollTop, scrollHeight, clientHeight } = container;
const distanceFromBottom = scrollHeight - scrollTop - clientHeight;
// User is "at bottom" if within 50px:
const atBottom = distanceFromBottom < 50;
// Detect upward scroll (user action, not programmatic):
if (scrollTop < lastScrollTop.current && !atBottom) {
isUserScrolledUp.current = true;
}
if (atBottom) {
isUserScrolledUp.current = false;
}
lastScrollTop.current = scrollTop;
};
container.addEventListener('scroll', handleScroll, { passive: true });
return () => container.removeEventListener('scroll', handleScroll);
}, []);
// Auto-scroll to bottom when content changes:
const scrollToBottom = useCallback((force = false) => {
const container = containerRef.current;
if (!container) return;
if (force || !isUserScrolledUp.current) {
// Use requestAnimationFrame for smooth scroll:
requestAnimationFrame(() => {
container.scrollTo({
top: container.scrollHeight,
behavior: force ? 'smooth' : 'instant',
});
});
}
}, []);
return { scrollToBottom, isUserScrolledUp };
}
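Wiring the hook into the message list looks roughly like this. A usage sketch (ChatScrollContainer is an illustrative name, not a component defined above):
function ChatScrollContainer({ messages, isStreaming, children }) {
const containerRef = useRef(null);
const { scrollToBottom } = useAutoScroll(containerRef, isStreaming);
useEffect(() => {
const last = messages[messages.length - 1];
// Rule 4: a new user message always forces the scroll; otherwise follow
// streaming content only while the user hasn't scrolled up (rules 1-3):
scrollToBottom(last?.role === 'user');
}, [messages, scrollToBottom]);
return (
<div ref={containerRef} className="h-full overflow-y-auto">
{children}
</div>
);
}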
6. Tool Calls and Multi-Step Actions
/*
* Modern LLMs can call tools during a conversation:
*
* User: "What's the weather in Tokyo?"
* │
* ▼
* LLM: [tool_call: get_weather({city: "Tokyo"})]
* │
* ▼
* Frontend executes tool → gets result
* │
* ▼
* LLM resumes: "The weather in Tokyo is 22°C and sunny."
*
* The UI needs to show tool calls in progress:
*
* ┌──────────────────────────┐
* │ 🔧 Searching weather... │ ← tool call in progress
* │ ┌──────────────────────┐ │
* │ │ get_weather("Tokyo") │ │ ← collapsible detail
* │ │ Result: 22°C, Sunny │ │
* │ └──────────────────────┘ │
* │ │
* │ The weather in Tokyo is │ ← response continues after
* │ 22°C and sunny today. │
* └──────────────────────────┘
*/
function ToolCallMessage({ toolCall, result, isExecuting }) {
const [isExpanded, setIsExpanded] = useState(false);
return (
<div className="border rounded-lg p-3 my-2 bg-gray-50">
<button
onClick={() => setIsExpanded(!isExpanded)}
className="flex items-center gap-2 text-sm"
>
{isExecuting ? (
<Spinner className="w-4 h-4" />
) : (
<CheckIcon className="w-4 h-4 text-green-500" />
)}
<span className="font-medium">
{toolCall.function.name}
</span>
<ChevronIcon className={isExpanded ? 'rotate-180' : ''} />
</button>
{isExpanded && (
<div className="mt-2 text-sm">
<div className="text-gray-500">Input:</div>
<pre className="bg-white p-2 rounded text-xs">
{(() => {
// Arguments may still be a partial JSON string while streaming:
try {
return JSON.stringify(JSON.parse(toolCall.function.arguments), null, 2);
} catch {
return toolCall.function.arguments;
}
})()}
</pre>
{result && (
<>
<div className="text-gray-500 mt-2">Result:</div>
<pre className="bg-white p-2 rounded text-xs">
{JSON.stringify(result, null, 2)}
</pre>
</>
)}
</div>
)}
</div>
);
}
// Handling tool calls in the stream:
async function handleStreamWithToolCalls(messages, setMessages) {
let assistantContent = '';
let pendingToolCalls = [];
await streamChat(
messages,
(chunk) => {
if (chunk.type === 'tool_call') {
// Tool calls stream as deltas: the function name arrives first and the
// JSON arguments arrive in fragments, so merge them by index:
const delta = chunk.data;
const idx = delta.index ?? 0;
if (!pendingToolCalls[idx]) {
pendingToolCalls[idx] = { id: delta.id, function: { name: '', arguments: '' } };
}
if (delta.function?.name) {
pendingToolCalls[idx].function.name += delta.function.name;
}
if (delta.function?.arguments) {
pendingToolCalls[idx].function.arguments += delta.function.arguments;
}
} else {
assistantContent += chunk;
}
},
async () => {
// If there are tool calls, execute them:
if (pendingToolCalls.length > 0) {
const toolResults = await Promise.all(
pendingToolCalls.map(async (tc) => {
const result = await executeToolCall(tc);
return {
tool_call_id: tc.id,
role: 'tool',
content: JSON.stringify(result),
};
})
);
// Continue the conversation with tool results:
const updatedMessages = [
...messages,
{ role: 'assistant', content: assistantContent, tool_calls: pendingToolCalls },
...toolResults,
];
// Stream the continuation:
await handleStreamWithToolCalls(updatedMessages, setMessages);
}
}
);
}
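executeToolCall is assumed above. A minimal dispatch-table sketch (the registry, fetchWeather, and searchDocuments are all illustrative):
// Minimal executeToolCall sketch; the tool names and helper functions
// (fetchWeather, searchDocuments) are hypothetical:
const toolRegistry = {
get_weather: async ({ city }) => fetchWeather(city),
search: async ({ query }) => searchDocuments(query),
};
async function executeToolCall(toolCall) {
const tool = toolRegistry[toolCall.function.name];
if (!tool) {
return { error: `Unknown tool: ${toolCall.function.name}` };
}
try {
return await tool(JSON.parse(toolCall.function.arguments));
} catch (error) {
// Return failures as data so the model can see and recover from them:
return { error: error.message };
}
}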
7. Input Area UX Patterns
/*
* The chat input has many specific UX requirements:
*
* 1. Auto-resize textarea (grows with content)
* 2. Submit on Enter, newline on Shift+Enter
* 3. Paste image support
* 4. Command shortcuts (/ for commands)
* 5. @mentions for context
* 6. Disable while streaming + show stop button
*/
function ChatInput({ onSend, isStreaming, onStop }) {
const textareaRef = useRef(null);
const [input, setInput] = useState('');
const [attachments, setAttachments] = useState([]);
// Auto-resize textarea:
useEffect(() => {
const textarea = textareaRef.current;
if (!textarea) return;
textarea.style.height = 'auto';
textarea.style.height = Math.min(textarea.scrollHeight, 200) + 'px';
}, [input]);
const handleKeyDown = (e) => {
// Ignore Enter while an IME composition is in progress (CJK input):
if (e.nativeEvent.isComposing) return;
if (e.key === 'Enter' && !e.shiftKey) {
e.preventDefault();
handleSend();
}
};
const handleSend = () => {
if (!input.trim() && attachments.length === 0) return;
if (isStreaming) return;
onSend({
content: input.trim(),
attachments,
});
setInput('');
setAttachments([]);
textareaRef.current?.focus();
};
// Handle paste (for images):
const handlePaste = (e) => {
const items = e.clipboardData?.items;
if (!items) return;
for (const item of items) {
if (item.type.startsWith('image/')) {
e.preventDefault();
const file = item.getAsFile();
const reader = new FileReader();
reader.onload = () => {
setAttachments(prev => [...prev, {
type: 'image',
data: reader.result,
name: file.name || 'pasted-image.png',
}]);
};
reader.readAsDataURL(file);
}
}
};
return (
<div className="border-t p-4">
{/* Attachment previews */}
{attachments.length > 0 && (
<div className="flex gap-2 mb-2">
{attachments.map((att, i) => (
<div key={i} className="relative">
<img
src={att.data}
alt={att.name}
className="w-16 h-16 object-cover rounded"
/>
<button
onClick={() => setAttachments(
prev => prev.filter((_, j) => j !== i)
)}
className="absolute -top-1 -right-1 bg-red-500 text-white rounded-full w-4 h-4 text-xs"
>
×
</button>
</div>
))}
</div>
)}
<div className="flex gap-2 items-end">
<textarea
ref={textareaRef}
value={input}
onChange={(e) => setInput(e.target.value)}
onKeyDown={handleKeyDown}
onPaste={handlePaste}
placeholder="Type a message..."
className="flex-1 resize-none border rounded-lg p-3 max-h-48 focus:outline-none focus:ring-2"
rows={1}
disabled={isStreaming}
/>
{isStreaming ? (
<button
onClick={onStop}
className="p-3 bg-red-500 text-white rounded-lg"
>
<StopIcon />
</button>
) : (
<button
onClick={handleSend}
disabled={!input.trim() && attachments.length === 0}
className="p-3 bg-blue-500 text-white rounded-lg disabled:opacity-50"
>
<SendIcon />
</button>
)}
</div>
</div>
);
}
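Requirement 4 from the list above (slash commands) isn't shown in ChatInput. A minimal matcher sketch that a command palette could filter against (the command list is illustrative):
// Minimal slash-command matcher; the command list is illustrative.
const COMMANDS = [
{ name: 'clear', description: 'Clear the conversation' },
{ name: 'model', description: 'Switch the active model' },
];
function matchSlashCommands(input) {
// Only treat "/" at the very start of the input as a command prefix:
if (!input.startsWith('/')) return [];
const query = input.slice(1).toLowerCase();
return COMMANDS.filter(cmd => cmd.name.startsWith(query));
}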
8. Conversation Persistence and Branching
/*
* Chat conversations need:
* 1. Persistence (survive page refresh)
* 2. History (list of past conversations)
* 3. Branching (regenerate from any point)
*
* Data model:
*
* Conversation
* ├── id: string
* ├── title: string (auto-generated from first message)
* ├── createdAt: Date
* ├── updatedAt: Date
* └── messages: Message[]
* ├── id: string
* ├── role: 'user' | 'assistant' | 'system' | 'tool'
* ├── content: string
* ├── parentId: string | null ← for branching
* ├── children: string[] ← alternate responses
* └── metadata: { model, tokens, latency }
*
* Branching allows:
* User message 1
* └── Assistant response 1a
* └── User message 2
* ├── Assistant response 2a (original)
* └── Assistant response 2b (regenerated)
*/
class ConversationStore {
constructor() {
this.db = null;
}
async init() {
// IndexedDB for client-side persistence. openDB comes from the 'idb'
// wrapper library (import { openDB } from 'idb'):
this.db = await openDB('chat-app', 1, {
upgrade(db) {
const convStore = db.createObjectStore('conversations', {
keyPath: 'id',
});
convStore.createIndex('updatedAt', 'updatedAt');
const msgStore = db.createObjectStore('messages', { keyPath: 'id' });
// Index so all messages in a conversation can be fetched together:
msgStore.createIndex('conversationId', 'conversationId');
},
});
}
async saveMessage(conversationId, message) {
const tx = this.db.transaction(
['conversations', 'messages'], 'readwrite'
);
// Stamp the conversationId so messages can be queried per conversation:
await tx.objectStore('messages').put({ ...message, conversationId });
// Update conversation's updatedAt:
const conv = await tx.objectStore('conversations').get(conversationId);
if (conv) {
conv.updatedAt = Date.now();
conv.messageCount = (conv.messageCount || 0) + 1;
await tx.objectStore('conversations').put(conv);
}
await tx.done;
}
async getMessages(conversationId) {
// Referenced by regenerateFrom below: fetch via the conversationId index,
// then sort into chronological order.
const all = await this.db.getAllFromIndex('messages', 'conversationId', conversationId);
return all.sort((a, b) => a.timestamp - b.timestamp);
}
async regenerateFrom(conversationId, messageId) {
// Get all messages after this point:
const messages = await this.getMessages(conversationId);
const messageIndex = messages.findIndex(m => m.id === messageId);
// Keep messages up to (not including) the one being regenerated:
const keptMessages = messages.slice(0, messageIndex);
// Mark the regenerated message as having a sibling:
const originalMessage = messages[messageIndex];
originalMessage.hasSibling = true;
return {
keptMessages,
originalMessage,
// The caller will generate a new response and add it
// as a sibling of originalMessage
};
}
}
9. Performance Optimization for Long Conversations
/*
* Conversations can have 100+ messages with code blocks.
* Rendering all of them causes jank.
*
* Optimizations:
* 1. Virtualized message list (only render visible messages)
* 2. Memoize parsed markdown (don't re-parse old messages)
* 3. Lazy syntax highlighting (only highlight visible code blocks)
* 4. Debounce streaming renders
*/
// 1. Virtualized message list:
function VirtualizedMessageList({ messages, isStreaming }) {
const parentRef = useRef(null);
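// useVirtualizer is assumed to come from @tanstack/react-virtual: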
const virtualizer = useVirtualizer({
count: messages.length,
getScrollElement: () => parentRef.current,
estimateSize: (index) => {
// Estimate message height based on content length:
const msg = messages[index];
const lines = msg.content.split('\n').length;
const hasCode = msg.content.includes('```');
return Math.max(60, lines * 24 + (hasCode ? 100 : 0));
},
overscan: 5,
});
return (
<div ref={parentRef} className="overflow-auto h-full">
<div
style={{ height: virtualizer.getTotalSize(), position: 'relative' }}
>
{virtualizer.getVirtualItems().map((virtualItem) => (
<div
key={virtualItem.key}
style={{
position: 'absolute',
top: virtualItem.start,
width: '100%',
}}
ref={virtualizer.measureElement}
data-index={virtualItem.index}
>
<MemoizedMessage message={messages[virtualItem.index]} />
</div>
))}
</div>
</div>
);
}
// 2. Memoize parsed markdown for completed messages:
const MemoizedMessage = memo(function Message({ message }) {
// Only re-render when content changes (streaming) or isStreaming toggles:
return (
<div className={`p-4 ${message.role === 'user' ? 'bg-gray-50' : ''}`}>
{message.isStreaming ? (
<StreamingMarkdown content={message.content} />
) : (
<ParsedMarkdown content={message.content} />
)}
</div>
);
}, (prev, next) => {
// Custom comparison: skip re-render if completed message unchanged
return prev.message.id === next.message.id
&& prev.message.content === next.message.content
&& prev.message.isStreaming === next.message.isStreaming;
});
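Optimization 3 from the list, lazy syntax highlighting, has no snippet above. A sketch built on IntersectionObserver, reusing the assumed highlightCode helper:
// 3. Lazy syntax highlighting: only highlight a code block once it scrolls
// into view, showing plain text until then.
function LazyCodeBlock({ language, code }) {
const ref = useRef(null);
const [html, setHtml] = useState(null);
useEffect(() => {
const el = ref.current;
if (!el) return;
const observer = new IntersectionObserver(([entry]) => {
if (entry.isIntersecting) {
setHtml(highlightCode(code, language));
observer.disconnect(); // highlight once, then stop observing
}
});
observer.observe(el);
return () => observer.disconnect();
}, [code, language]);
return (
<pre ref={ref} className="bg-gray-900 rounded-lg p-4 overflow-x-auto">
{html ? (
<code dangerouslySetInnerHTML={{ __html: html }} />
) : (
<code>{code}</code>
)}
</pre>
);
}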
// 4. Debounce streaming renders (batch token updates):
function useStreamingContent() {
const [displayContent, setDisplayContent] = useState('');
const bufferRef = useRef('');
const rafRef = useRef(null);
const appendChunk = useCallback((chunk) => {
bufferRef.current += chunk;
// Batch updates using requestAnimationFrame:
if (!rafRef.current) {
rafRef.current = requestAnimationFrame(() => {
setDisplayContent(bufferRef.current);
rafRef.current = null;
});
}
}, []);
return { displayContent, appendChunk };
}
10. Accessibility and Keyboard Navigation
/*
* Chat UIs must be accessible:
*
* 1. Screen readers announce new messages
* 2. Keyboard navigation between messages
* 3. Focus management (input stays focused)
* 4. Reduced motion for streaming animation
* 5. High contrast for code blocks
*/
function AccessibleChatMessage({ message, index }) {
return (
<div
role="log"
aria-label={`Message from ${message.role}`}
aria-live={message.isStreaming ? 'polite' : 'off'}
aria-atomic={!message.isStreaming}
tabIndex={0}
className="focus:outline-none focus:ring-2 focus:ring-blue-500 rounded"
>
{/* Screen reader label */}
<span className="sr-only">
{message.role === 'user' ? 'You said' : 'Assistant said'}:
</span>
<div aria-busy={message.isStreaming}>
<MessageContent content={message.content} />
</div>
{/* Message actions with keyboard support */}
<div role="toolbar" aria-label="Message actions">
<button
aria-label="Copy message"
onClick={() => navigator.clipboard.writeText(message.content)}
>
<CopyIcon />
</button>
{message.role === 'assistant' && (
<button aria-label="Regenerate response">
<RefreshIcon />
</button>
)}
</div>
</div>
);
}
// Live region for announcing new messages to screen readers:
function ChatLiveRegion({ messages }) {
const lastMessage = messages[messages.length - 1];
return (
<div
aria-live="polite"
aria-atomic
className="sr-only"
>
{lastMessage && !lastMessage.isStreaming && (
<p>
{lastMessage.role === 'assistant'
? `Assistant responded: ${lastMessage.content.substring(0, 200)}`
: `You sent: ${lastMessage.content.substring(0, 100)}`
}
</p>
)}
</div>
);
}
Trade-offs & Considerations
| Aspect | SSE Streaming | WebSocket | Long Polling |
|---|---|---|---|
| Browser support | Universal | Universal | Universal |
| Reconnection | Built-in (EventSource) | Manual | Built-in |
| Bidirectional | No (server→client only) | Yes | No |
| Proxy/CDN compat | Good | Often blocked | Good |
| Token-by-token | Natural fit | Works | Inefficient |
| Connection limit | 6 per domain (HTTP/1.1) | No limit | 6 per domain |
| Cancellation | Close connection | Send message | Don't poll |
Best Practices
- Use `fetch` with `ReadableStream` instead of `EventSource` for SSE — `EventSource` doesn't support POST requests (needed to send message history) or custom headers (needed for auth tokens); use `fetch` with a streaming response body and parse the SSE protocol manually; this gives full control over the request while keeping the streaming behavior.
- Batch streaming token updates using requestAnimationFrame — rendering on every single token (which can arrive at 50+ per second) causes unnecessary React re-renders and DOM thrashing; buffer incoming tokens and flush to state once per animation frame, capping renders at the display's refresh rate with no visible difference to the user.
- Split markdown parsing at "safe points" (paragraph breaks, complete code blocks) — don't re-parse the entire message content on every token; find the last safe boundary (double newline, closing ```, end of list) and only parse up to that point; render the in-progress tail as plain text; this prevents markdown syntax from flickering as it arrives character-by-character.
- Implement auto-scroll that respects user intent — auto-scroll to follow streaming content, but immediately stop if the user scrolls up to read earlier messages; resume auto-scrolling only when the user scrolls back to the bottom; use a 50px threshold from the bottom to determine the "at bottom" state; this simple heuristic prevents the most frustrating UX issue in chat interfaces.
- Store conversations in IndexedDB with message-level granularity for branching — persist each message individually with a parentId for tree-structured conversations; this enables regeneration (create a sibling response), editing (fork from any user message), and efficient updates (save one message, not the entire conversation); use the conversation store for offline support and instant load on revisit.
Conclusion
Building AI chat UIs means solving several frontend-specific challenges: streaming token-by-token via SSE (parsed from a fetch ReadableStream rather than EventSource, to support POST bodies and auth headers); incremental markdown rendering (split at safe boundaries to avoid mid-syntax flicker); complex state management (optimistic user messages, streaming assistant messages, tool call execution, abort/retry); auto-scroll that follows streaming content but yields when the user scrolls up; input UX (auto-resize textarea, Enter to send, Shift+Enter for newline, image paste); tool call visualization (collapsible inline cards showing tool input and output); conversation persistence (IndexedDB with message-level storage for branching and regeneration); and performance optimization (virtualized message list, memoized parsed markdown, requestAnimationFrame batching for streaming renders). The difference between a good and a bad chat UI lives in these details: smooth streaming without flicker, auto-scroll that respects intent, a responsive input area, and a clear visual hierarchy for user, assistant, tool call, and error messages.