AI Chatbot UI and Streaming Response Patterns
Real-World Problem Context
Every frontend team is now building some form of AI chat interface — whether it's a customer support chatbot, a code assistant panel, a documentation Q&A widget, or a full ChatGPT-like experience embedded in their product. These UIs have unique frontend challenges: streaming responses token-by-token (not waiting for complete responses), rendering markdown and code blocks in real-time as they arrive, managing conversation state across sessions, handling tool calls and multi-step AI actions, building responsive message bubbles that handle images/code/tables, and creating the "typing indicator" experience that feels natural. This post covers the internal architecture of building production-quality AI chat UIs — the streaming protocol, incremental markdown parsing, message state management, and the subtle UX patterns that make chat interfaces feel good.
Problem Statements
- Streaming Protocol: How does the frontend receive and process tokens as they stream from an LLM API, handle connection drops and retries, and render partial text that might be mid-word or mid-markdown-syntax?
- Incremental Rendering: How do you parse and render markdown (with code blocks, tables, links) incrementally as text streams in — without re-parsing the entire message on every token?
- Complex Message Types: Beyond plain text, how do you handle messages containing code with syntax highlighting, tool call results, images, structured data, citations, and interactive elements?
Deep Dive: Internal Mechanisms
1. Chat UI Architecture Overview
/*
* AI Chat UI component tree:
*
* <ChatContainer>
* │
* ├── <MessageList>
* │ ├── <Message role="user">
* │ │ └── <UserBubble text="..." />
* │ │
* │ ├── <Message role="assistant">
* │ │ ├── <AssistantBubble>
* │ │ │ ├── <StreamingMarkdown text="..." />
* │ │ │ ├── <CodeBlock lang="tsx" code="..." />
* │ │ │ └── <Citation sources=[...] />
* │ │ └── <MessageActions> (copy, regenerate, thumbs)
* │ │
* │ ├── <Message role="tool">
* │ │ └── <ToolCallResult name="search" result={...} />
* │ │
* │ └── <TypingIndicator /> (while streaming)
* │
* ├── <InputArea>
* │ ├── <textarea /> (auto-resize)
* │ ├── <AttachmentPicker />
* │ ├── <ModelSelector />
* │ └── <SendButton /> / <StopButton />
* │
* └── <ConversationSidebar>
* └── <ConversationList>
*
* State management:
* messages: Message[] (conversation history)
* isStreaming: boolean (show stop button, disable send)
* streamingContent: string (accumulating assistant reply)
* abortController: ref (cancel in-flight request)
*/
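The rest of the post passes Message objects through all of these layers. A minimal shape, assembled from the fields actually used below (illustrative, not from any particular SDK):
/**
 * @typedef {Object} Message
 * @property {string} id - stable key for list rendering and targeted updates
 * @property {'user'|'assistant'|'system'|'tool'} role
 * @property {string} content - accumulates token-by-token while streaming
 * @property {number} timestamp
 * @property {boolean} [isStreaming] - true while tokens are still arriving
 * @property {boolean} [stopped] - user aborted mid-stream
 * @property {string} [error] - failure message, if the request failed
 */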
2. Server-Sent Events (SSE) Streaming Protocol
/*
* Most LLM APIs use Server-Sent Events (SSE) for streaming:
*
* HTTP Request:
* POST /api/chat
* Content-Type: application/json
* { "messages": [...], "stream": true }
*
* HTTP Response (SSE):
* Content-Type: text/event-stream
*
* data: {"id":"chatcmpl-1","choices":[{"delta":{"content":"Hello"}}]}
*
* data: {"id":"chatcmpl-1","choices":[{"delta":{"content":" world"}}]}
*
* data: {"id":"chatcmpl-1","choices":[{"delta":{"content":"!"}}]}
*
* data: [DONE]
*
* Each "data:" line is one chunk (usually 1-3 tokens).
* The frontend must parse this stream incrementally.
*/
async function streamChat(messages, onChunk, onDone, signal) {
const response = await fetch('/api/chat', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ messages, stream: true }),
signal, // AbortController signal for cancellation
});
if (!response.ok) {
throw new Error(`Chat API error: ${response.status}`);
}
const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';
while (true) {
const { done, value } = await reader.read();
if (done) break;
// Decode bytes to string and add to buffer:
buffer += decoder.decode(value, { stream: true });
// Process complete SSE lines:
const lines = buffer.split('\n');
buffer = lines.pop() || ''; // Keep incomplete last line
for (const line of lines) {
if (line.startsWith('data: ')) {
const data = line.slice(6);
if (data === '[DONE]') {
onDone();
return;
}
try {
const parsed = JSON.parse(data);
const content = parsed.choices?.[0]?.delta?.content;
if (content) {
onChunk(content);
}
// Handle tool calls in the stream:
const toolCall = parsed.choices?.[0]?.delta?.tool_calls?.[0];
if (toolCall) {
onChunk({ type: 'tool_call', data: toolCall });
}
} catch (e) {
// Malformed JSON chunk — skip
}
}
}
}
// Stream ended without an explicit [DONE] sentinel; still signal completion:
onDone();
}
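The problem statements above also ask about connection drops. streamChat surfaces them as thrown errors, so retry logic can live in a thin wrapper. A minimal sketch (streamChatWithRetry is not from any API above; it re-issues the whole request, so the caller should reset any partially rendered content before each attempt):
async function streamChatWithRetry(messages, onChunk, onDone, signal, maxRetries = 2) {
for (let attempt = 0; attempt <= maxRetries; attempt++) {
try {
await streamChat(messages, onChunk, onDone, signal);
return;
} catch (error) {
if (error.name === 'AbortError') throw error; // user cancelled: never retry
if (attempt === maxRetries) throw error; // out of retries: surface it
// Exponential backoff before reconnecting (1s, 2s, 4s, ...):
await new Promise(resolve => setTimeout(resolve, 1000 * 2 ** attempt));
}
}
}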
3. Message State Management
/*
* Chat state is surprisingly complex:
*
* Message lifecycle:
* user types → optimistic add → API call → streaming → complete
* │
* tool call detected
* │
* execute tool → add result
* │
* continue streaming
*/
function useChatState() {
const [messages, setMessages] = useState([]);
const [isStreaming, setIsStreaming] = useState(false);
const streamingContentRef = useRef('');
const abortControllerRef = useRef(null);
const sendMessage = useCallback(async (userInput) => {
// 1. Add user message immediately (optimistic):
const userMessage = {
id: crypto.randomUUID(),
role: 'user',
content: userInput,
timestamp: Date.now(),
};
// 2. Add placeholder for assistant response:
const assistantMessage = {
id: crypto.randomUUID(),
role: 'assistant',
content: '',
timestamp: Date.now(),
isStreaming: true,
};
setMessages(prev => [...prev, userMessage, assistantMessage]);
setIsStreaming(true);
streamingContentRef.current = '';
// 3. Create abort controller for cancellation:
abortControllerRef.current = new AbortController();
try {
await streamChat(
// Send full conversation history:
[...messages, userMessage].map(m => ({
role: m.role,
content: m.content,
})),
// On each chunk:
(chunk) => {
if (typeof chunk === 'string') {
streamingContentRef.current += chunk;
// Update the assistant message content:
setMessages(prev => prev.map(m =>
m.id === assistantMessage.id
? { ...m, content: streamingContentRef.current }
: m
));
}
},
// On done:
() => {
setMessages(prev => prev.map(m =>
m.id === assistantMessage.id
? { ...m, isStreaming: false }
: m
));
setIsStreaming(false);
},
abortControllerRef.current.signal
);
} catch (error) {
if (error.name === 'AbortError') {
// User clicked stop — keep partial response:
setMessages(prev => prev.map(m =>
m.id === assistantMessage.id
? { ...m, isStreaming: false, stopped: true }
: m
));
} else {
// Real error — show error state:
setMessages(prev => prev.map(m =>
m.id === assistantMessage.id
? { ...m, isStreaming: false, error: error.message }
: m
));
}
setIsStreaming(false);
}
}, [messages]);
const stopGeneration = useCallback(() => {
abortControllerRef.current?.abort();
}, []);
return { messages, isStreaming, sendMessage, stopGeneration };
}
4. Incremental Markdown Rendering
/*
* Challenge: Markdown arrives incomplete.
*
* Stream progress:
* "Here's a " → plain text
* "Here's a **bo" → incomplete bold
* "Here's a **bold**" → complete bold
* "Here's a **bold** `co" → incomplete code
* "Here's a **bold** `code`" → complete code
* "```\nconst" → incomplete code block
* "```\nconst x = 1\n```" → complete code block
*
* Naive approach: re-parse full text on every chunk.
* Problem: Expensive for long messages. Causes flicker.
*
* Better: Incremental parsing with a streaming-aware parser.
*/
function useStreamingMarkdown(content, isStreaming) {
const [rendered, setRendered] = useState([]);
useEffect(() => {
if (!content) {
setRendered([]);
return;
}
// Once streaming completes, parse the full content (no plain-text tail):
if (!isStreaming) {
setRendered(parseMarkdown(content));
return;
}
// Split into completed blocks and in-progress tail:
const { completedBlocks, inProgressTail } = splitAtLastSafePoint(content);
// Parse completed blocks (stable, won't change):
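// (parseMarkdown is an assumed helper, e.g. wrapping a library like marked or remark)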
const parsedBlocks = parseMarkdown(completedBlocks);
// Render in-progress tail as plain text (might be mid-syntax):
const tailElement = inProgressTail
? { type: 'text', content: inProgressTail }
: null;
setRendered(tailElement ? [...parsedBlocks, tailElement] : parsedBlocks);
}, [content, isStreaming]);
return rendered;
}
function splitAtLastSafePoint(text) {
/*
* Find the last "safe" split point where we know
* the markdown syntax is complete:
*
* Safe points:
* - End of a complete paragraph (double newline)
* - End of a complete code block (closing ```)
* - End of a complete list item
*
* The text after the last safe point is the "tail"
* that might be mid-syntax (e.g., "**bo" or "```\ncon").
*/
// Check if we're inside a code block:
const codeBlockOpens = (text.match(/```/g) || []).length;
const insideCodeBlock = codeBlockOpens % 2 === 1;
if (insideCodeBlock) {
// Everything after the last ``` is the tail:
const lastOpen = text.lastIndexOf('```');
return {
completedBlocks: text.substring(0, lastOpen),
inProgressTail: text.substring(lastOpen),
};
}
// Find the last double newline (paragraph break):
const lastParagraphBreak = text.lastIndexOf('\n\n');
if (lastParagraphBreak > 0) {
return {
completedBlocks: text.substring(0, lastParagraphBreak),
inProgressTail: text.substring(lastParagraphBreak),
};
}
// No safe split point — treat all as in-progress:
return {
completedBlocks: '',
inProgressTail: text,
};
}
/*
* For code blocks specifically, use streaming syntax highlighting:
*/
function StreamingCodeBlock({ language, code, isStreaming }) {
// Only re-highlight when a complete new line arrives:
const [highlightedLines, setHighlightedLines] = useState([]);
const lastLineCountRef = useRef(0);
useEffect(() => {
const lines = code.split('\n');
const completeLines = isStreaming ? lines.slice(0, -1) : lines;
const inProgressLine = isStreaming ? lines[lines.length - 1] : null;
// Only highlight new lines (don't re-highlight everything):
if (completeLines.length > lastLineCountRef.current) {
const newLines = completeLines.slice(lastLineCountRef.current);
const highlighted = highlightCode(newLines.join('\n'), language);
setHighlightedLines(prev => [
...prev,
...highlighted.split('\n').map(line => ({ html: line, complete: true })),
]);
lastLineCountRef.current = completeLines.length;
}
// Replace any previous in-progress line with the current one (appended
// without highlighting); always run so a just-completed line isn't duplicated:
setHighlightedLines(prev => [
...prev.filter(l => l.complete),
...(inProgressLine ? [{ html: escapeHtml(inProgressLine), complete: false }] : []),
]);
}, [code, isStreaming, language]);
return (
<pre className="bg-gray-900 rounded-lg p-4 overflow-x-auto">
<code>
{highlightedLines.map((line, i) => (
<div key={i} dangerouslySetInnerHTML={{ __html: line.html }} />
))}
{isStreaming && <span className="animate-pulse">▋</span>}
</code>
</pre>
);
}
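StreamingCodeBlock leans on two helpers that aren't defined here: highlightCode (presumably wrapping a highlighter such as highlight.js or Shiki) and escapeHtml. Since the in-progress line goes through dangerouslySetInnerHTML, escaping it is mandatory. A minimal sketch:
// Minimal escapeHtml for the un-highlighted in-progress line; required
// because that line is injected via dangerouslySetInnerHTML:
function escapeHtml(str) {
return str
.replace(/&/g, '&amp;')
.replace(/</g, '&lt;')
.replace(/>/g, '&gt;')
.replace(/"/g, '&quot;')
.replace(/'/g, '&#39;');
}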
5. Auto-Scrolling Behavior
/*
* Auto-scroll is deceptively tricky:
*
* Rules:
* 1. While streaming, auto-scroll to bottom (follow new content)
* 2. If user manually scrolls UP, STOP auto-scrolling
* 3. If user scrolls back to bottom, RESUME auto-scrolling
* 4. New user message: always scroll to bottom
*
* The "scroll anchor" pattern:
*/
function useAutoScroll(containerRef, isStreaming) {
const isUserScrolledUp = useRef(false);
const lastScrollTop = useRef(0);
// Detect if user scrolled up manually:
useEffect(() => {
const container = containerRef.current;
if (!container) return;
const handleScroll = () => {
const { scrollTop, scrollHeight, clientHeight } = container;
const distanceFromBottom = scrollHeight - scrollTop - clientHeight;
// User is "at bottom" if within 50px:
const atBottom = distanceFromBottom < 50;
// Detect upward scroll (user action, not programmatic):
if (scrollTop < lastScrollTop.current && !atBottom) {
isUserScrolledUp.current = true;
}
if (atBottom) {
isUserScrolledUp.current = false;
}
lastScrollTop.current = scrollTop;
};
container.addEventListener('scroll', handleScroll, { passive: true });
return () => container.removeEventListener('scroll', handleScroll);
}, []);
// Auto-scroll to bottom when content changes:
const scrollToBottom = useCallback((force = false) => {
const container = containerRef.current;
if (!container) return;
if (force || !isUserScrolledUp.current) {
// Use requestAnimationFrame for smooth scroll:
requestAnimationFrame(() => {
container.scrollTo({
top: container.scrollHeight,
behavior: force ? 'smooth' : 'instant',
});
});
}
}, []);
return { scrollToBottom, isUserScrolledUp };
}
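Wiring the hook into the message list looks roughly like this. A usage sketch (ChatScrollContainer is an illustrative name, not a component defined above):
function ChatScrollContainer({ messages, isStreaming, children }) {
const containerRef = useRef(null);
const { scrollToBottom } = useAutoScroll(containerRef, isStreaming);
useEffect(() => {
const last = messages[messages.length - 1];
// Rule 4: a new user message always forces the scroll; otherwise follow
// streaming content only while the user hasn't scrolled up (rules 1-3):
scrollToBottom(last?.role === 'user');
}, [messages, scrollToBottom]);
return (
<div ref={containerRef} className="h-full overflow-y-auto">
{children}
</div>
);
}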
6. Tool Calls and Multi-Step Actions
/*
* Modern LLMs can call tools during a conversation:
*
* User: "What's the weather in Tokyo?"
* │
* ▼
* LLM: [tool_call: get_weather({city: "Tokyo"})]
* │
* ▼
* Frontend executes tool → gets result
* │
* ▼
* LLM resumes: "The weather in Tokyo is 22°C and sunny."
*
* The UI needs to show tool calls in progress:
*
* ┌──────────────────────────┐
* │ 🔧 Searching weather... │ ← tool call in progress
* │ ┌──────────────────────┐ │
* │ │ get_weather("Tokyo") │ │ ← collapsible detail
* │ │ Result: 22°C, Sunny │ │
* │ └──────────────────────┘ │
* │ │
* │ The weather in Tokyo is │ ← response continues after
* │ 22°C and sunny today. │
* └──────────────────────────┘
*/
function ToolCallMessage({ toolCall, result, isExecuting }) {
const [isExpanded, setIsExpanded] = useState(false);
return (
<div className="border rounded-lg p-3 my-2 bg-gray-50">
<button
onClick={() => setIsExpanded(!isExpanded)}
className="flex items-center gap-2 text-sm"
>
{isExecuting ? (
<Spinner className="w-4 h-4" />
) : (
<CheckIcon className="w-4 h-4 text-green-500" />
)}
<span className="font-medium">
{toolCall.function.name}
</span>
<ChevronIcon className={isExpanded ? 'rotate-180' : ''} />
</button>
{isExpanded && (
<div className="mt-2 text-sm">
<div className="text-gray-500">Input:</div>
<pre className="bg-white p-2 rounded text-xs">
{(() => {
// Arguments may still be a partial JSON string while streaming:
try {
return JSON.stringify(JSON.parse(toolCall.function.arguments), null, 2);
} catch {
return toolCall.function.arguments;
}
})()}
</pre>
{result && (
<>
<div className="text-gray-500 mt-2">Result:</div>
<pre className="bg-white p-2 rounded text-xs">
{JSON.stringify(result, null, 2)}
</pre>
</>
)}
</div>
)}
</div>
);
}
// Handling tool calls in the stream:
async function handleStreamWithToolCalls(messages, setMessages) {
let assistantContent = '';
let pendingToolCalls = [];
await streamChat(
messages,
(chunk) => {
if (chunk.type === 'tool_call') {
// Tool calls stream as deltas: the function name arrives first and the
// JSON arguments arrive in fragments, so merge them by index:
const delta = chunk.data;
const idx = delta.index ?? 0;
if (!pendingToolCalls[idx]) {
pendingToolCalls[idx] = { id: delta.id, function: { name: '', arguments: '' } };
}
if (delta.function?.name) {
pendingToolCalls[idx].function.name += delta.function.name;
}
if (delta.function?.arguments) {
pendingToolCalls[idx].function.arguments += delta.function.arguments;
}
} else {
assistantContent += chunk;
}
},
async () => {
// If there are tool calls, execute them:
if (pendingToolCalls.length > 0) {
const toolResults = await Promise.all(
pendingToolCalls.map(async (tc) => {
const result = await executeToolCall(tc);
return {
tool_call_id: tc.id,
role: 'tool',
content: JSON.stringify(result),
};
})
);
// Continue the conversation with tool results:
const updatedMessages = [
...messages,
{ role: 'assistant', content: assistantContent, tool_calls: pendingToolCalls },
...toolResults,
];
// Stream the continuation:
await handleStreamWithToolCalls(updatedMessages, setMessages);
}
}
);
}
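executeToolCall is assumed above. A minimal dispatch-table sketch (the registry, fetchWeather, and searchDocuments are all illustrative):
// Minimal executeToolCall sketch; the tool names and helper functions
// (fetchWeather, searchDocuments) are hypothetical:
const toolRegistry = {
get_weather: async ({ city }) => fetchWeather(city),
search: async ({ query }) => searchDocuments(query),
};
async function executeToolCall(toolCall) {
const tool = toolRegistry[toolCall.function.name];
if (!tool) {
return { error: `Unknown tool: ${toolCall.function.name}` };
}
try {
return await tool(JSON.parse(toolCall.function.arguments));
} catch (error) {
// Return failures as data so the model can see and recover from them:
return { error: error.message };
}
}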
7. Input Area UX Patterns
/*
* The chat input has many specific UX requirements:
*
* 1. Auto-resize textarea (grows with content)
* 2. Submit on Enter, newline on Shift+Enter
* 3. Paste image support
* 4. Command shortcuts (/ for commands)
* 5. @mentions for context
* 6. Disable while streaming + show stop button
*/
function ChatInput({ onSend, isStreaming, onStop }) {
const textareaRef = useRef(null);
const [input, setInput] = useState('');
const [attachments, setAttachments] = useState([]);
// Auto-resize textarea:
useEffect(() => {
const textarea = textareaRef.current;
if (!textarea) return;
textarea.style.height = 'auto';
textarea.style.height = Math.min(textarea.scrollHeight, 200) + 'px';
}, [input]);
const handleKeyDown = (e) => {
// Ignore Enter while an IME composition is in progress (CJK input):
if (e.nativeEvent.isComposing) return;
if (e.key === 'Enter' && !e.shiftKey) {
e.preventDefault();
handleSend();
}
};
const handleSend = () => {
if (!input.trim() && attachments.length === 0) return;
if (isStreaming) return;
onSend({
content: input.trim(),
attachments,
});
setInput('');
setAttachments([]);
textareaRef.current?.focus();
};
// Handle paste (for images):
const handlePaste = (e) => {
const items = e.clipboardData?.items;
if (!items) return;
for (const item of items) {
if (item.type.startsWith('image/')) {
e.preventDefault();
const file = item.getAsFile();
const reader = new FileReader();
reader.onload = () => {
setAttachments(prev => [...prev, {
type: 'image',
data: reader.result,
name: file.name || 'pasted-image.png',
}]);
};
reader.readAsDataURL(file);
}
}
};
return (
<div className="border-t p-4">
{/* Attachment previews */}
{attachments.length > 0 && (
<div className="flex gap-2 mb-2">
{attachments.map((att, i) => (
<div key={i} className="relative">
<img
src={att.data}
alt={att.name}
className="w-16 h-16 object-cover rounded"
/>
<button
onClick={() => setAttachments(
prev => prev.filter((_, j) => j !== i)
)}
className="absolute -top-1 -right-1 bg-red-500 text-white rounded-full w-4 h-4 text-xs"
>
×
</button>
</div>
))}
</div>
)}
<div className="flex gap-2 items-end">
<textarea
ref={textareaRef}
value={input}
onChange={(e) => setInput(e.target.value)}
onKeyDown={handleKeyDown}
onPaste={handlePaste}
placeholder="Type a message..."
className="flex-1 resize-none border rounded-lg p-3 max-h-48 focus:outline-none focus:ring-2"
rows={1}
disabled={isStreaming}
/>
{isStreaming ? (
<button
onClick={onStop}
className="p-3 bg-red-500 text-white rounded-lg"
>
<StopIcon />
</button>
) : (
<button
onClick={handleSend}
disabled={!input.trim() && attachments.length === 0}
className="p-3 bg-blue-500 text-white rounded-lg disabled:opacity-50"
>
<SendIcon />
</button>
)}
</div>
</div>
);
}
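Requirement 4 from the list above (slash commands) isn't shown in ChatInput. A minimal matcher sketch that a command palette could filter against (the command list is illustrative):
// Minimal slash-command matcher; the command list is illustrative.
const COMMANDS = [
{ name: 'clear', description: 'Clear the conversation' },
{ name: 'model', description: 'Switch the active model' },
];
function matchSlashCommands(input) {
// Only treat "/" at the very start of the input as a command prefix:
if (!input.startsWith('/')) return [];
const query = input.slice(1).toLowerCase();
return COMMANDS.filter(cmd => cmd.name.startsWith(query));
}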
8. Conversation Persistence and Branching
/*
* Chat conversations need:
* 1. Persistence (survive page refresh)
* 2. History (list of past conversations)
* 3. Branching (regenerate from any point)
*
* Data model:
*
* Conversation
* ├── id: string
* ├── title: string (auto-generated from first message)
* ├── createdAt: Date
* ├── updatedAt: Date
* └── messages: Message[]
* ├── id: string
* ├── role: 'user' | 'assistant' | 'system' | 'tool'
* ├── content: string
* ├── parentId: string | null ← for branching
* ├── children: string[] ← alternate responses
* └── metadata: { model, tokens, latency }
*
* Branching allows:
* User message 1
* └── Assistant response 1a
* └── User message 2
* ├── Assistant response 2a (original)
* └── Assistant response 2b (regenerated)
*/
class ConversationStore {
constructor() {
this.db = null;
}
async init() {
// IndexedDB for client-side persistence. openDB comes from the 'idb'
// wrapper library (import { openDB } from 'idb'):
this.db = await openDB('chat-app', 1, {
upgrade(db) {
const convStore = db.createObjectStore('conversations', {
keyPath: 'id',
});
convStore.createIndex('updatedAt', 'updatedAt');
const msgStore = db.createObjectStore('messages', { keyPath: 'id' });
// Index so all messages in a conversation can be fetched together:
msgStore.createIndex('conversationId', 'conversationId');
},
});
}
async saveMessage(conversationId, message) {
const tx = this.db.transaction(
['conversations', 'messages'], 'readwrite'
);
// Stamp the conversationId so messages can be queried per conversation:
await tx.objectStore('messages').put({ ...message, conversationId });
// Update conversation's updatedAt:
const conv = await tx.objectStore('conversations').get(conversationId);
if (conv) {
conv.updatedAt = Date.now();
conv.messageCount = (conv.messageCount || 0) + 1;
await tx.objectStore('conversations').put(conv);
}
await tx.done;
}
async getMessages(conversationId) {
// Referenced by regenerateFrom below: fetch via the conversationId index,
// then sort into chronological order.
const all = await this.db.getAllFromIndex('messages', 'conversationId', conversationId);
return all.sort((a, b) => a.timestamp - b.timestamp);
}
async regenerateFrom(conversationId, messageId) {
// Get all messages after this point:
const messages = await this.getMessages(conversationId);
const messageIndex = messages.findIndex(m => m.id === messageId);
// Keep messages up to (not including) the one being regenerated:
const keptMessages = messages.slice(0, messageIndex);
// Mark the regenerated message as having a sibling:
const originalMessage = messages[messageIndex];
originalMessage.hasSibling = true;
return {
keptMessages,
originalMessage,
// The caller will generate a new response and add it
// as a sibling of originalMessage
};
}
}
9. Performance Optimization for Long Conversations
/*
* Conversations can have 100+ messages with code blocks.
* Rendering all of them causes jank.
*
* Optimizations:
* 1. Virtualized message list (only render visible messages)
* 2. Memoize parsed markdown (don't re-parse old messages)
* 3. Lazy syntax highlighting (only highlight visible code blocks)
* 4. Debounce streaming renders
*/
// 1. Virtualized message list:
function VirtualizedMessageList({ messages, isStreaming }) {
const parentRef = useRef(null);
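// useVirtualizer is assumed to come from @tanstack/react-virtual: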
const virtualizer = useVirtualizer({
count: messages.length,
getScrollElement: () => parentRef.current,
estimateSize: (index) => {
// Estimate message height based on content length:
const msg = messages[index];
const lines = msg.content.split('\n').length;
const hasCode = msg.content.includes('```');
return Math.max(60, lines * 24 + (hasCode ? 100 : 0));
},
overscan: 5,
});
return (
<div ref={parentRef} className="overflow-auto h-full">
<div
style={{ height: virtualizer.getTotalSize(), position: 'relative' }}
>
{virtualizer.getVirtualItems().map((virtualItem) => (
<div
key={virtualItem.key}
style={{
position: 'absolute',
top: virtualItem.start,
width: '100%',
}}
ref={virtualizer.measureElement}
data-index={virtualItem.index}
>
<MemoizedMessage message={messages[virtualItem.index]} />
</div>
))}
</div>
</div>
);
}
// 2. Memoize parsed markdown for completed messages:
const MemoizedMessage = memo(function Message({ message }) {
// Only re-render when content changes (streaming) or isStreaming toggles:
return (
<div className={`p-4 ${message.role === 'user' ? 'bg-gray-50' : ''}`}>
{message.isStreaming ? (
<StreamingMarkdown content={message.content} />
) : (
<ParsedMarkdown content={message.content} />
)}
</div>
);
}, (prev, next) => {
// Custom comparison: skip re-render if completed message unchanged
return prev.message.id === next.message.id
&& prev.message.content === next.message.content
&& prev.message.isStreaming === next.message.isStreaming;
});
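Optimization 3 from the list, lazy syntax highlighting, has no snippet above. A sketch built on IntersectionObserver, reusing the assumed highlightCode helper:
// 3. Lazy syntax highlighting: only highlight a code block once it scrolls
// into view, showing plain text until then.
function LazyCodeBlock({ language, code }) {
const ref = useRef(null);
const [html, setHtml] = useState(null);
useEffect(() => {
const el = ref.current;
if (!el) return;
const observer = new IntersectionObserver(([entry]) => {
if (entry.isIntersecting) {
setHtml(highlightCode(code, language));
observer.disconnect(); // highlight once, then stop observing
}
});
observer.observe(el);
return () => observer.disconnect();
}, [code, language]);
return (
<pre ref={ref} className="bg-gray-900 rounded-lg p-4 overflow-x-auto">
{html ? (
<code dangerouslySetInnerHTML={{ __html: html }} />
) : (
<code>{code}</code>
)}
</pre>
);
}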
// 4. Debounce streaming renders (batch token updates):
function useStreamingContent() {
const [displayContent, setDisplayContent] = useState('');
const bufferRef = useRef('');
const rafRef = useRef(null);
const appendChunk = useCallback((chunk) => {
bufferRef.current += chunk;
// Batch updates using requestAnimationFrame:
if (!rafRef.current) {
rafRef.current = requestAnimationFrame(() => {
setDisplayContent(bufferRef.current);
rafRef.current = null;
});
}
}, []);
return { displayContent, appendChunk };
}
10. Accessibility and Keyboard Navigation
/*
* Chat UIs must be accessible:
*
* 1. Screen readers announce new messages
* 2. Keyboard navigation between messages
* 3. Focus management (input stays focused)
* 4. Reduced motion for streaming animation
* 5. High contrast for code blocks
*/
function AccessibleChatMessage({ message, index }) {
return (
<div
role="log"
aria-label={`Message from ${message.role}`}
aria-live={message.isStreaming ? 'polite' : 'off'}
aria-atomic={!message.isStreaming}
tabIndex={0}
className="focus:outline-none focus:ring-2 focus:ring-blue-500 rounded"
>
{/* Screen reader label */}
<span className="sr-only">
{message.role === 'user' ? 'You said' : 'Assistant said'}:
</span>
<div aria-busy={message.isStreaming}>
<MessageContent content={message.content} />
</div>
{/* Message actions with keyboard support */}
<div role="toolbar" aria-label="Message actions">
<button
aria-label="Copy message"
onClick={() => navigator.clipboard.writeText(message.content)}
>
<CopyIcon />
</button>
{message.role === 'assistant' && (
<button aria-label="Regenerate response">
<RefreshIcon />
</button>
)}
</div>
</div>
);
}
// Live region for announcing new messages to screen readers:
function ChatLiveRegion({ messages }) {
const lastMessage = messages[messages.length - 1];
return (
<div
aria-live="polite"
aria-atomic
className="sr-only"
>
{lastMessage && !lastMessage.isStreaming && (
<p>
{lastMessage.role === 'assistant'
? `Assistant responded: ${lastMessage.content.substring(0, 200)}`
: `You sent: ${lastMessage.content.substring(0, 100)}`
}
</p>
)}
</div>
);
}
Trade-offs & Considerations
| Aspect | SSE Streaming | WebSocket | Long Polling |
|---|---|---|---|
| Browser support | Universal | Universal | Universal |
| Reconnection | Built-in (EventSource) | Manual | Built-in |
| Bidirectional | No (server→client only) | Yes | No |
| Proxy/CDN compat | Good | Often blocked | Good |
| Token-by-token | Natural fit | Works | Inefficient |
| Connection limit | 6 per domain (HTTP/1.1) | No limit | 6 per domain |
| Cancellation | Close connection | Send message | Don't poll |
Best Practices
- Use `fetch` with `ReadableStream` instead of `EventSource` for SSE — `EventSource` doesn't support POST requests (needed to send message history) or custom headers (needed for auth tokens); use `fetch` with a streaming response body and parse the SSE protocol manually; this gives full control over the request while keeping the streaming behavior.
- Batch streaming token updates using requestAnimationFrame — rendering on every single token (which can arrive at 50+ per second) causes unnecessary React re-renders and DOM thrashing; buffer incoming tokens and flush to state once per animation frame, capping renders at the display's refresh rate with no visible difference to the user.
- Split markdown parsing at "safe points" (paragraph breaks, complete code blocks) — don't re-parse the entire message content on every token; find the last safe boundary (double newline, closing ```, end of list) and only parse up to that point; render the in-progress tail as plain text; this prevents markdown syntax from flickering as it arrives character-by-character.
- Implement auto-scroll that respects user intent — auto-scroll to follow streaming content, but immediately stop if the user scrolls up to read earlier messages; resume auto-scrolling only when the user scrolls back to the bottom; use a 50px threshold from the bottom to determine the "at bottom" state; this simple heuristic prevents the most frustrating UX issue in chat interfaces.
- Store conversations in IndexedDB with message-level granularity for branching — persist each message individually with a parentId for tree-structured conversations; this enables regeneration (create a sibling response), editing (fork from any user message), and efficient updates (save one message, not the entire conversation); use the conversation store for offline support and instant load on revisit.
Conclusion
Building AI chat UIs means solving several frontend-specific challenges: streaming token-by-token via SSE (parsed from a fetch ReadableStream rather than EventSource, to support POST bodies and auth headers); incremental markdown rendering (split at safe boundaries to avoid mid-syntax flicker); complex state management (optimistic user messages, streaming assistant messages, tool call execution, abort/retry); auto-scroll that follows streaming content but yields when the user scrolls up; input UX (auto-resize textarea, Enter to send, Shift+Enter for newline, image paste); tool call visualization (collapsible inline cards showing tool input and output); conversation persistence (IndexedDB with message-level storage for branching and regeneration); and performance optimization (virtualized message list, memoized parsed markdown, requestAnimationFrame batching for streaming renders). The difference between a good and a bad chat UI lives in these details: smooth streaming without flicker, auto-scroll that respects intent, a responsive input area, and a clear visual hierarchy for user, assistant, tool call, and error messages.