WebSocket Protocol Internals: From HTTP Upgrade to Full-Duplex Binary Frames
HTTP is request-response: the client asks, the server answers, the connection idles. WebSocket breaks this model — after an HTTP handshake upgrade, the connection becomes a persistent, full-duplex, message-oriented channel where either side can send at any time. This post implements the complete WebSocket protocol stack in TypeScript, from the opening handshake through frame parsing, masking, fragmentation, control frames, extension negotiation, and backpressure management.
The WebSocket Handshake
┌─────────────────────────────────────────────────────────────────────────────┐
│ WEBSOCKET OPENING HANDSHAKE │
│ │
│ Client sends HTTP Upgrade request: │
│ ┌──────────────────────────────────────────────────────────────────────┐ │
│ │ GET /chat HTTP/1.1 │ │
│ │ Host: server.example.com │ │
│ │ Upgrade: websocket │ │
│ │ Connection: Upgrade │ │
│ │ Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ== ← 16-byte nonce │ │
│ │ Sec-WebSocket-Version: 13 │ │
│ │ Sec-WebSocket-Protocol: chat, superchat ← subprotocols │ │
│ │ Sec-WebSocket-Extensions: permessage-deflate ← compression │ │
│ │ Origin: http://example.com │ │
│ └──────────────────────────────────────────────────────────────────────┘ │
│ │
│ Server responds with 101 Switching Protocols: │
│ ┌──────────────────────────────────────────────────────────────────────┐ │
│ │ HTTP/1.1 101 Switching Protocols │ │
│ │ Upgrade: websocket │ │
│ │ Connection: Upgrade │ │
│ │ Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo= ← SHA1 hash │ │
│ │ Sec-WebSocket-Protocol: chat ← selected │ │
│ │ Sec-WebSocket-Extensions: permessage-deflate │ │
│ └──────────────────────────────────────────────────────────────────────┘ │
│ │
│ Accept = Base64(SHA1(Key + "258EAFA5-E914-47DA-95CA-C5AB0DC85B11")) │
│ This proves the server understood the WebSocket request (not a random │
│ HTTP server accidentally accepting the Upgrade header). │
│ │
│ After 101: TCP connection is now WebSocket — raw frames, no HTTP. │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
import { createHash, randomBytes } from 'crypto';
const WEBSOCKET_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';
class WebSocketHandshake {
// Client: generate the upgrade request
static createClientRequest(host: string, path: string, protocols: string[] = []): {
request: string;
key: string;
} {
// Generate 16-byte random nonce, base64 encode
const keyBytes = randomBytes(16);
const key = keyBytes.toString('base64');
let request = `GET ${path} HTTP/1.1\r\n`;
request += `Host: ${host}\r\n`;
request += `Upgrade: websocket\r\n`;
request += `Connection: Upgrade\r\n`;
request += `Sec-WebSocket-Key: ${key}\r\n`;
request += `Sec-WebSocket-Version: 13\r\n`;
if (protocols.length > 0) {
request += `Sec-WebSocket-Protocol: ${protocols.join(', ')}\r\n`;
}
request += `\r\n`;
return { request, key };
}
// Server: validate request and generate response
static createServerResponse(clientKey: string, selectedProtocol?: string): string {
// The accept value proves the server understands WebSocket
const acceptValue = createHash('sha1')
.update(clientKey + WEBSOCKET_GUID)
.digest('base64');
let response = `HTTP/1.1 101 Switching Protocols\r\n`;
response += `Upgrade: websocket\r\n`;
response += `Connection: Upgrade\r\n`;
response += `Sec-WebSocket-Accept: ${acceptValue}\r\n`;
if (selectedProtocol) {
response += `Sec-WebSocket-Protocol: ${selectedProtocol}\r\n`;
}
response += `\r\n`;
return response;
}
// Client: verify server's accept header
static validateServerAccept(clientKey: string, serverAccept: string): boolean {
const expected = createHash('sha1')
.update(clientKey + WEBSOCKET_GUID)
.digest('base64');
return expected === serverAccept;
}
}
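The server-side method above takes the client key as given and does not check the rest of the request. A minimal sketch of that check follows, assuming the request arrives as one raw string; `parseHeaders` and `checkUpgradeRequest` are hypothetical helpers, not part of the classes above. The sample key and expected accept value are the ones shown in the handshake diagram.

```typescript
import { createHash } from 'crypto';

const GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

// Hypothetical helper: collect header lines into a map keyed by lowercase name.
function parseHeaders(raw: string): Map<string, string> {
  const headers = new Map<string, string>();
  for (const line of raw.split('\r\n').slice(1)) { // slice(1) skips the request line
    const colon = line.indexOf(':');
    if (colon <= 0) continue; // blank line or malformed header
    headers.set(line.slice(0, colon).trim().toLowerCase(), line.slice(colon + 1).trim());
  }
  return headers;
}

// Hypothetical helper: returns the Sec-WebSocket-Accept value, or null plus a reason.
function checkUpgradeRequest(raw: string): { accept: string | null; reason: string } {
  const h = parseHeaders(raw);
  if ((h.get('upgrade') ?? '').toLowerCase() !== 'websocket') {
    return { accept: null, reason: 'Upgrade header is not "websocket"' };
  }
  const connection = (h.get('connection') ?? '').toLowerCase().split(',').map((t) => t.trim());
  if (!connection.includes('upgrade')) {
    return { accept: null, reason: 'Connection header lacks "Upgrade"' };
  }
  if (h.get('sec-websocket-version') !== '13') {
    return { accept: null, reason: 'unsupported Sec-WebSocket-Version' };
  }
  const key = h.get('sec-websocket-key') ?? '';
  if (Buffer.from(key, 'base64').length !== 16) {
    return { accept: null, reason: 'Sec-WebSocket-Key is not a 16-byte nonce' };
  }
  const accept = createHash('sha1').update(key + GUID).digest('base64');
  return { accept, reason: '' };
}

const sampleRequest = [
  'GET /chat HTTP/1.1',
  'Host: server.example.com',
  'Upgrade: websocket',
  'Connection: Upgrade',
  'Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==',
  'Sec-WebSocket-Version: 13',
  '',
  '',
].join('\r\n');

console.log(checkUpgradeRequest(sampleRequest).accept); // s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

A production server would also check the request line, Host, and Origin; those are left out here to keep the sketch short.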
WebSocket Frame Format
┌─────────────────────────────────────────────────────────────────────────────┐
│ WEBSOCKET FRAME WIRE FORMAT (RFC 6455) │
│ │
│ 0 1 2 3 │
│ 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 │
│ +-+-+-+-+-------+-+-------------+-------------------------------+ │
│ |F|R|R|R| opcode|M| Payload len | Extended payload length | │
│ |I|S|S|S| (4) |A| (7) | (16/64 bits) | │
│ |N|V|V|V| |S| | (if payload len == 126/127) | │
│ | |1|2|3| |K| | | │
│ +-+-+-+-+-------+-+-------------+-------------------------------+ │
│ | Extended payload length continued, if payload len == 127 | │
│ +-------------------------------+-------------------------------+ │
│ | |Masking-key, if MASK set to 1 | │
│ +-------------------------------+-------------------------------+ │
│ | Masking-key (continued) | Payload Data | │
│ +-------------------------------+-------------------------------+ │
│ : Payload Data continued ... : │
│ +---------------------------------------------------------------+ │
│ │
│ FIN (1 bit): 1 = final fragment of a message │
│ RSV1-3 (1 bit each): reserved for extensions (e.g., compression) │
│ Opcode (4 bits): │
│ 0x0 = continuation frame │
│ 0x1 = text frame (UTF-8) │
│ 0x2 = binary frame │
│ 0x8 = connection close │
│ 0x9 = ping │
│ 0xA = pong │
│ MASK (1 bit): 1 = payload is masked (REQUIRED for client→server) │
│ Payload length: │
│ 0-125: actual length │
│ 126: next 2 bytes are the length (uint16) │
│ 127: next 8 bytes are the length (uint64) │
│ Masking key (4 bytes): XOR mask for payload (if MASK=1) │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
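The three-way payload length encoding in the diagram determines how large a frame header is. The sketch below works that out; `frameHeaderSize` and `encodeLength` are hypothetical helper names used only for illustration.

```typescript
// Hypothetical helper: number of header bytes that precede the payload.
function frameHeaderSize(payloadLength: number, masked: boolean): number {
  let size = 2;                                 // FIN/RSV/opcode byte + MASK/length byte
  if (payloadLength > 0xffff) size += 8;        // 127 marker, then a uint64 length
  else if (payloadLength > 125) size += 2;      // 126 marker, then a uint16 length
  if (masked) size += 4;                        // 4-byte masking key
  return size;
}

// Hypothetical helper: encode only the length field(s), with the MASK bit left clear.
function encodeLength(payloadLength: number): Uint8Array {
  if (payloadLength <= 125) return Uint8Array.of(payloadLength);
  if (payloadLength <= 0xffff) {
    return Uint8Array.of(126, payloadLength >> 8, payloadLength & 0xff);
  }
  const out = new Uint8Array(9);
  out[0] = 127;
  new DataView(out.buffer).setBigUint64(1, BigInt(payloadLength)); // big-endian by default
  return out;
}

for (const n of [0, 125, 126, 65535, 65536]) {
  // Prints: payload length, unmasked header size, masked header size
  console.log(n, frameHeaderSize(n, false), frameHeaderSize(n, true));
}
// 0 2 6
// 125 2 6
// 126 4 8
// 65535 4 8
// 65536 10 14
```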
enum WebSocketOpcode {
Continuation = 0x0,
Text = 0x1,
Binary = 0x2,
Close = 0x8,
Ping = 0x9,
Pong = 0xA,
}
interface WebSocketFrame {
fin: boolean; // Is this the final fragment?
rsv1: boolean; // Reserved (used by permessage-deflate)
rsv2: boolean;
rsv3: boolean;
opcode: WebSocketOpcode;
masked: boolean; // Is payload masked?
maskingKey: Uint8Array | null; // 4-byte XOR key
payload: Uint8Array; // The actual data
}
class WebSocketFrameParser {
private buffer: Uint8Array = new Uint8Array(0);
private state: 'header' | 'extended-length' | 'mask' | 'payload' = 'header';
// Partial frame being assembled
private currentFrame: Partial<WebSocketFrame> = {};
private payloadLength: number = 0;
private headerSize: number = 0;
// Feed raw bytes from TCP
feed(data: Uint8Array): WebSocketFrame[] {
// Append to buffer
const combined = new Uint8Array(this.buffer.length + data.length);
combined.set(this.buffer);
combined.set(data, this.buffer.length);
this.buffer = combined;
const frames: WebSocketFrame[] = [];
while (this.buffer.length > 0) {
const frame = this.tryParseFrame();
if (!frame) break; // Need more data
frames.push(frame);
}
return frames;
}
private tryParseFrame(): WebSocketFrame | null {
if (this.buffer.length < 2) return null; // Need at least 2 bytes for header
let offset = 0;
// Byte 0: FIN, RSV1-3, Opcode
const byte0 = this.buffer[0];
const fin = !!(byte0 & 0x80);
const rsv1 = !!(byte0 & 0x40);
const rsv2 = !!(byte0 & 0x20);
const rsv3 = !!(byte0 & 0x10);
const opcode = byte0 & 0x0F;
// Byte 1: MASK, Payload length
const byte1 = this.buffer[1];
const masked = !!(byte1 & 0x80);
let payloadLen = byte1 & 0x7F;
offset = 2;
// Extended payload length
if (payloadLen === 126) {
if (this.buffer.length < 4) return null;
payloadLen = (this.buffer[2] << 8) | this.buffer[3];
offset = 4;
} else if (payloadLen === 127) {
if (this.buffer.length < 10) return null;
// 8-byte length, read in full as a BigInt. Number() is exact only up to 2^53 - 1,
// which the 1GB cap below keeps us far away from.
const view = new DataView(this.buffer.buffer, this.buffer.byteOffset);
payloadLen = Number(view.getBigUint64(2));
offset = 10;
// Security: reject frames claiming > 1GB payload
if (payloadLen > 1024 * 1024 * 1024) {
throw new Error('Frame payload too large');
}
}
// Masking key (4 bytes)
let maskingKey: Uint8Array | null = null;
if (masked) {
if (this.buffer.length < offset + 4) return null;
maskingKey = this.buffer.slice(offset, offset + 4);
offset += 4;
}
// Payload data
if (this.buffer.length < offset + payloadLen) return null;
let payload = this.buffer.slice(offset, offset + payloadLen);
// Unmask payload
if (masked && maskingKey) {
payload = this.unmask(payload, maskingKey);
}
// Consume parsed bytes from buffer
this.buffer = this.buffer.slice(offset + payloadLen);
return {
fin,
rsv1,
rsv2,
rsv3,
opcode: opcode as WebSocketOpcode,
masked,
maskingKey,
payload,
};
}
// XOR masking/unmasking (same operation — XOR is its own inverse)
private unmask(data: Uint8Array, key: Uint8Array): Uint8Array {
const result = new Uint8Array(data.length);
// Optimize: process 4 bytes at a time where possible
const len = data.length;
let i = 0;
// Process 4-byte chunks
for (; i + 3 < len; i += 4) {
result[i] = data[i] ^ key[0];
result[i + 1] = data[i + 1] ^ key[1];
result[i + 2] = data[i + 2] ^ key[2];
result[i + 3] = data[i + 3] ^ key[3];
}
// Process remaining bytes
for (; i < len; i++) {
result[i] = data[i] ^ key[i & 3]; // i % 4, but faster
}
return result;
}
}
class WebSocketFrameBuilder {
// Build a frame for sending
static build(opcode: WebSocketOpcode, payload: Uint8Array, options: {
fin?: boolean;
masked?: boolean;
rsv1?: boolean;
} = {}): Uint8Array {
const { fin = true, masked = false, rsv1 = false } = options;
// Calculate header size
let headerSize = 2;
if (payload.length > 65535) headerSize += 8;
else if (payload.length > 125) headerSize += 2;
if (masked) headerSize += 4;
const frame = new Uint8Array(headerSize + payload.length);
let offset = 0;
// Byte 0: FIN, RSV, Opcode
frame[0] = (fin ? 0x80 : 0) | (rsv1 ? 0x40 : 0) | opcode;
offset = 1;
// Byte 1: MASK, Payload length
let lenByte = masked ? 0x80 : 0;
if (payload.length > 65535) {
lenByte |= 127;
frame[offset++] = lenByte;
const view = new DataView(frame.buffer);
view.setBigUint64(offset, BigInt(payload.length));
offset += 8;
} else if (payload.length > 125) {
lenByte |= 126;
frame[offset++] = lenByte;
frame[offset++] = (payload.length >> 8) & 0xFF;
frame[offset++] = payload.length & 0xFF;
} else {
lenByte |= payload.length;
frame[offset++] = lenByte;
}
// Masking key
if (masked) {
const key = randomBytes(4);
frame.set(key, offset);
offset += 4;
// Mask payload
for (let i = 0; i < payload.length; i++) {
frame[offset + i] = payload[i] ^ key[i & 3];
}
} else {
frame.set(payload, offset);
}
return frame;
}
static text(message: string, masked: boolean = false): Uint8Array {
const encoder = new TextEncoder();
return WebSocketFrameBuilder.build(
WebSocketOpcode.Text,
encoder.encode(message),
{ masked }
);
}
static binary(data: Uint8Array, masked: boolean = false): Uint8Array {
return WebSocketFrameBuilder.build(WebSocketOpcode.Binary, data, { masked });
}
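// Control frames follow the same masking rule as data frames:
// a client must pass masked = true, a server must leave it false.
static ping(data: Uint8Array = new Uint8Array(0), masked: boolean = false): Uint8Array {
return WebSocketFrameBuilder.build(WebSocketOpcode.Ping, data, { masked });
}
static pong(data: Uint8Array = new Uint8Array(0), masked: boolean = false): Uint8Array {
return WebSocketFrameBuilder.build(WebSocketOpcode.Pong, data, { masked });
}
static close(code: number = 1000, reason: string = '', masked: boolean = false): Uint8Array {
const encoder = new TextEncoder();
const reasonBytes = encoder.encode(reason);
// Close payload: 2-byte big-endian status code followed by the UTF-8 reason
const payload = new Uint8Array(2 + reasonBytes.length);
payload[0] = (code >> 8) & 0xFF;
payload[1] = code & 0xFF;
payload.set(reasonBytes, 2);
return WebSocketFrameBuilder.build(WebSocketOpcode.Close, payload, { masked });
}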
}
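To see the builder's byte layout concretely, the sketch below encodes the single-frame masked text message "Hello" that RFC 6455 uses as an example. `buildMaskedText` and `unmaskText` are hypothetical helpers that handle only short payloads, and the fixed masking key exists purely to make the output reproducible; a real client must pick a fresh random key per frame.

```typescript
// Hypothetical helper: single-frame masked text message with a caller-supplied key.
function buildMaskedText(text: string, key: Uint8Array): Uint8Array {
  const payload = new TextEncoder().encode(text);
  if (payload.length > 125) throw new Error('sketch handles 7-bit lengths only');
  if (key.length !== 4) throw new Error('masking key must be 4 bytes');
  const out = new Uint8Array(2 + 4 + payload.length);
  out[0] = 0x80 | 0x1;              // FIN=1, opcode=text
  out[1] = 0x80 | payload.length;   // MASK=1, 7-bit payload length
  out.set(key, 2);                  // masking key occupies bytes 2..5
  for (let i = 0; i < payload.length; i++) {
    out[6 + i] = payload[i] ^ key[i & 3];
  }
  return out;
}

// Hypothetical helper: reverse of the above (XOR with the same key restores the text).
function unmaskText(frame: Uint8Array): string {
  const length = frame[1] & 0x7f;
  const key = frame.subarray(2, 6);
  const payload = new Uint8Array(length);
  for (let i = 0; i < length; i++) {
    payload[i] = frame[6 + i] ^ key[i & 3];
  }
  return new TextDecoder().decode(payload);
}

const toHex = (bytes: Uint8Array): string =>
  Array.from(bytes, (b) => b.toString(16).padStart(2, '0')).join(' ');

const helloFrame = buildMaskedText('Hello', Uint8Array.of(0x37, 0xfa, 0x21, 0x3d));
console.log(toHex(helloFrame));      // 81 85 37 fa 21 3d 7f 9f 4d 51 58
console.log(unmaskText(helloFrame)); // Hello
```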
Message Fragmentation
┌─────────────────────────────────────────────────────────────────────────────┐
│ MESSAGE FRAGMENTATION │
│ │
│ A single WebSocket message can be split across multiple frames: │
│ │
│ Full message: "Hello, World!" (13 bytes) │
│ │
│ Frame 1: FIN=0, opcode=0x1(text), payload="Hello" ← First fragment │
│ Frame 2: FIN=0, opcode=0x0(cont), payload=", Wor" ← Continuation │
│ Frame 3: FIN=1, opcode=0x0(cont), payload="ld!" ← Final fragment │
│ │
│ Control frames (ping/pong/close) MAY be interleaved: │
│ │
│ Frame 1: FIN=0, opcode=0x1(text), "Hello" │
│ Frame *: FIN=1, opcode=0x9(ping), "keepalive" ← Control frames are │
│ Frame 2: FIN=0, opcode=0x0(cont), ", Wor" always single-frame │
│ Frame *: FIN=1, opcode=0xA(pong), "keepalive" ← Pong response │
│ Frame 3: FIN=1, opcode=0x0(cont), "ld!" │
│ │
│ Rules: │
│ • First fragment: FIN=0, opcode=text/binary │
│ • Continuation: FIN=0, opcode=0x0 │
│ • Final: FIN=1, opcode=0x0 │
│ • Control frames: MUST have FIN=1, MUST NOT be fragmented │
│ • Control frame payload: max 125 bytes │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
// Reassemble fragmented messages
class MessageAssembler {
private fragments: Uint8Array[] = [];
private messageOpcode: WebSocketOpcode | null = null;
// Returns complete messages (may return 0 or 1 per call)
processFrame(frame: WebSocketFrame): {
type: 'message' | 'control';
opcode: WebSocketOpcode;
data: Uint8Array;
} | null {
// Control frames are never fragmented — handle immediately
if (frame.opcode >= 0x8) {
if (!frame.fin) {
throw new Error('Control frames must not be fragmented');
}
if (frame.payload.length > 125) {
throw new Error('Control frame payload must be <= 125 bytes');
}
return { type: 'control', opcode: frame.opcode, data: frame.payload };
}
// Data frame handling
if (frame.opcode !== WebSocketOpcode.Continuation) {
// Start of new message
if (this.fragments.length > 0) {
throw new Error('New message started before previous completed');
}
this.messageOpcode = frame.opcode;
this.fragments.push(frame.payload);
} else {
// Continuation frame
if (this.fragments.length === 0) {
throw new Error('Continuation frame without start frame');
}
this.fragments.push(frame.payload);
}
if (frame.fin) {
// Message complete — concatenate all fragments
const totalLength = this.fragments.reduce((sum, f) => sum + f.length, 0);
const message = new Uint8Array(totalLength);
let offset = 0;
for (const frag of this.fragments) {
message.set(frag, offset);
offset += frag.length;
}
const opcode = this.messageOpcode!;
this.fragments = [];
this.messageOpcode = null;
return { type: 'message', opcode, data: message };
}
return null; // Need more fragments
}
}
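The fragmentation rules in the diagram can also be read from the sender's side. The sketch below splits a payload into frame descriptors and reproduces the "Hello, World!" example; `Fragment` and `fragmentMessage` are hypothetical names, and the descriptors are not wire-encoded.

```typescript
interface Fragment {
  fin: boolean;
  opcode: number;
  payload: Uint8Array;
}

const OPCODE_CONTINUATION = 0x0;
const OPCODE_TEXT = 0x1;

// Hypothetical helper: first fragment carries the real opcode, later ones use
// continuation, and only the last fragment has FIN set.
function fragmentMessage(payload: Uint8Array, opcode: number, chunkSize: number): Fragment[] {
  if (chunkSize <= 0) throw new RangeError('chunkSize must be positive');
  const fragments: Fragment[] = [];
  let offset = 0;
  // do-while so that an empty message still yields one frame with FIN=1
  do {
    const end = Math.min(offset + chunkSize, payload.length);
    fragments.push({
      fin: end === payload.length,
      opcode: offset === 0 ? opcode : OPCODE_CONTINUATION,
      payload: payload.subarray(offset, end),
    });
    offset = end;
  } while (offset < payload.length);
  return fragments;
}

const decoder = new TextDecoder();
const parts = fragmentMessage(new TextEncoder().encode('Hello, World!'), OPCODE_TEXT, 5);
for (const p of parts) {
  console.log(`FIN=${p.fin ? 1 : 0} opcode=0x${p.opcode.toString(16)} "${decoder.decode(p.payload)}"`);
}
// FIN=0 opcode=0x1 "Hello"
// FIN=0 opcode=0x0 ", Wor"
// FIN=1 opcode=0x0 "ld!"
```

Splitting on byte offsets can cut a multi-byte UTF-8 character in half; that is allowed, because UTF-8 validity applies to the reassembled message rather than to each fragment.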
Close Handshake
// Close status codes (RFC 6455 Section 7.4.1)
enum CloseCode {
Normal = 1000, // Normal closure
GoingAway = 1001, // Server shutting down / browser navigating away
ProtocolError = 1002, // Protocol error
UnsupportedData = 1003, // Received unacceptable data type
NoStatusReceived = 1005, // Expected status code but didn't get one (internal)
AbnormalClosure = 1006, // No close frame received (internal)
InvalidPayload = 1007, // Payload data not consistent with type (bad UTF-8)
PolicyViolation = 1008, // Generic policy violation
MessageTooBig = 1009, // Message too large to process
MandatoryExtension = 1010, // Client expected extension negotiation
InternalError = 1011, // Unexpected server error
TLSHandshakeFailed = 1015, // TLS handshake failure (internal, never sent)
}
class CloseHandshake {
private closeSent: boolean = false;
private closeReceived: boolean = false;
private closeTimer: ReturnType<typeof setTimeout> | null = null;
// Initiate close
initiateClose(code: CloseCode = CloseCode.Normal, reason: string = ''): Uint8Array {
this.closeSent = true;
// Start close timer — if peer doesn't respond, force close TCP
this.closeTimer = setTimeout(() => {
// Force close the TCP connection
}, 5000);
return WebSocketFrameBuilder.close(code, reason);
}
// Handle received close frame
handleClose(payload: Uint8Array): {
code: number;
reason: string;
shouldRespond: boolean;
connectionClosed: boolean;
} {
this.closeReceived = true;
let code = CloseCode.NoStatusReceived;
let reason = '';
if (payload.length >= 2) {
code = (payload[0] << 8) | payload[1];
if (payload.length > 2) {
reason = new TextDecoder().decode(payload.slice(2));
}
}
// If WE sent the close first, this is the response — connection closed
if (this.closeSent) {
if (this.closeTimer) clearTimeout(this.closeTimer);
return { code, reason, shouldRespond: false, connectionClosed: true };
}
// If THEY sent close first, we must respond with our own close frame
return { code, reason, shouldRespond: true, connectionClosed: false };
}
}
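The handler above accepts whatever status code arrives. A stricter receiver might validate the Close payload first, roughly as sketched below. `isSendableCloseCode` and `parseClosePayload` are hypothetical helpers, and the accepted ranges are an assumption of this sketch: the codes RFC 6455 defines for wire use (1000-1003, 1007-1011) plus the 3000-4999 library and application ranges. Some implementations also accept later-registered codes such as 1012-1014.

```typescript
// Assumption: anything outside these ranges is treated as a protocol error.
function isSendableCloseCode(code: number): boolean {
  if (code >= 1000 && code <= 1003) return true;
  if (code >= 1007 && code <= 1011) return true;
  return code >= 3000 && code <= 4999;
}

interface ParsedClose {
  ok: boolean;     // false means the peer violated the protocol
  code: number;    // received code, or the code we should close with when ok is false
  reason: string;
}

function parseClosePayload(payload: Uint8Array): ParsedClose {
  // Empty payload is legal; APIs report it as 1005, which is never sent on the wire.
  if (payload.length === 0) return { ok: true, code: 1005, reason: '' };
  // A single byte cannot hold the 2-byte status code.
  if (payload.length === 1) return { ok: false, code: 1002, reason: 'truncated status code' };
  const code = (payload[0] << 8) | payload[1];
  if (!isSendableCloseCode(code)) {
    return { ok: false, code: 1002, reason: `invalid close code ${code}` };
  }
  try {
    // The reason must be valid UTF-8; fatal mode throws on malformed input.
    const reason = new TextDecoder('utf-8', { fatal: true }).decode(payload.subarray(2));
    return { ok: true, code, reason };
  } catch {
    return { ok: false, code: 1007, reason: 'close reason is not valid UTF-8' };
  }
}

console.log(parseClosePayload(Uint8Array.of(0x03, 0xe8, 0x62, 0x79, 0x65)));
// { ok: true, code: 1000, reason: 'bye' }
console.log(parseClosePayload(Uint8Array.of(0x03, 0xee)).ok); // false (1006 must not be sent)
```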
Full WebSocket Connection
type WebSocketState = 'connecting' | 'open' | 'closing' | 'closed';
interface WebSocketEvents {
onOpen?: () => void;
onMessage?: (data: string | Uint8Array) => void;
onClose?: (code: number, reason: string) => void;
onPing?: (data: Uint8Array) => void;
onError?: (error: Error) => void;
}
class WebSocketConnection {
private state: WebSocketState = 'connecting';
private parser: WebSocketFrameParser = new WebSocketFrameParser();
private assembler: MessageAssembler = new MessageAssembler();
private closeHandshake: CloseHandshake = new CloseHandshake();
private events: WebSocketEvents;
private isClient: boolean;
private sendQueue: Uint8Array[] = [];
private pingTimer: ReturnType<typeof setInterval> | null = null;
private lastPongTime: number = Date.now();
constructor(isClient: boolean, events: WebSocketEvents) {
this.isClient = isClient;
this.events = events;
}
// Called when handshake is complete
onHandshakeComplete(): void {
this.state = 'open';
// Start ping/pong heartbeat (server-side typically)
if (!this.isClient) {
this.pingTimer = setInterval(() => {
if (Date.now() - this.lastPongTime > 60000) {
// No pong in 60s — connection dead
this.close(CloseCode.GoingAway, 'Ping timeout');
return;
}
this.sendFrame(WebSocketFrameBuilder.ping());
}, 30000);
}
this.events.onOpen?.();
}
// Process raw TCP data
onTCPData(data: Uint8Array): void {
let frames: WebSocketFrame[];
try {
frames = this.parser.feed(data);
} catch (err) {
this.close(CloseCode.ProtocolError, 'Frame parse error');
return;
}
for (const frame of frames) {
this.processFrame(frame);
}
}
private processFrame(frame: WebSocketFrame): void {
// Clients MUST mask frames. Servers MUST NOT mask frames.
if (frame.masked === this.isClient) {
// Wrong: client received masked frame, or server received unmasked
this.close(CloseCode.ProtocolError, 'Masking violation');
return;
}
const result = this.assembler.processFrame(frame);
if (!result) return; // Fragment received, waiting for more
if (result.type === 'control') {
this.handleControl(result.opcode, result.data);
} else {
this.handleMessage(result.opcode, result.data);
}
}
private handleMessage(opcode: WebSocketOpcode, data: Uint8Array): void {
if (opcode === WebSocketOpcode.Text) {
// Validate UTF-8
const decoder = new TextDecoder('utf-8', { fatal: true });
try {
const text = decoder.decode(data);
this.events.onMessage?.(text);
} catch {
this.close(CloseCode.InvalidPayload, 'Invalid UTF-8');
}
} else {
this.events.onMessage?.(data);
}
}
private handleControl(opcode: WebSocketOpcode, data: Uint8Array): void {
switch (opcode) {
case WebSocketOpcode.Ping:
this.events.onPing?.(data);
// MUST respond with a pong carrying the same payload; a client must mask it
this.sendFrame(WebSocketFrameBuilder.build(WebSocketOpcode.Pong, data, { masked: this.isClient }));
break;
case WebSocketOpcode.Pong:
this.lastPongTime = Date.now();
break;
case WebSocketOpcode.Close: {
const result = this.closeHandshake.handleClose(data);
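if (result.shouldRespond) {
// Echo the status code bytes as received. Do not re-encode result.code: when the peer
// sent no code it is 1005, which must never appear on the wire. A client must mask the reply.
const echo = data.length >= 2 ? data.slice(0, 2) : new Uint8Array(0);
this.sendFrame(WebSocketFrameBuilder.build(WebSocketOpcode.Close, echo, { masked: this.isClient }));
}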
this.state = 'closed';
if (this.pingTimer) clearInterval(this.pingTimer);
this.events.onClose?.(result.code, result.reason);
break;
}
}
}
// Public API
send(data: string): void;
send(data: Uint8Array): void;
send(data: string | Uint8Array): void {
if (this.state !== 'open') {
throw new Error(`Cannot send in state: ${this.state}`);
}
const frame = typeof data === 'string'
? WebSocketFrameBuilder.text(data, this.isClient)
: WebSocketFrameBuilder.binary(data, this.isClient);
this.sendFrame(frame);
}
// Send large messages as fragments
sendFragmented(data: Uint8Array, opcode: WebSocketOpcode, chunkSize: number = 16384): void {
if (this.state !== 'open') throw new Error(`Cannot send in state: ${this.state}`);
let offset = 0;
let isFirst = true;
do {
const end = Math.min(offset + chunkSize, data.length);
const chunk = data.slice(offset, end);
const isFinal = end >= data.length;
const frame = WebSocketFrameBuilder.build(
isFirst ? opcode : WebSocketOpcode.Continuation,
chunk,
{ fin: isFinal, masked: this.isClient }
);
this.sendFrame(frame);
offset = end;
isFirst = false;
} while (offset < data.length); // do-while so an empty message still produces one FIN frame
}
close(code: CloseCode = CloseCode.Normal, reason: string = ''): void {
if (this.state !== 'open') return;
this.state = 'closing';
const frame = this.closeHandshake.initiateClose(code, reason);
this.sendFrame(frame);
}
private sendFrame(frame: Uint8Array): void {
// In real implementation: write to TCP socket
this.sendQueue.push(frame);
}
// Drain send queue — returns frames to write to TCP
drain(): Uint8Array[] {
const frames = this.sendQueue;
this.sendQueue = [];
return frames;
}
}
Per-Message Deflate Extension
// permessage-deflate compresses message payloads using zlib DEFLATE
// Negotiated during handshake:
// Client: Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits
// Server: Sec-WebSocket-Extensions: permessage-deflate; server_max_window_bits=15
interface DeflateConfig {
serverMaxWindowBits: number; // 8-15, default 15 (32KB window)
clientMaxWindowBits: number; // 8-15, default 15
serverNoContextTakeover: boolean; // Reset deflate state between messages
clientNoContextTakeover: boolean;
}
class PerMessageDeflate {
private config: DeflateConfig;
constructor(config: Partial<DeflateConfig> = {}) {
this.config = {
serverMaxWindowBits: config.serverMaxWindowBits ?? 15,
clientMaxWindowBits: config.clientMaxWindowBits ?? 15,
serverNoContextTakeover: config.serverNoContextTakeover ?? false,
clientNoContextTakeover: config.clientNoContextTakeover ?? false,
};
}
// Compress a message payload
compress(data: Uint8Array): { compressed: Uint8Array; rsv1: boolean } {
if (data.length < 128) {
// Don't compress small messages — overhead exceeds savings
return { compressed: data, rsv1: false };
}
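// In a real implementation: zlib.deflateRawSync(data, { finishFlush: zlib.constants.Z_SYNC_FLUSH }),
// then strip the trailing 0x00 0x00 0xFF 0xFF (sync flush marker).
// RSV1=1 goes on the first frame of the message to indicate compression.
return { compressed: data, rsv1: true }; // Simplified placeholder: data is NOT actually compressed here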
}
// Decompress a message payload (if RSV1 is set)
decompress(data: Uint8Array, rsv1: boolean): Uint8Array {
if (!rsv1) return data; // Not compressed
// Append 0x00 0x00 0xFF 0xFF (stripped during compression)
// Then inflate using zlib.inflateRawSync()
return data; // Simplified
}
static negotiateExtension(clientOffer: string): string | null {
// Parse client's extension offer
if (!clientOffer.includes('permessage-deflate')) return null;
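// Accept with server preferences. Per RFC 7692, client_max_window_bits may appear in the
// response only if the client's offer included it.
const params = ['permessage-deflate', 'server_max_window_bits=15'];
if (clientOffer.includes('client_max_window_bits')) params.push('client_max_window_bits=15');
return params.join('; ');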
}
}
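The compress and decompress methods above are placeholders. A minimal sketch of the payload transform using Node's zlib module is shown below, under the assumption that each message is compressed on its own (the equivalent of no context takeover) and fits in memory. `deflateMessage` and `inflateMessage` are hypothetical helper names; a streaming implementation that keeps the compression context across messages would look different.

```typescript
import { constants, deflateRawSync, inflateRawSync } from 'zlib';

// The 4 bytes a sync flush leaves at the end of a raw DEFLATE stream.
const SYNC_TAIL = Buffer.from([0x00, 0x00, 0xff, 0xff]);

// Hypothetical helper: compress one message and strip the sync flush marker.
function deflateMessage(data: Uint8Array): Buffer {
  // Z_SYNC_FLUSH (rather than the default Z_FINISH) is what produces the marker.
  const out = deflateRawSync(data, { finishFlush: constants.Z_SYNC_FLUSH });
  const tail = out.subarray(out.length - 4);
  if (!tail.equals(SYNC_TAIL)) throw new Error('expected sync flush marker at end of stream');
  return out.subarray(0, out.length - 4);
}

// Hypothetical helper: restore the marker, then inflate.
function inflateMessage(data: Uint8Array): Buffer {
  const withTail = Buffer.concat([data, SYNC_TAIL]);
  // The stream has no final block, so inflate must not insist on one.
  return inflateRawSync(withTail, { finishFlush: constants.Z_SYNC_FLUSH });
}

const message = Buffer.from('{"type":"chat","text":"hello hello hello"}'.repeat(20));
const packed = deflateMessage(message);
console.log(`${message.length} bytes -> ${packed.length} bytes`);
console.log(inflateMessage(packed).equals(message)); // true
```

A receiver should also cap the inflated size, since a small compressed payload can expand to a very large message.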
Backpressure & Flow Control
// WebSocket has no built-in flow control — TCP handles it at transport layer
// But application-level backpressure is critical for high-throughput
class WebSocketWithBackpressure {
private connection: WebSocketConnection;
private bufferedAmount: number = 0;
private readonly highWaterMark: number;
private readonly lowWaterMark: number;
private paused: boolean = false;
private drainCallbacks: (() => void)[] = [];
constructor(
connection: WebSocketConnection,
highWaterMark: number = 16 * 1024 * 1024, // 16MB
lowWaterMark: number = 1 * 1024 * 1024, // 1MB
) {
this.connection = connection;
this.highWaterMark = highWaterMark;
this.lowWaterMark = lowWaterMark;
}
send(data: string | Uint8Array): boolean {
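// Text goes on the wire as UTF-8, so count UTF-8 bytes rather than UTF-16 code units
const size = typeof data === 'string' ? new TextEncoder().encode(data).length : data.length;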
this.bufferedAmount += size;
this.connection.send(data as any);
// Return false if buffer is full (caller should wait for drain)
if (this.bufferedAmount >= this.highWaterMark) {
this.paused = true;
return false;
}
return true;
}
// Called when TCP write completes (data actually sent)
onWriteComplete(bytesWritten: number): void {
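// bytesWritten also counts frame headers, which send() did not add, so clamp at zero
this.bufferedAmount = Math.max(0, this.bufferedAmount - bytesWritten);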
if (this.paused && this.bufferedAmount <= this.lowWaterMark) {
this.paused = false;
// Signal that writing can resume
const callbacks = this.drainCallbacks;
this.drainCallbacks = [];
for (const cb of callbacks) cb();
}
}
// Wait for buffer to drain below low water mark
waitForDrain(): Promise<void> {
if (!this.paused) return Promise.resolve();
return new Promise(resolve => {
this.drainCallbacks.push(resolve);
});
}
get isPaused(): boolean { return this.paused; }
get buffered(): number { return this.bufferedAmount; }
}
Comparison: WebSocket vs Alternatives
| Aspect | WebSocket | SSE (Server-Sent Events) | HTTP Long Polling | HTTP/2 Server Push | gRPC-Web Streaming |
|---|---|---|---|---|---|
| Direction | Full-duplex | Server → Client only | Simulated duplex | Server → Client | Server → Client streaming (requests are unary in browsers) |
| Protocol | ws:// / wss:// | HTTP | HTTP | HTTP/2 | HTTP/1.1 or HTTP/2 |
| Binary Support | Yes (opcode 0x2) | No (text only) | Yes | Yes | Yes (protobuf) |
| Overhead per Message | 2-14 bytes header | A few bytes of field framing (data: ...\n\n) | Full HTTP headers | HTTP/2 frame header | HTTP framing + 5-byte gRPC message prefix |
| Reconnection | Manual | Built-in (EventSource) | Manual | N/A | Manual |
| Proxy Support | Varies (needs upgrade) | Excellent (plain HTTP) | Excellent | Good (HTTP/2) | Good (via grpc-web proxy) |
| Browser Support | Universal | Universal (except IE) | Universal | Limited (removed from Chrome) | Via grpc-web proxy |
| Connection Limit | One TCP connection per WebSocket | 6 per domain (HTTP/1.1) | 6 per domain | Multiplexed | Multiplexed (over HTTP/2) |
| Best For | Real-time bidirectional | Notifications, feeds | Legacy compat | Preloading resources | Type-safe RPCs |
Interview Q&A
Q1: Why does WebSocket require client-to-server masking? What security problem does it solve?
Client-to-server masking protects against cache poisoning attacks on transparent proxies. Without masking, script running in a browser could choose the exact bytes a WebSocket client puts on the wire, and an intermediary that does not understand WebSocket could misread those bytes as HTTP. The attack works like this: (1) The attacker controls a page that opens a WebSocket and a cooperating server. (2) The client sends a data frame whose payload looks like an HTTP request for a popular resource, and the attacker's server answers with bytes that look like an HTTP response (e.g., HTTP/1.1 200 OK\r\n... with malicious JavaScript). (3) A transparent proxy between client and server doesn't understand WebSocket; it sees what it takes to be an ordinary HTTP exchange. (4) The proxy might cache this "response" and serve it to other users requesting the same URL. Masking prevents this: every client-to-server frame XORs its payload with a 4-byte key chosen randomly per frame, so the proxy sees random-looking bytes that don't match any HTTP pattern. The server reads the key from the frame header and unmasks. Critically, the key is unpredictable to the script that supplied the payload, so an attacker can't pick data that will look like HTTP after masking. Server-to-client frames are NOT masked: the threat is untrusted in-browser code controlling what a client sends, and a server under the attacker's control could ignore any masking rule anyway.
Q2: Explain WebSocket message fragmentation. When and why would you use it?
WebSocket supports splitting a single message across multiple frames using the FIN bit and the continuation opcode (0x0). The first frame has FIN=0 and opcode=Text/Binary. Subsequent frames have opcode=Continuation with FIN=0. The final frame has opcode=Continuation with FIN=1. The receiver concatenates all fragment payloads to reconstruct the complete message. Use cases: (1) Streaming — send data as it becomes available without waiting for the full message (e.g., a large file transfer or real-time audio chunks). (2) Memory management — avoid buffering an entire large message in memory before sending; write chunks as they're produced. (3) Interleaving control frames — while a fragmented data message is in progress, control frames (ping/pong/close) can be inserted between fragments. This ensures heartbeat pings aren't blocked by a long data transfer. (4) Proxies — intermediaries can forward fragments as they arrive without buffering the full message. Control frames have strict rules: they're always single-frame (FIN must be 1) and payloads must be ≤ 125 bytes.
Q3: How does the WebSocket close handshake work, and why is it a two-phase process?
The close handshake ensures both sides agree the connection is ending and have a chance to finish sending data. Phase 1 (initiator): The closing side sends a Close control frame (opcode 0x8) containing a 2-byte status code and optional UTF-8 reason string. After sending, it MUST NOT send any more data frames. It enters the "closing" state and starts a timer. Phase 2 (responder): The other side receives the Close frame, sends back its own Close frame (echoing the code or sending its own), and closes its half of the TCP connection. The initiator receives the Close response and closes TCP. If the responder doesn't reply within the timeout (typically 5-30 seconds), the initiator force-closes TCP. Status codes are meaningful: 1000 = normal, 1001 = going away (server shutdown, page navigation), 1002 = protocol error, 1008 = policy violation, 1009 = message too big, 1011 = server error. Codes 1005 (no status received) and 1006 (abnormal closure — TCP dropped) are never sent on the wire; they're internal sentinel values for APIs.
Q4: What is permessage-deflate, and what are the trade-offs of enabling it?
permessage-deflate is the WebSocket compression extension (RFC 7692). It compresses each message's payload using the DEFLATE algorithm (same as gzip, without the gzip header). It's negotiated during the handshake: the client offers Sec-WebSocket-Extensions: permessage-deflate, the server accepts or rejects. Parameters control window size (8-15, where 15 = 32KB dictionary) and context takeover (whether the compression dictionary persists across messages). Trade-offs. Benefits: it can reduce bandwidth substantially, often 60-80% for text-heavy payloads (JSON, XML, chat messages), depending on the data. Costs: (1) CPU overhead: compression/decompression on every message. For a server handling 100K connections, this can be a significant CPU burden. (2) Memory: each connection with context takeover keeps a 32KB sliding window in each direction, so 100K connections × 64KB is roughly 6.5GB for the windows alone, and zlib's real per-stream state is larger than the window. (3) Latency: compression adds processing time to every message, typically on the order of ~0.1-1ms. (4) Diminishing returns for binary data: already-compressed data (images, video) won't shrink further. Practical advice: enable for text-heavy protocols (JSON APIs, chat), disable for binary protocols or when server CPU is the bottleneck. Use server_no_context_takeover to trade compression ratio for memory savings.
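The memory figure in the answer above is simple arithmetic on the window size. The sketch below reproduces it; `windowMemoryBytes` is a hypothetical helper that counts only the sliding windows, so treat the result as a lower bound rather than a measurement of real zlib usage.

```typescript
// Hypothetical estimate: one 2^windowBits-byte window per direction per connection.
function windowMemoryBytes(connections: number, windowBits: number, directions: number = 2): number {
  if (windowBits < 8 || windowBits > 15) throw new RangeError('windowBits must be 8-15');
  return connections * directions * 2 ** windowBits;
}

const fullWindows = windowMemoryBytes(100_000, 15);  // 32KB windows
const smallWindows = windowMemoryBytes(100_000, 10); // 1KB windows

console.log(fullWindows);                              // 6553600000
console.log((fullWindows / 1e9).toFixed(2) + ' GB');   // 6.55 GB
console.log((smallWindows / 1e9).toFixed(2) + ' GB');  // 0.20 GB
```

Dropping the window from 15 to 10 bits cuts this component by a factor of 32, at the cost of a worse compression ratio.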
Q5: How does WebSocket handle backpressure, and what happens when a client can't keep up with the server?
WebSocket itself has no application-level flow control; it relies entirely on TCP's flow control. When the server sends faster than the client can process, TCP's receive window fills up on the client side. The client's OS advertises a smaller TCP window, which causes TCP to slow down the sender: the server's write() calls start blocking or returning partial writes. This is TCP backpressure. However, there's a gap: if the application buffers outgoing messages in memory before they reach TCP, the server can accumulate massive memory usage before TCP backpressure kicks in. This is why application-level backpressure is critical: (1) Monitor the write buffer size (bufferedAmount in the browser WebSocket API). (2) Implement high-water and low-water marks: when buffered data exceeds the high-water mark, pause message production; when it drops below the low-water mark, resume. (3) In Node.js, a socket's write() returns false once its internal buffer reaches the high-water mark, and the socket emits a drain event when it's ready for more data; the ws library exposes the buffered byte count as bufferedAmount and accepts a completion callback on send(). Without application-level backpressure, a slow client can cause the server to buffer gigabytes of messages in memory, leading to OOM crashes.
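The write-then-wait-for-drain pattern from the answer above can be sketched with Node's stream API. `makeSlowSink` is a hypothetical stand-in for a TCP socket and `sendAll` is an illustrative helper; a real server would pass the actual socket and real encoded frames.

```typescript
import { Writable } from 'stream';
import { once } from 'events';

// Hypothetical stand-in for a TCP socket: accepts writes but completes them slowly.
function makeSlowSink(highWaterMark: number): Writable {
  return new Writable({
    highWaterMark,
    write(_chunk, _encoding, callback) {
      setImmediate(callback); // pretend the kernel took a moment to accept the bytes
    },
  });
}

// Write frames in order, pausing whenever write() reports that the buffer is full.
async function sendAll(sink: Writable, frames: Uint8Array[]): Promise<number> {
  let pauses = 0;
  for (const frame of frames) {
    if (!sink.write(frame)) {
      pauses++;
      await once(sink, 'drain'); // resume only after the buffer has emptied
    }
  }
  return pauses;
}

const demoSink = makeSlowSink(1024);
const demoFrames = Array.from({ length: 8 }, () => new Uint8Array(512));
sendAll(demoSink, demoFrames).then((pauses) => {
  console.log(`all frames written, paused ${pauses} time(s) for drain`);
});
```

The producer never holds more than roughly one high-water mark of unsent data, which is the property that prevents the memory blow-up described above.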
Key Takeaways
- WebSocket is not HTTP: after the upgrade handshake, the connection switches to a binary frame protocol with 2-14 byte headers, eliminating HTTP's per-request overhead.
- Client-to-server masking prevents cache poisoning: the 4-byte XOR mask makes frames unrecognizable to confused HTTP proxies, preventing attackers from injecting fake cached responses.
- Control frames (ping/pong/close) can interleave with data fragments: this ensures heartbeats and graceful shutdown work even during large message transfers.
- The close handshake is bidirectional: both sides must send and receive Close frames before TCP is closed, with a timeout to handle unresponsive peers.
- permessage-deflate saves bandwidth but costs CPU and memory: each connection with context takeover maintains a 32KB compression dictionary per direction. Disable for binary protocols.
- WebSocket has no application-level flow control: TCP backpressure works but has a gap. Monitor bufferedAmount and implement high/low water marks to prevent OOM.
- Payload length encoding uses 7, 16, or 64 bits: messages up to 125 bytes have only 2 bytes of header overhead. Up to 64KB: 4 bytes. Beyond: 10 bytes.
- Text frames must be valid UTF-8: the server must validate and close with 1007 (Invalid Payload) if UTF-8 decoding fails. Binary frames have no such constraint.
- Fragmentation enables streaming without buffering: senders can push data as it arrives. Receivers concatenate fragments. Control frames may be interleaved between fragments.
- The Sec-WebSocket-Accept header proves protocol understanding: the SHA-1 hash of the client's nonce + a fixed GUID ensures the server intentionally accepted the WebSocket upgrade.