Binary Protocol Parsers: Designing and Parsing Wire Formats From Scratch
JSON is human-readable but wastes bandwidth. Binary protocols are 2-10× smaller, 10-100× faster to parse, and power every high-performance system: TCP/IP headers, Protocol Buffers, MessagePack, DNS, TLS, WebSocket frames, and database wire protocols. Today we build binary protocol parsers from the ground up — type-length-value encoding, schema evolution, zero-copy decoding, and a complete protocol implementation.
Why Binary Protocols
JSON: {"id":12345,"name":"Alice","age":30,"active":true}
Bytes: 50 bytes, requires full text parsing + string allocation
Binary (TLV): [01 00 04 00 00 30 39] [02 00 05 41 6C 69 63 65] ...
Bytes: ~20 bytes, parsed via pointer arithmetic
| | JSON | Binary (Protobuf) |
|---|---|---|
| Payload size | 100% | 20-40% |
| Parse time | 100% | 5-10% |
| Memory allocations | many | zero (zero-copy) |
| Schema evolution | ✅ (flexible) | ✅ (with field numbers) |
| Human readable | ✅ | ❌ |
| Debug-friendly | ✅ | ❌ (need tooling) |
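As a rough sanity check of the size claims, here is a minimal sketch that hand-packs the sample record and compares it to its JSON encoding. The field layout (varint id, length-prefixed name, one byte each for age and active) is illustrative, not a real wire format:

```typescript
// Hand-pack {id, name, age, active} as: varint id, length-prefixed name,
// uint8 age, uint8 active. Illustrative layout only.
function packRecord(id: number, name: string, age: number, active: boolean): Uint8Array {
  const nameBytes = new TextEncoder().encode(name);
  const out: number[] = [];
  let v = id >>> 0;
  while (v > 0x7f) { out.push((v & 0x7f) | 0x80); v >>>= 7; } // varint id
  out.push(v);
  out.push(nameBytes.length);          // length prefix
  nameBytes.forEach((b) => out.push(b));
  out.push(age & 0xff, active ? 1 : 0);
  return Uint8Array.from(out);
}

const record = { id: 12345, name: 'Alice', age: 30, active: true };
const jsonSize = JSON.stringify(record).length;               // 50 bytes
const binSize = packRecord(12345, 'Alice', 30, true).length;  // 10 bytes
console.log(`JSON ${jsonSize}B vs binary ${binSize}B`);
```

Even this naive packing lands at 20% of the JSON size, in line with the table above.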
1. Binary Reader/Writer Primitives
/**
* Low-level binary buffer reader with cursor management.
* Foundation for ALL binary protocol parsers.
*/
class BinaryReader {
private view: DataView;
private offset: number = 0;
private readonly buffer: ArrayBuffer;
constructor(buffer: ArrayBuffer) {
this.buffer = buffer;
this.view = new DataView(buffer);
}
// --- Integer types ---
readUint8(): number {
const value = this.view.getUint8(this.offset);
this.offset += 1;
return value;
}
readUint16LE(): number {
const value = this.view.getUint16(this.offset, true);
this.offset += 2;
return value;
}
readUint16BE(): number {
const value = this.view.getUint16(this.offset, false);
this.offset += 2;
return value;
}
readUint32LE(): number {
const value = this.view.getUint32(this.offset, true);
this.offset += 4;
return value;
}
readUint32BE(): number {
const value = this.view.getUint32(this.offset, false);
this.offset += 4;
return value;
}
readInt32LE(): number {
const value = this.view.getInt32(this.offset, true);
this.offset += 4;
return value;
}
readUint64LE(): bigint {
const value = this.view.getBigUint64(this.offset, true);
this.offset += 8;
return value;
}
readFloat32LE(): number {
const value = this.view.getFloat32(this.offset, true);
this.offset += 4;
return value;
}
readFloat64LE(): number {
const value = this.view.getFloat64(this.offset, true);
this.offset += 8;
return value;
}
/**
* Variable-length integer (varint) — same as Protocol Buffers.
* Each byte uses 7 bits for data, 1 bit to indicate continuation.
*
* Value 300 = 0b100101100:
* Byte 1: 1_0101100 (continuation=1, data=0101100)
* Byte 2: 0_0000010 (continuation=0, data=0000010)
* Result: 0000010_0101100 = 300
*/
readVarint(): number {
let result = 0;
let shift = 0;
while (true) {
const byte = this.readUint8();
result |= (byte & 0x7F) << shift;
if ((byte & 0x80) === 0) break; // No continuation bit
shift += 7;
if (shift >= 35) throw new Error('Varint too long for uint32 (max 5 bytes)');
}
return result >>> 0; // Ensure unsigned
}
/**
* Signed varint using ZigZag encoding.
* Maps signed integers to unsigned:
* 0 → 0, -1 → 1, 1 → 2, -2 → 3, 2 → 4, ...
*
* This way, small negative numbers are still small.
*/
readSignedVarint(): number {
const unsigned = this.readVarint();
return (unsigned >>> 1) ^ -(unsigned & 1);
}
// --- String / Bytes ---
/**
* Length-prefixed string (varint length + UTF-8 bytes).
* The byte view is zero-copy; decoding still allocates the resulting string.
*/
readString(): string {
const length = this.readVarint();
const bytes = new Uint8Array(this.buffer, this.offset, length);
this.offset += length;
return new TextDecoder().decode(bytes);
}
readBytes(length: number): Uint8Array {
const bytes = new Uint8Array(this.buffer, this.offset, length);
this.offset += length;
return bytes;
}
readLengthPrefixedBytes(): Uint8Array {
const length = this.readVarint();
return this.readBytes(length);
}
// --- Cursor management ---
getOffset(): number { return this.offset; }
setOffset(offset: number): void { this.offset = offset; }
remaining(): number { return this.buffer.byteLength - this.offset; }
isEOF(): boolean { return this.offset >= this.buffer.byteLength; }
/**
* Peek without advancing cursor.
*/
peek<T>(fn: () => T): T {
const savedOffset = this.offset;
const result = fn();
this.offset = savedOffset;
return result;
}
/**
* Sub-reader for parsing nested messages.
* Note: buffer.slice() copies the sub-range here (simple, but not zero-copy).
*/
subReader(length: number): BinaryReader {
const sub = new BinaryReader(
this.buffer.slice(this.offset, this.offset + length)
);
this.offset += length;
return sub;
}
}
class BinaryWriter {
private buffer: ArrayBuffer;
private view: DataView;
private offset: number = 0;
private capacity: number;
constructor(initialCapacity: number = 1024) {
this.capacity = initialCapacity;
this.buffer = new ArrayBuffer(initialCapacity);
this.view = new DataView(this.buffer);
}
// Auto-grow
private ensureCapacity(needed: number): void {
if (this.offset + needed <= this.capacity) return;
while (this.capacity < this.offset + needed) {
this.capacity *= 2;
}
const newBuffer = new ArrayBuffer(this.capacity);
new Uint8Array(newBuffer).set(new Uint8Array(this.buffer));
this.buffer = newBuffer;
this.view = new DataView(this.buffer);
}
writeUint8(value: number): void {
this.ensureCapacity(1);
this.view.setUint8(this.offset, value);
this.offset += 1;
}
writeUint16LE(value: number): void {
this.ensureCapacity(2);
this.view.setUint16(this.offset, value, true);
this.offset += 2;
}
writeUint16BE(value: number): void {
this.ensureCapacity(2);
this.view.setUint16(this.offset, value, false);
this.offset += 2;
}
writeUint32LE(value: number): void {
this.ensureCapacity(4);
this.view.setUint32(this.offset, value, true);
this.offset += 4;
}
writeUint32BE(value: number): void {
this.ensureCapacity(4);
this.view.setUint32(this.offset, value, false);
this.offset += 4;
}
writeFloat64LE(value: number): void {
this.ensureCapacity(8);
this.view.setFloat64(this.offset, value, true);
this.offset += 8;
}
writeVarint(value: number): void {
value = value >>> 0;
while (value > 0x7F) {
this.writeUint8((value & 0x7F) | 0x80);
value >>>= 7;
}
this.writeUint8(value);
}
writeSignedVarint(value: number): void {
// ZigZag encode: (n << 1) ^ (n >> 31)
const zigzag = (value << 1) ^ (value >> 31);
this.writeVarint(zigzag >>> 0);
}
writeString(str: string): void {
const encoded = new TextEncoder().encode(str);
this.writeVarint(encoded.length);
this.writeByteArray(encoded);
}
writeByteArray(bytes: Uint8Array): void {
this.ensureCapacity(bytes.length);
new Uint8Array(this.buffer).set(bytes, this.offset);
this.offset += bytes.length;
}
/**
* Get the final buffer (trimmed to actual content).
*/
finish(): ArrayBuffer {
return this.buffer.slice(0, this.offset);
}
getOffset(): number { return this.offset; }
}
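A quick standalone round trip (independent of the classes above) confirms the varint and ZigZag examples from the comments, including the 300 → `[0xAC, 0x02]` encoding:

```typescript
// Encode/decode an unsigned varint, plus ZigZag for signed values.
function encodeVarint(value: number): number[] {
  const out: number[] = [];
  let v = value >>> 0;
  while (v > 0x7f) { out.push((v & 0x7f) | 0x80); v >>>= 7; }
  out.push(v);
  return out;
}

function decodeVarint(bytes: number[]): number {
  let result = 0, shift = 0;
  for (const b of bytes) {
    result |= (b & 0x7f) << shift;
    if ((b & 0x80) === 0) break;
    shift += 7;
  }
  return result >>> 0;
}

const zigzagEncode = (n: number): number => ((n << 1) ^ (n >> 31)) >>> 0;
const zigzagDecode = (u: number): number => (u >>> 1) ^ -(u & 1);

console.log(encodeVarint(300)); // [0xAC, 0x02], as in the comment above
console.log(zigzagEncode(-2));  // 3
```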
2. TLV Protocol (Type-Length-Value)
/**
* TLV (Type-Length-Value) — the foundation of many binary protocols.
* Used in: ASN.1/BER, RADIUS, DHCP options, DNS resource records.
*
* Each field is self-describing:
* ┌──────┬────────┬─────────────────┐
* │ Type │ Length │ Value │
* │ 1-2B │ 1-4B │ (Length bytes) │
* └──────┴────────┴─────────────────┘
*/
enum TLVType {
// Wire types
VARINT = 0, // int32, int64, bool, enum
FIXED64 = 1, // fixed64, double
BYTES = 2, // string, bytes, embedded messages
FIXED32 = 5, // fixed32, float
}
interface TLVField {
fieldNumber: number;
wireType: TLVType;
value: number | bigint | Uint8Array | string;
}
class TLVCodec {
/**
* Encode a message as TLV fields.
*
* Field key format (same as Protocol Buffers):
* key = (field_number << 3) | wire_type
* Encoded as varint.
*/
static encode(fields: TLVField[]): ArrayBuffer {
const writer = new BinaryWriter();
for (const field of fields) {
const key = (field.fieldNumber << 3) | field.wireType;
writer.writeVarint(key);
switch (field.wireType) {
case TLVType.VARINT:
writer.writeVarint(field.value as number);
break;
case TLVType.FIXED64:
writer.writeFloat64LE(field.value as number);
break;
case TLVType.BYTES:
if (typeof field.value === 'string') {
writer.writeString(field.value);
} else {
const bytes = field.value as Uint8Array;
writer.writeVarint(bytes.length);
writer.writeByteArray(bytes);
}
break;
case TLVType.FIXED32:
writer.writeUint32LE(field.value as number);
break;
}
}
return writer.finish();
}
/**
* Decode TLV bytes into fields.
* Unknown fields are preserved (forward compatibility!).
*/
static decode(buffer: ArrayBuffer): TLVField[] {
const reader = new BinaryReader(buffer);
const fields: TLVField[] = [];
while (!reader.isEOF()) {
const key = reader.readVarint();
const fieldNumber = key >>> 3;
const wireType = (key & 0x7) as TLVType;
let value: TLVField['value'];
switch (wireType) {
case TLVType.VARINT:
value = reader.readVarint();
break;
case TLVType.FIXED64:
value = reader.readFloat64LE();
break;
case TLVType.BYTES:
value = reader.readLengthPrefixedBytes();
break;
case TLVType.FIXED32:
value = reader.readUint32LE();
break;
default:
throw new Error(`Unknown wire type: ${wireType}`);
}
fields.push({ fieldNumber, wireType, value });
}
return fields;
}
}
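The key packing is worth internalizing, so here is the key math alone as a standalone sketch (no dependency on the classes above):

```typescript
// Pack and unpack the (field_number, wire_type) key used by the TLV codec.
const packKey = (fieldNumber: number, wireType: number): number =>
  (fieldNumber << 3) | wireType;

function unpackKey(key: number): { fieldNumber: number; wireType: number } {
  return { fieldNumber: key >>> 3, wireType: key & 0x7 };
}

console.log(packKey(1, 0));   // 0x08: field 1, VARINT (protobuf's classic first byte)
console.log(unpackKey(0x12)); // { fieldNumber: 2, wireType: 2 }: field 2, BYTES
```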
3. Schema-Driven Protocol (like Protobuf)
/**
* A schema-driven binary protocol with:
* - Field numbers for backward/forward compatibility
* - Nested messages
* - Repeated fields
* - Optional fields (just omit from wire)
* - Default values (not serialized)
*
* Schema definition:
* message User {
* 1: uint32 id
* 2: string name
* 3: string email
* 4: repeated string tags
* 5: Address address // Nested message
* 6: bool active = true // Default value
* }
*/
type FieldType =
| 'uint32' | 'int32' | 'uint64' | 'int64'
| 'float' | 'double'
| 'bool' | 'string' | 'bytes'
| 'message';
interface FieldDef {
number: number;
name: string;
type: FieldType;
repeated?: boolean;
messageSchema?: MessageSchema;
defaultValue?: any;
}
interface MessageSchema {
name: string;
fields: Map<number, FieldDef>;
}
class SchemaSerializer {
private schema: MessageSchema;
constructor(schema: MessageSchema) {
this.schema = schema;
}
/**
* Serialize an object according to schema.
*/
serialize(obj: Record<string, any>): ArrayBuffer {
const writer = new BinaryWriter();
this.writeMessage(writer, obj, this.schema);
return writer.finish();
}
private writeMessage(
writer: BinaryWriter,
obj: Record<string, any>,
schema: MessageSchema
): void {
for (const [fieldNum, fieldDef] of schema.fields) {
const value = obj[fieldDef.name];
// Skip missing or default-valued fields
if (value === undefined || value === null) continue;
if (value === fieldDef.defaultValue) continue;
if (fieldDef.repeated && Array.isArray(value)) {
// Repeated field: write each element with the same field number
for (const item of value) {
this.writeField(writer, fieldNum, fieldDef, item);
}
} else {
this.writeField(writer, fieldNum, fieldDef, value);
}
}
}
private writeField(
writer: BinaryWriter,
fieldNum: number,
fieldDef: FieldDef,
value: any
): void {
switch (fieldDef.type) {
case 'uint32':
case 'int32':
case 'bool': {
const key = (fieldNum << 3) | TLVType.VARINT;
writer.writeVarint(key);
writer.writeVarint(fieldDef.type === 'bool' ? (value ? 1 : 0) : value);
break;
}
case 'int64':
case 'uint64': {
const key = (fieldNum << 3) | TLVType.VARINT;
writer.writeVarint(key);
// NOTE: writeVarint truncates to 32 bits; full 64-bit support needs BigInt varints
writer.writeVarint(Number(value));
break;
}
case 'double': {
const key = (fieldNum << 3) | TLVType.FIXED64;
writer.writeVarint(key);
writer.writeFloat64LE(value);
break;
}
case 'float': {
const key = (fieldNum << 3) | TLVType.FIXED32;
writer.writeVarint(key);
// Bit-cast the float to its IEEE-754 32-bit pattern before writing
const tmp = new DataView(new ArrayBuffer(4));
tmp.setFloat32(0, value, true);
writer.writeUint32LE(tmp.getUint32(0, true));
break;
}
case 'string': {
const key = (fieldNum << 3) | TLVType.BYTES;
writer.writeVarint(key);
writer.writeString(value);
break;
}
case 'bytes': {
const key = (fieldNum << 3) | TLVType.BYTES;
writer.writeVarint(key);
writer.writeVarint(value.length);
writer.writeByteArray(value);
break;
}
case 'message': {
// Serialize nested message to bytes, then write as length-delimited
const nested = new SchemaSerializer(fieldDef.messageSchema!);
const nestedBuf = nested.serialize(value);
const key = (fieldNum << 3) | TLVType.BYTES;
writer.writeVarint(key);
writer.writeVarint(nestedBuf.byteLength);
writer.writeByteArray(new Uint8Array(nestedBuf));
break;
}
}
}
/**
* Deserialize bytes according to schema.
*/
deserialize(buffer: ArrayBuffer): Record<string, any> {
const reader = new BinaryReader(buffer);
return this.readMessage(reader, this.schema, buffer.byteLength);
}
private readMessage(
reader: BinaryReader,
schema: MessageSchema,
length: number
): Record<string, any> {
const result: Record<string, any> = {};
const endOffset = reader.getOffset() + length;
// Initialize defaults
for (const [, fieldDef] of schema.fields) {
if (fieldDef.defaultValue !== undefined) {
result[fieldDef.name] = fieldDef.defaultValue;
}
if (fieldDef.repeated) {
result[fieldDef.name] = [];
}
}
while (reader.getOffset() < endOffset) {
const key = reader.readVarint();
const fieldNum = key >>> 3;
const wireType = key & 0x7;
const fieldDef = schema.fields.get(fieldNum);
if (!fieldDef) {
// Unknown field — skip it (forward compatibility!)
this.skipUnknownField(reader, wireType);
continue;
}
const value = this.readFieldValue(reader, fieldDef, wireType);
if (fieldDef.repeated) {
result[fieldDef.name].push(value);
} else {
result[fieldDef.name] = value;
}
}
return result;
}
private readFieldValue(
reader: BinaryReader,
fieldDef: FieldDef,
wireType: number
): any {
switch (fieldDef.type) {
case 'uint32':
case 'int32':
case 'uint64':
case 'int64':
return reader.readVarint();
case 'bool':
return reader.readVarint() !== 0;
case 'float':
return reader.readFloat32LE();
case 'double':
return reader.readFloat64LE();
case 'string': {
const len = reader.readVarint();
const bytes = reader.readBytes(len);
return new TextDecoder().decode(bytes);
}
case 'bytes': {
const len = reader.readVarint();
return reader.readBytes(len);
}
case 'message': {
const len = reader.readVarint();
return this.readMessage(reader, fieldDef.messageSchema!, len);
}
default:
this.skipUnknownField(reader, wireType);
return undefined;
}
}
private skipUnknownField(reader: BinaryReader, wireType: number): void {
switch (wireType) {
case TLVType.VARINT:
reader.readVarint();
break;
case TLVType.FIXED64:
reader.readBytes(8);
break;
case TLVType.BYTES: {
const len = reader.readVarint();
reader.readBytes(len);
break;
}
case TLVType.FIXED32:
reader.readBytes(4);
break;
}
}
}
/**
* Demo: define and use a schema.
*/
function schemaDemo(): void {
const addressSchema: MessageSchema = {
name: 'Address',
fields: new Map([
[1, { number: 1, name: 'street', type: 'string' }],
[2, { number: 2, name: 'city', type: 'string' }],
[3, { number: 3, name: 'zip', type: 'string' }],
]),
};
const userSchema: MessageSchema = {
name: 'User',
fields: new Map([
[1, { number: 1, name: 'id', type: 'uint32' }],
[2, { number: 2, name: 'name', type: 'string' }],
[3, { number: 3, name: 'email', type: 'string' }],
[4, { number: 4, name: 'tags', type: 'string', repeated: true }],
[5, {
number: 5, name: 'address', type: 'message',
messageSchema: addressSchema,
}],
[6, { number: 6, name: 'active', type: 'bool', defaultValue: true }],
]),
};
const serializer = new SchemaSerializer(userSchema);
const user = {
id: 12345,
name: 'Alice',
email: 'alice@example.com',
tags: ['admin', 'dev'],
address: { street: '123 Main St', city: 'Springfield', zip: '62701' },
active: true, // Won't be serialized (it's the default!)
};
const binary = serializer.serialize(user);
console.log(`JSON size: ${JSON.stringify(user).length} bytes`);
console.log(`Binary size: ${binary.byteLength} bytes`);
console.log(`Ratio: ${(binary.byteLength / JSON.stringify(user).length * 100).toFixed(0)}%`);
const decoded = serializer.deserialize(binary);
console.log('Decoded:', decoded);
}
4. Framing Protocol (like WebSocket/gRPC)
/**
* Message framing — how to send multiple messages over a stream.
* TCP is a byte stream, not a message stream.
* You need to delimit message boundaries.
*
* FRAMING STRATEGIES:
* 1. Length-prefixed (protobuf, gRPC, most binary protocols)
* 2. Delimiter-based (HTTP headers use \r\n\r\n)
* 3. Fixed-size (some network protocols)
*
* We implement length-prefixed framing like gRPC/HTTP2.
*
* Frame format:
* ┌──────────┬──────────┬──────────────────┐
* │ Flags │ Length │ Payload │
* │ (1 byte) │ (4 bytes)│ (Length bytes) │
* └──────────┴──────────┴──────────────────┘
*/
const enum FrameFlag {
NONE = 0x00,
COMPRESSED = 0x01,
END_STREAM = 0x02,
HEARTBEAT = 0x04,
}
interface Frame {
flags: number;
payload: Uint8Array;
}
class FrameCodec {
private readonly maxFrameSize: number;
constructor(maxFrameSize: number = 16 * 1024 * 1024) { // 16MB default
this.maxFrameSize = maxFrameSize;
}
/**
* Encode a single frame.
*/
encodeFrame(payload: Uint8Array, flags: number = FrameFlag.NONE): ArrayBuffer {
const writer = new BinaryWriter(5 + payload.length);
writer.writeUint8(flags);
writer.writeUint32BE(payload.length);
writer.writeByteArray(payload);
return writer.finish();
}
/**
* Streaming frame decoder.
* Handles partial reads (TCP doesn't guarantee complete messages).
*
* This is the core challenge of binary protocol parsing over streams:
* you might receive half a frame, need to buffer it, and continue
* when more data arrives.
*/
createStreamDecoder(): StreamFrameDecoder {
return new StreamFrameDecoder(this.maxFrameSize);
}
}
class StreamFrameDecoder {
private chunks: Uint8Array[] = [];
private totalBuffered: number = 0;
private readonly maxFrameSize: number;
// Parser state machine
private state: 'header' | 'payload' = 'header';
private currentFlags: number = 0;
private currentLength: number = 0;
constructor(maxFrameSize: number) {
this.maxFrameSize = maxFrameSize;
}
/**
* Feed incoming bytes. Returns decoded frames.
* May return 0 frames (incomplete data) or multiple frames.
*/
feed(data: Uint8Array): Frame[] {
this.chunks.push(data);
this.totalBuffered += data.length;
const frames: Frame[] = [];
while (true) {
if (this.state === 'header') {
// Need 5 bytes: 1 (flags) + 4 (length)
if (this.totalBuffered < 5) break;
const header = this.consume(5);
this.currentFlags = header[0];
this.currentLength =
(header[1] << 24) | (header[2] << 16) |
(header[3] << 8) | header[4];
if (this.currentLength > this.maxFrameSize) {
throw new Error(
`Frame too large: ${this.currentLength} > ${this.maxFrameSize}`
);
}
this.state = 'payload';
}
if (this.state === 'payload') {
if (this.totalBuffered < this.currentLength) break;
const payload = this.consume(this.currentLength);
frames.push({
flags: this.currentFlags,
payload,
});
this.state = 'header';
}
}
return frames;
}
/**
* Consume exactly `n` bytes from the buffer.
*/
private consume(n: number): Uint8Array {
const result = new Uint8Array(n);
let copied = 0;
while (copied < n) {
const chunk = this.chunks[0];
const needed = n - copied;
if (chunk.length <= needed) {
result.set(chunk, copied);
copied += chunk.length;
this.chunks.shift();
} else {
result.set(chunk.subarray(0, needed), copied);
this.chunks[0] = chunk.subarray(needed);
copied += needed;
}
}
this.totalBuffered -= n;
return result;
}
}
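To make the frame layout concrete, here is a standalone encoder for the 5-byte header format above (flags byte plus big-endian length), independent of the classes in this section:

```typescript
// Build one frame: [flags:1][length:4 BE][payload], matching the format above.
function encodeFrameBytes(payload: Uint8Array, flags = 0): Uint8Array {
  const frame = new Uint8Array(5 + payload.length);
  frame[0] = flags;
  new DataView(frame.buffer).setUint32(1, payload.length, false); // big-endian
  frame.set(payload, 5);
  return frame;
}

const f = encodeFrameBytes(new TextEncoder().encode('hi'), 0x02 /* END_STREAM */);
console.log(Array.from(f)); // [2, 0, 0, 0, 2, 104, 105]
```

Feeding these bytes into StreamFrameDecoder, even split one byte at a time, yields the same single frame back.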
5. Complete RPC Protocol
/**
* A complete binary RPC protocol (simplified gRPC-like).
*
* Message types:
* REQUEST: [msgType=1] [requestId] [methodLen] [method] [payloadLen] [payload]
* RESPONSE: [msgType=2] [requestId] [statusCode] [payloadLen] [payload]
* STREAM: [msgType=3] [streamId] [payloadLen] [payload]
* PING: [msgType=4] [timestamp]
* PONG: [msgType=5] [timestamp]
*/
enum RPCMessageType {
REQUEST = 1,
RESPONSE = 2,
STREAM = 3,
PING = 4,
PONG = 5,
}
enum RPCStatus {
OK = 0,
ERROR = 1,
NOT_FOUND = 2,
INVALID_ARGUMENT = 3,
TIMEOUT = 4,
INTERNAL_ERROR = 5,
}
interface RPCRequest {
type: RPCMessageType.REQUEST;
requestId: number;
method: string;
payload: Uint8Array;
}
interface RPCResponse {
type: RPCMessageType.RESPONSE;
requestId: number;
status: RPCStatus;
payload: Uint8Array;
}
interface RPCStream {
type: RPCMessageType.STREAM;
streamId: number;
payload: Uint8Array;
}
interface RPCPing {
type: RPCMessageType.PING;
timestamp: number;
}
interface RPCPong {
type: RPCMessageType.PONG;
timestamp: number;
}
type RPCMessage = RPCRequest | RPCResponse | RPCStream | RPCPing | RPCPong;
class RPCCodec {
static encode(msg: RPCMessage): ArrayBuffer {
const writer = new BinaryWriter();
writer.writeUint8(msg.type);
switch (msg.type) {
case RPCMessageType.REQUEST:
writer.writeUint32LE(msg.requestId);
writer.writeString(msg.method);
writer.writeVarint(msg.payload.length);
writer.writeByteArray(msg.payload);
break;
case RPCMessageType.RESPONSE:
writer.writeUint32LE(msg.requestId);
writer.writeUint8(msg.status);
writer.writeVarint(msg.payload.length);
writer.writeByteArray(msg.payload);
break;
case RPCMessageType.STREAM:
writer.writeUint32LE(msg.streamId);
writer.writeVarint(msg.payload.length);
writer.writeByteArray(msg.payload);
break;
case RPCMessageType.PING:
case RPCMessageType.PONG:
writer.writeFloat64LE(msg.timestamp);
break;
}
return writer.finish();
}
static decode(buffer: ArrayBuffer): RPCMessage {
const reader = new BinaryReader(buffer);
const type = reader.readUint8() as RPCMessageType;
switch (type) {
case RPCMessageType.REQUEST:
return {
type,
requestId: reader.readUint32LE(),
method: reader.readString(),
payload: reader.readLengthPrefixedBytes(),
};
case RPCMessageType.RESPONSE:
return {
type,
requestId: reader.readUint32LE(),
status: reader.readUint8() as RPCStatus,
payload: reader.readLengthPrefixedBytes(),
};
case RPCMessageType.STREAM:
return {
type,
streamId: reader.readUint32LE(),
payload: reader.readLengthPrefixedBytes(),
};
case RPCMessageType.PING:
return { type, timestamp: reader.readFloat64LE() };
case RPCMessageType.PONG:
return { type, timestamp: reader.readFloat64LE() };
default:
throw new Error(`Unknown message type: ${type}`);
}
}
}
/**
* Full RPC client with request tracking and timeouts.
*/
class RPCClient {
private nextId: number = 1;
private pending: Map<number, {
resolve: (resp: RPCResponse) => void;
reject: (err: Error) => void;
timer: ReturnType<typeof setTimeout>;
}> = new Map();
private frameCodec = new FrameCodec();
private decoder = this.frameCodec.createStreamDecoder();
async call(
method: string,
payload: Uint8Array,
timeout: number = 5000
): Promise<RPCResponse> {
const requestId = this.nextId++;
const request: RPCRequest = {
type: RPCMessageType.REQUEST,
requestId,
method,
payload,
};
const encoded = RPCCodec.encode(request);
const frame = this.frameCodec.encodeFrame(new Uint8Array(encoded));
// Send frame over transport (TCP, WebSocket, etc.)
this.send(new Uint8Array(frame));
return new Promise((resolve, reject) => {
const timer = setTimeout(() => {
this.pending.delete(requestId);
reject(new Error(`RPC timeout: ${method} (${timeout}ms)`));
}, timeout);
this.pending.set(requestId, { resolve, reject, timer });
});
}
/**
* Called when bytes arrive from the transport.
*/
onData(data: Uint8Array): void {
const frames = this.decoder.feed(data);
for (const frame of frames) {
const msg = RPCCodec.decode(frame.payload.buffer); // safe: consume() allocates an exact-sized buffer
if (msg.type === RPCMessageType.RESPONSE) {
const pending = this.pending.get(msg.requestId);
if (pending) {
clearTimeout(pending.timer);
this.pending.delete(msg.requestId);
if (msg.status === RPCStatus.OK) {
pending.resolve(msg);
} else {
pending.reject(
new Error(`RPC error: status=${RPCStatus[msg.status]}`)
);
}
}
}
if (msg.type === RPCMessageType.PING) {
// Respond with PONG
const pong = RPCCodec.encode({
type: RPCMessageType.PONG,
timestamp: msg.timestamp,
});
this.send(new Uint8Array(this.frameCodec.encodeFrame(new Uint8Array(pong))));
}
}
}
private send(data: Uint8Array): void {
// Placeholder: in production, write to TCP socket / WebSocket
console.log(`[RPC] Sending ${data.length} bytes`);
}
}
Protocol Comparison
| Protocol | Encoding | Schema | Size | Parse Speed | Streaming | Human Readable |
|---|---|---|---|---|---|---|
| JSON | Text | No | Largest | Slow | No | Yes |
| MessagePack | Binary | No | ~60% of JSON | Fast | No | No |
| Protobuf | Binary TLV | Yes (.proto) | ~30% of JSON | Very Fast | Yes (gRPC) | No |
| FlatBuffers | Binary (zero-copy) | Yes (.fbs) | ~35% of JSON | Instant | No | No |
| Cap'n Proto | Binary (zero-copy) | Yes | ~35% of JSON | Instant | Yes | No |
| Avro | Binary | Yes (.avsc) | ~30% of JSON | Fast | Yes | No |
| CBOR | Binary | No | ~50% of JSON | Fast | No | No |
Schema Evolution Rules
SAFE SCHEMA CHANGES (backward + forward compatible):
✅ ADD optional field (new number)
Old reader: ignores unknown field number
New reader: uses default if field missing
✅ REMOVE optional field
Old reader: still reads, just ignores if missing
New reader: skips unknown field
✅ RENAME field (field number stays the same)
Wire format uses numbers, not names
✅ Change int32 ↔ int64 (widening)
Varint encoding handles both
UNSAFE CHANGES:
❌ CHANGE field number
Old and new readers see different fields!
❌ Change wire type (e.g., string → int)
Parser will misinterpret bytes
❌ Make optional → required
Old writers don't send it; new reader rejects
❌ Reuse deleted field number
Old data with that number → wrong semantics
PROTOBUF BEST PRACTICES:
- Never reuse field numbers (use `reserved`)
- Start field numbers at 1 (1-15 use 1-byte key)
- Use field numbers 1-15 for frequently-set fields
- All fields are effectively optional in proto3
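The 1-15 rule follows directly from the key encoding: `(15 << 3) | 7 = 127` is the largest key that still fits in one varint byte. A quick standalone check, assuming the protobuf key format:

```typescript
// Size in bytes of a varint-encoded field key.
function keySize(fieldNumber: number, wireType: number): number {
  let key = ((fieldNumber << 3) | wireType) >>> 0;
  let bytes = 1;
  while (key > 0x7f) { key >>>= 7; bytes++; }
  return bytes;
}

console.log(keySize(15, 7)); // 1: largest field number with a 1-byte key
console.log(keySize(16, 0)); // 2: field 16 already needs two key bytes
```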
Interview Questions & Answers
Q: How would you design a binary protocol for a real-time multiplayer game?
Key requirements: minimal latency, small packets (fit in MTU ~1400 bytes), tolerant of packet loss (UDP). Design: (1) Fixed header: 1-byte message type + 2-byte sequence number + 4-byte timestamp (7 bytes). (2) Delta encoding: send only changes since last acknowledged state, not full state. (3) Bit packing: a player position (x,y,z) doesn't need float64 — quantize to 16-bit fixed-point (0.01 unit precision in a 655-unit world). That's 6 bytes instead of 24. (4) No length-prefixed framing: each UDP datagram is one message. (5) No varints for hot fields: fixed-size fields are faster to parse (direct offset, no loops). Varints save space but cost parse time. (6) Prediction + correction: client predicts, server sends corrections only when prediction error exceeds threshold. Total packet: 20-100 bytes for typical state update, supporting 60Hz tick rate.
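The quantization idea in point (3) can be sketched as follows; the 0.01-unit precision and 16-bit width are the assumptions stated above:

```typescript
// Quantize a coordinate to 16-bit fixed point with 0.01-unit precision.
// Representable range is 0..655.35; 2 bytes instead of 8 for a float64.
function quantize(coord: number): number {
  const q = Math.round(coord * 100);
  return Math.max(0, Math.min(0xffff, q)); // clamp to uint16
}

const dequantize = (q: number): number => q / 100;

console.log(quantize(123.456));             // 12346
console.log(dequantize(quantize(123.456))); // 123.46 (within the ±0.005 error budget)
```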
Q: What is zero-copy parsing and when should you use it?
Zero-copy parsing means reading data directly from the receive buffer without copying or allocating new objects. FlatBuffers and Cap'n Proto serialize data in a format that IS the in-memory representation — to "parse" a FlatBuffer, you just cast the buffer pointer and start reading. Benefits: no parse step (instant access), no memory allocation (no GC pressure), great for large messages. Tradeoffs: (1) Random access is fast, but sequential scan may be cache-unfriendly (data isn't compact). (2) The buffer must be kept alive as long as any reference exists. (3) Byte order must match (typically little-endian). (4) Schema changes are constrained (can't reorder fields). Use it for: large messages, latency-sensitive systems (game engines, real-time analytics), or when you only need a few fields from a large message (access is O(1), not O(n) like Protobuf).
Q: How does Protocol Buffers achieve backward and forward compatibility?
Via field numbers + wire types. Each field is identified by a number, not a name. On the wire, each field's key is `(field_number << 3) | wire_type`, encoded as a varint and followed by the value. For backward compatibility (new reader, old data): new fields have defaults, so missing fields get default values. For forward compatibility (old reader, new data): unknown field numbers are skipped, using the wire type to determine how many bytes to skip (varint: read until no continuation bit; length-delimited: read length then skip; fixed32/64: skip 4/8 bytes). This means producers and consumers can be updated independently, which is critical for microservices and mobile apps where you can't deploy all clients simultaneously.
Real-World Problems & How to Solve Them
Problem 1: Parser crashes on truncated packets
Symptom: Production logs show RangeError or random decode values near network boundaries.
Root cause: Reader methods advance cursor without checking remaining bytes before each primitive read.
Fix — add explicit bounds checks in low-level readers:
class SafeBinaryReader {
private view: DataView;
private offset = 0;
constructor(private readonly buffer: ArrayBuffer) {
this.view = new DataView(buffer);
}
private ensure(bytes: number): void {
if (this.offset + bytes > this.buffer.byteLength) {
throw new Error(`Truncated buffer: need ${bytes} bytes`);
}
}
readUint32LE(): number {
this.ensure(4);
const value = this.view.getUint32(this.offset, true);
this.offset += 4;
return value;
}
}
Problem 2: Parsed numbers look byte-swapped
Symptom: Message length 1024 becomes 262144 or IDs are wildly incorrect.
Root cause: Producer writes little-endian while parser reads big-endian (or vice versa).
Fix — lock endianness in protocol header and use one read path:
interface FrameHeader {
magic: number;
version: number;
length: number;
}
function readHeaderLE(reader: BinaryReader): FrameHeader {
const magic = reader.readUint16LE();
if (magic !== 0xCAFE) throw new Error('bad magic');
const version = reader.readUint8();
const length = reader.readUint32LE();
return { magic, version, length };
}
Problem 3: Malformed varints hang decode loop
Symptom: A single bad packet pegs CPU and the parser never returns.
Root cause: Varint decoder loops until continuation bit clears, but malformed input can keep it set forever.
Fix — cap bytes and fail fast on overlong varints:
function readVarintSafe(reader: BinaryReader): number {
let value = 0;
let shift = 0;
for (let i = 0; i < 5; i++) {
const b = reader.readUint8();
value |= (b & 0x7f) << shift;
if ((b & 0x80) === 0) return value >>> 0;
shift += 7;
}
throw new Error('Invalid varint: exceeds 5 bytes for uint32');
}
Problem 4: TCP packet splitting breaks parser state
Symptom: Decoder fails intermittently even though payloads are valid when inspected in full.
Root cause: Code assumes each socket data event contains one whole message; TCP is a byte stream.
Fix — add a frame accumulator with length-prefix decoding:
class LengthPrefixedFramer {
private pending = new Uint8Array(0);
push(chunk: Uint8Array): Uint8Array[] {
const merged = new Uint8Array(this.pending.length + chunk.length);
merged.set(this.pending, 0);
merged.set(chunk, this.pending.length);
const frames: Uint8Array[] = [];
let offset = 0;
while (offset + 4 <= merged.length) {
const len = new DataView(merged.buffer, merged.byteOffset + offset, 4).getUint32(0, true);
if (offset + 4 + len > merged.length) break;
frames.push(merged.subarray(offset + 4, offset + 4 + len));
offset += 4 + len;
}
this.pending = merged.subarray(offset);
return frames;
}
}
Problem 5: High GC pauses under throughput spikes
Symptom: CPU time is spent in garbage collection, not parsing.
Root cause: Decoder repeatedly copies payload slices (buffer.slice) instead of using views.
Fix — use zero-copy Uint8Array views over the original buffer:
function readBytesView(
buffer: ArrayBuffer,
byteOffset: number,
length: number
): Uint8Array {
return new Uint8Array(buffer, byteOffset, length); // zero-copy view
}
// Avoid:
// buffer.slice(byteOffset, byteOffset + length) // allocates + copies
Problem 6: Nested message parser overreads parent frame
Symptom: Decoding one nested field corrupts subsequent top-level fields.
Root cause: Length-delimited submessages are parsed against the parent reader with no strict boundary.
Fix — parse nested sections with bounded sub-readers and consumption checks:
function parseNestedMessage(parent: BinaryReader): Record<string, unknown> {
const length = parent.readVarint();
const start = parent.getOffset();
const nested = parent.subReader(length);
const result: Record<string, unknown> = {
code: nested.readUint16LE(),
value: nested.readString(),
};
if (!nested.isEOF()) {
throw new Error('Nested message has trailing bytes');
}
parent.setOffset(start + length);
return result;
}
Problem 7: Unknown TLV fields crash older clients
Symptom: New server releases break old mobile clients with “unknown type” errors.
Root cause: Parser treats unknown field types as fatal instead of skipping length-delimited payloads.
Fix — implement skip logic for unknown-but-well-formed fields:
function parseTLVField(reader: BinaryReader): { type: number; value: Uint8Array } | null {
if (reader.isEOF()) return null;
const type = reader.readUint8();
const length = reader.readUint16LE();
// Known types: 1..5
if (type >= 1 && type <= 5) {
return { type, value: reader.readBytes(length) };
}
reader.setOffset(reader.getOffset() + length); // skip unknown type
return null;
}
Key Takeaways
BINARY PROTOCOL DESIGN CHECKLIST:
┌─ WIRE FORMAT ──────────────────────────────┐
│ □ Endianness decided (LE is standard) │
│ □ Integer encoding (fixed vs varint) │
│ □ String encoding (length-prefixed UTF-8) │
│ □ Null/optional representation │
└────────────────────────────────────────────┘
┌─ FRAMING ──────────────────────────────────┐
│ □ Message boundaries (length-prefix) │
│ □ Max message size limit │
│ □ Streaming/chunking support │
│ □ Partial read handling (state machine) │
└────────────────────────────────────────────┘
┌─ EVOLUTION ────────────────────────────────┐
│ □ Field numbers (not names) on wire │
│ □ Unknown field skipping │
│ □ Default values for new fields │
│ □ Never reuse field numbers │
└────────────────────────────────────────────┘
┌─ PERFORMANCE ──────────────────────────────┐
│ □ Fixed-size hot path fields │
│ □ Zero-copy where possible │
│ □ Varint for variable-range cold fields │
│ □ Minimize allocations (buffer reuse) │
└────────────────────────────────────────────┘