Binary Protocol Design: From Protobuf to Custom Wire Formats
JSON is the lingua franca of web APIs, but it's a poor fit for performance-critical paths. A 100-byte JSON payload often shrinks to roughly 30 bytes in Protocol Buffers. Parsing JSON means scanning text and allocating strings; decoding protobuf is cheap tag-and-length field extraction, and formats like FlatBuffers skip the decode step entirely. When you're sending millions of messages per second or operating on constrained devices, wire format design becomes a core architectural decision.
This article dissects how binary protocols work — from the encoding schemes of Protocol Buffers and FlatBuffers to designing your own wire format — with a focus on the tradeoffs that matter: parse speed, encode speed, message size, schema evolution, and implementation complexity.
Why Binary Protocols: The JSON Tax
// JSON encoding of a simple message
const user = {
id: 12345,
name: "Alice",
email: "alice@example.com",
roles: ["admin", "user"],
lastLogin: 1709654321
};
const jsonStr = JSON.stringify(user);
// 103 bytes: {"id":12345,"name":"Alice","email":"alice@example.com","roles":["admin","user"],"lastLogin":1709654321}
┌─────────────────────────────────────────────────────────────────────────┐
│ JSON OVERHEAD ANALYSIS │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ STRUCTURAL OVERHEAD: │
│ • Braces, brackets, colons, commas: 15+ bytes per object │
│ • Quoted field names repeated every message │
│ • Numbers as ASCII: "12345" = 5 bytes vs 2 bytes binary │
│ • No native binary data (must base64: 33% expansion) │
│ │
│ PARSING OVERHEAD: │
│ • Full text scan required (can't skip to field N) │
│ • Number parsing: atoi for every integer │
│ • String allocation for every value │
│ • Object structure rebuilt on every parse │
│ │
│ SCHEMA OVERHEAD: │
│ • No type information (runtime type checking needed) │
│ • Field names transmitted repeatedly │
│ • Optional fields require existence checks │
│ │
│ JSON: 103 bytes │
│ Protobuf: ~35 bytes (~66% smaller) │
│ FlatBuffers: ~60 bytes (buffer overhead, but zero-copy access) │
│ MessagePack: ~50 bytes (JSON semantics, binary encoding) │
│ │
└─────────────────────────────────────────────────────────────────────────┘
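The JSON figure can be checked directly; `TextEncoder` gives the UTF-8 byte length using only standard platform APIs:

```typescript
// Measure the wire size of the example payload from above.
const user = {
  id: 12345,
  name: "Alice",
  email: "alice@example.com",
  roles: ["admin", "user"],
  lastLogin: 1709654321,
};
const jsonBytes = new TextEncoder().encode(JSON.stringify(user));
console.log(jsonBytes.length); // 103
```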
Protocol Buffers: The Industry Standard
Protocol Buffers (protobuf) is Google's language-neutral, schema-driven serialization format. Understanding its wire format reveals the core principles of efficient binary encoding.
The Wire Format
Every protobuf message is a sequence of key-value pairs:
message = *field
field = key value
key = (field_number << 3) | wire_type // varint encoded
wire_types:
0 = VARINT (int32, int64, uint32, uint64, sint32, sint64, bool, enum)
1 = I64 (fixed64, sfixed64, double)
2 = LEN (string, bytes, embedded messages, packed repeated fields)
5 = I32 (fixed32, sfixed32, float)
┌─────────────────────────────────────────────────────────────────────────┐
│ PROTOBUF WIRE FORMAT ENCODING │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ message User { │
│ uint64 id = 1; │
│ string name = 2; │
│ repeated string roles = 3; │
│ } │
│ │
│ User { id: 12345, name: "Alice", roles: ["admin", "user"] } │
│ │
│ ENCODED BYTES: │
│ ┌────┬────────┬────┬───────────────┬────┬────────────┬────┬──────────┐│
│ │ 08 │ B9 60 │ 12 │ 05 41 6C 69 │ 1A │ 05 61 64 │ 1A │ 04 75 73 ││
│ │ │ │ │ 63 65 │ │ 6D 69 6E │ │ 65 72 ││
│ └────┴────────┴────┴───────────────┴────┴────────────┴────┴──────────┘│
│ │ │ │ │ │ │ │ │ │
│ │ │ │ │ │ │ │ │ │
│ │ │ │ │ │ │ └───────┴─"user"│
│ │ │ │ │ │ └─"admin" │
│ │ │ │ │ │ │
│ │ │ │ │ └─ 1A = field 3, wire type 2 (LEN)│
│ │ │ │ │ │
│ │ │ │ └─"Alice" (length-prefixed) │
│ │ │ │ │
│ │ │ └─ 12 = field 2 (0010), wire type 2 (010) │
│ │ │ (2 << 3) | 2 = 18 = 0x12 │
│ │ │ │
│ │ └─ 12345 as varint: B9 60 (little-endian 7-bit groups) │
│ │ │
│ └─ 08 = field 1 (0001), wire type 0 (000) │
│ (1 << 3) | 0 = 8 = 0x08 │
│ │
│ VARINT ENCODING (for 12345): │
│ 12345 = 0x3039 = 0011 0000 0011 1001 │
│ Split into 7-bit groups (low group first): 0111001, 1100000 │
│ Add continuation bits: 1_0111001 0_1100000 │
│ Little-endian: B9 60 (0xB9 = 185, 0x60 = 96) │
│ Decode: (0x39) | (0x60 << 7) = 57 | 12288 = 12345 ✓ │
│ │
└─────────────────────────────────────────────────────────────────────────┘
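The tag bytes in the diagram come straight from the key formula; a quick check:

```typescript
// Tag byte = (field_number << 3) | wire_type; fits in one byte for fields 1-15.
function tagByte(fieldNumber: number, wireType: number): number {
  return (fieldNumber << 3) | wireType;
}
console.log(tagByte(1, 0).toString(16)); // "8"  -> 0x08, id (varint)
console.log(tagByte(2, 2).toString(16)); // "12" -> 0x12, name (LEN)
console.log(tagByte(3, 2).toString(16)); // "1a" -> 0x1a, roles (LEN)
```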
Varint Encoding Implementation
// Encode unsigned integer to varint bytes
function encodeVarint(value: number): Uint8Array {
const bytes: number[] = [];
while (value > 0x7f) {
// Take lowest 7 bits, set continuation bit (0x80)
bytes.push((value & 0x7f) | 0x80);
value >>>= 7;
}
// Last byte: no continuation bit
bytes.push(value & 0x7f);
return new Uint8Array(bytes);
}
// Decode varint from buffer at offset, returns [value, newOffset]
function decodeVarint(buffer: Uint8Array, offset: number): [number, number] {
let result = 0;
let shift = 0;
while (true) {
const byte = buffer[offset++];
result |= (byte & 0x7f) << shift;
if ((byte & 0x80) === 0) {
// No continuation bit — done
return [result, offset];
}
shift += 7;
if (shift >= 35) {
throw new Error('Varint too long');
}
}
}
// Signed integers use ZigZag encoding
// Maps negative numbers to positive: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
function encodeZigZag(value: number): number {
return (value << 1) ^ (value >> 31);
}
function decodeZigZag(value: number): number {
return (value >>> 1) ^ -(value & 1);
}
Schema Evolution Rules
Protobuf's key feature is forward and backward compatible schema evolution:
┌─────────────────────────────────────────────────────────────────────────┐
│ PROTOBUF SCHEMA EVOLUTION │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ SAFE CHANGES (backward + forward compatible): │
│ ──────────────────────────────────────────── │
│ ✓ Add new optional/repeated fields (old code ignores unknown fields) │
│ ✓ Remove optional/repeated fields (new code uses defaults) │
│ ✓ Rename fields (only field NUMBER matters, name is for humans) │
│ ✓ Change int32 ↔ int64, uint32 ↔ uint64 (wire-compatible) │
│ ✓ Add values to enum (old code sees numeric value) │
│ │
│ UNSAFE CHANGES (will break): │
│ ──────────────────────────── │
│ ✗ Change field number │
│ ✗ Change wire type (int32 → string) │
│ ✗ Change repeated ↔ singular (packed encoding differs) │
│ ✗ Change meaning of field while keeping number │
│ ✗ Make optional field required (old data may lack it) │
│ │
│ FIELD NUMBER ALLOCATION STRATEGY: │
│ │
│ message User { │
│ // Core fields: 1-15 (1-byte key encoding) │
│ uint64 id = 1; │
│ string name = 2; │
│ string email = 3; │
│ │
│ // Less common fields: 16-2047 (2-byte key encoding) │
│ optional string phone = 16; │
│ optional bytes avatar = 17; │
│ │
│ // Reserved for future / deprecated │
│ reserved 100 to 199; │
│ reserved "old_field_name"; │
│ │
│ // Extensions (if using proto2) │
│ extensions 1000 to max; │
│ } │
│ │
│ Fields 1-15: 1-byte tag (field << 3 | type fits in 1 byte) │
│ Fields 16-2047: 2-byte tag │
│ Reserve low numbers for frequently-used fields │
│ │
└─────────────────────────────────────────────────────────────────────────┘
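The safe-change rules work because the wire type in each tag tells even an old decoder how many bytes to skip for a field it has never heard of. A minimal sketch of that skip logic, following the varint conventions above:

```typescript
// Skip one field's value given its wire type; returns the offset past it.
function skipField(buf: Uint8Array, offset: number, wireType: number): number {
  switch (wireType) {
    case 0: { // VARINT: advance past continuation bytes
      while (buf[offset] & 0x80) offset++;
      return offset + 1;
    }
    case 1: // I64: fixed 8 bytes
      return offset + 8;
    case 2: { // LEN: varint length prefix, then that many payload bytes
      let len = 0, shift = 0, b = 0;
      do {
        b = buf[offset++];
        len |= (b & 0x7f) << shift;
        shift += 7;
      } while (b & 0x80);
      return offset + len;
    }
    case 5: // I32: fixed 4 bytes
      return offset + 4;
    default:
      throw new Error(`Cannot skip unknown wire type ${wireType}`);
  }
}
```

An old decoder reads the tag, sees an unknown field number, calls something like `skipField`, and keeps going — which is exactly why adding new fields is a safe change.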
FlatBuffers: Zero-Copy Access
FlatBuffers (from Google) takes a different approach: the serialized buffer IS the in-memory representation. No parsing step — you read directly from the buffer.
┌─────────────────────────────────────────────────────────────────────────┐
│ FLATBUFFERS MEMORY LAYOUT │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ schema User { │
│ id: uint64; │
│ name: string; │
│ email: string; │
│ } │
│ │
│ BUFFER LAYOUT (little-endian): │
│ │
│ Offset 0 ┌────────────────┐ │
│ │ Root Table │ ← Offset to root table │
│ │ Offset (4B) │ │
│ Offset 4 ├────────────────┤ │
│ │ File ID (opt) │ ← Optional file identifier │
│ │ (4B) │ │
│ Offset 8 ├────────────────┤ │
│ │ VTable │ ← Field offset table │
│ │ ┌────────────┐ │ │
│ │ │ vtable_size│ │ 2 bytes: size of vtable │
│ │ │ (2B) │ │ │
│ │ ├────────────┤ │ │
│ │ │ object_size│ │ 2 bytes: size of object │
│ │ │ (2B) │ │ │
│ │ ├────────────┤ │ │
│ │ │ field 0 off│ │ 2 bytes: offset of id from obj start │
│ │ ├────────────┤ │ │
│ │ │ field 1 off│ │ 2 bytes: offset of name offset │
│ │ ├────────────┤ │ │
│ │ │ field 2 off│ │ 2 bytes: offset of email offset │
│ │ └────────────┘ │ │
│ Offset N ├────────────────┤ │
│ │ User Object │ │
│ │ ┌────────────┐ │ │
│ │ │ vtable off │ │ 4 bytes: negative offset to vtable │
│ │ ├────────────┤ │ │
│ │ │ id (8B) │ │ Inline: 8 bytes for uint64 │
│ │ ├────────────┤ │ │
│ │ │ name off │ │ 4 bytes: offset to string │
│ │ ├────────────┤ │ │
│ │ │ email off │ │ 4 bytes: offset to string │
│ │ └────────────┘ │ │
│ Offset M ├────────────────┤ │
│ │ "Alice" string │ │
│ │ ┌────────────┐ │ │
│ │ │ length (4B)│ │ │
│ │ ├────────────┤ │ │
│ │ │ "Alice\0" │ │ null-terminated for C compat │
│ │ └────────────┘ │ │
│ └────────────────┘ │
│ │
│ ACCESS PATTERN (no parsing!): │
│ user.name(): │
│ 1. Read vtable offset from object start │
│ 2. Look up field 1 offset in vtable │
│ 3. Add to object position to get string offset location │
│ 4. Read string offset, follow it │
│ 5. Read length, return pointer to string data │
│ │
│ All operations are pointer arithmetic + memory reads. │
│ No allocations. No copies. O(1) field access. │
│ │
└─────────────────────────────────────────────────────────────────────────┘
FlatBuffers Access Code
// Generated accessor pattern (simplified)
class User {
private bb!: ByteBuffer; // definite-assignment: set in getRootAsUser
private bb_pos!: number;
static getRootAsUser(buffer: ByteBuffer): User {
// Root table offset is at position 0
const offset = buffer.readInt32(0);
const user = new User();
user.bb = buffer;
user.bb_pos = offset;
return user;
}
// Read vtable offset (stored as negative offset at object start)
private __offset(vtableOffset: number): number {
const vtable = this.bb_pos - this.bb.readInt32(this.bb_pos);
return vtableOffset < this.bb.readInt16(vtable)
? this.bb.readInt16(vtable + vtableOffset)
: 0;
}
id(): bigint {
const offset = this.__offset(4); // Field 0 at vtable offset 4
return offset ? this.bb.readUint64(this.bb_pos + offset) : 0n;
}
name(): string | null {
const offset = this.__offset(6); // Field 1 at vtable offset 6
if (!offset) return null;
// Read the offset to string, then read string at that location
const stringOffset = this.bb_pos + offset;
const stringLocation = stringOffset + this.bb.readInt32(stringOffset);
const length = this.bb.readInt32(stringLocation);
// Return view into existing buffer — no copy!
return this.bb.readStringAt(stringLocation + 4, length);
}
}
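To make the indirection concrete, here is a hand-assembled miniature buffer — one table with a single inline uint32 field — and the same two-hop read the generated accessor performs. The offsets are chosen for this toy example; real builders also align fields and deduplicate shared vtables:

```typescript
const buf = new ArrayBuffer(18);
const view = new DataView(buf);
// Root offset, vtable, then the table itself (all little-endian):
view.setUint32(0, 10, true);     // root: table starts at byte 10
view.setUint16(4, 6, true);      // vtable size: 6 bytes (2 header shorts + 1 field)
view.setUint16(6, 8, true);      // table size: 4-byte soffset + 4-byte field
view.setUint16(8, 4, true);      // field 0 lives 4 bytes into the table
view.setInt32(10, 6, true);      // table header: soffset back to vtable (10 - 6 = 4)
view.setUint32(14, 12345, true); // field 0 value, stored inline

// Two-hop read, mirroring the __offset() logic above:
const table = view.getUint32(0, true);              // 10
const vtable = table - view.getInt32(table, true);  // 10 - 6 = 4
const fieldOff = view.getUint16(vtable + 4, true);  // 4
console.log(view.getUint32(table + fieldOff, true)); // 12345
```

Every read is arithmetic on positions inside the original buffer — nothing is parsed into intermediate objects.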
When to Use FlatBuffers vs Protobuf
┌─────────────────────────────────────────────────────────────────────────┐
│ FLATBUFFERS vs PROTOBUF DECISION MATRIX │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ USE FLATBUFFERS WHEN: │
│ • Zero-copy access is critical (game state, video frames) │
│ • Messages are large and you only read a few fields │
│ • Memory allocation is expensive (embedded, real-time) │
│ • Data is memory-mapped from files │
│ • Same data read multiple times without modification │
│ │
│ USE PROTOBUF WHEN: │
│ • Message size is critical (protobuf is ~30% smaller) │
│ • You need to read most/all fields anyway │
│ • You modify data after reading (FlatBuffers are immutable) │
│ • Streaming / incremental parsing needed │
│ • Ecosystem / tooling matters (protobuf is more mature) │
│ │
│ PERFORMANCE CHARACTERISTICS: │
│ │
│ Operation │ Protobuf │ FlatBuffers │
│ ───────────────────┼───────────────┼───────────────────────────────────│
│ Encode │ Moderate │ Faster (just memcpy structured) │
│ Decode (full) │ O(n) parse │ None (zero-copy) │
│ Decode (1 field) │ O(n) scan │ O(1) pointer chase │
│ Wire size │ Smaller │ Larger (offsets, padding) │
│ Memory after parse │ Native objects│ Buffer reference │
│ Mutation │ Easy │ Rebuild required │
│ Random access │ After parse │ Immediate │
│ │
└─────────────────────────────────────────────────────────────────────────┘
MessagePack: Binary JSON
MessagePack is "JSON but binary" — same data model, smaller encoding:
// MessagePack type prefixes
const FORMATS = {
// Positive fixint: 0x00 - 0x7f (value is the byte itself)
POSITIVE_FIXINT_MAX: 0x7f,
// Fixmap: 0x80 - 0x8f (lower 4 bits = element count)
FIXMAP: 0x80,
// Fixarray: 0x90 - 0x9f
FIXARRAY: 0x90,
// Fixstr: 0xa0 - 0xbf (lower 5 bits = length)
FIXSTR: 0xa0,
// nil, false, true
NIL: 0xc0,
FALSE: 0xc2,
TRUE: 0xc3,
// Binary data
BIN8: 0xc4, // + 1 byte length + data
BIN16: 0xc5, // + 2 byte length + data
BIN32: 0xc6, // + 4 byte length + data
// Floats
FLOAT32: 0xca, // + 4 bytes IEEE 754
FLOAT64: 0xcb, // + 8 bytes IEEE 754
// Unsigned integers
UINT8: 0xcc, // + 1 byte
UINT16: 0xcd, // + 2 bytes
UINT32: 0xce, // + 4 bytes
UINT64: 0xcf, // + 8 bytes
// Signed integers
INT8: 0xd0,
INT16: 0xd1,
INT32: 0xd2,
INT64: 0xd3,
// Strings
STR8: 0xd9, // + 1 byte length
STR16: 0xda, // + 2 byte length
STR32: 0xdb, // + 4 byte length
// Arrays
ARRAY16: 0xdc,
ARRAY32: 0xdd,
// Maps
MAP16: 0xde,
MAP32: 0xdf,
// Negative fixint: 0xe0 - 0xff (value is byte - 256)
NEGATIVE_FIXINT_MIN: 0xe0,
};
┌─────────────────────────────────────────────────────────────────────────┐
│ MESSAGEPACK ENCODING EXAMPLE │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ Object: { id: 12345, name: "Alice", active: true } │
│ │
│ JSON: {"id":12345,"name":"Alice","active":true} │
│ 41 bytes │
│ │
│ MessagePack: │
│ ┌────┬────────────────────────────────────────────────────────────┐ │
│ │ 83 │ fixmap with 3 elements (0x80 | 3) │ │
│ ├────┼────────────────────────────────────────────────────────────┤ │
│ │ A2 │ fixstr length 2 (0xA0 | 2) │ │
│ │ 69 │ 'i' │ │
│ │ 64 │ 'd' │ │
│ ├────┼────────────────────────────────────────────────────────────┤ │
│ │ CD │ uint16 follows │ │
│ │ 30 │ 12345 >> 8 │ │
│ │ 39 │ 12345 & 0xFF │ │
│ ├────┼────────────────────────────────────────────────────────────┤ │
│ │ A4 │ fixstr length 4 │ │
│ │ 6E │ 'n' │ │
│ │ 61 │ 'a' │ │
│ │ 6D │ 'm' │ │
│ │ 65 │ 'e' │ │
│ ├────┼────────────────────────────────────────────────────────────┤ │
│ │ A5 │ fixstr length 5 │ │
│ │ 41 │ 'A' │ │
│ │ 6C │ 'l' │ │
│ │ 69 │ 'i' │ │
│ │ 63 │ 'c' │ │
│ │ 65 │ 'e' │ │
│ ├────┼────────────────────────────────────────────────────────────┤ │
│ │ A6 │ fixstr length 6 │ │
│ │ 61 │ 'a' │ │
│ │ 63 │ 'c' │ │
│ │ 74 │ 't' │ │
│ │ 69 │ 'i' │ │
│ │ 76 │ 'v' │ │
│ │ 65 │ 'e' │ │
│ ├────┼────────────────────────────────────────────────────────────┤ │
│ │ C3 │ true │ │
│ └────┴────────────────────────────────────────────────────────────┘ │
│ 26 bytes (~37% smaller than JSON) │
│ │
└─────────────────────────────────────────────────────────────────────────┘
MessagePack Implementation
class MessagePackEncoder {
private buffer: Uint8Array;
private offset: number = 0;
constructor(initialSize: number = 256) {
this.buffer = new Uint8Array(initialSize);
}
private ensureCapacity(needed: number): void {
if (this.offset + needed > this.buffer.length) {
// Keep doubling until the requested bytes actually fit
let newSize = this.buffer.length * 2;
while (this.offset + needed > newSize) newSize *= 2;
const newBuffer = new Uint8Array(newSize);
newBuffer.set(this.buffer);
this.buffer = newBuffer;
}
}
encode(value: unknown): Uint8Array {
this.offset = 0; // reset so the encoder instance can be reused
this.encodeValue(value);
return this.buffer.slice(0, this.offset);
}
private encodeValue(value: unknown): void {
if (value === null) {
this.ensureCapacity(1);
this.buffer[this.offset++] = 0xc0;
return;
}
if (typeof value === 'boolean') {
this.ensureCapacity(1);
this.buffer[this.offset++] = value ? 0xc3 : 0xc2;
return;
}
if (typeof value === 'number') {
this.encodeNumber(value);
return;
}
if (typeof value === 'string') {
this.encodeString(value);
return;
}
if (Array.isArray(value)) {
this.encodeArray(value);
return;
}
if (typeof value === 'object') {
this.encodeObject(value as Record<string, unknown>);
return;
}
throw new Error(`Unsupported type: ${typeof value}`);
}
private encodeNumber(num: number): void {
if (Number.isInteger(num)) {
if (num >= 0) {
if (num <= 0x7f) {
// Positive fixint
this.ensureCapacity(1);
this.buffer[this.offset++] = num;
} else if (num <= 0xff) {
this.ensureCapacity(2);
this.buffer[this.offset++] = 0xcc;
this.buffer[this.offset++] = num;
} else if (num <= 0xffff) {
this.ensureCapacity(3);
this.buffer[this.offset++] = 0xcd;
this.buffer[this.offset++] = num >> 8;
this.buffer[this.offset++] = num & 0xff;
} else if (num <= 0xffffffff) {
this.ensureCapacity(5);
this.buffer[this.offset++] = 0xce;
new DataView(this.buffer.buffer).setUint32(this.offset, num);
this.offset += 4;
} else {
this.ensureCapacity(9);
this.buffer[this.offset++] = 0xcf; // uint64
new DataView(this.buffer.buffer).setBigUint64(this.offset, BigInt(num));
this.offset += 8;
}
} else {
if (num >= -32) {
// Negative fixint
this.ensureCapacity(1);
this.buffer[this.offset++] = 0x100 + num;
} else if (num >= -128) {
this.ensureCapacity(2);
this.buffer[this.offset++] = 0xd0;
this.buffer[this.offset++] = num + 256;
} else if (num >= -32768) {
this.ensureCapacity(3);
this.buffer[this.offset++] = 0xd1; // int16
new DataView(this.buffer.buffer).setInt16(this.offset, num);
this.offset += 2;
} else if (num >= -2147483648) {
this.ensureCapacity(5);
this.buffer[this.offset++] = 0xd2; // int32
new DataView(this.buffer.buffer).setInt32(this.offset, num);
this.offset += 4;
} else {
this.ensureCapacity(9);
this.buffer[this.offset++] = 0xd3; // int64
new DataView(this.buffer.buffer).setBigInt64(this.offset, BigInt(num));
this.offset += 8;
}
}
} else {
// Float64
this.ensureCapacity(9);
this.buffer[this.offset++] = 0xcb;
new DataView(this.buffer.buffer).setFloat64(this.offset, num);
this.offset += 8;
}
}
private encodeString(str: string): void {
const encoded = new TextEncoder().encode(str);
const length = encoded.length;
if (length <= 31) {
this.ensureCapacity(1 + length);
this.buffer[this.offset++] = 0xa0 | length;
} else if (length <= 0xff) {
this.ensureCapacity(2 + length);
this.buffer[this.offset++] = 0xd9;
this.buffer[this.offset++] = length;
} else if (length <= 0xffff) {
this.ensureCapacity(3 + length);
this.buffer[this.offset++] = 0xda;
this.buffer[this.offset++] = length >> 8;
this.buffer[this.offset++] = length & 0xff;
} else {
this.ensureCapacity(5 + length);
this.buffer[this.offset++] = 0xdb; // str32
new DataView(this.buffer.buffer).setUint32(this.offset, length);
this.offset += 4;
}
this.buffer.set(encoded, this.offset);
this.offset += length;
}
private encodeArray(arr: unknown[]): void {
const length = arr.length;
if (length <= 15) {
this.ensureCapacity(1);
this.buffer[this.offset++] = 0x90 | length;
} else if (length <= 0xffff) {
this.ensureCapacity(3);
this.buffer[this.offset++] = 0xdc;
this.buffer[this.offset++] = length >> 8;
this.buffer[this.offset++] = length & 0xff;
} else {
this.ensureCapacity(5);
this.buffer[this.offset++] = 0xdd; // array32
new DataView(this.buffer.buffer).setUint32(this.offset, length);
this.offset += 4;
}
for (const item of arr) {
this.encodeValue(item);
}
}
private encodeObject(obj: Record<string, unknown>): void {
const keys = Object.keys(obj);
const length = keys.length;
if (length <= 15) {
this.ensureCapacity(1);
this.buffer[this.offset++] = 0x80 | length;
} else if (length <= 0xffff) {
this.ensureCapacity(3);
this.buffer[this.offset++] = 0xde;
this.buffer[this.offset++] = length >> 8;
this.buffer[this.offset++] = length & 0xff;
} else {
this.ensureCapacity(5);
this.buffer[this.offset++] = 0xdf; // map32
new DataView(this.buffer.buffer).setUint32(this.offset, length);
this.offset += 4;
}
for (const key of keys) {
this.encodeString(key);
this.encodeValue(obj[key]);
}
}
}
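For symmetry, here is a decoder sketch covering only the subset of type bytes the example above produces (fixmap, fixstr, fixint, uint8/uint16, bool, nil) — not a complete MessagePack reader:

```typescript
// Returns [decoded value, offset past it]; recursion handles nested maps.
function decodeMsgPack(buf: Uint8Array, offset = 0): [unknown, number] {
  const b = buf[offset++];
  if (b <= 0x7f) return [b, offset];          // positive fixint
  if (b >= 0xe0) return [b - 0x100, offset];  // negative fixint
  if (b >= 0xa0 && b <= 0xbf) {               // fixstr: lower 5 bits = length
    const len = b & 0x1f;
    return [new TextDecoder().decode(buf.subarray(offset, offset + len)), offset + len];
  }
  if (b >= 0x80 && b <= 0x8f) {               // fixmap: lower 4 bits = pair count
    const out: Record<string, unknown> = {};
    for (let i = 0; i < (b & 0x0f); i++) {
      const [k, o1] = decodeMsgPack(buf, offset);
      const [v, o2] = decodeMsgPack(buf, o1);
      out[String(k)] = v;
      offset = o2;
    }
    return [out, offset];
  }
  switch (b) {
    case 0xc0: return [null, offset];
    case 0xc2: return [false, offset];
    case 0xc3: return [true, offset];
    case 0xcc: return [buf[offset], offset + 1];                          // uint8
    case 0xcd: return [(buf[offset] << 8) | buf[offset + 1], offset + 2]; // uint16
    default: throw new Error(`Unsupported type byte 0x${b.toString(16)}`);
  }
}

// The 26 bytes from the encoding diagram above:
const packed = new Uint8Array([
  0x83, 0xa2, 0x69, 0x64, 0xcd, 0x30, 0x39, 0xa4, 0x6e, 0x61, 0x6d, 0x65,
  0xa5, 0x41, 0x6c, 0x69, 0x63, 0x65, 0xa6, 0x61, 0x63, 0x74, 0x69, 0x76,
  0x65, 0xc3,
]);
const [decoded] = decodeMsgPack(packed); // back to { id, name, active }
```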
Designing a Custom Wire Format
When existing formats don't fit, you may need a custom wire format. Here's the design process:
Step 1: Define Requirements
┌─────────────────────────────────────────────────────────────────────────┐
│ CUSTOM WIRE FORMAT REQUIREMENTS CHECKLIST │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ SIZE REQUIREMENTS: │
│ □ Typical message size: _____ bytes │
│ □ Maximum message size: _____ bytes │
│ □ Size budget constraints: _____ bytes/sec bandwidth │
│ □ String-heavy or numeric-heavy? │
│ │
│ PERFORMANCE REQUIREMENTS: │
│ □ Encode latency budget: _____ µs │
│ □ Decode latency budget: _____ µs │
│ □ Allocations allowed during decode? Yes / No │
│ □ Random field access needed? Yes / No │
│ │
│ SCHEMA REQUIREMENTS: │
│ □ Schema known at compile time? Yes / No │
│ □ Forward compatibility needed? Yes / No │
│ □ Backward compatibility needed? Yes / No │
│ □ Self-describing (schema in message)? Yes / No │
│ │
│ IMPLEMENTATION REQUIREMENTS: │
│ □ Languages to support: _____ │
│ □ Code generation acceptable? Yes / No │
│ □ External dependencies allowed? Yes / No │
│ □ Debug/inspection tooling needed? Yes / No │
│ │
│ SPECIAL REQUIREMENTS: │
│ □ Streaming support? Yes / No │
│ □ Partial message parsing? Yes / No │
│ □ Encryption integration? Yes / No │
│ □ Compression integration? Yes / No │
│ │
└─────────────────────────────────────────────────────────────────────────┘
Step 2: Choose Encoding Strategies
// Example: Custom game state protocol
// Requirements: Low latency, small size, no allocations on read
/*
* WIRE FORMAT DESIGN:
*
* Message Header (4 bytes):
* ┌────────────────────────────────────────────────────────────┐
* │ Bits 0-7 │ Message type (0-255 message types) │
* │ Bits 8-15 │ Flags (compressed, encrypted, etc.) │
* │ Bits 16-31 │ Payload length (0-65535 bytes) │
* └────────────────────────────────────────────────────────────┘
*
* Payload: Type-specific, schema-defined fields
*
* Field Encoding (based on type):
* - Integers: Fixed-width (4 bytes for common case)
* - Floats: Fixed-width IEEE 754 (4 bytes)
* - Strings: Length-prefixed (2 bytes length + UTF-8 data)
* - Arrays: Count-prefixed (2 bytes count + elements)
* - Optional: 1-byte presence flag + value if present
*/
const MESSAGE_TYPES = {
PLAYER_POSITION: 0x01,
PLAYER_ACTION: 0x02,
GAME_STATE: 0x03,
CHAT_MESSAGE: 0x04,
} as const;
interface MessageHeader {
type: number;
flags: number;
length: number;
}
class GameProtocol {
private view: DataView;
private buffer: ArrayBuffer;
constructor(bufferSize: number = 4096) {
this.buffer = new ArrayBuffer(bufferSize);
this.view = new DataView(this.buffer);
}
// Encode into a reusable scratch buffer (slice() copies out the final bytes)
encodePlayerPosition(
playerId: number,
x: number,
y: number,
z: number,
rotation: number,
timestamp: number
): ArrayBuffer {
// Header: 4 bytes
// Payload: 4 + 4 + 4 + 4 + 4 + 8 = 28 bytes
// Total: 32 bytes
let offset = 0;
// Header
this.view.setUint8(offset++, MESSAGE_TYPES.PLAYER_POSITION);
this.view.setUint8(offset++, 0); // flags
this.view.setUint16(offset, 28, true); offset += 2; // little-endian length
// Payload
this.view.setUint32(offset, playerId, true); offset += 4;
this.view.setFloat32(offset, x, true); offset += 4;
this.view.setFloat32(offset, y, true); offset += 4;
this.view.setFloat32(offset, z, true); offset += 4;
this.view.setFloat32(offset, rotation, true); offset += 4;
this.view.setBigUint64(offset, BigInt(timestamp), true); offset += 8;
return this.buffer.slice(0, offset);
}
// Decode primitives straight from the buffer — no intermediate parse tree
decodePlayerPosition(buffer: ArrayBuffer): {
playerId: number;
x: number;
y: number;
z: number;
rotation: number;
timestamp: bigint;
} {
const view = new DataView(buffer);
let offset = 4; // Skip header
return {
playerId: view.getUint32(offset, true),
x: view.getFloat32(offset + 4, true),
y: view.getFloat32(offset + 8, true),
z: view.getFloat32(offset + 12, true),
rotation: view.getFloat32(offset + 16, true),
timestamp: view.getBigUint64(offset + 20, true),
};
}
}
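The 32-byte layout can also be exercised standalone. This sketch restates the class's write path as a free function with the offsets hardcoded, so each byte position from the header/payload comment is visible at a glance:

```typescript
// PLAYER_POSITION frame: 4-byte header + 28-byte payload, little-endian.
function encodePosition(
  playerId: number, x: number, y: number, z: number,
  rotation: number, timestamp: bigint
): ArrayBuffer {
  const buf = new ArrayBuffer(32);
  const view = new DataView(buf);
  view.setUint8(0, 0x01);               // message type: PLAYER_POSITION
  view.setUint8(1, 0);                  // flags
  view.setUint16(2, 28, true);          // payload length
  view.setUint32(4, playerId, true);
  view.setFloat32(8, x, true);
  view.setFloat32(12, y, true);
  view.setFloat32(16, z, true);
  view.setFloat32(20, rotation, true);
  view.setBigUint64(24, timestamp, true);
  return buf;
}

// Round-trip check (values chosen to be exactly representable in float32):
const frame = encodePosition(7, 1.5, 2.5, -3, 0.25, 1709654321000n);
const reader = new DataView(frame);
console.log(reader.getUint32(4, true));  // 7
console.log(reader.getFloat32(8, true)); // 1.5
```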
Step 3: Implement Schema Evolution
// Version-aware decoding with field presence bitmap
/*
* EVOLVED FORMAT with backward compatibility:
*
* Message Header (6 bytes):
* ┌────────────────────────────────────────────────────────────┐
* │ Bits 0-7 │ Message type │
* │ Bits 8-15 │ Schema version (0-255 versions) │
* │ Bits 16-31 │ Field presence bitmap (one bit per field, 16 max) │
* │ Bits 32-47 │ Payload length │
* └────────────────────────────────────────────────────────────┘
*/
interface SchemaVersion {
version: number;
fields: FieldDef[];
}
interface FieldDef {
name: string;
type: 'uint32' | 'float32' | 'string' | 'bytes';
optional: boolean;
addedInVersion: number;
}
const V1_FIELDS: FieldDef[] = [
{ name: 'playerId', type: 'uint32', optional: false, addedInVersion: 1 },
{ name: 'x', type: 'float32', optional: false, addedInVersion: 1 },
{ name: 'y', type: 'float32', optional: false, addedInVersion: 1 },
{ name: 'z', type: 'float32', optional: false, addedInVersion: 1 },
];
const PLAYER_POSITION_SCHEMA: SchemaVersion[] = [
{
version: 1,
fields: V1_FIELDS,
},
{
version: 2,
fields: [
// All v1 fields, in the same order, plus:
...V1_FIELDS,
{ name: 'rotation', type: 'float32', optional: true, addedInVersion: 2 },
{ name: 'velocity', type: 'float32', optional: true, addedInVersion: 2 },
],
},
];
function decodeWithSchema(
buffer: ArrayBuffer,
schema: SchemaVersion
): Record<string, unknown> {
const view = new DataView(buffer);
const messageVersion = view.getUint8(1);
const fieldBitmap = view.getUint16(2, true);
const result: Record<string, unknown> = {};
let offset = 6; // After header
for (let i = 0; i < schema.fields.length; i++) {
const field = schema.fields[i];
// Field not present in this message
if (field.optional && !(fieldBitmap & (1 << i))) {
continue;
}
// Field added in newer version than message
if (messageVersion < field.addedInVersion) {
continue;
}
// Decode based on type
switch (field.type) {
case 'uint32':
result[field.name] = view.getUint32(offset, true);
offset += 4;
break;
case 'float32':
result[field.name] = view.getFloat32(offset, true);
offset += 4;
break;
case 'string':
const strLen = view.getUint16(offset, true);
offset += 2;
result[field.name] = new TextDecoder().decode(
new Uint8Array(buffer, offset, strLen)
);
offset += strLen;
break;
}
}
// Skip unknown fields (newer version has more fields)
// The length in header tells us where message ends
return result;
}
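On the encode side, the presence bitmap mirrors this loop: each field that is actually present sets the bit at its schema index. A sketch (the field list here is illustrative, matching the schema above):

```typescript
interface BitmapField {
  name: string;
  optional: boolean;
}

// Required fields always set their bit; optional fields only when a value exists.
function encodeBitmap(
  fields: BitmapField[],
  values: Record<string, number | undefined>
): number {
  let bitmap = 0;
  fields.forEach((f, i) => {
    if (!f.optional || values[f.name] !== undefined) bitmap |= 1 << i;
  });
  return bitmap;
}

const fields: BitmapField[] = [
  { name: "playerId", optional: false },
  { name: "x", optional: false },
  { name: "rotation", optional: true },
  { name: "velocity", optional: true },
];
console.log(encodeBitmap(fields, { playerId: 7, x: 1, rotation: 0.5 }).toString(2)); // "111"
```

The decoder's `fieldBitmap & (1 << i)` test then skips exactly the optional fields whose bits are clear.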
Performance Benchmarks
┌─────────────────────────────────────────────────────────────────────────┐
│ SERIALIZATION BENCHMARKS (1M messages) │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ Test Message: { id: 12345, name: "Alice", values: [1, 2, 3, 4, 5] } │
│ │
│ FORMAT │ SIZE │ ENCODE │ DECODE │ NOTES │
│ ──────────────┼────────┼───────────┼───────────┼───────────────────── │
│ JSON │ 52B │ 180ms │ 220ms │ Baseline │
│ MessagePack │ 30B │ 85ms │ 95ms │ 42% smaller │
│ Protobuf │ 22B │ 45ms │ 55ms │ 58% smaller │
│ FlatBuffers │ 48B │ 35ms │ 5ms* │ *Zero-copy read │
│ Custom binary │ 20B │ 15ms │ 8ms │ Schema-specific │
│ │
│ * FlatBuffers decode is near-zero because it's pointer arithmetic │
│ Full object materialization would be ~40ms │
│ │
│ TAKEAWAYS: │
│ • JSON is 4-10x slower than binary formats │
│ • Protobuf is best general-purpose choice │
│ • FlatBuffers wins when you don't need all fields │
│ • Custom formats can be fastest but highest effort │
│ │
└─────────────────────────────────────────────────────────────────────────┘
Conclusion
Binary protocol design is about tradeoffs:
- Size vs Speed: Varints save bytes but cost decode cycles. Fixed-width is faster but larger.
- Flexibility vs Performance: Self-describing formats (JSON, MessagePack) work anywhere but carry schema overhead. Schema-driven formats (Protobuf, FlatBuffers) are faster but require coordination.
- Upfront vs Runtime Cost: Code generation moves work to compile time. Reflection-based serialization is flexible but slower.
For most applications, Protocol Buffers is the right default — mature ecosystem, good performance, excellent schema evolution. FlatBuffers when zero-copy matters. MessagePack when you need JSON semantics. Custom formats only when you've profiled and proven the existing options are insufficient.
Real-World Problems & How to Solve Them
Problem 1: A malformed packet causes decoder CPU spikes
Symptom: One bad payload can stall a worker thread with a runaway varint loop.
Root cause: Varint decode logic has no upper bound for continuation bytes.
Fix — enforce maximum varint length and fail fast:
function decodeVarint64Safe(
buffer: Uint8Array,
offset: number
): [bigint, number] {
let value = 0n;
let shift = 0n;
for (let i = 0; i < 10; i++) {
const byte = BigInt(buffer[offset++]);
value |= (byte & 0x7fn) << shift;
if ((byte & 0x80n) === 0n) return [value, offset];
shift += 7n;
}
throw new Error('Invalid varint: exceeds 10 bytes for uint64');
}
Problem 2: Negative numbers bloat message size dramatically
Symptom: Small negative values consume 10 bytes in protobuf-like encoding.
Root cause: Signed fields are encoded as plain varints instead of ZigZag-transformed values.
Fix — use ZigZag for signed integer fields:
function encodeZigZag32(value: number): number {
return (value << 1) ^ (value >> 31);
}
function decodeZigZag32(value: number): number {
return (value >>> 1) ^ -(value & 1);
}
function encodeSint32(value: number): Uint8Array {
// >>> 0 reinterprets the ZigZag result as unsigned before varint encoding
return encodeVarint(encodeZigZag32(value) >>> 0);
}
Problem 3: Deploy breaks because old clients can’t parse new messages
Symptom: After a schema update, older services fail to decode or read incorrect fields.
Root cause: Field numbers or wire types were changed for existing fields.
Fix — enforce schema compatibility in CI before publishing:
interface FieldDef {
name: string;
number: number;
wireType: number;
}
function assertCompatibleSchema(oldFields: FieldDef[], nextFields: FieldDef[]): void {
const oldByNumber = new Map(oldFields.map((f) => [f.number, f]));
for (const field of nextFields) {
const previous = oldByNumber.get(field.number);
if (!previous) continue;
if (previous.wireType !== field.wireType) {
throw new Error(`Incompatible change for field #${field.number}`);
}
}
}
Problem 4: Stream parser loses message boundaries
Symptom: Concatenated TCP payloads decode as one corrupt message.
Root cause: Wire format defines field encoding but not transport framing boundaries.
Fix — add explicit length-prefixed framing around each message:
function encodeFrame(message: Uint8Array): Uint8Array {
const frame = new Uint8Array(4 + message.length);
new DataView(frame.buffer).setUint32(0, message.length, true);
frame.set(message, 4);
return frame;
}
function decodeFrames(chunks: Uint8Array[]): Uint8Array[] {
const merged = chunks.reduce((n, c) => n + c.length, 0);
const all = new Uint8Array(merged);
let p = 0;
for (const chunk of chunks) {
all.set(chunk, p);
p += chunk.length;
}
const out: Uint8Array[] = [];
let offset = 0;
while (offset + 4 <= all.length) {
const len = new DataView(all.buffer, all.byteOffset + offset, 4).getUint32(0, true);
if (offset + 4 + len > all.length) break;
out.push(all.subarray(offset + 4, offset + 4 + len));
offset += 4 + len;
}
return out;
}
Problem 5: Cross-language clients decode different numeric values
Symptom: Java backend and JS client disagree on IDs and counters.
Root cause: Endianness assumptions differ between implementations.
Fix — define byte order in spec and centralize read/write helpers:
const LITTLE_ENDIAN = true;
function writeUint32(view: DataView, offset: number, value: number): void {
view.setUint32(offset, value, LITTLE_ENDIAN);
}
function readUint32(view: DataView, offset: number): number {
return view.getUint32(offset, LITTLE_ENDIAN);
}
Problem 6: Gateway drops fields it doesn’t understand
Symptom: New optional fields disappear after passing through older proxy services.
Root cause: Unknown fields are skipped but not preserved for re-encoding.
Fix — store unknown fields and emit them unchanged on reserialize:
interface ParsedMessage {
known: Record<number, Uint8Array>;
unknown: Array<{ tag: number; raw: Uint8Array }>;
}
// Assumes the encodeVarint helper from earlier; concatBytes is defined below.
function reencode(msg: ParsedMessage): Uint8Array {
const chunks: Uint8Array[] = [];
for (const [tag, value] of Object.entries(msg.known)) {
chunks.push(encodeVarint(Number(tag)));
chunks.push(value);
}
for (const field of msg.unknown) {
chunks.push(encodeVarint(field.tag));
chunks.push(field.raw);
}
return concatBytes(chunks);
}
function concatBytes(chunks: Uint8Array[]): Uint8Array {
const out = new Uint8Array(chunks.reduce((n, c) => n + c.length, 0));
let offset = 0;
for (const c of chunks) {
out.set(c, offset);
offset += c.length;
}
return out;
}
Problem 7: Zero-copy readers return garbage after async boundaries
Symptom: FlatBuffers reads are correct immediately, then corrupted later in async handlers.
Root cause: A pooled network buffer is reused while code still holds zero-copy views.
Fix — parse synchronously or clone buffer before async handoff:
function handoffSafely(frame: Uint8Array): Uint8Array {
// Clone once when crossing async boundary; keep zero-copy for sync parsing path.
return new Uint8Array(frame);
}
async function processAsync(frame: Uint8Array): Promise<void> {
const owned = handoffSafely(frame);
await queueJob(owned); // queueJob: your application's async work queue
}