Binary Protocol Design: From Protobuf to Custom Wire Formats
JSON is the lingua franca of web APIs, but it's a poor fit for performance-critical paths. A 100-byte JSON payload often shrinks to roughly 30 bytes in Protocol Buffers. Parsing JSON means scanning text and allocating strings; decoding protobuf is cheap tag-and-length field extraction, and formats like FlatBuffers skip the decode step entirely. When you're sending millions of messages per second or operating on constrained devices, wire format design becomes a core architectural decision.
This article dissects how binary protocols work — from the encoding schemes of Protocol Buffers and FlatBuffers to designing your own wire format — with a focus on the tradeoffs that matter: parse speed, encode speed, message size, schema evolution, and implementation complexity.
Why Binary Protocols: The JSON Tax
// JSON encoding of a simple message
const user = {
id: 12345,
name: "Alice",
email: "alice@example.com",
roles: ["admin", "user"],
lastLogin: 1709654321
};
const jsonStr = JSON.stringify(user);
// 103 bytes: {"id":12345,"name":"Alice","email":"alice@example.com","roles":["admin","user"],"lastLogin":1709654321}
┌─────────────────────────────────────────────────────────────────────────┐
│ JSON OVERHEAD ANALYSIS │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ STRUCTURAL OVERHEAD: │
│ • Braces, brackets, colons, commas: 15+ bytes per object │
│ • Quoted field names repeated every message │
│ • Numbers as ASCII: "12345" = 5 bytes vs 2 bytes binary │
│ • No native binary data (must base64: 33% expansion) │
│ │
│ PARSING OVERHEAD: │
│ • Full text scan required (can't skip to field N) │
│ • Number parsing: atoi for every integer │
│ • String allocation for every value │
│ • Object structure rebuilt on every parse │
│ │
│ SCHEMA OVERHEAD: │
│ • No type information (runtime type checking needed) │
│ • Field names transmitted repeatedly │
│ • Optional fields require existence checks │
│ │
│ JSON: 103 bytes │
│ Protobuf: ~35 bytes (~66% smaller) │
│ FlatBuffers: ~60 bytes (buffer overhead, but zero-copy access) │
│ MessagePack: ~50 bytes (JSON semantics, binary encoding) │
│ │
└─────────────────────────────────────────────────────────────────────────┘
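The JSON figure can be checked directly; `TextEncoder` gives the UTF-8 byte length using only standard platform APIs:

```typescript
// Measure the wire size of the example payload from above.
const user = {
  id: 12345,
  name: "Alice",
  email: "alice@example.com",
  roles: ["admin", "user"],
  lastLogin: 1709654321,
};
const jsonBytes = new TextEncoder().encode(JSON.stringify(user));
console.log(jsonBytes.length); // 103
```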
Protocol Buffers: The Industry Standard
Protocol Buffers (protobuf) is Google's language-neutral, schema-driven serialization format. Understanding its wire format reveals the core principles of efficient binary encoding.
The Wire Format
Every protobuf message is a sequence of key-value pairs:
message = *field
field = key value
key = (field_number << 3) | wire_type // varint encoded
wire_types:
0 = VARINT (int32, int64, uint32, uint64, sint32, sint64, bool, enum)
1 = I64 (fixed64, sfixed64, double)
2 = LEN (string, bytes, embedded messages, packed repeated fields)
5 = I32 (fixed32, sfixed32, float)
┌─────────────────────────────────────────────────────────────────────────┐
│ PROTOBUF WIRE FORMAT ENCODING │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ message User { │
│ uint64 id = 1; │
│ string name = 2; │
│ repeated string roles = 3; │
│ } │
│ │
│ User { id: 12345, name: "Alice", roles: ["admin", "user"] } │
│ │
│ ENCODED BYTES: │
│ ┌────┬────────┬────┬───────────────┬────┬────────────┬────┬──────────┐│
│ │ 08 │ B9 60 │ 12 │ 05 41 6C 69 │ 1A │ 05 61 64 │ 1A │ 04 75 73 ││
│ │ │ │ │ 63 65 │ │ 6D 69 6E │ │ 65 72 ││
│ └────┴────────┴────┴───────────────┴────┴────────────┴────┴──────────┘│
│ │ │ │ │ │ │ │ │ │
│ │ │ │ │ │ │ │ │ │
│ │ │ │ │ │ │ └───────┴─"user"│
│ │ │ │ │ │ └─"admin" │
│ │ │ │ │ │ │
│ │ │ │ │ └─ 1A = field 3, wire type 2 (LEN)│
│ │ │ │ │ │
│ │ │ │ └─"Alice" (length-prefixed) │
│ │ │ │ │
│ │ │ └─ 12 = field 2 (0010), wire type 2 (010) │
│ │ │ (2 << 3) | 2 = 18 = 0x12 │
│ │ │ │
│ │ └─ 12345 as varint: B9 60 (little-endian 7-bit groups) │
│ │ │
│ └─ 08 = field 1 (0001), wire type 0 (000) │
│ (1 << 3) | 0 = 8 = 0x08 │
│ │
│ VARINT ENCODING (for 12345): │
│ 12345 = 0x3039 = 0011 0000 0011 1001 │
│ Split into 7-bit groups (low group first): 0111001, 1100000 │
│ Add continuation bits: 1_0111001 0_1100000 │
│ Little-endian: B9 60 (0xB9 = 185, 0x60 = 96) │
│ Decode: (0x39) | (0x60 << 7) = 57 | 12288 = 12345 ✓ │
│ │
└─────────────────────────────────────────────────────────────────────────┘
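The tag bytes in the diagram come straight from the key formula; a quick check:

```typescript
// Tag byte = (field_number << 3) | wire_type; fits in one byte for fields 1-15.
function tagByte(fieldNumber: number, wireType: number): number {
  return (fieldNumber << 3) | wireType;
}
console.log(tagByte(1, 0).toString(16)); // "8"  -> 0x08, id (varint)
console.log(tagByte(2, 2).toString(16)); // "12" -> 0x12, name (LEN)
console.log(tagByte(3, 2).toString(16)); // "1a" -> 0x1a, roles (LEN)
```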
Varint Encoding Implementation
// Encode unsigned integer to varint bytes
function encodeVarint(value: number): Uint8Array {
const bytes: number[] = [];
while (value > 0x7f) {
// Take lowest 7 bits, set continuation bit (0x80)
bytes.push((value & 0x7f) | 0x80);
value >>>= 7;
}
// Last byte: no continuation bit
bytes.push(value & 0x7f);
return new Uint8Array(bytes);
}
// Decode varint from buffer at offset, returns [value, newOffset]
function decodeVarint(buffer: Uint8Array, offset: number): [number, number] {
let result = 0;
let shift = 0;
while (true) {
const byte = buffer[offset++];
result |= (byte & 0x7f) << shift;
if ((byte & 0x80) === 0) {
// No continuation bit — done
return [result, offset];
}
shift += 7;
if (shift >= 35) {
throw new Error('Varint too long');
}
}
}
// Signed integers use ZigZag encoding
// Maps negative numbers to positive: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
function encodeZigZag(value: number): number {
return (value << 1) ^ (value >> 31);
}
function decodeZigZag(value: number): number {
return (value >>> 1) ^ -(value & 1);
}
Schema Evolution Rules
Protobuf's key feature is forward and backward compatible schema evolution:
┌─────────────────────────────────────────────────────────────────────────┐
│ PROTOBUF SCHEMA EVOLUTION │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ SAFE CHANGES (backward + forward compatible): │
│ ──────────────────────────────────────────── │
│ ✓ Add new optional/repeated fields (old code ignores unknown fields) │
│ ✓ Remove optional/repeated fields (new code uses defaults) │
│ ✓ Rename fields (only field NUMBER matters, name is for humans) │
│ ✓ Change int32 ↔ int64, uint32 ↔ uint64 (wire-compatible) │
│ ✓ Add values to enum (old code sees numeric value) │
│ │
│ UNSAFE CHANGES (will break): │
│ ──────────────────────────── │
│ ✗ Change field number │
│ ✗ Change wire type (int32 → string) │
│ ✗ Change repeated ↔ singular (packed encoding differs) │
│ ✗ Change meaning of field while keeping number │
│ ✗ Make optional field required (old data may lack it) │
│ │
│ FIELD NUMBER ALLOCATION STRATEGY: │
│ │
│ message User { │
│ // Core fields: 1-15 (1-byte key encoding) │
│ uint64 id = 1; │
│ string name = 2; │
│ string email = 3; │
│ │
│ // Less common fields: 16-2047 (2-byte key encoding) │
│ optional string phone = 16; │
│ optional bytes avatar = 17; │
│ │
│ // Reserved for future / deprecated │
│ reserved 100 to 199; │
│ reserved "old_field_name"; │
│ │
│ // Extensions (if using proto2) │
│ extensions 1000 to max; │
│ } │
│ │
│ Fields 1-15: 1-byte tag (field << 3 | type fits in 1 byte) │
│ Fields 16-2047: 2-byte tag │
│ Reserve low numbers for frequently-used fields │
│ │
└─────────────────────────────────────────────────────────────────────────┘
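The safe-change rules work because the wire type in each tag tells even an old decoder how many bytes to skip for a field it has never heard of. A minimal sketch of that skip logic, following the varint conventions above:

```typescript
// Skip one field's value given its wire type; returns the offset past it.
function skipField(buf: Uint8Array, offset: number, wireType: number): number {
  switch (wireType) {
    case 0: { // VARINT: advance past continuation bytes
      while (buf[offset] & 0x80) offset++;
      return offset + 1;
    }
    case 1: // I64: fixed 8 bytes
      return offset + 8;
    case 2: { // LEN: varint length prefix, then that many payload bytes
      let len = 0, shift = 0, b = 0;
      do {
        b = buf[offset++];
        len |= (b & 0x7f) << shift;
        shift += 7;
      } while (b & 0x80);
      return offset + len;
    }
    case 5: // I32: fixed 4 bytes
      return offset + 4;
    default:
      throw new Error(`Cannot skip unknown wire type ${wireType}`);
  }
}
```

An old decoder reads the tag, sees an unknown field number, calls something like `skipField`, and keeps going — which is exactly why adding new fields is a safe change.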
FlatBuffers: Zero-Copy Access
FlatBuffers (from Google) takes a different approach: the serialized buffer IS the in-memory representation. No parsing step — you read directly from the buffer.
┌─────────────────────────────────────────────────────────────────────────┐
│ FLATBUFFERS MEMORY LAYOUT │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ schema User { │
│ id: uint64; │
│ name: string; │
│ email: string; │
│ } │
│ │
│ BUFFER LAYOUT (little-endian): │
│ │
│ Offset 0 ┌────────────────┐ │
│ │ Root Table │ ← Offset to root table │
│ │ Offset (4B) │ │
│ Offset 4 ├────────────────┤ │
│ │ File ID (opt) │ ← Optional file identifier │
│ │ (4B) │ │
│ Offset 8 ├────────────────┤ │
│ │ VTable │ ← Field offset table │
│ │ ┌────────────┐ │ │
│ │ │ vtable_size│ │ 2 bytes: size of vtable │
│ │ │ (2B) │ │ │
│ │ ├────────────┤ │ │
│ │ │ object_size│ │ 2 bytes: size of object │
│ │ │ (2B) │ │ │
│ │ ├────────────┤ │ │
│ │ │ field 0 off│ │ 2 bytes: offset of id from obj start │
│ │ ├────────────┤ │ │
│ │ │ field 1 off│ │ 2 bytes: offset of name offset │
│ │ ├────────────┤ │ │
│ │ │ field 2 off│ │ 2 bytes: offset of email offset │
│ │ └────────────┘ │ │
│ Offset N ├────────────────┤ │
│ │ User Object │ │
│ │ ┌────────────┐ │ │
│ │ │ vtable off │ │ 4 bytes: negative offset to vtable │
│ │ ├────────────┤ │ │
│ │ │ id (8B) │ │ Inline: 8 bytes for uint64 │
│ │ ├────────────┤ │ │
│ │ │ name off │ │ 4 bytes: offset to string │
│ │ ├────────────┤ │ │
│ │ │ email off │ │ 4 bytes: offset to string │
│ │ └────────────┘ │ │
│ Offset M ├────────────────┤ │
│ │ "Alice" string │ │
│ │ ┌────────────┐ │ │
│ │ │ length (4B)│ │ │
│ │ ├────────────┤ │ │
│ │ │ "Alice\0" │ │ null-terminated for C compat │
│ │ └────────────┘ │ │
│ └────────────────┘ │
│ │
│ ACCESS PATTERN (no parsing!): │
│ user.name(): │
│ 1. Read vtable offset from object start │
│ 2. Look up field 1 offset in vtable │
│ 3. Add to object position to get string offset location │
│ 4. Read string offset, follow it │
│ 5. Read length, return pointer to string data │
│ │
│ All operations are pointer arithmetic + memory reads. │
│ No allocations. No copies. O(1) field access. │
│ │
└─────────────────────────────────────────────────────────────────────────┘
FlatBuffers Access Code
// Generated accessor pattern (simplified)
class User {
private bb!: ByteBuffer; // definite-assignment: set in getRootAsUser
private bb_pos!: number;
static getRootAsUser(buffer: ByteBuffer): User {
// Root table offset is at position 0
const offset = buffer.readInt32(0);
const user = new User();
user.bb = buffer;
user.bb_pos = offset;
return user;
}
// Read vtable offset (stored as negative offset at object start)
private __offset(vtableOffset: number): number {
const vtable = this.bb_pos - this.bb.readInt32(this.bb_pos);
return vtableOffset < this.bb.readInt16(vtable)
? this.bb.readInt16(vtable + vtableOffset)
: 0;
}
id(): bigint {
const offset = this.__offset(4); // Field 0 at vtable offset 4
return offset ? this.bb.readUint64(this.bb_pos + offset) : 0n;
}
name(): string | null {
const offset = this.__offset(6); // Field 1 at vtable offset 6
if (!offset) return null;
// Read the offset to string, then read string at that location
const stringOffset = this.bb_pos + offset;
const stringLocation = stringOffset + this.bb.readInt32(stringOffset);
const length = this.bb.readInt32(stringLocation);
// Return view into existing buffer — no copy!
return this.bb.readStringAt(stringLocation + 4, length);
}
}
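To make the indirection concrete, here is a hand-assembled miniature buffer — one table with a single inline uint32 field — and the same two-hop read the generated accessor performs. The offsets are chosen for this toy example; real builders also align fields and deduplicate shared vtables:

```typescript
const buf = new ArrayBuffer(18);
const view = new DataView(buf);
// Root offset, vtable, then the table itself (all little-endian):
view.setUint32(0, 10, true);     // root: table starts at byte 10
view.setUint16(4, 6, true);      // vtable size: 6 bytes (2 header shorts + 1 field)
view.setUint16(6, 8, true);      // table size: 4-byte soffset + 4-byte field
view.setUint16(8, 4, true);      // field 0 lives 4 bytes into the table
view.setInt32(10, 6, true);      // table header: soffset back to vtable (10 - 6 = 4)
view.setUint32(14, 12345, true); // field 0 value, stored inline

// Two-hop read, mirroring the __offset() logic above:
const table = view.getUint32(0, true);              // 10
const vtable = table - view.getInt32(table, true);  // 10 - 6 = 4
const fieldOff = view.getUint16(vtable + 4, true);  // 4
console.log(view.getUint32(table + fieldOff, true)); // 12345
```

Every read is arithmetic on positions inside the original buffer — nothing is parsed into intermediate objects.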
When to Use FlatBuffers vs Protobuf
┌─────────────────────────────────────────────────────────────────────────┐
│ FLATBUFFERS vs PROTOBUF DECISION MATRIX │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ USE FLATBUFFERS WHEN: │
│ • Zero-copy access is critical (game state, video frames) │
│ • Messages are large and you only read a few fields │
│ • Memory allocation is expensive (embedded, real-time) │
│ • Data is memory-mapped from files │
│ • Same data read multiple times without modification │
│ │
│ USE PROTOBUF WHEN: │
│ • Message size is critical (protobuf is ~30% smaller) │
│ • You need to read most/all fields anyway │
│ • You modify data after reading (FlatBuffers are immutable) │
│ • Streaming / incremental parsing needed │
│ • Ecosystem / tooling matters (protobuf is more mature) │
│ │
│ PERFORMANCE CHARACTERISTICS: │
│ │
│ Operation │ Protobuf │ FlatBuffers │
│ ───────────────────┼───────────────┼───────────────────────────────────│
│ Encode │ Moderate │ Faster (just memcpy structured) │
│ Decode (full) │ O(n) parse │ None (zero-copy) │
│ Decode (1 field) │ O(n) scan │ O(1) pointer chase │
│ Wire size │ Smaller │ Larger (offsets, padding) │
│ Memory after parse │ Native objects│ Buffer reference │
│ Mutation │ Easy │ Rebuild required │
│ Random access │ After parse │ Immediate │
│ │
└─────────────────────────────────────────────────────────────────────────┘
MessagePack: Binary JSON
MessagePack is "JSON but binary" — same data model, smaller encoding:
// MessagePack type prefixes
const FORMATS = {
// Positive fixint: 0x00 - 0x7f (value is the byte itself)
POSITIVE_FIXINT_MAX: 0x7f,
// Fixmap: 0x80 - 0x8f (lower 4 bits = element count)
FIXMAP: 0x80,
// Fixarray: 0x90 - 0x9f
FIXARRAY: 0x90,
// Fixstr: 0xa0 - 0xbf (lower 5 bits = length)
FIXSTR: 0xa0,
// nil, false, true
NIL: 0xc0,
FALSE: 0xc2,
TRUE: 0xc3,
// Binary data
BIN8: 0xc4, // + 1 byte length + data
BIN16: 0xc5, // + 2 byte length + data
BIN32: 0xc6, // + 4 byte length + data
// Floats
FLOAT32: 0xca, // + 4 bytes IEEE 754
FLOAT64: 0xcb, // + 8 bytes IEEE 754
// Unsigned integers
UINT8: 0xcc, // + 1 byte
UINT16: 0xcd, // + 2 bytes
UINT32: 0xce, // + 4 bytes
UINT64: 0xcf, // + 8 bytes
// Signed integers
INT8: 0xd0,
INT16: 0xd1,
INT32: 0xd2,
INT64: 0xd3,
// Strings
STR8: 0xd9, // + 1 byte length
STR16: 0xda, // + 2 byte length
STR32: 0xdb, // + 4 byte length
// Arrays
ARRAY16: 0xdc,
ARRAY32: 0xdd,
// Maps
MAP16: 0xde,
MAP32: 0xdf,
// Negative fixint: 0xe0 - 0xff (value is byte - 256)
NEGATIVE_FIXINT_MIN: 0xe0,
};
┌─────────────────────────────────────────────────────────────────────────┐
│ MESSAGEPACK ENCODING EXAMPLE │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ Object: { id: 12345, name: "Alice", active: true } │
│ │
│ JSON: {"id":12345,"name":"Alice","active":true} │
│ 41 bytes │
│ │
│ MessagePack: │
│ ┌────┬────────────────────────────────────────────────────────────┐ │
│ │ 83 │ fixmap with 3 elements (0x80 | 3) │ │
│ ├────┼────────────────────────────────────────────────────────────┤ │
│ │ A2 │ fixstr length 2 (0xA0 | 2) │ │
│ │ 69 │ 'i' │ │
│ │ 64 │ 'd' │ │
│ ├────┼────────────────────────────────────────────────────────────┤ │
│ │ CD │ uint16 follows │ │
│ │ 30 │ 12345 >> 8 │ │
│ │ 39 │ 12345 & 0xFF │ │
│ ├────┼────────────────────────────────────────────────────────────┤ │
│ │ A4 │ fixstr length 4 │ │
│ │ 6E │ 'n' │ │
│ │ 61 │ 'a' │ │
│ │ 6D │ 'm' │ │
│ │ 65 │ 'e' │ │
│ ├────┼────────────────────────────────────────────────────────────┤ │
│ │ A5 │ fixstr length 5 │ │
│ │ 41 │ 'A' │ │
│ │ 6C │ 'l' │ │
│ │ 69 │ 'i' │ │
│ │ 63 │ 'c' │ │
│ │ 65 │ 'e' │ │
│ ├────┼────────────────────────────────────────────────────────────┤ │
│ │ A6 │ fixstr length 6 │ │
│ │ 61 │ 'a' │ │
│ │ 63 │ 'c' │ │
│ │ 74 │ 't' │ │
│ │ 69 │ 'i' │ │
│ │ 76 │ 'v' │ │
│ │ 65 │ 'e' │ │
│ ├────┼────────────────────────────────────────────────────────────┤ │
│ │ C3 │ true │ │
│ └────┴────────────────────────────────────────────────────────────┘ │
│ 26 bytes (~37% smaller than JSON) │
│ │
└─────────────────────────────────────────────────────────────────────────┘
MessagePack Implementation
class MessagePackEncoder {
private buffer: Uint8Array;
private offset: number = 0;
constructor(initialSize: number = 256) {
this.buffer = new Uint8Array(initialSize);
}
private ensureCapacity(needed: number): void {
if (this.offset + needed > this.buffer.length) {
// Keep doubling until the requested bytes actually fit
let newSize = this.buffer.length * 2;
while (this.offset + needed > newSize) newSize *= 2;
const newBuffer = new Uint8Array(newSize);
newBuffer.set(this.buffer);
this.buffer = newBuffer;
}
}
encode(value: unknown): Uint8Array {
this.offset = 0; // reset so the encoder instance can be reused
this.encodeValue(value);
return this.buffer.slice(0, this.offset);
}
private encodeValue(value: unknown): void {
if (value === null) {
this.ensureCapacity(1);
this.buffer[this.offset++] = 0xc0;
return;
}
if (typeof value === 'boolean') {
this.ensureCapacity(1);
this.buffer[this.offset++] = value ? 0xc3 : 0xc2;
return;
}
if (typeof value === 'number') {
this.encodeNumber(value);
return;
}
if (typeof value === 'string') {
this.encodeString(value);
return;
}
if (Array.isArray(value)) {
this.encodeArray(value);
return;
}
if (typeof value === 'object') {
this.encodeObject(value as Record<string, unknown>);
return;
}
throw new Error(`Unsupported type: ${typeof value}`);
}
private encodeNumber(num: number): void {
if (Number.isInteger(num)) {
if (num >= 0) {
if (num <= 0x7f) {
// Positive fixint
this.ensureCapacity(1);
this.buffer[this.offset++] = num;
} else if (num <= 0xff) {
this.ensureCapacity(2);
this.buffer[this.offset++] = 0xcc;
this.buffer[this.offset++] = num;
} else if (num <= 0xffff) {
this.ensureCapacity(3);
this.buffer[this.offset++] = 0xcd;
this.buffer[this.offset++] = num >> 8;
this.buffer[this.offset++] = num & 0xff;
} else if (num <= 0xffffffff) {
this.ensureCapacity(5);
this.buffer[this.offset++] = 0xce;
new DataView(this.buffer.buffer).setUint32(this.offset, num);
this.offset += 4;
} else {
this.ensureCapacity(9);
this.buffer[this.offset++] = 0xcf; // uint64
new DataView(this.buffer.buffer).setBigUint64(this.offset, BigInt(num));
this.offset += 8;
}
} else {
if (num >= -32) {
// Negative fixint
this.ensureCapacity(1);
this.buffer[this.offset++] = 0x100 + num;
} else if (num >= -128) {
this.ensureCapacity(2);
this.buffer[this.offset++] = 0xd0;
this.buffer[this.offset++] = num + 256;
} else if (num >= -32768) {
this.ensureCapacity(3);
this.buffer[this.offset++] = 0xd1; // int16
new DataView(this.buffer.buffer).setInt16(this.offset, num);
this.offset += 2;
} else if (num >= -2147483648) {
this.ensureCapacity(5);
this.buffer[this.offset++] = 0xd2; // int32
new DataView(this.buffer.buffer).setInt32(this.offset, num);
this.offset += 4;
} else {
this.ensureCapacity(9);
this.buffer[this.offset++] = 0xd3; // int64
new DataView(this.buffer.buffer).setBigInt64(this.offset, BigInt(num));
this.offset += 8;
}
}
} else {
// Float64
this.ensureCapacity(9);
this.buffer[this.offset++] = 0xcb;
new DataView(this.buffer.buffer).setFloat64(this.offset, num);
this.offset += 8;
}
}
private encodeString(str: string): void {
const encoded = new TextEncoder().encode(str);
const length = encoded.length;
if (length <= 31) {
this.ensureCapacity(1 + length);
this.buffer[this.offset++] = 0xa0 | length;
} else if (length <= 0xff) {
this.ensureCapacity(2 + length);
this.buffer[this.offset++] = 0xd9;
this.buffer[this.offset++] = length;
} else if (length <= 0xffff) {
this.ensureCapacity(3 + length);
this.buffer[this.offset++] = 0xda;
this.buffer[this.offset++] = length >> 8;
this.buffer[this.offset++] = length & 0xff;
} else {
this.ensureCapacity(5 + length);
this.buffer[this.offset++] = 0xdb; // str32
new DataView(this.buffer.buffer).setUint32(this.offset, length);
this.offset += 4;
}
this.buffer.set(encoded, this.offset);
this.offset += length;
}
private encodeArray(arr: unknown[]): void {
const length = arr.length;
if (length <= 15) {
this.ensureCapacity(1);
this.buffer[this.offset++] = 0x90 | length;
} else if (length <= 0xffff) {
this.ensureCapacity(3);
this.buffer[this.offset++] = 0xdc;
this.buffer[this.offset++] = length >> 8;
this.buffer[this.offset++] = length & 0xff;
} else {
this.ensureCapacity(5);
this.buffer[this.offset++] = 0xdd; // array32
new DataView(this.buffer.buffer).setUint32(this.offset, length);
this.offset += 4;
}
for (const item of arr) {
this.encodeValue(item);
}
}
private encodeObject(obj: Record<string, unknown>): void {
const keys = Object.keys(obj);
const length = keys.length;
if (length <= 15) {
this.ensureCapacity(1);
this.buffer[this.offset++] = 0x80 | length;
} else if (length <= 0xffff) {
this.ensureCapacity(3);
this.buffer[this.offset++] = 0xde;
this.buffer[this.offset++] = length >> 8;
this.buffer[this.offset++] = length & 0xff;
} else {
this.ensureCapacity(5);
this.buffer[this.offset++] = 0xdf; // map32
new DataView(this.buffer.buffer).setUint32(this.offset, length);
this.offset += 4;
}
for (const key of keys) {
this.encodeString(key);
this.encodeValue(obj[key]);
}
}
}
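For symmetry, here is a decoder sketch covering only the subset of type bytes the example above produces (fixmap, fixstr, fixint, uint8/uint16, bool, nil) — not a complete MessagePack reader:

```typescript
// Returns [decoded value, offset past it]; recursion handles nested maps.
function decodeMsgPack(buf: Uint8Array, offset = 0): [unknown, number] {
  const b = buf[offset++];
  if (b <= 0x7f) return [b, offset];          // positive fixint
  if (b >= 0xe0) return [b - 0x100, offset];  // negative fixint
  if (b >= 0xa0 && b <= 0xbf) {               // fixstr: lower 5 bits = length
    const len = b & 0x1f;
    return [new TextDecoder().decode(buf.subarray(offset, offset + len)), offset + len];
  }
  if (b >= 0x80 && b <= 0x8f) {               // fixmap: lower 4 bits = pair count
    const out: Record<string, unknown> = {};
    for (let i = 0; i < (b & 0x0f); i++) {
      const [k, o1] = decodeMsgPack(buf, offset);
      const [v, o2] = decodeMsgPack(buf, o1);
      out[String(k)] = v;
      offset = o2;
    }
    return [out, offset];
  }
  switch (b) {
    case 0xc0: return [null, offset];
    case 0xc2: return [false, offset];
    case 0xc3: return [true, offset];
    case 0xcc: return [buf[offset], offset + 1];                          // uint8
    case 0xcd: return [(buf[offset] << 8) | buf[offset + 1], offset + 2]; // uint16
    default: throw new Error(`Unsupported type byte 0x${b.toString(16)}`);
  }
}

// The 26 bytes from the encoding diagram above:
const packed = new Uint8Array([
  0x83, 0xa2, 0x69, 0x64, 0xcd, 0x30, 0x39, 0xa4, 0x6e, 0x61, 0x6d, 0x65,
  0xa5, 0x41, 0x6c, 0x69, 0x63, 0x65, 0xa6, 0x61, 0x63, 0x74, 0x69, 0x76,
  0x65, 0xc3,
]);
const [decoded] = decodeMsgPack(packed); // back to { id, name, active }
```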
Designing a Custom Wire Format
When existing formats don't fit, you may need a custom wire format. Here's the design process:
Step 1: Define Requirements
┌─────────────────────────────────────────────────────────────────────────┐
│ CUSTOM WIRE FORMAT REQUIREMENTS CHECKLIST │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ SIZE REQUIREMENTS: │
│ □ Typical message size: _____ bytes │
│ □ Maximum message size: _____ bytes │
│ □ Size budget constraints: _____ bytes/sec bandwidth │
│ □ String-heavy or numeric-heavy? │
│ │
│ PERFORMANCE REQUIREMENTS: │
│ □ Encode latency budget: _____ µs │
│ □ Decode latency budget: _____ µs │
│ □ Allocations allowed during decode? Yes / No │
│ □ Random field access needed? Yes / No │
│ │
│ SCHEMA REQUIREMENTS: │
│ □ Schema known at compile time? Yes / No │
│ □ Forward compatibility needed? Yes / No │
│ □ Backward compatibility needed? Yes / No │
│ □ Self-describing (schema in message)? Yes / No │
│ │
│ IMPLEMENTATION REQUIREMENTS: │
│ □ Languages to support: _____ │
│ □ Code generation acceptable? Yes / No │
│ □ External dependencies allowed? Yes / No │
│ □ Debug/inspection tooling needed? Yes / No │
│ │
│ SPECIAL REQUIREMENTS: │
│ □ Streaming support? Yes / No │
│ □ Partial message parsing? Yes / No │
│ □ Encryption integration? Yes / No │
│ □ Compression integration? Yes / No │
│ │
└─────────────────────────────────────────────────────────────────────────┘
Step 2: Choose Encoding Strategies
// Example: Custom game state protocol
// Requirements: Low latency, small size, no allocations on read
/*
* WIRE FORMAT DESIGN:
*
* Message Header (4 bytes):
* ┌────────────────────────────────────────────────────────────┐
* │ Bits 0-7 │ Message type (0-255 message types) │
* │ Bits 8-15 │ Flags (compressed, encrypted, etc.) │
* │ Bits 16-31 │ Payload length (0-65535 bytes) │
* └────────────────────────────────────────────────────────────┘
*
* Payload: Type-specific, schema-defined fields
*
* Field Encoding (based on type):
* - Integers: Fixed-width (4 bytes for common case)
* - Floats: Fixed-width IEEE 754 (4 bytes)
* - Strings: Length-prefixed (2 bytes length + UTF-8 data)
* - Arrays: Count-prefixed (2 bytes count + elements)
* - Optional: 1-byte presence flag + value if present
*/
const MESSAGE_TYPES = {
PLAYER_POSITION: 0x01,
PLAYER_ACTION: 0x02,
GAME_STATE: 0x03,
CHAT_MESSAGE: 0x04,
} as const;
interface MessageHeader {
type: number;
flags: number;
length: number;
}
class GameProtocol {
private view: DataView;
private buffer: ArrayBuffer;
constructor(bufferSize: number = 4096) {
this.buffer = new ArrayBuffer(bufferSize);
this.view = new DataView(this.buffer);
}
// Encode into a reusable scratch buffer (slice() copies out the final bytes)
encodePlayerPosition(
playerId: number,
x: number,
y: number,
z: number,
rotation: number,
timestamp: number
): ArrayBuffer {
// Header: 4 bytes
// Payload: 4 + 4 + 4 + 4 + 4 + 8 = 28 bytes
// Total: 32 bytes
let offset = 0;
// Header
this.view.setUint8(offset++, MESSAGE_TYPES.PLAYER_POSITION);
this.view.setUint8(offset++, 0); // flags
this.view.setUint16(offset, 28, true); offset += 2; // little-endian length
// Payload
this.view.setUint32(offset, playerId, true); offset += 4;
this.view.setFloat32(offset, x, true); offset += 4;
this.view.setFloat32(offset, y, true); offset += 4;
this.view.setFloat32(offset, z, true); offset += 4;
this.view.setFloat32(offset, rotation, true); offset += 4;
this.view.setBigUint64(offset, BigInt(timestamp), true); offset += 8;
return this.buffer.slice(0, offset);
}
// Decode primitives straight from the buffer — no intermediate parse tree
decodePlayerPosition(buffer: ArrayBuffer): {
playerId: number;
x: number;
y: number;
z: number;
rotation: number;
timestamp: bigint;
} {
const view = new DataView(buffer);
let offset = 4; // Skip header
return {
playerId: view.getUint32(offset, true),
x: view.getFloat32(offset + 4, true),
y: view.getFloat32(offset + 8, true),
z: view.getFloat32(offset + 12, true),
rotation: view.getFloat32(offset + 16, true),
timestamp: view.getBigUint64(offset + 20, true),
};
}
}
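The 32-byte layout can also be exercised standalone. This sketch restates the class's write path as a free function with the offsets hardcoded, so each byte position from the header/payload comment is visible at a glance:

```typescript
// PLAYER_POSITION frame: 4-byte header + 28-byte payload, little-endian.
function encodePosition(
  playerId: number, x: number, y: number, z: number,
  rotation: number, timestamp: bigint
): ArrayBuffer {
  const buf = new ArrayBuffer(32);
  const view = new DataView(buf);
  view.setUint8(0, 0x01);               // message type: PLAYER_POSITION
  view.setUint8(1, 0);                  // flags
  view.setUint16(2, 28, true);          // payload length
  view.setUint32(4, playerId, true);
  view.setFloat32(8, x, true);
  view.setFloat32(12, y, true);
  view.setFloat32(16, z, true);
  view.setFloat32(20, rotation, true);
  view.setBigUint64(24, timestamp, true);
  return buf;
}

// Round-trip check (values chosen to be exactly representable in float32):
const frame = encodePosition(7, 1.5, 2.5, -3, 0.25, 1709654321000n);
const reader = new DataView(frame);
console.log(reader.getUint32(4, true));  // 7
console.log(reader.getFloat32(8, true)); // 1.5
```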
Step 3: Implement Schema Evolution
// Version-aware decoding with field presence bitmap
/*
* EVOLVED FORMAT with backward compatibility:
*
* Message Header (6 bytes):
* ┌────────────────────────────────────────────────────────────┐
* │ Bits 0-7 │ Message type │
* │ Bits 8-15 │ Schema version (0-255 versions) │
* │ Bits 16-31 │ Field presence bitmap (one bit per field, 16 max) │
* │ Bits 32-47 │ Payload length │
* └────────────────────────────────────────────────────────────┘
*/
interface SchemaVersion {
version: number;
fields: FieldDef[];
}
interface FieldDef {
name: string;
type: 'uint32' | 'float32' | 'string' | 'bytes';
optional: boolean;
addedInVersion: number;
}
const V1_FIELDS: FieldDef[] = [
{ name: 'playerId', type: 'uint32', optional: false, addedInVersion: 1 },
{ name: 'x', type: 'float32', optional: false, addedInVersion: 1 },
{ name: 'y', type: 'float32', optional: false, addedInVersion: 1 },
{ name: 'z', type: 'float32', optional: false, addedInVersion: 1 },
];
const PLAYER_POSITION_SCHEMA: SchemaVersion[] = [
{
version: 1,
fields: V1_FIELDS,
},
{
version: 2,
fields: [
// All v1 fields, in the same order, plus:
...V1_FIELDS,
{ name: 'rotation', type: 'float32', optional: true, addedInVersion: 2 },
{ name: 'velocity', type: 'float32', optional: true, addedInVersion: 2 },
],
},
];
function decodeWithSchema(
buffer: ArrayBuffer,
schema: SchemaVersion
): Record<string, unknown> {
const view = new DataView(buffer);
const messageVersion = view.getUint8(1);
const fieldBitmap = view.getUint16(2, true);
const result: Record<string, unknown> = {};
let offset = 6; // After header
for (let i = 0; i < schema.fields.length; i++) {
const field = schema.fields[i];
// Field not present in this message
if (field.optional && !(fieldBitmap & (1 << i))) {
continue;
}
// Field added in newer version than message
if (messageVersion < field.addedInVersion) {
continue;
}
// Decode based on type
switch (field.type) {
case 'uint32':
result[field.name] = view.getUint32(offset, true);
offset += 4;
break;
case 'float32':
result[field.name] = view.getFloat32(offset, true);
offset += 4;
break;
case 'string':
const strLen = view.getUint16(offset, true);
offset += 2;
result[field.name] = new TextDecoder().decode(
new Uint8Array(buffer, offset, strLen)
);
offset += strLen;
break;
}
}
// Skip unknown fields (newer version has more fields)
// The length in header tells us where message ends
return result;
}
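On the encode side, the presence bitmap mirrors this loop: each field that is actually present sets the bit at its schema index. A sketch (the field list here is illustrative, matching the schema above):

```typescript
interface BitmapField {
  name: string;
  optional: boolean;
}

// Required fields always set their bit; optional fields only when a value exists.
function encodeBitmap(
  fields: BitmapField[],
  values: Record<string, number | undefined>
): number {
  let bitmap = 0;
  fields.forEach((f, i) => {
    if (!f.optional || values[f.name] !== undefined) bitmap |= 1 << i;
  });
  return bitmap;
}

const fields: BitmapField[] = [
  { name: "playerId", optional: false },
  { name: "x", optional: false },
  { name: "rotation", optional: true },
  { name: "velocity", optional: true },
];
console.log(encodeBitmap(fields, { playerId: 7, x: 1, rotation: 0.5 }).toString(2)); // "111"
```

The decoder's `fieldBitmap & (1 << i)` test then skips exactly the optional fields whose bits are clear.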
Performance Benchmarks
┌─────────────────────────────────────────────────────────────────────────┐
│ SERIALIZATION BENCHMARKS (1M messages) │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ Test Message: { id: 12345, name: "Alice", values: [1, 2, 3, 4, 5] } │
│ │
│ FORMAT │ SIZE │ ENCODE │ DECODE │ NOTES │
│ ──────────────┼────────┼───────────┼───────────┼───────────────────── │
│ JSON │ 52B │ 180ms │ 220ms │ Baseline │
│ MessagePack │ 30B │ 85ms │ 95ms │ 42% smaller │
│ Protobuf │ 22B │ 45ms │ 55ms │ 58% smaller │
│ FlatBuffers │ 48B │ 35ms │ 5ms* │ *Zero-copy read │
│ Custom binary │ 20B │ 15ms │ 8ms │ Schema-specific │
│ │
│ * FlatBuffers decode is near-zero because it's pointer arithmetic │
│ Full object materialization would be ~40ms │
│ │
│ TAKEAWAYS: │
│ • JSON is 4-10x slower than binary formats │
│ • Protobuf is best general-purpose choice │
│ • FlatBuffers wins when you don't need all fields │
│ • Custom formats can be fastest but highest effort │
│ │
└─────────────────────────────────────────────────────────────────────────┘
Conclusion
Binary protocol design is about tradeoffs:
- Size vs Speed: Varints save bytes but cost decode cycles. Fixed-width is faster but larger.
- Flexibility vs Performance: Self-describing formats (JSON, MessagePack) work anywhere but carry schema overhead. Schema-driven formats (Protobuf, FlatBuffers) are faster but require coordination.
- Upfront vs Runtime Cost: Code generation moves work to compile time. Reflection-based serialization is flexible but slower.
For most applications, Protocol Buffers is the right default — mature ecosystem, good performance, excellent schema evolution. FlatBuffers when zero-copy matters. MessagePack when you need JSON semantics. Custom formats only when you've profiled and proven the existing options are insufficient.
Real-World Problems & How to Solve Them
Problem 1: A malformed packet causes decoder CPU spikes
Symptom: One bad payload can stall a worker thread with a runaway varint loop.
Root cause: Varint decode logic has no upper bound for continuation bytes.
Fix — enforce maximum varint length and fail fast:
function decodeVarint64Safe(
buffer: Uint8Array,
offset: number
): [bigint, number] {
let value = 0n;
let shift = 0n;
for (let i = 0; i < 10; i++) {
const byte = BigInt(buffer[offset++]);
value |= (byte & 0x7fn) << shift;
if ((byte & 0x80n) === 0n) return [value, offset];
shift += 7n;
}
throw new Error('Invalid varint: exceeds 10 bytes for uint64');
}
Problem 2: Negative numbers bloat message size dramatically
Symptom: Small negative values consume 10 bytes in protobuf-like encoding.
Root cause: Signed fields are encoded as plain varints instead of ZigZag-transformed values.
Fix — use ZigZag for signed integer fields:
function encodeZigZag32(value: number): number {
return (value << 1) ^ (value >> 31);
}
function decodeZigZag32(value: number): number {
return (value >>> 1) ^ -(value & 1);
}
function encodeSint32(value: number): Uint8Array {
// >>> 0 reinterprets the ZigZag result as unsigned before varint encoding
return encodeVarint(encodeZigZag32(value) >>> 0);
}
Problem 3: Deploy breaks because old clients can’t parse new messages
Symptom: After a schema update, older services fail to decode or read incorrect fields.
Root cause: Field numbers or wire types were changed for existing fields.
Fix — enforce schema compatibility in CI before publishing:
interface FieldDef {
name: string;
number: number;
wireType: number;
}
function assertCompatibleSchema(oldFields: FieldDef[], nextFields: FieldDef[]): void {
const oldByNumber = new Map(oldFields.map((f) => [f.number, f]));
for (const field of nextFields) {
const previous = oldByNumber.get(field.number);
if (!previous) continue;
if (previous.wireType !== field.wireType) {
throw new Error(`Incompatible change for field #${field.number}`);
}
}
}
Problem 4: Stream parser loses message boundaries
Symptom: Concatenated TCP payloads decode as one corrupt message.
Root cause: Wire format defines field encoding but not transport framing boundaries.
Fix — add explicit length-prefixed framing around each message:
function encodeFrame(message: Uint8Array): Uint8Array {
const frame = new Uint8Array(4 + message.length);
new DataView(frame.buffer).setUint32(0, message.length, true);
frame.set(message, 4);
return frame;
}
function decodeFrames(chunks: Uint8Array[]): Uint8Array[] {
const merged = chunks.reduce((n, c) => n + c.length, 0);
const all = new Uint8Array(merged);
let p = 0;
for (const chunk of chunks) {
all.set(chunk, p);
p += chunk.length;
}
const out: Uint8Array[] = [];
let offset = 0;
while (offset + 4 <= all.length) {
const len = new DataView(all.buffer, all.byteOffset + offset, 4).getUint32(0, true);
if (offset + 4 + len > all.length) break;
out.push(all.subarray(offset + 4, offset + 4 + len));
offset += 4 + len;
}
return out;
}
Problem 5: Cross-language clients decode different numeric values
Symptom: Java backend and JS client disagree on IDs and counters.
Root cause: Endianness assumptions differ between implementations.
Fix — define byte order in spec and centralize read/write helpers:
const LITTLE_ENDIAN = true;
function writeUint32(view: DataView, offset: number, value: number): void {
view.setUint32(offset, value, LITTLE_ENDIAN);
}
function readUint32(view: DataView, offset: number): number {
return view.getUint32(offset, LITTLE_ENDIAN);
}
Problem 6: Gateway drops fields it doesn’t understand
Symptom: New optional fields disappear after passing through older proxy services.
Root cause: Unknown fields are skipped but not preserved for re-encoding.
Fix — store unknown fields and emit them unchanged on reserialize:
interface ParsedMessage {
known: Record<number, Uint8Array>;
unknown: Array<{ tag: number; raw: Uint8Array }>;
}
// Assumes the encodeVarint helper from earlier; concatBytes is defined below.
function reencode(msg: ParsedMessage): Uint8Array {
const chunks: Uint8Array[] = [];
for (const [tag, value] of Object.entries(msg.known)) {
chunks.push(encodeVarint(Number(tag)));
chunks.push(value);
}
for (const field of msg.unknown) {
chunks.push(encodeVarint(field.tag));
chunks.push(field.raw);
}
return concatBytes(chunks);
}
function concatBytes(chunks: Uint8Array[]): Uint8Array {
const out = new Uint8Array(chunks.reduce((n, c) => n + c.length, 0));
let offset = 0;
for (const c of chunks) {
out.set(c, offset);
offset += c.length;
}
return out;
}
Problem 7: Zero-copy readers return garbage after async boundaries
Symptom: FlatBuffers reads are correct immediately, then corrupted later in async handlers.
Root cause: A pooled network buffer is reused while code still holds zero-copy views.
Fix — parse synchronously or clone buffer before async handoff:
function handoffSafely(frame: Uint8Array): Uint8Array {
// Clone once when crossing async boundary; keep zero-copy for sync parsing path.
return new Uint8Array(frame);
}
async function processAsync(frame: Uint8Array): Promise<void> {
const owned = handoffSafely(frame);
await queueJob(owned); // queueJob: your application's async work queue
}