Frontend Logging Pipelines at Scale

June 19, 2026 · 142 min read

frontend logging

observability

log pipelines

telemetry

frontend architecture

Introduction

Backend logging is a solved problem. Structured logs, centralized aggregation, retention policies—every ops team knows the playbook. Frontend logging at scale is a different beast entirely.

You're not logging from a dozen servers in a controlled datacenter. You're logging from millions of browsers across every network condition, device type, and geographic location imaginable. The volume is staggering: a single page view might generate 50 log entries across performance events, errors, network requests, and user interactions. Multiply by 200 million page views per day, and you're looking at 10 billion log entries daily.

This deep dive examines how to build frontend logging pipelines that can handle this scale: from the browser-side SDK that must minimize performance impact while maximizing signal, to the ingestion layer that must handle burst traffic without dropping data, to the storage and query systems that make petabytes of logs actually useful for debugging production issues.

Scale Context

The production workload we're architecting for:

Metric	Value
Daily Page Views	200M
Logs per Page View	30-100
Raw Log Events per Day	10B+
Compressed Log Volume	5-20TB/day
Peak Events per Second	500K
Log Entry P50 Size	500 bytes
Log Entry P99 Size	5KB
Source Map Lookups per Day	50M
Active Sessions per Hour	2M
Log Retention (hot)	7 days
Log Retention (warm)	30 days
Log Retention (cold)	1 year
Query P95 Latency (hot data)	<5 seconds

At this scale, every byte matters: one unnecessary log per page view is another 200 million events a day to ingest, store, and query.
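To make the table's numbers concrete, here is a back-of-envelope calculation as a small sketch. The function name `estimateDailyLoad` is made up for illustration, and using the P50 entry size as an average is a simplifying assumption, not a measured figure.

```typescript
// Back-of-envelope sizing using the figures from the table above.
const SECONDS_PER_DAY = 86_400;

interface ScaleInputs {
  pageViewsPerDay: number;
  logsPerPageView: number;
  avgEntryBytes: number;
}

function estimateDailyLoad(input: ScaleInputs) {
  const eventsPerDay = input.pageViewsPerDay * input.logsPerPageView;
  const avgEventsPerSecond = eventsPerDay / SECONDS_PER_DAY;
  const rawBytesPerDay = eventsPerDay * input.avgEntryBytes;
  return { eventsPerDay, avgEventsPerSecond, rawBytesPerDay };
}

const load = estimateDailyLoad({
  pageViewsPerDay: 200_000_000,
  logsPerPageView: 50,
  avgEntryBytes: 500, // P50 size from the table, used here as a rough average
});

// eventsPerDay       = 10,000,000,000
// avgEventsPerSecond ≈ 115,741, so the 500K peak is roughly 4.3x the daily average
// rawBytesPerDay     = 5,000,000,000,000 (5 TB of entry payload before compression)
```

The gap between the average rate and the peak is the number to design the ingestion layer around: capacity sized for the daily mean would fall over several times a day.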

Log Classification

Frontend Log Types

┌─────────────────────────────────────────────────────────────────────────────┐
│                    FRONTEND LOG TAXONOMY                                     │
│                                                                              │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │  ERROR LOGS (High Priority, Always Capture)                          │    │
│  │                                                                      │    │
│  │  • JavaScript Exceptions                                             │    │
│  │    - Uncaught errors (window.onerror)                               │    │
│  │    - Unhandled promise rejections                                   │    │
│  │    - React error boundaries                                         │    │
│  │                                                                      │    │
│  │  • Network Failures                                                  │    │
│  │    - API 4xx/5xx responses                                          │    │
│  │    - Network timeouts                                               │    │
│  │    - CORS failures                                                   │    │
│  │                                                                      │    │
│  │  • Application Errors                                                │    │
│  │    - Business logic errors                                          │    │
│  │    - Validation failures                                            │    │
│  │    - State inconsistencies                                          │    │
│  │                                                                      │    │
│  │  Volume: ~1% of total logs                                          │    │
│  │  Sampling: 100% (never sample errors)                               │    │
│  │                                                                      │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                                                              │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │  PERFORMANCE LOGS (Medium Priority, Sample Allowed)                  │    │
│  │                                                                      │    │
│  │  • Core Web Vitals                                                   │    │
│  │    - LCP, FID/INP, CLS events                                       │    │
│  │    - Navigation timing                                              │    │
│  │                                                                      │    │
│  │  • Resource Timing                                                   │    │
│  │    - Individual resource loads                                      │    │
│  │    - Critical path analysis                                         │    │
│  │                                                                      │    │
│  │  • Long Tasks                                                        │    │
│  │    - Main thread blocking                                           │    │
│  │    - JS execution times                                             │    │
│  │                                                                      │    │
│  │  Volume: ~20% of total logs                                         │    │
│  │  Sampling: 10-50% depending on page type                            │    │
│  │                                                                      │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                                                              │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │  BEHAVIORAL LOGS (Lower Priority, Aggressive Sampling)              │    │
│  │                                                                      │    │
│  │  • User Interactions                                                 │    │
│  │    - Clicks, scrolls, form interactions                            │    │
│  │    - Navigation events                                              │    │
│  │    - Feature usage                                                   │    │
│  │                                                                      │    │
│  │  • Session Events                                                    │    │
│  │    - Page views                                                     │    │
│  │    - Session start/end                                              │    │
│  │    - Tab visibility changes                                         │    │
│  │                                                                      │    │
│  │  Volume: ~60% of total logs                                         │    │
│  │  Sampling: 1-10% (session-level sampling)                           │    │
│  │                                                                      │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                                                              │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │  DEBUG LOGS (Conditional, Feature-Flag Controlled)                   │    │
│  │                                                                      │    │
│  │  • Verbose Logging                                                   │    │
│  │    - State changes                                                   │    │
│  │    - API request/response bodies                                    │    │
│  │    - Internal function traces                                       │    │
│  │                                                                      │    │
│  │  Volume: Potentially huge                                           │    │
│  │  Sampling: 0% normally, 100% for flagged sessions                   │    │
│  │                                                                      │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
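The taxonomy above translates fairly directly into a sampling policy table. The sketch below is one possible encoding, not the article's actual SDK code: the names `SAMPLE_RATES`, `effectiveRate`, and `shouldKeep` are hypothetical, and where the taxonomy gives a range the low end is used.

```typescript
type LogType = 'error' | 'performance' | 'interaction' | 'debug';

// Fraction of traffic kept per log type (assumed defaults, low end of each range).
const SAMPLE_RATES: Record<LogType, number> = {
  error: 1.0,        // never sample errors
  performance: 0.1,  // 10-50% depending on page type
  interaction: 0.01, // 1-10%, decided per session
  debug: 0.0,        // off unless the session is flagged
};

// Debug logs are all-or-nothing, driven by a feature flag on the session.
function effectiveRate(type: LogType, debugFlagged: boolean): number {
  if (type === 'debug') return debugFlagged ? 1.0 : 0.0;
  return SAMPLE_RATES[type];
}

// `unitValue` is any number in [0, 1): Math.random() for per-event sampling,
// or a stable hash of the session ID for session-level sampling.
function shouldKeep(type: LogType, unitValue: number, debugFlagged = false): boolean {
  return unitValue < effectiveRate(type, debugFlagged);
}
```

Passing the unit value in, rather than calling `Math.random()` inside, keeps the decision deterministic for a given session and easy to test.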

Log Schema Design

// Production frontend log schema

interface FrontendLogEntry {
  // Identity
  id: string;                     // Unique log ID
  timestamp: number;              // Unix ms
  type: LogType;                  // error | performance | interaction | debug

  // Context (added to every log)
  session: SessionContext;
  page: PageContext;
  device: DeviceContext;

  // Payload (varies by type)
  payload: ErrorPayload | PerformancePayload | InteractionPayload | DebugPayload;

  // Metadata
  version: string;                // SDK version
  appVersion: string;             // App version/commit
  sampling: SamplingInfo;
}

interface SessionContext {
  id: string;                     // Session ID
  userId?: string;                // If authenticated
  startTime: number;
  pageViews: number;
  isNewUser: boolean;
}

interface PageContext {
  url: string;                    // Current URL (sanitized)
  path: string;                   // URL path only
  referrer?: string;
  title: string;
  loadTime: number;               // Time since navigation
}

interface DeviceContext {
  userAgent: string;
  browser: string;                // Parsed browser name
  browserVersion: string;
  os: string;
  osVersion: string;
  deviceType: 'mobile' | 'tablet' | 'desktop';
  viewport: { width: number; height: number };
  connection?: {
    type: string;                 // 4g, 3g, wifi, etc.
    effectiveType: string;
    downlink?: number;
    rtt?: number;
  };
  memory?: number;                // Device memory GB
  cores?: number;                 // Hardware concurrency
}

interface ErrorPayload {
  type: 'uncaught' | 'promise' | 'network' | 'custom';
  message: string;
  stack?: string;
  filename?: string;
  lineno?: number;
  colno?: number;
  componentStack?: string;        // React component stack
  breadcrumbs: Breadcrumb[];
  tags: Record<string, string>;
  extra: Record<string, unknown>;
}

interface PerformancePayload {
  metric: string;                 // LCP, FID, CLS, etc.
  value: number;
  rating: 'good' | 'needs-improvement' | 'poor';
  attribution?: Record<string, unknown>;
  resources?: ResourceEntry[];    // Relevant resources
}

interface Breadcrumb {
  timestamp: number;
  category: string;               // ui.click, fetch, console, navigation
  message: string;
  level: 'info' | 'warning' | 'error';
  data?: Record<string, unknown>;
}

// Compact wire format (minimized for transmission)
interface CompactLogEntry {
  i: string;   // id
  t: number;   // timestamp
  y: number;   // type (enum)
  s: string;   // session id
  u?: string;  // user id
  p: string;   // page path
  d: number;   // device type (enum)
  v: unknown;  // payload (varies)
}

// Compression: Full schema ~2KB → Compact ~400 bytes → Gzip ~150 bytes
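As an illustration of how the full schema might be folded into the compact wire format, here is a minimal sketch. The numeric enum codes are assumptions (only "error = 0" is implied by the edge worker later in the article), and `FullEntrySubset` and `toCompact` are hypothetical names covering just the fields the compact format needs.

```typescript
type LogType = 'error' | 'performance' | 'interaction' | 'debug';
type DeviceType = 'mobile' | 'tablet' | 'desktop';

// Assumed enum codes; client and server must agree on them.
const LOG_TYPE_CODE: Record<LogType, number> = { error: 0, performance: 1, interaction: 2, debug: 3 };
const DEVICE_TYPE_CODE: Record<DeviceType, number> = { mobile: 0, tablet: 1, desktop: 2 };

// Only the fields the compact format carries; the full schema has more.
interface FullEntrySubset {
  id: string;
  timestamp: number;
  type: LogType;
  session: { id: string; userId?: string };
  page: { path: string };
  device: { deviceType: DeviceType };
  payload: unknown;
}

interface CompactLogEntry {
  i: string; t: number; y: number; s: string;
  u?: string; p: string; d: number; v: unknown;
}

function toCompact(entry: FullEntrySubset): CompactLogEntry {
  const compact: CompactLogEntry = {
    i: entry.id,
    t: entry.timestamp,
    y: LOG_TYPE_CODE[entry.type],
    s: entry.session.id,
    p: entry.page.path,
    d: DEVICE_TYPE_CODE[entry.device.deviceType],
    v: entry.payload,
  };
  // Omit the key entirely for anonymous sessions rather than sending an empty value.
  if (entry.session.userId !== undefined) compact.u = entry.session.userId;
  return compact;
}
```

Context that is identical for every entry in a batch (user agent, viewport, app version) is better sent once per batch than once per entry, which is where most of the size reduction comes from.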

Client-Side Collection

SDK Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                    FRONTEND LOGGING SDK ARCHITECTURE                         │
│                                                                              │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                        COLLECTORS                                    │    │
│  │                                                                      │    │
│  │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐              │    │
│  │  │    Error     │  │ Performance  │  │  Interaction │              │    │
│  │  │  Collector   │  │  Collector   │  │  Collector   │              │    │
│  │  │              │  │              │  │              │              │    │
│  │  │ • onerror    │  │ • Perf Obs   │  │ • Click      │              │    │
│  │  │ • rejection  │  │ • Nav Timing │  │ • Scroll     │              │    │
│  │  │ • network    │  │ • Resource   │  │ • Input      │              │    │
│  │  │ • console    │  │ • Long Task  │  │ • Navigation │              │    │
│  │  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘              │    │
│  │         │                 │                 │                       │    │
│  │         └─────────────────┼─────────────────┘                       │    │
│  │                           │                                         │    │
│  │                           ▼                                         │    │
│  │  ┌─────────────────────────────────────────────────────────────┐   │    │
│  │  │                     PROCESSOR                                │   │    │
│  │  │                                                              │   │    │
│  │  │  • Sampling decision                                         │   │    │
│  │  │  • Context enrichment                                        │   │    │
│  │  │  • PII scrubbing                                             │   │    │
│  │  │  • Deduplication                                             │   │    │
│  │  │  • Rate limiting                                             │   │    │
│  │  │                                                              │   │    │
│  │  └─────────────────────────────────────────────────────────────┘   │    │
│  │                           │                                         │    │
│  │                           ▼                                         │    │
│  │  ┌─────────────────────────────────────────────────────────────┐   │    │
│  │  │                      BUFFER                                  │   │    │
│  │  │                                                              │   │    │
│  │  │  ┌─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┐         │   │    │
│  │  │  │ Log │ Log │ Log │ Log │ Log │ Log │ Log │ ... │         │   │    │
│  │  │  └─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┘         │   │    │
│  │  │                                                              │   │    │
│  │  │  Max size: 100 entries or 50KB                              │   │    │
│  │  │  Max age: 10 seconds                                         │   │    │
│  │  │  Overflow: Drop oldest (FIFO)                                │   │    │
│  │  │                                                              │   │    │
│  │  └─────────────────────────────────────────────────────────────┘   │    │
│  │                           │                                         │    │
│  │                           ▼                                         │    │
│  │  ┌─────────────────────────────────────────────────────────────┐   │    │
│  │  │                    TRANSPORT                                 │   │    │
│  │  │                                                              │   │    │
│  │  │  Priority order:                                             │   │    │
│  │  │  1. sendBeacon (page unload)                                │   │    │
│  │  │  2. fetch + keepalive (normal)                              │   │    │
│  │  │  3. XHR (fallback)                                           │   │    │
│  │  │                                                              │   │    │
│  │  │  Features:                                                   │   │    │
│  │  │  • Compression (gzip/brotli)                                │   │    │
│  │  │  • Retry with backoff                                        │   │    │
│  │  │  • Offline queue (IndexedDB)                                │   │    │
│  │  │                                                              │   │    │
│  │  └─────────────────────────────────────────────────────────────┘   │    │
│  │                                                                      │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
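The buffer stage in the diagram has three limits (entry count, byte size, age) and a drop-oldest overflow rule. A minimal sketch of that behavior follows. `BoundedLogBuffer` is a hypothetical name, the limits are treated as both capacity and flush trigger (an assumption, since the diagram does not say), and the default size function counts UTF-16 code units of the JSON, which only approximates bytes on the wire.

```typescript
interface BufferLimits {
  maxEntries: number; // e.g. 100
  maxBytes: number;   // e.g. 50 * 1024
  maxAgeMs: number;   // e.g. 10_000
}

class BoundedLogBuffer<T> {
  private items: { entry: T; bytes: number; addedAt: number }[] = [];
  private totalBytes = 0;
  private limits: BufferLimits;
  private sizeOf: (entry: T) => number;
  dropped = 0; // how many entries were evicted on overflow

  constructor(limits: BufferLimits, sizeOf?: (entry: T) => number) {
    this.limits = limits;
    this.sizeOf = sizeOf ?? ((e) => JSON.stringify(e).length);
  }

  push(entry: T, now: number = Date.now()): void {
    const bytes = this.sizeOf(entry);
    this.items.push({ entry, bytes, addedAt: now });
    this.totalBytes += bytes;

    // Evict from the front (oldest first) until both limits hold again.
    // The newest entry is always kept, even if it alone exceeds maxBytes.
    while (
      this.items.length > 1 &&
      (this.items.length > this.limits.maxEntries || this.totalBytes > this.limits.maxBytes)
    ) {
      const evicted = this.items.shift()!;
      this.totalBytes -= evicted.bytes;
      this.dropped++;
    }
  }

  // True when the buffer is full or its oldest entry has waited long enough.
  shouldFlush(now: number = Date.now()): boolean {
    if (this.items.length === 0) return false;
    return (
      this.items.length >= this.limits.maxEntries ||
      this.totalBytes >= this.limits.maxBytes ||
      now - this.items[0].addedAt >= this.limits.maxAgeMs
    );
  }

  // Remove and return everything currently buffered, oldest first.
  drain(): T[] {
    const entries = this.items.map((item) => item.entry);
    this.items = [];
    this.totalBytes = 0;
    return entries;
  }
}
```

Tracking `dropped` is worth the extra field: reporting it with the next batch tells the backend how much data the client lost under pressure.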

SDK Implementation

// Production logging SDK

interface LoggerConfig {
  endpoint: string;
  apiKey: string;
  appVersion: string;

  // Sampling
  errorSampleRate: number;        // Default 1.0
  perfSampleRate: number;         // Default 0.1
  interactionSampleRate: number;  // Default 0.01

  // Batching
  batchSize: number;              // Default 100
  flushInterval: number;          // Default 10000ms
  maxBufferSize: number;          // Default 500

  // Privacy
  scrubFields: string[];
  blockSelectors: string[];

  // Feature flags
  enableConsoleCapture: boolean;
  enableNetworkCapture: boolean;
  enableOfflineQueue: boolean;
}

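class FrontendLogger {
  private config: LoggerConfig;
  private buffer: CompactLogEntry[] = [];
  // Assigned synchronously at the start of initialize(), hence the `!`
  private sessionContext!: SessionContext;
  private deviceContext!: DeviceContext;
  private flushTimer: number | null = null;
  private offlineDb: IDBDatabase | null = null;

  constructor(config: LoggerConfig) {
    this.config = config;
    // initialize() is async; swallow failures so the logger can never
    // surface an unhandled rejection in the host application
    this.initialize().catch(() => {});
  }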

  private async initialize(): Promise<void> {
    this.sessionContext = this.initSession();
    this.deviceContext = this.getDeviceContext();

    this.setupCollectors();
    this.startFlushTimer();

    if (this.config.enableOfflineQueue) {
      await this.initOfflineDb();
      await this.flushOfflineQueue();
    }

    this.setupLifecycleHandlers();
  }

  // Public API
  error(message: string, extra?: Record<string, unknown>): void {
    this.log('error', {
      type: 'custom',
      message,
      stack: new Error().stack,
      extra
    });
  }

  info(message: string, data?: Record<string, unknown>): void {
    if (!this.shouldSample('interaction')) return;

    this.log('debug', {
      level: 'info',
      message,
      data
    });
  }

  private log(type: LogType, payload: unknown): void {
    const entry = this.createLogEntry(type, payload);

    // Apply processing
    const processed = this.process(entry);
    if (!processed) return;

    // Add to buffer
    this.addToBuffer(processed);

    // Immediate flush for errors
    if (type === 'error') {
      this.flush();
    }
  }

  private createLogEntry(type: LogType, payload: unknown): CompactLogEntry {
    return {
      i: this.generateId(),
      t: Date.now(),
      y: this.typeToEnum(type),
      s: this.sessionContext.id,
      u: this.sessionContext.userId,
      p: window.location.pathname,
      d: this.deviceTypeToEnum(this.deviceContext.deviceType),
      v: payload
    };
  }

  private process(entry: CompactLogEntry): CompactLogEntry | null {
    // Rate limiting
    if (!this.checkRateLimit(entry)) return null;

    // Deduplication
    if (this.isDuplicate(entry)) return null;

    // PII scrubbing
    entry.v = this.scrubPII(entry.v);

    return entry;
  }

  private addToBuffer(entry: CompactLogEntry): void {
    this.buffer.push(entry);

    // Overflow protection: keep only the newest maxBufferSize entries
    if (this.buffer.length > this.config.maxBufferSize) {
      this.buffer = this.buffer.slice(-this.config.maxBufferSize);
    }

    // Flush if batch size reached
    if (this.buffer.length >= this.config.batchSize) {
      this.flush();
    }
  }

  private async flush(): Promise<void> {
    if (this.buffer.length === 0) return;

    const batch = this.buffer.splice(0, this.config.batchSize);

    try {
      // Serialization is async because compression is stream-based
      const payload = await this.serializeBatch(batch);
      const success = await this.send(payload);

      if (!success && this.config.enableOfflineQueue) {
        // Store for retry
        await this.storeOffline(batch);
      }
    } catch (error) {
      // Store for retry
      if (this.config.enableOfflineQueue) {
        await this.storeOffline(batch);
      }
    }
  }

  private async serializeBatch(batch: CompactLogEntry[]): Promise<ArrayBuffer> {
    const json = JSON.stringify({
      logs: batch,
      meta: {
        sdk: '1.0.0',
        app: this.config.appVersion,
        device: this.deviceContext
      }
    });

    return this.compress(json);
  }

  private async compress(data: string): Promise<ArrayBuffer> {
    const bytes = new TextEncoder().encode(data);

    if ('CompressionStream' in window) {
      // Pipe the bytes through gzip and let Response collect the output
      const stream = new Blob([bytes])
        .stream()
        .pipeThrough(new CompressionStream('gzip'));
      return new Response(stream).arrayBuffer();
    }

    // Fallback: no compression (send() omits Content-Encoding in this case)
    return bytes.buffer;
  }

  private async send(payload: ArrayBuffer): Promise<boolean> {
    // compress() only gzips when CompressionStream exists, so mirror that check
    const gzipped = 'CompressionStream' in window;

    // Page is going away: sendBeacon is the transport most likely to complete.
    // It cannot set custom headers, so the endpoint must accept the API key and
    // encoding some other way, and `true` only means the request was queued.
    if (
      document.visibilityState === 'hidden' &&
      navigator.sendBeacon &&
      payload.byteLength < 65536
    ) {
      const blob = new Blob([payload], { type: 'application/octet-stream' });
      return navigator.sendBeacon(this.config.endpoint, blob);
    }

    // Normal path: fetch, which reports the real response status
    try {
      const headers: Record<string, string> = {
        'Content-Type': 'application/octet-stream',
        'X-API-Key': this.config.apiKey
      };
      if (gzipped) headers['Content-Encoding'] = 'gzip';

      const response = await fetch(this.config.endpoint, {
        method: 'POST',
        headers,
        body: payload,
        // keepalive bodies are capped at roughly 64KB, so only use it when it fits
        keepalive: payload.byteLength < 65536
      });

      return response.ok;
    } catch {
      return false;
    }
  }

  private setupLifecycleHandlers(): void {
    // Flush on page hide (more reliable than unload)
    document.addEventListener('visibilitychange', () => {
      if (document.visibilityState === 'hidden') {
        this.flush();
      }
    });

    // Backup: pagehide
    window.addEventListener('pagehide', () => {
      this.flush();
    });
  }

  // Sampling
  private shouldSample(type: string): boolean {
    const rate = this.getSampleRate(type);

    // Session-level sampling for consistency
    const sessionHash = this.hashString(this.sessionContext.id + type);
    return sessionHash < rate;
  }

  private getSampleRate(type: string): number {
    switch (type) {
      case 'error': return this.config.errorSampleRate;
      case 'performance': return this.config.perfSampleRate;
      case 'interaction': return this.config.interactionSampleRate;
      default: return 0.1;
    }
  }

  private hashString(str: string): number {
    let hash = 0;
    for (let i = 0; i < str.length; i++) {
      hash = ((hash << 5) - hash) + str.charCodeAt(i);
      hash |= 0; // Truncate to a 32-bit signed integer
    }
    // Reinterpret as unsigned and normalize to [0, 1), so a rate of 1.0 always passes
    return (hash >>> 0) / 4294967296;
  }

  // PII Scrubbing
  private scrubPII(value: unknown): unknown {
    if (typeof value === 'string') {
      return this.scrubString(value);
    }

    // Arrays must stay arrays; Object.entries would turn them into plain objects
    if (Array.isArray(value)) {
      return value.map(item => this.scrubPII(item));
    }

    if (typeof value === 'object' && value !== null) {
      const scrubbed: Record<string, unknown> = {};
      for (const [key, val] of Object.entries(value)) {
        // scrubFields entries are expected to be lowercase
        if (this.config.scrubFields.includes(key.toLowerCase())) {
          scrubbed[key] = '[REDACTED]';
        } else {
          scrubbed[key] = this.scrubPII(val);
        }
      }
      return scrubbed;
    }

    return value;
  }

  private scrubString(str: string): string {
    // Email pattern
    str = str.replace(/[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g, '[EMAIL]');

    // Credit card pattern
    str = str.replace(/\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b/g, '[CARD]');

    // SSN pattern
    str = str.replace(/\b\d{3}-\d{2}-\d{4}\b/g, '[SSN]');

    // Phone patterns
    str = str.replace(/\b\d{3}[-.]?\d{3}[-.]?\d{4}\b/g, '[PHONE]');

    return str;
  }
}
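The `process()` step calls `isDuplicate`, which the listing does not show. One plausible shape for it is a fingerprint plus a time window, sketched below. `ErrorDeduplicator` and its defaults are hypothetical, and the fingerprint assumes V8-style stack frames (lines starting with `at `); other engines format stacks differently and would fall back to the message alone.

```typescript
class ErrorDeduplicator {
  // Map preserves insertion order, so the first key is always the oldest.
  private lastSeen = new Map<string, number>();
  private windowMs: number;
  private maxKeys: number;

  constructor(windowMs = 60_000, maxKeys = 200) {
    this.windowMs = windowMs;
    this.maxKeys = maxKeys;
  }

  // Message plus the top stack frame identifies "the same error" well enough
  // for client-side suppression; server-side grouping can be smarter.
  static fingerprint(message: string, stack?: string): string {
    const topFrame = (stack ?? '')
      .split('\n')
      .map((line) => line.trim())
      .find((line) => line.startsWith('at ')) ?? '';
    return `${message}|${topFrame}`;
  }

  // Returns true if this fingerprint was already reported inside the window.
  isDuplicate(fingerprint: string, now: number = Date.now()): boolean {
    const previous = this.lastSeen.get(fingerprint);
    if (previous !== undefined && now - previous < this.windowMs) {
      return true;
    }

    // First sighting, or the window expired: record it as the newest key.
    this.lastSeen.delete(fingerprint);
    this.lastSeen.set(fingerprint, now);

    // Bound memory by evicting the oldest fingerprint.
    if (this.lastSeen.size > this.maxKeys) {
      const oldest = this.lastSeen.keys().next().value;
      if (oldest !== undefined) this.lastSeen.delete(oldest);
    }
    return false;
  }
}
```

Because duplicates do not refresh the timestamp, an error that fires continuously is still reported once per window rather than silenced forever.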

Ingestion Architecture

High-Volume Ingestion

┌─────────────────────────────────────────────────────────────────────────────┐
│                    LOG INGESTION ARCHITECTURE                                │
│                                                                              │
│  Browser Traffic ──────────────────────────────────────────────────────▶    │
│        │                                                                     │
│        │ HTTPS POST (gzipped batches)                                       │
│        │ 500K events/sec peak                                               │
│        ▼                                                                     │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                    EDGE INGESTION                                    │    │
│  │                    (CDN Workers / Edge Functions)                    │    │
│  │                                                                      │    │
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐                 │    │
│  │  │  Edge PoP   │  │  Edge PoP   │  │  Edge PoP   │  ... (200+)    │    │
│  │  │  (NYC)      │  │  (London)   │  │  (Tokyo)    │                 │    │
│  │  │             │  │             │  │             │                 │    │
│  │  │ • Validate  │  │ • Validate  │  │ • Validate  │                 │    │
│  │  │ • Decompress│  │ • Decompress│  │ • Decompress│                 │    │
│  │  │ • Enrich    │  │ • Enrich    │  │ • Enrich    │                 │    │
│  │  │ • Route     │  │ • Route     │  │ • Route     │                 │    │
│  │  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘                 │    │
│  │         │                │                │                         │    │
│  │         └────────────────┼────────────────┘                         │    │
│  │                          │                                          │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                             │                                                │
│                             ▼                                                │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                    MESSAGE QUEUE                                     │    │
│  │                    (Kafka / Kinesis / Pub/Sub)                       │    │
│  │                                                                      │    │
│  │  ┌───────────────────────────────────────────────────────────────┐  │    │
│  │  │  Topic: frontend-logs                                          │  │    │
│  │  │  Partitions: 256                                               │  │    │
│  │  │  Replication: 3                                                │  │    │
│  │  │  Retention: 24 hours                                           │  │    │
│  │  │                                                                │  │    │
│  │  │  Partition Strategy: hash(customer_id) % partitions           │  │    │
│  │  │  Throughput: 500K msgs/sec write, 2M msgs/sec read            │  │    │
│  │  │                                                                │  │    │
│  │  └───────────────────────────────────────────────────────────────┘  │    │
│  │                                                                      │    │
│  │  Separate topics for:                                               │    │
│  │  • frontend-logs-errors (high priority)                            │    │
│  │  • frontend-logs-performance                                       │    │
│  │  • frontend-logs-interactions                                      │    │
│  │                                                                      │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                             │                                                │
│              ┌──────────────┼──────────────┐                                │
│              │              │              │                                │
│              ▼              ▼              ▼                                │
│  ┌────────────────┐ ┌────────────────┐ ┌────────────────┐                  │
│  │    Stream      │ │  Aggregation   │ │   Alerting     │                  │
│  │   Processor    │ │    Engine      │ │    Engine      │                  │
│  │   (Flink)      │ │   (Custom)     │ │   (Custom)     │                  │
│  │                │ │                │ │                │                  │
│  │ • Parse        │ │ • Time-window  │ │ • Threshold    │                  │
│  │ • Symbolicate  │ │   aggregation  │ │ • Anomaly      │                  │
│  │ • Sessionize   │ │ • Dimensional  │ │ • Correlation  │                  │
│  │ • Error group  │ │   rollups      │ │                │                  │
│  └────────────────┘ └────────────────┘ └────────────────┘                  │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
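The partition strategy in the diagram, `hash(customer_id) % partitions`, can be sketched in a few lines. The hash used here is FNV-1a, chosen only because it is short and deterministic; it is not what a Kafka client uses by default, and `partitionFor` is a hypothetical helper.

```typescript
// FNV-1a, 32-bit. Operates on UTF-16 code units, which matches the byte-wise
// definition for ASCII-only IDs.
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit multiply
  }
  return hash >>> 0; // force unsigned
}

// Same customer always lands on the same partition, preserving per-customer order.
function partitionFor(customerId: string, partitions = 256): number {
  return fnv1a32(customerId) % partitions;
}
```

Keying by customer keeps each customer's logs ordered within a partition, at the cost of skew: one very large customer maps to one partition and can become a hot spot.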

Edge Ingestion Worker

// Edge worker for log ingestion

interface IngestRequest {
  logs: CompactLogEntry[];
  meta: {
    sdk: string;
    app: string;
    device: DeviceContext;
  };
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Rate limiting
    const clientIP = request.headers.get('CF-Connecting-IP') || 'unknown';
    const rateLimited = await checkRateLimit(env, clientIP);
    if (rateLimited) {
      return new Response('Rate limited', { status: 429 });
    }

    // Validate API key
    const apiKey = request.headers.get('X-API-Key');
    if (!apiKey || !await validateApiKey(env, apiKey)) {
      return new Response('Unauthorized', { status: 401 });
    }

    // Parse request
    const contentEncoding = request.headers.get('Content-Encoding');
    let data: IngestRequest;

    try {
      let body: string;

      if (contentEncoding === 'gzip') {
        // Corrupt gzip throws here and is reported as a client error, not a 500
        const decompressed = await decompressGzip(await request.arrayBuffer());
        body = new TextDecoder().decode(decompressed);
      } else {
        body = await request.text();
      }

      data = JSON.parse(body);
    } catch {
      return new Response('Invalid encoding or JSON', { status: 400 });
    }

    // Validate structure
    if (!Array.isArray(data.logs) || data.logs.length === 0) {
      return new Response('Invalid payload', { status: 400 });
    }

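    // Enrich logs, carrying the batch-level metadata (SDK version, app version,
    // device context) onto each entry so downstream consumers can read it
    const enriched = data.logs.map(log => ({
      ...enrichLog(log, request, env),
      meta: data.meta
    }));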

    // Route to Kafka
    const customerId = await getCustomerIdFromApiKey(env, apiKey);

    // Separate errors (high priority) from other logs
    const errors = enriched.filter(l => l.y === 0); // type enum for error
    const others = enriched.filter(l => l.y !== 0);

    // Send to appropriate queues
    const promises: Promise<void>[] = [];

    if (errors.length > 0) {
      promises.push(
        env.KAFKA.send({
          topic: 'frontend-logs-errors',
          messages: errors.map(e => ({
            key: customerId,
            value: JSON.stringify(e)
          }))
        })
      );
    }

    if (others.length > 0) {
      promises.push(
        env.KAFKA.send({
          topic: 'frontend-logs',
          messages: others.map(e => ({
            key: customerId,
            value: JSON.stringify(e)
          }))
        })
      );
    }

    await Promise.all(promises);

    return new Response('OK', {
      status: 202,
      headers: {
        'X-Logs-Received': String(enriched.length)
      }
    });
  }
};

function enrichLog(
  log: CompactLogEntry,
  request: Request,
  env: Env
): EnrichedLogEntry {
  const cf = request.cf as any;

  return {
    ...log,
    // Server-side enrichment
    ingestTime: Date.now(),
    geo: {
      country: cf?.country,
      region: cf?.region,
      city: cf?.city,
      colo: cf?.colo
    },
    clientIP: hashIP(request.headers.get('CF-Connecting-IP') || ''),
    asn: cf?.asn
  };
}

async function decompressGzip(buffer: ArrayBuffer): Promise<ArrayBuffer> {
  // Pipe the compressed bytes through the platform's gzip decoder and let
  // Response collect the output. A corrupt payload rejects the returned promise
  // instead of leaving an unawaited write() rejection behind.
  const decompressed = new Blob([buffer])
    .stream()
    .pipeThrough(new DecompressionStream('gzip'));

  return new Response(decompressed).arrayBuffer();
}
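
The handler above only checks that `logs` is a non-empty array. A slightly stricter validation step might look like the sketch below. The batch cap (`MAX_LOGS_PER_BATCH`), the `CompactLog` shape, and the `validateBatch` name are assumptions for illustration; only the numeric `y` type field comes from the handler itself.

```typescript
// Hypothetical cap; tune it to the batch size your SDK actually sends.
const MAX_LOGS_PER_BATCH = 500;

interface CompactLog {
  y: number; // log type enum; 0 = error, as in the handler above
  [field: string]: unknown;
}

type BatchResult =
  | { ok: true; logs: CompactLog[] }
  | { ok: false; reason: string };

function validateBatch(data: unknown): BatchResult {
  if (typeof data !== 'object' || data === null) {
    return { ok: false, reason: 'payload is not an object' };
  }

  const logs = (data as { logs?: unknown }).logs;
  if (!Array.isArray(logs) || logs.length === 0) {
    return { ok: false, reason: 'logs must be a non-empty array' };
  }
  if (logs.length > MAX_LOGS_PER_BATCH) {
    return { ok: false, reason: 'too many logs in one batch' };
  }

  // Keep well-formed entries and drop the rest rather than rejecting the batch.
  const valid = logs.filter(
    (entry): entry is CompactLog =>
      typeof entry === 'object' &&
      entry !== null &&
      !Array.isArray(entry) &&
      Number.isInteger((entry as { y?: unknown }).y)
  );

  if (valid.length === 0) {
    return { ok: false, reason: 'no well-formed entries' };
  }
  return { ok: true, logs: valid };
}
```

Dropping malformed entries instead of failing the whole batch is a design choice: one bad entry from a buggy SDK version should not cost the other entries in the same request.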

Processing Pipeline

Stream Processing

// Flink-style stream processor (pseudo-code)

class LogStreamProcessor {
  private sourceMapCache = new Map<string, SourceMap>();

  async process(stream: LogStream): Promise<void> {
    stream
      .filter(log => this.isValid(log))
      .map(log => this.parse(log))
      .keyBy(log => log.sessionId)
      // 30-second tumbling windows per session key (not gap-based session windows)
      .window(ProcessingTimeWindows.of(Time.seconds(30)))
      .process(new SessionWindowProcessor())
      .sink(this.createSinks());
  }

  private parse(rawLog: string): ParsedLog {
    const log = JSON.parse(rawLog);

    // Symbolicate error stack traces
    if (log.type === 'error' && log.payload.stack) {
      log.payload.symbolicatedStack = this.symbolicate(
        log.payload.stack,
        log.meta.appVersion
      );
    }

    // Parse user agent
    log.device = this.parseUserAgent(log.device.userAgent);

    return log;
  }

  private symbolicate(stack: string, version: string): string {
    // Get source map (cached)
    const sourceMap = this.getSourceMap(version);
    if (!sourceMap) return stack;

    // Parse stack frames
    const frames = this.parseStackFrames(stack);

    // Map each frame to original source
    const symbolicatedFrames = frames.map(frame => {
      const original = sourceMap.originalPositionFor({
        line: frame.line,
        column: frame.column
      });

      if (original.source) {
        return {
          ...frame,
          file: original.source,
          line: original.line,
          column: original.column,
          function: original.name || frame.function
        };
      }

      return frame;
    });

    return this.formatStack(symbolicatedFrames);
  }
}

class SessionWindowProcessor {
  process(
    key: string,
    context: ProcessWindowContext,
    logs: Iterable<ParsedLog>,
    out: Collector<SessionAggregate>
  ): void {
    const session: SessionAggregate = {
      sessionId: key,
      windowStart: context.window().start(),
      windowEnd: context.window().end(),
      pageViews: 0,
      errors: [],
      performance: {
        lcp: [],
        fid: [],
        cls: []
      },
      interactions: 0
    };

    for (const log of logs) {
      switch (log.type) {
        case 'pageview':
          session.pageViews++;
          break;

        case 'error':
          session.errors.push({
            message: log.payload.message,
            stack: log.payload.symbolicatedStack,
            count: 1
          });
          break;

        case 'performance':
          if (log.payload.metric === 'LCP') {
            session.performance.lcp.push(log.payload.value);
          }
          // ... other metrics
          break;

        case 'interaction':
          session.interactions++;
          break;
      }
    }

    // Group and dedupe errors
    session.errors = this.dedupeErrors(session.errors);

    // Aggregate performance
    session.performance.lcpP75 = this.percentile(session.performance.lcp, 75);

    out.collect(session);
  }

  private dedupeErrors(errors: ErrorEntry[]): ErrorEntry[] {
    const grouped = new Map<string, ErrorEntry>();

    for (const error of errors) {
      const key = this.errorFingerprint(error);
      const existing = grouped.get(key);

      if (existing) {
        existing.count++;
      } else {
        grouped.set(key, { ...error });
      }
    }

    return Array.from(grouped.values());
  }

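  private errorFingerprint(error: ErrorEntry): string {
    // Fingerprint by message + first stack frame. In V8-style stacks the first
    // line repeats the error message, so prefer the first "at ..." line and fall
    // back to the first line for formats that start with a frame.
    const lines = error.stack?.split('\n') ?? [];
    const firstFrame = lines.find(l => l.trim().startsWith('at ')) ?? lines[0] ?? '';
    return `${error.message}|${firstFrame.trim()}`;
  }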
}
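
The window processor calls `this.percentile(...)` without showing it. One simple option is the nearest-rank method, sketched below as a standalone function. A 30-second window for a single session holds only a handful of samples, so sorting a copy is cheap here; this is an illustrative helper, not the pipeline's actual implementation.

```typescript
// Nearest-rank percentile: the smallest value such that at least p% of the
// samples are less than or equal to it. Returns undefined for an empty input.
function percentile(values: number[], p: number): number | undefined {
  if (values.length === 0) return undefined;

  // Sort a copy so the caller's array is left untouched.
  const sorted = [...values].sort((a, b) => a - b);

  // Rank is 1-based; clamp to 1 so p = 0 returns the minimum.
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Example: LCP samples (ms) collected for one session window.
const lcpSamples = [4000, 1200, 2400, 1800];
const lcpP75 = percentile(lcpSamples, 75); // 2400
```

For hourly or daily percentiles across all sessions, per-window values like this should not be averaged; aggregate from raw samples or mergeable sketches instead.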

Error Grouping

// Intelligent error grouping

interface ErrorGroup {
  id: string;
  fingerprint: string;
  message: string;
  stack: string;
  firstSeen: number;
  lastSeen: number;
  count: number;
  affectedUsers: number;
  affectedSessions: number;
  browsers: Record<string, number>;
  devices: Record<string, number>;
  pages: Record<string, number>;
  status: 'new' | 'ongoing' | 'regressed' | 'resolved';
}

class ErrorGrouper {
  private groups = new Map<string, ErrorGroup>();

  addError(error: ParsedError): ErrorGroup {
    const fingerprint = this.computeFingerprint(error);
    const existing = this.groups.get(fingerprint);

    if (existing) {
      return this.updateGroup(existing, error);
    } else {
      const group = this.createGroup(fingerprint, error);
      this.groups.set(fingerprint, group);
      return group;
    }
  }

  private computeFingerprint(error: ParsedError): string {
    // Strategy: Combine normalized message + top stack frames

    // Normalize message (remove variable parts)
    const normalizedMessage = this.normalizeMessage(error.message);

    // Get top N stack frames (symbolicated)
    const topFrames = this.getTopFrames(error.symbolicatedStack, 3);

    // Combine
    const parts = [normalizedMessage, ...topFrames];

    // Hash
    return this.hash(parts.join('|'));
  }

  private normalizeMessage(message: string): string {
    // Replace variable parts with placeholders so that messages differing only
    // in IDs, counts, URLs, or quoted values produce the same fingerprint.
    return message
      // UUIDs
      .replace(/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/gi, '{uuid}')
      // Standalone numbers
      .replace(/\b\d+\b/g, '{number}')
      // URLs
      .replace(/https?:\/\/[^\s]+/g, '{url}')
      // Quoted strings
      .replace(/"[^"]+"/g, '"{string}"')
      .replace(/'[^']+'/g, "'{string}'");
  }

  private getTopFrames(stack: string | undefined, count: number): string[] {
    if (!stack) return [];

    const lines = stack.split('\n');
    const frames: string[] = [];

    for (const rawLine of lines) {
      // Matches "at func (file:line:col)". Frames without a function name
      // ("at file:line:col") do not match and are skipped.
      const match = rawLine.match(/at\s+(\S+)\s+\((.+):(\d+):(\d+)\)/);
      if (match) {
        const [, func, file, lineNumber] = match;
        // Normalize: drop cache-busting query strings from the file URL
        const normalizedFile = file.replace(/\?.*$/, '');
        frames.push(`${func}@${normalizedFile}:${lineNumber}`);

        if (frames.length >= count) break;
      }
    }

    return frames;
  }

  private updateGroup(group: ErrorGroup, error: ParsedError): ErrorGroup {
    group.lastSeen = Date.now();
    group.count++;

    // Update affected counts (using HyperLogLog in production)
    if (error.userId) {
      // Track unique users
    }
    if (error.sessionId) {
      // Track unique sessions
    }

    // Update distributions
    const browser = error.device.browser;
    group.browsers[browser] = (group.browsers[browser] || 0) + 1;

    const device = error.device.deviceType;
    group.devices[device] = (group.devices[device] || 0) + 1;

    const page = error.page.path;
    group.pages[page] = (group.pages[page] || 0) + 1;

    // Check for regression
    if (group.status === 'resolved') {
      group.status = 'regressed';
    }

    return group;
  }
}
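
To see why normalization matters, the sketch below fingerprints two errors that differ only in a user ID and URL. It reuses the normalization rules from `normalizeMessage` above. The `hash` helper is not shown in the original, so a 32-bit FNV-1a variant (`fnv1a`) stands in for it here as an assumption; a production grouper would more likely use a wider hash to keep collisions rare.

```typescript
// Same replacement rules as normalizeMessage above.
function normalizeMessage(message: string): string {
  return message
    .replace(/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/gi, '{uuid}')
    .replace(/\b\d+\b/g, '{number}')
    .replace(/https?:\/\/[^\s]+/g, '{url}')
    .replace(/"[^"]+"/g, '"{string}"')
    .replace(/'[^']+'/g, "'{string}'");
}

// FNV-1a, 32-bit, applied to UTF-16 code units for brevity.
// Hypothetical stand-in for the grouper's hash() helper.
function fnv1a(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, '0');
}

function fingerprint(message: string, topFrames: string[]): string {
  return fnv1a([normalizeMessage(message), ...topFrames].join('|'));
}

// Two occurrences of the same bug with different variable parts.
const frames = ['loadUser@app/users.ts:88'];
const a = fingerprint('Failed to load user 42 from https://api.example.com/users/42', frames);
const b = fingerprint('Failed to load user 1337 from https://api.example.com/users/1337', frames);
console.log(a === b); // true: both normalize to the same message
```

Both messages normalize to `Failed to load user {number} from {url}`, so they land in one group instead of creating a new group per user.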

Storage Architecture

Tiered Storage

┌─────────────────────────────────────────────────────────────────────────────┐
│                    LOG STORAGE TIERS                                         │
│                                                                              │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │  HOT TIER (0-7 days)                                                 │    │
│  │                                                                      │    │
│  │  Storage: ClickHouse / Elasticsearch                                │    │
│  │  Volume: ~100TB                                                      │    │
│  │  Query latency: <1 second                                           │    │
│  │  Cost: $$$                                                           │    │
│  │                                                                      │    │
│  │  Use cases:                                                          │    │
│  │  • Real-time debugging                                              │    │
│  │  • Active incident investigation                                    │    │
│  │  • Alert evaluation                                                 │    │
│  │  • Dashboard queries                                                │    │
│  │                                                                      │    │
│  │  Features:                                                           │    │
│  │  • Full-text search                                                 │    │
│  │  • All columns indexed                                              │    │
│  │  • Sub-second aggregations                                          │    │
│  │                                                                      │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                      │                                       │
│                                      │ Age > 7 days                         │
│                                      ▼                                       │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │  WARM TIER (7-30 days)                                               │    │
│  │                                                                      │    │
│  │  Storage: ClickHouse (cold replica) / S3 + Athena                   │    │
│  │  Volume: ~500TB                                                      │    │
│  │  Query latency: 5-30 seconds                                        │    │
│  │  Cost: $$                                                            │    │
│  │                                                                      │    │
│  │  Use cases:                                                          │    │
│  │  • Trend analysis                                                   │    │
│  │  • Post-mortem investigations                                       │    │
│  │  • Weekly/monthly reports                                           │    │
│  │                                                                      │    │
│  │  Features:                                                           │    │
│  │  • Columnar storage                                                 │    │
│  │  • Compressed                                                       │    │
│  │  • Partitioned by date                                              │    │
│  │                                                                      │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                      │                                       │
│                                      │ Age > 30 days                        │
│                                      ▼                                       │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │  COLD TIER (30 days - 1 year)                                        │    │
│  │                                                                      │    │
│  │  Storage: S3 + Glacier / BigQuery (cold)                            │    │
│  │  Volume: ~5PB (sampled/aggregated)                                  │    │
│  │  Query latency: Minutes to hours                                    │    │
│  │  Cost: $                                                             │    │
│  │                                                                      │    │
│  │  Use cases:                                                          │    │
│  │  • Compliance / audit                                               │    │
│  │  • Year-over-year analysis                                          │    │
│  │  • ML training data                                                 │    │
│  │                                                                      │    │
│  │  Features:                                                           │    │
│  │  • Heavily compressed                                               │    │
│  │  • Sampled (10% of original)                                        │    │
│  │  • Pre-aggregated metrics                                           │    │
│  │                                                                      │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
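
A query layer sitting in front of these tiers needs to know which ones a time range touches. The sketch below derives that from the age boundaries in the diagram (7, 30, and 365 days). The `Tier` type and `tiersForRange` function are hypothetical names, and real routing would also depend on how each backend is partitioned.

```typescript
type Tier = 'hot' | 'warm' | 'cold';

const DAY_MS = 24 * 60 * 60 * 1000;

// Upper age bound in days for each tier, taken from the diagram above.
const TIER_BOUNDS: Array<[Tier, number]> = [
  ['hot', 7],
  ['warm', 30],
  ['cold', 365],
];

// Returns every tier whose age interval overlaps the query's time range.
function tiersForRange(fromMs: number, toMs: number, nowMs: number): Tier[] {
  const youngestAge = Math.max(0, (nowMs - toMs) / DAY_MS);
  const oldestAge = (nowMs - fromMs) / DAY_MS;

  const tiers: Tier[] = [];
  let lower = -Infinity; // the hot tier has no lower age bound
  for (const [tier, upper] of TIER_BOUNDS) {
    // Each tier covers ages in (lower, upper].
    if (oldestAge > lower && youngestAge <= upper) tiers.push(tier);
    lower = upper;
  }
  return tiers;
}

const now = Date.now();
console.log(tiersForRange(now - 1 * DAY_MS, now, now));  // ['hot']
console.log(tiersForRange(now - 14 * DAY_MS, now, now)); // ['hot', 'warm']
```

A range that falls entirely beyond 365 days returns an empty list, which matches the retention policy: that data has already been deleted.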

ClickHouse Schema

-- ClickHouse schema for frontend logs

CREATE TABLE frontend_logs
(
    -- Identity
    id UUID,
    timestamp DateTime64(3),
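    -- 'pageview' is included because the stream processor and the error-rate query filter on it
    type Enum8('error' = 0, 'performance' = 1, 'interaction' = 2, 'debug' = 3, 'pageview' = 4),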

    -- Context
    customer_id String,
    session_id String,
    user_id Nullable(String),

    -- Page
    page_url String,
    page_path LowCardinality(String),

    -- Device
    device_type Enum8('mobile' = 0, 'tablet' = 1, 'desktop' = 2),
    browser LowCardinality(String),
    browser_version LowCardinality(String),
    os LowCardinality(String),

    -- Geo
    country LowCardinality(String),
    region LowCardinality(String),
    city String,

    -- Payload (varies by type)
    error_message Nullable(String),
    error_stack Nullable(String),
    error_fingerprint Nullable(String),

    perf_metric Nullable(String),
    perf_value Nullable(Float64),

    interaction_name Nullable(String),

    -- App
    app_version LowCardinality(String),
    sdk_version LowCardinality(String)
)
ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(timestamp)
ORDER BY (customer_id, type, timestamp)
-- 'warm' and 'cold' must be disks defined in the table's storage policy
TTL timestamp + INTERVAL 7 DAY TO DISK 'warm',
    timestamp + INTERVAL 30 DAY TO DISK 'cold',
    timestamp + INTERVAL 365 DAY DELETE
SETTINGS index_granularity = 8192;

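-- Materialized view for error aggregates.
-- Counts and uniques are stored as aggregate states: a SummingMergeTree would
-- add per-insert distinct counts together and overcount users and sessions.
-- Read with countMerge(occurrences), uniqMerge(affected_users), and so on.
-- The fingerprint is made non-nullable because sorting keys reject Nullable
-- columns by default.
CREATE MATERIALIZED VIEW frontend_errors_hourly
ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMMDD(hour)
ORDER BY (customer_id, fingerprint, hour)
AS SELECT
    customer_id,
    assumeNotNull(error_fingerprint) AS fingerprint,
    toStartOfHour(timestamp) AS hour,
    countState() AS occurrences,
    uniqState(user_id) AS affected_users,
    uniqState(session_id) AS affected_sessions,
    anyLastState(error_message) AS message,
    anyLastState(error_stack) AS stack
FROM frontend_logs
WHERE type = 'error' AND error_fingerprint IS NOT NULL
GROUP BY customer_id, fingerprint, hour;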

-- Materialized view for performance percentiles.
-- Read with quantileMerge(0.75)(p75), countMerge(sample_count), and so on.
-- The metric name is made non-nullable so it can be part of the sorting key.
CREATE MATERIALIZED VIEW frontend_perf_hourly
ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMMDD(hour)
ORDER BY (customer_id, page_path, metric, hour)
AS SELECT
    customer_id,
    page_path,
    assumeNotNull(perf_metric) AS metric,
    toStartOfHour(timestamp) AS hour,
    quantileState(0.5)(perf_value) AS p50,
    quantileState(0.75)(perf_value) AS p75,
    quantileState(0.95)(perf_value) AS p95,
    quantileState(0.99)(perf_value) AS p99,
    countState() AS sample_count
FROM frontend_logs
WHERE type = 'performance' AND perf_metric IS NOT NULL
GROUP BY customer_id, page_path, metric, hour;
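
Because the payload columns are nullable and vary by type, the writer has to flatten each parsed log into a row where only the fields for its own type are populated. A minimal sketch of that step follows, assuming the `ParsedLog` field names used by the stream processor. The `LogRow` type, the `payload.name` field, and the reduced column set are assumptions made to keep the example short.

```typescript
type LogType = 'error' | 'performance' | 'interaction' | 'debug';

interface ParsedLog {
  type: LogType;
  timestamp: number;
  sessionId: string;
  page: { path: string };
  payload: {
    message?: string;
    stack?: string;
    metric?: string;
    value?: number;
    name?: string; // hypothetical field holding the interaction name
  };
}

// Subset of the frontend_logs columns, enough to show the pattern.
interface LogRow {
  timestamp: number;
  type: LogType;
  session_id: string;
  page_path: string;
  error_message: string | null;
  error_stack: string | null;
  perf_metric: string | null;
  perf_value: number | null;
  interaction_name: string | null;
}

function toRow(log: ParsedLog): LogRow {
  const { payload } = log;
  const isError = log.type === 'error';
  const isPerf = log.type === 'performance';

  return {
    timestamp: log.timestamp,
    type: log.type,
    session_id: log.sessionId,
    page_path: log.page.path,
    // Only columns matching the log's own type are filled; the rest stay NULL.
    error_message: isError ? (payload.message ?? null) : null,
    error_stack: isError ? (payload.stack ?? null) : null,
    perf_metric: isPerf ? (payload.metric ?? null) : null,
    perf_value: isPerf ? (payload.value ?? null) : null,
    interaction_name: log.type === 'interaction' ? (payload.name ?? null) : null,
  };
}
```

Gating each column on `type` keeps stray payload fields from leaking into unrelated columns, which matters for queries that filter on `perf_metric IS NOT NULL` or similar.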

Query Patterns

Common Queries

-- Top errors in the last 24 hours, broken down by page, browser, device, and country
SELECT
    error_message,
    error_stack,
    page_path,
    browser,
    device_type,
    country,
    count() AS occurrences,
    uniqExact(session_id) AS sessions,
    min(timestamp) AS first_seen,
    max(timestamp) AS last_seen
FROM frontend_logs
WHERE
    customer_id = 'cust_123'
    AND type = 'error'
    AND timestamp > now() - INTERVAL 24 HOUR
GROUP BY error_message, error_stack, page_path, browser, device_type, country
ORDER BY occurrences DESC
LIMIT 100;

-- Performance percentiles by page over time
SELECT
    toStartOfFifteenMinutes(timestamp) AS bucket,
    page_path,
    quantile(0.75)(perf_value) AS p75,
    quantile(0.95)(perf_value) AS p95,
    count() AS samples
FROM frontend_logs
WHERE
    customer_id = 'cust_123'
    AND type = 'performance'
    AND perf_metric = 'LCP'
    AND timestamp > now() - INTERVAL 6 HOUR
GROUP BY bucket, page_path
ORDER BY bucket;

-- Session reconstruction
SELECT
    timestamp,
    type,
    page_path,
    CASE
        WHEN type = 'error' THEN error_message
        WHEN type = 'performance' THEN concat(perf_metric, ': ', toString(perf_value))
        WHEN type = 'interaction' THEN interaction_name
        ELSE toString(type)
    END AS event_detail
FROM frontend_logs
WHERE
    customer_id = 'cust_123'
    AND session_id = 'sess_abc123'
ORDER BY timestamp;

-- Error rate by browser/version
SELECT
    browser,
    browser_version,
    countIf(type = 'error') AS errors,
    countIf(type = 'pageview') AS pageviews,
    errors / pageviews * 100 AS error_rate_pct
FROM frontend_logs
WHERE
    customer_id = 'cust_123'
    AND timestamp > now() - INTERVAL 7 DAY
GROUP BY browser, browser_version
HAVING pageviews > 1000
ORDER BY error_rate_pct DESC;
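
The queries above hard-code values such as `'cust_123'`. When those values come from a UI or an API, ClickHouse's HTTP interface supports query parameters: placeholders written as `{name:Type}` in the SQL and values passed as `param_name` in the URL. The sketch below only builds the request for the session reconstruction query. `buildSessionQuery` and `ClickHouseRequest` are hypothetical names, and sending the request (plus authentication) is left out.

```typescript
interface ClickHouseRequest {
  url: string;  // endpoint with param_* values in the query string
  body: string; // SQL text containing only typed placeholders
}

function buildSessionQuery(
  baseUrl: string,
  customerId: string,
  sessionId: string
): ClickHouseRequest {
  // Values travel separately from the SQL, so they are never spliced into it.
  const params = new URLSearchParams({
    param_customer_id: customerId,
    param_session_id: sessionId,
  });

  const body = [
    'SELECT timestamp, type, page_path',
    'FROM frontend_logs',
    'WHERE customer_id = {customer_id:String}',
    '  AND session_id = {session_id:String}',
    'ORDER BY timestamp',
    'FORMAT JSONEachRow',
  ].join('\n');

  return { url: `${baseUrl}/?${params.toString()}`, body };
}

// Usage: POST req.body to req.url with your HTTP client of choice.
const req = buildSessionQuery('http://localhost:8123', 'cust_123', 'sess_abc123');
```

Keeping `customer_id` as a mandatory parameter also makes it harder to write a query that accidentally scans another tenant's data.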

Alerting on Logs

Alert Rule Engine

// Log-based alerting

interface LogAlertRule {
  id: string;
  name: string;
  query: LogQuery;
  condition: AlertCondition;
  window: number; // seconds
  severity: 'critical' | 'warning' | 'info';
  channels: string[];
}

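interface LogQuery {
  type?: LogType[];
  filters?: Record<string, string | string[]>; // optional: most rules below omit it
  aggregation?: 'count' | 'rate' | 'distinct' | 'percentile';
  percentile?: number; // 0-100; used when aggregation is 'percentile'
  groupBy?: string[];
}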

const alertRules: LogAlertRule[] = [
  // Error spike detection
  {
    id: 'error-spike',
    name: 'Error Rate Spike',
    query: {
      type: ['error'],
      aggregation: 'rate'
    },
    condition: {
      type: 'anomaly',
      baseline: 'rolling-7d',
      threshold: 3 // 3 standard deviations
    },
    window: 300, // 5 minutes
    severity: 'critical',
    channels: ['pagerduty', 'slack-critical']
  },

  // New error type
  {
    id: 'new-error',
    name: 'New Error Type Detected',
    query: {
      type: ['error'],
      aggregation: 'distinct',
      groupBy: ['error_fingerprint']
    },
    condition: {
      type: 'new-value',
      lookbackDays: 7
    },
    window: 60,
    severity: 'warning',
    channels: ['slack-frontend']
  },

  // Performance regression
  {
    id: 'lcp-regression',
    name: 'LCP P75 Regression',
    query: {
      type: ['performance'],
      filters: { perf_metric: 'LCP' },
      aggregation: 'percentile',
      percentile: 75
    },
    condition: {
      type: 'threshold',
      operator: 'gt',
      value: 2500,
      sustainedMinutes: 10
    },
    window: 600,
    severity: 'warning',
    channels: ['slack-frontend']
  },

  // Session error cascade
  {
    id: 'session-cascade',
    name: 'Session Error Cascade',
    query: {
      type: ['error'],
      aggregation: 'count',
      groupBy: ['session_id']
    },
    condition: {
      type: 'threshold',
      operator: 'gt',
      value: 5 // More than 5 errors in one session
    },
    window: 300,
    severity: 'info',
    channels: ['slack-frontend']
  }
];

class LogAlertEvaluator {
  async evaluate(rule: LogAlertRule): Promise<Alert | null> {
    const results = await this.executeQuery(rule.query, rule.window);
    const triggered = this.checkCondition(rule.condition, results);

    if (triggered) {
      return {
        ruleId: rule.id,
        severity: rule.severity,
        message: this.buildAlertMessage(rule, results),
        context: results
      };
    }

    return null;
  }

  private async executeQuery(
    query: LogQuery,
    windowSeconds: number
  ): Promise<QueryResult> {
    // Build and execute ClickHouse query
    const sql = this.buildSQL(query, windowSeconds);
    return await this.clickhouse.query(sql);
  }

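  private buildSQL(query: LogQuery, windowSeconds: number): string {
    // Rule definitions are trusted config, but identifiers and literals are
    // still validated and escaped so a malformed rule cannot produce broken SQL.
    const ident = (name: string): string => {
      if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(name)) {
        throw new Error(`Invalid column name: ${name}`);
      }
      return name;
    };
    const literal = (value: string): string =>
      `'${value.replace(/\\/g, '\\\\').replace(/'/g, "\\'")}'`;

    const groupBy = (query.groupBy ?? []).map(ident);
    const columns: string[] = [];

    switch (query.aggregation) {
      case 'rate':
        columns.push(`count() / ${windowSeconds} AS value`);
        break;
      case 'distinct':
        columns.push(`uniqExact(${groupBy[0] ?? 'session_id'}) AS value`);
        break;
      case 'percentile':
        // Used by the LCP rule; defaults to P75 when no percentile is given
        columns.push(`quantile(${(query.percentile ?? 75) / 100})(perf_value) AS value`);
        break;
      default:
        // 'count', or no aggregation specified
        columns.push('count() AS value');
    }
    columns.push(...groupBy);

    const where = [`timestamp > now() - INTERVAL ${windowSeconds} SECOND`];

    if (query.type?.length) {
      where.push(`type IN (${query.type.map(literal).join(', ')})`);
    }

    for (const [key, value] of Object.entries(query.filters ?? {})) {
      where.push(
        Array.isArray(value)
          ? `${ident(key)} IN (${value.map(literal).join(', ')})`
          : `${ident(key)} = ${literal(value)}`
      );
    }

    let sql = `SELECT ${columns.join(', ')} FROM frontend_logs WHERE ${where.join(' AND ')}`;

    if (groupBy.length > 0) {
      sql += ` GROUP BY ${groupBy.join(', ')}`;
    }

    return sql;
  }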
}
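
The `error-spike` rule uses an `anomaly` condition with a threshold of 3 standard deviations, but `checkCondition` is not shown. One way to evaluate it is a z-score against the baseline samples, sketched below. The `checkAnomaly` name, the `AnomalyResult` shape, and the way baseline samples are obtained (for example, the same 5-minute rate taken from each of the previous 7 days) are assumptions.

```typescript
interface AnomalyResult {
  zScore: number;
  triggered: boolean;
}

// Flags the current value when it sits more than `threshold` sample standard
// deviations above the baseline mean. Only upward spikes trigger.
function checkAnomaly(
  current: number,
  baseline: number[],
  threshold: number
): AnomalyResult {
  // Too little history to estimate spread: never trigger.
  if (baseline.length < 2) return { zScore: 0, triggered: false };

  const mean = baseline.reduce((sum, v) => sum + v, 0) / baseline.length;
  const variance =
    baseline.reduce((sum, v) => sum + (v - mean) ** 2, 0) / (baseline.length - 1);
  const stdDev = Math.sqrt(variance);

  // A perfectly flat baseline makes any increase infinitely unusual.
  if (stdDev === 0) {
    const above = current > mean;
    return { zScore: above ? Infinity : 0, triggered: above };
  }

  const zScore = (current - mean) / stdDev;
  return { zScore, triggered: zScore > threshold };
}

// Errors per second in the same 5-minute window on previous days.
const baselineRates = [10, 12, 11, 9, 10, 11, 10];
console.log(checkAnomaly(11, baselineRates, 3).triggered); // false
console.log(checkAnomaly(40, baselineRates, 3).triggered); // true
```

A z-score assumes the baseline is roughly symmetric. Error rates with strong daily or weekly cycles usually need a baseline taken from the same time-of-day window, otherwise normal peak traffic looks like a spike.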

Summary

Frontend logging at scale is not just "backend logging but in browsers." The constraints are fundamentally different: you're collecting from untrusted, uncontrolled environments with strict performance budgets and massive volume.

Key Architectural Principles:

Sample ruthlessly, but never errors - Behavioral logs can be 1% sampled. Errors are always 100%.
Batch and compress client-side - Never send individual log entries. Batch 50-100, gzip, send.
Use sendBeacon for reliability - A plain fetch can be cancelled when the page unloads. sendBeacon (or fetch with keepalive: true) is queued by the browser and survives navigation.
Fingerprint and group errors server-side - Don't send duplicate stacks. Dedupe by fingerprint.
Symbolicate early in the pipeline - Stack traces are useless without source maps. Do it in stream processing.
Tier your storage by access pattern - Hot (7d), warm (30d), cold (1y). Different cost/latency tradeoffs.
Pre-aggregate for dashboards - Don't query raw logs for percentiles. Use materialized views.
PII scrub everywhere - Emails, cards, SSNs can appear anywhere. Scrub client-side AND server-side.
Rate limit at the edge - Runaway clients can DoS your pipeline. Enforce limits per-client.
Alert on log patterns, not just metrics - New error fingerprints, error cascades, and behavioral anomalies are all detectable from logs.

Frontend logging done right gives you superpowers: instant visibility into production issues, session-level debugging, and real user experience data. Done wrong, it's an expensive, noisy pipeline that drowns signal in volume.

Build for scale from day one—retrofitting sampling and tiering onto an overloaded system is painful.
