Frontend Observability Architecture for Enterprise Apps
Beyond Console.log: Building Production-Grade Visibility
Enterprise frontend applications generate millions of events daily across thousands of users, browsers, and network conditions. Traditional logging approaches—scattered console.log statements and basic error tracking—fail to provide the visibility needed to understand system behavior, diagnose production issues, or optimize performance at scale.
This article presents a comprehensive observability architecture that treats frontend telemetry as a distributed systems problem, implementing the three pillars of observability (metrics, logs, traces) with correlation capabilities that enable true end-to-end visibility from user interaction through API response.
The Three Pillars in Frontend Context
┌─────────────────────────────────────────────────────────────────────────┐
│ Frontend Observability Stack │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────────────┐ │
│ │ METRICS │ │ LOGS │ │ TRACES │ │
│ │ │ │ │ │ │ │
│ │ Aggregated │ │ Structured │ │ Request flow across │ │
│ │ numerical │ │ events with │ │ browser → edge → API │ │
│ │ measurements│ │ context │ │ │ │
│ └──────┬──────┘ └──────┬──────┘ └─────────────┬───────────────┘ │
│ │ │ │ │
│ └──────────────────┼─────────────────────────┘ │
│ │ │
│ ┌───────▼───────┐ │
│ │ CORRELATION │ │
│ │ │ │
│ │ request_id │ │
│ │ session_id │ │
│ │ trace_id │ │
│ │ span_id │ │
│ └───────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────┘
Correlation Identity Model
Every telemetry event must carry identifiers that enable correlation across the entire request lifecycle:
// src/observability/correlation.ts
interface CorrelationContext {
// Persists across the entire user session
sessionId: string;
// Unique per page load, survives navigation in SPA
pageViewId: string;
// Unique per user interaction (click, form submit, etc.)
interactionId: string;
// W3C Trace Context - propagated to backend
traceId: string;
spanId: string;
// Links related requests (e.g., retry of same operation)
correlationId: string;
}
class CorrelationManager {
private context: CorrelationContext;
private spanStack: string[] = [];
constructor() {
this.context = {
sessionId: this.getOrCreateSessionId(),
pageViewId: this.generateId(),
interactionId: '',
traceId: this.generateTraceId(),
spanId: this.generateSpanId(),
correlationId: '',
};
}
private getOrCreateSessionId(): string {
const stored = sessionStorage.getItem('obs_session_id');
if (stored) return stored;
const sessionId = this.generateId();
sessionStorage.setItem('obs_session_id', sessionId);
return sessionId;
}
private generateId(): string {
return crypto.randomUUID();
}
private generateTraceId(): string {
// W3C Trace Context: 16 bytes / 32 hex chars
const bytes = new Uint8Array(16);
crypto.getRandomValues(bytes);
return Array.from(bytes, b => b.toString(16).padStart(2, '0')).join('');
}
private generateSpanId(): string {
// W3C Trace Context: 8 bytes / 16 hex chars
const bytes = new Uint8Array(8);
crypto.getRandomValues(bytes);
return Array.from(bytes, b => b.toString(16).padStart(2, '0')).join('');
}
// Start a new interaction (user action like click, submit)
startInteraction(name: string): () => void {
const previousInteractionId = this.context.interactionId;
const previousCorrelationId = this.context.correlationId;
this.context.interactionId = this.generateId();
this.context.correlationId = this.generateId();
this.context.traceId = this.generateTraceId();
this.context.spanId = this.generateSpanId();
// Return cleanup function
return () => {
this.context.interactionId = previousInteractionId;
this.context.correlationId = previousCorrelationId;
};
}
// Create child span for nested operations
startSpan(name: string): SpanContext {
const parentSpanId = this.context.spanId;
const newSpanId = this.generateSpanId();
this.spanStack.push(parentSpanId);
this.context.spanId = newSpanId;
return {
traceId: this.context.traceId,
spanId: newSpanId,
parentSpanId,
name,
startTime: performance.now(),
end: () => {
this.context.spanId = this.spanStack.pop() || this.generateSpanId();
},
};
}
// Get headers for outgoing HTTP requests
getTraceHeaders(): Record<string, string> {
return {
'traceparent': `00-${this.context.traceId}-${this.context.spanId}-01`,
'x-correlation-id': this.context.correlationId,
'x-session-id': this.context.sessionId,
'x-interaction-id': this.context.interactionId,
};
}
getContext(): Readonly<CorrelationContext> {
return { ...this.context };
}
}
interface SpanContext {
traceId: string;
spanId: string;
parentSpanId: string;
name: string;
startTime: number;
end: () => void;
}
export const correlation = new CorrelationManager();
Distributed Tracing: Browser to Backend
Request Flow Instrumentation
┌──────────────────────────────────────────────────────────────────────────────┐
│ Distributed Trace Flow │
├──────────────────────────────────────────────────────────────────────────────┤
│ │
│ Browser Edge/CDN API Gateway Services │
│ ──────── ──────── ─────────── ──────── │
│ │
│ ┌─────────────┐ │
│ │ User Click │ │
│ │ span_id: a1 │ │
│ └──────┬──────┘ │
│ │ │
│ ▼ │
│ ┌─────────────┐ ┌─────────────┐ │
│ │ State Update│ │ │ │
│ │ span_id: a2 │ │ │ │
│ │ parent: a1 │ │ │ │
│ └──────┬──────┘ │ │ │
│ │ │ │ │
│ ▼ ▼ │ │
│ ┌─────────────┐ ┌─────────────┐ │ │
│ │ Fetch Start │ │ Edge Worker │ │ │
│ │ span_id: a3 │──│ span_id: a4 │ │ │
│ │ parent: a2 │ │ parent: a3 │ │ │
│ └─────────────┘ └──────┬──────┘ │ │
│ │ ▼ │
│ │ ┌─────────────┐ ┌─────────────┐ │
│ └───▶│ API Gateway │───▶│ User Service│ │
│ │ span_id: a5 │ │ span_id: a6 │ │
│ │ parent: a4 │ │ parent: a5 │ │
│ └─────────────┘ └─────────────┘ │
│ │
│ ═══════════════════════════════════════════════════════════════════════ │
│ trace_id: 4bf92f3577b34da6a3ce929d0e0e4736 (same across all spans) │
│ │
└──────────────────────────────────────────────────────────────────────────────┘
Fetch Instrumentation with Tracing
// src/observability/instrumented-fetch.ts
import { correlation } from './correlation';
import { metrics } from './metrics';
import { logger } from './logger';
interface FetchTiming {
dnsLookup: number;
tcpConnection: number;
tlsHandshake: number;
requestSent: number;
waiting: number;
contentDownload: number;
total: number;
}
interface InstrumentedResponse<T> extends Response {
data: T;
timing: FetchTiming;
traceId: string;
}
export async function instrumentedFetch<T = unknown>(
url: string,
options: RequestInit = {}
): Promise<InstrumentedResponse<T>> {
const span = correlation.startSpan(`fetch:${new URL(url, location.origin).pathname}`);
const startTime = performance.now();
// Inject trace context headers
const headers = new Headers(options.headers);
const traceHeaders = correlation.getTraceHeaders();
Object.entries(traceHeaders).forEach(([key, value]) => {
headers.set(key, value);
});
// Add timing metadata
const requestId = crypto.randomUUID();
headers.set('x-request-id', requestId);
const context = correlation.getContext();
logger.debug('fetch:start', {
url,
method: options.method || 'GET',
requestId,
...context,
});
try {
const response = await fetch(url, {
...options,
headers,
});
const endTime = performance.now();
const duration = endTime - startTime;
// Extract server timing if available
const serverTiming = parseServerTiming(response.headers.get('server-timing'));
// Capture resource timing for this request
const timing = await captureResourceTiming(url, startTime);
// Record metrics
metrics.histogram('http.client.duration', duration, {
method: options.method || 'GET',
status: response.status.toString(),
host: new URL(url, location.origin).host,
});
if (!response.ok) {
metrics.increment('http.client.error', {
method: options.method || 'GET',
status: response.status.toString(),
host: new URL(url, location.origin).host,
});
logger.warn('fetch:error', {
url,
status: response.status,
statusText: response.statusText,
duration,
requestId,
serverTiming,
...context,
});
}
// Parse response body
const contentType = response.headers.get('content-type') || '';
let data: T;
if (contentType.includes('application/json')) {
data = await response.json();
} else {
data = await response.text() as unknown as T;
}
logger.info('fetch:complete', {
url,
method: options.method || 'GET',
status: response.status,
duration,
timing,
serverTiming,
requestId,
responseTraceId: response.headers.get('x-trace-id'),
...context,
});
span.end();
return Object.assign(response, {
data,
timing,
traceId: context.traceId,
}) as InstrumentedResponse<T>;
} catch (error) {
const endTime = performance.now();
const duration = endTime - startTime;
metrics.increment('http.client.error', {
method: options.method || 'GET',
error_type: error instanceof Error ? error.name : 'unknown',
host: new URL(url, location.origin).host,
});
logger.error('fetch:failed', {
url,
method: options.method || 'GET',
error: error instanceof Error ? error.message : String(error),
errorType: error instanceof Error ? error.name : 'unknown',
duration,
requestId,
...context,
});
span.end();
throw error;
}
}
function parseServerTiming(header: string | null): Record<string, number> {
if (!header) return {};
const timings: Record<string, number> = {};
// Parse Server-Timing header: "db;dur=53.2, cache;dur=0.1, app;dur=47.2"
header.split(',').forEach(entry => {
    const match = entry.trim().match(/^([\w-]+)(?:;.*?dur=(\d+(?:\.\d+)?))?/);
if (match) {
const [, name, duration] = match;
timings[name] = duration ? parseFloat(duration) : 0;
}
});
return timings;
}
async function captureResourceTiming(
url: string,
requestStartTime: number
): Promise<FetchTiming> {
// Wait for resource timing entry to be available
await new Promise(resolve => setTimeout(resolve, 0));
  // Resolve to an absolute URL: entry.name is always absolute, so a
  // substring match against a relative url is fragile
  const absoluteUrl = new URL(url, location.origin).href;
  const entries = performance.getEntriesByType('resource') as PerformanceResourceTiming[];
  const entry = entries
    .filter(e => e.name === absoluteUrl && e.startTime >= requestStartTime - 10)
    .pop();
if (!entry) {
return {
dnsLookup: 0,
tcpConnection: 0,
tlsHandshake: 0,
requestSent: 0,
waiting: 0,
contentDownload: 0,
total: 0,
};
}
return {
dnsLookup: entry.domainLookupEnd - entry.domainLookupStart,
tcpConnection: entry.connectEnd - entry.connectStart,
tlsHandshake: entry.secureConnectionStart > 0
? entry.connectEnd - entry.secureConnectionStart
: 0,
    // Resource Timing exposes no explicit "request sent" phase; approximate it
    // as the gap between connection ready and the request being issued
    requestSent: entry.connectEnd > 0 ? entry.requestStart - entry.connectEnd : 0,
    // Waiting (time to first byte for this request)
    waiting: entry.responseStart - entry.requestStart,
contentDownload: entry.responseEnd - entry.responseStart,
total: entry.responseEnd - entry.startTime,
};
}
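To sanity-check the arithmetic in `captureResourceTiming`, here is a self-contained version of the unambiguous phases of the breakdown, applied to a mock entry. The field names follow the Resource Timing spec; the millisecond values are invented for the example:

```typescript
// Mock of the PerformanceResourceTiming fields used above (spec names, invented values).
interface ResourceTimingLike {
  startTime: number;
  domainLookupStart: number;
  domainLookupEnd: number;
  connectStart: number;
  connectEnd: number;
  secureConnectionStart: number;
  requestStart: number;
  responseStart: number;
  responseEnd: number;
}

function breakdown(entry: ResourceTimingLike) {
  return {
    dnsLookup: entry.domainLookupEnd - entry.domainLookupStart,
    tcpConnection: entry.connectEnd - entry.connectStart,
    tlsHandshake: entry.secureConnectionStart > 0
      ? entry.connectEnd - entry.secureConnectionStart
      : 0,
    waiting: entry.responseStart - entry.requestStart,     // TTFB for this request
    contentDownload: entry.responseEnd - entry.responseStart,
    total: entry.responseEnd - entry.startTime,
  };
}

const timing = breakdown({
  startTime: 10,
  domainLookupStart: 10, domainLookupEnd: 20,  // 10 ms DNS
  connectStart: 20, connectEnd: 50,            // 30 ms connect,
  secureConnectionStart: 30,                   //   of which 20 ms is TLS
  requestStart: 50,
  responseStart: 120,                          // 70 ms waiting
  responseEnd: 180,                            // 60 ms download
});
console.log(timing);
```

Note that the TLS handshake is a sub-phase of the connection time (secure connections only), which is why the phases do not sum to `total`.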
Structured Logging System
Log Levels and Semantic Structure
// src/observability/logger.ts
import { correlation } from './correlation';
type LogLevel = 'debug' | 'info' | 'warn' | 'error' | 'fatal';
interface LogEntry {
timestamp: string;
level: LogLevel;
message: string;
context: Record<string, unknown>;
correlation: {
sessionId: string;
pageViewId: string;
interactionId: string;
traceId: string;
spanId: string;
};
environment: {
url: string;
userAgent: string;
viewport: { width: number; height: number };
connection?: {
effectiveType: string;
downlink: number;
rtt: number;
};
};
}
interface LoggerConfig {
minLevel: LogLevel;
sampleRate: number;
batchSize: number;
flushInterval: number;
endpoint: string;
}
const LOG_LEVELS: Record<LogLevel, number> = {
debug: 0,
info: 1,
warn: 2,
error: 3,
fatal: 4,
};
class StructuredLogger {
private config: LoggerConfig;
private buffer: LogEntry[] = [];
private flushTimer: number | null = null;
constructor(config: Partial<LoggerConfig> = {}) {
this.config = {
minLevel: config.minLevel ?? 'info',
sampleRate: config.sampleRate ?? 1.0,
batchSize: config.batchSize ?? 50,
flushInterval: config.flushInterval ?? 5000,
endpoint: config.endpoint ?? '/api/telemetry/logs',
};
this.setupAutoFlush();
this.setupUnloadFlush();
}
private shouldLog(level: LogLevel): boolean {
return LOG_LEVELS[level] >= LOG_LEVELS[this.config.minLevel];
}
private shouldSample(): boolean {
return Math.random() < this.config.sampleRate;
}
private getEnvironment(): LogEntry['environment'] {
const nav = navigator as Navigator & {
connection?: {
effectiveType: string;
downlink: number;
rtt: number;
};
};
return {
url: location.href,
userAgent: navigator.userAgent,
viewport: {
width: window.innerWidth,
height: window.innerHeight,
},
connection: nav.connection ? {
effectiveType: nav.connection.effectiveType,
downlink: nav.connection.downlink,
rtt: nav.connection.rtt,
} : undefined,
};
}
private createEntry(
level: LogLevel,
message: string,
context: Record<string, unknown>
): LogEntry {
const correlationContext = correlation.getContext();
return {
timestamp: new Date().toISOString(),
level,
message,
context: this.sanitizeContext(context),
correlation: {
sessionId: correlationContext.sessionId,
pageViewId: correlationContext.pageViewId,
interactionId: correlationContext.interactionId,
traceId: correlationContext.traceId,
spanId: correlationContext.spanId,
},
environment: this.getEnvironment(),
};
}
private sanitizeContext(context: Record<string, unknown>): Record<string, unknown> {
const sanitized: Record<string, unknown> = {};
    // Keys must be lowercase: they are compared against key.toLowerCase()
    const sensitiveKeys = ['password', 'token', 'secret', 'apikey', 'authorization'];
for (const [key, value] of Object.entries(context)) {
if (sensitiveKeys.some(sk => key.toLowerCase().includes(sk))) {
sanitized[key] = '[REDACTED]';
} else if (typeof value === 'object' && value !== null) {
sanitized[key] = JSON.stringify(value).substring(0, 1000);
} else {
sanitized[key] = value;
}
}
return sanitized;
}
private log(level: LogLevel, message: string, context: Record<string, unknown> = {}) {
if (!this.shouldLog(level)) return;
// Always log errors, sample others
if (level !== 'error' && level !== 'fatal' && !this.shouldSample()) return;
const entry = this.createEntry(level, message, context);
// Console output in development
if (process.env.NODE_ENV === 'development') {
const consoleMethod = level === 'fatal' ? 'error' : level;
console[consoleMethod](`[${level.toUpperCase()}] ${message}`, context);
}
this.buffer.push(entry);
if (this.buffer.length >= this.config.batchSize) {
this.flush();
}
}
debug(message: string, context?: Record<string, unknown>) {
this.log('debug', message, context);
}
info(message: string, context?: Record<string, unknown>) {
this.log('info', message, context);
}
warn(message: string, context?: Record<string, unknown>) {
this.log('warn', message, context);
}
error(message: string, context?: Record<string, unknown>) {
this.log('error', message, context);
}
fatal(message: string, context?: Record<string, unknown>) {
this.log('fatal', message, context);
// Immediately flush fatal errors
this.flush();
}
private setupAutoFlush() {
this.flushTimer = window.setInterval(() => {
if (this.buffer.length > 0) {
this.flush();
}
}, this.config.flushInterval);
}
private setupUnloadFlush() {
// Use visibilitychange for more reliable flush
document.addEventListener('visibilitychange', () => {
if (document.visibilityState === 'hidden') {
this.flush(true);
}
});
// Fallback for page unload
window.addEventListener('pagehide', () => {
this.flush(true);
});
}
private async flush(useBeacon = false) {
if (this.buffer.length === 0) return;
const entries = [...this.buffer];
this.buffer = [];
const payload = JSON.stringify({ logs: entries });
    if (useBeacon && navigator.sendBeacon) {
      const accepted = navigator.sendBeacon(
        this.config.endpoint,
        new Blob([payload], { type: 'application/json' })
      );
      // sendBeacon returns false if the user agent rejected the payload
      if (!accepted) {
        this.buffer.unshift(...entries);
      }
      return;
    }
try {
await fetch(this.config.endpoint, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: payload,
keepalive: true,
});
} catch (error) {
// Re-add entries on failure (with limit to prevent memory issues)
if (this.buffer.length < 200) {
this.buffer.unshift(...entries);
}
console.error('Failed to flush logs:', error);
}
}
}
export const logger = new StructuredLogger({
minLevel: process.env.NODE_ENV === 'development' ? 'debug' : 'info',
sampleRate: process.env.NODE_ENV === 'development' ? 1.0 : 0.1,
});
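The redaction rule in `sanitizeContext` is worth exercising in isolation. This standalone copy of the same logic (lowercase key list, stringify-and-truncate for nested objects) shows what actually reaches the telemetry endpoint:

```typescript
// Standalone copy of StructuredLogger's sanitization rule:
// redact sensitive keys, stringify-and-truncate nested objects.
function sanitizeContext(context: Record<string, unknown>): Record<string, unknown> {
  const sanitized: Record<string, unknown> = {};
  const sensitiveKeys = ['password', 'token', 'secret', 'apikey', 'authorization'];
  for (const [key, value] of Object.entries(context)) {
    if (sensitiveKeys.some(sk => key.toLowerCase().includes(sk))) {
      sanitized[key] = '[REDACTED]';
    } else if (typeof value === 'object' && value !== null) {
      sanitized[key] = JSON.stringify(value).substring(0, 1000);
    } else {
      sanitized[key] = value;
    }
  }
  return sanitized;
}

const out = sanitizeContext({
  userId: 'u-123',
  authToken: 'eyJhbGciOi...',        // matches 'token' → redacted
  profile: { plan: 'enterprise' },   // object → serialized string
});
console.log(out);
```

The substring match is deliberately broad: `authToken`, `refresh_token`, and `apiKeySecret` all get caught without enumerating every variant.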
Error Taxonomy and Classification
Semantic Error Categories
// src/observability/error-taxonomy.ts
export enum ErrorCategory {
// Network and connectivity
NETWORK_OFFLINE = 'network.offline',
NETWORK_TIMEOUT = 'network.timeout',
NETWORK_DNS_FAILURE = 'network.dns_failure',
NETWORK_CONNECTION_REFUSED = 'network.connection_refused',
// HTTP errors
HTTP_CLIENT_ERROR = 'http.client_error', // 4xx
HTTP_SERVER_ERROR = 'http.server_error', // 5xx
HTTP_RATE_LIMITED = 'http.rate_limited', // 429
HTTP_UNAUTHORIZED = 'http.unauthorized', // 401
HTTP_FORBIDDEN = 'http.forbidden', // 403
HTTP_NOT_FOUND = 'http.not_found', // 404
// JavaScript runtime
JS_TYPE_ERROR = 'js.type_error',
JS_REFERENCE_ERROR = 'js.reference_error',
JS_SYNTAX_ERROR = 'js.syntax_error',
JS_RANGE_ERROR = 'js.range_error',
// React-specific
REACT_RENDER_ERROR = 'react.render_error',
REACT_HYDRATION_MISMATCH = 'react.hydration_mismatch',
REACT_HOOK_ERROR = 'react.hook_error',
REACT_SUSPENSE_ERROR = 'react.suspense_error',
// Application-specific
APP_VALIDATION_ERROR = 'app.validation_error',
APP_STATE_CORRUPTION = 'app.state_corruption',
APP_INVARIANT_VIOLATION = 'app.invariant_violation',
APP_FEATURE_FLAG_ERROR = 'app.feature_flag_error',
// Resource loading
RESOURCE_SCRIPT_LOAD = 'resource.script_load',
RESOURCE_STYLE_LOAD = 'resource.style_load',
RESOURCE_IMAGE_LOAD = 'resource.image_load',
RESOURCE_CHUNK_LOAD = 'resource.chunk_load',
// Storage
STORAGE_QUOTA_EXCEEDED = 'storage.quota_exceeded',
STORAGE_UNAVAILABLE = 'storage.unavailable',
// Unknown
UNKNOWN = 'unknown',
}
export enum ErrorSeverity {
LOW = 'low', // Cosmetic issues, non-blocking
MEDIUM = 'medium', // Feature degradation but app usable
HIGH = 'high', // Core functionality impacted
CRITICAL = 'critical', // App unusable
}
interface ClassifiedError {
category: ErrorCategory;
severity: ErrorSeverity;
isRecoverable: boolean;
suggestedAction: 'retry' | 'refresh' | 'ignore' | 'escalate' | 'offline_fallback';
userMessage: string;
}
export function classifyError(error: Error, context?: {
httpStatus?: number;
url?: string;
componentStack?: string;
}): ClassifiedError {
// Network errors
if (error instanceof TypeError && error.message.includes('Failed to fetch')) {
if (!navigator.onLine) {
return {
category: ErrorCategory.NETWORK_OFFLINE,
severity: ErrorSeverity.MEDIUM,
isRecoverable: true,
suggestedAction: 'offline_fallback',
userMessage: 'You appear to be offline. Some features may be unavailable.',
};
}
return {
category: ErrorCategory.NETWORK_TIMEOUT,
severity: ErrorSeverity.MEDIUM,
isRecoverable: true,
suggestedAction: 'retry',
userMessage: 'Connection issue. Please try again.',
};
}
// HTTP status-based classification
if (context?.httpStatus) {
const status = context.httpStatus;
if (status === 401) {
return {
category: ErrorCategory.HTTP_UNAUTHORIZED,
severity: ErrorSeverity.HIGH,
isRecoverable: false,
suggestedAction: 'escalate',
userMessage: 'Your session has expired. Please sign in again.',
};
}
if (status === 403) {
return {
category: ErrorCategory.HTTP_FORBIDDEN,
severity: ErrorSeverity.HIGH,
isRecoverable: false,
suggestedAction: 'escalate',
userMessage: "You don't have permission to access this resource.",
};
}
if (status === 404) {
return {
category: ErrorCategory.HTTP_NOT_FOUND,
severity: ErrorSeverity.MEDIUM,
isRecoverable: false,
suggestedAction: 'ignore',
userMessage: 'The requested resource was not found.',
};
}
if (status === 429) {
return {
category: ErrorCategory.HTTP_RATE_LIMITED,
severity: ErrorSeverity.MEDIUM,
isRecoverable: true,
suggestedAction: 'retry',
userMessage: 'Too many requests. Please wait a moment.',
};
}
if (status >= 500) {
return {
category: ErrorCategory.HTTP_SERVER_ERROR,
severity: ErrorSeverity.HIGH,
isRecoverable: true,
suggestedAction: 'retry',
userMessage: 'Server error. Our team has been notified.',
};
}
}
// React-specific errors
if (context?.componentStack) {
if (error.message.includes('Hydration')) {
return {
category: ErrorCategory.REACT_HYDRATION_MISMATCH,
severity: ErrorSeverity.MEDIUM,
isRecoverable: true,
suggestedAction: 'refresh',
userMessage: 'Page display issue. Refreshing may help.',
};
}
if (error.message.includes('hook')) {
return {
category: ErrorCategory.REACT_HOOK_ERROR,
severity: ErrorSeverity.HIGH,
isRecoverable: false,
suggestedAction: 'refresh',
userMessage: 'An error occurred. Please refresh the page.',
};
}
return {
category: ErrorCategory.REACT_RENDER_ERROR,
severity: ErrorSeverity.HIGH,
isRecoverable: false,
suggestedAction: 'refresh',
userMessage: 'An error occurred while displaying this content.',
};
}
// JavaScript runtime errors
if (error instanceof TypeError) {
return {
category: ErrorCategory.JS_TYPE_ERROR,
severity: ErrorSeverity.HIGH,
isRecoverable: false,
suggestedAction: 'refresh',
userMessage: 'An unexpected error occurred.',
};
}
if (error instanceof ReferenceError) {
return {
category: ErrorCategory.JS_REFERENCE_ERROR,
severity: ErrorSeverity.HIGH,
isRecoverable: false,
suggestedAction: 'refresh',
userMessage: 'An unexpected error occurred.',
};
}
// Chunk loading errors
if (error.message.includes('Loading chunk') || error.message.includes('ChunkLoadError')) {
return {
category: ErrorCategory.RESOURCE_CHUNK_LOAD,
severity: ErrorSeverity.HIGH,
isRecoverable: true,
suggestedAction: 'refresh',
userMessage: 'Failed to load application component. Please refresh.',
};
}
// Storage errors
if (error.name === 'QuotaExceededError') {
return {
category: ErrorCategory.STORAGE_QUOTA_EXCEEDED,
severity: ErrorSeverity.LOW,
isRecoverable: false,
suggestedAction: 'ignore',
userMessage: 'Storage is full. Some features may be limited.',
};
}
// Default classification
return {
category: ErrorCategory.UNKNOWN,
severity: ErrorSeverity.MEDIUM,
isRecoverable: false,
suggestedAction: 'escalate',
userMessage: 'An unexpected error occurred.',
};
}
// Error fingerprinting for deduplication
export function generateErrorFingerprint(error: Error, componentStack?: string): string {
const parts: string[] = [
error.name,
error.message.replace(/\d+/g, 'N').replace(/['"]/g, ''),
];
// Include first meaningful stack frame
if (error.stack) {
const frames = error.stack.split('\n').slice(1, 4);
const meaningfulFrame = frames.find(f =>
!f.includes('node_modules') &&
!f.includes('webpack') &&
!f.includes('<anonymous>')
);
if (meaningfulFrame) {
// Extract file and line, normalize
const match = meaningfulFrame.match(/at\s+(\S+)\s+\(([^:]+):(\d+)/);
if (match) {
parts.push(`${match[1]}@${match[2]}:${match[3]}`);
}
}
}
// Include component from React stack
if (componentStack) {
const componentMatch = componentStack.match(/at\s+(\w+)/);
if (componentMatch) {
parts.push(`component:${componentMatch[1]}`);
}
}
// Generate hash
const str = parts.join('|');
let hash = 0;
for (let i = 0; i < str.length; i++) {
const char = str.charCodeAt(i);
hash = ((hash << 5) - hash) + char;
hash = hash & hash;
}
return Math.abs(hash).toString(36);
}
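The normalization step is what makes fingerprints stable for deduplication: messages that differ only in embedded numbers or quoted values collapse into the same bucket. A self-contained sketch of just the normalize-then-hash core (same hash as `generateErrorFingerprint` above, minus the stack-frame parsing):

```typescript
// Normalize the message (digits → N, quotes stripped), then 32-bit string hash.
function fingerprint(name: string, message: string): string {
  const normalized = message.replace(/\d+/g, 'N').replace(/['"]/g, '');
  const str = `${name}|${normalized}`;
  let hash = 0;
  for (let i = 0; i < str.length; i++) {
    hash = ((hash << 5) - hash) + str.charCodeAt(i);
    hash = hash & hash; // force 32-bit overflow semantics
  }
  return Math.abs(hash).toString(36);
}

// Same error shape, different row numbers → identical fingerprint
const a = fingerprint('TypeError', 'Cannot read id of row 42');
const b = fingerprint('TypeError', 'Cannot read id of row 97');
console.log(a === b); // true
```

Without this normalization, a single bug hit across thousands of records would register as thousands of distinct errors in the dashboard.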
Metrics Collection and Aggregation
Client-Side Metrics System
// src/observability/metrics.ts
type MetricType = 'counter' | 'gauge' | 'histogram';
interface MetricPoint {
name: string;
type: MetricType;
value: number;
labels: Record<string, string>;
timestamp: number;
}
interface HistogramBuckets {
[bucket: string]: number;
count: number;
sum: number;
}
class MetricsCollector {
private counters: Map<string, number> = new Map();
private gauges: Map<string, number> = new Map();
private histograms: Map<string, HistogramBuckets> = new Map();
private buffer: MetricPoint[] = [];
private flushTimer: number | null = null;
private readonly histogramBuckets = [
5, 10, 25, 50, 100, 250, 500, 1000, 2500, 5000, 10000
];
constructor(
private endpoint: string = '/api/telemetry/metrics',
private flushInterval: number = 30000
) {
this.setupAutoFlush();
this.setupUnloadFlush();
}
private getKey(name: string, labels: Record<string, string>): string {
const sortedLabels = Object.entries(labels)
.sort(([a], [b]) => a.localeCompare(b))
.map(([k, v]) => `${k}=${v}`)
.join(',');
return `${name}{${sortedLabels}}`;
}
increment(name: string, labels: Record<string, string> = {}, value = 1) {
const key = this.getKey(name, labels);
const current = this.counters.get(key) || 0;
this.counters.set(key, current + value);
this.buffer.push({
name,
type: 'counter',
value,
labels,
timestamp: Date.now(),
});
}
gauge(name: string, value: number, labels: Record<string, string> = {}) {
const key = this.getKey(name, labels);
this.gauges.set(key, value);
this.buffer.push({
name,
type: 'gauge',
value,
labels,
timestamp: Date.now(),
});
}
histogram(name: string, value: number, labels: Record<string, string> = {}) {
const key = this.getKey(name, labels);
let buckets = this.histograms.get(key);
if (!buckets) {
buckets = { count: 0, sum: 0 };
this.histogramBuckets.forEach(b => buckets![`le_${b}`] = 0);
buckets['le_Inf'] = 0;
this.histograms.set(key, buckets);
}
buckets.count++;
buckets.sum += value;
// Increment appropriate buckets
for (const bucket of this.histogramBuckets) {
if (value <= bucket) {
buckets[`le_${bucket}`]++;
}
}
buckets['le_Inf']++;
this.buffer.push({
name,
type: 'histogram',
value,
labels,
timestamp: Date.now(),
});
}
// Timer utility for measuring durations
startTimer(name: string, labels: Record<string, string> = {}): () => void {
const start = performance.now();
return () => {
const duration = performance.now() - start;
this.histogram(name, duration, labels);
};
}
private setupAutoFlush() {
this.flushTimer = window.setInterval(() => {
this.flush();
}, this.flushInterval);
}
private setupUnloadFlush() {
document.addEventListener('visibilitychange', () => {
if (document.visibilityState === 'hidden') {
this.flush(true);
}
});
}
private async flush(useBeacon = false) {
if (this.buffer.length === 0) return;
const points = [...this.buffer];
this.buffer = [];
// Aggregate by name and labels
const aggregated = this.aggregateMetrics(points);
const payload = JSON.stringify(aggregated);
    if (useBeacon && navigator.sendBeacon) {
      const accepted = navigator.sendBeacon(
        this.endpoint,
        new Blob([payload], { type: 'application/json' })
      );
      // sendBeacon returns false if the user agent rejected the payload
      if (!accepted) {
        this.buffer.unshift(...points);
      }
      return;
    }
try {
await fetch(this.endpoint, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: payload,
keepalive: true,
});
} catch (error) {
// Re-buffer on failure
if (this.buffer.length < 1000) {
this.buffer.unshift(...points);
}
}
}
private aggregateMetrics(points: MetricPoint[]) {
const aggregated: Record<string, {
type: MetricType;
labels: Record<string, string>;
value?: number;
sum?: number;
count?: number;
buckets?: Record<string, number>;
}> = {};
for (const point of points) {
const key = this.getKey(point.name, point.labels);
if (point.type === 'counter') {
if (!aggregated[key]) {
aggregated[key] = { type: 'counter', labels: point.labels, value: 0 };
}
aggregated[key].value! += point.value;
} else if (point.type === 'gauge') {
// For gauges, use the latest value
aggregated[key] = { type: 'gauge', labels: point.labels, value: point.value };
} else if (point.type === 'histogram') {
if (!aggregated[key]) {
aggregated[key] = {
type: 'histogram',
labels: point.labels,
sum: 0,
count: 0,
buckets: {},
};
this.histogramBuckets.forEach(b => aggregated[key].buckets![`le_${b}`] = 0);
aggregated[key].buckets!['le_Inf'] = 0;
}
aggregated[key].sum! += point.value;
aggregated[key].count!++;
for (const bucket of this.histogramBuckets) {
if (point.value <= bucket) {
aggregated[key].buckets![`le_${bucket}`]++;
}
}
aggregated[key].buckets!['le_Inf']++;
}
}
return {
timestamp: Date.now(),
metrics: Object.entries(aggregated).map(([name, data]) => ({
name: name.split('{')[0],
...data,
})),
};
}
}
export const metrics = new MetricsCollector();
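The cumulative-bucket convention above mirrors Prometheus histograms: each observation increments every bucket whose upper bound it fits under, plus the `le_Inf` catch-all. A self-contained check of that counting rule:

```typescript
// Cumulative bucket counting, as in MetricsCollector.histogram above.
const boundaries = [5, 10, 25, 50, 100, 250, 500, 1000, 2500, 5000, 10000];

function record(buckets: Record<string, number>, value: number): void {
  for (const b of boundaries) {
    if (value <= b) buckets[`le_${b}`] = (buckets[`le_${b}`] ?? 0) + 1;
  }
  buckets['le_Inf'] = (buckets['le_Inf'] ?? 0) + 1; // every observation lands here
}

const buckets: Record<string, number> = {};
for (const durationMs of [3, 42, 800]) record(buckets, durationMs);
console.log(buckets.le_5, buckets.le_50, buckets.le_1000, buckets.le_Inf);
// → 1 2 3 3  (cumulative: le_50 counts both 3 and 42)
```

Cumulative buckets are what make server-side percentile estimation cheap: P90 is found by scanning for the first bucket whose count exceeds 90% of `le_Inf`.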
// Core Web Vitals collection
export function collectWebVitals() {
// LCP - Largest Contentful Paint
  new PerformanceObserver((entryList) => {
    const entries = entryList.getEntries();
    // LCP entries expose the painted element directly; their .name is empty
    const lastEntry = entries[entries.length - 1] as PerformanceEntry & { element?: Element };
    metrics.histogram('web_vitals.lcp', lastEntry.startTime, {
      element: lastEntry.element?.tagName.toLowerCase() ?? 'unknown',
    });
  }).observe({ type: 'largest-contentful-paint', buffered: true });
// FID - First Input Delay
new PerformanceObserver((entryList) => {
for (const entry of entryList.getEntries()) {
const fidEntry = entry as PerformanceEventTiming;
metrics.histogram('web_vitals.fid', fidEntry.processingStart - fidEntry.startTime, {
event_type: fidEntry.name,
});
}
}).observe({ type: 'first-input', buffered: true });
// CLS - Cumulative Layout Shift
let clsValue = 0;
new PerformanceObserver((entryList) => {
for (const entry of entryList.getEntries()) {
const clsEntry = entry as PerformanceEntry & { value: number; hadRecentInput: boolean };
if (!clsEntry.hadRecentInput) {
clsValue += clsEntry.value;
}
}
metrics.gauge('web_vitals.cls', clsValue);
}).observe({ type: 'layout-shift', buffered: true });
  // INP - Interaction to Next Paint (approximation: worst event duration so far;
  // the real metric groups entries by interactionId and takes a high percentile)
  let maxINP = 0;
  new PerformanceObserver((entryList) => {
    for (const entry of entryList.getEntries()) {
      const inpEntry = entry as PerformanceEventTiming;
      if (inpEntry.duration > maxINP) {
        maxINP = inpEntry.duration;
        metrics.gauge('web_vitals.inp', maxINP, {
          event_type: inpEntry.name,
        });
      }
    }
    // durationThreshold lowers the default 104 ms reporting cutoff for event timing
  }).observe({ type: 'event', buffered: true, durationThreshold: 40 } as PerformanceObserverInit);
  // TTFB - Time to First Byte (responseStart is relative to navigation start,
  // so it already includes redirect, DNS, and connection time)
  const navEntry = performance.getEntriesByType('navigation')[0] as PerformanceNavigationTiming;
  if (navEntry) {
    metrics.histogram('web_vitals.ttfb', navEntry.responseStart);
  }
  // FCP - First Contentful Paint (a paint entry, not derivable from navigation timing)
  new PerformanceObserver((entryList) => {
    for (const entry of entryList.getEntriesByName('first-contentful-paint')) {
      metrics.histogram('web_vitals.fcp', entry.startTime);
    }
  }).observe({ type: 'paint', buffered: true });
}
}
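When shipping these values, it also helps to attach the published good/needs-improvement/poor thresholds as a label so dashboards can slice by rating rather than recomputing cutoffs. A hedged sketch using the web.dev thresholds for LCP (good ≤ 2500 ms, poor > 4000 ms); the `rateLCP` helper name is an assumption, not part of the code above:

```typescript
// Bucket an LCP value into the published Core Web Vitals rating.
type Rating = 'good' | 'needs-improvement' | 'poor';

function rateLCP(ms: number): Rating {
  if (ms <= 2500) return 'good';
  if (ms <= 4000) return 'needs-improvement';
  return 'poor';
}

// e.g. metrics.histogram('web_vitals.lcp', value, { rating: rateLCP(value) });
console.log(rateLCP(1800), rateLCP(3200), rateLCP(5100));
// → good needs-improvement poor
```

Each vital has its own threshold pair (CLS and INP use different cutoffs), so a per-metric table is worth maintaining alongside the collector.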
Performance Sampling Strategy
Adaptive Sampling Based on Performance Budget
┌─────────────────────────────────────────────────────────────────────────┐
│ Adaptive Sampling Strategy │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ Fast Sessions (< P50) Normal Sessions Slow Sessions (>P90) │
│ ──────────────────── ──────────────── ──────────────────── │
│ │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Sample Rate: 1% │ │ Sample Rate: 10%│ │ Sample Rate:100%│ │
│ │ │ │ │ │ │ │
│ │ - Basic metrics │ │ - All metrics │ │ - All metrics │ │
│ │ - Errors only │ │ - Sample logs │ │ - All logs │ │
│ │ │ │ - Key traces │ │ - Full traces │ │
│ └─────────────────┘ └─────────────────┘ └─────────────────┘ │
│ │
│ Why: Most users have Default sampling Capture details of │
│ good experience, minimal for normal perf performance problems │
│ telemetry needed │
│ │
└─────────────────────────────────────────────────────────────────────────┘
// src/observability/sampling.ts
interface SamplingConfig {
baseRate: number;
errorRate: number; // Always higher for errors
slowSessionRate: number; // Higher for slow sessions
slowThresholdMs: number; // When to consider session "slow"
headSamplingEnabled: boolean;
}
class AdaptiveSampler {
private config: SamplingConfig;
private sessionDecision: boolean | null = null;
private isSlowSession = false;
private performanceScore = 100;
constructor(config: Partial<SamplingConfig> = {}) {
this.config = {
baseRate: config.baseRate ?? 0.1,
errorRate: config.errorRate ?? 1.0,
slowSessionRate: config.slowSessionRate ?? 1.0,
slowThresholdMs: config.slowThresholdMs ?? 3000,
headSamplingEnabled: config.headSamplingEnabled ?? true,
};
this.initializeSessionSampling();
this.monitorPerformance();
}
private initializeSessionSampling() {
// Head-based sampling: decide at session start
if (this.config.headSamplingEnabled) {
const stored = sessionStorage.getItem('obs_sample_decision');
if (stored !== null) {
this.sessionDecision = stored === 'true';
} else {
this.sessionDecision = Math.random() < this.config.baseRate;
sessionStorage.setItem('obs_sample_decision', String(this.sessionDecision));
}
}
}
private monitorPerformance() {
// Monitor LCP to detect slow sessions
new PerformanceObserver((entryList) => {
const entries = entryList.getEntries();
const lcp = entries[entries.length - 1];
if (lcp.startTime > this.config.slowThresholdMs) {
this.isSlowSession = true;
this.performanceScore = Math.max(0, 100 - (lcp.startTime / 100));
// Upgrade sampling for slow sessions
sessionStorage.setItem('obs_sample_decision', 'true');
this.sessionDecision = true;
}
}).observe({ type: 'largest-contentful-paint', buffered: true });
// Monitor long tasks
new PerformanceObserver((entryList) => {
const entries = entryList.getEntries();
const longTasks = entries.filter(e => e.duration > 50);
if (longTasks.length > 5) {
this.performanceScore = Math.max(0, this.performanceScore - 10);
if (this.performanceScore < 50) {
this.isSlowSession = true;
this.sessionDecision = true;
}
}
}).observe({ type: 'longtask', buffered: true });
}
shouldSample(type: 'metric' | 'log' | 'trace' | 'error'): boolean {
// Always sample errors
if (type === 'error') {
return Math.random() < this.config.errorRate;
}
// Slow sessions get full sampling
if (this.isSlowSession) {
return true;
}
// Head-based sampling decision
if (this.sessionDecision !== null) {
return this.sessionDecision;
}
// Per-event probabilistic fallback when no head-based decision exists
return Math.random() < this.config.baseRate;
}
getSamplingContext(): {
sampled: boolean;
rate: number;
reason: string;
performanceScore: number;
} {
const sampled = this.shouldSample('trace');
let rate = this.config.baseRate;
let reason = 'base_rate';
if (this.isSlowSession) {
rate = this.config.slowSessionRate;
reason = 'slow_session';
} else if (this.sessionDecision !== null) {
reason = 'head_sampled';
}
return {
sampled,
rate,
reason,
performanceScore: this.performanceScore,
};
}
}
export const sampler = new AdaptiveSampler();
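The sticky per-session decision is the part of the sampler most worth unit-testing. As a sketch — the `KVStore` abstraction and `decideSession` helper are illustrative, not part of `AdaptiveSampler` — the same logic with storage and randomness injected looks like this:

```typescript
// Storage-agnostic version of the head-sampling decision.
// A Map stands in for sessionStorage so the logic runs outside a browser.
type KVStore = { get(k: string): string | null; set(k: string, v: string): void };

function decideSession(store: KVStore, baseRate: number, rng: () => number): boolean {
  const stored = store.get('obs_sample_decision');
  if (stored !== null) return stored === 'true'; // sticky: reuse prior decision
  const decision = rng() < baseRate;             // one coin flip per session
  store.set('obs_sample_decision', String(decision));
  return decision;
}

function mapStore(): KVStore {
  const m = new Map<string, string>();
  return { get: (k) => m.get(k) ?? null, set: (k, v) => m.set(k, v) };
}
```

Because the decision is persisted, every event in a session is kept or dropped together, so sampled sessions arrive internally complete rather than as random fragments.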
React Error Boundary Integration
Observability-Aware Error Boundaries
// src/observability/error-boundary.tsx
import React, { Component, ErrorInfo, ReactNode } from 'react';
import { logger } from './logger';
import { metrics } from './metrics';
import { correlation } from './correlation';
import { classifyError, generateErrorFingerprint, ErrorCategory, ErrorSeverity } from './error-taxonomy';
interface ErrorBoundaryProps {
children: ReactNode;
name: string;
fallback?: ReactNode | ((error: Error, reset: () => void) => ReactNode);
onError?: (error: Error, errorInfo: ErrorInfo) => void;
level?: 'page' | 'section' | 'component';
}
interface ErrorBoundaryState {
hasError: boolean;
error: Error | null;
errorInfo: ErrorInfo | null;
}
export class ObservableErrorBoundary extends Component<
ErrorBoundaryProps,
ErrorBoundaryState
> {
private errorCount = 0;
private lastErrorTime = 0;
constructor(props: ErrorBoundaryProps) {
super(props);
this.state = {
hasError: false,
error: null,
errorInfo: null,
};
}
static getDerivedStateFromError(error: Error): Partial<ErrorBoundaryState> {
return { hasError: true, error };
}
componentDidCatch(error: Error, errorInfo: ErrorInfo) {
const now = Date.now();
// Reset the counter once errors stop arriving in quick succession
if (now - this.lastErrorTime > 1000) {
this.errorCount = 0;
}
this.errorCount++;
// Detect error loops: log once, then stay quiet until the loop clears
if (this.errorCount > 3) {
if (this.errorCount === 4) {
logger.fatal('error_boundary:loop_detected', {
boundaryName: this.props.name,
errorCount: this.errorCount,
error: error.message,
});
}
this.lastErrorTime = now;
return;
}
this.lastErrorTime = now;
const context = correlation.getContext();
const classification = classifyError(error, {
componentStack: errorInfo.componentStack || undefined,
});
const fingerprint = generateErrorFingerprint(error, errorInfo.componentStack || undefined);
// Log the error with full context
logger.error('error_boundary:caught', {
boundaryName: this.props.name,
boundaryLevel: this.props.level || 'component',
error: error.message,
errorName: error.name,
errorStack: error.stack,
componentStack: errorInfo.componentStack,
classification: classification.category,
severity: classification.severity,
isRecoverable: classification.isRecoverable,
fingerprint,
...context,
});
// Record metrics
metrics.increment('error_boundary.caught', {
boundary: this.props.name,
level: this.props.level || 'component',
category: classification.category,
severity: classification.severity,
});
// Track error rate by component
metrics.increment(`error_boundary.${this.props.name}.errors`);
// Call custom error handler
this.props.onError?.(error, errorInfo);
this.setState({ errorInfo });
}
private handleReset = () => {
logger.info('error_boundary:reset', {
boundaryName: this.props.name,
});
metrics.increment('error_boundary.reset', {
boundary: this.props.name,
});
this.errorCount = 0;
this.setState({
hasError: false,
error: null,
errorInfo: null,
});
};
render() {
if (this.state.hasError) {
const { fallback } = this.props;
const { error } = this.state;
if (typeof fallback === 'function') {
return fallback(error!, this.handleReset);
}
if (fallback) {
return fallback;
}
// Default fallback based on boundary level
const classification = error
? classifyError(error, { componentStack: this.state.errorInfo?.componentStack || undefined })
: null;
return (
<div className="error-boundary-fallback" data-boundary={this.props.name}>
<p>{classification?.userMessage || 'Something went wrong'}</p>
{classification?.isRecoverable && (
<button onClick={this.handleReset}>Try Again</button>
)}
</div>
);
}
return this.props.children;
}
}
// HOC for wrapping components with observability
export function withErrorBoundary<P extends object>(
Component: React.ComponentType<P>,
boundaryProps: Omit<ErrorBoundaryProps, 'children'>
) {
const WrappedComponent = (props: P) => (
<ObservableErrorBoundary {...boundaryProps}>
<Component {...props} />
</ObservableErrorBoundary>
);
WrappedComponent.displayName = `withErrorBoundary(${Component.displayName || Component.name})`;
return WrappedComponent;
}
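`generateErrorFingerprint` comes from the error-taxonomy module. As an illustration of the idea only — not that module's actual implementation — a fingerprint can hash the error name plus its top stack frames with line and column numbers stripped, so the same bug groups together across builds:

```typescript
// Illustrative stand-in for generateErrorFingerprint (the real one lives in
// error-taxonomy.ts). Normalizing away line:column keeps fingerprints stable
// even when minified bundles shift code positions between releases.
function sketchFingerprint(name: string, stack: string): string {
  const frames = stack
    .split('\n')
    .slice(0, 4)                             // top frames identify the failure site
    .map((f) => f.replace(/:\d+:\d+/g, ''))  // drop line:column, which shift per build
    .join('|');
  let hash = 0;
  for (const ch of `${name}|${frames}`) {
    hash = ((hash << 5) - hash + ch.charCodeAt(0)) | 0;
  }
  return Math.abs(hash).toString(16);
}
```

The error boundary above attaches this fingerprint to both logs and metrics, which is what lets the dashboard later group "Top Errors by Fingerprint" with user counts.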
Global Error Capture
Window-Level Error Instrumentation
// src/observability/global-handlers.ts
import { logger } from './logger';
import { metrics } from './metrics';
import { correlation } from './correlation';
import { classifyError, generateErrorFingerprint } from './error-taxonomy';
interface GlobalErrorConfig {
captureUnhandledRejections: boolean;
captureResourceErrors: boolean;
captureConsoleErrors: boolean;
maxErrorsPerMinute: number;
}
class GlobalErrorHandler {
private config: GlobalErrorConfig;
private errorWindow: number[] = [];
private readonly ERROR_WINDOW_MS = 60000;
constructor(config: Partial<GlobalErrorConfig> = {}) {
this.config = {
captureUnhandledRejections: config.captureUnhandledRejections ?? true,
captureResourceErrors: config.captureResourceErrors ?? true,
captureConsoleErrors: config.captureConsoleErrors ?? false,
maxErrorsPerMinute: config.maxErrorsPerMinute ?? 50,
};
}
install() {
this.installErrorHandler();
if (this.config.captureUnhandledRejections) {
this.installUnhandledRejectionHandler();
}
if (this.config.captureResourceErrors) {
this.installResourceErrorHandler();
}
if (this.config.captureConsoleErrors) {
this.installConsoleErrorHandler();
}
}
private rateLimitLogged = false;
private shouldCapture(): boolean {
const now = Date.now();
this.errorWindow = this.errorWindow.filter(t => now - t < this.ERROR_WINDOW_MS);
if (this.errorWindow.length >= this.config.maxErrorsPerMinute) {
// Warn once per rate-limited stretch, not on every dropped error
if (!this.rateLimitLogged) {
this.rateLimitLogged = true;
logger.warn('error_capture:rate_limited', {
maxPerMinute: this.config.maxErrorsPerMinute,
});
}
return false;
}
this.rateLimitLogged = false;
this.errorWindow.push(now);
return true;
}
private installErrorHandler() {
const originalHandler = window.onerror;
window.onerror = (
message: string | Event,
source?: string,
lineno?: number,
colno?: number,
error?: Error
) => {
if (!this.shouldCapture()) return;
const actualError = error || new Error(String(message));
const context = correlation.getContext();
const classification = classifyError(actualError);
const fingerprint = generateErrorFingerprint(actualError);
logger.error('global:uncaught_error', {
message: String(message),
source,
lineno,
colno,
error: actualError.message,
stack: actualError.stack,
classification: classification.category,
severity: classification.severity,
fingerprint,
...context,
});
metrics.increment('global.uncaught_error', {
category: classification.category,
severity: classification.severity,
});
// Call original handler
if (typeof originalHandler === 'function') {
return originalHandler.call(window, message, source, lineno, colno, error);
}
return false;
};
}
private installUnhandledRejectionHandler() {
window.addEventListener('unhandledrejection', (event: PromiseRejectionEvent) => {
if (!this.shouldCapture()) return;
const error = event.reason instanceof Error
? event.reason
: new Error(String(event.reason));
const context = correlation.getContext();
const classification = classifyError(error);
const fingerprint = generateErrorFingerprint(error);
logger.error('global:unhandled_rejection', {
reason: String(event.reason),
error: error.message,
stack: error.stack,
classification: classification.category,
severity: classification.severity,
fingerprint,
...context,
});
metrics.increment('global.unhandled_rejection', {
category: classification.category,
});
});
}
private installResourceErrorHandler() {
window.addEventListener('error', (event: ErrorEvent) => {
// Only capture resource loading errors
const target = event.target as HTMLElement;
if (!target || !('tagName' in target)) return;
const tagName = target.tagName.toLowerCase();
const resourceTypes = ['script', 'link', 'img', 'video', 'audio'];
if (!resourceTypes.includes(tagName)) return;
// Spend the rate budget only after confirming this is a resource error
if (!this.shouldCapture()) return;
const src = (target as HTMLScriptElement | HTMLImageElement).src ||
(target as HTMLLinkElement).href ||
'unknown';
logger.error('global:resource_load_error', {
resourceType: tagName,
src,
...correlation.getContext(),
});
metrics.increment('global.resource_error', {
type: tagName,
});
}, true); // Capture phase to catch resource errors
}
private installConsoleErrorHandler() {
const originalConsoleError = console.error;
console.error = (...args: unknown[]) => {
if (this.shouldCapture()) {
logger.warn('console:error', {
args: args.map(arg =>
arg instanceof Error
? { message: arg.message, stack: arg.stack }
: String(arg)
),
...correlation.getContext(),
});
}
return originalConsoleError.apply(console, args);
};
}
}
export const globalErrorHandler = new GlobalErrorHandler();
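`shouldCapture` is a sliding-window rate limiter. Extracted as a standalone function — the `allowEvent` name and signature are hypothetical, mirroring the class's `errorWindow` field — the policy is easy to test in isolation:

```typescript
// Sliding-window rate limiter: allow at most `limit` events per `windowMs`.
// `timestamps` is mutated in place, like the handler's errorWindow array.
function allowEvent(
  timestamps: number[],
  now: number,
  limit: number,
  windowMs: number
): boolean {
  // Evict timestamps that have aged out of the window
  while (timestamps.length > 0 && now - timestamps[0] >= windowMs) {
    timestamps.shift();
  }
  if (timestamps.length >= limit) return false; // over budget: drop the event
  timestamps.push(now);
  return true;
}
```

The cap exists because a single error loop can otherwise emit thousands of identical events per minute, turning an incident into a telemetry bill.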
Telemetry Backend Architecture
Collection and Storage Design
┌─────────────────────────────────────────────────────────────────────────────┐
│ Telemetry Backend Architecture │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Browsers │
│ ──────── │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ Client │ │ Client │ │ Client │ │
│ │ A │ │ B │ │ C │ │
│ └────┬────┘ └────┬────┘ └────┬────┘ │
│ │ │ │ │
│ └────────────┼────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ Edge Workers │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────────┐ │ │
│ │ │ Validation │─▶│ Sampling │─▶│ Batching + Buffering │ │ │
│ │ └──────────────┘ └──────────────┘ └──────────────────────────┘ │ │
│ └──────────────────────────────────────────────────────────┬──────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ Message Queue (Kafka) │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────────┐ │ │
│ │ │ logs-topic │ │metrics-topic │ │ traces-topic │ │ │
│ │ └───────┬──────┘ └───────┬──────┘ └───────────┬──────────────┘ │ │
│ └──────────┼─────────────────┼─────────────────────┼──────────────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌──────────────────┐ ┌────────────────┐ ┌─────────────────────────┐ │
│ │ ClickHouse │ │ Prometheus │ │ Jaeger │ │
│ │ (Logs + Errors)│ │ (Metrics) │ │ (Traces) │ │
│ └────────┬─────────┘ └───────┬────────┘ └───────────┬─────────────┘ │
│ │ │ │ │
│ └───────────────────┼───────────────────────┘ │
│ ▼ │
│ ┌────────────────────┐ │
│ │ Grafana │ │
│ │ (Visualization) │ │
│ └────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Edge Worker for Telemetry Ingestion
// workers/telemetry-ingest.ts (Cloudflare Worker)
interface TelemetryPayload {
type: 'logs' | 'metrics' | 'traces';
data: unknown;
metadata: {
sessionId: string;
timestamp: number;
userAgent: string;
};
}
interface Env {
KAFKA_BROKER: string;
KAFKA_USERNAME: string;
KAFKA_PASSWORD: string;
TELEMETRY_BUFFER: DurableObjectNamespace;
}
export default {
async fetch(request: Request, env: Env): Promise<Response> {
// Answer CORS preflight so cross-origin browser clients can POST JSON
if (request.method === 'OPTIONS') {
return new Response(null, {
status: 204,
headers: {
'Access-Control-Allow-Origin': '*',
'Access-Control-Allow-Methods': 'POST, OPTIONS',
'Access-Control-Allow-Headers': 'Content-Type',
},
});
}
if (request.method !== 'POST') {
return new Response('Method not allowed', { status: 405 });
}
const url = new URL(request.url);
const telemetryType = url.pathname.split('/').pop();
if (!['logs', 'metrics', 'traces'].includes(telemetryType!)) {
return new Response('Invalid telemetry type', { status: 400 });
}
try {
const payload: TelemetryPayload = await request.json();
// Validate payload
const validation = validatePayload(payload);
if (!validation.valid) {
return new Response(JSON.stringify({ error: validation.error }), {
status: 400,
headers: { 'Content-Type': 'application/json' },
});
}
// Apply sampling at edge
if (!shouldSample(payload)) {
return new Response(JSON.stringify({ sampled: false }), {
status: 200,
headers: { 'Content-Type': 'application/json' },
});
}
// Enrich with edge context
const enriched = enrichPayload(payload, request);
// Buffer in Durable Object for batching
const bufferId = env.TELEMETRY_BUFFER.idFromName(telemetryType!);
const buffer = env.TELEMETRY_BUFFER.get(bufferId);
await buffer.fetch(request.url, {
method: 'POST',
body: JSON.stringify(enriched),
});
return new Response(JSON.stringify({ success: true }), {
status: 200,
headers: {
'Content-Type': 'application/json',
'Access-Control-Allow-Origin': '*',
},
});
} catch (error) {
return new Response(
JSON.stringify({ error: 'Failed to process telemetry' }),
{ status: 500, headers: { 'Content-Type': 'application/json' } }
);
}
},
};
function validatePayload(payload: TelemetryPayload): { valid: boolean; error?: string } {
if (!payload.type || !payload.data || !payload.metadata) {
return { valid: false, error: 'Missing required fields' };
}
if (!payload.metadata.sessionId) {
return { valid: false, error: 'Missing sessionId' };
}
// Validate payload size
const size = JSON.stringify(payload).length;
if (size > 1024 * 100) { // 100KB limit
return { valid: false, error: 'Payload too large' };
}
return { valid: true };
}
function shouldSample(payload: TelemetryPayload): boolean {
// Always accept errors
if (payload.type === 'logs') {
const logs = payload.data as Array<{ level: string }>;
if (logs.some(l => l.level === 'error' || l.level === 'fatal')) {
return true;
}
}
// Sample other telemetry at 10%
const hash = hashString(payload.metadata.sessionId);
return (hash % 100) < 10;
}
function hashString(str: string): number {
let hash = 0;
for (let i = 0; i < str.length; i++) {
const char = str.charCodeAt(i);
hash = ((hash << 5) - hash) + char;
hash = hash & hash;
}
return Math.abs(hash);
}
function enrichPayload(payload: TelemetryPayload, request: Request): TelemetryPayload {
return {
...payload,
metadata: {
...payload.metadata,
edgeRegion: request.cf?.colo as string,
edgeCountry: request.cf?.country as string,
clientIP: request.headers.get('CF-Connecting-IP') || 'unknown',
receivedAt: Date.now(),
},
};
}
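One property of this hash-based sampler is worth calling out: because the verdict is a pure function of `sessionId`, every stateless worker instance reaches the same keep/drop decision for a given session, so sampled sessions arrive complete. A small sketch restating the worker's logic (the `sampledAtPercent` wrapper is illustrative):

```typescript
// Deterministic sampling: the same sessionId always hashes to the same bucket,
// so any edge worker instance makes the same keep/drop decision for a session.
function hashString(str: string): number {
  let hash = 0;
  for (let i = 0; i < str.length; i++) {
    hash = ((hash << 5) - hash + str.charCodeAt(i)) | 0; // 32-bit djb2-style mix
  }
  return Math.abs(hash);
}

function sampledAtPercent(sessionId: string, percent: number): boolean {
  return hashString(sessionId) % 100 < percent;
}
```

Contrast this with `Math.random()` per request, which would keep some events from a session and drop others, producing traces with missing spans.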
Dashboard and Alerting
Key Observability Dashboards
┌─────────────────────────────────────────────────────────────────────────────┐
│ Frontend Observability Dashboard │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ Key Metrics (Last Hour) │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────┐ │ │
│ │ │ Error Rate │ │ P95 LCP │ │ JS Error % │ │ Sessions │ │ │
│ │ │ 0.12% │ │ 2.1s │ │ 0.05% │ │ 12.5k │ │ │
│ │ │ ↓ 0.02% │ │ ↓ 150ms │ │ ↑ 0.01% │ │ ↑ 5% │ │ │
│ │ └──────────────┘ └──────────────┘ └──────────────┘ └──────────┘ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌────────────────────────────────┐ ┌──────────────────────────────────┐ │
│ │ Error Distribution │ │ Performance Trends │ │
│ │ │ │ │ │
│ │ HTTP 5xx ████████ 45% │ │ LCP ──────────────────────── │ │
│ │ JS TypeError ████ 22% │ │ ╲ │ │
│ │ Network ███ 15% │ │ ╲ ╱───── │ │
│ │ Chunk Load ██ 10% │ │ ╲──╱ │ │
│ │ Other █ 8% │ │ │ │
│ │ │ │ FID ───────────────────────── │ │
│ └────────────────────────────────┘ └──────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ Top Errors by Fingerprint │ │
│ │ ┌───────────────────────────────────────────────────────────────┐ │ │
│ │ │ Fingerprint │ Message │ Count │ Users │ Last Seen │ │ │
│ │ ├──────────────┼──────────────────┼───────┼───────┼─────────────│ │ │
│ │ │ ab3f9c2 │ Cannot read... │ 234 │ 156 │ 2 min ago │ │ │
│ │ │ 7d2e1b4 │ Network timeout │ 189 │ 89 │ 5 min ago │ │ │
│ │ │ c9a8d3f │ Chunk load fail │ 145 │ 67 │ 1 min ago │ │ │
│ │ └──────────────┴──────────────────┴───────┴───────┴─────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ Request Trace Waterfall │ │
│ │ │ │
│ │ User Click ├──────┤ 25ms │ │
│ │ State Update ├───┤ 15ms │ │
│ │ Fetch Start ├───────────────────────────────────┤ 350ms │ │
│ │ └─ DNS ├─┤ 12ms │ │
│ │ └─ TCP ├──┤ 28ms │ │
│ │ └─ TLS ├───┤ 35ms │ │
│ │ └─ Request ├─┤ 18ms │ │
│ │ └─ Server ├─────────────────────────┤ 245ms │ │
│ │ └─ API Gateway ├─┤ 15ms │ │
│ │ └─ Auth Service ├──┤ 22ms │ │
│ │ └─ Database ├───────────────────┤ 198ms │ │
│ │ └─ Response ├─┤ 12ms │ │
│ │ Render ├─────────────────────────────────────────────┤ 380ms│ │
│ │ │ │
│ │ Total: 405ms │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Alert Configuration
# observability/alerts.yaml
groups:
- name: frontend-errors
rules:
- alert: HighErrorRate
expr: |
sum(rate(global_uncaught_error_total[5m]))
/ sum(rate(page_view_total[5m])) > 0.01
for: 5m
labels:
severity: critical
annotations:
summary: "Frontend error rate above 1%"
description: "Error rate is {{ $value | humanizePercentage }}"
- alert: NewErrorSpike
expr: |
sum(increase(error_boundary_caught_total[5m])) > 100
and sum(increase(error_boundary_caught_total[5m] offset 1h)) < 20
for: 2m
labels:
severity: warning
annotations:
summary: "New error pattern detected"
- name: frontend-performance
rules:
- alert: LCPRegression
expr: |
histogram_quantile(0.95, sum(rate(web_vitals_lcp_bucket[15m])) by (le))
> 3000
for: 10m
labels:
severity: warning
annotations:
summary: "P95 LCP above 3 seconds"
description: "Current P95 LCP: {{ $value }}ms"
- alert: HighINP
expr: |
histogram_quantile(0.95, sum(rate(web_vitals_inp_bucket[15m])) by (le))
> 500
for: 10m
labels:
severity: warning
annotations:
summary: "P95 INP above 500ms (poor responsiveness)"
- alert: ChunkLoadFailures
expr: |
sum(rate(resource_error_total{type="script"}[5m])) > 10
for: 5m
labels:
severity: critical
annotations:
summary: "High rate of JavaScript chunk load failures"
- name: frontend-availability
rules:
- alert: CDNLatencySpike
expr: |
histogram_quantile(0.95,
sum(rate(http_client_duration_bucket{host=~"cdn.*"}[5m])) by (le)
) > 500
for: 5m
labels:
severity: warning
annotations:
summary: "CDN P95 latency above 500ms"
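The `HighErrorRate` rule divides two counter rates against a 1% threshold. Written out directly — `errorRate` and `shouldAlert` are hypothetical helpers, useful for validating thresholds in tests — the same check needs an explicit guard against an empty denominator, which PromQL handles implicitly by producing no result when the page-view series is absent:

```typescript
// Mirror of the HighErrorRate rule: errors per page view over a window.
// Returns null when there is no traffic, so callers never alert on 0/0.
function errorRate(errorCount: number, pageViews: number): number | null {
  if (pageViews === 0) return null;
  return errorCount / pageViews;
}

function shouldAlert(errorCount: number, pageViews: number, threshold = 0.01): boolean {
  const rate = errorRate(errorCount, pageViews);
  return rate !== null && rate > threshold;
}
```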
Initialization and Configuration
Complete Observability Setup
// src/observability/index.ts
import { correlation } from './correlation';
import { logger } from './logger';
import { metrics, collectWebVitals } from './metrics';
import { globalErrorHandler } from './global-handlers';
import { sampler } from './sampling';
interface ObservabilityConfig {
serviceName: string;
serviceVersion: string;
environment: 'development' | 'staging' | 'production';
telemetryEndpoint: string;
sampleRate: number;
debug: boolean;
}
let initialized = false;
export function initializeObservability(config: ObservabilityConfig) {
if (initialized) {
console.warn('Observability already initialized');
return;
}
// Set global context
(window as any).__OBSERVABILITY__ = {
config,
correlation,
logger,
metrics,
};
// Install global error handlers
globalErrorHandler.install();
// Start collecting Web Vitals
collectWebVitals();
// Log page view
logger.info('page:view', {
url: location.href,
referrer: document.referrer,
serviceName: config.serviceName,
serviceVersion: config.serviceVersion,
environment: config.environment,
});
// Track page visibility
document.addEventListener('visibilitychange', () => {
if (document.visibilityState === 'hidden') {
const context = correlation.getContext();
logger.info('page:hidden', {
timeOnPage: performance.now(),
...context,
});
}
});
// Track navigation timing
if ('PerformanceNavigationTiming' in window) {
const navEntry = performance.getEntriesByType('navigation')[0] as PerformanceNavigationTiming;
if (navEntry) {
metrics.histogram('navigation.dns', navEntry.domainLookupEnd - navEntry.domainLookupStart);
metrics.histogram('navigation.tcp', navEntry.connectEnd - navEntry.connectStart);
metrics.histogram('navigation.ttfb', navEntry.responseStart - navEntry.requestStart);
metrics.histogram('navigation.download', navEntry.responseEnd - navEntry.responseStart);
metrics.histogram('navigation.dom_interactive', navEntry.domInteractive - navEntry.startTime);
metrics.histogram('navigation.dom_complete', navEntry.domComplete - navEntry.startTime);
metrics.histogram('navigation.load', navEntry.loadEventEnd - navEntry.startTime);
}
}
initialized = true;
if (config.debug) {
console.log('[Observability] Initialized', config);
}
}
// Export all modules
export { correlation } from './correlation';
export { logger } from './logger';
export { metrics } from './metrics';
export { instrumentedFetch } from './instrumented-fetch';
export { ObservableErrorBoundary, withErrorBoundary } from './error-boundary';
export { classifyError, ErrorCategory, ErrorSeverity } from './error-taxonomy';
// Usage in app entry point:
// initializeObservability({
// serviceName: 'my-app',
// serviceVersion: '1.2.3',
// environment: 'production',
// telemetryEndpoint: 'https://telemetry.example.com',
// sampleRate: 0.1,
// debug: false,
// });
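One gap in the setup above: telemetry queued near page unload is lost if sent with `fetch`, because the browser may cancel in-flight requests during teardown. A minimal sketch of a `sendBeacon` flush — the `queue` parameter and endpoint are illustrative, and in practice the transport would live inside the logger and metrics modules:

```typescript
// Serialize the pending queue into a beacon body
function buildBeaconBody(events: object[]): string {
  return JSON.stringify({ events, flushedAt: Date.now() });
}

// Flush queued telemetry when the page becomes hidden.
// sendBeacon enqueues the POST even as the page is being torn down.
function flushOnHide(endpoint: string, queue: object[]): void {
  document.addEventListener('visibilitychange', () => {
    if (document.visibilityState !== 'hidden' || queue.length === 0) return;
    navigator.sendBeacon(endpoint, buildBeaconBody(queue.splice(0)));
  });
}
```

`visibilitychange` to `hidden` is the reliable trigger here; `unload` handlers are frequently skipped on mobile and disable back/forward caching.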
Key Takeaways
- Correlation is foundational: Every telemetry event must carry session, interaction, and trace IDs to enable end-to-end debugging
- Error taxonomy enables automation: Classifying errors by category and severity allows automated routing, alerting, and user-facing messaging
- Adaptive sampling balances cost and visibility: Sample more from slow or problematic sessions where debugging data is most valuable
- Edge processing reduces latency and cost: Validate, sample, and batch telemetry at the edge before sending to storage
- Fingerprinting enables deduplication: Group similar errors to understand impact and prioritize fixes
- Web Vitals provide user-centric metrics: LCP, FID, CLS, and INP correlate with actual user experience
- Structured logging beats unstructured: Consistent fields enable querying, alerting, and automated analysis
- Error boundaries provide isolation: Contain failures and capture context at the component level
- Beacon API ensures delivery: Use sendBeacon for reliable telemetry on page unload
- Dashboards tell the story: Combine metrics, errors, and traces to understand system behavior
Enterprise observability isn't about collecting more data—it's about collecting the right data with the context needed to act on it quickly.