Frontend Observability Architecture for Enterprise Apps
Beyond Console.log: Building Production-Grade Visibility
Enterprise frontend applications generate millions of events daily across thousands of users, browsers, and network conditions. Traditional logging approaches—scattered console.log statements and basic error tracking—fail to provide the visibility needed to understand system behavior, diagnose production issues, or optimize performance at scale.
This article presents a comprehensive observability architecture that treats frontend telemetry as a distributed systems problem, implementing the three pillars of observability (metrics, logs, traces) with correlation capabilities that enable true end-to-end visibility from user interaction through API response.
The Three Pillars in Frontend Context
┌─────────────────────────────────────────────────────────────────────────┐
│ Frontend Observability Stack │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────────────┐ │
│ │ METRICS │ │ LOGS │ │ TRACES │ │
│ │ │ │ │ │ │ │
│ │ Aggregated │ │ Structured │ │ Request flow across │ │
│ │ numerical │ │ events with │ │ browser → edge → API │ │
│ │ measurements│ │ context │ │ │ │
│ └──────┬──────┘ └──────┬──────┘ └─────────────┬───────────────┘ │
│ │ │ │ │
│ └──────────────────┼─────────────────────────┘ │
│ │ │
│ ┌───────▼───────┐ │
│ │ CORRELATION │ │
│ │ │ │
│ │ request_id │ │
│ │ session_id │ │
│ │ trace_id │ │
│ │ span_id │ │
│ └───────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────┘
Correlation Identity Model
Every telemetry event must carry identifiers that enable correlation across the entire request lifecycle:
// src/observability/correlation.ts
interface CorrelationContext {
// Persists across the entire user session
sessionId: string;
// Unique per page load, survives navigation in SPA
pageViewId: string;
// Unique per user interaction (click, form submit, etc.)
interactionId: string;
// W3C Trace Context - propagated to backend
traceId: string;
spanId: string;
// Links related requests (e.g., retry of same operation)
correlationId: string;
}
class CorrelationManager {
private context: CorrelationContext;
private spanStack: string[] = [];
constructor() {
this.context = {
sessionId: this.getOrCreateSessionId(),
pageViewId: this.generateId(),
interactionId: '',
traceId: this.generateTraceId(),
spanId: this.generateSpanId(),
correlationId: '',
};
}
private getOrCreateSessionId(): string {
const stored = sessionStorage.getItem('obs_session_id');
if (stored) return stored;
const sessionId = this.generateId();
sessionStorage.setItem('obs_session_id', sessionId);
return sessionId;
}
private generateId(): string {
return crypto.randomUUID();
}
private generateTraceId(): string {
// W3C Trace Context: 16 bytes / 32 hex chars
const bytes = new Uint8Array(16);
crypto.getRandomValues(bytes);
return Array.from(bytes, b => b.toString(16).padStart(2, '0')).join('');
}
private generateSpanId(): string {
// W3C Trace Context: 8 bytes / 16 hex chars
const bytes = new Uint8Array(8);
crypto.getRandomValues(bytes);
return Array.from(bytes, b => b.toString(16).padStart(2, '0')).join('');
}
// Start a new interaction (user action like click, submit)
startInteraction(name: string): () => void {
const previousInteractionId = this.context.interactionId;
const previousCorrelationId = this.context.correlationId;
this.context.interactionId = this.generateId();
this.context.correlationId = this.generateId();
this.context.traceId = this.generateTraceId();
this.context.spanId = this.generateSpanId();
// Return cleanup function
return () => {
this.context.interactionId = previousInteractionId;
this.context.correlationId = previousCorrelationId;
};
}
// Create child span for nested operations
startSpan(name: string): SpanContext {
const parentSpanId = this.context.spanId;
const newSpanId = this.generateSpanId();
this.spanStack.push(parentSpanId);
this.context.spanId = newSpanId;
return {
traceId: this.context.traceId,
spanId: newSpanId,
parentSpanId,
name,
startTime: performance.now(),
end: () => {
this.context.spanId = this.spanStack.pop() || this.generateSpanId();
},
};
}
// Get headers for outgoing HTTP requests
getTraceHeaders(): Record<string, string> {
return {
'traceparent': `00-${this.context.traceId}-${this.context.spanId}-01`,
'x-correlation-id': this.context.correlationId,
'x-session-id': this.context.sessionId,
'x-interaction-id': this.context.interactionId,
};
}
getContext(): Readonly<CorrelationContext> {
return { ...this.context };
}
}
interface SpanContext {
traceId: string;
spanId: string;
parentSpanId: string;
name: string;
startTime: number;
end: () => void;
}
export const correlation = new CorrelationManager();
Distributed Tracing: Browser to Backend
Request Flow Instrumentation
┌──────────────────────────────────────────────────────────────────────────────┐
│ Distributed Trace Flow │
├──────────────────────────────────────────────────────────────────────────────┤
│ │
│ Browser Edge/CDN API Gateway Services │
│ ──────── ──────── ─────────── ──────── │
│ │
│ ┌─────────────┐ │
│ │ User Click │ │
│ │ span_id: a1 │ │
│ └──────┬──────┘ │
│ │ │
│ ▼ │
│ ┌─────────────┐ ┌─────────────┐ │
│ │ State Update│ │ │ │
│ │ span_id: a2 │ │ │ │
│ │ parent: a1 │ │ │ │
│ └──────┬──────┘ │ │ │
│ │ │ │ │
│ ▼ ▼ │ │
│ ┌─────────────┐ ┌─────────────┐ │ │
│ │ Fetch Start │ │ Edge Worker │ │ │
│ │ span_id: a3 │──│ span_id: a4 │ │ │
│ │ parent: a2 │ │ parent: a3 │ │ │
│ └─────────────┘ └──────┬──────┘ │ │
│ │ ▼ │
│ │ ┌─────────────┐ ┌─────────────┐ │
│ └───▶│ API Gateway │───▶│ User Service│ │
│ │ span_id: a5 │ │ span_id: a6 │ │
│ │ parent: a4 │ │ parent: a5 │ │
│ └─────────────┘ └─────────────┘ │
│ │
│ ═══════════════════════════════════════════════════════════════════════ │
│ trace_id: 4bf92f3577b34da6a3ce929d0e0e4736 (same across all spans) │
│ │
└──────────────────────────────────────────────────────────────────────────────┘
Fetch Instrumentation with Tracing
// src/observability/instrumented-fetch.ts
import { correlation } from './correlation';
import { metrics } from './metrics';
import { logger } from './logger';
interface FetchTiming {
dnsLookup: number;
tcpConnection: number;
tlsHandshake: number;
requestSent: number;
waiting: number;
contentDownload: number;
total: number;
}
interface InstrumentedResponse<T> extends Response {
data: T;
timing: FetchTiming;
traceId: string;
}
export async function instrumentedFetch<T = unknown>(
url: string,
options: RequestInit = {}
): Promise<InstrumentedResponse<T>> {
const span = correlation.startSpan(`fetch:${new URL(url, location.origin).pathname}`);
const startTime = performance.now();
// Inject trace context headers
const headers = new Headers(options.headers);
const traceHeaders = correlation.getTraceHeaders();
Object.entries(traceHeaders).forEach(([key, value]) => {
headers.set(key, value);
});
// Add timing metadata
const requestId = crypto.randomUUID();
headers.set('x-request-id', requestId);
const context = correlation.getContext();
logger.debug('fetch:start', {
url,
method: options.method || 'GET',
requestId,
...context,
});
try {
const response = await fetch(url, {
...options,
headers,
});
const endTime = performance.now();
const duration = endTime - startTime;
// Extract server timing if available
const serverTiming = parseServerTiming(response.headers.get('server-timing'));
// Capture resource timing for this request
const timing = await captureResourceTiming(url, startTime);
// Record metrics
metrics.histogram('http.client.duration', duration, {
method: options.method || 'GET',
status: response.status.toString(),
host: new URL(url, location.origin).host,
});
if (!response.ok) {
metrics.increment('http.client.error', {
method: options.method || 'GET',
status: response.status.toString(),
host: new URL(url, location.origin).host,
});
logger.warn('fetch:error', {
url,
status: response.status,
statusText: response.statusText,
duration,
requestId,
serverTiming,
...context,
});
}
// Parse response body
const contentType = response.headers.get('content-type') || '';
let data: T;
if (contentType.includes('application/json')) {
data = await response.json();
} else {
data = await response.text() as unknown as T;
}
logger.info('fetch:complete', {
url,
method: options.method || 'GET',
status: response.status,
duration,
timing,
serverTiming,
requestId,
responseTraceId: response.headers.get('x-trace-id'),
...context,
});
span.end();
return Object.assign(response, {
data,
timing,
traceId: context.traceId,
}) as InstrumentedResponse<T>;
} catch (error) {
const endTime = performance.now();
const duration = endTime - startTime;
metrics.increment('http.client.error', {
method: options.method || 'GET',
error_type: error instanceof Error ? error.name : 'unknown',
host: new URL(url, location.origin).host,
});
logger.error('fetch:failed', {
url,
method: options.method || 'GET',
error: error instanceof Error ? error.message : String(error),
errorType: error instanceof Error ? error.name : 'unknown',
duration,
requestId,
...context,
});
span.end();
throw error;
}
}
function parseServerTiming(header: string | null): Record<string, number> {
if (!header) return {};
const timings: Record<string, number> = {};
// Parse Server-Timing header: "db;dur=53.2, cache;dur=0.1, app;dur=47.2"
header.split(',').forEach(entry => {
    const match = entry.trim().match(/^([\w-]+)(?:;.*?dur=(\d+(?:\.\d+)?))?/);
if (match) {
const [, name, duration] = match;
timings[name] = duration ? parseFloat(duration) : 0;
}
});
return timings;
}
async function captureResourceTiming(
url: string,
requestStartTime: number
): Promise<FetchTiming> {
// Wait for resource timing entry to be available
await new Promise(resolve => setTimeout(resolve, 0));
  // Resolve to an absolute URL: entry.name is always absolute, so a
  // substring match against a relative url is fragile
  const absoluteUrl = new URL(url, location.origin).href;
  const entries = performance.getEntriesByType('resource') as PerformanceResourceTiming[];
  const entry = entries
    .filter(e => e.name === absoluteUrl && e.startTime >= requestStartTime - 10)
    .pop();
if (!entry) {
return {
dnsLookup: 0,
tcpConnection: 0,
tlsHandshake: 0,
requestSent: 0,
waiting: 0,
contentDownload: 0,
total: 0,
};
}
return {
dnsLookup: entry.domainLookupEnd - entry.domainLookupStart,
tcpConnection: entry.connectEnd - entry.connectStart,
tlsHandshake: entry.secureConnectionStart > 0
? entry.connectEnd - entry.secureConnectionStart
: 0,
    // Resource Timing exposes no explicit "request sent" phase; approximate it
    // as the gap between connection ready and the request being issued
    requestSent: entry.connectEnd > 0 ? entry.requestStart - entry.connectEnd : 0,
    // Waiting (time to first byte for this request)
    waiting: entry.responseStart - entry.requestStart,
contentDownload: entry.responseEnd - entry.responseStart,
total: entry.responseEnd - entry.startTime,
};
}
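To sanity-check the arithmetic in `captureResourceTiming`, here is a self-contained version of the unambiguous phases of the breakdown, applied to a mock entry. The field names follow the Resource Timing spec; the millisecond values are invented for the example:

```typescript
// Mock of the PerformanceResourceTiming fields used above (spec names, invented values).
interface ResourceTimingLike {
  startTime: number;
  domainLookupStart: number;
  domainLookupEnd: number;
  connectStart: number;
  connectEnd: number;
  secureConnectionStart: number;
  requestStart: number;
  responseStart: number;
  responseEnd: number;
}

function breakdown(entry: ResourceTimingLike) {
  return {
    dnsLookup: entry.domainLookupEnd - entry.domainLookupStart,
    tcpConnection: entry.connectEnd - entry.connectStart,
    tlsHandshake: entry.secureConnectionStart > 0
      ? entry.connectEnd - entry.secureConnectionStart
      : 0,
    waiting: entry.responseStart - entry.requestStart,     // TTFB for this request
    contentDownload: entry.responseEnd - entry.responseStart,
    total: entry.responseEnd - entry.startTime,
  };
}

const timing = breakdown({
  startTime: 10,
  domainLookupStart: 10, domainLookupEnd: 20,  // 10 ms DNS
  connectStart: 20, connectEnd: 50,            // 30 ms connect,
  secureConnectionStart: 30,                   //   of which 20 ms is TLS
  requestStart: 50,
  responseStart: 120,                          // 70 ms waiting
  responseEnd: 180,                            // 60 ms download
});
console.log(timing);
```

Note that the TLS handshake is a sub-phase of the connection time (secure connections only), which is why the phases do not sum to `total`.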
Structured Logging System
Log Levels and Semantic Structure
// src/observability/logger.ts
import { correlation } from './correlation';
type LogLevel = 'debug' | 'info' | 'warn' | 'error' | 'fatal';
interface LogEntry {
timestamp: string;
level: LogLevel;
message: string;
context: Record<string, unknown>;
correlation: {
sessionId: string;
pageViewId: string;
interactionId: string;
traceId: string;
spanId: string;
};
environment: {
url: string;
userAgent: string;
viewport: { width: number; height: number };
connection?: {
effectiveType: string;
downlink: number;
rtt: number;
};
};
}
interface LoggerConfig {
minLevel: LogLevel;
sampleRate: number;
batchSize: number;
flushInterval: number;
endpoint: string;
}
const LOG_LEVELS: Record<LogLevel, number> = {
debug: 0,
info: 1,
warn: 2,
error: 3,
fatal: 4,
};
class StructuredLogger {
private config: LoggerConfig;
private buffer: LogEntry[] = [];
private flushTimer: number | null = null;
constructor(config: Partial<LoggerConfig> = {}) {
this.config = {
minLevel: config.minLevel ?? 'info',
sampleRate: config.sampleRate ?? 1.0,
batchSize: config.batchSize ?? 50,
flushInterval: config.flushInterval ?? 5000,
endpoint: config.endpoint ?? '/api/telemetry/logs',
};
this.setupAutoFlush();
this.setupUnloadFlush();
}
private shouldLog(level: LogLevel): boolean {
return LOG_LEVELS[level] >= LOG_LEVELS[this.config.minLevel];
}
private shouldSample(): boolean {
return Math.random() < this.config.sampleRate;
}
private getEnvironment(): LogEntry['environment'] {
const nav = navigator as Navigator & {
connection?: {
effectiveType: string;
downlink: number;
rtt: number;
};
};
return {
url: location.href,
userAgent: navigator.userAgent,
viewport: {
width: window.innerWidth,
height: window.innerHeight,
},
connection: nav.connection ? {
effectiveType: nav.connection.effectiveType,
downlink: nav.connection.downlink,
rtt: nav.connection.rtt,
} : undefined,
};
}
private createEntry(
level: LogLevel,
message: string,
context: Record<string, unknown>
): LogEntry {
const correlationContext = correlation.getContext();
return {
timestamp: new Date().toISOString(),
level,
message,
context: this.sanitizeContext(context),
correlation: {
sessionId: correlationContext.sessionId,
pageViewId: correlationContext.pageViewId,
interactionId: correlationContext.interactionId,
traceId: correlationContext.traceId,
spanId: correlationContext.spanId,
},
environment: this.getEnvironment(),
};
}
private sanitizeContext(context: Record<string, unknown>): Record<string, unknown> {
const sanitized: Record<string, unknown> = {};
    // Keys must be lowercase: they are compared against key.toLowerCase()
    const sensitiveKeys = ['password', 'token', 'secret', 'apikey', 'authorization'];
for (const [key, value] of Object.entries(context)) {
if (sensitiveKeys.some(sk => key.toLowerCase().includes(sk))) {
sanitized[key] = '[REDACTED]';
} else if (typeof value === 'object' && value !== null) {
sanitized[key] = JSON.stringify(value).substring(0, 1000);
} else {
sanitized[key] = value;
}
}
return sanitized;
}
private log(level: LogLevel, message: string, context: Record<string, unknown> = {}) {
if (!this.shouldLog(level)) return;
// Always log errors, sample others
if (level !== 'error' && level !== 'fatal' && !this.shouldSample()) return;
const entry = this.createEntry(level, message, context);
// Console output in development
if (process.env.NODE_ENV === 'development') {
const consoleMethod = level === 'fatal' ? 'error' : level;
console[consoleMethod](`[${level.toUpperCase()}] ${message}`, context);
}
this.buffer.push(entry);
if (this.buffer.length >= this.config.batchSize) {
this.flush();
}
}
debug(message: string, context?: Record<string, unknown>) {
this.log('debug', message, context);
}
info(message: string, context?: Record<string, unknown>) {
this.log('info', message, context);
}
warn(message: string, context?: Record<string, unknown>) {
this.log('warn', message, context);
}
error(message: string, context?: Record<string, unknown>) {
this.log('error', message, context);
}
fatal(message: string, context?: Record<string, unknown>) {
this.log('fatal', message, context);
// Immediately flush fatal errors
this.flush();
}
private setupAutoFlush() {
this.flushTimer = window.setInterval(() => {
if (this.buffer.length > 0) {
this.flush();
}
}, this.config.flushInterval);
}
private setupUnloadFlush() {
// Use visibilitychange for more reliable flush
document.addEventListener('visibilitychange', () => {
if (document.visibilityState === 'hidden') {
this.flush(true);
}
});
// Fallback for page unload
window.addEventListener('pagehide', () => {
this.flush(true);
});
}
private async flush(useBeacon = false) {
if (this.buffer.length === 0) return;
const entries = [...this.buffer];
this.buffer = [];
const payload = JSON.stringify({ logs: entries });
    if (useBeacon && navigator.sendBeacon) {
      const accepted = navigator.sendBeacon(
        this.config.endpoint,
        new Blob([payload], { type: 'application/json' })
      );
      // sendBeacon returns false if the user agent rejected the payload
      if (!accepted) {
        this.buffer.unshift(...entries);
      }
      return;
    }
try {
await fetch(this.config.endpoint, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: payload,
keepalive: true,
});
} catch (error) {
// Re-add entries on failure (with limit to prevent memory issues)
if (this.buffer.length < 200) {
this.buffer.unshift(...entries);
}
console.error('Failed to flush logs:', error);
}
}
}
export const logger = new StructuredLogger({
minLevel: process.env.NODE_ENV === 'development' ? 'debug' : 'info',
sampleRate: process.env.NODE_ENV === 'development' ? 1.0 : 0.1,
});
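The redaction rule in `sanitizeContext` is worth exercising in isolation. This standalone copy of the same logic (lowercase key list, stringify-and-truncate for nested objects) shows what actually reaches the telemetry endpoint:

```typescript
// Standalone copy of StructuredLogger's sanitization rule:
// redact sensitive keys, stringify-and-truncate nested objects.
function sanitizeContext(context: Record<string, unknown>): Record<string, unknown> {
  const sanitized: Record<string, unknown> = {};
  const sensitiveKeys = ['password', 'token', 'secret', 'apikey', 'authorization'];
  for (const [key, value] of Object.entries(context)) {
    if (sensitiveKeys.some(sk => key.toLowerCase().includes(sk))) {
      sanitized[key] = '[REDACTED]';
    } else if (typeof value === 'object' && value !== null) {
      sanitized[key] = JSON.stringify(value).substring(0, 1000);
    } else {
      sanitized[key] = value;
    }
  }
  return sanitized;
}

const out = sanitizeContext({
  userId: 'u-123',
  authToken: 'eyJhbGciOi...',        // matches 'token' → redacted
  profile: { plan: 'enterprise' },   // object → serialized string
});
console.log(out);
```

The substring match is deliberately broad: `authToken`, `refresh_token`, and `apiKeySecret` all get caught without enumerating every variant.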
Error Taxonomy and Classification
Semantic Error Categories
// src/observability/error-taxonomy.ts
export enum ErrorCategory {
// Network and connectivity
NETWORK_OFFLINE = 'network.offline',
NETWORK_TIMEOUT = 'network.timeout',
NETWORK_DNS_FAILURE = 'network.dns_failure',
NETWORK_CONNECTION_REFUSED = 'network.connection_refused',
// HTTP errors
HTTP_CLIENT_ERROR = 'http.client_error', // 4xx
HTTP_SERVER_ERROR = 'http.server_error', // 5xx
HTTP_RATE_LIMITED = 'http.rate_limited', // 429
HTTP_UNAUTHORIZED = 'http.unauthorized', // 401
HTTP_FORBIDDEN = 'http.forbidden', // 403
HTTP_NOT_FOUND = 'http.not_found', // 404
// JavaScript runtime
JS_TYPE_ERROR = 'js.type_error',
JS_REFERENCE_ERROR = 'js.reference_error',
JS_SYNTAX_ERROR = 'js.syntax_error',
JS_RANGE_ERROR = 'js.range_error',
// React-specific
REACT_RENDER_ERROR = 'react.render_error',
REACT_HYDRATION_MISMATCH = 'react.hydration_mismatch',
REACT_HOOK_ERROR = 'react.hook_error',
REACT_SUSPENSE_ERROR = 'react.suspense_error',
// Application-specific
APP_VALIDATION_ERROR = 'app.validation_error',
APP_STATE_CORRUPTION = 'app.state_corruption',
APP_INVARIANT_VIOLATION = 'app.invariant_violation',
APP_FEATURE_FLAG_ERROR = 'app.feature_flag_error',
// Resource loading
RESOURCE_SCRIPT_LOAD = 'resource.script_load',
RESOURCE_STYLE_LOAD = 'resource.style_load',
RESOURCE_IMAGE_LOAD = 'resource.image_load',
RESOURCE_CHUNK_LOAD = 'resource.chunk_load',
// Storage
STORAGE_QUOTA_EXCEEDED = 'storage.quota_exceeded',
STORAGE_UNAVAILABLE = 'storage.unavailable',
// Unknown
UNKNOWN = 'unknown',
}
export enum ErrorSeverity {
LOW = 'low', // Cosmetic issues, non-blocking
MEDIUM = 'medium', // Feature degradation but app usable
HIGH = 'high', // Core functionality impacted
CRITICAL = 'critical', // App unusable
}
interface ClassifiedError {
category: ErrorCategory;
severity: ErrorSeverity;
isRecoverable: boolean;
suggestedAction: 'retry' | 'refresh' | 'ignore' | 'escalate' | 'offline_fallback';
userMessage: string;
}
export function classifyError(error: Error, context?: {
httpStatus?: number;
url?: string;
componentStack?: string;
}): ClassifiedError {
// Network errors
if (error instanceof TypeError && error.message.includes('Failed to fetch')) {
if (!navigator.onLine) {
return {
category: ErrorCategory.NETWORK_OFFLINE,
severity: ErrorSeverity.MEDIUM,
isRecoverable: true,
suggestedAction: 'offline_fallback',
userMessage: 'You appear to be offline. Some features may be unavailable.',
};
}
return {
category: ErrorCategory.NETWORK_TIMEOUT,
severity: ErrorSeverity.MEDIUM,
isRecoverable: true,
suggestedAction: 'retry',
userMessage: 'Connection issue. Please try again.',
};
}
// HTTP status-based classification
if (context?.httpStatus) {
const status = context.httpStatus;
if (status === 401) {
return {
category: ErrorCategory.HTTP_UNAUTHORIZED,
severity: ErrorSeverity.HIGH,
isRecoverable: false,
suggestedAction: 'escalate',
userMessage: 'Your session has expired. Please sign in again.',
};
}
if (status === 403) {
return {
category: ErrorCategory.HTTP_FORBIDDEN,
severity: ErrorSeverity.HIGH,
isRecoverable: false,
suggestedAction: 'escalate',
userMessage: "You don't have permission to access this resource.",
};
}
if (status === 404) {
return {
category: ErrorCategory.HTTP_NOT_FOUND,
severity: ErrorSeverity.MEDIUM,
isRecoverable: false,
suggestedAction: 'ignore',
userMessage: 'The requested resource was not found.',
};
}
if (status === 429) {
return {
category: ErrorCategory.HTTP_RATE_LIMITED,
severity: ErrorSeverity.MEDIUM,
isRecoverable: true,
suggestedAction: 'retry',
userMessage: 'Too many requests. Please wait a moment.',
};
}
if (status >= 500) {
return {
category: ErrorCategory.HTTP_SERVER_ERROR,
severity: ErrorSeverity.HIGH,
isRecoverable: true,
suggestedAction: 'retry',
userMessage: 'Server error. Our team has been notified.',
};
}
}
// React-specific errors
if (context?.componentStack) {
if (error.message.includes('Hydration')) {
return {
category: ErrorCategory.REACT_HYDRATION_MISMATCH,
severity: ErrorSeverity.MEDIUM,
isRecoverable: true,
suggestedAction: 'refresh',
userMessage: 'Page display issue. Refreshing may help.',
};
}
if (error.message.includes('hook')) {
return {
category: ErrorCategory.REACT_HOOK_ERROR,
severity: ErrorSeverity.HIGH,
isRecoverable: false,
suggestedAction: 'refresh',
userMessage: 'An error occurred. Please refresh the page.',
};
}
return {
category: ErrorCategory.REACT_RENDER_ERROR,
severity: ErrorSeverity.HIGH,
isRecoverable: false,
suggestedAction: 'refresh',
userMessage: 'An error occurred while displaying this content.',
};
}
// JavaScript runtime errors
if (error instanceof TypeError) {
return {
category: ErrorCategory.JS_TYPE_ERROR,
severity: ErrorSeverity.HIGH,
isRecoverable: false,
suggestedAction: 'refresh',
userMessage: 'An unexpected error occurred.',
};
}
if (error instanceof ReferenceError) {
return {
category: ErrorCategory.JS_REFERENCE_ERROR,
severity: ErrorSeverity.HIGH,
isRecoverable: false,
suggestedAction: 'refresh',
userMessage: 'An unexpected error occurred.',
};
}
// Chunk loading errors
if (error.message.includes('Loading chunk') || error.message.includes('ChunkLoadError')) {
return {
category: ErrorCategory.RESOURCE_CHUNK_LOAD,
severity: ErrorSeverity.HIGH,
isRecoverable: true,
suggestedAction: 'refresh',
userMessage: 'Failed to load application component. Please refresh.',
};
}
// Storage errors
if (error.name === 'QuotaExceededError') {
return {
category: ErrorCategory.STORAGE_QUOTA_EXCEEDED,
severity: ErrorSeverity.LOW,
isRecoverable: false,
suggestedAction: 'ignore',
userMessage: 'Storage is full. Some features may be limited.',
};
}
// Default classification
return {
category: ErrorCategory.UNKNOWN,
severity: ErrorSeverity.MEDIUM,
isRecoverable: false,
suggestedAction: 'escalate',
userMessage: 'An unexpected error occurred.',
};
}
// Error fingerprinting for deduplication
export function generateErrorFingerprint(error: Error, componentStack?: string): string {
const parts: string[] = [
error.name,
error.message.replace(/\d+/g, 'N').replace(/['"]/g, ''),
];
// Include first meaningful stack frame
if (error.stack) {
const frames = error.stack.split('\n').slice(1, 4);
const meaningfulFrame = frames.find(f =>
!f.includes('node_modules') &&
!f.includes('webpack') &&
!f.includes('<anonymous>')
);
if (meaningfulFrame) {
// Extract file and line, normalize
const match = meaningfulFrame.match(/at\s+(\S+)\s+\(([^:]+):(\d+)/);
if (match) {
parts.push(`${match[1]}@${match[2]}:${match[3]}`);
}
}
}
// Include component from React stack
if (componentStack) {
const componentMatch = componentStack.match(/at\s+(\w+)/);
if (componentMatch) {
parts.push(`component:${componentMatch[1]}`);
}
}
// Generate hash
const str = parts.join('|');
let hash = 0;
for (let i = 0; i < str.length; i++) {
const char = str.charCodeAt(i);
hash = ((hash << 5) - hash) + char;
hash = hash & hash;
}
return Math.abs(hash).toString(36);
}
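The normalization step is what makes fingerprints stable for deduplication: messages that differ only in embedded numbers or quoted values collapse into the same bucket. A self-contained sketch of just the normalize-then-hash core (same hash as `generateErrorFingerprint` above, minus the stack-frame parsing):

```typescript
// Normalize the message (digits → N, quotes stripped), then 32-bit string hash.
function fingerprint(name: string, message: string): string {
  const normalized = message.replace(/\d+/g, 'N').replace(/['"]/g, '');
  const str = `${name}|${normalized}`;
  let hash = 0;
  for (let i = 0; i < str.length; i++) {
    hash = ((hash << 5) - hash) + str.charCodeAt(i);
    hash = hash & hash; // force 32-bit overflow semantics
  }
  return Math.abs(hash).toString(36);
}

// Same error shape, different row numbers → identical fingerprint
const a = fingerprint('TypeError', 'Cannot read id of row 42');
const b = fingerprint('TypeError', 'Cannot read id of row 97');
console.log(a === b); // true
```

Without this normalization, a single bug hit across thousands of records would register as thousands of distinct errors in the dashboard.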
Metrics Collection and Aggregation
Client-Side Metrics System
// src/observability/metrics.ts
type MetricType = 'counter' | 'gauge' | 'histogram';
interface MetricPoint {
name: string;
type: MetricType;
value: number;
labels: Record<string, string>;
timestamp: number;
}
interface HistogramBuckets {
[bucket: string]: number;
count: number;
sum: number;
}
class MetricsCollector {
private counters: Map<string, number> = new Map();
private gauges: Map<string, number> = new Map();
private histograms: Map<string, HistogramBuckets> = new Map();
private buffer: MetricPoint[] = [];
private flushTimer: number | null = null;
private readonly histogramBuckets = [
5, 10, 25, 50, 100, 250, 500, 1000, 2500, 5000, 10000
];
constructor(
private endpoint: string = '/api/telemetry/metrics',
private flushInterval: number = 30000
) {
this.setupAutoFlush();
this.setupUnloadFlush();
}
private getKey(name: string, labels: Record<string, string>): string {
const sortedLabels = Object.entries(labels)
.sort(([a], [b]) => a.localeCompare(b))
.map(([k, v]) => `${k}=${v}`)
.join(',');
return `${name}{${sortedLabels}}`;
}
increment(name: string, labels: Record<string, string> = {}, value = 1) {
const key = this.getKey(name, labels);
const current = this.counters.get(key) || 0;
this.counters.set(key, current + value);
this.buffer.push({
name,
type: 'counter',
value,
labels,
timestamp: Date.now(),
});
}
gauge(name: string, value: number, labels: Record<string, string> = {}) {
const key = this.getKey(name, labels);
this.gauges.set(key, value);
this.buffer.push({
name,
type: 'gauge',
value,
labels,
timestamp: Date.now(),
});
}
histogram(name: string, value: number, labels: Record<string, string> = {}) {
const key = this.getKey(name, labels);
let buckets = this.histograms.get(key);
if (!buckets) {
buckets = { count: 0, sum: 0 };
this.histogramBuckets.forEach(b => buckets![`le_${b}`] = 0);
buckets['le_Inf'] = 0;
this.histograms.set(key, buckets);
}
buckets.count++;
buckets.sum += value;
// Increment appropriate buckets
for (const bucket of this.histogramBuckets) {
if (value <= bucket) {
buckets[`le_${bucket}`]++;
}
}
buckets['le_Inf']++;
this.buffer.push({
name,
type: 'histogram',
value,
labels,
timestamp: Date.now(),
});
}
// Timer utility for measuring durations
startTimer(name: string, labels: Record<string, string> = {}): () => void {
const start = performance.now();
return () => {
const duration = performance.now() - start;
this.histogram(name, duration, labels);
};
}
private setupAutoFlush() {
this.flushTimer = window.setInterval(() => {
this.flush();
}, this.flushInterval);
}
private setupUnloadFlush() {
document.addEventListener('visibilitychange', () => {
if (document.visibilityState === 'hidden') {
this.flush(true);
}
});
}
private async flush(useBeacon = false) {
if (this.buffer.length === 0) return;
const points = [...this.buffer];
this.buffer = [];
// Aggregate by name and labels
const aggregated = this.aggregateMetrics(points);
const payload = JSON.stringify(aggregated);
    if (useBeacon && navigator.sendBeacon) {
      const accepted = navigator.sendBeacon(
        this.endpoint,
        new Blob([payload], { type: 'application/json' })
      );
      // sendBeacon returns false if the user agent rejected the payload
      if (!accepted) {
        this.buffer.unshift(...points);
      }
      return;
    }
try {
await fetch(this.endpoint, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: payload,
keepalive: true,
});
} catch (error) {
// Re-buffer on failure
if (this.buffer.length < 1000) {
this.buffer.unshift(...points);
}
}
}
private aggregateMetrics(points: MetricPoint[]) {
const aggregated: Record<string, {
type: MetricType;
labels: Record<string, string>;
value?: number;
sum?: number;
count?: number;
buckets?: Record<string, number>;
}> = {};
for (const point of points) {
const key = this.getKey(point.name, point.labels);
if (point.type === 'counter') {
if (!aggregated[key]) {
aggregated[key] = { type: 'counter', labels: point.labels, value: 0 };
}
aggregated[key].value! += point.value;
} else if (point.type === 'gauge') {
// For gauges, use the latest value
aggregated[key] = { type: 'gauge', labels: point.labels, value: point.value };
} else if (point.type === 'histogram') {
if (!aggregated[key]) {
aggregated[key] = {
type: 'histogram',
labels: point.labels,
sum: 0,
count: 0,
buckets: {},
};
this.histogramBuckets.forEach(b => aggregated[key].buckets![`le_${b}`] = 0);
aggregated[key].buckets!['le_Inf'] = 0;
}
aggregated[key].sum! += point.value;
aggregated[key].count!++;
for (const bucket of this.histogramBuckets) {
if (point.value <= bucket) {
aggregated[key].buckets![`le_${bucket}`]++;
}
}
aggregated[key].buckets!['le_Inf']++;
}
}
return {
timestamp: Date.now(),
metrics: Object.entries(aggregated).map(([name, data]) => ({
name: name.split('{')[0],
...data,
})),
};
}
}
export const metrics = new MetricsCollector();
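The cumulative-bucket convention above mirrors Prometheus histograms: each observation increments every bucket whose upper bound it fits under, plus the `le_Inf` catch-all. A self-contained check of that counting rule:

```typescript
// Cumulative bucket counting, as in MetricsCollector.histogram above.
const boundaries = [5, 10, 25, 50, 100, 250, 500, 1000, 2500, 5000, 10000];

function record(buckets: Record<string, number>, value: number): void {
  for (const b of boundaries) {
    if (value <= b) buckets[`le_${b}`] = (buckets[`le_${b}`] ?? 0) + 1;
  }
  buckets['le_Inf'] = (buckets['le_Inf'] ?? 0) + 1; // every observation lands here
}

const buckets: Record<string, number> = {};
for (const durationMs of [3, 42, 800]) record(buckets, durationMs);
console.log(buckets.le_5, buckets.le_50, buckets.le_1000, buckets.le_Inf);
// → 1 2 3 3  (cumulative: le_50 counts both 3 and 42)
```

Cumulative buckets are what make server-side percentile estimation cheap: P90 is found by scanning for the first bucket whose count exceeds 90% of `le_Inf`.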
// Core Web Vitals collection
export function collectWebVitals() {
// LCP - Largest Contentful Paint
  new PerformanceObserver((entryList) => {
    const entries = entryList.getEntries();
    // LCP entries expose the painted element directly; their .name is empty
    const lastEntry = entries[entries.length - 1] as PerformanceEntry & { element?: Element };
    metrics.histogram('web_vitals.lcp', lastEntry.startTime, {
      element: lastEntry.element?.tagName.toLowerCase() ?? 'unknown',
    });
  }).observe({ type: 'largest-contentful-paint', buffered: true });
// FID - First Input Delay
new PerformanceObserver((entryList) => {
for (const entry of entryList.getEntries()) {
const fidEntry = entry as PerformanceEventTiming;
metrics.histogram('web_vitals.fid', fidEntry.processingStart - fidEntry.startTime, {
event_type: fidEntry.name,
});
}
}).observe({ type: 'first-input', buffered: true });
// CLS - Cumulative Layout Shift
let clsValue = 0;
new PerformanceObserver((entryList) => {
for (const entry of entryList.getEntries()) {
const clsEntry = entry as PerformanceEntry & { value: number; hadRecentInput: boolean };
if (!clsEntry.hadRecentInput) {
clsValue += clsEntry.value;
}
}
metrics.gauge('web_vitals.cls', clsValue);
}).observe({ type: 'layout-shift', buffered: true });
  // INP - Interaction to Next Paint (approximation: worst event duration so far;
  // the real metric groups entries by interactionId and takes a high percentile)
  let maxINP = 0;
  new PerformanceObserver((entryList) => {
    for (const entry of entryList.getEntries()) {
      const inpEntry = entry as PerformanceEventTiming;
      if (inpEntry.duration > maxINP) {
        maxINP = inpEntry.duration;
        metrics.gauge('web_vitals.inp', maxINP, {
          event_type: inpEntry.name,
        });
      }
    }
    // durationThreshold lowers the default 104 ms reporting cutoff for event timing
  }).observe({ type: 'event', buffered: true, durationThreshold: 40 } as PerformanceObserverInit);
  // TTFB - Time to First Byte (responseStart is relative to navigation start,
  // so it already includes redirect, DNS, and connection time)
  const navEntry = performance.getEntriesByType('navigation')[0] as PerformanceNavigationTiming;
  if (navEntry) {
    metrics.histogram('web_vitals.ttfb', navEntry.responseStart);
  }
  // FCP - First Contentful Paint (a paint entry, not derivable from navigation timing)
  new PerformanceObserver((entryList) => {
    for (const entry of entryList.getEntriesByName('first-contentful-paint')) {
      metrics.histogram('web_vitals.fcp', entry.startTime);
    }
  }).observe({ type: 'paint', buffered: true });
}
}
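When shipping these values, it also helps to attach the published good/needs-improvement/poor thresholds as a label so dashboards can slice by rating rather than recomputing cutoffs. A hedged sketch using the web.dev thresholds for LCP (good ≤ 2500 ms, poor > 4000 ms); the `rateLCP` helper name is an assumption, not part of the code above:

```typescript
// Bucket an LCP value into the published Core Web Vitals rating.
type Rating = 'good' | 'needs-improvement' | 'poor';

function rateLCP(ms: number): Rating {
  if (ms <= 2500) return 'good';
  if (ms <= 4000) return 'needs-improvement';
  return 'poor';
}

// e.g. metrics.histogram('web_vitals.lcp', value, { rating: rateLCP(value) });
console.log(rateLCP(1800), rateLCP(3200), rateLCP(5100));
// → good needs-improvement poor
```

Each vital has its own threshold pair (CLS and INP use different cutoffs), so a per-metric table is worth maintaining alongside the collector.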
Performance Sampling Strategy
Adaptive Sampling Based on Performance Budget
┌─────────────────────────────────────────────────────────────────────────┐
│ Adaptive Sampling Strategy │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ Fast Sessions (< P50) Normal Sessions Slow Sessions (>P90) │
│ ──────────────────── ──────────────── ──────────────────── │
│ │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Sample Rate: 1% │ │ Sample Rate: 10%│ │ Sample Rate:100%│ │
│ │ │ │ │ │ │ │
│ │ - Basic metrics │ │ - All metrics │ │ - All metrics │ │
│ │ - Errors only │ │ - Sample logs │ │ - All logs │ │
│ │ │ │ - Key traces │ │ - Full traces │ │
│ └─────────────────┘ └─────────────────┘ └─────────────────┘ │
│ │
│ Why: Most users have Default sampling Capture details of │
│ good experience, minimal for normal perf performance problems │
│ telemetry needed │
│ │
└─────────────────────────────────────────────────────────────────────────┘
// src/observability/sampling.ts
interface SamplingConfig {
baseRate: number;
errorRate: number; // Always higher for errors
slowSessionRate: number; // Higher for slow sessions
slowThresholdMs: number; // When to consider session "slow"
headSamplingEnabled: boolean;
}
class AdaptiveSampler {
private config: SamplingConfig;
private sessionDecision: boolean | null = null;
private isSlowSession = false;
private performanceScore = 100;
constructor(config: Partial<SamplingConfig> = {}) {
this.config = {
baseRate: config.baseRate ?? 0.1,
errorRate: config.errorRate ?? 1.0,
slowSessionRate: config.slowSessionRate ?? 1.0,
slowThresholdMs: config.slowThresholdMs ?? 3000,
headSamplingEnabled: config.headSamplingEnabled ?? true,
};
this.initializeSessionSampling();
this.monitorPerformance();
}
private initializeSessionSampling() {
// Head-based sampling: decide at session start
if (this.config.headSamplingEnabled) {
const stored = sessionStorage.getItem('obs_sample_decision');
if (stored !== null) {
this.sessionDecision = stored === 'true';
} else {
this.sessionDecision = Math.random() < this.config.baseRate;
sessionStorage.setItem('obs_sample_decision', String(this.sessionDecision));
}
}
}
private monitorPerformance() {
// Monitor LCP to detect slow sessions
new PerformanceObserver((entryList) => {
const entries = entryList.getEntries();
const lcp = entries[entries.length - 1];
if (lcp.startTime > this.config.slowThresholdMs) {
this.isSlowSession = true;
this.performanceScore = Math.max(0, 100 - (lcp.startTime / 100));
// Upgrade sampling for slow sessions
sessionStorage.setItem('obs_sample_decision', 'true');
this.sessionDecision = true;
}
}).observe({ type: 'largest-contentful-paint', buffered: true });
// Monitor long tasks
new PerformanceObserver((entryList) => {
const entries = entryList.getEntries();
const longTasks = entries.filter(e => e.duration > 50);
if (longTasks.length > 5) {
this.performanceScore = Math.max(0, this.performanceScore - 10);
if (this.performanceScore < 50) {
this.isSlowSession = true;
this.sessionDecision = true;
}
}
}).observe({ type: 'longtask', buffered: true });
}
shouldSample(type: 'metric' | 'log' | 'trace' | 'error'): boolean {
// Always sample errors
if (type === 'error') {
return Math.random() < this.config.errorRate;
}
// Slow sessions get full sampling
if (this.isSlowSession) {
return true;
}
// Head-based sampling decision
if (this.sessionDecision !== null) {
return this.sessionDecision;
}
// Per-event probabilistic fallback when no head-based decision exists
return Math.random() < this.config.baseRate;
}
getSamplingContext(): {
sampled: boolean;
rate: number;
reason: string;
performanceScore: number;
} {
const sampled = this.shouldSample('trace');
let rate = this.config.baseRate;
let reason = 'base_rate';
if (this.isSlowSession) {
rate = this.config.slowSessionRate;
reason = 'slow_session';
} else if (this.sessionDecision !== null) {
reason = 'head_sampled';
}
return {
sampled,
rate,
reason,
performanceScore: this.performanceScore,
};
}
}
export const sampler = new AdaptiveSampler();
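The sticky per-session decision is the part of the sampler most worth unit-testing. As a sketch — the `KVStore` abstraction and `decideSession` helper are illustrative, not part of `AdaptiveSampler` — the same logic with storage and randomness injected looks like this:

```typescript
// Storage-agnostic version of the head-sampling decision.
// A Map stands in for sessionStorage so the logic runs outside a browser.
type KVStore = { get(k: string): string | null; set(k: string, v: string): void };

function decideSession(store: KVStore, baseRate: number, rng: () => number): boolean {
  const stored = store.get('obs_sample_decision');
  if (stored !== null) return stored === 'true'; // sticky: reuse prior decision
  const decision = rng() < baseRate;             // one coin flip per session
  store.set('obs_sample_decision', String(decision));
  return decision;
}

function mapStore(): KVStore {
  const m = new Map<string, string>();
  return { get: (k) => m.get(k) ?? null, set: (k, v) => m.set(k, v) };
}
```

Because the decision is persisted, every event in a session is kept or dropped together, so sampled sessions arrive internally complete rather than as random fragments.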
React Error Boundary Integration
Observability-Aware Error Boundaries
// src/observability/error-boundary.tsx
import React, { Component, ErrorInfo, ReactNode } from 'react';
import { logger } from './logger';
import { metrics } from './metrics';
import { correlation } from './correlation';
import { classifyError, generateErrorFingerprint, ErrorCategory, ErrorSeverity } from './error-taxonomy';
interface ErrorBoundaryProps {
children: ReactNode;
name: string;
fallback?: ReactNode | ((error: Error, reset: () => void) => ReactNode);
onError?: (error: Error, errorInfo: ErrorInfo) => void;
level?: 'page' | 'section' | 'component';
}
interface ErrorBoundaryState {
hasError: boolean;
error: Error | null;
errorInfo: ErrorInfo | null;
}
export class ObservableErrorBoundary extends Component<
ErrorBoundaryProps,
ErrorBoundaryState
> {
private errorCount = 0;
private lastErrorTime = 0;
constructor(props: ErrorBoundaryProps) {
super(props);
this.state = {
hasError: false,
error: null,
errorInfo: null,
};
}
static getDerivedStateFromError(error: Error): Partial<ErrorBoundaryState> {
return { hasError: true, error };
}
componentDidCatch(error: Error, errorInfo: ErrorInfo) {
const now = Date.now();
// Reset the counter once errors stop arriving in quick succession
if (now - this.lastErrorTime > 1000) {
this.errorCount = 0;
}
this.errorCount++;
// Detect error loops: log once, then stay quiet until the loop clears
if (this.errorCount > 3) {
if (this.errorCount === 4) {
logger.fatal('error_boundary:loop_detected', {
boundaryName: this.props.name,
errorCount: this.errorCount,
error: error.message,
});
}
this.lastErrorTime = now;
return;
}
this.lastErrorTime = now;
const context = correlation.getContext();
const classification = classifyError(error, {
componentStack: errorInfo.componentStack || undefined,
});
const fingerprint = generateErrorFingerprint(error, errorInfo.componentStack || undefined);
// Log the error with full context
logger.error('error_boundary:caught', {
boundaryName: this.props.name,
boundaryLevel: this.props.level || 'component',
error: error.message,
errorName: error.name,
errorStack: error.stack,
componentStack: errorInfo.componentStack,
classification: classification.category,
severity: classification.severity,
isRecoverable: classification.isRecoverable,
fingerprint,
...context,
});
// Record metrics
metrics.increment('error_boundary.caught', {
boundary: this.props.name,
level: this.props.level || 'component',
category: classification.category,
severity: classification.severity,
});
// Track error rate by component
metrics.increment(`error_boundary.${this.props.name}.errors`);
// Call custom error handler
this.props.onError?.(error, errorInfo);
this.setState({ errorInfo });
}
private handleReset = () => {
logger.info('error_boundary:reset', {
boundaryName: this.props.name,
});
metrics.increment('error_boundary.reset', {
boundary: this.props.name,
});
this.errorCount = 0;
this.setState({
hasError: false,
error: null,
errorInfo: null,
});
};
render() {
if (this.state.hasError) {
const { fallback } = this.props;
const { error } = this.state;
if (typeof fallback === 'function') {
return fallback(error!, this.handleReset);
}
if (fallback) {
return fallback;
}
// Default fallback based on boundary level
const classification = error
? classifyError(error, { componentStack: this.state.errorInfo?.componentStack || undefined })
: null;
return (
<div className="error-boundary-fallback" data-boundary={this.props.name}>
<p>{classification?.userMessage || 'Something went wrong'}</p>
{classification?.isRecoverable && (
<button onClick={this.handleReset}>Try Again</button>
)}
</div>
);
}
return this.props.children;
}
}
// HOC for wrapping components with observability
export function withErrorBoundary<P extends object>(
Component: React.ComponentType<P>,
boundaryProps: Omit<ErrorBoundaryProps, 'children'>
) {
const WrappedComponent = (props: P) => (
<ObservableErrorBoundary {...boundaryProps}>
<Component {...props} />
</ObservableErrorBoundary>
);
WrappedComponent.displayName = `withErrorBoundary(${Component.displayName || Component.name})`;
return WrappedComponent;
}
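`generateErrorFingerprint` comes from the error-taxonomy module. As an illustration of the idea only — not that module's actual implementation — a fingerprint can hash the error name plus its top stack frames with line and column numbers stripped, so the same bug groups together across builds:

```typescript
// Illustrative stand-in for generateErrorFingerprint (the real one lives in
// error-taxonomy.ts). Normalizing away line:column keeps fingerprints stable
// even when minified bundles shift code positions between releases.
function sketchFingerprint(name: string, stack: string): string {
  const frames = stack
    .split('\n')
    .slice(0, 4)                             // top frames identify the failure site
    .map((f) => f.replace(/:\d+:\d+/g, ''))  // drop line:column, which shift per build
    .join('|');
  let hash = 0;
  for (const ch of `${name}|${frames}`) {
    hash = ((hash << 5) - hash + ch.charCodeAt(0)) | 0;
  }
  return Math.abs(hash).toString(16);
}
```

The error boundary above attaches this fingerprint to both logs and metrics, which is what lets the dashboard later group "Top Errors by Fingerprint" with user counts.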
Global Error Capture
Window-Level Error Instrumentation
// src/observability/global-handlers.ts
import { logger } from './logger';
import { metrics } from './metrics';
import { correlation } from './correlation';
import { classifyError, generateErrorFingerprint } from './error-taxonomy';
interface GlobalErrorConfig {
captureUnhandledRejections: boolean;
captureResourceErrors: boolean;
captureConsoleErrors: boolean;
maxErrorsPerMinute: number;
}
class GlobalErrorHandler {
private config: GlobalErrorConfig;
private errorWindow: number[] = [];
private readonly ERROR_WINDOW_MS = 60000;
constructor(config: Partial<GlobalErrorConfig> = {}) {
this.config = {
captureUnhandledRejections: config.captureUnhandledRejections ?? true,
captureResourceErrors: config.captureResourceErrors ?? true,
captureConsoleErrors: config.captureConsoleErrors ?? false,
maxErrorsPerMinute: config.maxErrorsPerMinute ?? 50,
};
}
install() {
this.installErrorHandler();
if (this.config.captureUnhandledRejections) {
this.installUnhandledRejectionHandler();
}
if (this.config.captureResourceErrors) {
this.installResourceErrorHandler();
}
if (this.config.captureConsoleErrors) {
this.installConsoleErrorHandler();
}
}
private rateLimitLogged = false;
private shouldCapture(): boolean {
const now = Date.now();
this.errorWindow = this.errorWindow.filter(t => now - t < this.ERROR_WINDOW_MS);
if (this.errorWindow.length >= this.config.maxErrorsPerMinute) {
// Warn once per rate-limited stretch, not on every dropped error
if (!this.rateLimitLogged) {
this.rateLimitLogged = true;
logger.warn('error_capture:rate_limited', {
maxPerMinute: this.config.maxErrorsPerMinute,
});
}
return false;
}
this.rateLimitLogged = false;
this.errorWindow.push(now);
return true;
}
private installErrorHandler() {
const originalHandler = window.onerror;
window.onerror = (
message: string | Event,
source?: string,
lineno?: number,
colno?: number,
error?: Error
) => {
if (!this.shouldCapture()) return;
const actualError = error || new Error(String(message));
const context = correlation.getContext();
const classification = classifyError(actualError);
const fingerprint = generateErrorFingerprint(actualError);
logger.error('global:uncaught_error', {
message: String(message),
source,
lineno,
colno,
error: actualError.message,
stack: actualError.stack,
classification: classification.category,
severity: classification.severity,
fingerprint,
...context,
});
metrics.increment('global.uncaught_error', {
category: classification.category,
severity: classification.severity,
});
// Call original handler
if (typeof originalHandler === 'function') {
return originalHandler.call(window, message, source, lineno, colno, error);
}
return false;
};
}
private installUnhandledRejectionHandler() {
window.addEventListener('unhandledrejection', (event: PromiseRejectionEvent) => {
if (!this.shouldCapture()) return;
const error = event.reason instanceof Error
? event.reason
: new Error(String(event.reason));
const context = correlation.getContext();
const classification = classifyError(error);
const fingerprint = generateErrorFingerprint(error);
logger.error('global:unhandled_rejection', {
reason: String(event.reason),
error: error.message,
stack: error.stack,
classification: classification.category,
severity: classification.severity,
fingerprint,
...context,
});
metrics.increment('global.unhandled_rejection', {
category: classification.category,
});
});
}
private installResourceErrorHandler() {
window.addEventListener('error', (event: ErrorEvent) => {
// Only capture resource loading errors
const target = event.target as HTMLElement;
if (!target || !('tagName' in target)) return;
const tagName = target.tagName.toLowerCase();
const resourceTypes = ['script', 'link', 'img', 'video', 'audio'];
if (!resourceTypes.includes(tagName)) return;
// Spend the rate budget only after confirming this is a resource error
if (!this.shouldCapture()) return;
const src = (target as HTMLScriptElement | HTMLImageElement).src ||
(target as HTMLLinkElement).href ||
'unknown';
logger.error('global:resource_load_error', {
resourceType: tagName,
src,
...correlation.getContext(),
});
metrics.increment('global.resource_error', {
type: tagName,
});
}, true); // Capture phase to catch resource errors
}
private installConsoleErrorHandler() {
const originalConsoleError = console.error;
console.error = (...args: unknown[]) => {
if (this.shouldCapture()) {
logger.warn('console:error', {
args: args.map(arg =>
arg instanceof Error
? { message: arg.message, stack: arg.stack }
: String(arg)
),
...correlation.getContext(),
});
}
return originalConsoleError.apply(console, args);
};
}
}
export const globalErrorHandler = new GlobalErrorHandler();
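`shouldCapture` is a sliding-window rate limiter. Extracted as a standalone function — the `allowEvent` name and signature are hypothetical, mirroring the class's `errorWindow` field — the policy is easy to test in isolation:

```typescript
// Sliding-window rate limiter: allow at most `limit` events per `windowMs`.
// `timestamps` is mutated in place, like the handler's errorWindow array.
function allowEvent(
  timestamps: number[],
  now: number,
  limit: number,
  windowMs: number
): boolean {
  // Evict timestamps that have aged out of the window
  while (timestamps.length > 0 && now - timestamps[0] >= windowMs) {
    timestamps.shift();
  }
  if (timestamps.length >= limit) return false; // over budget: drop the event
  timestamps.push(now);
  return true;
}
```

The cap exists because a single error loop can otherwise emit thousands of identical events per minute, turning an incident into a telemetry bill.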
Telemetry Backend Architecture
Collection and Storage Design
┌─────────────────────────────────────────────────────────────────────────────┐
│ Telemetry Backend Architecture │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Browsers │
│ ──────── │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ Client │ │ Client │ │ Client │ │
│ │ A │ │ B │ │ C │ │
│ └────┬────┘ └────┬────┘ └────┬────┘ │
│ │ │ │ │
│ └────────────┼────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ Edge Workers │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────────┐ │ │
│ │ │ Validation │─▶│ Sampling │─▶│ Batching + Buffering │ │ │
│ │ └──────────────┘ └──────────────┘ └──────────────────────────┘ │ │
│ └──────────────────────────────────────────────────────────┬──────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ Message Queue (Kafka) │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────────┐ │ │
│ │ │ logs-topic │ │metrics-topic │ │ traces-topic │ │ │
│ │ └───────┬──────┘ └───────┬──────┘ └───────────┬──────────────┘ │ │
│ └──────────┼─────────────────┼─────────────────────┼──────────────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌──────────────────┐ ┌────────────────┐ ┌─────────────────────────┐ │
│ │ ClickHouse │ │ Prometheus │ │ Jaeger │ │
│ │ (Logs + Errors)│ │ (Metrics) │ │ (Traces) │ │
│ └────────┬─────────┘ └───────┬────────┘ └───────────┬─────────────┘ │
│ │ │ │ │
│ └───────────────────┼───────────────────────┘ │
│ ▼ │
│ ┌────────────────────┐ │
│ │ Grafana │ │
│ │ (Visualization) │ │
│ └────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Edge Worker for Telemetry Ingestion
// workers/telemetry-ingest.ts (Cloudflare Worker)
interface TelemetryPayload {
type: 'logs' | 'metrics' | 'traces';
data: unknown;
metadata: {
sessionId: string;
timestamp: number;
userAgent: string;
};
}
interface Env {
KAFKA_BROKER: string;
KAFKA_USERNAME: string;
KAFKA_PASSWORD: string;
TELEMETRY_BUFFER: DurableObjectNamespace;
}
export default {
async fetch(request: Request, env: Env): Promise<Response> {
// Answer CORS preflight so cross-origin browser clients can POST JSON
if (request.method === 'OPTIONS') {
return new Response(null, {
status: 204,
headers: {
'Access-Control-Allow-Origin': '*',
'Access-Control-Allow-Methods': 'POST, OPTIONS',
'Access-Control-Allow-Headers': 'Content-Type',
},
});
}
if (request.method !== 'POST') {
return new Response('Method not allowed', { status: 405 });
}
const url = new URL(request.url);
const telemetryType = url.pathname.split('/').pop();
if (!['logs', 'metrics', 'traces'].includes(telemetryType!)) {
return new Response('Invalid telemetry type', { status: 400 });
}
try {
const payload: TelemetryPayload = await request.json();
// Validate payload
const validation = validatePayload(payload);
if (!validation.valid) {
return new Response(JSON.stringify({ error: validation.error }), {
status: 400,
headers: { 'Content-Type': 'application/json' },
});
}
// Apply sampling at edge
if (!shouldSample(payload)) {
return new Response(JSON.stringify({ sampled: false }), {
status: 200,
headers: { 'Content-Type': 'application/json' },
});
}
// Enrich with edge context
const enriched = enrichPayload(payload, request);
// Buffer in Durable Object for batching
const bufferId = env.TELEMETRY_BUFFER.idFromName(telemetryType!);
const buffer = env.TELEMETRY_BUFFER.get(bufferId);
await buffer.fetch(request.url, {
method: 'POST',
body: JSON.stringify(enriched),
});
return new Response(JSON.stringify({ success: true }), {
status: 200,
headers: {
'Content-Type': 'application/json',
'Access-Control-Allow-Origin': '*',
},
});
} catch (error) {
return new Response(
JSON.stringify({ error: 'Failed to process telemetry' }),
{ status: 500, headers: { 'Content-Type': 'application/json' } }
);
}
},
};
function validatePayload(payload: TelemetryPayload): { valid: boolean; error?: string } {
if (!payload.type || !payload.data || !payload.metadata) {
return { valid: false, error: 'Missing required fields' };
}
if (!payload.metadata.sessionId) {
return { valid: false, error: 'Missing sessionId' };
}
// Validate payload size
const size = JSON.stringify(payload).length;
if (size > 1024 * 100) { // 100KB limit
return { valid: false, error: 'Payload too large' };
}
return { valid: true };
}
function shouldSample(payload: TelemetryPayload): boolean {
// Always accept errors
if (payload.type === 'logs') {
const logs = payload.data as Array<{ level: string }>;
if (logs.some(l => l.level === 'error' || l.level === 'fatal')) {
return true;
}
}
// Sample other telemetry at 10%
const hash = hashString(payload.metadata.sessionId);
return (hash % 100) < 10;
}
function hashString(str: string): number {
let hash = 0;
for (let i = 0; i < str.length; i++) {
const char = str.charCodeAt(i);
hash = ((hash << 5) - hash) + char;
hash = hash & hash;
}
return Math.abs(hash);
}
function enrichPayload(payload: TelemetryPayload, request: Request): TelemetryPayload {
return {
...payload,
metadata: {
...payload.metadata,
edgeRegion: request.cf?.colo as string,
edgeCountry: request.cf?.country as string,
clientIP: request.headers.get('CF-Connecting-IP') || 'unknown',
receivedAt: Date.now(),
},
};
}
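One property of this hash-based sampler is worth calling out: because the verdict is a pure function of `sessionId`, every stateless worker instance reaches the same keep/drop decision for a given session, so sampled sessions arrive complete. A small sketch restating the worker's logic (the `sampledAtPercent` wrapper is illustrative):

```typescript
// Deterministic sampling: the same sessionId always hashes to the same bucket,
// so any edge worker instance makes the same keep/drop decision for a session.
function hashString(str: string): number {
  let hash = 0;
  for (let i = 0; i < str.length; i++) {
    hash = ((hash << 5) - hash + str.charCodeAt(i)) | 0; // 32-bit djb2-style mix
  }
  return Math.abs(hash);
}

function sampledAtPercent(sessionId: string, percent: number): boolean {
  return hashString(sessionId) % 100 < percent;
}
```

Contrast this with `Math.random()` per request, which would keep some events from a session and drop others, producing traces with missing spans.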
Dashboard and Alerting
Key Observability Dashboards
┌─────────────────────────────────────────────────────────────────────────────┐
│ Frontend Observability Dashboard │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ Key Metrics (Last Hour) │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────┐ │ │
│ │ │ Error Rate │ │ P95 LCP │ │ JS Error % │ │ Sessions │ │ │
│ │ │ 0.12% │ │ 2.1s │ │ 0.05% │ │ 12.5k │ │ │
│ │ │ ↓ 0.02% │ │ ↓ 150ms │ │ ↑ 0.01% │ │ ↑ 5% │ │ │
│ │ └──────────────┘ └──────────────┘ └──────────────┘ └──────────┘ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌────────────────────────────────┐ ┌──────────────────────────────────┐ │
│ │ Error Distribution │ │ Performance Trends │ │
│ │ │ │ │ │
│ │ HTTP 5xx ████████ 45% │ │ LCP ──────────────────────── │ │
│ │ JS TypeError ████ 22% │ │ ╲ │ │
│ │ Network ███ 15% │ │ ╲ ╱───── │ │
│ │ Chunk Load ██ 10% │ │ ╲──╱ │ │
│ │ Other █ 8% │ │ │ │
│ │ │ │ FID ───────────────────────── │ │
│ └────────────────────────────────┘ └──────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ Top Errors by Fingerprint │ │
│ │ ┌───────────────────────────────────────────────────────────────┐ │ │
│ │ │ Fingerprint │ Message │ Count │ Users │ Last Seen │ │ │
│ │ ├──────────────┼──────────────────┼───────┼───────┼─────────────│ │ │
│ │ │ ab3f9c2 │ Cannot read... │ 234 │ 156 │ 2 min ago │ │ │
│ │ │ 7d2e1b4 │ Network timeout │ 189 │ 89 │ 5 min ago │ │ │
│ │ │ c9a8d3f │ Chunk load fail │ 145 │ 67 │ 1 min ago │ │ │
│ │ └──────────────┴──────────────────┴───────┴───────┴─────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ Request Trace Waterfall │ │
│ │ │ │
│ │ User Click ├──────┤ 25ms │ │
│ │ State Update ├───┤ 15ms │ │
│ │ Fetch Start ├───────────────────────────────────┤ 350ms │ │
│ │ └─ DNS ├─┤ 12ms │ │
│ │ └─ TCP ├──┤ 28ms │ │
│ │ └─ TLS ├───┤ 35ms │ │
│ │ └─ Request ├─┤ 18ms │ │
│ │ └─ Server ├─────────────────────────┤ 245ms │ │
│ │ └─ API Gateway ├─┤ 15ms │ │
│ │ └─ Auth Service ├──┤ 22ms │ │
│ │ └─ Database ├───────────────────┤ 198ms │ │
│ │ └─ Response ├─┤ 12ms │ │
│ │ Render ├─────────────────────────────────────────────┤ 380ms│ │
│ │ │ │
│ │ Total: 405ms │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Alert Configuration
# observability/alerts.yaml
groups:
- name: frontend-errors
rules:
- alert: HighErrorRate
expr: |
sum(rate(global_uncaught_error_total[5m]))
/ sum(rate(page_view_total[5m])) > 0.01
for: 5m
labels:
severity: critical
annotations:
summary: "Frontend error rate above 1%"
description: "Error rate is {{ $value | humanizePercentage }}"
- alert: NewErrorSpike
expr: |
sum(increase(error_boundary_caught_total[5m])) > 100
and sum(increase(error_boundary_caught_total[5m] offset 1h)) < 20
for: 2m
labels:
severity: warning
annotations:
summary: "New error pattern detected"
- name: frontend-performance
rules:
- alert: LCPRegression
expr: |
histogram_quantile(0.95, sum(rate(web_vitals_lcp_bucket[15m])) by (le))
> 3000
for: 10m
labels:
severity: warning
annotations:
summary: "P95 LCP above 3 seconds"
description: "Current P95 LCP: {{ $value }}ms"
- alert: HighINP
expr: |
histogram_quantile(0.95, sum(rate(web_vitals_inp_bucket[15m])) by (le))
> 500
for: 10m
labels:
severity: warning
annotations:
summary: "P95 INP above 500ms (poor responsiveness)"
- alert: ChunkLoadFailures
expr: |
sum(rate(resource_error_total{type="script"}[5m])) > 10
for: 5m
labels:
severity: critical
annotations:
summary: "High rate of JavaScript chunk load failures"
- name: frontend-availability
rules:
- alert: CDNLatencySpike
expr: |
histogram_quantile(0.95,
sum(rate(http_client_duration_bucket{host=~"cdn.*"}[5m])) by (le)
) > 500
for: 5m
labels:
severity: warning
annotations:
summary: "CDN P95 latency above 500ms"
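The `HighErrorRate` rule divides two counter rates against a 1% threshold. Written out directly — `errorRate` and `shouldAlert` are hypothetical helpers, useful for validating thresholds in tests — the same check needs an explicit guard against an empty denominator, which PromQL handles implicitly by producing no result when the page-view series is absent:

```typescript
// Mirror of the HighErrorRate rule: errors per page view over a window.
// Returns null when there is no traffic, so callers never alert on 0/0.
function errorRate(errorCount: number, pageViews: number): number | null {
  if (pageViews === 0) return null;
  return errorCount / pageViews;
}

function shouldAlert(errorCount: number, pageViews: number, threshold = 0.01): boolean {
  const rate = errorRate(errorCount, pageViews);
  return rate !== null && rate > threshold;
}
```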
Initialization and Configuration
Complete Observability Setup
// src/observability/index.ts
import { correlation } from './correlation';
import { logger } from './logger';
import { metrics, collectWebVitals } from './metrics';
import { globalErrorHandler } from './global-handlers';
import { sampler } from './sampling';
interface ObservabilityConfig {
serviceName: string;
serviceVersion: string;
environment: 'development' | 'staging' | 'production';
telemetryEndpoint: string;
sampleRate: number;
debug: boolean;
}
let initialized = false;
export function initializeObservability(config: ObservabilityConfig) {
if (initialized) {
console.warn('Observability already initialized');
return;
}
// Set global context
(window as any).__OBSERVABILITY__ = {
config,
correlation,
logger,
metrics,
};
// Install global error handlers
globalErrorHandler.install();
// Start collecting Web Vitals
collectWebVitals();
// Log page view
logger.info('page:view', {
url: location.href,
referrer: document.referrer,
serviceName: config.serviceName,
serviceVersion: config.serviceVersion,
environment: config.environment,
});
// Track page visibility
document.addEventListener('visibilitychange', () => {
if (document.visibilityState === 'hidden') {
const context = correlation.getContext();
logger.info('page:hidden', {
timeOnPage: performance.now(),
...context,
});
}
});
// Track navigation timing
if ('PerformanceNavigationTiming' in window) {
const navEntry = performance.getEntriesByType('navigation')[0] as PerformanceNavigationTiming;
if (navEntry) {
metrics.histogram('navigation.dns', navEntry.domainLookupEnd - navEntry.domainLookupStart);
metrics.histogram('navigation.tcp', navEntry.connectEnd - navEntry.connectStart);
metrics.histogram('navigation.ttfb', navEntry.responseStart - navEntry.requestStart);
metrics.histogram('navigation.download', navEntry.responseEnd - navEntry.responseStart);
metrics.histogram('navigation.dom_interactive', navEntry.domInteractive - navEntry.startTime);
metrics.histogram('navigation.dom_complete', navEntry.domComplete - navEntry.startTime);
metrics.histogram('navigation.load', navEntry.loadEventEnd - navEntry.startTime);
}
}
initialized = true;
if (config.debug) {
console.log('[Observability] Initialized', config);
}
}
// Export all modules
export { correlation } from './correlation';
export { logger } from './logger';
export { metrics } from './metrics';
export { instrumentedFetch } from './instrumented-fetch';
export { ObservableErrorBoundary, withErrorBoundary } from './error-boundary';
export { classifyError, ErrorCategory, ErrorSeverity } from './error-taxonomy';
// Usage in app entry point:
// initializeObservability({
// serviceName: 'my-app',
// serviceVersion: '1.2.3',
// environment: 'production',
// telemetryEndpoint: 'https://telemetry.example.com',
// sampleRate: 0.1,
// debug: false,
// });
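One gap in the setup above: telemetry queued near page unload is lost if sent with `fetch`, because the browser may cancel in-flight requests during teardown. A minimal sketch of a `sendBeacon` flush — the `queue` parameter and endpoint are illustrative, and in practice the transport would live inside the logger and metrics modules:

```typescript
// Serialize the pending queue into a beacon body
function buildBeaconBody(events: object[]): string {
  return JSON.stringify({ events, flushedAt: Date.now() });
}

// Flush queued telemetry when the page becomes hidden.
// sendBeacon enqueues the POST even as the page is being torn down.
function flushOnHide(endpoint: string, queue: object[]): void {
  document.addEventListener('visibilitychange', () => {
    if (document.visibilityState !== 'hidden' || queue.length === 0) return;
    navigator.sendBeacon(endpoint, buildBeaconBody(queue.splice(0)));
  });
}
```

`visibilitychange` to `hidden` is the reliable trigger here; `unload` handlers are frequently skipped on mobile and disable back/forward caching.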
Key Takeaways
- Correlation is foundational: Every telemetry event must carry session, interaction, and trace IDs to enable end-to-end debugging
- Error taxonomy enables automation: Classifying errors by category and severity allows automated routing, alerting, and user-facing messaging
- Adaptive sampling balances cost and visibility: Sample more from slow or problematic sessions where debugging data is most valuable
- Edge processing reduces latency and cost: Validate, sample, and batch telemetry at the edge before sending to storage
- Fingerprinting enables deduplication: Group similar errors to understand impact and prioritize fixes
- Web Vitals provide user-centric metrics: LCP, FID, CLS, and INP correlate with actual user experience
- Structured logging beats unstructured: Consistent fields enable querying, alerting, and automated analysis
- Error boundaries provide isolation: Contain failures and capture context at the component level
- Beacon API ensures delivery: Use sendBeacon for reliable telemetry on page unload
- Dashboards tell the story: Combine metrics, errors, and traces to understand system behavior
Enterprise observability isn't about collecting more data—it's about collecting the right data with the context needed to act on it quickly.