Frontend Logging Pipelines at Scale
Introduction
Backend logging is a solved problem. Structured logs, centralized aggregation, retention policies—every ops team knows the playbook. Frontend logging at scale is a different beast entirely.
You're not logging from a dozen servers in a controlled datacenter. You're logging from millions of browsers across every network condition, device type, and geographic location imaginable. The volume is staggering: a single page view might generate 50 log entries across performance events, errors, network requests, and user interactions. Multiply by 200 million page views per day, and you're looking at 10 billion log entries daily.
This deep dive examines how to build frontend logging pipelines that can handle this scale: from the browser-side SDK that must minimize performance impact while maximizing signal, to the ingestion layer that must handle burst traffic without dropping data, to the storage and query systems that make petabytes of logs actually useful for debugging production issues.
Scale Context
The production frontend logging workload we're architecting for:
| Metric | Value |
|---|---|
| Daily Page Views | 200M |
| Logs per Page View | 30-100 |
| Raw Log Events per Day | 10B+ |
| Compressed Log Volume | 5-20TB/day |
| Peak Events per Second | 500K |
| Log Entry P50 Size | 500 bytes |
| Log Entry P99 Size | 5KB |
| Source Map Lookups per Day | 50M |
| Active Sessions per Hour | 2M |
| Log Retention (hot) | 7 days |
| Log Retention (warm) | 30 days |
| Log Retention (cold) | 1 year |
| Query P95 Latency (hot data) | <5 seconds |
At this scale, every byte matters. An unnecessary log type or field is paid for 10 billion times a day in bandwidth, ingestion, and storage.
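The headline numbers in the table follow from simple arithmetic. The sketch below redoes that calculation with the figures quoted above (200M page views, 50 logs per view, 500-byte median entry); the helper name `estimateLoad` is illustrative only.

```typescript
// Back-of-envelope check of the scale table, using figures from the text.
interface LoadEstimate {
  eventsPerDay: number;
  avgEventsPerSec: number;
  rawBytesPerDay: number;
}

function estimateLoad(
  pageViewsPerDay: number,
  logsPerPageView: number,
  medianEntryBytes: number
): LoadEstimate {
  const eventsPerDay = pageViewsPerDay * logsPerPageView;
  return {
    eventsPerDay,
    avgEventsPerSec: eventsPerDay / 86400, // seconds per day
    rawBytesPerDay: eventsPerDay * medianEntryBytes,
  };
}

const estimate = estimateLoad(200e6, 50, 500);
console.log(estimate.eventsPerDay);                // 10000000000 (10B events/day)
console.log(Math.round(estimate.avgEventsPerSec)); // 115741 events/sec on average
console.log(estimate.rawBytesPerDay / 1e12);       // 5 (TB/day at the median entry size)
```

The 500K events/sec peak in the table is therefore a little over four times the daily average, which is the headroom the ingestion layer has to absorb.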
Log Classification
Frontend Log Types
┌─────────────────────────────────────────────────────────────────────────────┐
│ FRONTEND LOG TAXONOMY │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ ERROR LOGS (High Priority, Always Capture) │ │
│ │ │ │
│ │ • JavaScript Exceptions │ │
│ │ - Uncaught errors (window.onerror) │ │
│ │ - Unhandled promise rejections │ │
│ │ - React error boundaries │ │
│ │ │ │
│ │ • Network Failures │ │
│ │ - API 4xx/5xx responses │ │
│ │ - Network timeouts │ │
│ │ - CORS failures │ │
│ │ │ │
│ │ • Application Errors │ │
│ │ - Business logic errors │ │
│ │ - Validation failures │ │
│ │ - State inconsistencies │ │
│ │ │ │
│ │ Volume: ~1% of total logs │ │
│ │ Sampling: 100% (never sample errors) │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ PERFORMANCE LOGS (Medium Priority, Sample Allowed) │ │
│ │ │ │
│ │ • Core Web Vitals │ │
│ │ - LCP, FID/INP, CLS events │ │
│ │ - Navigation timing │ │
│ │ │ │
│ │ • Resource Timing │ │
│ │ - Individual resource loads │ │
│ │ - Critical path analysis │ │
│ │ │ │
│ │ • Long Tasks │ │
│ │ - Main thread blocking │ │
│ │ - JS execution times │ │
│ │ │ │
│ │ Volume: ~20% of total logs │ │
│ │ Sampling: 10-50% depending on page type │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ BEHAVIORAL LOGS (Lower Priority, Aggressive Sampling) │ │
│ │ │ │
│ │ • User Interactions │ │
│ │ - Clicks, scrolls, form interactions │ │
│ │ - Navigation events │ │
│ │ - Feature usage │ │
│ │ │ │
│ │ • Session Events │ │
│ │ - Page views │ │
│ │ - Session start/end │ │
│ │ - Tab visibility changes │ │
│ │ │ │
│ │ Volume: ~60% of total logs │ │
│ │ Sampling: 1-10% (session-level sampling) │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ DEBUG LOGS (Conditional, Feature-Flag Controlled) │ │
│ │ │ │
│ │ • Verbose Logging │ │
│ │ - State changes │ │
│ │ - API request/response bodies │ │
│ │ - Internal function traces │ │
│ │ │ │
│ │ Volume: Potentially huge │ │
│ │ Sampling: 0% normally, 100% for flagged sessions │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
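The taxonomy above amounts to a small sampling policy. A minimal sketch of how it might be encoded follows; the rates are the low end of the ranges in the diagram, and `shouldCapture` and `sessionRoll` are hypothetical names rather than part of any SDK.

```typescript
// Sampling policy per log class (illustrative encoding of the taxonomy).
type LogClass = 'error' | 'performance' | 'behavioral' | 'debug';

const BASE_RATES: Record<LogClass, number> = {
  error: 1.0,       // never sample errors
  performance: 0.1, // 10-50% depending on page type; low end assumed here
  behavioral: 0.01, // 1-10%, decided per session; low end assumed here
  debug: 0,         // off unless the session is flagged
};

// sessionRoll is a stable number in [0, 1) derived once per session
// (for example from a hash of the session ID), so a session is either
// fully in or fully out for a given class.
function shouldCapture(
  logClass: LogClass,
  sessionRoll: number,
  debugFlagged: boolean
): boolean {
  if (logClass === 'debug') return debugFlagged;
  return sessionRoll < BASE_RATES[logClass];
}

console.log(shouldCapture('error', 0.999, false));     // true
console.log(shouldCapture('behavioral', 0.5, false));  // false
console.log(shouldCapture('debug', 0.5, true));        // true
```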
Log Schema Design
// Production frontend log schema
interface FrontendLogEntry {
// Identity
id: string; // Unique log ID
timestamp: number; // Unix ms
type: LogType; // error | performance | interaction | debug
// Context (added to every log)
session: SessionContext;
page: PageContext;
device: DeviceContext;
// Payload (varies by type)
payload: ErrorPayload | PerformancePayload | InteractionPayload | DebugPayload;
// Metadata
version: string; // SDK version
appVersion: string; // App version/commit
sampling: SamplingInfo;
}
interface SessionContext {
id: string; // Session ID
userId?: string; // If authenticated
startTime: number;
pageViews: number;
isNewUser: boolean;
}
interface PageContext {
url: string; // Current URL (sanitized)
path: string; // URL path only
referrer?: string;
title: string;
loadTime: number; // Time since navigation
}
interface DeviceContext {
userAgent: string;
browser: string; // Parsed browser name
browserVersion: string;
os: string;
osVersion: string;
deviceType: 'mobile' | 'tablet' | 'desktop';
viewport: { width: number; height: number };
connection?: {
type: string; // 4g, 3g, wifi, etc.
effectiveType: string;
downlink?: number;
rtt?: number;
};
memory?: number; // Device memory GB
cores?: number; // Hardware concurrency
}
interface ErrorPayload {
type: 'uncaught' | 'promise' | 'network' | 'custom';
message: string;
stack?: string;
filename?: string;
lineno?: number;
colno?: number;
componentStack?: string; // React component stack
breadcrumbs: Breadcrumb[];
tags: Record<string, string>;
extra: Record<string, unknown>;
}
interface PerformancePayload {
metric: string; // LCP, FID, CLS, etc.
value: number;
rating: 'good' | 'needs-improvement' | 'poor';
attribution?: Record<string, unknown>;
resources?: ResourceEntry[]; // Relevant resources
}
interface Breadcrumb {
timestamp: number;
category: string; // ui.click, fetch, console, navigation
message: string;
level: 'info' | 'warning' | 'error';
data?: Record<string, unknown>;
}
// Compact wire format (minimized for transmission)
interface CompactLogEntry {
i: string; // id
t: number; // timestamp
y: number; // type (enum)
s: string; // session id
u?: string; // user id
p: string; // page path
d: number; // device type (enum)
v: unknown; // payload (varies)
}
// Compression: Full schema ~2KB → Compact ~400 bytes → Gzip ~150 bytes
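To make the mapping between the two shapes concrete, here is a minimal sketch of a converter pair. It uses a flattened stand-in (`VerboseEntry`) instead of the full `FrontendLogEntry`, and it assumes the enum order matches the ClickHouse enums used later in this article (error = 0, mobile = 0); both are assumptions for illustration.

```typescript
// Enum order is assumed; the wire format only says these fields are enums.
const LOG_TYPES = ['error', 'performance', 'interaction', 'debug'] as const;
const DEVICE_TYPES = ['mobile', 'tablet', 'desktop'] as const;
type LogTypeName = typeof LOG_TYPES[number];
type DeviceTypeName = typeof DEVICE_TYPES[number];

// Flattened stand-in for the verbose schema.
interface VerboseEntry {
  id: string;
  timestamp: number;
  type: LogTypeName;
  sessionId: string;
  userId?: string;
  path: string;
  deviceType: DeviceTypeName;
  payload: unknown;
}

interface CompactEntry {
  i: string; t: number; y: number; s: string;
  u?: string; p: string; d: number; v: unknown;
}

function toCompact(e: VerboseEntry): CompactEntry {
  const compact: CompactEntry = {
    i: e.id,
    t: e.timestamp,
    y: LOG_TYPES.indexOf(e.type),
    s: e.sessionId,
    p: e.path,
    d: DEVICE_TYPES.indexOf(e.deviceType),
    v: e.payload,
  };
  // Omit the key entirely for anonymous users to save bytes on the wire.
  if (e.userId !== undefined) compact.u = e.userId;
  return compact;
}

function fromCompact(c: CompactEntry): VerboseEntry {
  const type = LOG_TYPES[c.y];
  const deviceType = DEVICE_TYPES[c.d];
  if (type === undefined || deviceType === undefined) {
    throw new Error('Unknown enum value in compact entry');
  }
  const verbose: VerboseEntry = {
    id: c.i, timestamp: c.t, type, sessionId: c.s,
    path: c.p, deviceType, payload: c.v,
  };
  if (c.u !== undefined) verbose.userId = c.u;
  return verbose;
}
```

Because both sides must agree on the enum order, the order effectively becomes part of the wire protocol: new values can be appended, but existing ones should not be reordered.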
Client-Side Collection
SDK Architecture
┌─────────────────────────────────────────────────────────────────────────────┐
│ FRONTEND LOGGING SDK ARCHITECTURE │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ COLLECTORS │ │
│ │ │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │
│ │ │ Error │ │ Performance │ │ Interaction │ │ │
│ │ │ Collector │ │ Collector │ │ Collector │ │ │
│ │ │ │ │ │ │ │ │ │
│ │ │ • onerror │ │ • Perf Obs │ │ • Click │ │ │
│ │ │ • rejection │ │ • Nav Timing │ │ • Scroll │ │ │
│ │ │ • network │ │ • Resource │ │ • Input │ │ │
│ │ │ • console │ │ • Long Task │ │ • Navigation │ │ │
│ │ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │ │
│ │ │ │ │ │ │
│ │ └─────────────────┼─────────────────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ ┌─────────────────────────────────────────────────────────────┐ │ │
│ │ │ PROCESSOR │ │ │
│ │ │ │ │ │
│ │ │ • Sampling decision │ │ │
│ │ │ • Context enrichment │ │ │
│ │ │ • PII scrubbing │ │ │
│ │ │ • Deduplication │ │ │
│ │ │ • Rate limiting │ │ │
│ │ │ │ │ │
│ │ └─────────────────────────────────────────────────────────────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ ┌─────────────────────────────────────────────────────────────┐ │ │
│ │ │ BUFFER │ │ │
│ │ │ │ │ │
│ │ │ ┌─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┐ │ │ │
│ │ │ │ Log │ Log │ Log │ Log │ Log │ Log │ Log │ ... │ │ │ │
│ │ │ └─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┘ │ │ │
│ │ │ │ │ │
│ │ │ Max size: 100 entries or 50KB │ │ │
│ │ │ Max age: 10 seconds │ │ │
│ │ │ Overflow: Drop oldest (FIFO) │ │ │
│ │ │ │ │ │
│ │ └─────────────────────────────────────────────────────────────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ ┌─────────────────────────────────────────────────────────────┐ │ │
│ │ │ TRANSPORT │ │ │
│ │ │ │ │ │
│ │ │ Priority order: │ │ │
│ │ │ 1. sendBeacon (page unload) │ │ │
│ │ │ 2. fetch + keepalive (normal) │ │ │
│ │ │ 3. XHR (fallback) │ │ │
│ │ │ │ │ │
│ │ │ Features: │ │ │
│ │ │ • Compression (gzip/brotli) │ │ │
│ │ │ • Retry with backoff │ │ │
│ │ │ • Offline queue (IndexedDB) │ │ │
│ │ │ │ │ │
│ │ └─────────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
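The buffer stage in the diagram can be sketched as a small class. This is one possible reading, not the SDK's actual implementation: the entry, byte, and age limits are treated as flush triggers, and a separate hard cap (500, borrowed from the `maxBufferSize` default shown later) triggers the drop-oldest behaviour. Byte size is approximated by the JSON string length, and the clock is passed in so the logic is easy to test.

```typescript
interface BufferLimits {
  maxEntries: number; // flush trigger
  maxBytes: number;   // flush trigger (approximate)
  maxAgeMs: number;   // flush trigger
  hardCap: number;    // beyond this, the oldest entries are dropped
}

const DEFAULT_LIMITS: BufferLimits = {
  maxEntries: 100,
  maxBytes: 50 * 1024,
  maxAgeMs: 10000,
  hardCap: 500,
};

class BoundedLogBuffer<T> {
  private items: { entry: T; bytes: number; addedAt: number }[] = [];
  private totalBytes = 0;
  private readonly limits: BufferLimits;

  constructor(limits: Partial<BufferLimits> = {}) {
    this.limits = { ...DEFAULT_LIMITS, ...limits };
  }

  push(entry: T, now: number): void {
    const bytes = JSON.stringify(entry).length; // rough size estimate
    this.items.push({ entry, bytes, addedAt: now });
    this.totalBytes += bytes;
    // Overflow: drop oldest entries first (FIFO).
    while (this.items.length > this.limits.hardCap) {
      const dropped = this.items.shift();
      if (dropped) this.totalBytes -= dropped.bytes;
    }
  }

  shouldFlush(now: number): boolean {
    const oldest = this.items[0];
    if (oldest === undefined) return false;
    return (
      this.items.length >= this.limits.maxEntries ||
      this.totalBytes >= this.limits.maxBytes ||
      now - oldest.addedAt >= this.limits.maxAgeMs
    );
  }

  // Hand the buffered entries to the transport and reset.
  drain(): T[] {
    const drained = this.items.map(item => item.entry);
    this.items = [];
    this.totalBytes = 0;
    return drained;
  }

  get size(): number {
    return this.items.length;
  }
}
```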
SDK Implementation
// Production logging SDK
interface LoggerConfig {
endpoint: string;
apiKey: string;
appVersion: string;
// Sampling
errorSampleRate: number; // Default 1.0
perfSampleRate: number; // Default 0.1
interactionSampleRate: number; // Default 0.01
// Batching
batchSize: number; // Default 100
flushInterval: number; // Default 10000ms
maxBufferSize: number; // Default 500
// Privacy
scrubFields: string[];
blockSelectors: string[];
// Feature flags
enableConsoleCapture: boolean;
enableNetworkCapture: boolean;
enableOfflineQueue: boolean;
}
class FrontendLogger {
private config: LoggerConfig;
private buffer: CompactLogEntry[] = [];
private sessionContext: SessionContext;
private deviceContext: DeviceContext;
private flushTimer: number | null = null;
private offlineDb: IDBDatabase | null = null;
constructor(config: LoggerConfig) {
this.config = config;
this.initialize();
}
private async initialize(): Promise<void> {
this.sessionContext = this.initSession();
this.deviceContext = this.getDeviceContext();
this.setupCollectors();
this.startFlushTimer();
if (this.config.enableOfflineQueue) {
await this.initOfflineDb();
await this.flushOfflineQueue();
}
this.setupLifecycleHandlers();
}
// Public API
error(message: string, extra?: Record<string, unknown>): void {
this.log('error', {
type: 'custom',
message,
stack: new Error().stack,
extra
});
}
info(message: string, data?: Record<string, unknown>): void {
if (!this.shouldSample('interaction')) return;
this.log('debug', {
level: 'info',
message,
data
});
}
private log(type: LogType, payload: unknown): void {
const entry = this.createLogEntry(type, payload);
// Apply processing
const processed = this.process(entry);
if (!processed) return;
// Add to buffer
this.addToBuffer(processed);
// Immediate flush for errors
if (type === 'error') {
this.flush();
}
}
private createLogEntry(type: LogType, payload: unknown): CompactLogEntry {
return {
i: this.generateId(),
t: Date.now(),
y: this.typeToEnum(type),
s: this.sessionContext.id,
u: this.sessionContext.userId,
p: window.location.pathname,
d: this.deviceTypeToEnum(this.deviceContext.deviceType),
v: payload
};
}
private process(entry: CompactLogEntry): CompactLogEntry | null {
// Rate limiting
if (!this.checkRateLimit(entry)) return null;
// Deduplication
if (this.isDuplicate(entry)) return null;
// PII scrubbing
entry.v = this.scrubPII(entry.v);
return entry;
}
  private addToBuffer(entry: CompactLogEntry): void {
    this.buffer.push(entry);
    // Overflow protection
    if (this.buffer.length > this.config.maxBufferSize) {
      // Drop oldest entries, keeping the newest maxBufferSize
      this.buffer = this.buffer.slice(-this.config.maxBufferSize);
    }
    // Flush if batch size reached
    if (this.buffer.length >= this.config.batchSize) {
      this.flush();
    }
  }
  private async flush(): Promise<void> {
    if (this.buffer.length === 0) return;
    const batch = this.buffer.splice(0, this.config.batchSize);
    try {
      // serializeBatch compresses asynchronously, so it must be awaited
      const payload = await this.serializeBatch(batch);
      const success = await this.send(payload);
      if (!success && this.config.enableOfflineQueue) {
        // Store for retry
        await this.storeOffline(batch);
      }
    } catch (error) {
      // Store for retry
      if (this.config.enableOfflineQueue) {
        await this.storeOffline(batch);
      }
    }
  }
  private async serializeBatch(batch: CompactLogEntry[]): Promise<ArrayBuffer> {
    const json = JSON.stringify({
      logs: batch,
      meta: {
        sdk: '1.0.0',
        app: this.config.appVersion,
        device: this.deviceContext
      }
    });
    return this.compress(json);
  }
private async compress(data: string): Promise<ArrayBuffer> {
if ('CompressionStream' in window) {
const encoder = new TextEncoder();
const stream = new CompressionStream('gzip');
const writer = stream.writable.getWriter();
writer.write(encoder.encode(data));
writer.close();
const reader = stream.readable.getReader();
const chunks: Uint8Array[] = [];
while (true) {
const { done, value } = await reader.read();
if (done) break;
chunks.push(value);
}
const totalLength = chunks.reduce((acc, chunk) => acc + chunk.length, 0);
const result = new Uint8Array(totalLength);
let offset = 0;
for (const chunk of chunks) {
result.set(chunk, offset);
offset += chunk.length;
}
return result.buffer;
}
// Fallback: no compression
return new TextEncoder().encode(data).buffer;
}
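  private async send(payload: ArrayBuffer): Promise<boolean> {
    // Page is hiding or unloading: sendBeacon is the most reliable option.
    // It cannot set custom headers, so the endpoint must accept the API key
    // and encoding some other way on this path, and `true` only means "queued".
    const canBeacon = typeof navigator.sendBeacon === 'function';
    if (document.visibilityState === 'hidden' && canBeacon && payload.byteLength < 65536) {
      const blob = new Blob([payload], { type: 'application/octet-stream' });
      return navigator.sendBeacon(this.config.endpoint, blob);
    }
    // Normal path: fetch with keepalive, which can carry headers
    const headers: Record<string, string> = {
      'Content-Type': 'application/octet-stream',
      'X-API-Key': this.config.apiKey
    };
    // Advertise gzip only if compress() actually produced it (magic bytes 0x1f 0x8b)
    const head = new Uint8Array(payload.slice(0, 2));
    if (head[0] === 0x1f && head[1] === 0x8b) {
      headers['Content-Encoding'] = 'gzip';
    }
    try {
      const response = await fetch(this.config.endpoint, {
        method: 'POST',
        headers,
        body: payload,
        keepalive: true
      });
      return response.ok;
    } catch {
      return false;
    }
  }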
private setupLifecycleHandlers(): void {
// Flush on page hide (more reliable than unload)
document.addEventListener('visibilitychange', () => {
if (document.visibilityState === 'hidden') {
this.flush();
}
});
// Backup: pagehide
window.addEventListener('pagehide', () => {
this.flush();
});
}
// Sampling
private shouldSample(type: string): boolean {
const rate = this.getSampleRate(type);
// Session-level sampling for consistency
const sessionHash = this.hashString(this.sessionContext.id + type);
return sessionHash < rate;
}
private getSampleRate(type: string): number {
switch (type) {
case 'error': return this.config.errorSampleRate;
case 'performance': return this.config.perfSampleRate;
case 'interaction': return this.config.interactionSampleRate;
default: return 0.1;
}
}
  private hashString(str: string): number {
    let hash = 0;
    for (let i = 0; i < str.length; i++) {
      hash = ((hash << 5) - hash) + str.charCodeAt(i);
      hash |= 0; // Truncate to a 32-bit integer
    }
    // Reinterpret as unsigned and normalize to [0, 1)
    return (hash >>> 0) / 4294967296;
  }
// PII Scrubbing
  private scrubPII(value: unknown): unknown {
    if (typeof value === 'string') {
      return this.scrubString(value);
    }
    // Scrub arrays element by element so they stay arrays
    if (Array.isArray(value)) {
      return value.map(item => this.scrubPII(item));
    }
    if (typeof value === 'object' && value !== null) {
      const scrubbed: Record<string, unknown> = {};
      for (const [key, val] of Object.entries(value)) {
        // scrubFields entries are expected to be lowercase
        if (this.config.scrubFields.includes(key.toLowerCase())) {
          scrubbed[key] = '[REDACTED]';
        } else {
          scrubbed[key] = this.scrubPII(val);
        }
      }
      return scrubbed;
    }
    return value;
  }
private scrubString(str: string): string {
// Email pattern
str = str.replace(/[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g, '[EMAIL]');
// Credit card pattern
str = str.replace(/\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b/g, '[CARD]');
// SSN pattern
str = str.replace(/\b\d{3}-\d{2}-\d{4}\b/g, '[SSN]');
// Phone patterns
str = str.replace(/\b\d{3}[-.]?\d{3}[-.]?\d{4}\b/g, '[PHONE]');
return str;
}
}
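The SDK above calls `checkRateLimit` and `isDuplicate` without showing them. One plausible way to back those two steps is a token bucket plus a short deduplication window, sketched below. The class names, the capacity, and the window length are all assumptions; the clock is passed in explicitly so the logic can be tested without timers.

```typescript
// Token bucket: allows bursts up to `capacity`, then `refillPerSec` sustained.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;
  private readonly capacity: number;
  private readonly refillPerSec: number;

  constructor(capacity: number, refillPerSec: number, now: number) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.tokens = capacity;
    this.lastRefill = now;
  }

  tryTake(now: number): boolean {
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefill = now;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

// Dedup window: lets one entry per key through per window, so a repeating
// error is still reported periodically instead of being silenced forever.
class DuplicateFilter {
  private lastAccepted = new Map<string, number>();
  private readonly windowMs: number;

  constructor(windowMs: number) {
    this.windowMs = windowMs;
  }

  isDuplicate(key: string, now: number): boolean {
    const previous = this.lastAccepted.get(key);
    if (previous !== undefined && now - previous < this.windowMs) return true;
    this.lastAccepted.set(key, now);
    return false;
  }
}

// Usage: key errors by message, cap the SDK at a burst of 20 and 5 logs/sec.
const bucket = new TokenBucket(20, 5, Date.now());
const dedupe = new DuplicateFilter(5000);
const accepted = !dedupe.isDuplicate('TypeError: x is undefined', Date.now())
  && bucket.tryTake(Date.now());
console.log(accepted);
```

In a long-lived page the `lastAccepted` map would need periodic pruning; that is left out here for brevity.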
Ingestion Architecture
High-Volume Ingestion
┌─────────────────────────────────────────────────────────────────────────────┐
│ LOG INGESTION ARCHITECTURE │
│ │
│ Browser Traffic ──────────────────────────────────────────────────────▶ │
│ │ │
│ │ HTTPS POST (gzipped batches) │
│ │ 500K events/sec peak │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ EDGE INGESTION │ │
│ │ (CDN Workers / Edge Functions) │ │
│ │ │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ Edge PoP │ │ Edge PoP │ │ Edge PoP │ ... (200+) │ │
│ │ │ (NYC) │ │ (London) │ │ (Tokyo) │ │ │
│ │ │ │ │ │ │ │ │ │
│ │ │ • Validate │ │ • Validate │ │ • Validate │ │ │
│ │ │ • Decompress│ │ • Decompress│ │ • Decompress│ │ │
│ │ │ • Enrich │ │ • Enrich │ │ • Enrich │ │ │
│ │ │ • Route │ │ • Route │ │ • Route │ │ │
│ │ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │ │
│ │ │ │ │ │ │
│ │ └────────────────┼────────────────┘ │ │
│ │ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ MESSAGE QUEUE │ │
│ │ (Kafka / Kinesis / Pub/Sub) │ │
│ │ │ │
│ │ ┌───────────────────────────────────────────────────────────────┐ │ │
│ │ │ Topic: frontend-logs │ │ │
│ │ │ Partitions: 256 │ │ │
│ │ │ Replication: 3 │ │ │
│ │ │ Retention: 24 hours │ │ │
│ │ │ │ │ │
│ │ │ Partition Strategy: hash(customer_id) % partitions │ │ │
│ │ │ Throughput: 500K msgs/sec write, 2M msgs/sec read │ │ │
│ │ │ │ │ │
│ │ └───────────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ │ Separate topics for: │ │
│ │ • frontend-logs-errors (high priority) │ │
│ │ • frontend-logs-performance │ │
│ │ • frontend-logs-interactions │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ┌──────────────┼──────────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌────────────────┐ ┌────────────────┐ ┌────────────────┐ │
│ │ Stream │ │ Aggregation │ │ Alerting │ │
│ │ Processor │ │ Engine │ │ Engine │ │
│ │ (Flink) │ │ (Custom) │ │ (Custom) │ │
│ │ │ │ │ │ │ │
│ │ • Parse │ │ • Time-window │ │ • Threshold │ │
│ │ • Symbolicate │ │ aggregation │ │ • Anomaly │ │
│ │ • Sessionize │ │ • Dimensional │ │ • Correlation │ │
│ │ • Error group │ │ rollups │ │ │ │
│ └────────────────┘ └────────────────┘ └────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
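The partition strategy in the diagram, `hash(customer_id) % partitions`, can be sketched in a few lines. FNV-1a is used here only as a compact stand-in hash; Kafka's own default partitioner uses a different function (murmur2), so the partition numbers below would not match a real cluster.

```typescript
// 32-bit FNV-1a hash (stand-in for the broker client's real partitioner hash).
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // 32-bit multiply by the FNV prime
  }
  return hash >>> 0; // reinterpret as unsigned
}

function partitionFor(customerId: string, partitions: number): number {
  return fnv1a32(customerId) % partitions;
}

console.log(partitionFor('cust_123', 256));
```

Keying by customer keeps each customer's events ordered within one partition, at the cost that a single very large customer can make that partition hot.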
Edge Ingestion Worker
// Edge worker for log ingestion
interface IngestRequest {
logs: CompactLogEntry[];
meta: {
sdk: string;
app: string;
device: DeviceContext;
};
}
export default {
async fetch(request: Request, env: Env): Promise<Response> {
// Rate limiting
const clientIP = request.headers.get('CF-Connecting-IP') || 'unknown';
const rateLimited = await checkRateLimit(env, clientIP);
if (rateLimited) {
return new Response('Rate limited', { status: 429 });
}
// Validate API key
const apiKey = request.headers.get('X-API-Key');
if (!apiKey || !await validateApiKey(env, apiKey)) {
return new Response('Unauthorized', { status: 401 });
}
// Parse request
const contentEncoding = request.headers.get('Content-Encoding');
let body: string;
if (contentEncoding === 'gzip') {
const decompressed = await decompressGzip(await request.arrayBuffer());
body = new TextDecoder().decode(decompressed);
} else {
body = await request.text();
}
let data: IngestRequest;
try {
data = JSON.parse(body);
} catch {
return new Response('Invalid JSON', { status: 400 });
}
// Validate structure
if (!Array.isArray(data.logs) || data.logs.length === 0) {
return new Response('Invalid payload', { status: 400 });
}
// Enrich logs
const enriched = data.logs.map(log => enrichLog(log, request, env));
// Route to Kafka
const customerId = await getCustomerIdFromApiKey(env, apiKey);
// Separate errors (high priority) from other logs
const errors = enriched.filter(l => l.y === 0); // type enum for error
const others = enriched.filter(l => l.y !== 0);
// Send to appropriate queues
const promises: Promise<void>[] = [];
if (errors.length > 0) {
promises.push(
env.KAFKA.send({
topic: 'frontend-logs-errors',
messages: errors.map(e => ({
key: customerId,
value: JSON.stringify(e)
}))
})
);
}
if (others.length > 0) {
promises.push(
env.KAFKA.send({
topic: 'frontend-logs',
messages: others.map(e => ({
key: customerId,
value: JSON.stringify(e)
}))
})
);
}
await Promise.all(promises);
return new Response('OK', {
status: 202,
headers: {
'X-Logs-Received': String(enriched.length)
}
});
}
};
function enrichLog(
log: CompactLogEntry,
request: Request,
env: Env
): EnrichedLogEntry {
const cf = request.cf as any;
return {
...log,
// Server-side enrichment
ingestTime: Date.now(),
geo: {
country: cf?.country,
region: cf?.region,
city: cf?.city,
colo: cf?.colo
},
clientIP: hashIP(request.headers.get('CF-Connecting-IP') || ''),
asn: cf?.asn
};
}
async function decompressGzip(buffer: ArrayBuffer): Promise<ArrayBuffer> {
const stream = new DecompressionStream('gzip');
const writer = stream.writable.getWriter();
writer.write(new Uint8Array(buffer));
writer.close();
const reader = stream.readable.getReader();
const chunks: Uint8Array[] = [];
while (true) {
const { done, value } = await reader.read();
if (done) break;
chunks.push(value);
}
const totalLength = chunks.reduce((acc, chunk) => acc + chunk.length, 0);
const result = new Uint8Array(totalLength);
let offset = 0;
for (const chunk of chunks) {
result.set(chunk, offset);
offset += chunk.length;
}
return result.buffer;
}
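The worker above only checks that `logs` is a non-empty array. A stricter "validate structure" step might look like the sketch below, which drops malformed entries individually instead of rejecting the whole batch. The per-request cap and the enum ranges (0-3 for type, 0-2 for device) are assumptions, the latter based on the enums used elsewhere in this article.

```typescript
interface CompactLog {
  i: string; t: number; y: number; s: string;
  u?: string; p: string; d: number; v: unknown;
}

const MAX_LOGS_PER_REQUEST = 500; // assumed cap, not a value from the article

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

function isCompactLog(value: unknown): value is CompactLog {
  if (!isRecord(value)) return false;
  const { i, t, y, s, u, p, d } = value;
  return (
    typeof i === 'string' && i.length > 0 &&
    typeof t === 'number' && Number.isFinite(t) &&
    typeof y === 'number' && Number.isInteger(y) && y >= 0 && y <= 3 &&
    typeof s === 'string' && s.length > 0 &&
    (u === undefined || typeof u === 'string') &&
    typeof p === 'string' &&
    typeof d === 'number' && Number.isInteger(d) && d >= 0 && d <= 2
  );
}

// Returns null when the envelope itself is unusable (respond with 400),
// otherwise the subset of entries that passed validation.
function extractValidLogs(body: unknown): CompactLog[] | null {
  if (!isRecord(body)) return null;
  const logs = body.logs;
  if (!Array.isArray(logs)) return null;
  if (logs.length === 0 || logs.length > MAX_LOGS_PER_REQUEST) return null;
  return logs.filter(isCompactLog);
}
```

Validating at the edge keeps malformed or oversized payloads out of the message queue, where they are far more expensive to deal with.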
Processing Pipeline
Stream Processing
// Flink-style stream processor (pseudo-code)
class LogStreamProcessor {
private sourceMapCache = new Map<string, SourceMap>();
async process(stream: LogStream): Promise<void> {
stream
.filter(log => this.isValid(log))
.map(log => this.parse(log))
.keyBy(log => log.sessionId)
.window(ProcessingTimeWindows.of(Time.seconds(30)))
.process(new SessionWindowProcessor())
.sink(this.createSinks());
}
private parse(rawLog: string): ParsedLog {
const log = JSON.parse(rawLog);
// Symbolicate error stack traces
if (log.type === 'error' && log.payload.stack) {
log.payload.symbolicatedStack = this.symbolicate(
log.payload.stack,
log.meta.appVersion
);
}
// Parse user agent
log.device = this.parseUserAgent(log.device.userAgent);
return log;
}
private symbolicate(stack: string, version: string): string {
// Get source map (cached)
const sourceMap = this.getSourceMap(version);
if (!sourceMap) return stack;
// Parse stack frames
const frames = this.parseStackFrames(stack);
// Map each frame to original source
const symbolicatedFrames = frames.map(frame => {
const original = sourceMap.originalPositionFor({
line: frame.line,
column: frame.column
});
if (original.source) {
return {
...frame,
file: original.source,
line: original.line,
column: original.column,
function: original.name || frame.function
};
}
return frame;
});
return this.formatStack(symbolicatedFrames);
}
}
class SessionWindowProcessor {
process(
key: string,
context: ProcessWindowContext,
logs: Iterable<ParsedLog>,
out: Collector<SessionAggregate>
): void {
const session: SessionAggregate = {
sessionId: key,
windowStart: context.window().start(),
windowEnd: context.window().end(),
pageViews: 0,
errors: [],
performance: {
lcp: [],
fid: [],
cls: []
},
interactions: 0
};
for (const log of logs) {
switch (log.type) {
case 'pageview':
session.pageViews++;
break;
case 'error':
session.errors.push({
message: log.payload.message,
stack: log.payload.symbolicatedStack,
count: 1
});
break;
case 'performance':
if (log.payload.metric === 'LCP') {
session.performance.lcp.push(log.payload.value);
}
// ... other metrics
break;
case 'interaction':
session.interactions++;
break;
}
}
// Group and dedupe errors
session.errors = this.dedupeErrors(session.errors);
// Aggregate performance
session.performance.lcpP75 = this.percentile(session.performance.lcp, 75);
out.collect(session);
}
private dedupeErrors(errors: ErrorEntry[]): ErrorEntry[] {
const grouped = new Map<string, ErrorEntry>();
for (const error of errors) {
const key = this.errorFingerprint(error);
const existing = grouped.get(key);
if (existing) {
existing.count++;
} else {
grouped.set(key, { ...error });
}
}
return Array.from(grouped.values());
}
private errorFingerprint(error: ErrorEntry): string {
// Fingerprint by message + first stack frame
const firstFrame = error.stack?.split('\n')[0] || '';
return `${error.message}|${firstFrame}`;
}
}
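`SessionWindowProcessor` calls `this.percentile` without defining it. A minimal sketch using the nearest-rank method follows; other definitions (for example linear interpolation) give slightly different values on small samples, so treat the choice of method as an assumption.

```typescript
// Nearest-rank percentile: the smallest value such that at least p% of
// the samples are less than or equal to it.
function percentile(values: readonly number[], p: number): number | undefined {
  if (values.length === 0) return undefined;
  if (p < 0 || p > 100) throw new RangeError('p must be between 0 and 100');
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// LCP samples in milliseconds for one session window
console.log(percentile([4000, 1200, 2500, 1800], 75)); // 2500
```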
Error Grouping
// Intelligent error grouping
interface ErrorGroup {
id: string;
fingerprint: string;
message: string;
stack: string;
firstSeen: number;
lastSeen: number;
count: number;
affectedUsers: number;
affectedSessions: number;
browsers: Record<string, number>;
devices: Record<string, number>;
pages: Record<string, number>;
status: 'new' | 'ongoing' | 'regressed' | 'resolved';
}
class ErrorGrouper {
private groups = new Map<string, ErrorGroup>();
addError(error: ParsedError): ErrorGroup {
const fingerprint = this.computeFingerprint(error);
const existing = this.groups.get(fingerprint);
if (existing) {
return this.updateGroup(existing, error);
} else {
const group = this.createGroup(fingerprint, error);
this.groups.set(fingerprint, group);
return group;
}
}
private computeFingerprint(error: ParsedError): string {
// Strategy: Combine normalized message + top stack frames
// Normalize message (remove variable parts)
const normalizedMessage = this.normalizeMessage(error.message);
// Get top N stack frames (symbolicated)
const topFrames = this.getTopFrames(error.symbolicatedStack, 3);
// Combine
const parts = [normalizedMessage, ...topFrames];
// Hash
return this.hash(parts.join('|'));
}
private normalizeMessage(message: string): string {
return message
// Remove UUIDs
.replace(/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/gi, '{uuid}')
// Remove numbers
.replace(/\b\d+\b/g, '{number}')
// Remove URLs
.replace(/https?:\/\/[^\s]+/g, '{url}')
// Remove quoted strings
.replace(/"[^"]+"/g, '"{string}"')
.replace(/'[^']+'/g, "'{string}'");
}
  private getTopFrames(stack: string | undefined, count: number): string[] {
    if (!stack) return [];
    const frames: string[] = [];
    for (const frameLine of stack.split('\n')) {
      // Parse V8-style frames: "at func (file:line:col)"
      const match = frameLine.match(/at\s+(\S+)\s+\((.+):(\d+):(\d+)\)/);
      if (match) {
        const [, func, file, lineNo] = match;
        // Normalize
        const normalizedFile = file.replace(/\?.*$/, ''); // Remove query string
        frames.push(`${func}@${normalizedFile}:${lineNo}`);
        if (frames.length >= count) break;
      }
    }
    return frames;
  }
private updateGroup(group: ErrorGroup, error: ParsedError): ErrorGroup {
group.lastSeen = Date.now();
group.count++;
// Update affected counts (using HyperLogLog in production)
if (error.userId) {
// Track unique users
}
if (error.sessionId) {
// Track unique sessions
}
// Update distributions
const browser = error.device.browser;
group.browsers[browser] = (group.browsers[browser] || 0) + 1;
const device = error.device.deviceType;
group.devices[device] = (group.devices[device] || 0) + 1;
const page = error.page.path;
group.pages[page] = (group.pages[page] || 0) + 1;
// Check for regression
if (group.status === 'resolved') {
group.status = 'regressed';
}
return group;
}
}
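To see why message normalization matters for grouping, the sketch below lifts the regex chain from `normalizeMessage` into a standalone function and runs it on two messages that differ only in their variable parts. The sample messages are invented for illustration.

```typescript
// Same replacement chain as ErrorGrouper.normalizeMessage above.
function normalizeMessage(message: string): string {
  return message
    .replace(/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/gi, '{uuid}')
    .replace(/\b\d+\b/g, '{number}')
    .replace(/https?:\/\/[^\s]+/g, '{url}')
    .replace(/"[^"]+"/g, '"{string}"')
    .replace(/'[^']+'/g, "'{string}'");
}

const first = normalizeMessage('Failed to load order 48213 for user "alice"');
const second = normalizeMessage('Failed to load order 7 for user "bob"');

console.log(first);            // Failed to load order {number} for user "{string}"
console.log(first === second); // true
```

Both messages collapse to the same template, so they contribute to one fingerprint instead of creating a new error group per order ID or user.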
Storage Architecture
Tiered Storage
┌─────────────────────────────────────────────────────────────────────────────┐
│ LOG STORAGE TIERS │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ HOT TIER (0-7 days) │ │
│ │ │ │
│ │ Storage: ClickHouse / Elasticsearch │ │
│ │ Volume: ~100TB │ │
│ │ Query latency: <1 second │ │
│ │ Cost: $$$ │ │
│ │ │ │
│ │ Use cases: │ │
│ │ • Real-time debugging │ │
│ │ • Active incident investigation │ │
│ │ • Alert evaluation │ │
│ │ • Dashboard queries │ │
│ │ │ │
│ │ Features: │ │
│ │ • Full-text search │ │
│ │ • All columns indexed │ │
│ │ • Sub-second aggregations │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ │ Age > 7 days │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ WARM TIER (7-30 days) │ │
│ │ │ │
│ │ Storage: ClickHouse (cold replica) / S3 + Athena │ │
│ │ Volume: ~500TB │ │
│ │ Query latency: 5-30 seconds │ │
│ │ Cost: $$ │ │
│ │ │ │
│ │ Use cases: │ │
│ │ • Trend analysis │ │
│ │ • Post-mortem investigations │ │
│ │ • Weekly/monthly reports │ │
│ │ │ │
│ │ Features: │ │
│ │ • Columnar storage │ │
│ │ • Compressed │ │
│ │ • Partitioned by date │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ │ Age > 30 days │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ COLD TIER (30 days - 1 year) │ │
│ │ │ │
│ │ Storage: S3 + Glacier / BigQuery (cold) │ │
│ │ Volume: ~5PB (sampled/aggregated) │ │
│ │ Query latency: Minutes to hours │ │
│ │ Cost: $ │ │
│ │ │ │
│ │ Use cases: │ │
│ │ • Compliance / audit │ │
│ │ • Year-over-year analysis │ │
│ │ • ML training data │ │
│ │ │ │
│ │ Features: │ │
│ │ • Heavily compressed │ │
│ │ • Sampled (10% of original) │ │
│ │ • Pre-aggregated metrics │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
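The tier boundaries in the diagram reduce to a simple age check. The sketch below encodes them, treating anything older than a year as expired (matching the 1-year cold retention); the function and type names are illustrative.

```typescript
type StorageTier = 'hot' | 'warm' | 'cold' | 'expired';

const DAY_MS = 24 * 60 * 60 * 1000;

// Maps a log's age to the tier it should live in.
function tierForAge(ageMs: number): StorageTier {
  const days = ageMs / DAY_MS;
  if (days <= 7) return 'hot';
  if (days <= 30) return 'warm';
  if (days <= 365) return 'cold';
  return 'expired';
}

console.log(tierForAge(3 * DAY_MS));   // hot
console.log(tierForAge(12 * DAY_MS));  // warm
console.log(tierForAge(90 * DAY_MS));  // cold
console.log(tierForAge(400 * DAY_MS)); // expired
```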
ClickHouse Schema
-- ClickHouse schema for frontend logs
CREATE TABLE frontend_logs
(
-- Identity
id UUID,
timestamp DateTime64(3),
type Enum8('error' = 0, 'performance' = 1, 'interaction' = 2, 'debug' = 3),
-- Context
customer_id String,
session_id String,
user_id Nullable(String),
-- Page
page_url String,
page_path LowCardinality(String),
-- Device
device_type Enum8('mobile' = 0, 'tablet' = 1, 'desktop' = 2),
browser LowCardinality(String),
browser_version LowCardinality(String),
os LowCardinality(String),
-- Geo
country LowCardinality(String),
region LowCardinality(String),
city String,
-- Payload (varies by type)
error_message Nullable(String),
error_stack Nullable(String),
error_fingerprint Nullable(String),
perf_metric Nullable(String),
perf_value Nullable(Float64),
interaction_name Nullable(String),
-- App
app_version LowCardinality(String),
sdk_version LowCardinality(String)
)
ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(timestamp)
ORDER BY (customer_id, type, timestamp)
TTL timestamp + INTERVAL 7 DAY TO DISK 'warm',
timestamp + INTERVAL 30 DAY TO DISK 'cold',
timestamp + INTERVAL 365 DAY DELETE
SETTINGS index_granularity = 8192;
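-- Materialized view for error aggregates
-- AggregatingMergeTree stores mergeable states, so hourly rows written by
-- different insert blocks combine correctly. SummingMergeTree would add up
-- per-block distinct counts and overstate affected users and sessions.
-- Read with countMerge(count), uniqExactMerge(affected_users), anyLastMerge(message), etc.
CREATE MATERIALIZED VIEW frontend_errors_hourly
ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMMDD(hour)
ORDER BY (customer_id, fingerprint, hour)
AS SELECT
    customer_id,
    -- Sorting keys cannot be Nullable by default, so unwrap the fingerprint
    assumeNotNull(error_fingerprint) AS fingerprint,
    toStartOfHour(timestamp) AS hour,
    countState() AS count,
    uniqExactState(user_id) AS affected_users,
    uniqExactState(session_id) AS affected_sessions,
    anyLastState(error_message) AS message,
    anyLastState(error_stack) AS stack
FROM frontend_logs
WHERE type = 'error' AND error_fingerprint IS NOT NULL
GROUP BY customer_id, fingerprint, hour;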
-- Materialized view for performance percentiles
-- Columns hold aggregate states; read with quantileMerge(0.75)(p75), countMerge(sample_count), etc.
CREATE MATERIALIZED VIEW frontend_perf_hourly
ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMMDD(hour)
ORDER BY (customer_id, page_path, metric, hour)
AS SELECT
    customer_id,
    page_path,
    -- Sorting keys cannot be Nullable by default, so unwrap the metric name
    assumeNotNull(perf_metric) AS metric,
    toStartOfHour(timestamp) AS hour,
    quantileState(0.5)(perf_value) AS p50,
    quantileState(0.75)(perf_value) AS p75,
    quantileState(0.95)(perf_value) AS p95,
    quantileState(0.99)(perf_value) AS p99,
    countState() AS sample_count
FROM frontend_logs
WHERE type = 'performance' AND perf_metric IS NOT NULL
GROUP BY customer_id, page_path, metric, hour;
Query Patterns
Common Queries
-- Top errors in the last 24 hours, with full context
SELECT
error_message,
error_stack,
page_path,
browser,
device_type,
country,
count() AS occurrences,
uniqExact(session_id) AS sessions,
min(timestamp) AS first_seen,
max(timestamp) AS last_seen
FROM frontend_logs
WHERE
customer_id = 'cust_123'
AND type = 'error'
AND timestamp > now() - INTERVAL 24 HOUR
GROUP BY error_message, error_stack, page_path, browser, device_type, country
ORDER BY occurrences DESC
LIMIT 100;
-- Performance percentiles by page over time
SELECT
toStartOfFifteenMinutes(timestamp) AS bucket,
page_path,
quantile(0.75)(perf_value) AS p75,
quantile(0.95)(perf_value) AS p95,
count() AS samples
FROM frontend_logs
WHERE
customer_id = 'cust_123'
AND type = 'performance'
AND perf_metric = 'LCP'
AND timestamp > now() - INTERVAL 6 HOUR
GROUP BY bucket, page_path
ORDER BY bucket;
-- Session reconstruction
SELECT
timestamp,
type,
page_path,
CASE
WHEN type = 'error' THEN error_message
WHEN type = 'performance' THEN concat(perf_metric, ': ', toString(perf_value))
WHEN type = 'interaction' THEN interaction_name
ELSE 'debug'
END AS event_detail
FROM frontend_logs
WHERE
customer_id = 'cust_123'
AND session_id = 'sess_abc123'
ORDER BY timestamp;
-- Error rate by browser/version
SELECT
browser,
browser_version,
countIf(type = 'error') AS errors,
countIf(type = 'pageview') AS pageviews,
errors / pageviews * 100 AS error_rate_pct
FROM frontend_logs
WHERE
customer_id = 'cust_123'
AND timestamp > now() - INTERVAL 7 DAY
GROUP BY browser, browser_version
HAVING pageviews > 1000
ORDER BY error_rate_pct DESC;
Alerting on Logs
Alert Rule Engine
// Log-based alerting
interface LogAlertRule {
id: string;
name: string;
query: LogQuery;
condition: AlertCondition;
window: number; // seconds
severity: 'critical' | 'warning' | 'info';
channels: string[];
}
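interface LogQuery {
type?: LogType[];
filters?: Record<string, string | string[]>; // optional: most rules below set none
aggregation?: 'count' | 'rate' | 'distinct' | 'percentile';
percentile?: number; // 0-100, used when aggregation is 'percentile'
groupBy?: string[];
}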
const alertRules: LogAlertRule[] = [
// Error spike detection
{
id: 'error-spike',
name: 'Error Rate Spike',
query: {
type: ['error'],
aggregation: 'rate'
},
condition: {
type: 'anomaly',
baseline: 'rolling-7d',
threshold: 3 // 3 standard deviations
},
window: 300, // 5 minutes
severity: 'critical',
channels: ['pagerduty', 'slack-critical']
},
// New error type
{
id: 'new-error',
name: 'New Error Type Detected',
query: {
type: ['error'],
aggregation: 'distinct',
groupBy: ['error_fingerprint']
},
condition: {
type: 'new-value',
lookbackDays: 7
},
window: 60,
severity: 'warning',
channels: ['slack-frontend']
},
// Performance regression
{
id: 'lcp-regression',
name: 'LCP P75 Regression',
query: {
type: ['performance'],
filters: { perf_metric: 'LCP' },
aggregation: 'percentile',
percentile: 75
},
condition: {
type: 'threshold',
operator: 'gt',
value: 2500,
sustainedMinutes: 10
},
window: 600,
severity: 'warning',
channels: ['slack-frontend']
},
// Session error cascade
{
id: 'session-cascade',
name: 'Session Error Cascade',
query: {
type: ['error'],
aggregation: 'count',
groupBy: ['session_id']
},
condition: {
type: 'threshold',
operator: 'gt',
value: 5 // More than 5 errors in one session
},
window: 300,
severity: 'info',
channels: ['slack-frontend']
}
];
class LogAlertEvaluator {
async evaluate(rule: LogAlertRule): Promise<Alert | null> {
const results = await this.executeQuery(rule.query, rule.window);
const triggered = this.checkCondition(rule.condition, results);
if (triggered) {
return {
ruleId: rule.id,
severity: rule.severity,
message: this.buildAlertMessage(rule, results),
context: results
};
}
return null;
}
private async executeQuery(
query: LogQuery,
windowSeconds: number
): Promise<QueryResult> {
// Build and execute ClickHouse query
const sql = this.buildSQL(query, windowSeconds);
return await this.clickhouse.query(sql);
}
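// Column names are interpolated into the SQL text, so only plain identifiers are allowed
private ident(name: string): string {
if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(name)) {
throw new Error(`Invalid column name: ${name}`);
}
return name;
}
// Escape backslashes and single quotes for a ClickHouse string literal
private quote(value: string): string {
return `'${value.replace(/\\/g, '\\\\').replace(/'/g, "\\'")}'`;
}
private buildSQL(query: LogQuery, windowSeconds: number): string {
const groupBy = (query.groupBy ?? []).map(c => this.ident(c));
let sql = 'SELECT ';
if (query.aggregation === 'rate') {
sql += `count() / ${windowSeconds} AS value`;
} else if (query.aggregation === 'distinct') {
sql += `uniqExact(${groupBy[0] ?? 'session_id'}) AS value`;
} else if (query.aggregation === 'percentile') {
// Used by the LCP rule; the percentile is taken over perf_value
sql += `quantile(${(query.percentile ?? 75) / 100})(perf_value) AS value`;
} else {
// 'count', or no aggregation specified
sql += 'count() AS value';
}
if (groupBy.length > 0) {
sql += `, ${groupBy.join(', ')}`;
}
sql += ' FROM frontend_logs WHERE timestamp > now() - INTERVAL ';
sql += `${windowSeconds} SECOND`;
if (query.type) {
sql += ` AND type IN (${query.type.map(t => this.quote(t)).join(', ')})`;
}
for (const [key, value] of Object.entries(query.filters || {})) {
if (Array.isArray(value)) {
sql += ` AND ${this.ident(key)} IN (${value.map(v => this.quote(v)).join(', ')})`;
} else {
sql += ` AND ${this.ident(key)} = ${this.quote(value)}`;
}
}
if (groupBy.length > 0) {
sql += ` GROUP BY ${groupBy.join(', ')}`;
}
return sql;
}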
}
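The evaluator above calls `checkCondition` without showing it. A minimal sketch of how the three condition types used in the rules (`threshold`, `anomaly`, `new-value`) might be evaluated follows. The `AlertCondition` and `ConditionInput` shapes are assumptions inferred from the rule definitions, and the function is written standalone rather than as the class method. It treats the anomaly check as a one-sided z-score against earlier windows, and it does not implement `sustainedMinutes`, which would need state across evaluations.

```typescript
// Hypothetical condition shapes, inferred from the rule definitions above.
type AlertCondition =
  | { type: 'threshold'; operator: 'gt' | 'lt'; value: number; sustainedMinutes?: number }
  | { type: 'anomaly'; baseline?: string; threshold: number } // threshold = standard deviations
  | { type: 'new-value'; lookbackDays?: number };

// Hypothetical input: what the query layer would hand to the condition check.
interface ConditionInput {
  current?: number;        // aggregated value for the current window
  baseline?: number[];     // the same value for earlier windows (anomaly)
  currentKeys?: string[];  // e.g. fingerprints seen in this window (new-value)
  knownKeys?: Set<string>; // keys already seen during the lookback period
}

function mean(xs: number[]): number {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

// Population standard deviation around a precomputed mean.
function stdDev(xs: number[], mu: number): number {
  return Math.sqrt(xs.reduce((acc, x) => acc + (x - mu) ** 2, 0) / xs.length);
}

function checkCondition(cond: AlertCondition, input: ConditionInput): boolean {
  switch (cond.type) {
    case 'threshold': {
      const current = input.current ?? 0;
      return cond.operator === 'gt' ? current > cond.value : current < cond.value;
    }
    case 'anomaly': {
      const baseline = input.baseline ?? [];
      if (baseline.length < 2 || input.current === undefined) {
        return false; // not enough history to judge
      }
      const mu = mean(baseline);
      const sigma = stdDev(baseline, mu);
      if (sigma === 0) {
        return input.current > mu; // flat baseline: any increase counts
      }
      // One-sided: only upward deviations count as a spike.
      return (input.current - mu) / sigma > cond.threshold;
    }
    case 'new-value': {
      const known = input.knownKeys ?? new Set<string>();
      return (input.currentKeys ?? []).some(key => !known.has(key));
    }
  }
}
```

A one-sided check is a deliberate choice here: a sudden drop in error rate is usually good news, or a sign that ingestion is broken, which is better covered by a separate rule.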
Summary
Frontend logging at scale is not just "backend logging but in browsers." The constraints are fundamentally different: you're collecting from untrusted, uncontrolled environments with strict performance budgets and massive volume.
Key Architectural Principles:
- Sample ruthlessly, but never errors - Behavioral logs can be sampled at 1%. Errors are always captured at 100%.
- Batch and compress client-side - Never send individual log entries. Batch 50-100, gzip, send.
- Use sendBeacon for reliability - A plain fetch can be cancelled when the page unloads. sendBeacon (or fetch with keepalive) is queued by the browser and survives navigation.
- Fingerprint and group errors server-side - Don't send duplicate stacks. Dedupe by fingerprint.
- Symbolicate early in the pipeline - Stack traces are useless without source maps. Do it in stream processing.
- Tier your storage by access pattern - Hot (7d), warm (30d), cold (1y). Different cost/latency tradeoffs.
- Pre-aggregate for dashboards - Don't query raw logs for percentiles. Use materialized views.
- PII scrub everywhere - Emails, cards, SSNs can appear anywhere. Scrub client-side AND server-side.
- Rate limit at the edge - Runaway clients can DoS your pipeline. Enforce limits per-client.
- Alert on log patterns, not just metrics - New error fingerprints, error cascades, and behavioral anomalies are all detectable from logs.
Frontend logging done right gives you superpowers: instant visibility into production issues, session-level debugging, and real user experience data. Done wrong, it's an expensive, noisy pipeline that drowns signal in volume.
Build for scale from day one—retrofitting sampling and tiering onto an overloaded system is painful.