RUM vs Synthetic Monitoring Systems

June 20, 2026 · 141 min read

Introduction

"Is the site slow?" seems like a simple question. It isn't. Slow for whom? Slow compared to what? Slow right now or on average? Slow for a user in Tokyo on 4G or a developer in the office on gigabit fiber?

This is why production frontend monitoring requires two fundamentally different approaches: Real User Monitoring (RUM) captures what actual users experience in the wild, while Synthetic Monitoring continuously tests known scenarios from controlled environments. Neither approach alone gives you the full picture.

Understanding when each approach excels—and more importantly, when each fails—is essential for building a monitoring strategy that catches regressions, identifies issues, and gives you confidence in your frontend's health.

This deep dive examines both approaches in detail: their architectures, their tradeoffs, their blind spots, and how to combine them for comprehensive frontend observability.

Scale Context

The production monitoring deployment examined here:

Metric	RUM	Synthetic
Data Points per Day	5B+	500K
Unique Users Monitored	25M	N/A
Unique Device/Browser Combos	50,000+	20-50
Geographic Coverage	190 countries	30-50 locations
Test Scenarios	N/A (user-driven)	200-500 scripts
Alert Detection Time (P50)	5-15 minutes	2-5 minutes
Coverage of Edge Cases	High (emergent)	Low (scripted)
Cost per Data Point	~$0.00001	~$0.001
Data Retention	30-90 days	1-2 years

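RUM generates roughly four orders of magnitude more data points than synthetic monitoring (5B+ versus 500K per day) at about one-hundredth the cost per point. These gaps fundamentally shape how each system is architected and used.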
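Multiplying out the table's own figures makes the cost gap concrete. This is a rough sketch that treats the approximate values in the table (5B and 500K data points per day, ~$0.00001 and ~$0.001 per point) as exact; the type and variable names are illustrative only.

```typescript
// Rough daily-cost arithmetic from the Scale Context table.
// Inputs are the table's approximate figures, treated as exact here.
interface MonitoringProfile {
  name: string;
  dataPointsPerDay: number;
  costPerDataPoint: number; // USD
}

const rumProfile: MonitoringProfile = {
  name: 'RUM',
  dataPointsPerDay: 5e9,
  costPerDataPoint: 0.00001,
};

const syntheticProfile: MonitoringProfile = {
  name: 'Synthetic',
  dataPointsPerDay: 5e5,
  costPerDataPoint: 0.001,
};

// Round to whole cents to hide floating-point noise
function dailyCostUsd(profile: MonitoringProfile): number {
  const cents = Math.round(profile.dataPointsPerDay * profile.costPerDataPoint * 100);
  return cents / 100;
}

const rumDailyCost = dailyCostUsd(rumProfile);
const syntheticDailyCost = dailyCostUsd(syntheticProfile);

console.log(`RUM: $${rumDailyCost}/day, Synthetic: $${syntheticDailyCost}/day`);
```

Under these figures RUM is 100 times cheaper per data point yet about 100 times more expensive per day in total, which is why sampling matters far more on the RUM side.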

Fundamental Differences

Conceptual Model

┌─────────────────────────────────────────────────────────────────────────────┐
│                    RUM VS SYNTHETIC: CONCEPTUAL MODEL                        │
│                                                                              │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                    REAL USER MONITORING (RUM)                        │    │
│  │                                                                      │    │
│  │  Data Source: Actual user browsers                                  │    │
│  │  Question:    "What did users experience?"                          │    │
│  │  Trigger:     User actions (organic)                                │    │
│  │  Coverage:    Whatever users do                                     │    │
│  │  Environment: Uncontrolled (real world)                             │    │
│  │  Variance:    High (device, network, behavior varies)               │    │
│  │  Latency:     Reactive (after users experience issues)              │    │
│  │                                                                      │    │
│  │  ┌────────────────────────────────────────────────────────────────┐ │    │
│  │  │  User A      User B      User C      User D      User E       │ │    │
│  │  │  iPhone/4G   Desktop/    Android/    MacBook/    Desktop/     │ │    │
│  │  │  Mumbai      Fiber NYC   3G Brazil   WiFi London IE11/DSL     │ │    │
│  │  │  LCP: 4.2s   LCP: 1.1s   LCP: 8.3s   LCP: 1.8s   LCP: 6.1s    │ │    │
│  │  └────────────────────────────────────────────────────────────────┘ │    │
│  │                                                                      │    │
│  │  Result: Statistical distribution of real experience                │    │
│  │  P50: 2.1s  P75: 3.4s  P95: 7.2s                                   │    │
│  │                                                                      │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                                                              │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                    SYNTHETIC MONITORING                              │    │
│  │                                                                      │    │
│  │  Data Source: Controlled test agents                                │    │
│  │  Question:    "Is the site working correctly right now?"           │    │
│  │  Trigger:     Scheduled (every N minutes)                           │    │
│  │  Coverage:    Pre-defined test scenarios                            │    │
│  │  Environment: Controlled (consistent agents)                        │    │
│  │  Variance:    Low (by design)                                       │    │
│  │  Latency:     Proactive (detect before users)                       │    │
│  │                                                                      │    │
│  │  ┌────────────────────────────────────────────────────────────────┐ │    │
│  │  │  Agent NYC    Agent London   Agent Tokyo    Agent Sydney      │ │    │
│  │  │  Chrome/      Chrome/        Chrome/        Chrome/           │ │    │
│  │  │  Cable        Cable          Cable          Cable             │ │    │
│  │  │  LCP: 1.2s    LCP: 1.1s      LCP: 1.3s      LCP: 1.4s         │ │    │
│  │  │                                                                │ │    │
│  │  │  Same script, same conditions, same browsers                  │ │    │
│  │  └────────────────────────────────────────────────────────────────┘ │    │
│  │                                                                      │    │
│  │  Result: Consistent baseline for comparison                         │    │
│  │  Avg: 1.25s  StdDev: 0.1s                                          │    │
│  │                                                                      │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
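The two result lines in the diagram imply different statistics: percentiles for RUM, because the distribution is wide and skewed, and mean plus standard deviation for synthetic, because the runs are meant to be nearly identical. The sketch below computes both from the sample values shown in the diagram. It assumes a nearest-rank percentile and a population standard deviation; real tools may use other estimators.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

function mean(samples: number[]): number {
  return samples.reduce((sum, v) => sum + v, 0) / samples.length;
}

// Population standard deviation (divides by n, not n - 1)
function stdDev(samples: number[]): number {
  const m = mean(samples);
  const variance =
    samples.reduce((sum, v) => sum + (v - m) ** 2, 0) / samples.length;
  return Math.sqrt(variance);
}

// LCP values in seconds, copied from the five users and four agents above
const rumLcp = [4.2, 1.1, 8.3, 1.8, 6.1];
const syntheticLcp = [1.2, 1.1, 1.3, 1.4];

const rumSummary = {
  p50: percentile(rumLcp, 50),
  p75: percentile(rumLcp, 75),
  p95: percentile(rumLcp, 95),
};

const syntheticSummary = {
  avg: mean(syntheticLcp),
  stdDev: stdDev(syntheticLcp),
};

console.log(rumSummary, syntheticSummary);
```

The four agent values reproduce the diagram's average of 1.25s with a standard deviation of roughly 0.1s. The five user values do not reproduce the diagram's P50/P75/P95, which describe the whole user population rather than this handful of samples.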

When Each Approach Shines

Scenario	RUM	Synthetic	Why
Deployment validation	❌	✓	Need immediate feedback, not enough RUM data yet
Performance regression detection	✓	✓	Both useful, synthetic faster, RUM more accurate
User segment analysis	✓	❌	Only RUM has real user segments
Uptime monitoring	❌	✓	RUM requires users; no users = no data
Third-party impact	✓	⚠️	RUM sees real third-party behavior; synthetic agents may get different treatment
Geographic performance	✓	⚠️	RUM sees real ISPs; synthetic sees data center networks
Mobile performance	✓	⚠️	RUM sees real device diversity; synthetic limited
Critical flow testing	⚠️	✓	Synthetic runs flows on schedule; RUM depends on user volume
Competitive benchmarking	❌	✓	Can't RUM competitors' sites
Pre-launch testing	❌	✓	No users before launch
A/B test analysis	✓	❌	Need real user behavior in both variants

RUM Architecture Deep Dive

Data Collection Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                         RUM ARCHITECTURE                                     │
│                                                                              │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                     BROWSER (Data Source)                            │    │
│  │                                                                      │    │
│  │  ┌─────────────────────────────────────────────────────────────┐    │    │
│  │  │                    RUM SDK                                   │    │    │
│  │  │                                                              │    │    │
│  │  │  Collectors:                                                 │    │    │
│  │  │  ├── Performance Observer (LCP, FID, CLS, Long Tasks)       │    │    │
│  │  │  ├── Navigation Timing API                                   │    │    │
│  │  │  ├── Resource Timing API                                     │    │    │
│  │  │  ├── Error Capture (window.onerror, unhandledrejection)     │    │    │
│  │  │  ├── Network Interception (fetch, XHR)                      │    │    │
│  │  │  └── User Interaction Tracking                               │    │    │
│  │  │                                                              │    │    │
│  │  │  Enrichment:                                                 │    │    │
│  │  │  ├── Device/Browser detection                               │    │    │
│  │  │  ├── Session management                                      │    │    │
│  │  │  ├── User identity (if authenticated)                       │    │    │
│  │  │  └── Custom context (feature flags, A/B variant)            │    │    │
│  │  │                                                              │    │    │
│  │  │  Transport:                                                  │    │    │
│  │  │  ├── Batching (100 events or 10 seconds)                    │    │    │
│  │  │  ├── Sampling (1-100% configurable)                         │    │    │
│  │  │  ├── Compression (gzip)                                      │    │    │
│  │  │  └── sendBeacon / fetch with keepalive                      │    │    │
│  │  │                                                              │    │    │
│  │  └─────────────────────────────────────────────────────────────┘    │    │
│  │                                                                      │    │
│  └──────────────────────────────────────────────────────────────────────┘   │
│                                      │                                       │
│                                      │ HTTPS POST (batched, compressed)     │
│                                      ▼                                       │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                    INGESTION LAYER                                   │    │
│  │                                                                      │    │
│  │  ┌─────────────────────────────────────────────────────────────┐    │    │
│  │  │                 EDGE COLLECTORS                              │    │    │
│  │  │                 (Deployed at CDN edge)                       │    │    │
│  │  │                                                              │    │    │
│  │  │  • Low latency ingest (close to users)                      │    │    │
│  │  │  • Validation & rate limiting                               │    │    │
│  │  │  • GeoIP enrichment                                         │    │    │
│  │  │  • Initial filtering (bots, spam)                           │    │    │
│  │  │                                                              │    │    │
│  │  └─────────────────────────────────────────────────────────────┘    │    │
│  │                          │                                           │    │
│  │                          ▼                                           │    │
│  │  ┌─────────────────────────────────────────────────────────────┐    │    │
│  │  │                 MESSAGE QUEUE                                │    │    │
│  │  │                 (Kafka / Kinesis)                            │    │    │
│  │  │                                                              │    │    │
│  │  │  Partitioned by:                                             │    │    │
│  │  │  • Customer (multi-tenant isolation)                        │    │    │
│  │  │  • Event type (different processing needs)                  │    │    │
│  │  │                                                              │    │    │
│  │  └─────────────────────────────────────────────────────────────┘    │    │
│  │                                                                      │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                      │                                       │
│                                      ▼                                       │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                    PROCESSING LAYER                                  │    │
│  │                                                                      │    │
│  │  ┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐  │    │
│  │  │ Stream Processor │  │  Aggregation     │  │    Alerting      │  │    │
│  │  │ (Flink/Spark)    │  │  Engine          │  │    Engine        │  │    │
│  │  │                  │  │                  │  │                  │  │    │
│  │  │ • Sessionization │  │ • Time-series    │  │ • Threshold      │  │    │
│  │  │ • Error grouping │  │   rollups        │  │ • Anomaly        │  │    │
│  │  │ • Stack symb.    │  │ • Dimensional    │  │ • Composite      │  │    │
│  │  │ • User journeys  │  │   aggregation    │  │                  │  │    │
│  │  └──────────────────┘  └──────────────────┘  └──────────────────┘  │    │
│  │                                                                      │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                      │                                       │
│                                      ▼                                       │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                    STORAGE LAYER                                     │    │
│  │                                                                      │    │
│  │  ┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐  │    │
│  │  │  Time-Series DB  │  │   Raw Events     │  │   Error Store    │  │    │
│  │  │  (Metrics)       │  │   (Data Lake)    │  │   (Grouped)      │  │    │
│  │  │                  │  │                  │  │                  │  │    │
│  │  │  30 days hot     │  │  90 days         │  │  Indefinite      │  │    │
│  │  │  1 year cold     │  │  (sampled cold)  │  │                  │  │    │
│  │  └──────────────────┘  └──────────────────┘  └──────────────────┘  │    │
│  │                                                                      │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
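The Transport box in the diagram (batching at 100 events or 10 seconds) can be sketched as a small queue. This is a minimal illustration, not the SDK's actual transport: `BatchTransport` and its injected `send` callback are hypothetical names, and sampling and compression are left out. In a browser, `send` would typically wrap `navigator.sendBeacon(endpoint, payload)` or a `fetch` call with `keepalive: true`.

```typescript
type SendFn = (payload: string) => void;

// Flushes when the batch is full, or when the oldest queued event has
// waited maxWaitMs, whichever comes first.
class BatchTransport<T> {
  private queue: T[] = [];
  private timer: ReturnType<typeof setTimeout> | undefined;
  private readonly send: SendFn;
  private readonly maxBatchSize: number;
  private readonly maxWaitMs: number;

  constructor(send: SendFn, maxBatchSize = 100, maxWaitMs = 10000) {
    this.send = send;
    this.maxBatchSize = maxBatchSize;
    this.maxWaitMs = maxWaitMs;
  }

  enqueue(event: T): void {
    this.queue.push(event);
    if (this.queue.length >= this.maxBatchSize) {
      this.flush();
    } else if (this.timer === undefined) {
      // Start the clock when the first event of a new batch arrives
      this.timer = setTimeout(() => this.flush(), this.maxWaitMs);
    }
  }

  // Also call this when the page is being hidden or unloaded
  flush(): void {
    if (this.timer !== undefined) {
      clearTimeout(this.timer);
      this.timer = undefined;
    }
    if (this.queue.length === 0) return;
    const batch = this.queue;
    this.queue = [];
    this.send(JSON.stringify(batch));
  }

  get pending(): number {
    return this.queue.length;
  }
}

// Usage: collect payloads in memory instead of sending them over the network
const sentBatches: string[] = [];
const transport = new BatchTransport<{ type: string; n: number }>((payload) => {
  sentBatches.push(payload);
});

for (let i = 0; i < 250; i++) {
  transport.enqueue({ type: 'click', n: i });
}

const batchesBeforeFlush = sentBatches.length; // two full batches so far
const pendingBeforeFlush = transport.pending;  // remainder waiting on the timer

transport.flush(); // sends the remainder and cancels the pending timer
```

Injecting `send` keeps the queueing logic testable outside a browser and lets the same class sit in front of either delivery mechanism.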

RUM SDK Implementation

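// Production RUM SDK architecture
// (RUMEvent, Session, and helpers such as initSession, initTransport,
// describeElement, sanitizeUrl, and getConnectionInfo are not shown in this listing)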

interface RUMConfig {
  appId: string;
  apiKey: string;
  endpoint: string;
  version: string;

  // Sampling
  sampleRate: number;           // 0-1, default 1.0
  errorSampleRate: number;      // Typically 1.0 (capture all errors)

  // Privacy
  maskAllInputs: boolean;
  maskAllText: boolean;
  blockSelector: string[];      // CSS selectors to exclude

  // Performance
  trackResources: boolean;
  trackLongTasks: boolean;
  resourceTimingBufferSize: number;

  // Session
  sessionTimeout: number;       // ms
  maxSessionDuration: number;   // ms
}

class RUMSDK {
  private config: RUMConfig;
  private session!: Session;    // assigned in initialize(), called from the constructor
  private queue: RUMEvent[] = [];
  private observers: PerformanceObserver[] = [];

  constructor(config: RUMConfig) {
    this.config = config;
    this.initialize();
  }

  private initialize(): void {
    // Initialize session
    this.session = this.initSession();

    // Start core collectors
    this.initPerformanceObservers();
    this.initErrorCapture();
    this.initNetworkCapture();
    this.initUserInteractionCapture();

    // Start transport
    this.initTransport();
  }

  private initPerformanceObservers(): void {
    // Largest Contentful Paint
    this.observe('largest-contentful-paint', (entries) => {
      const last = entries[entries.length - 1] as any;
      this.track({
        type: 'web-vital',
        name: 'LCP',
        value: last.startTime,
        element: this.describeElement(last.element),
        url: last.url
      });
    });

    // First Input Delay (FID); superseded by INP as a Core Web Vital in 2024
    this.observe('first-input', (entries) => {
      const entry = entries[0] as PerformanceEventTiming;
      this.track({
        type: 'web-vital',
        name: 'FID',
        value: entry.processingStart - entry.startTime,
        eventType: entry.name,
        target: this.describeElement(entry.target as Element)
      });
    });

    // Interaction to Next Paint
    this.observe('event', (entries) => {
      for (const entry of entries as PerformanceEventTiming[]) {
        if (entry.interactionId && entry.duration > 40) {
          this.track({
            type: 'interaction',
            name: entry.name,
            duration: entry.duration,
            inputDelay: entry.processingStart - entry.startTime,
            processingTime: entry.processingEnd - entry.processingStart,
            interactionId: entry.interactionId
          });
        }
      }
    }, { durationThreshold: 16 });

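    // Layout Shift: a running total of every unexpected shift on the page.
    // LayoutShift is not part of TypeScript's lib.dom.d.ts, so the fields
    // used here are declared locally.
    type LayoutShift = PerformanceEntry & { value: number; hadRecentInput: boolean };
    let clsValue = 0;
    this.observe('layout-shift', (entries) => {
      for (const entry of entries as LayoutShift[]) {
        // Shifts that closely follow user input are expected and excluded
        if (!entry.hadRecentInput) {
          clsValue += entry.value;
        }
      }
    });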

    // Report CLS on visibility change
    document.addEventListener('visibilitychange', () => {
      if (document.visibilityState === 'hidden') {
        this.track({
          type: 'web-vital',
          name: 'CLS',
          value: clsValue
        });
      }
    });

    // Long Tasks
    if (this.config.trackLongTasks) {
      this.observe('longtask', (entries) => {
        for (const entry of entries) {
          this.track({
            type: 'long-task',
            duration: entry.duration,
            startTime: entry.startTime,
            attribution: (entry as any).attribution
          });
        }
      });
    }

    // Resource Timing
    if (this.config.trackResources) {
      this.observe('resource', (entries) => {
        for (const entry of entries as PerformanceResourceTiming[]) {
          // Only track significant resources
          if (entry.duration > 50 || entry.transferSize > 50000) {
            this.track({
              type: 'resource',
              name: entry.name,
              initiatorType: entry.initiatorType,
              duration: entry.duration,
              transferSize: entry.transferSize,
              decodedBodySize: entry.decodedBodySize
            });
          }
        }
      });
    }
  }

  private observe(
    type: string,
    callback: (entries: PerformanceEntryList) => void,
    options?: PerformanceObserverInit
  ): void {
    try {
      const observer = new PerformanceObserver((list) => {
        callback(list.getEntries());
      });

      observer.observe({ type, buffered: true, ...options });
      this.observers.push(observer);
    } catch (e) {
      // Observer type not supported
    }
  }

  private initErrorCapture(): void {
    // Global errors
    window.addEventListener('error', (event) => {
      if (this.shouldSampleError()) {
        this.track({
          type: 'error',
          errorType: 'uncaught',
          message: event.message,
          filename: event.filename,
          lineno: event.lineno,
          colno: event.colno,
          stack: event.error?.stack
        });
      }
    });

    // Unhandled promise rejections
    window.addEventListener('unhandledrejection', (event) => {
      if (this.shouldSampleError()) {
        this.track({
          type: 'error',
          errorType: 'unhandled-rejection',
          message: event.reason?.message || String(event.reason),
          stack: event.reason?.stack
        });
      }
    });
  }

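  private initNetworkCapture(): void {
    // Intercept fetch, keeping a bound reference to the native implementation
    const originalFetch = window.fetch.bind(window);
    window.fetch = async (input, init) => {
      // input may be a string, a URL, or a Request; only Request has .url
      const url =
        typeof input === 'string' ? input :
        input instanceof URL ? input.href :
        input.url;
      // A Request carries its own method unless init overrides it
      const method = (
        init?.method ?? (input instanceof Request ? input.method : 'GET')
      ).toUpperCase();

      // Skip the SDK's own reporting calls so telemetry does not track itself
      // (assumes config.endpoint is the URL prefix the transport posts to)
      if (url.startsWith(this.config.endpoint)) {
        return originalFetch(input, init);
      }

      const startTime = performance.now();

      try {
        const response = await originalFetch(input, init);
        const duration = performance.now() - startTime;

        this.track({
          type: 'request',
          method,
          url: this.sanitizeUrl(url),
          status: response.status,
          duration,
          ok: response.ok
        });

        return response;
      } catch (error) {
        const duration = performance.now() - startTime;

        this.track({
          type: 'request',
          method,
          url: this.sanitizeUrl(url),
          status: 0, // network failure: no HTTP status available
          duration,
          ok: false,
          // catch variables are `unknown` under strict TypeScript
          error: error instanceof Error ? error.message : String(error)
        });

        throw error;
      }
    };
  }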

  private initUserInteractionCapture(): void {
    // Click tracking with rage click detection
    let clickBuffer: { timestamp: number; x: number; y: number }[] = [];

    document.addEventListener('click', (event) => {
      const now = Date.now();
      const click = { timestamp: now, x: event.clientX, y: event.clientY };

      // Detect rage clicks (3+ clicks in same area within 1 second)
      clickBuffer.push(click);
      clickBuffer = clickBuffer.filter(c => now - c.timestamp < 1000);

      const nearbyClicks = clickBuffer.filter(c =>
        Math.abs(c.x - click.x) < 30 && Math.abs(c.y - click.y) < 30
      );

      if (nearbyClicks.length >= 3) {
        this.track({
          type: 'frustration',
          name: 'rage-click',
          target: this.describeElement(event.target as Element),
          clickCount: nearbyClicks.length
        });
      }

      // Regular click tracking (sampled)
      if (this.shouldSample()) {
        this.track({
          type: 'interaction',
          name: 'click',
          target: this.describeElement(event.target as Element)
        });
      }
    }, true);
  }

  track(event: Partial<RUMEvent>): void {
    const fullEvent: RUMEvent = {
      ...event,
      timestamp: Date.now(),
      sessionId: this.session.id,
      pageUrl: window.location.href,
      pageTitle: document.title,
      viewport: {
        width: window.innerWidth,
        height: window.innerHeight
      },
      connection: this.getConnectionInfo(),
      appVersion: this.config.version
    } as RUMEvent;

    this.queue.push(fullEvent);
  }

  private shouldSample(): boolean {
    return Math.random() < this.config.sampleRate;
  }

  private shouldSampleError(): boolean {
    return Math.random() < this.config.errorSampleRate;
  }
}
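The CLS collector in the listing above adds every unexpected shift to one running total for the life of the page. The current Core Web Vitals definition of CLS instead reports the worst "session window": a burst of shifts with less than 1 second between consecutive shifts, capped at 5 seconds in total. The sketch below shows that calculation as a pure function. `ShiftSample` and `sessionWindowCls` are illustrative names, and the input is assumed to be sorted by `startTime`.

```typescript
interface ShiftSample {
  startTime: number;       // ms since navigation start
  value: number;           // layout shift score
  hadRecentInput: boolean; // true if the shift closely followed user input
}

// Returns the largest session-window sum of layout shift scores
function sessionWindowCls(shifts: ShiftSample[]): number {
  let worst = 0;
  let windowSum = 0;
  let windowStart = 0;
  let lastShift = 0;
  let windowOpen = false;

  for (const shift of shifts) {
    if (shift.hadRecentInput) continue; // expected shifts do not count

    const continuesWindow =
      windowOpen &&
      shift.startTime - lastShift < 1000 &&   // gap under 1 second
      shift.startTime - windowStart < 5000;   // window under 5 seconds

    if (continuesWindow) {
      windowSum += shift.value;
    } else {
      windowSum = shift.value;
      windowStart = shift.startTime;
      windowOpen = true;
    }

    lastShift = shift.startTime;
    worst = Math.max(worst, windowSum);
  }

  return worst;
}

const shifts: ShiftSample[] = [
  { startTime: 500, value: 0.05, hadRecentInput: false },
  { startTime: 900, value: 0.10, hadRecentInput: false },  // same window
  { startTime: 4000, value: 0.02, hadRecentInput: false }, // gap over 1s: new window
  { startTime: 4200, value: 0.30, hadRecentInput: true },  // ignored
  { startTime: 9000, value: 0.08, hadRecentInput: false }, // new window
];

const windowedCls = sessionWindowCls(shifts);
const runningTotalCls = shifts
  .filter((s) => !s.hadRecentInput)
  .reduce((sum, s) => sum + s.value, 0);

console.log({ windowedCls, runningTotalCls });
```

For this input the windowed value is about 0.15 while the running total is about 0.25. On long-lived pages the gap widens, so a running total can make RUM numbers look worse than the figures reported by field tools that use session windows.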

RUM Sampling Strategies

// Intelligent sampling for RUM

interface SamplingStrategy {
  type: 'fixed' | 'adaptive' | 'priority';
  baseRate: number;
  rules?: SamplingRule[];
}

interface SamplingRule {
  condition: (event: RUMEvent, session: Session) => boolean;
  rate: number;
  priority?: number;
}

class AdaptiveSampler {
  private baseRate: number;
  private rules: SamplingRule[];
  private sessionDecisions = new Map<string, boolean>();

  constructor(strategy: SamplingStrategy) {
    this.baseRate = strategy.baseRate;
    this.rules = strategy.rules || [];
  }

  shouldSample(event: RUMEvent, session: Session): boolean {
    // Errors always go through, whatever the session decision or rule order
    if (event.type === 'error') {
      return true;
    }

    // Session-level sampling: decide once, on the first event seen for the
    // session (normally its page-view), then stick with that decision
    let sessionDecision = this.sessionDecisions.get(session.id);
    if (sessionDecision === undefined) {
      sessionDecision = this.decideSession(session, event);
      this.sessionDecisions.set(session.id, sessionDecision);
    }

    if (!sessionDecision) {
      return false;
    }

    // Event-level rules: only rules without a priority apply here, because
    // prioritized rules were already consumed by the session decision
    for (const rule of this.rules) {
      if (rule.priority === undefined && rule.condition(event, session)) {
        return Math.random() < rule.rate;
      }
    }

    return true; // Session was sampled, include event
  }

  private decideSession(session: Session, firstEvent: RUMEvent): boolean {
    // Priority rules that override base rate, highest priority first
    const priorityRules = this.rules
      .filter(r => r.priority !== undefined)
      .sort((a, b) => (b.priority ?? 0) - (a.priority ?? 0));

    for (const rule of priorityRules) {
      // Evaluate against the first event so page-URL rules can match too
      if (rule.condition(firstEvent, session)) {
        return Math.random() < rule.rate;
      }
    }

    return Math.random() < this.baseRate;
  }
}
}

// Example sampling rules
const samplingStrategy: SamplingStrategy = {
  type: 'adaptive',
  baseRate: 0.1, // 10% base sampling

  rules: [
    // Oversample authenticated users (more valuable)
    {
      condition: (_, session) => !!session.userId,
      rate: 0.5, // 50% of auth users
      priority: 10
    },

    // Always sample critical pages
    {
      condition: (event) =>
        Boolean(
          event.pageUrl?.includes('/checkout') ||
          event.pageUrl?.includes('/payment')
        ),
      rate: 1.0, // 100% of checkout/payment
      priority: 20
    },

    // Reduce sampling for high-volume pages
    {
      condition: (event) =>
        Boolean(
          event.pageUrl?.includes('/search') ||
          event.pageUrl?.includes('/browse')
        ),
      rate: 0.01, // 1% of search/browse
      priority: 5
    },

    // Always capture errors
    {
      condition: (event) => event.type === 'error',
      rate: 1.0
    },

    // Always capture poor performance
    {
      condition: (event) =>
        event.type === 'web-vital' &&
        event.name === 'LCP' &&
        event.value > 4000,
      rate: 1.0
    },

    // Reduce sampling on slow connections (high volume, biased data)
    {
      condition: (_, session) =>
        session.device?.connection === 'slow-2g',
      rate: 0.01,
      priority: 1
    }
  ]
};
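Because these rules keep different pages at different rates (100% of checkout, 1% of search), aggregates computed directly over the collected events over-represent the heavily sampled pages. A common correction is to weight each kept event by the inverse of its sampling rate. The sketch below assumes each event records the rate at which it was kept; that `sampleRate` field and the `weightedPercentile` helper are hypothetical and do not appear in the listings above.

```typescript
interface SampledValue {
  value: number;      // e.g. LCP in ms
  sampleRate: number; // hypothetical field: rate at which the event was kept, 0 < rate <= 1
}

// Weighted nearest-rank percentile: each kept event stands in for
// 1 / sampleRate real events.
function weightedPercentile(samples: SampledValue[], p: number): number {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a.value - b.value);
  const totalWeight = sorted.reduce((sum, s) => sum + 1 / s.sampleRate, 0);
  const target = (p / 100) * totalWeight;

  let cumulative = 0;
  for (const s of sorted) {
    cumulative += 1 / s.sampleRate;
    if (cumulative >= target) return s.value;
  }
  return sorted[sorted.length - 1].value;
}

// Four checkout events kept at 100%, one search event kept at 1%
const collected: SampledValue[] = [
  { value: 3000, sampleRate: 1.0 },
  { value: 3200, sampleRate: 1.0 },
  { value: 3400, sampleRate: 1.0 },
  { value: 3600, sampleRate: 1.0 },
  { value: 1200, sampleRate: 0.01 },
];

// Naive view: pretend every event was kept at 100%
const naiveP50 = weightedPercentile(
  collected.map((s) => ({ ...s, sampleRate: 1 })),
  50
);

// Corrected view: the single search event represents about 100 real ones
const weightedP50 = weightedPercentile(collected, 50);

console.log({ naiveP50, weightedP50 });
```

In this toy data set the naive median is 3200 ms, dominated by checkout, while the weighted median is 1200 ms, which is closer to what the typical page view experienced.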

Synthetic Monitoring Architecture Deep Dive

System Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                    SYNTHETIC MONITORING ARCHITECTURE                         │
│                                                                              │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                    CONTROL PLANE                                     │    │
│  │                                                                      │    │
│  │  ┌─────────────────────────────────────────────────────────────┐    │    │
│  │  │                 SCHEDULER                                    │    │    │
│  │  │                                                              │    │    │
│  │  │  Manages test execution:                                     │    │    │
│  │  │  • Cron-like scheduling (every 1, 5, 15 min, etc.)          │    │    │
│  │  │  • Geographic distribution                                   │    │    │
│  │  │  • Test prioritization                                       │    │    │
│  │  │  • Rate limiting per target                                  │    │    │
│  │  │                                                              │    │    │
│  │  │  ┌──────────────────────────────────────────────────────┐   │    │    │
│  │  │  │  Test Queue                                          │   │    │    │
│  │  │  │  ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐   │   │    │    │
│  │  │  │  │NYC-1│ │LON-1│ │TYO-1│ │NYC-2│ │SYD-1│ │...  │   │   │    │    │
│  │  │  │  └─────┘ └─────┘ └─────┘ └─────┘ └─────┘ └─────┘   │   │    │    │
│  │  │  └──────────────────────────────────────────────────────┘   │    │    │
│  │  │                                                              │    │    │
│  │  └─────────────────────────────────────────────────────────────┘    │    │
│  │                                                                      │    │
│  │  ┌─────────────────────────────────────────────────────────────┐    │    │
│  │  │                 TEST CONFIGURATION                           │    │    │
│  │  │                                                              │    │    │
│  │  │  • Script repository                                         │    │    │
│  │  │  • Environment variables / secrets                           │    │    │
│  │  │  • Threshold definitions                                     │    │    │
│  │  │  • Alert configurations                                      │    │    │
│  │  │                                                              │    │    │
│  │  └─────────────────────────────────────────────────────────────┘    │    │
│  │                                                                      │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                      │                                       │
│                      Test Assignment │                                       │
│                                      ▼                                       │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                    AGENT LAYER                                       │    │
│  │                    (Globally Distributed)                            │    │
│  │                                                                      │    │
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐                 │    │
│  │  │  NYC Agent  │  │ London Agent│  │ Tokyo Agent │  ...            │    │
│  │  │             │  │             │  │             │                 │    │
│  │  │ ┌─────────┐ │  │ ┌─────────┐ │  │ ┌─────────┐ │                 │    │
│  │  │ │ Chrome  │ │  │ │ Chrome  │ │  │ │ Chrome  │ │                 │    │
│  │  │ │ Headless│ │  │ │ Headless│ │  │ │ Headless│ │                 │    │
│  │  │ └─────────┘ │  │ └─────────┘ │  │ └─────────┘ │                 │    │
│  │  │ ┌─────────┐ │  │ ┌─────────┐ │  │ ┌─────────┐ │                 │    │
│  │  │ │ Firefox │ │  │ │ Firefox │ │  │ │ Firefox │ │                 │    │
│  │  │ │ Headless│ │  │ │ Headless│ │  │ │ Headless│ │                 │    │
│  │  │ └─────────┘ │  │ └─────────┘ │  │ └─────────┘ │                 │    │
│  │  │             │  │             │  │             │                 │    │
│  │  │ Network:    │  │ Network:    │  │ Network:    │                 │    │
│  │  │ Cable sim   │  │ Cable sim   │  │ Cable sim   │                 │    │
│  │  │ 3G sim      │  │ 3G sim      │  │ 3G sim      │                 │    │
│  │  │             │  │             │  │             │                 │    │
│  │  └─────────────┘  └─────────────┘  └─────────────┘                 │    │
│  │                                                                      │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                      │                                       │
│                         Test Results │                                       │
│                                      ▼                                       │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                    RESULTS PROCESSING                                │    │
│  │                                                                      │    │
│  │  • Result validation                                                 │    │
│  │  • Metric extraction                                                 │    │
│  │  • Screenshot/video storage                                         │    │
│  │  • Alert evaluation                                                  │    │
│  │  • Trend analysis                                                    │    │
│  │                                                                      │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
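One way to picture the scheduler's job is as a function from "current minute" to "runs that are due". The sketch below is an assumption about how cron-like scheduling and geographic distribution could be combined, not a description of any particular product: it staggers locations by one minute each so that agents do not all hit the target at the same instant. `ScheduledTest`, `TestRun`, and `dueRuns` are illustrative names.

```typescript
interface ScheduledTest {
  id: string;
  frequencyMinutes: number;
  locations: string[];
}

interface TestRun {
  testId: string;
  location: string;
}

// Returns the runs due at `minute` (minutes since an arbitrary epoch).
// Location i is offset by i minutes, wrapped to the test's frequency.
function dueRuns(tests: ScheduledTest[], minute: number): TestRun[] {
  const runs: TestRun[] = [];
  for (const test of tests) {
    test.locations.forEach((location, index) => {
      const offset = index % test.frequencyMinutes;
      if ((minute - offset) % test.frequencyMinutes === 0) {
        runs.push({ testId: test.id, location });
      }
    });
  }
  return runs;
}

const scheduledTests: ScheduledTest[] = [
  {
    id: 'checkout-flow-e2e',
    frequencyMinutes: 5,
    locations: ['us-east', 'us-west', 'eu-west', 'ap-northeast'],
  },
];

// Walk through one full cycle plus the start of the next
const timeline = [0, 1, 2, 3, 4, 5].map((minute) =>
  dueRuns(scheduledTests, minute).map((run) => run.location)
);

console.log(timeline);
```

With four locations and a five-minute frequency, each location still runs every five minutes, but the target sees one run per minute instead of four at once, and a failure is observed by a different region within a minute.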

Test Script Architecture

// Synthetic test script structure

interface SyntheticTest {
  id: string;
  name: string;
  type: 'browser' | 'api' | 'multi-step';
  frequency: number; // minutes
  locations: string[];
  timeout: number; // ms
  retries: number;
  alerting: AlertConfig;
  script: TestScript;
}

interface TestScript {
  steps: TestStep[];
  assertions: Assertion[];
  customMetrics?: CustomMetric[];
}

interface TestStep {
  action: 'navigate' | 'click' | 'type' | 'wait' | 'screenshot' | 'custom';
  target?: string; // CSS selector or URL
  value?: string;
  timeout?: number;
  waitUntil?: 'load' | 'domcontentloaded' | 'networkidle';
}

// Example: E-commerce checkout flow test
const checkoutFlowTest: SyntheticTest = {
  id: 'checkout-flow-e2e',
  name: 'Checkout Flow - Complete Purchase',
  type: 'browser',
  frequency: 5, // Every 5 minutes
  locations: ['us-east', 'us-west', 'eu-west', 'ap-northeast'],
  timeout: 60000,
  retries: 1,
  alerting: {
    onFailure: { channels: ['pagerduty', 'slack-critical'] },
    onDegradation: { channels: ['slack-frontend'] },
    thresholds: {
      totalDuration: { warning: 15000, critical: 30000 },
      stepDuration: { warning: 5000, critical: 10000 }
    }
  },
  script: {
    steps: [
      {
        action: 'navigate',
        target: 'https://shop.example.com',
        waitUntil: 'networkidle'
      },
      {
        action: 'click',
        target: '[data-testid="product-card"]:first-child',
        timeout: 5000
      },
      {
        action: 'wait',
        target: '[data-testid="product-detail"]',
        timeout: 5000
      },
      {
        action: 'click',
        target: '[data-testid="add-to-cart"]'
      },
      {
        action: 'wait',
        target: '[data-testid="cart-indicator"]:not(:empty)',
        timeout: 3000
      },
      {
        action: 'navigate',
        target: 'https://shop.example.com/cart'
      },
      {
        action: 'click',
        target: '[data-testid="checkout-button"]'
      },
      {
        action: 'wait',
        target: '[data-testid="checkout-form"]',
        timeout: 5000
      },
      {
        action: 'screenshot',
        target: 'checkout-form-loaded'
      }
    ],
    assertions: [
      {
        type: 'elementPresent',
        target: '[data-testid="checkout-form"]'
      },
      {
        type: 'textContains',
        target: '[data-testid="cart-total"]',
        value: '$'
      }
    ],
    customMetrics: [
      {
        name: 'time_to_checkout_form',
        type: 'timing',
        startMark: 'navigation_start',
        endMark: 'checkout_form_visible'
      }
    ]
  }
};
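The `thresholds` block in the test definition implies a severity classification that the article does not show. The following is a minimal sketch of how it might be evaluated, assuming inclusive (`>=`) comparisons and that the worst severity across the total duration and every step wins. `Severity`, `DurationThreshold`, `classifyDuration`, and `classifyRun` are hypothetical names introduced for illustration.

```typescript
// Hypothetical helpers: not part of any vendor API or of the article's agent.
type Severity = 'ok' | 'warning' | 'critical';

interface DurationThreshold {
  warning: number;  // ms
  critical: number; // ms
}

// Assumption: a duration equal to a threshold already counts as breaching it.
function classifyDuration(durationMs: number, threshold: DurationThreshold): Severity {
  if (durationMs >= threshold.critical) return 'critical';
  if (durationMs >= threshold.warning) return 'warning';
  return 'ok';
}

// Returns the worst severity across the whole run and each individual step.
function classifyRun(
  totalMs: number,
  stepDurationsMs: number[],
  thresholds: { totalDuration: DurationThreshold; stepDuration: DurationThreshold }
): Severity {
  const rank: Record<Severity, number> = { ok: 0, warning: 1, critical: 2 };
  let worst = classifyDuration(totalMs, thresholds.totalDuration);
  for (const ms of stepDurationsMs) {
    const severity = classifyDuration(ms, thresholds.stepDuration);
    if (rank[severity] > rank[worst]) worst = severity;
  }
  return worst;
}
```

With the thresholds from the checkout test, a 12-second run containing one 6-second step would classify as `warning`, which under this sketch would route to the `onDegradation` channels rather than paging anyone.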

Agent Implementation

// Synthetic monitoring agent

import * as crypto from 'node:crypto'; // randomUUID() is used for run IDs below
import puppeteer, { Browser, Page } from 'puppeteer';

interface TestResult {
  testId: string;
  runId: string;
  location: string;
  startTime: number;
  endTime: number;
  status: 'pass' | 'fail' | 'error';
  error?: string;
  metrics: TestMetrics;
  stepResults: StepResult[];
  screenshots: Screenshot[];
  networkRequests: NetworkRequest[];
  consoleMessages: ConsoleMessage[];
}

interface TestMetrics {
  totalDuration: number;
  webVitals: {
    lcp?: number;
    inp?: number; // INP replaced FID as a Core Web Vital in March 2024
    cls?: number;
    ttfb?: number;
  };
  resourceMetrics: {
    requestCount: number;
    transferSize: number;
    domContentLoaded: number;
    load: number;
  };
  customMetrics: Record<string, number>;
}

class SyntheticAgent {
  private browser: Browser | null = null;
  private location: string;

  constructor(location: string) {
    this.location = location;
  }

  async initialize(): Promise<void> {
    this.browser = await puppeteer.launch({
      headless: true,
      args: [
        '--no-sandbox',
        '--disable-setuid-sandbox',
        '--disable-dev-shm-usage',
        '--disable-gpu'
      ]
    });
  }

  async runTest(test: SyntheticTest): Promise<TestResult> {
    const runId = crypto.randomUUID();
    const startTime = Date.now();

    const page = await this.browser!.newPage();

    const result: TestResult = {
      testId: test.id,
      runId,
      location: this.location,
      startTime,
      endTime: 0,
      status: 'pass',
      metrics: this.initMetrics(),
      stepResults: [],
      screenshots: [],
      networkRequests: [],
      consoleMessages: []
    };

    try {
      // Setup page monitoring
      await this.setupPageMonitoring(page, result);

      // Execute test steps
      for (let i = 0; i < test.script.steps.length; i++) {
        const step = test.script.steps[i];
        const stepResult = await this.executeStep(page, step, i);
        result.stepResults.push(stepResult);

        if (stepResult.status === 'fail') {
          result.status = 'fail';
          result.error = stepResult.error;
          break;
        }
      }

      // Run assertions
      if (result.status === 'pass') {
        for (const assertion of test.script.assertions) {
          const assertionResult = await this.checkAssertion(page, assertion);
          if (!assertionResult.pass) {
            result.status = 'fail';
            result.error = assertionResult.message;
            break;
          }
        }
      }

      // Collect final metrics
      result.metrics = await this.collectMetrics(page);

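    } catch (error) {
      result.status = 'error';
      // `error` is `unknown` under strict TypeScript, so narrow before reading .message
      result.error = error instanceof Error ? error.message : String(error);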

      // Capture screenshot on error
      const screenshot = await this.captureScreenshot(page, 'error');
      result.screenshots.push(screenshot);

    } finally {
      result.endTime = Date.now();
      result.metrics.totalDuration = result.endTime - startTime;

      await page.close();
    }

    return result;
  }

  private async setupPageMonitoring(page: Page, result: TestResult): Promise<void> {
    // Network monitoring
    page.on('request', (request) => {
      result.networkRequests.push({
        url: request.url(),
        method: request.method(),
        resourceType: request.resourceType(),
        startTime: Date.now()
      });
    });

    page.on('response', (response) => {
      const request = result.networkRequests.find(
        r => r.url === response.url() && !r.status
      );
      if (request) {
        request.status = response.status();
        request.endTime = Date.now();
      }
    });

    // Console monitoring
    page.on('console', (msg) => {
      result.consoleMessages.push({
        type: msg.type(),
        text: msg.text(),
        timestamp: Date.now()
      });
    });

    // Inject performance observer
    await page.evaluateOnNewDocument(() => {
      (window as any).__syntheticMetrics = {
        lcp: 0,
        cls: 0
      };

      const lcpObserver = new PerformanceObserver((list) => {
        const entries = list.getEntries();
        const last = entries[entries.length - 1];
        (window as any).__syntheticMetrics.lcp = last.startTime;
      });
      lcpObserver.observe({ type: 'largest-contentful-paint', buffered: true });

      const clsObserver = new PerformanceObserver((list) => {
        for (const entry of list.getEntries() as any[]) {
          if (!entry.hadRecentInput) {
            (window as any).__syntheticMetrics.cls += entry.value;
          }
        }
      });
      clsObserver.observe({ type: 'layout-shift', buffered: true });
    });
  }

  private async executeStep(
    page: Page,
    step: TestStep,
    index: number
  ): Promise<StepResult> {
    const startTime = Date.now();
    const result: StepResult = {
      index,
      action: step.action,
      target: step.target,
      startTime,
      endTime: 0,
      status: 'pass'
    };

    try {
      switch (step.action) {
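        case 'navigate':
          await page.goto(step.target!, {
            // Puppeteer has no plain 'networkidle' value; map it to 'networkidle2'
            // (no more than 2 network connections for at least 500 ms).
            waitUntil: step.waitUntil === 'networkidle'
              ? 'networkidle2'
              : step.waitUntil || 'load',
            timeout: step.timeout || 30000
          });
          break;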

        case 'click':
          await page.waitForSelector(step.target!, {
            timeout: step.timeout || 10000
          });
          await page.click(step.target!);
          break;

        case 'type':
          await page.waitForSelector(step.target!, {
            timeout: step.timeout || 10000
          });
          await page.type(step.target!, step.value!);
          break;

        case 'wait':
          await page.waitForSelector(step.target!, {
            timeout: step.timeout || 10000
          });
          break;

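        case 'screenshot': {
          // Braces scope the declaration to this case.
          const screenshot = await this.captureScreenshot(page, step.target!);
          // Store screenshot...
          break;
        }

        default:
          // 'custom' steps have no handler here; fail loudly instead of passing silently.
          throw new Error(`Unsupported step action: ${step.action}`);
      }

    } catch (error) {
      result.status = 'fail';
      result.error = error instanceof Error ? error.message : String(error);
    }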

    result.endTime = Date.now();
    result.duration = result.endTime - startTime;

    return result;
  }

  private async collectMetrics(page: Page): Promise<TestMetrics> {
    const performanceMetrics = await page.evaluate(() => {
      const timing = performance.getEntriesByType('navigation')[0] as PerformanceNavigationTiming;
      const syntheticMetrics = (window as any).__syntheticMetrics || {};

      return {
        // TTFB is measured from navigation start, consistent with the timings below
        ttfb: timing.responseStart - timing.startTime,
        domContentLoaded: timing.domContentLoadedEventEnd - timing.startTime,
        load: timing.loadEventEnd - timing.startTime,
        lcp: syntheticMetrics.lcp || 0,
        cls: syntheticMetrics.cls || 0
      };
    });

    const resourceCount = await page.evaluate(() => {
      return performance.getEntriesByType('resource').length;
    });

    return {
      totalDuration: 0, // Set by caller
      webVitals: {
        lcp: performanceMetrics.lcp,
        cls: performanceMetrics.cls,
        ttfb: performanceMetrics.ttfb
      },
      resourceMetrics: {
        requestCount: resourceCount,
        transferSize: 0, // Would need network interception
        domContentLoaded: performanceMetrics.domContentLoaded,
        load: performanceMetrics.load
      },
      customMetrics: {}
    };
  }
}
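One caveat about the injected observer: it adds up every layout shift on the page, whereas the CLS metric is defined as the largest "session window" of shifts, where a window closes after a 1-second gap or once it spans 5 seconds. A rough sketch of that windowing follows, written as a pure function so it can be checked outside a browser. `LayoutShiftSample` and `computeCls` are hypothetical names, and entries are assumed to arrive sorted by `startTime`.

```typescript
// Hypothetical shape mirroring the fields read from layout-shift entries.
interface LayoutShiftSample {
  startTime: number;       // ms since navigation start
  value: number;           // layout shift score for this entry
  hadRecentInput: boolean; // shifts right after user input are excluded
}

// Returns the largest session-window sum (1 s max gap, 5 s max window length).
function computeCls(entries: LayoutShiftSample[]): number {
  let maxWindow = 0;
  let current = 0;
  let windowStart = 0;
  let previousShift = 0;
  let open = false;

  for (const entry of entries) {
    if (entry.hadRecentInput) continue;

    const extendsWindow =
      open &&
      entry.startTime - previousShift < 1000 &&
      entry.startTime - windowStart < 5000;

    if (extendsWindow) {
      current += entry.value;
    } else {
      // Start a new session window at this shift.
      current = entry.value;
      windowStart = entry.startTime;
      open = true;
    }

    previousShift = entry.startTime;
    maxWindow = Math.max(maxWindow, current);
  }

  return maxWindow;
}
```

For short synthetic page loads the two approaches often agree, but on long multi-step flows a plain running sum can overstate CLS compared with what RUM reports for the same page.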

Combining RUM and Synthetic

Unified Monitoring Strategy

┌─────────────────────────────────────────────────────────────────────────────┐
│                    UNIFIED MONITORING STRATEGY                               │
│                                                                              │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                    DETECTION MATRIX                                  │    │
│  │                                                                      │    │
│  │  Issue Type        │ Synthetic    │ RUM       │ Combined             │    │
│  │  ──────────────────┼──────────────┼───────────┼──────────────────    │    │
│  │  Site down         │ ✓ Primary    │ Secondary │ Fast detection       │    │
│  │  New deployment    │ ✓ Primary    │ Validation│ Deploy + validate    │    │
│  │  regression        │              │           │                      │    │
│  │  Regional outage   │ ✓ Primary    │ Confirms  │ Agent + user impact  │    │
│  │  Third-party slow  │ Partial      │ ✓ Primary │ RUM sees real impact │    │
│  │  Mobile perf       │ Simulated    │ ✓ Primary │ RUM for reality      │    │
│  │  Edge case bugs    │ Limited      │ ✓ Primary │ User discovery       │    │
│  │  Baseline/trend    │ ✓ Primary    │ Secondary │ Stable synthetic     │    │
│  │  User segments     │ N/A          │ ✓ Only    │ RUM required         │    │
│  │                                                                      │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                                                              │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                    CORRELATION ARCHITECTURE                          │    │
│  │                                                                      │    │
│  │              Synthetic                        RUM                    │    │
│  │              ────────                         ───                    │    │
│  │  ┌───────────────────────────┐   ┌───────────────────────────┐     │    │
│  │  │ Test: Homepage Load       │   │ Page: Homepage            │     │    │
│  │  │ Location: NYC             │   │ Users: 50,000/hr          │     │    │
│  │  │ LCP: 1.8s                 │   │ LCP P75: 2.4s             │     │    │
│  │  │ Trend: Stable             │   │ Trend: Increasing         │     │    │
│  │  └───────────────────────────┘   └───────────────────────────┘     │    │
│  │              │                               │                       │    │
│  │              └───────────────┬───────────────┘                       │    │
│  │                              │                                       │    │
│  │                              ▼                                       │    │
│  │  ┌───────────────────────────────────────────────────────────────┐  │    │
│  │  │                  CORRELATION ENGINE                            │  │    │
│  │  │                                                                │  │    │
│  │  │  Insight: RUM LCP increasing while Synthetic stable           │  │    │
│  │  │  Hypothesis: Issue affects subset of users                    │  │    │
│  │  │                                                                │  │    │
│  │  │  RUM Deep Dive:                                                │  │    │
│  │  │  • Mobile users: LCP 3.8s (+80% vs desktop)                   │  │    │
│  │  │  • Chrome Android: Primary affected segment                   │  │    │
│  │  │  • Resource: hero-image.webp timing anomaly                   │  │    │
│  │  │                                                                │  │    │
│  │  │  Root Cause: WebP decoding slow on mid-tier Android          │  │    │
│  │  │  (Synthetic runs Chrome desktop, doesn't see this)            │  │    │
│  │  │                                                                │  │    │
│  │  └───────────────────────────────────────────────────────────────┘  │    │
│  │                                                                      │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
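The "RUM increasing while Synthetic stable" insight in the diagram can be approximated with a percentile and a trend comparison. This is only a sketch: the nearest-rank percentile method, the 10% tolerance, and the names `percentile`, `trend`, and `diverges` are assumptions for illustration, not the article's correlation engine.

```typescript
// Nearest-rank percentile over raw samples (e.g. LCP values in ms).
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error('no samples');
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

type Trend = 'stable' | 'increasing' | 'decreasing';

// Classifies the relative change between two periods.
// Assumption: changes within +/-10% count as stable.
function trend(previous: number, current: number, tolerance = 0.1): Trend {
  const change = (current - previous) / previous;
  if (change > tolerance) return 'increasing';
  if (change < -tolerance) return 'decreasing';
  return 'stable';
}

// True for the pattern in the diagram: synthetic flat, RUM getting worse.
function diverges(syntheticTrend: Trend, rumTrend: Trend): boolean {
  return syntheticTrend === 'stable' && rumTrend === 'increasing';
}
```

When `diverges` returns true, the next step would be slicing the RUM data by device, browser, and region, as in the deep dive the diagram describes.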

Alert Correlation

// Correlating synthetic and RUM alerts

interface CorrelatedAlert {
  id: string;
  type: 'synthetic' | 'rum' | 'correlated';
  syntheticAlert?: SyntheticAlert;
  rumAlert?: RUMAlert;
  correlation: CorrelationAnalysis;
  recommendedAction: string;
}

interface CorrelationAnalysis {
  syntheticStatus: 'passing' | 'failing' | 'degraded' | 'unknown';
  rumStatus: 'healthy' | 'degraded' | 'critical' | 'unknown';
  agreement: 'aligned' | 'synthetic-only' | 'rum-only' | 'contradicting';
  confidence: number;
  insights: string[];
}

class AlertCorrelator {
  async correlate(
    syntheticAlerts: SyntheticAlert[],
    rumAlerts: RUMAlert[],
    timeWindow: number
  ): Promise<CorrelatedAlert[]> {
    const correlated: CorrelatedAlert[] = [];

    // Group by resource/page
    const syntheticByPage = this.groupByPage(syntheticAlerts);
    const rumByPage = this.groupByPage(rumAlerts);

    const allPages = new Set([
      ...syntheticByPage.keys(),
      ...rumByPage.keys()
    ]);

    for (const page of allPages) {
      const syntheticForPage = syntheticByPage.get(page) || [];
      const rumForPage = rumByPage.get(page) || [];

      const analysis = this.analyze(syntheticForPage, rumForPage);

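      // Skip only pages where both systems confidently agree everything is healthy.
      // An aligned failure is the strongest signal and must be surfaced.
      const bothHealthy =
        analysis.syntheticStatus === 'passing' && analysis.rumStatus === 'healthy';

      if (!bothHealthy || analysis.confidence < 0.8) {
        correlated.push({
          id: `correlated-${Date.now()}-${page}`,
          type: syntheticForPage.length > 0 && rumForPage.length > 0
            ? 'correlated'
            : syntheticForPage.length > 0 ? 'synthetic' : 'rum',
          syntheticAlert: syntheticForPage[0],
          rumAlert: rumForPage[0],
          correlation: analysis,
          recommendedAction: this.getRecommendation(analysis)
        });
      }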
    }

    return correlated;
  }

  private analyze(
    syntheticAlerts: SyntheticAlert[],
    rumAlerts: RUMAlert[]
  ): CorrelationAnalysis {
    const syntheticStatus = this.getSyntheticStatus(syntheticAlerts);
    const rumStatus = this.getRUMStatus(rumAlerts);

    let agreement: CorrelationAnalysis['agreement'];
    let insights: string[] = [];

    // Treat 'degraded' as unhealthy on either side so partial degradations
    // are classified instead of falling through to 'contradicting'.
    const syntheticUnhealthy =
      syntheticStatus === 'failing' || syntheticStatus === 'degraded';
    const rumUnhealthy = rumStatus === 'critical' || rumStatus === 'degraded';

    if (syntheticStatus === 'unknown' || rumStatus === 'unknown') {
      agreement = 'contradicting';
      insights.push('Inconclusive - at least one system has no usable signal');
    } else if (syntheticUnhealthy && rumUnhealthy) {
      agreement = 'aligned';
      insights.push('Both synthetic and RUM indicate issues');
    } else if (!syntheticUnhealthy && !rumUnhealthy) {
      agreement = 'aligned';
      insights.push('Both systems report healthy');
    } else if (syntheticUnhealthy) {
      agreement = 'synthetic-only';
      insights.push('Synthetic unhealthy but RUM healthy');
      insights.push('Possible: Test environment issue or false positive');
      insights.push('Possible: Issue only affects specific test scenario');
    } else {
      agreement = 'rum-only';
      insights.push('RUM unhealthy but Synthetic passing');
      insights.push('Possible: Issue affects user segments not covered by synthetic');
      insights.push('Possible: Third-party or CDN issue affecting real users');
      insights.push('Possible: Mobile or slow network impact');
    }

    return {
      syntheticStatus,
      rumStatus,
      agreement,
      confidence: this.calculateConfidence(syntheticAlerts, rumAlerts),
      insights
    };
  }

  private getRecommendation(analysis: CorrelationAnalysis): string {
    switch (analysis.agreement) {
      case 'aligned':
        // 'aligned' covers both "both unhealthy" and "both healthy"
        return analysis.rumStatus !== 'healthy'
          ? 'Investigate immediately - both systems confirm issue'
          : 'No action needed - systems agree on healthy status';

      case 'synthetic-only':
        return 'Check synthetic test configuration and environment. ' +
               'Verify test is representative of real usage. ' +
               'May be false positive if RUM is healthy.';

      case 'rum-only':
        return 'Investigate RUM data for affected segments. ' +
               'Check mobile users, slow connections, specific regions. ' +
               'Expand synthetic coverage if pattern found.';

      case 'contradicting':
        return 'Manual investigation needed. ' +
               'Check both data sources for anomalies. ' +
               'May indicate monitoring configuration issue.';
    }
  }
}
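The correlator relies on a `groupByPage` helper that is not shown, and its `timeWindow` parameter is never used. One way the two might fit together is sketched below, assuming every alert carries a `page` key and a `timestamp` in epoch milliseconds. `PageAlert` and this standalone `groupByPage` signature are hypothetical.

```typescript
// Hypothetical minimal shape shared by synthetic and RUM alerts.
interface PageAlert {
  page: string;      // e.g. '/checkout'
  timestamp: number; // epoch ms
}

// Groups alerts by page, dropping any older than the correlation window.
function groupByPage<T extends PageAlert>(
  alerts: T[],
  now: number,
  timeWindowMs: number
): Map<string, T[]> {
  const grouped = new Map<string, T[]>();
  for (const alert of alerts) {
    if (now - alert.timestamp > timeWindowMs) continue; // outside the window
    const bucket = grouped.get(alert.page) ?? [];
    bucket.push(alert);
    grouped.set(alert.page, bucket);
  }
  return grouped;
}
```

Filtering by window before grouping keeps a stale synthetic failure from being paired with an unrelated RUM alert that fired much later on the same page.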

Deployment Validation Workflow

sequenceDiagram
    participant D as Deployment
    participant S as Synthetic
    participant R as RUM
    participant A as Alerting

    Note over D: Deployment starts

    D->>S: Trigger deployment validation tests
    S->>S: Run critical flow tests (all locations)

    alt Synthetic Tests Pass
        S->>D: Tests pass
        D->>D: Route 10% traffic to new version

        Note over R: RUM starts collecting<br/>from 10% canary

        loop Every 5 minutes for 30 min
            R->>R: Aggregate canary metrics
            R->>A: Compare canary vs baseline

            alt Metrics within tolerance
                A->>D: Continue rollout
            else Metrics degraded
                A->>D: Halt rollout
                D->>D: Rollback to previous
            end
        end

        Note over D: Only if no check halted the rollout
        D->>D: Full rollout (100%)

    else Synthetic Tests Fail
        S->>D: Tests fail
        D->>D: Abort deployment
        S->>A: Alert: Pre-deployment validation failed
    end
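The "Compare canary vs baseline" step in the workflow could look something like the following. It is a sketch under assumed tolerances (10% LCP regression, 0.5 percentage points of extra error rate). The `CanarySnapshot` and `Tolerance` shapes and the `evaluateCanary` function are hypothetical, not taken from a specific deployment tool.

```typescript
// Hypothetical aggregate of RUM metrics for one traffic cohort.
interface CanarySnapshot {
  lcpP75Ms: number;  // 75th percentile LCP in ms
  errorRate: number; // fraction of sessions with a JS error, 0..1
}

interface Tolerance {
  maxLcpRegression: number;     // relative, e.g. 0.10 = 10% slower
  maxErrorRateIncrease: number; // absolute, e.g. 0.005 = +0.5 points
}

type GateDecision = 'continue' | 'halt';

// Halts the rollout if the canary is meaningfully worse than the baseline.
function evaluateCanary(
  baseline: CanarySnapshot,
  canary: CanarySnapshot,
  tolerance: Tolerance = { maxLcpRegression: 0.1, maxErrorRateIncrease: 0.005 }
): GateDecision {
  const lcpRegression = (canary.lcpP75Ms - baseline.lcpP75Ms) / baseline.lcpP75Ms;
  const errorIncrease = canary.errorRate - baseline.errorRate;

  if (lcpRegression > tolerance.maxLcpRegression) return 'halt';
  if (errorIncrease > tolerance.maxErrorRateIncrease) return 'halt';
  return 'continue';
}
```

In practice the comparison would also need a minimum sample size per 5-minute window, since a 10% canary on a low-traffic page may not produce enough sessions for the percentiles to be meaningful.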

Tradeoffs and Decision Framework

Cost Analysis

┌─────────────────────────────────────────────────────────────────────────────┐
│                    COST COMPARISON                                           │
│                                                                              │
│  Scenario: 25M DAU, 200M page views/day                                     │
│                                                                              │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │  RUM COSTS                                                           │    │
│  │                                                                      │    │
│  │  Events per page view: ~10 (vitals, resources, interactions)        │    │
│  │  Total events/day: 2 billion                                        │    │
│  │                                                                      │    │
│  │  At 10% sampling: 200M events/day                                   │    │
│  │                                                                      │    │
│  │  Ingestion cost: ~$2,000/day ($0.01 per 1000 events)               │    │
│  │  Storage cost: ~$500/day (30 day retention)                         │    │
│  │  Processing cost: ~$1,000/day                                       │    │
│  │                                                                      │    │
│  │  Total: ~$3,500/day = $105,000/month                                │    │
│  │                                                                      │    │
│  │  OR: Use managed RUM (Datadog, New Relic)                           │    │
│  │  ~$0.10 per 1000 sessions × 75M sampled sessions/month              │    │
│  │  ≈ $7,500/month at 10% sampling (assumes ~1 session/user/day)       │    │
│  │                                                                      │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                                                              │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │  SYNTHETIC COSTS                                                     │    │
│  │                                                                      │    │
│  │  Tests: 200 scripts                                                 │    │
│  │  Frequency: Every 5 minutes                                         │    │
│  │  Locations: 30                                                      │    │
│  │                                                                      │    │
│  │  Runs per day: 200 × 288 × 30 = 1.7M                               │    │
│  │                                                                      │    │
│  │  Self-hosted agents:                                                │    │
│  │  - 30 servers × $200/month = $6,000/month                          │    │
│  │  - Engineering time: $10,000/month equivalent                       │    │
│  │                                                                      │    │
│  │  OR: Managed synthetic (Datadog, Catchpoint)                        │    │
│  │  - ~$0.10 per run × 1.7M runs/day ≈ $170,000/day at list price      │    │
│  │  - Typical with volume discount: ~$50,000/month (~$0.001/run)       │    │
│  │                                                                      │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                                                              │
│  Key Insight:                                                                │
│  - RUM cost scales with traffic (sampling helps)                            │
│  - Synthetic cost scales with test count and frequency                      │
│  - At high scale, self-hosted can be significantly cheaper                  │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
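The self-hosted RUM figures and the synthetic run count above are straightforward arithmetic, reproduced here so the inputs can be varied. The function names are hypothetical, and the prices are simply the ones quoted in the box rather than current vendor rates.

```typescript
// Events that survive sampling each day.
function rumDailyEvents(
  pageViewsPerDay: number,
  eventsPerPageView: number,
  sampleRate: number
): number {
  return pageViewsPerDay * eventsPerPageView * sampleRate;
}

// Ingestion cost at a flat price per 1000 events.
function ingestionCostPerDay(eventsPerDay: number, pricePer1000Events: number): number {
  return (eventsPerDay / 1000) * pricePer1000Events;
}

// Each test runs once per interval from every location.
function syntheticRunsPerDay(
  tests: number,
  frequencyMinutes: number,
  locations: number
): number {
  const runsPerTestPerDay = (24 * 60) / frequencyMinutes;
  return tests * runsPerTestPerDay * locations;
}

const events = rumDailyEvents(200_000_000, 10, 0.1); // about 200M events/day
const ingestion = ingestionCostPerDay(events, 0.01); // about $2,000/day
const runs = syntheticRunsPerDay(200, 5, 30);        // 1,728,000 runs/day
console.log({ events, ingestion, runs });
```

Halving the RUM sample rate halves the event volume, while moving synthetic tests from 5-minute to 15-minute intervals cuts runs by two thirds, which is the scaling difference the key insight describes.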

Blind Spots

┌─────────────────────────────────────────────────────────────────────────────┐
│                    BLIND SPOTS                                               │
│                                                                              │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │  RUM BLIND SPOTS                                                     │    │
│  │                                                                      │    │
│  │  1. No traffic = No data                                            │    │
│  │     • Off-hours coverage gaps                                       │    │
│  │     • New features with low adoption                                │    │
│  │     • Regional pages with low traffic                               │    │
│  │                                                                      │    │
│  │  2. Sampling gaps                                                    │    │
│  │     • Rare errors may be missed at low sample rates                 │    │
│  │     • Edge cases in long-tail segments                              │    │
│  │                                                                      │    │
│  │  3. Detection latency                                                │    │
│  │     • Need statistical significance (5-15 min at scale)             │    │
│  │     • Users experience issue before detection                       │    │
│  │                                                                      │    │
│  │  4. SDK dependency                                                   │    │
│  │     • SDK crashes = no data                                         │    │
│  │     • Blocked by ad blockers (10-30% of users)                     │    │
│  │     • Can't monitor competitor sites                                │    │
│  │                                                                      │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                                                              │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │  SYNTHETIC BLIND SPOTS                                               │    │
│  │                                                                      │    │
│  │  1. Only tests what's scripted                                      │    │
│  │     • Can't discover unknown issues                                 │    │
│  │     • User journeys are diverse and unpredictable                   │    │
│  │                                                                      │    │
│  │  2. Test environment != Real world                                  │    │
│  │     • Datacenter networks are fast/stable                           │    │
│  │     • Limited device diversity                                      │    │
│  │     • No real user behavior variance                                │    │
│  │                                                                      │    │
│  │  3. Third-party behavior differs                                    │    │
│  │     • CDNs may treat bots differently                               │    │
│  │     • A/B tests may not apply to synthetic                         │    │
│  │     • Rate limiting may block synthetic                             │    │
│  │                                                                      │    │
│  │  4. Maintenance burden                                               │    │
│  │     • Scripts break when UI changes                                 │    │
│  │     • Keeping tests up-to-date is ongoing work                     │    │
│  │                                                                      │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
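The "sampling gaps" blind spot can be quantified. Assuming each occurrence of an error is sampled independently with probability equal to the sample rate (a simplification, since session-based sampling correlates events within a session), the chance of capturing at least one of n occurrences is 1 - (1 - rate)^n. `detectionProbability` is a hypothetical name.

```typescript
// Probability that at least one of `occurrences` events is captured
// when each is independently kept with probability `sampleRate`.
function detectionProbability(occurrences: number, sampleRate: number): number {
  if (sampleRate < 0 || sampleRate > 1) {
    throw new RangeError('sampleRate must be between 0 and 1');
  }
  return 1 - Math.pow(1 - sampleRate, occurrences);
}

// An error that fires 10 times a day at 10% sampling:
console.log(detectionProbability(10, 0.1).toFixed(3)); // "0.651"
```

Under this model, an error hitting ten sessions a day has roughly a one-in-three chance of going completely unrecorded at 10% sampling, which is the argument for always capturing error events at a higher rate than routine page views.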

Decision Framework

When to prioritize SYNTHETIC:
┌────────────────────────────────────────────────────────────────────────────┐
│ ✓ Pre-launch testing (no users yet)                                        │
│ ✓ Deployment validation (need fast feedback)                               │
│ ✓ Uptime monitoring (continuous heartbeat)                                  │
│ ✓ SLA validation (contractual compliance)                                  │
│ ✓ Competitive benchmarking (can't RUM competitors)                         │
│ ✓ Critical path verification (checkout, login, payment)                    │
│ ✓ Baseline establishment (controlled conditions)                           │
│ ✓ Low traffic pages (not enough RUM volume)                               │
└────────────────────────────────────────────────────────────────────────────┘

When to prioritize RUM:
┌────────────────────────────────────────────────────────────────────────────┐
│ ✓ Understanding real user experience                                       │
│ ✓ Mobile performance analysis (device diversity)                           │
│ ✓ Geographic performance analysis (real ISPs/networks)                    │
│ ✓ User segment analysis (authenticated vs anonymous, premium vs free)     │
│ ✓ Third-party impact assessment (ads, chat widgets, etc.)                 │
│ ✓ Edge case discovery (issues you didn't anticipate)                      │
│ ✓ A/B test performance analysis                                            │
│ ✓ Error tracking at scale                                                  │
│ ✓ User behavior analytics (rage clicks, abandonment)                      │
└────────────────────────────────────────────────────────────────────────────┘

When to use BOTH (most production scenarios):
┌────────────────────────────────────────────────────────────────────────────┐
│ ✓ Synthetic for early detection + RUM for impact assessment               │
│ ✓ Synthetic for baseline + RUM for variance understanding                 │
│ ✓ Synthetic for deployment gates + RUM for post-deploy validation        │
│ ✓ Synthetic for critical paths + RUM for full coverage                    │
└────────────────────────────────────────────────────────────────────────────┘

Summary

RUM and Synthetic Monitoring are not competing approaches—they're complementary lenses on frontend performance. Synthetic gives you control, consistency, and proactive detection. RUM gives you reality, diversity, and user truth.

Key Architectural Insights:

Synthetic detects, RUM confirms - Synthetic typically alerts first (roughly 2-5 minutes versus 5-15 minutes for RUM in this deployment), but RUM tells you whether real users are affected.
Sample RUM intelligently - 100% sampling is rarely needed. Prioritize critical pages, authenticated users, and errors.
Synthetic is not the real world - Datacenter networks, limited browsers, and scripted flows don't capture mobile-on-3G reality.
Correlate alerts across systems - Synthetic-only alerts may be false positives. RUM-only alerts may indicate coverage gaps.
Use Synthetic for deployment gates - Fast feedback before users see new code. RUM validates after rollout.
RUM reveals unknowns - Users find bugs you didn't anticipate. Synthetic only tests what you script.
Cost scales differently - RUM scales with traffic (sample to control). Synthetic scales with test count (prioritize ruthlessly).
Neither is complete alone - Production monitoring requires both for full coverage. Budget accordingly.
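As an illustration of "sample RUM intelligently", the sketch below picks a sample rate from the session context and makes the keep/drop decision deterministic per session, so a session is either fully recorded or not at all. The specific rates, the `SamplingContext` shape, and the function names are assumptions for illustration. The hash is 32-bit FNV-1a, used only to spread session IDs evenly.

```typescript
// Hypothetical context available to the RUM SDK when deciding to sample.
interface SamplingContext {
  isError: boolean;
  isCriticalPage: boolean;  // e.g. checkout, login
  isAuthenticated: boolean;
}

// Assumed rates: errors always kept, critical pages and logged-in users boosted.
function sampleRateFor(ctx: SamplingContext): number {
  if (ctx.isError) return 1.0;
  if (ctx.isCriticalPage) return 0.5;
  if (ctx.isAuthenticated) return 0.25;
  return 0.1;
}

// 32-bit FNV-1a hash mapped to [0, 1).
function hashToUnit(sessionId: string): number {
  let hash = 2166136261;
  for (let i = 0; i < sessionId.length; i++) {
    hash ^= sessionId.charCodeAt(i);
    hash = Math.imul(hash, 16777619);
  }
  return (hash >>> 0) / 4294967296;
}

// Same session ID and context always yield the same decision.
function shouldSample(sessionId: string, ctx: SamplingContext): boolean {
  return hashToUnit(sessionId) < sampleRateFor(ctx);
}
```

Because the hash is stable, a session sampled at the 10% base rate is also sampled at every higher rate, so boosting the rate for a critical page never drops a session that was already being recorded.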

The goal isn't choosing one over the other—it's designing a monitoring strategy where each approach covers the other's blind spots. Synthetic gives you confidence that critical paths work. RUM tells you how users actually experience your site. Together, they provide the visibility needed to ship with confidence.
