RUM vs Synthetic Monitoring Systems
Introduction
"Is the site slow?" seems like a simple question. It isn't. Slow for whom? Slow compared to what? Slow right now or on average? Slow for a user in Tokyo on 4G or a developer in the office on gigabit fiber?
This is why production frontend monitoring requires two fundamentally different approaches: Real User Monitoring (RUM) captures what actual users experience in the wild, while Synthetic Monitoring continuously tests known scenarios from controlled environments. Neither approach alone gives you the full picture.
Understanding when each approach excels—and more importantly, when each fails—is essential for building a monitoring strategy that catches regressions, identifies issues, and gives you confidence in your frontend's health.
This deep dive examines both approaches in detail: their architectures, their tradeoffs, their blind spots, and how to combine them for comprehensive frontend observability.
Scale Context
The production monitoring deployment examined here:
| Metric | RUM | Synthetic |
|---|---|---|
| Data Points per Day | 5B+ | 500K |
| Unique Users Monitored | 25M | N/A |
| Unique Device/Browser Combos | 50,000+ | 20-50 |
| Geographic Coverage | 190 countries | 30-50 locations |
| Test Scenarios | N/A (user-driven) | 200-500 scripts |
| Alert Detection Time (P50) | 5-15 minutes | 2-5 minutes |
| Coverage of Edge Cases | High (emergent) | Low (scripted) |
| Cost per Data Point | ~$0.00001 | ~$0.001 |
| Data Retention | 30-90 days | 1-2 years |
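The gaps are large: RUM produces roughly 10,000 times more data points than synthetic monitoring, at about one-hundredth of the cost per point. These differences fundamentally shape how each system is architected and used.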
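To see what those figures imply in total, they can simply be multiplied out. The sketch below uses only the table's approximate values; the `dailyCost` helper and the variable names are illustrative and not part of any monitoring product.

```typescript
// Back-of-the-envelope daily cost from the Scale Context table.
// All inputs are the table's approximate figures, not measured values.
interface MonitoringVolume {
  dataPointsPerDay: number;
  costPerDataPoint: number; // USD
}

function dailyCost(v: MonitoringVolume): number {
  return v.dataPointsPerDay * v.costPerDataPoint;
}

const rum: MonitoringVolume = { dataPointsPerDay: 5e9, costPerDataPoint: 0.00001 };
const synthetic: MonitoringVolume = { dataPointsPerDay: 500_000, costPerDataPoint: 0.001 };

const rumDaily = dailyCost(rum);             // about $50,000 per day
const syntheticDaily = dailyCost(synthetic); // about $500 per day
const volumeRatio = rum.dataPointsPerDay / synthetic.dataPointsPerDay; // 10,000x

console.log({ rumDaily, syntheticDaily, volumeRatio });
```

Under these assumptions the RUM pipeline costs on the order of 100 times more per day in total, which is why sampling and aggregation carry so much weight in RUM design.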
Fundamental Differences
Conceptual Model
┌─────────────────────────────────────────────────────────────────────────────┐
│ RUM VS SYNTHETIC: CONCEPTUAL MODEL │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ REAL USER MONITORING (RUM) │ │
│ │ │ │
│ │ Data Source: Actual user browsers │ │
│ │ Question: "What did users experience?" │ │
│ │ Trigger: User actions (organic) │ │
│ │ Coverage: Whatever users do │ │
│ │ Environment: Uncontrolled (real world) │ │
│ │ Variance: High (device, network, behavior varies) │ │
│ │ Latency: Reactive (after users experience issues) │ │
│ │ │ │
│ │ ┌────────────────────────────────────────────────────────────────┐ │ │
│ │ │ User A User B User C User D User E │ │ │
│ │ │ iPhone/4G Desktop/ Android/ MacBook/ Desktop/ │ │ │
│ │ │ Mumbai Fiber NYC 3G Brazil WiFi London IE11/DSL │ │ │
│ │ │ LCP: 4.2s LCP: 1.1s LCP: 8.3s LCP: 1.8s LCP: 6.1s │ │ │
│ │ └────────────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ │ Result: Statistical distribution of real experience │ │
│ │ P50: 2.1s P75: 3.4s P95: 7.2s │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ SYNTHETIC MONITORING │ │
│ │ │ │
│ │ Data Source: Controlled test agents │ │
│ │ Question: "Is the site working correctly right now?" │ │
│ │ Trigger: Scheduled (every N minutes) │ │
│ │ Coverage: Pre-defined test scenarios │ │
│ │ Environment: Controlled (consistent agents) │ │
│ │ Variance: Low (by design) │ │
│ │ Latency: Proactive (detect before users) │ │
│ │ │ │
│ │ ┌────────────────────────────────────────────────────────────────┐ │ │
│ │ │ Agent NYC Agent London Agent Tokyo Agent Sydney │ │ │
│ │ │ Chrome/ Chrome/ Chrome/ Chrome/ │ │ │
│ │ │ Cable Cable Cable Cable │ │ │
│ │ │ LCP: 1.2s LCP: 1.1s LCP: 1.3s LCP: 1.4s │ │ │
│ │ │ │ │ │
│ │ │ Same script, same conditions, same browsers │ │ │
│ │ └────────────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ │ Result: Consistent baseline for comparison │ │
│ │ Avg: 1.25s StdDev: 0.1s │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
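The two "Result" lines in the diagram come from different statistics: RUM is summarized with percentiles over a wide distribution, synthetic with a mean and spread around a stable baseline. The following is a minimal sketch of both, using the example LCP values from the diagram. The nearest-rank percentile method and the helper names are assumptions; real RUM backends often use approximate sketches over far larger populations.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

function mean(samples: number[]): number {
  return samples.reduce((sum, x) => sum + x, 0) / samples.length;
}

// Population standard deviation.
function stdDev(samples: number[]): number {
  const m = mean(samples);
  return Math.sqrt(mean(samples.map((x) => (x - m) ** 2)));
}

// LCP values (seconds) of the five example users in the diagram.
const rumLcp = [4.2, 1.1, 8.3, 1.8, 6.1];
// LCP values (seconds) of the four example agents in the diagram.
const syntheticLcp = [1.2, 1.1, 1.3, 1.4];

const rumP50 = percentile(rumLcp, 50);
const rumP75 = percentile(rumLcp, 75);
const rumP95 = percentile(rumLcp, 95);
const syntheticMean = mean(syntheticLcp);
const syntheticSpread = stdDev(syntheticLcp);

console.log({ rumP50, rumP75, rumP95, syntheticMean, syntheticSpread });
```

For these five users alone the percentiles are 4.2s, 6.1s, and 8.3s. They differ from the diagram's P50/P75/P95 because those summarize the whole user population, not just the five users drawn.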
When Each Approach Shines
| Scenario | RUM | Synthetic | Why |
|---|---|---|---|
| Deployment validation | ❌ | ✓ | Need immediate feedback, not enough RUM data yet |
| Performance regression detection | ✓ | ✓ | Both useful, synthetic faster, RUM more accurate |
| User segment analysis | ✓ | ❌ | Only RUM has real user segments |
| Uptime monitoring | ❌ | ✓ | RUM requires users; no users = no data |
| Third-party impact | ✓ | ⚠️ | RUM sees real third-party; synthetic may get different treatment |
| Geographic performance | ✓ | ⚠️ | RUM sees real ISPs; synthetic sees data center networks |
| Mobile performance | ✓ | ⚠️ | RUM sees real device diversity; synthetic limited |
| Critical flow testing | ⚠️ | ✓ | Synthetic runs flows on schedule; RUM depends on user volume |
| Competitive benchmarking | ❌ | ✓ | Can't RUM competitors' sites |
| Pre-launch testing | ❌ | ✓ | No users before launch |
| A/B test analysis | ✓ | ❌ | Need real user behavior in both variants |
RUM Architecture Deep Dive
Data Collection Architecture
┌─────────────────────────────────────────────────────────────────────────────┐
│ RUM ARCHITECTURE │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ BROWSER (Data Source) │ │
│ │ │ │
│ │ ┌─────────────────────────────────────────────────────────────┐ │ │
│ │ │ RUM SDK │ │ │
│ │ │ │ │ │
│ │ │ Collectors: │ │ │
│ │ │ ├── Performance Observer (LCP, FID, CLS, Long Tasks) │ │ │
│ │ │ ├── Navigation Timing API │ │ │
│ │ │ ├── Resource Timing API │ │ │
│ │ │ ├── Error Capture (window.onerror, unhandledrejection) │ │ │
│ │ │ ├── Network Interception (fetch, XHR) │ │ │
│ │ │ └── User Interaction Tracking │ │ │
│ │ │ │ │ │
│ │ │ Enrichment: │ │ │
│ │ │ ├── Device/Browser detection │ │ │
│ │ │ ├── Session management │ │ │
│ │ │ ├── User identity (if authenticated) │ │ │
│ │ │ └── Custom context (feature flags, A/B variant) │ │ │
│ │ │ │ │ │
│ │ │ Transport: │ │ │
│ │ │ ├── Batching (100 events or 10 seconds) │ │ │
│ │ │ ├── Sampling (1-100% configurable) │ │ │
│ │ │ ├── Compression (gzip) │ │ │
│ │ │ └── sendBeacon / fetch with keepalive │ │ │
│ │ │ │ │ │
│ │ └─────────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ └──────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ │ HTTPS POST (batched, compressed) │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ INGESTION LAYER │ │
│ │ │ │
│ │ ┌─────────────────────────────────────────────────────────────┐ │ │
│ │ │ EDGE COLLECTORS │ │ │
│ │ │ (Deployed at CDN edge) │ │ │
│ │ │ │ │ │
│ │ │ • Low latency ingest (close to users) │ │ │
│ │ │ • Validation & rate limiting │ │ │
│ │ │ • GeoIP enrichment │ │ │
│ │ │ • Initial filtering (bots, spam) │ │ │
│ │ │ │ │ │
│ │ └─────────────────────────────────────────────────────────────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ ┌─────────────────────────────────────────────────────────────┐ │ │
│ │ │ MESSAGE QUEUE │ │ │
│ │ │ (Kafka / Kinesis) │ │ │
│ │ │ │ │ │
│ │ │ Partitioned by: │ │ │
│ │ │ • Customer (multi-tenant isolation) │ │ │
│ │ │ • Event type (different processing needs) │ │ │
│ │ │ │ │ │
│ │ └─────────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ PROCESSING LAYER │ │
│ │ │ │
│ │ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ │ │
│ │ │ Stream Processor │ │ Aggregation │ │ Alerting │ │ │
│ │ │ (Flink/Spark) │ │ Engine │ │ Engine │ │ │
│ │ │ │ │ │ │ │ │ │
│ │ │ • Sessionization │ │ • Time-series │ │ • Threshold │ │ │
│ │ │ • Error grouping │ │ rollups │ │ • Anomaly │ │ │
│ │ │ • Stack symb. │ │ • Dimensional │ │ • Composite │ │ │
│ │ │ • User journeys │ │ aggregation │ │ │ │ │
│ │ └──────────────────┘ └──────────────────┘ └──────────────────┘ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ STORAGE LAYER │ │
│ │ │ │
│ │ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ │ │
│ │ │ Time-Series DB │ │ Raw Events │ │ Error Store │ │ │
│ │ │ (Metrics) │ │ (Data Lake) │ │ (Grouped) │ │ │
│ │ │ │ │ │ │ │ │ │
│ │ │ 30 days hot │ │ 90 days │ │ Indefinite │ │ │
│ │ │ 1 year cold │ │ (sampled cold) │ │ │ │ │
│ │ └──────────────────┘ └──────────────────┘ └──────────────────┘ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
RUM SDK Implementation
// Production RUM SDK architecture
interface RUMConfig {
appId: string;
apiKey: string;
endpoint: string;
version: string;
// Sampling
sampleRate: number; // 0-1, default 1.0
errorSampleRate: number; // Typically 1.0 (capture all errors)
// Privacy
maskAllInputs: boolean;
maskAllText: boolean;
blockSelector: string[]; // CSS selectors to exclude
// Performance
trackResources: boolean;
trackLongTasks: boolean;
resourceTimingBufferSize: number;
// Session
sessionTimeout: number; // ms
maxSessionDuration: number; // ms
}
class RUMSDK {
private config: RUMConfig;
private session!: Session; // assigned in initialize(), called from the constructor
private queue: RUMEvent[] = [];
private observers: PerformanceObserver[] = [];
constructor(config: RUMConfig) {
this.config = config;
this.initialize();
}
private initialize(): void {
// Initialize session
this.session = this.initSession();
// Start core collectors
this.initPerformanceObservers();
this.initErrorCapture();
this.initNetworkCapture();
this.initUserInteractionCapture();
// Start transport
this.initTransport();
}
private initPerformanceObservers(): void {
// Largest Contentful Paint
this.observe('largest-contentful-paint', (entries) => {
const last = entries[entries.length - 1] as any;
this.track({
type: 'web-vital',
name: 'LCP',
value: last.startTime,
element: this.describeElement(last.element),
url: last.url
});
});
// First Input Delay (FID); INP has since replaced it as a Core Web Vital
this.observe('first-input', (entries) => {
const entry = entries[0] as PerformanceEventTiming;
this.track({
type: 'web-vital',
name: 'FID',
value: entry.processingStart - entry.startTime,
eventType: entry.name,
target: this.describeElement(entry.target as Element)
});
});
// Slow interactions (the raw entries from which Interaction to Next Paint is derived)
this.observe('event', (entries) => {
for (const entry of entries as PerformanceEventTiming[]) {
if (entry.interactionId && entry.duration > 40) {
this.track({
type: 'interaction',
name: entry.name,
duration: entry.duration,
inputDelay: entry.processingStart - entry.startTime,
processingTime: entry.processingEnd - entry.processingStart,
interactionId: entry.interactionId
});
}
}
}, { durationThreshold: 16 });
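// Layout Shift: CLS is the largest "session window" of shifts
// (shifts less than 1s apart, window capped at 5s), not the lifetime sum
let clsValue = 0;
let windowValue = 0;
let windowStart = 0;
let lastShiftTime = 0;
this.observe('layout-shift', (entries) => {
for (const entry of entries as LayoutShift[]) {
if (entry.hadRecentInput) continue;
const startsNewWindow =
windowValue === 0 ||
entry.startTime - lastShiftTime >= 1000 ||
entry.startTime - windowStart >= 5000;
if (startsNewWindow) {
windowValue = 0;
windowStart = entry.startTime;
}
windowValue += entry.value;
lastShiftTime = entry.startTime;
clsValue = Math.max(clsValue, windowValue);
}
});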
// Report CLS on visibility change
document.addEventListener('visibilitychange', () => {
if (document.visibilityState === 'hidden') {
this.track({
type: 'web-vital',
name: 'CLS',
value: clsValue
});
}
});
// Long Tasks
if (this.config.trackLongTasks) {
this.observe('longtask', (entries) => {
for (const entry of entries) {
this.track({
type: 'long-task',
duration: entry.duration,
startTime: entry.startTime,
attribution: (entry as any).attribution
});
}
});
}
// Resource Timing
if (this.config.trackResources) {
this.observe('resource', (entries) => {
for (const entry of entries as PerformanceResourceTiming[]) {
// Only track significant resources
if (entry.duration > 50 || entry.transferSize > 50000) {
this.track({
type: 'resource',
name: entry.name,
initiatorType: entry.initiatorType,
duration: entry.duration,
transferSize: entry.transferSize,
decodedBodySize: entry.decodedBodySize
});
}
}
});
}
}
private observe(
type: string,
callback: (entries: PerformanceEntryList) => void,
options?: PerformanceObserverInit
): void {
try {
const observer = new PerformanceObserver((list) => {
callback(list.getEntries());
});
observer.observe({ type, buffered: true, ...options });
this.observers.push(observer);
} catch (e) {
// Observer type not supported
}
}
private initErrorCapture(): void {
// Global errors
window.addEventListener('error', (event) => {
if (this.shouldSampleError()) {
this.track({
type: 'error',
errorType: 'uncaught',
message: event.message,
filename: event.filename,
lineno: event.lineno,
colno: event.colno,
stack: event.error?.stack
});
}
});
// Unhandled promise rejections
window.addEventListener('unhandledrejection', (event) => {
if (this.shouldSampleError()) {
this.track({
type: 'error',
errorType: 'unhandled-rejection',
message: event.reason?.message || String(event.reason),
stack: event.reason?.stack
});
}
});
}
private initNetworkCapture(): void {
// Intercept fetch
const originalFetch = window.fetch;
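window.fetch = async (input, init) => {
// input may be a string, a URL object, or a Request
const url =
typeof input === 'string' ? input :
input instanceof URL ? input.href : input.url;
const method = init?.method || (input instanceof Request ? input.method : 'GET');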
const startTime = performance.now();
try {
const response = await originalFetch(input, init);
const duration = performance.now() - startTime;
this.track({
type: 'request',
method,
url: this.sanitizeUrl(url),
status: response.status,
duration,
ok: response.ok
});
return response;
} catch (error) {
const duration = performance.now() - startTime;
this.track({
type: 'request',
method,
url: this.sanitizeUrl(url),
status: 0,
duration,
ok: false,
error: error instanceof Error ? error.message : String(error)
});
throw error;
}
};
}
private initUserInteractionCapture(): void {
// Click tracking with rage click detection
let clickBuffer: { timestamp: number; x: number; y: number }[] = [];
document.addEventListener('click', (event) => {
const now = Date.now();
const click = { timestamp: now, x: event.clientX, y: event.clientY };
// Detect rage clicks (3+ clicks in same area within 1 second)
clickBuffer.push(click);
clickBuffer = clickBuffer.filter(c => now - c.timestamp < 1000);
const nearbyClicks = clickBuffer.filter(c =>
Math.abs(c.x - click.x) < 30 && Math.abs(c.y - click.y) < 30
);
if (nearbyClicks.length >= 3) {
this.track({
type: 'frustration',
name: 'rage-click',
target: this.describeElement(event.target as Element),
clickCount: nearbyClicks.length
});
}
// Regular click tracking (sampled)
if (this.shouldSample()) {
this.track({
type: 'interaction',
name: 'click',
target: this.describeElement(event.target as Element)
});
}
}, true);
}
track(event: Partial<RUMEvent>): void {
const fullEvent: RUMEvent = {
...event,
timestamp: Date.now(),
sessionId: this.session.id,
pageUrl: window.location.href,
pageTitle: document.title,
viewport: {
width: window.innerWidth,
height: window.innerHeight
},
connection: this.getConnectionInfo(),
appVersion: this.config.version
} as RUMEvent;
this.queue.push(fullEvent);
}
private shouldSample(): boolean {
return Math.random() < this.config.sampleRate;
}
private shouldSampleError(): boolean {
return Math.random() < this.config.errorSampleRate;
}
}
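The class above calls `initTransport()` and pushes events into `queue`, but the listing does not show how the queue is drained. The following is a minimal sketch of the batching rule from the architecture diagram (flush at 100 events or after 10 seconds). `BatchQueue` and `SendFn` are hypothetical names, and the sender is injected so the same logic can run outside a browser.

```typescript
// Minimal batching queue: flush at maxBatchSize events or after maxWaitMs.
// `send` is injected so a browser build can pass a sendBeacon/fetch wrapper
// while tests pass a stub. Names here are illustrative, not a real SDK API.
type SendFn<T> = (batch: T[]) => void;

class BatchQueue<T> {
  private buffer: T[] = [];
  private timer: ReturnType<typeof setTimeout> | null = null;
  private readonly send: SendFn<T>;
  private readonly maxBatchSize: number;
  private readonly maxWaitMs: number;

  constructor(send: SendFn<T>, maxBatchSize = 100, maxWaitMs = 10_000) {
    this.send = send;
    this.maxBatchSize = maxBatchSize;
    this.maxWaitMs = maxWaitMs;
  }

  push(event: T): void {
    this.buffer.push(event);
    if (this.buffer.length >= this.maxBatchSize) {
      this.flush();
    } else if (this.timer === null) {
      // Start the clock on the first event of a new batch.
      this.timer = setTimeout(() => this.flush(), this.maxWaitMs);
    }
  }

  flush(): void {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = [];
    this.send(batch);
  }
}

// Example wiring with a stub sender that records batch sizes.
const sentBatchSizes: number[] = [];
const batchQueue = new BatchQueue<{ type: string }>((batch) => {
  sentBatchSizes.push(batch.length);
});
for (let i = 0; i < 250; i++) batchQueue.push({ type: 'click' });
batchQueue.flush(); // in a browser: when the page becomes hidden
```

In a browser, `send` would typically serialize the batch and hand it to `navigator.sendBeacon`, falling back to `fetch` with `keepalive: true`. A final `flush()` when `visibilityState` becomes `hidden` is what keeps the last partial batch from being lost.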
RUM Sampling Strategies
// Intelligent sampling for RUM
interface SamplingStrategy {
type: 'fixed' | 'adaptive' | 'priority';
baseRate: number;
rules?: SamplingRule[];
}
interface SamplingRule {
condition: (event: RUMEvent, session: Session) => boolean;
rate: number;
priority?: number;
}
class AdaptiveSampler {
private baseRate: number;
private rules: SamplingRule[];
private sessionDecisions = new Map<string, boolean>();
constructor(strategy: SamplingStrategy) {
this.baseRate = strategy.baseRate;
this.rules = strategy.rules || [];
}
shouldSample(event: RUMEvent, session: Session): boolean {
// Errors always go through, regardless of session decision or rules
if (event.type === 'error') {
return true;
}
// Session-level sampling: decide on the first page view, then stick with it
if (event.type === 'page-view') {
let decision = this.sessionDecisions.get(session.id);
if (decision === undefined) {
decision = this.decideSession(session);
this.sessionDecisions.set(session.id, decision);
}
return decision;
}
// Check session decision
const sessionDecision = this.sessionDecisions.get(session.id);
if (sessionDecision === false) {
return false;
}
// Event-level rules
for (const rule of this.rules) {
if (rule.condition(event, session)) {
return Math.random() < rule.rate;
}
}
return true; // Session was sampled, include event
}
private decideSession(session: Session): boolean {
// Priority rules that override base rate
const priorityRules = this.rules
.filter(r => r.priority !== undefined)
.sort((a, b) => (b.priority || 0) - (a.priority || 0));
for (const rule of priorityRules) {
// Check first event or session attributes
if (this.sessionMatchesRule(session, rule)) {
return Math.random() < rule.rate;
}
}
return Math.random() < this.baseRate;
}
private sessionMatchesRule(session: Session, rule: SamplingRule): boolean {
// Simplified session matching
return rule.condition({} as RUMEvent, session);
}
}
// Example sampling rules
const samplingStrategy: SamplingStrategy = {
type: 'adaptive',
baseRate: 0.1, // 10% base sampling
rules: [
// Oversample authenticated users (more valuable)
{
condition: (_, session) => !!session.userId,
rate: 0.5, // 50% of auth users
priority: 10
},
// Always sample critical pages
{
condition: (event) =>
event.pageUrl?.includes('/checkout') ||
event.pageUrl?.includes('/payment'),
rate: 1.0, // 100% of checkout/payment
priority: 20
},
// Reduce sampling for high-volume pages
{
condition: (event) =>
event.pageUrl?.includes('/search') ||
event.pageUrl?.includes('/browse'),
rate: 0.01, // 1% of search/browse
priority: 5
},
// Always capture errors
{
condition: (event) => event.type === 'error',
rate: 1.0
},
// Always capture poor performance
{
condition: (event) =>
event.type === 'web-vital' &&
event.name === 'LCP' &&
event.value > 4000,
rate: 1.0
},
// Reduce sampling on slow connections (high volume, biased data)
{
condition: (_, session) =>
session.device?.connection === 'slow-2g',
rate: 0.01,
priority: 1
}
]
};
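Two practical consequences of the rules above are worth sketching. First, a `Math.random()` decision held in memory does not survive a page reload, whereas hashing the session ID gives the same answer every time. Second, once pages are sampled at different rates (1% for search, 100% for checkout), raw event counts are biased unless each event is weighted by the inverse of its rate. The helpers below are hypothetical and assume each stored event records the rate it was sampled at, a field the `RUMEvent` type shown here is not known to include.

```typescript
// FNV-1a 32-bit hash mapped to [0, 1). Any stable, well-mixed hash would do.
function hashToUnitInterval(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0) / 2 ** 32;
}

// The same session ID and rate always produce the same decision.
function isSessionSampled(sessionId: string, rate: number): boolean {
  return hashToUnitInterval(sessionId) < rate;
}

// Each kept event stands in for 1/rate real events.
function sampleWeight(rate: number): number {
  if (rate <= 0 || rate > 1) throw new RangeError('rate must be in (0, 1]');
  return 1 / rate;
}

// Estimate the true event count from sampled events that carry their rate.
function estimateTotal(sampled: { rate: number }[]): number {
  return sampled.reduce((sum, e) => sum + sampleWeight(e.rate), 0);
}

// Illustrative input: two events kept at 10%, one at 100%, one at 1%.
const keptEvents = [{ rate: 0.1 }, { rate: 0.1 }, { rate: 1.0 }, { rate: 0.01 }];
const estimatedTotal = estimateTotal(keptEvents); // 10 + 10 + 1 + 100

console.log(isSessionSampled('session-abc', 0.1), estimatedTotal);
```

The same weighting applies to percentiles: a P75 computed over unweighted events from mixed-rate pages over-represents the heavily sampled pages.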
Synthetic Monitoring Architecture Deep Dive
System Architecture
┌─────────────────────────────────────────────────────────────────────────────┐
│ SYNTHETIC MONITORING ARCHITECTURE │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ CONTROL PLANE │ │
│ │ │ │
│ │ ┌─────────────────────────────────────────────────────────────┐ │ │
│ │ │ SCHEDULER │ │ │
│ │ │ │ │ │
│ │ │ Manages test execution: │ │ │
│ │ │ • Cron-like scheduling (every 1, 5, 15 min, etc.) │ │ │
│ │ │ • Geographic distribution │ │ │
│ │ │ • Test prioritization │ │ │
│ │ │ • Rate limiting per target │ │ │
│ │ │ │ │ │
│ │ │ ┌──────────────────────────────────────────────────────┐ │ │ │
│ │ │ │ Test Queue │ │ │ │
│ │ │ │ ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐ │ │ │ │
│ │ │ │ │NYC-1│ │LON-1│ │TYO-1│ │NYC-2│ │SYD-1│ │... │ │ │ │ │
│ │ │ │ └─────┘ └─────┘ └─────┘ └─────┘ └─────┘ └─────┘ │ │ │ │
│ │ │ └──────────────────────────────────────────────────────┘ │ │ │
│ │ │ │ │ │
│ │ └─────────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ │ ┌─────────────────────────────────────────────────────────────┐ │ │
│ │ │ TEST CONFIGURATION │ │ │
│ │ │ │ │ │
│ │ │ • Script repository │ │ │
│ │ │ • Environment variables / secrets │ │ │
│ │ │ • Threshold definitions │ │ │
│ │ │ • Alert configurations │ │ │
│ │ │ │ │ │
│ │ └─────────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ Test Assignment │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ AGENT LAYER │ │
│ │ (Globally Distributed) │ │
│ │ │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ NYC Agent │ │ London Agent│ │ Tokyo Agent │ ... │ │
│ │ │ │ │ │ │ │ │ │
│ │ │ ┌─────────┐ │ │ ┌─────────┐ │ │ ┌─────────┐ │ │ │
│ │ │ │ Chrome │ │ │ │ Chrome │ │ │ │ Chrome │ │ │ │
│ │ │ │ Headless│ │ │ │ Headless│ │ │ │ Headless│ │ │ │
│ │ │ └─────────┘ │ │ └─────────┘ │ │ └─────────┘ │ │ │
│ │ │ ┌─────────┐ │ │ ┌─────────┐ │ │ ┌─────────┐ │ │ │
│ │ │ │ Firefox │ │ │ │ Firefox │ │ │ │ Firefox │ │ │ │
│ │ │ │ Headless│ │ │ │ Headless│ │ │ │ Headless│ │ │ │
│ │ │ └─────────┘ │ │ └─────────┘ │ │ └─────────┘ │ │ │
│ │ │ │ │ │ │ │ │ │
│ │ │ Network: │ │ Network: │ │ Network: │ │ │
│ │ │ Cable sim │ │ Cable sim │ │ Cable sim │ │ │
│ │ │ 3G sim │ │ 3G sim │ │ 3G sim │ │ │
│ │ │ │ │ │ │ │ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────┘ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ Test Results │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ RESULTS PROCESSING │ │
│ │ │ │
│ │ • Result validation │ │
│ │ • Metric extraction │ │
│ │ • Screenshot/video storage │ │
│ │ • Alert evaluation │ │
│ │ • Trend analysis │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
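One detail of the scheduler's "geographic distribution" is worth illustrating. If every location fires at the same instant, a 5-minute test checks the target once per 5 minutes, while staggering the locations shortens the gap between consecutive checks. The following is a minimal sketch of that idea; `staggerRuns` and `ScheduledRun` are hypothetical names, and the even spacing is an assumption, not how any particular product schedules.

```typescript
interface ScheduledRun {
  testId: string;
  location: string;
  offsetMs: number; // delay from the start of each interval
}

// Spread a test's locations evenly across its interval so that some
// location exercises the target every (interval / locations.length).
function staggerRuns(
  testId: string,
  frequencyMinutes: number,
  locations: string[]
): ScheduledRun[] {
  const intervalMs = frequencyMinutes * 60_000;
  const stepMs = intervalMs / locations.length;
  return locations.map((location, i) => ({
    testId,
    location,
    offsetMs: Math.round(i * stepMs),
  }));
}

// A 5-minute test run from four locations.
const staggered = staggerRuns('homepage-load', 5, [
  'us-east', 'us-west', 'eu-west', 'ap-northeast',
]);
// Offsets within each interval: 0 s, 75 s, 150 s, 225 s.
console.log(staggered);
```

With four locations on a 5-minute schedule, some agent hits the target every 75 seconds, which shortens worst-case detection time for global failures without increasing the number of runs.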
Test Script Architecture
// Synthetic test script structure
interface SyntheticTest {
id: string;
name: string;
type: 'browser' | 'api' | 'multi-step';
frequency: number; // minutes
locations: string[];
timeout: number; // ms
retries: number;
alerting: AlertConfig;
script: TestScript;
}
interface TestScript {
steps: TestStep[];
assertions: Assertion[];
customMetrics?: CustomMetric[];
}
interface TestStep {
action: 'navigate' | 'click' | 'type' | 'wait' | 'screenshot' | 'custom';
target?: string; // CSS selector or URL
value?: string;
timeout?: number;
waitUntil?: 'load' | 'domcontentloaded' | 'networkidle';
}
// Example: E-commerce checkout flow test
const checkoutFlowTest: SyntheticTest = {
id: 'checkout-flow-e2e',
name: 'Checkout Flow - Complete Purchase',
type: 'browser',
frequency: 5, // Every 5 minutes
locations: ['us-east', 'us-west', 'eu-west', 'ap-northeast'],
timeout: 60000,
retries: 1,
alerting: {
onFailure: { channels: ['pagerduty', 'slack-critical'] },
onDegradation: { channels: ['slack-frontend'] },
thresholds: {
totalDuration: { warning: 15000, critical: 30000 },
stepDuration: { warning: 5000, critical: 10000 }
}
},
script: {
steps: [
{
action: 'navigate',
target: 'https://shop.example.com',
waitUntil: 'networkidle'
},
{
action: 'click',
target: '[data-testid="product-card"]:first-child',
timeout: 5000
},
{
action: 'wait',
target: '[data-testid="product-detail"]',
timeout: 5000
},
{
action: 'click',
target: '[data-testid="add-to-cart"]'
},
{
action: 'wait',
target: '[data-testid="cart-indicator"]:not(:empty)',
timeout: 3000
},
{
action: 'navigate',
target: 'https://shop.example.com/cart'
},
{
action: 'click',
target: '[data-testid="checkout-button"]'
},
{
action: 'wait',
target: '[data-testid="checkout-form"]',
timeout: 5000
},
{
action: 'screenshot',
target: 'checkout-form-loaded'
}
],
assertions: [
{
type: 'elementPresent',
target: '[data-testid="checkout-form"]'
},
{
type: 'textContains',
target: '[data-testid="cart-total"]',
value: '$'
}
],
customMetrics: [
{
name: 'time_to_checkout_form',
type: 'timing',
startMark: 'navigation_start',
endMark: 'checkout_form_visible'
}
]
}
};
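The `thresholds` block in this test is only useful once something evaluates it against a finished run. The sketch below shows one plausible evaluation using the same warning and critical values. It assumes thresholds are inclusive and that the worst severity across the run total and all steps wins; the source does not specify either rule, and `classify` and `evaluateRun` are hypothetical helpers.

```typescript
type Severity = 'ok' | 'warning' | 'critical';

interface Threshold {
  warning: number;  // ms
  critical: number; // ms
}

function classify(valueMs: number, threshold: Threshold): Severity {
  if (valueMs >= threshold.critical) return 'critical';
  if (valueMs >= threshold.warning) return 'warning';
  return 'ok';
}

// Worst severity across the whole run and each individual step.
function evaluateRun(
  totalDurationMs: number,
  stepDurationsMs: number[],
  thresholds: { totalDuration: Threshold; stepDuration: Threshold }
): Severity {
  const order: Severity[] = ['ok', 'warning', 'critical'];
  const severities: Severity[] = [
    classify(totalDurationMs, thresholds.totalDuration),
    ...stepDurationsMs.map((d) => classify(d, thresholds.stepDuration)),
  ];
  return severities.reduce((worst, s) =>
    order.indexOf(s) > order.indexOf(worst) ? s : worst
  );
}

// Same values as the checkout test's alerting.thresholds.
const checkoutThresholds = {
  totalDuration: { warning: 15000, critical: 30000 },
  stepDuration: { warning: 5000, critical: 10000 },
};

const healthyRun = evaluateRun(9000, [1200, 800, 2500], checkoutThresholds);
const slowStepRun = evaluateRun(12000, [1200, 6500, 900], checkoutThresholds);
const tooSlowRun = evaluateRun(31000, [4000, 4000, 4000], checkoutThresholds);

console.log({ healthyRun, slowStepRun, tooSlowRun });
```

A natural mapping would route a `warning` or `critical` timing result to the `onDegradation` channels and reserve `onFailure` for runs whose steps or assertions actually fail, but that routing is a design choice rather than something the configuration above dictates.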
Agent Implementation
// Synthetic monitoring agent
import puppeteer, { Browser, Page } from 'puppeteer';
interface TestResult {
testId: string;
runId: string;
location: string;
startTime: number;
endTime: number;
status: 'pass' | 'fail' | 'error';
error?: string;
metrics: TestMetrics;
stepResults: StepResult[];
screenshots: Screenshot[];
networkRequests: NetworkRequest[];
consoleMessages: ConsoleMessage[];
}
interface TestMetrics {
totalDuration: number;
webVitals: {
lcp?: number;
fid?: number;
cls?: number;
ttfb?: number;
};
resourceMetrics: {
requestCount: number;
transferSize: number;
domContentLoaded: number;
load: number;
};
customMetrics: Record<string, number>;
}
class SyntheticAgent {
private browser: Browser | null = null;
private location: string;
constructor(location: string) {
this.location = location;
}
async initialize(): Promise<void> {
this.browser = await puppeteer.launch({
headless: true,
args: [
'--no-sandbox',
'--disable-setuid-sandbox',
'--disable-dev-shm-usage',
'--disable-gpu'
]
});
}
async runTest(test: SyntheticTest): Promise<TestResult> {
const runId = crypto.randomUUID();
const startTime = Date.now();
const page = await this.browser!.newPage();
const result: TestResult = {
testId: test.id,
runId,
location: this.location,
startTime,
endTime: 0,
status: 'pass',
metrics: this.initMetrics(),
stepResults: [],
screenshots: [],
networkRequests: [],
consoleMessages: []
};
try {
// Setup page monitoring
await this.setupPageMonitoring(page, result);
// Execute test steps
for (let i = 0; i < test.script.steps.length; i++) {
const step = test.script.steps[i];
const stepResult = await this.executeStep(page, step, i);
result.stepResults.push(stepResult);
if (stepResult.status === 'fail') {
result.status = 'fail';
result.error = stepResult.error;
break;
}
}
// Run assertions
if (result.status === 'pass') {
for (const assertion of test.script.assertions) {
const assertionResult = await this.checkAssertion(page, assertion);
if (!assertionResult.pass) {
result.status = 'fail';
result.error = assertionResult.message;
break;
}
}
}
// Collect final metrics
result.metrics = await this.collectMetrics(page);
} catch (error) {
result.status = 'error';
result.error = error instanceof Error ? error.message : String(error);
// Capture screenshot on error
const screenshot = await this.captureScreenshot(page, 'error');
result.screenshots.push(screenshot);
} finally {
result.endTime = Date.now();
result.metrics.totalDuration = result.endTime - startTime;
await page.close();
}
return result;
}
private async setupPageMonitoring(page: Page, result: TestResult): Promise<void> {
// Network monitoring
page.on('request', (request) => {
result.networkRequests.push({
url: request.url(),
method: request.method(),
resourceType: request.resourceType(),
startTime: Date.now()
});
});
page.on('response', (response) => {
const request = result.networkRequests.find(
r => r.url === response.url() && !r.status
);
if (request) {
request.status = response.status();
request.endTime = Date.now();
}
});
// Console monitoring
page.on('console', (msg) => {
result.consoleMessages.push({
type: msg.type(),
text: msg.text(),
timestamp: Date.now()
});
});
// Inject performance observer
await page.evaluateOnNewDocument(() => {
(window as any).__syntheticMetrics = {
lcp: 0,
cls: 0
};
const lcpObserver = new PerformanceObserver((list) => {
const entries = list.getEntries();
const last = entries[entries.length - 1];
(window as any).__syntheticMetrics.lcp = last.startTime;
});
lcpObserver.observe({ type: 'largest-contentful-paint', buffered: true });
const clsObserver = new PerformanceObserver((list) => {
for (const entry of list.getEntries() as any[]) {
if (!entry.hadRecentInput) {
(window as any).__syntheticMetrics.cls += entry.value;
}
}
});
clsObserver.observe({ type: 'layout-shift', buffered: true });
});
}
private async executeStep(
page: Page,
step: TestStep,
index: number
): Promise<StepResult> {
const startTime = Date.now();
const result: StepResult = {
index,
action: step.action,
target: step.target,
startTime,
endTime: 0,
status: 'pass'
};
try {
switch (step.action) {
case 'navigate':
await page.goto(step.target!, {
waitUntil: step.waitUntil || 'load',
timeout: step.timeout || 30000
});
break;
case 'click':
await page.waitForSelector(step.target!, {
timeout: step.timeout || 10000
});
await page.click(step.target!);
break;
case 'type':
await page.waitForSelector(step.target!, {
timeout: step.timeout || 10000
});
await page.type(step.target!, step.value!);
break;
case 'wait':
await page.waitForSelector(step.target!, {
timeout: step.timeout || 10000
});
break;
case 'screenshot': {
// Braces scope the declaration to this case
const screenshot = await this.captureScreenshot(page, step.target!);
// Store screenshot...
break;
}
}
} catch (error) {
result.status = 'fail';
result.error = error instanceof Error ? error.message : String(error);
}
result.endTime = Date.now();
result.duration = result.endTime - startTime;
return result;
}
private async collectMetrics(page: Page): Promise<TestMetrics> {
const performanceMetrics = await page.evaluate(() => {
const timing = performance.getEntriesByType('navigation')[0] as PerformanceNavigationTiming;
const syntheticMetrics = (window as any).__syntheticMetrics || {};
return {
// TTFB measured from navigation start, consistent with the metrics below
ttfb: timing.responseStart - timing.startTime,
domContentLoaded: timing.domContentLoadedEventEnd - timing.startTime,
load: timing.loadEventEnd - timing.startTime,
lcp: syntheticMetrics.lcp || 0,
cls: syntheticMetrics.cls || 0
};
});
const resourceCount = await page.evaluate(() => {
return performance.getEntriesByType('resource').length;
});
return {
totalDuration: 0, // Set by caller
webVitals: {
lcp: performanceMetrics.lcp,
cls: performanceMetrics.cls,
ttfb: performanceMetrics.ttfb
},
resourceMetrics: {
requestCount: resourceCount,
transferSize: 0, // Would need network interception
domContentLoaded: performanceMetrics.domContentLoaded,
load: performanceMetrics.load
},
customMetrics: {}
};
}
}
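Each agent returns a `TestResult` for its own location, so the results processor has to decide whether a set of failures points to a global or a regional problem. The following is a minimal sketch of that classification. `classifyFailureScope` is a hypothetical helper, and treating `error` the same as `fail` is an assumption, since an agent-side error may or may not indicate a problem with the target.

```typescript
type RunStatus = 'pass' | 'fail' | 'error';
type FailureScope = 'healthy' | 'regional' | 'global';

interface LocationResult {
  location: string;
  status: RunStatus;
}

// Compare the set of failing locations with the set of all locations.
function classifyFailureScope(results: LocationResult[]): {
  scope: FailureScope;
  failingLocations: string[];
} {
  const failingLocations = results
    .filter((r) => r.status !== 'pass')
    .map((r) => r.location);
  let scope: FailureScope = 'healthy';
  if (failingLocations.length > 0) {
    scope = failingLocations.length === results.length ? 'global' : 'regional';
  }
  return { scope, failingLocations };
}

// Illustrative results for one test run from four locations.
const latestResults: LocationResult[] = [
  { location: 'us-east', status: 'pass' },
  { location: 'us-west', status: 'pass' },
  { location: 'eu-west', status: 'pass' },
  { location: 'ap-northeast', status: 'fail' },
];
const failureScope = classifyFailureScope(latestResults);

console.log(failureScope);
```

A single failing location is also the classic signature of an agent or network problem rather than a site problem, which is one reason to cross-check it against RUM data from the same region before paging anyone.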
Combining RUM and Synthetic
Unified Monitoring Strategy
┌─────────────────────────────────────────────────────────────────────────────┐
│ UNIFIED MONITORING STRATEGY │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ DETECTION MATRIX │ │
│ │ │ │
│ │ Issue Type │ Synthetic │ RUM │ Combined │ │
│ │ ──────────────────┼──────────────┼───────────┼────────────────── │ │
│ │ Site down │ ✓ Primary │ Secondary │ Fast detection │ │
│ │ New deployment │ ✓ Primary │ Validation│ Deploy + validate │ │
│ │ regression │ │ │ │ │
│ │ Regional outage │ ✓ Primary │ Confirms │ Agent + user impact │ │
│ │ Third-party slow │ Partial │ ✓ Primary │ RUM sees real impact │ │
│ │ Mobile perf │ Simulated │ ✓ Primary │ RUM for reality │ │
│ │ Edge case bugs │ Limited │ ✓ Primary │ User discovery │ │
│ │ Baseline/trend │ ✓ Primary │ Secondary │ Stable synthetic │ │
│ │ User segments │ N/A │ ✓ Only │ RUM required │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ CORRELATION ARCHITECTURE │ │
│ │ │ │
│ │ Synthetic RUM │ │
│ │ ──────── ─── │ │
│ │ ┌───────────────────────────┐ ┌───────────────────────────┐ │ │
│ │ │ Test: Homepage Load │ │ Page: Homepage │ │ │
│ │ │ Location: NYC │ │ Users: 50,000/hr │ │ │
│ │ │ LCP: 1.8s │ │ LCP P75: 2.4s │ │ │
│ │ │ Trend: Stable │ │ Trend: Increasing │ │ │
│ │ └───────────────────────────┘ └───────────────────────────┘ │ │
│ │ │ │ │ │
│ │ └───────────────┬───────────────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ ┌───────────────────────────────────────────────────────────────┐ │ │
│ │ │ CORRELATION ENGINE │ │ │
│ │ │ │ │ │
│ │ │ Insight: RUM LCP increasing while Synthetic stable │ │ │
│ │ │ Hypothesis: Issue affects subset of users │ │ │
│ │ │ │ │ │
│ │ │ RUM Deep Dive: │ │ │
│ │ │ • Mobile users: LCP 3.8s (+80% vs desktop) │ │ │
│ │ │ • Chrome Android: Primary affected segment │ │ │
│ │ │ • Resource: hero-image.webp timing anomaly │ │ │
│ │ │ │ │ │
│ │ │ Root Cause: WebP decoding slow on mid-tier Android │ │ │
│ │ │ (Synthetic runs Chrome desktop, doesn't see this) │ │ │
│ │ │ │ │ │
│ │ └───────────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
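The "RUM Deep Dive" step in the diagram amounts to slicing the same metric by a dimension and finding the segment that moved the most. The following is a minimal sketch under stated assumptions: the sample shape, the segment labels, and all numbers are illustrative, and the nearest-rank P75 stands in for whatever estimator the aggregation engine actually uses.

```typescript
interface RumSample {
  segment: string; // e.g. browser + platform; the dimension is the caller's choice
  lcpMs: number;
}

// Nearest-rank 75th percentile of a non-empty list.
function p75(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  return sorted[Math.max(0, Math.ceil(0.75 * sorted.length) - 1)];
}

function p75BySegment(samples: RumSample[]): Record<string, number> {
  const groups: Record<string, number[]> = {};
  for (const s of samples) {
    (groups[s.segment] = groups[s.segment] || []).push(s.lcpMs);
  }
  const result: Record<string, number> = {};
  for (const segment of Object.keys(groups)) {
    result[segment] = p75(groups[segment]);
  }
  return result;
}

// Segment whose P75 grew the most relative to its own baseline.
function mostRegressedSegment(
  baseline: RumSample[],
  current: RumSample[]
): { segment: string; changeRatio: number } | null {
  const before = p75BySegment(baseline);
  const after = p75BySegment(current);
  let worst: { segment: string; changeRatio: number } | null = null;
  for (const segment of Object.keys(after)) {
    const prev = before[segment];
    if (prev === undefined || prev === 0) continue; // no baseline to compare
    const changeRatio = (after[segment] - prev) / prev;
    if (worst === null || changeRatio > worst.changeRatio) {
      worst = { segment, changeRatio };
    }
  }
  return worst;
}

// Made-up samples: desktop is stable, Android regresses.
const toSamples = (segment: string, values: number[]): RumSample[] =>
  values.map((lcpMs) => ({ segment, lcpMs }));

const baselineSamples = [
  ...toSamples('Chrome Desktop', [1000, 1100, 1200, 1300]),
  ...toSamples('Chrome Android', [2000, 2100, 2200, 2300]),
];
const currentSamples = [
  ...toSamples('Chrome Desktop', [1000, 1100, 1250, 1300]),
  ...toSamples('Chrome Android', [3000, 3500, 3800, 4000]),
];

const regressed = mostRegressedSegment(baselineSamples, currentSamples);
console.log(regressed);
```

Ranking by relative change rather than absolute value keeps an always-slow segment from masking a fast segment that has just regressed.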
Alert Correlation
// Correlating synthetic and RUM alerts.
// SyntheticAlert, RUMAlert, and the helper methods not shown here
// (groupByPage, getSyntheticStatus, getRUMStatus, calculateConfidence)
// are assumed to be defined elsewhere.
interface CorrelatedAlert {
  id: string;
  type: 'synthetic' | 'rum' | 'correlated';
  syntheticAlert?: SyntheticAlert;
  rumAlert?: RUMAlert;
  correlation: CorrelationAnalysis;
  recommendedAction: string;
}

interface CorrelationAnalysis {
  syntheticStatus: 'passing' | 'failing' | 'degraded' | 'unknown';
  rumStatus: 'healthy' | 'degraded' | 'critical' | 'unknown';
  agreement: 'aligned' | 'synthetic-only' | 'rum-only' | 'contradicting';
  confidence: number;
  insights: string[];
}

class AlertCorrelator {
  async correlate(
    syntheticAlerts: SyntheticAlert[],
    rumAlerts: RUMAlert[],
    timeWindow: number
  ): Promise<CorrelatedAlert[]> {
    const correlated: CorrelatedAlert[] = [];

    // Group by resource/page
    const syntheticByPage = this.groupByPage(syntheticAlerts);
    const rumByPage = this.groupByPage(rumAlerts);
    const allPages = new Set([
      ...syntheticByPage.keys(),
      ...rumByPage.keys()
    ]);

    for (const page of allPages) {
      const syntheticForPage = syntheticByPage.get(page) || [];
      const rumForPage = rumByPage.get(page) || [];
      const analysis = this.analyze(syntheticForPage, rumForPage);

      // Suppress only the case where both systems confidently agree the page
      // is healthy. An aligned *failure* must still be surfaced; filtering on
      // `agreement !== 'aligned'` alone would drop confirmed incidents.
      const confirmedHealthy =
        analysis.agreement === 'aligned' &&
        analysis.rumStatus === 'healthy' &&
        analysis.confidence >= 0.8;

      if (!confirmedHealthy) {
        correlated.push({
          id: `correlated-${Date.now()}-${page}`,
          type: 'correlated',
          syntheticAlert: syntheticForPage[0],
          rumAlert: rumForPage[0],
          correlation: analysis,
          recommendedAction: this.getRecommendation(analysis)
        });
      }
    }
    return correlated;
  }

  private analyze(
    syntheticAlerts: SyntheticAlert[],
    rumAlerts: RUMAlert[]
  ): CorrelationAnalysis {
    const syntheticStatus = this.getSyntheticStatus(syntheticAlerts);
    const rumStatus = this.getRUMStatus(rumAlerts);
    let agreement: CorrelationAnalysis['agreement'];
    const insights: string[] = [];

    if (syntheticStatus === 'failing' && rumStatus === 'critical') {
      agreement = 'aligned';
      insights.push('Both synthetic and RUM indicate issues');
    } else if (syntheticStatus === 'passing' && rumStatus === 'healthy') {
      agreement = 'aligned';
      insights.push('Both systems report healthy');
    } else if (syntheticStatus === 'failing' && rumStatus === 'healthy') {
      agreement = 'synthetic-only';
      insights.push('Synthetic failing but RUM healthy');
      insights.push('Possible: Test environment issue or false positive');
      insights.push('Possible: Issue only affects specific test scenario');
    } else if (syntheticStatus === 'passing' && rumStatus === 'critical') {
      agreement = 'rum-only';
      insights.push('RUM critical but Synthetic passing');
      insights.push('Possible: Issue affects user segments not covered by synthetic');
      insights.push('Possible: Third-party or CDN issue affecting real users');
      insights.push('Possible: Mobile or slow network impact');
    } else {
      // Any 'degraded' or 'unknown' status lands here and needs a human look.
      agreement = 'contradicting';
      insights.push('Inconclusive - signals don\'t align clearly');
    }

    return {
      syntheticStatus,
      rumStatus,
      agreement,
      confidence: this.calculateConfidence(syntheticAlerts, rumAlerts),
      insights
    };
  }

  private getRecommendation(analysis: CorrelationAnalysis): string {
    switch (analysis.agreement) {
      case 'aligned':
        return analysis.rumStatus === 'critical'
          ? 'Investigate immediately - both systems confirm issue'
          : 'No action needed - systems agree on healthy status';
      case 'synthetic-only':
        return 'Check synthetic test configuration and environment. ' +
          'Verify test is representative of real usage. ' +
          'May be false positive if RUM is healthy.';
      case 'rum-only':
        return 'Investigate RUM data for affected segments. ' +
          'Check mobile users, slow connections, specific regions. ' +
          'Expand synthetic coverage if pattern found.';
      case 'contradicting':
        return 'Manual investigation needed. ' +
          'Check both data sources for anomalies. ' +
          'May indicate monitoring configuration issue.';
    }
  }
}
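The correlator calls `calculateConfidence`, whose body is not shown in this section. One possible heuristic is sketched below, purely as an assumption: confidence grows with the number of distinct synthetic locations reporting and with the number of RUM samples behind the alert. The `SyntheticAlertLike` and `RumAlertLike` shapes, the `location` and `sampleCount` fields, and the default thresholds are all hypothetical and are not taken from the article's real alert types.

```typescript
// Hypothetical minimal alert shapes; the real SyntheticAlert / RUMAlert
// types are not shown, so these fields are assumptions.
interface SyntheticAlertLike {
  page: string;
  location: string; // synthetic agent location that raised the alert
}

interface RumAlertLike {
  page: string;
  sampleCount: number; // RUM samples backing this alert
}

interface ConfidenceOptions {
  totalLocations: number;        // synthetic locations in the fleet
  fullConfidenceSamples: number; // RUM samples treated as "enough"
}

// Returns a value in [0, 1]. Each available source contributes a score and
// the result is their mean; with no alerts at all the confidence is 0.
function calculateConfidence(
  syntheticAlerts: SyntheticAlertLike[],
  rumAlerts: RumAlertLike[],
  options: ConfidenceOptions = { totalLocations: 30, fullConfidenceSamples: 1000 }
): number {
  const scores: number[] = [];

  if (syntheticAlerts.length > 0) {
    // Count each location once, even if it raised several alerts.
    const distinct: Record<string, boolean> = {};
    for (const alert of syntheticAlerts) distinct[alert.location] = true;
    const share = Object.keys(distinct).length / options.totalLocations;
    scores.push(Math.min(1, share));
  }

  if (rumAlerts.length > 0) {
    const samples = rumAlerts.reduce((sum, alert) => sum + alert.sampleCount, 0);
    scores.push(Math.min(1, samples / options.fullConfidenceSamples));
  }

  if (scores.length === 0) return 0;
  return scores.reduce((sum, score) => sum + score, 0) / scores.length;
}
```

Under this sketch, a single location failing out of 30 yields a low score, which fits the intuition that an isolated synthetic failure is more likely a test-environment problem than a real outage.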
Deployment Validation Workflow
sequenceDiagram
participant D as Deployment
participant S as Synthetic
participant R as RUM
participant A as Alerting
Note over D: Deployment starts
D->>S: Trigger deployment validation tests
S->>S: Run critical flow tests (all locations)
alt Synthetic Tests Pass
S->>D: Tests pass
D->>D: Route 10% traffic to new version
Note over R: RUM starts collecting<br/>from 10% canary
loop Every 5 minutes for 30 min
R->>R: Aggregate canary metrics
R->>A: Compare canary vs baseline
alt Metrics within tolerance
A->>D: Continue rollout
else Metrics degraded
A->>D: Halt rollout
D->>D: Rollback to previous
end
end
D->>D: Full rollout (100%) if never halted
else Synthetic Tests Fail
S->>D: Tests fail
D->>D: Abort deployment
S->>A: Alert: Pre-deployment validation failed
end
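The "Compare canary vs baseline" step in the workflow could look something like the sketch below. It is a simplified illustration under stated assumptions: the `MetricSnapshot` fields, the tolerance values, and the extra `'wait'` outcome (returned when the canary has too few samples to judge) are not part of the diagram and are introduced here only for the example.

```typescript
type GateDecision = 'continue' | 'halt' | 'wait';

// Hypothetical aggregate for one 5-minute window.
interface MetricSnapshot {
  lcpP75Ms: number;
  errorRate: number;   // fraction of page views with an error, 0..1
  sampleCount: number;
}

interface GateTolerance {
  maxLcpRegression: number;     // relative, e.g. 0.10 allows +10% LCP
  maxErrorRateIncrease: number; // absolute, e.g. 0.005 allows +0.5 points
  minCanarySamples: number;     // below this, do not decide yet
}

interface GateResult {
  decision: GateDecision;
  reasons: string[];
}

function evaluateCanary(
  baseline: MetricSnapshot,
  canary: MetricSnapshot,
  tolerance: GateTolerance
): GateResult {
  if (canary.sampleCount < tolerance.minCanarySamples) {
    return {
      decision: 'wait',
      reasons: [`only ${canary.sampleCount} canary samples so far`]
    };
  }

  const reasons: string[] = [];

  const lcpRegression = (canary.lcpP75Ms - baseline.lcpP75Ms) / baseline.lcpP75Ms;
  if (lcpRegression > tolerance.maxLcpRegression) {
    reasons.push(`LCP P75 regressed by ${(lcpRegression * 100).toFixed(1)}%`);
  }

  const errorIncrease = canary.errorRate - baseline.errorRate;
  if (errorIncrease > tolerance.maxErrorRateIncrease) {
    reasons.push(`error rate rose by ${(errorIncrease * 100).toFixed(2)} points`);
  }

  return { decision: reasons.length > 0 ? 'halt' : 'continue', reasons };
}
```

Comparing the canary against a baseline measured over the same window, rather than against a fixed threshold, keeps time-of-day traffic effects from triggering a rollback.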
Tradeoffs and Decision Framework
Cost Analysis
┌─────────────────────────────────────────────────────────────────────────────┐
│ COST COMPARISON │
│ │
│ Scenario: 25M DAU, 200M page views/day │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ RUM COSTS │ │
│ │ │ │
│ │ Events per page view: ~10 (vitals, resources, interactions) │ │
│ │ Total events/day: 2 billion │ │
│ │ │ │
│ │ At 10% sampling: 200M events/day │ │
│ │ │ │
│ │ Ingestion cost: ~$2,000/day ($0.01 per 1000 events) │ │
│ │ Storage cost: ~$500/day (30 day retention) │ │
│ │ Processing cost: ~$1,000/day │ │
│ │ │ │
│ │ Total: ~$3,500/day = $105,000/month │ │
│ │ │ │
│ │ OR: Use managed RUM (Datadog, New Relic) │ │
│ │ ~$1.00 per 1000 sessions ≈ $75,000/month at 10% sampling │ │
│ │ (75M sampled sessions/month, assuming ~1 session per DAU per day) │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ SYNTHETIC COSTS │ │
│ │ │ │
│ │ Tests: 200 scripts │ │
│ │ Frequency: Every 5 minutes │ │
│ │ Locations: 30 │ │
│ │ │ │
│ │ Runs per day: 200 × 288 × 30 = 1.7M │ │
│ │ │ │
│ │ Self-hosted agents: │ │
│ │ - 30 servers × $200/month = $6,000/month │ │
│ │ - Engineering time: $10,000/month equivalent │ │
│ │ │ │
│ │ OR: Managed synthetic (Datadog, Catchpoint) │ │
│ │ - ~$0.10 per run × 1.7M runs/day = $170,000/day at list price │ │
│ │ - Typical with volume discount: ~$50,000/month (~$0.001 per run) │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ Key Insight: │
│ - RUM cost scales with traffic (sampling helps) │
│ - Synthetic cost scales with test count and frequency │
│ - At high scale, self-hosted can be significantly cheaper │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
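The self-hosted arithmetic in the cost box can be reproduced with a small calculator, which makes it easier to see how sampling rate and test frequency move the totals. This is a back-of-the-envelope sketch: the function and field names are invented for illustration, the unit prices are the ones quoted above, and a 30-day month is assumed.

```typescript
interface RumCostInputs {
  pageViewsPerDay: number;
  eventsPerPageView: number;
  sampleRate: number;               // 0..1
  ingestionCostPer1kEvents: number; // USD
  storageCostPerDay: number;        // USD
  processingCostPerDay: number;     // USD
}

interface RumCostEstimate {
  sampledEventsPerDay: number;
  ingestionCostPerDay: number;
  totalPerDay: number;
  totalPerMonth: number;
}

const DAYS_PER_MONTH = 30; // assumption used throughout this sketch

function estimateRumCost(inputs: RumCostInputs): RumCostEstimate {
  const sampledEventsPerDay =
    inputs.pageViewsPerDay * inputs.eventsPerPageView * inputs.sampleRate;
  const ingestionCostPerDay =
    (sampledEventsPerDay / 1000) * inputs.ingestionCostPer1kEvents;
  const totalPerDay =
    ingestionCostPerDay + inputs.storageCostPerDay + inputs.processingCostPerDay;
  return {
    sampledEventsPerDay,
    ingestionCostPerDay,
    totalPerDay,
    totalPerMonth: totalPerDay * DAYS_PER_MONTH
  };
}

// Every script runs once per interval from every location.
function syntheticRunsPerDay(
  scripts: number,
  intervalMinutes: number,
  locations: number
): number {
  const runsPerScriptPerLocation = (24 * 60) / intervalMinutes;
  return scripts * runsPerScriptPerLocation * locations;
}

const rumEstimate = estimateRumCost({
  pageViewsPerDay: 200_000_000,
  eventsPerPageView: 10,
  sampleRate: 0.1,
  ingestionCostPer1kEvents: 0.01,
  storageCostPerDay: 500,
  processingCostPerDay: 1000
});

const runsPerDay = syntheticRunsPerDay(200, 5, 30); // 1,728,000 runs/day
const runsPerMonth = runsPerDay * DAYS_PER_MONTH;   // 51,840,000 runs/month
```

Because synthetic volume is the product of scripts, frequency, and locations, halving any one of the three halves the bill, which is why pruning low-value scripts or running them less often is usually the first cost lever.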
Blind Spots
┌─────────────────────────────────────────────────────────────────────────────┐
│ BLIND SPOTS │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ RUM BLIND SPOTS │ │
│ │ │ │
│ │ 1. No traffic = No data │ │
│ │ • Off-hours coverage gaps │ │
│ │ • New features with low adoption │ │
│ │ • Regional pages with low traffic │ │
│ │ │ │
│ │ 2. Sampling gaps │ │
│ │ • Rare errors may be missed at low sample rates │ │
│ │ • Edge cases in long-tail segments │ │
│ │ │ │
│ │ 3. Detection latency │ │
│ │ • Need statistical significance (5-15 min at scale) │ │
│ │ • Users experience issue before detection │ │
│ │ │ │
│ │ 4. SDK dependency │ │
│ │ • SDK crashes = no data │ │
│ │ • Blocked by ad blockers (10-30% of users) │ │
│ │ • Can't monitor competitor sites │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ SYNTHETIC BLIND SPOTS │ │
│ │ │ │
│ │ 1. Only tests what's scripted │ │
│ │ • Can't discover unknown issues │ │
│ │ • User journeys are diverse and unpredictable │ │
│ │ │ │
│ │ 2. Test environment != Real world │ │
│ │ • Datacenter networks are fast/stable │ │
│ │ • Limited device diversity │ │
│ │ • No real user behavior variance │ │
│ │ │ │
│ │ 3. Third-party behavior differs │ │
│ │ • CDNs may treat bots differently │ │
│ │ • A/B tests may not apply to synthetic │ │
│ │ • Rate limiting may block synthetic │ │
│ │ │ │
│ │ 4. Maintenance burden │ │
│ │ • Scripts break when UI changes │ │
│ │ • Keeping tests up-to-date is ongoing work │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
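The "sampling gaps" blind spot can be quantified. If each occurrence of an error is sampled independently with probability s (an assumption; session-level sampling behaves differently because occurrences within one session are kept or dropped together), the chance of capturing at least one of n occurrences is 1 - (1 - s)^n. The helper names below are introduced for illustration only.

```typescript
// Probability that at least one of `occurrences` events is captured when
// each event is independently sampled at `sampleRate` (0..1).
function captureProbability(sampleRate: number, occurrences: number): number {
  if (sampleRate < 0 || sampleRate > 1) {
    throw new RangeError('sampleRate must be between 0 and 1');
  }
  if (occurrences <= 0) return 0;
  return 1 - Math.pow(1 - sampleRate, occurrences);
}

// Smallest number of occurrences needed to reach `targetProbability` of
// capturing at least one of them.
function occurrencesNeeded(sampleRate: number, targetProbability: number): number {
  if (targetProbability <= 0) return 0;
  if (sampleRate <= 0 || targetProbability >= 1) return Infinity;
  if (sampleRate >= 1) return 1;
  return Math.ceil(Math.log(1 - targetProbability) / Math.log(1 - sampleRate));
}
```

Under these assumptions, at 10% sampling an error has to occur about 29 times before there is a 95% chance that at least one occurrence is recorded. That is one argument for sampling errors at a higher rate than routine performance events.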
Decision Framework
When to prioritize SYNTHETIC:
┌────────────────────────────────────────────────────────────────────────────┐
│ ✓ Pre-launch testing (no users yet) │
│ ✓ Deployment validation (need fast feedback) │
│ ✓ Uptime monitoring (continuous heartbeat) │
│ ✓ SLA validation (contractual compliance) │
│ ✓ Competitive benchmarking (can't RUM competitors) │
│ ✓ Critical path verification (checkout, login, payment) │
│ ✓ Baseline establishment (controlled conditions) │
│ ✓ Low traffic pages (not enough RUM volume) │
└────────────────────────────────────────────────────────────────────────────┘
When to prioritize RUM:
┌────────────────────────────────────────────────────────────────────────────┐
│ ✓ Understanding real user experience │
│ ✓ Mobile performance analysis (device diversity) │
│ ✓ Geographic performance analysis (real ISPs/networks) │
│ ✓ User segment analysis (authenticated vs anonymous, premium vs free) │
│ ✓ Third-party impact assessment (ads, chat widgets, etc.) │
│ ✓ Edge case discovery (issues you didn't anticipate) │
│ ✓ A/B test performance analysis │
│ ✓ Error tracking at scale │
│ ✓ User behavior analytics (rage clicks, abandonment) │
└────────────────────────────────────────────────────────────────────────────┘
When to use BOTH (most production scenarios):
┌────────────────────────────────────────────────────────────────────────────┐
│ ✓ Synthetic for early detection + RUM for impact assessment │
│ ✓ Synthetic for baseline + RUM for variance understanding │
│ ✓ Synthetic for deployment gates + RUM for post-deploy validation │
│ ✓ Synthetic for critical paths + RUM for full coverage │
└────────────────────────────────────────────────────────────────────────────┘
Summary
RUM and Synthetic Monitoring are not competing approaches—they're complementary lenses on frontend performance. Synthetic gives you control, consistency, and proactive detection. RUM gives you reality, diversity, and user truth.
Key Architectural Insights:
- Synthetic detects, RUM confirms: Synthetic usually catches issues first (roughly 2-5 minutes vs 5-15 minutes at P50 in this deployment), but RUM tells you whether real users are affected.
- Sample RUM intelligently: 100% sampling is rarely needed. Prioritize critical pages, authenticated users, and errors.
- Synthetic is not the real world: Datacenter networks, limited browsers, and scripted flows don't capture mobile-on-3G reality.
- Correlate alerts across systems: Synthetic-only alerts may be false positives. RUM-only alerts may indicate coverage gaps.
- Use Synthetic for deployment gates: Fast feedback before users see new code. RUM validates after rollout.
- RUM reveals unknowns: Users find bugs you didn't anticipate. Synthetic only tests what you script.
- Cost scales differently: RUM scales with traffic (sample to control). Synthetic scales with test count, frequency, and locations (prioritize ruthlessly).
- Neither is complete alone: Production monitoring requires both for full coverage. Budget accordingly.
The goal isn't choosing one over the other—it's designing a monitoring strategy where each approach covers the other's blind spots. Synthetic gives you confidence that critical paths work. RUM tells you how users actually experience your site. Together, they provide the visibility needed to ship with confidence.