Back to Blog

Designing an Analytics Pipeline for Frontend Events Without Selling Your Users Out

First-party event collection, edge-based analytics with ClickHouse or Tinybird, privacy-preserving aggregation, and building analytics that respect users while delivering the insights product teams need.


The Problem with Third-Party Analytics

Google Analytics, Mixpanel, Amplitude, Segment—they all have the same fundamental issue: your users' behavioral data lives on someone else's servers, governed by someone else's privacy policy, monetized by someone else's business model.

┌─────────────────────────────────────────────────────────────────┐
│                Third-Party Analytics Data Flow                   │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│   Your User                                                      │
│      │                                                           │
│      │ Clicks, scrolls, views                                    │
│      ▼                                                           │
│   Your App ──────────────────────────┐                           │
│      │                               │                           │
│      │ analytics.track()             │ (you think)               │
│      ▼                               ▼                           │
│   ┌─────────────────────────────────────────────────────┐       │
│   │            Third-Party Analytics Server              │       │
│   ├─────────────────────────────────────────────────────┤       │
│   │                                                      │       │
│   │  • User's full browsing history (cross-site)        │       │
│   │  • Device fingerprint                                │       │
│   │  • IP-based location                                 │       │
│   │  • Behavioral profile                                │       │
│   │  • Cross-referenced with other customers' data       │       │
│   │                                                      │       │
│   │  Sold to: advertisers, data brokers, "partners"      │       │
│   │                                                      │       │
│   └─────────────────────────────────────────────────────┘       │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

The real costs:

  1. Legal liability — GDPR, CCPA, ePrivacy violations
  2. User trust erosion — Privacy-conscious users bounce
  3. Ad blockers — 30-40% of traffic blocked
  4. Data dependency — Vendor holds your insights hostage
  5. Ethical burden — Complicity in surveillance capitalism

First-Party Architecture

The alternative: own your data pipeline end-to-end.

┌─────────────────────────────────────────────────────────────────┐
│                 First-Party Analytics Architecture               │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│   Browser                                                        │
│   ┌────────────────────────────────────────────┐                │
│   │  Event Collector SDK                        │                │
│   │  • Batching                                 │                │
│   │  • Compression                              │                │
│   │  • Retry logic                              │                │
│   │  • PII stripping                            │                │
│   └───────────────────┬────────────────────────┘                │
│                       │                                          │
│                       │ POST /events (your domain)               │
│                       ▼                                          │
│   Edge (Cloudflare/Vercel/Fastly)                               │
│   ┌────────────────────────────────────────────┐                │
│   │  Edge Function                              │                │
│   │  • Rate limiting                            │                │
│   │  • IP anonymization                         │                │
│   │  • Geo enrichment                           │                │
│   │  • Session stitching                        │                │
│   │  • Bot filtering                            │                │
│   └───────────────────┬────────────────────────┘                │
│                       │                                          │
│                       │ Kafka / HTTP                             │
│                       ▼                                          │
│   Ingest Layer                                                   │
│   ┌────────────────────────────────────────────┐                │
│   │  Stream Processor                           │                │
│   │  • Schema validation                        │                │
│   │  • Deduplication                            │                │
│   │  • Sessionization                           │                │
│   │  • User ID resolution                       │                │
│   └───────────────────┬────────────────────────┘                │
│                       │                                          │
│                       ▼                                          │
│   Storage (ClickHouse / Tinybird / BigQuery)                    │
│   ┌────────────────────────────────────────────┐                │
│   │  OLAP Database                              │                │
│   │  • Columnar storage                         │                │
│   │  • Time-series optimized                    │                │
│   │  • Real-time aggregations                   │                │
│   └───────────────────┬────────────────────────┘                │
│                       │                                          │
│                       ▼                                          │
│   Query Layer                                                    │
│   ┌────────────────────────────────────────────┐                │
│   │  API / Dashboard                            │                │
│   │  • Aggregated insights only                 │                │
│   │  • No individual user data exposed          │                │
│   │  • Differential privacy queries             │                │
│   └────────────────────────────────────────────┘                │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Client-Side Event Collection

Build a lightweight SDK that respects privacy by design:

// analytics-sdk.ts

interface AnalyticsConfig {
  endpoint: string;
  batchSize: number;
  flushInterval: number;
  maxRetries: number;
  sessionTimeout: number;
  respectDNT: boolean;
  anonymizeIP: boolean;
  hashUserIds: boolean;
}

interface Event {
  name: string;
  properties: Record<string, unknown>;
  timestamp: number;
  sessionId: string;
  pageUrl: string;
  referrer: string;
}

interface QueuedEvent extends Event {
  id: string;
  retries: number;
}

const defaultConfig: AnalyticsConfig = {
  endpoint: '/api/events',
  batchSize: 10,
  flushInterval: 5000,
  maxRetries: 3,
  sessionTimeout: 30 * 60 * 1000, // 30 minutes
  respectDNT: true,
  anonymizeIP: true,
  hashUserIds: true,
};

export class Analytics {
  private config: AnalyticsConfig;
  private queue: QueuedEvent[] = [];
  private flushTimer: ReturnType<typeof setInterval> | null = null;
  private sessionId: string;
  private lastActivity: number;

  constructor(config: Partial<AnalyticsConfig> = {}) {
    this.config = { ...defaultConfig, ...config };
    this.sessionId = this.getOrCreateSession();
    this.lastActivity = Date.now();

    // Check Do Not Track
    if (this.config.respectDNT && this.isDNTEnabled()) {
      console.log('[Analytics] DNT enabled, tracking disabled');
      return;
    }

    this.startFlushTimer();
    this.setupVisibilityHandler();
    this.setupUnloadHandler();
  }

  private isDNTEnabled(): boolean {
    return (
      navigator.doNotTrack === '1' ||
      (window as any).doNotTrack === '1' ||
      (navigator as any).msDoNotTrack === '1'
    );
  }

  private getOrCreateSession(): string {
    const stored = sessionStorage.getItem('analytics_session');
    const lastActivity = sessionStorage.getItem('analytics_last_activity');

    if (stored && lastActivity) {
      const elapsed = Date.now() - parseInt(lastActivity, 10);
      if (elapsed < this.config.sessionTimeout) {
        return stored;
      }
    }

    const newSession = this.generateSessionId();
    sessionStorage.setItem('analytics_session', newSession);
    return newSession;
  }

  private generateSessionId(): string {
    // Use Web Crypto for proper randomness
    const array = new Uint8Array(16);
    crypto.getRandomValues(array);
    return Array.from(array, (b) => b.toString(16).padStart(2, '0')).join('');
  }

  private updateActivity(): void {
    this.lastActivity = Date.now();
    sessionStorage.setItem(
      'analytics_last_activity',
      this.lastActivity.toString()
    );
  }

  /**
   * Hash user identifiers to prevent PII transmission
   */
  private async hashIdentifier(value: string): Promise<string> {
    const encoder = new TextEncoder();
    const data = encoder.encode(value + this.config.endpoint); // Salt with endpoint
    const hash = await crypto.subtle.digest('SHA-256', data);
    const hashArray = Array.from(new Uint8Array(hash));
    return hashArray.map((b) => b.toString(16).padStart(2, '0')).join('');
  }

  /**
   * Strip PII from properties
   */
  private sanitizeProperties(
    properties: Record<string, unknown>
  ): Record<string, unknown> {
    const piiPatterns = [
      /email/i,
      /phone/i,
      /address/i,
      /name/i,
      /ssn/i,
      /password/i,
      /credit.?card/i,
      /birth.?date/i,
    ];

    const sanitized: Record<string, unknown> = {};

    for (const [key, value] of Object.entries(properties)) {
      // Skip if key matches PII pattern
      if (piiPatterns.some((pattern) => pattern.test(key))) {
        sanitized[key] = '[REDACTED]';
        continue;
      }

      // Recursively sanitize nested objects
      if (value && typeof value === 'object' && !Array.isArray(value)) {
        sanitized[key] = this.sanitizeProperties(
          value as Record<string, unknown>
        );
      } else {
        sanitized[key] = value;
      }
    }

    return sanitized;
  }

  /**
   * Track an event
   */
  async track(
    name: string,
    properties: Record<string, unknown> = {}
  ): Promise<void> {
    if (this.config.respectDNT && this.isDNTEnabled()) {
      return;
    }

    this.updateActivity();

    const event: QueuedEvent = {
      id: this.generateSessionId(),
      name,
      properties: this.sanitizeProperties(properties),
      timestamp: Date.now(),
      sessionId: this.sessionId,
      pageUrl: this.sanitizeUrl(window.location.href),
      referrer: this.sanitizeUrl(document.referrer),
      retries: 0,
    };

    this.queue.push(event);

    if (this.queue.length >= this.config.batchSize) {
      await this.flush();
    }
  }

  /**
   * Remove query params that might contain PII
   */
  private sanitizeUrl(url: string): string {
    if (!url) return '';

    try {
      const parsed = new URL(url);

      // Remove common PII query params
      const piiParams = [
        'email',
        'token',
        'key',
        'password',
        'auth',
        'session',
        'user',
        'uid',
      ];

      piiParams.forEach((param) => {
        if (parsed.searchParams.has(param)) {
          parsed.searchParams.set(param, '[REDACTED]');
        }
      });

      return parsed.toString();
    } catch {
      return url.split('?')[0]; // Just return path if URL parsing fails
    }
  }

  /**
   * Track page view
   */
  trackPageView(properties: Record<string, unknown> = {}): void {
    this.track('page_view', {
      ...properties,
      title: document.title,
      path: window.location.pathname,
    });
  }

  /**
   * Flush events to server
   */
  async flush(): Promise<void> {
    if (this.queue.length === 0) return;

    const batch = this.queue.splice(0, this.config.batchSize);

    try {
      const payload = {
        events: batch.map(({ id, retries, ...event }) => event),
        context: {
          userAgent: navigator.userAgent,
          language: navigator.language,
          screenWidth: window.screen.width,
          screenHeight: window.screen.height,
          viewportWidth: window.innerWidth,
          viewportHeight: window.innerHeight,
          timezone: Intl.DateTimeFormat().resolvedOptions().timeZone,
        },
      };

      const response = await fetch(this.config.endpoint, {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
        },
        body: JSON.stringify(payload),
        keepalive: true, // Allows request to outlive page
      });

      if (!response.ok) {
        throw new Error(`HTTP ${response.status}`);
      }
    } catch (error) {
      // Re-queue failed events with retry count
      const retriable = batch.filter(
        (event) => event.retries < this.config.maxRetries
      );

      retriable.forEach((event) => {
        event.retries++;
        this.queue.unshift(event);
      });

      console.error('[Analytics] Flush failed:', error);
    }
  }

  private startFlushTimer(): void {
    this.flushTimer = setInterval(() => {
      this.flush();
    }, this.config.flushInterval);
  }

  private setupVisibilityHandler(): void {
    document.addEventListener('visibilitychange', () => {
      if (document.visibilityState === 'hidden') {
        this.flush();
      }
    });
  }

  private setupUnloadHandler(): void {
    // Use sendBeacon for reliability on page unload
    window.addEventListener('pagehide', () => {
      if (this.queue.length === 0) return;

      const payload = JSON.stringify({
        events: this.queue.map(({ id, retries, ...event }) => event),
        context: {
          userAgent: navigator.userAgent,
          language: navigator.language,
          timezone: Intl.DateTimeFormat().resolvedOptions().timeZone,
        },
      });

      navigator.sendBeacon(this.config.endpoint, payload);
    });
  }

  /**
   * Identify user (hashed)
   */
  async identify(userId: string): Promise<void> {
    const hashedId = this.config.hashUserIds
      ? await this.hashIdentifier(userId)
      : userId;

    localStorage.setItem('analytics_user_id', hashedId);

    this.track('identify', {
      hashedUserId: hashedId,
    });
  }

  /**
   * Reset identity (logout)
   */
  reset(): void {
    localStorage.removeItem('analytics_user_id');
    sessionStorage.removeItem('analytics_session');
    this.sessionId = this.generateSessionId();
    sessionStorage.setItem('analytics_session', this.sessionId);
  }

  /**
   * Cleanup
   */
  destroy(): void {
    if (this.flushTimer) {
      clearInterval(this.flushTimer);
    }
    this.flush();
  }
}

// Singleton export
export const analytics = new Analytics();

Edge Function Processing

Process events at the edge for privacy and performance:

// edge-function.ts (Cloudflare Workers / Vercel Edge)

import { z } from 'zod';

const eventSchema = z.object({
  name: z.string().max(100),
  properties: z.record(z.unknown()).optional(),
  timestamp: z.number(),
  sessionId: z.string(),
  pageUrl: z.string().max(2000),
  referrer: z.string().max(2000).optional(),
});

const payloadSchema = z.object({
  events: z.array(eventSchema).max(100),
  context: z.object({
    userAgent: z.string().max(500),
    language: z.string().max(10),
    screenWidth: z.number().optional(),
    screenHeight: z.number().optional(),
    viewportWidth: z.number().optional(),
    viewportHeight: z.number().optional(),
    timezone: z.string().optional(),
  }),
});

interface EnrichedEvent {
  // Original event data
  name: string;
  properties: Record<string, unknown>;
  timestamp: number;
  sessionId: string;
  pageUrl: string;
  referrer?: string;

  // Edge-enriched data
  country?: string;
  region?: string;
  city?: string;
  deviceType: 'mobile' | 'tablet' | 'desktop';
  browser: string;
  os: string;
  isBot: boolean;
  receivedAt: number;
}

/**
 * Anonymize IP - keep only country-level precision
 */
function anonymizeIP(ip: string): string {
  if (ip.includes(':')) {
    // IPv6: keep first 48 bits (3 groups)
    const parts = ip.split(':');
    return parts.slice(0, 3).join(':') + '::';
  } else {
    // IPv4: keep first 3 octets
    const parts = ip.split('.');
    return parts.slice(0, 3).join('.') + '.0';
  }
}

/**
 * Parse user agent for device info
 */
function parseUserAgent(ua: string): {
  deviceType: 'mobile' | 'tablet' | 'desktop';
  browser: string;
  os: string;
} {
  const isMobile = /Mobile|Android|iPhone|iPad/.test(ua);
  const isTablet = /iPad|Tablet/.test(ua);

  let browser = 'Other';
  if (ua.includes('Chrome')) browser = 'Chrome';
  else if (ua.includes('Firefox')) browser = 'Firefox';
  else if (ua.includes('Safari')) browser = 'Safari';
  else if (ua.includes('Edge')) browser = 'Edge';

  let os = 'Other';
  if (ua.includes('Windows')) os = 'Windows';
  else if (ua.includes('Mac')) os = 'macOS';
  else if (ua.includes('Linux')) os = 'Linux';
  else if (ua.includes('Android')) os = 'Android';
  else if (ua.includes('iOS') || ua.includes('iPhone')) os = 'iOS';

  return {
    deviceType: isTablet ? 'tablet' : isMobile ? 'mobile' : 'desktop',
    browser,
    os,
  };
}

/**
 * Detect bots
 */
function isBot(ua: string): boolean {
  const botPatterns = [
    /bot/i,
    /crawler/i,
    /spider/i,
    /scraper/i,
    /headless/i,
    /puppet/i,
    /selenium/i,
    /phantomjs/i,
  ];

  return botPatterns.some((pattern) => pattern.test(ua));
}

/**
 * Rate limit by session
 */
const rateLimiter = new Map<string, { count: number; resetAt: number }>();

function checkRateLimit(
  sessionId: string,
  limit: number,
  windowMs: number
): boolean {
  const now = Date.now();
  const entry = rateLimiter.get(sessionId);

  if (!entry || entry.resetAt < now) {
    rateLimiter.set(sessionId, { count: 1, resetAt: now + windowMs });
    return true;
  }

  if (entry.count >= limit) {
    return false;
  }

  entry.count++;
  return true;
}

/**
 * Edge function handler
 */
export default async function handler(request: Request): Promise<Response> {
  // Only accept POST
  if (request.method !== 'POST') {
    return new Response('Method not allowed', { status: 405 });
  }

  // Parse and validate payload
  let payload;
  try {
    const body = await request.json();
    payload = payloadSchema.parse(body);
  } catch (error) {
    return new Response('Invalid payload', { status: 400 });
  }

  // Get session for rate limiting
  const sessionId = payload.events[0]?.sessionId;
  if (!sessionId) {
    return new Response('Session ID required', { status: 400 });
  }

  // Rate limit: 100 events per minute per session
  if (!checkRateLimit(sessionId, 100, 60000)) {
    return new Response('Rate limited', { status: 429 });
  }

  // Get geo data from Cloudflare headers (or similar)
  const country = request.headers.get('cf-ipcountry') || undefined;
  const city = request.headers.get('cf-ipcity') || undefined;
  const region = request.headers.get('cf-region') || undefined;

  // Parse user agent
  const deviceInfo = parseUserAgent(payload.context.userAgent);

  // Check for bots
  const botDetected = isBot(payload.context.userAgent);

  // Filter out bot traffic
  if (botDetected) {
    // Still return 200 to not reveal detection
    return new Response('OK', { status: 200 });
  }

  // Enrich events
  const enrichedEvents: EnrichedEvent[] = payload.events.map((event) => ({
    ...event,
    country,
    city,
    region,
    ...deviceInfo,
    isBot: botDetected,
    receivedAt: Date.now(),
  }));

  // Forward to ingest layer
  try {
    await forwardToIngest(enrichedEvents);
  } catch (error) {
    console.error('Ingest failed:', error);
    // Don't fail the request - events are already validated
  }

  return new Response('OK', { status: 200 });
}

/**
 * Forward to ClickHouse / Tinybird / Kafka
 */
async function forwardToIngest(events: EnrichedEvent[]): Promise<void> {
  // Option 1: Direct ClickHouse insert
  await insertToClickHouse(events);

  // Option 2: Tinybird Events API
  // await insertToTinybird(events);

  // Option 3: Kafka producer
  // await publishToKafka(events);
}

async function insertToClickHouse(events: EnrichedEvent[]): Promise<void> {
  const CLICKHOUSE_URL = process.env.CLICKHOUSE_URL!;
  const CLICKHOUSE_USER = process.env.CLICKHOUSE_USER!;
  const CLICKHOUSE_PASSWORD = process.env.CLICKHOUSE_PASSWORD!;

  // Format as NDJSON for ClickHouse
  const ndjson = events.map((e) => JSON.stringify(e)).join('\n');

  const response = await fetch(
    `${CLICKHOUSE_URL}/?query=INSERT INTO events FORMAT JSONEachRow`,
    {
      method: 'POST',
      headers: {
        'Content-Type': 'application/x-ndjson',
        'X-ClickHouse-User': CLICKHOUSE_USER,
        'X-ClickHouse-Key': CLICKHOUSE_PASSWORD,
      },
      body: ndjson,
    }
  );

  if (!response.ok) {
    throw new Error(`ClickHouse insert failed: ${response.status}`);
  }
}

async function insertToTinybird(events: EnrichedEvent[]): Promise<void> {
  const TINYBIRD_TOKEN = process.env.TINYBIRD_TOKEN!;
  const TINYBIRD_DATASOURCE = process.env.TINYBIRD_DATASOURCE!;

  const ndjson = events.map((e) => JSON.stringify(e)).join('\n');

  const response = await fetch(
    `https://api.tinybird.co/v0/events?name=${TINYBIRD_DATASOURCE}`,
    {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${TINYBIRD_TOKEN}`,
        'Content-Type': 'application/x-ndjson',
      },
      body: ndjson,
    }
  );

  if (!response.ok) {
    throw new Error(`Tinybird insert failed: ${response.status}`);
  }
}

ClickHouse Schema Design

Optimized for time-series analytics:

-- events.sql

-- Main events table with partitioning and ordering
CREATE TABLE events
(
    -- Event identification
    event_id UUID DEFAULT generateUUIDv4(),
    name LowCardinality(String),

    -- Timestamps
    timestamp DateTime64(3),
    received_at DateTime64(3) DEFAULT now64(3),
    event_date Date DEFAULT toDate(timestamp),

    -- Session and user
    session_id String,
    user_id String DEFAULT '',  -- Hashed, optional

    -- Page context
    page_url String,
    page_path String MATERIALIZED extractURLPath(page_url),
    page_host String MATERIALIZED extractURLHost(page_url),
    referrer String DEFAULT '',
    referrer_host String MATERIALIZED extractURLHost(referrer),

    -- Geographic (country-level only for privacy)
    country LowCardinality(String) DEFAULT '',
    region LowCardinality(String) DEFAULT '',

    -- Device
    device_type LowCardinality(String),  -- mobile, tablet, desktop
    browser LowCardinality(String),
    os LowCardinality(String),

    -- Screen
    screen_width UInt16 DEFAULT 0,
    screen_height UInt16 DEFAULT 0,
    viewport_width UInt16 DEFAULT 0,
    viewport_height UInt16 DEFAULT 0,

    -- Event properties (flexible schema)
    properties String DEFAULT '{}',  -- JSON string

    -- Flags
    is_bot UInt8 DEFAULT 0
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(event_date)
ORDER BY (name, event_date, session_id, timestamp)
TTL event_date + INTERVAL 90 DAY  -- Auto-delete after 90 days
SETTINGS index_granularity = 8192;

-- Materialized view for page views aggregation
CREATE MATERIALIZED VIEW page_views_hourly
ENGINE = SummingMergeTree()
PARTITION BY toYYYYMM(event_date)
ORDER BY (event_date, event_hour, page_path, country, device_type)
AS SELECT
    event_date,
    toStartOfHour(timestamp) as event_hour,
    page_path,
    country,
    device_type,
    count() as views,
    uniqExact(session_id) as unique_sessions
FROM events
WHERE name = 'page_view'
GROUP BY event_date, event_hour, page_path, country, device_type;

-- Materialized view for session metrics
CREATE MATERIALIZED VIEW sessions_daily
ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMM(event_date)
ORDER BY (event_date, country, device_type)
AS SELECT
    event_date,
    country,
    device_type,
    uniqState(session_id) as unique_sessions,
    countState() as total_events,
    avgState(
        dateDiff('second',
            min(timestamp) OVER (PARTITION BY session_id),
            max(timestamp) OVER (PARTITION BY session_id)
        )
    ) as avg_session_duration
FROM events
GROUP BY event_date, country, device_type;

-- Funnel analysis table
CREATE TABLE funnels
(
    funnel_id String,
    session_id String,
    event_date Date,
    step_1_at Nullable(DateTime64(3)),
    step_2_at Nullable(DateTime64(3)),
    step_3_at Nullable(DateTime64(3)),
    step_4_at Nullable(DateTime64(3)),
    step_5_at Nullable(DateTime64(3)),
    completed UInt8 DEFAULT 0
)
ENGINE = ReplacingMergeTree()
PARTITION BY toYYYYMM(event_date)
ORDER BY (funnel_id, session_id, event_date);

Privacy-Preserving Query Layer

Build APIs that only expose aggregated data:

// analytics-api.ts

import { createClient } from '@clickhouse/client';

const clickhouse = createClient({
  host: process.env.CLICKHOUSE_URL,
  username: process.env.CLICKHOUSE_USER,
  password: process.env.CLICKHOUSE_PASSWORD,
});

interface DateRange {
  start: Date;
  end: Date;
}

interface PageViewsResponse {
  path: string;
  views: number;
  uniqueSessions: number;
}

interface AggregatedMetrics {
  totalViews: number;
  uniqueSessions: number;
  avgSessionDuration: number;
  bounceRate: number;
  topPages: PageViewsResponse[];
  deviceBreakdown: { device: string; percentage: number }[];
  countryBreakdown: { country: string; sessions: number }[];
}

/**
 * Minimum threshold for k-anonymity
 * Don't return data if fewer than K users
 */
const K_ANONYMITY_THRESHOLD = 5;

/**
 * Get aggregated metrics with privacy protection
 */
export async function getAggregatedMetrics(
  dateRange: DateRange
): Promise<AggregatedMetrics> {
  const [overview, topPages, devices, countries] = await Promise.all([
    getOverviewMetrics(dateRange),
    getTopPages(dateRange),
    getDeviceBreakdown(dateRange),
    getCountryBreakdown(dateRange),
  ]);

  return {
    ...overview,
    topPages,
    deviceBreakdown: devices,
    countryBreakdown: countries,
  };
}

async function getOverviewMetrics(dateRange: DateRange): Promise<{
  totalViews: number;
  uniqueSessions: number;
  avgSessionDuration: number;
  bounceRate: number;
}> {
  const result = await clickhouse.query({
    query: `
      SELECT
        count() as total_views,
        uniqExact(session_id) as unique_sessions,
        avgIf(session_duration, session_duration > 0) as avg_duration,
        countIf(event_count = 1) / count() as bounce_rate
      FROM (
        SELECT
          session_id,
          count() as event_count,
          dateDiff('second', min(timestamp), max(timestamp)) as session_duration
        FROM events
        WHERE event_date BETWEEN {start:Date} AND {end:Date}
          AND is_bot = 0
        GROUP BY session_id
      )
    `,
    query_params: {
      start: dateRange.start.toISOString().split('T')[0],
      end: dateRange.end.toISOString().split('T')[0],
    },
    format: 'JSONEachRow',
  });

  const rows = await result.json<{
    total_views: number;
    unique_sessions: number;
    avg_duration: number;
    bounce_rate: number;
  }>();

  const data = rows[0];

  return {
    totalViews: data.total_views,
    uniqueSessions: data.unique_sessions,
    avgSessionDuration: Math.round(data.avg_duration),
    bounceRate: Math.round(data.bounce_rate * 100) / 100,
  };
}

async function getTopPages(
  dateRange: DateRange,
  limit: number = 10
): Promise<PageViewsResponse[]> {
  const result = await clickhouse.query({
    query: `
      SELECT
        page_path as path,
        count() as views,
        uniqExact(session_id) as unique_sessions
      FROM events
      WHERE event_date BETWEEN {start:Date} AND {end:Date}
        AND name = 'page_view'
        AND is_bot = 0
      GROUP BY page_path
      HAVING unique_sessions >= {threshold:UInt32}  -- k-anonymity
      ORDER BY views DESC
      LIMIT {limit:UInt32}
    `,
    query_params: {
      start: dateRange.start.toISOString().split('T')[0],
      end: dateRange.end.toISOString().split('T')[0],
      limit,
      threshold: K_ANONYMITY_THRESHOLD,
    },
    format: 'JSONEachRow',
  });

  return result.json();
}

async function getDeviceBreakdown(
  dateRange: DateRange
): Promise<{ device: string; percentage: number }[]> {
  const result = await clickhouse.query({
    query: `
      SELECT
        device_type as device,
        round(count() * 100.0 / sum(count()) OVER (), 2) as percentage
      FROM events
      WHERE event_date BETWEEN {start:Date} AND {end:Date}
        AND is_bot = 0
      GROUP BY device_type
      HAVING count() >= {threshold:UInt32}
      ORDER BY percentage DESC
    `,
    query_params: {
      start: dateRange.start.toISOString().split('T')[0],
      end: dateRange.end.toISOString().split('T')[0],
      threshold: K_ANONYMITY_THRESHOLD,
    },
    format: 'JSONEachRow',
  });

  return result.json();
}

async function getCountryBreakdown(
  dateRange: DateRange,
  limit: number = 10
): Promise<{ country: string; sessions: number }[]> {
  const result = await clickhouse.query({
    query: `
      SELECT
        country,
        uniqExact(session_id) as sessions
      FROM events
      WHERE event_date BETWEEN {start:Date} AND {end:Date}
        AND is_bot = 0
        AND country != ''
      GROUP BY country
      HAVING sessions >= {threshold:UInt32}  -- k-anonymity
      ORDER BY sessions DESC
      LIMIT {limit:UInt32}
    `,
    query_params: {
      start: dateRange.start.toISOString().split('T')[0],
      end: dateRange.end.toISOString().split('T')[0],
      limit,
      threshold: K_ANONYMITY_THRESHOLD,
    },
    format: 'JSONEachRow',
  });

  return result.json();
}

/**
 * Funnel analysis - always aggregated
 */
export async function analyzeFunnel(
  dateRange: DateRange,
  steps: string[]
): Promise<{ step: string; users: number; dropoff: number }[]> {
  if (steps.length < 2 || steps.length > 5) {
    throw new Error('Funnel must have 2-5 steps');
  }

  const stepConditions = steps
    .map(
      (step, i) =>
        `sumIf(1, step_${i + 1} = 1) as step_${i + 1}_users`
    )
    .join(',\n');

  const result = await clickhouse.query({
    query: `
      WITH funnel_data AS (
        SELECT
          session_id,
          ${steps
            .map(
              (step, i) =>
                `max(if(name = '${step}', 1, 0)) as step_${i + 1}`
            )
            .join(',\n')}
        FROM events
        WHERE event_date BETWEEN {start:Date} AND {end:Date}
          AND name IN (${steps.map((s) => `'${s}'`).join(', ')})
          AND is_bot = 0
        GROUP BY session_id
      )
      SELECT
        ${stepConditions}
      FROM funnel_data
    `,
    query_params: {
      start: dateRange.start.toISOString().split('T')[0],
      end: dateRange.end.toISOString().split('T')[0],
    },
    format: 'JSONEachRow',
  });

  const rows = await result.json<Record<string, number>>();
  const data = rows[0];

  // Apply k-anonymity and calculate dropoffs
  return steps.map((step, i) => {
    const users = data[`step_${i + 1}_users`];
    const prevUsers = i === 0 ? users : data[`step_${i}_users`];
    const dropoff = i === 0 ? 0 : Math.round((1 - users / prevUsers) * 100);

    return {
      step,
      users: users >= K_ANONYMITY_THRESHOLD ? users : 0, // Hide if below threshold
      dropoff,
    };
  });
}

Differential Privacy for Sensitive Queries

When k-anonymity isn't enough, add noise:

// differential-privacy.ts

/**
 * Laplace mechanism for differential privacy
 */
export function laplaceMechanism(
  trueValue: number,
  sensitivity: number,
  epsilon: number
): number {
  // Laplace distribution with scale = sensitivity / epsilon
  const scale = sensitivity / epsilon;
  const u = Math.random() - 0.5;
  const noise = -scale * Math.sign(u) * Math.log(1 - 2 * Math.abs(u));

  return Math.round(trueValue + noise);
}

/**
 * Apply differential privacy to query results
 */
export function applyDifferentialPrivacy<T extends Record<string, number>>(
  results: T[],
  numericFields: (keyof T)[],
  options: {
    epsilon: number; // Privacy budget (lower = more private)
    sensitivity: number; // Max impact of single user
    minThreshold: number; // Don't report values below this
  }
): T[] {
  const { epsilon, sensitivity, minThreshold } = options;

  return results
    .map((row) => {
      const noisyRow = { ...row };

      for (const field of numericFields) {
        const trueValue = row[field] as number;
        const noisyValue = laplaceMechanism(trueValue, sensitivity, epsilon);

        // Suppress if below threshold after noise
        (noisyRow as Record<string, number>)[field as string] =
          noisyValue >= minThreshold ? Math.max(0, noisyValue) : 0;
      }

      return noisyRow;
    })
    .filter((row) => {
      // Remove rows where all numeric fields are 0
      return numericFields.some(
        (field) => (row[field] as number) > 0
      );
    });
}

/**
 * Privacy-preserving unique count using HyperLogLog
 */
export async function approximateUniqueCount(
  clickhouse: any,
  dateRange: DateRange,
  dimension: string
): Promise<number> {
  const result = await clickhouse.query({
    query: `
      SELECT uniqHLL12(session_id) as approx_unique
      FROM events
      WHERE event_date BETWEEN {start:Date} AND {end:Date}
        AND is_bot = 0
        ${dimension ? `AND ${dimension}` : ''}
    `,
    query_params: {
      start: dateRange.start.toISOString().split('T')[0],
      end: dateRange.end.toISOString().split('T')[0],
    },
    format: 'JSONEachRow',
  });

  const rows = await result.json<{ approx_unique: number }>();

  // HyperLogLog already provides privacy through approximation
  return rows[0].approx_unique;
}

// consent.ts

type ConsentCategory = 'necessary' | 'analytics' | 'marketing';

interface ConsentState {
  necessary: boolean; // Always true
  analytics: boolean;
  marketing: boolean;
  timestamp: number;
  version: string;
}

const CONSENT_VERSION = '1.0.0';
const CONSENT_KEY = 'privacy_consent';

export function getConsent(): ConsentState | null {
  const stored = localStorage.getItem(CONSENT_KEY);
  if (!stored) return null;

  try {
    return JSON.parse(stored);
  } catch {
    return null;
  }
}

export function setConsent(
  categories: Partial<Record<ConsentCategory, boolean>>
): ConsentState {
  const consent: ConsentState = {
    necessary: true, // Always required
    analytics: categories.analytics ?? false,
    marketing: categories.marketing ?? false,
    timestamp: Date.now(),
    version: CONSENT_VERSION,
  };

  localStorage.setItem(CONSENT_KEY, JSON.stringify(consent));

  // Emit event for analytics SDK to react
  window.dispatchEvent(
    new CustomEvent('consent-updated', { detail: consent })
  );

  return consent;
}

/**
 * Check if analytics is allowed
 */
export function canTrack(): boolean {
  const consent = getConsent();

  // No consent recorded = no tracking (GDPR compliant default)
  if (!consent) return false;

  return consent.analytics === true;
}

/**
 * React hook for consent state
 */
export function useConsent() {
  const [consent, setConsentState] = useState<ConsentState | null>(
    getConsent()
  );

  useEffect(() => {
    const handler = (e: CustomEvent<ConsentState>) => {
      setConsentState(e.detail);
    };

    window.addEventListener(
      'consent-updated',
      handler as EventListener
    );

    return () => {
      window.removeEventListener(
        'consent-updated',
        handler as EventListener
      );
    };
  }, []);

  const updateConsent = useCallback(
    (categories: Partial<Record<ConsentCategory, boolean>>) => {
      const newConsent = setConsent(categories);
      setConsentState(newConsent);
    },
    []
  );

  return { consent, updateConsent, canTrack: consent?.analytics ?? false };
}

/**
 * Consent-aware analytics wrapper
 */
export class ConsentAwareAnalytics {
  private analytics: Analytics;
  private queuedEvents: Array<{ name: string; properties: Record<string, unknown> }> = [];

  constructor(analytics: Analytics) {
    this.analytics = analytics;

    // Listen for consent changes
    window.addEventListener('consent-updated', ((e: CustomEvent<ConsentState>) => {
      if (e.detail.analytics) {
        this.flushQueue();
      } else {
        this.queuedEvents = []; // Clear queue if consent withdrawn
      }
    }) as EventListener);
  }

  track(name: string, properties: Record<string, unknown> = {}): void {
    if (canTrack()) {
      this.analytics.track(name, properties);
    } else {
      // Queue for potential future consent
      this.queuedEvents.push({ name, properties });
    }
  }

  private flushQueue(): void {
    for (const event of this.queuedEvents) {
      this.analytics.track(event.name, event.properties);
    }
    this.queuedEvents = [];
  }
}

Tinybird Integration

Serverless analytics with Tinybird for simpler ops:

// tinybird.ts

const TINYBIRD_HOST = 'https://api.tinybird.co';
const TINYBIRD_TOKEN = process.env.TINYBIRD_TOKEN!;

interface TinybirdQuery {
  pipe: string;
  params?: Record<string, string | number>;
}

export async function queryTinybird<T>(query: TinybirdQuery): Promise<T[]> {
  const url = new URL(`${TINYBIRD_HOST}/v0/pipes/${query.pipe}.json`);

  if (query.params) {
    for (const [key, value] of Object.entries(query.params)) {
      url.searchParams.append(key, String(value));
    }
  }

  const response = await fetch(url.toString(), {
    headers: {
      Authorization: `Bearer ${TINYBIRD_TOKEN}`,
    },
  });

  if (!response.ok) {
    throw new Error(`Tinybird query failed: ${response.status}`);
  }

  const result = await response.json();
  return result.data;
}

// Example: Tinybird pipe definition (page_views.pipe)
/*
DESCRIPTION >
    Page views aggregation with privacy controls

NODE page_views_aggregated
SQL >
    SELECT
        toDate(timestamp) as date,
        page_path,
        country,
        device_type,
        count() as views,
        uniqExact(session_id) as unique_sessions
    FROM events
    WHERE timestamp >= now() - INTERVAL 90 DAY
      AND is_bot = 0
    GROUP BY date, page_path, country, device_type
    HAVING unique_sessions >= 5  -- k-anonymity

NODE endpoint
SQL >
    SELECT *
    FROM page_views_aggregated
    WHERE date >= {{Date(start_date, '2024-01-01')}}
      AND date <= {{Date(end_date, '2024-12-31')}}
      {% if defined(country) %}
        AND country = {{String(country, '')}}
      {% end %}
    ORDER BY views DESC
    LIMIT {{Int32(limit, 100)}}
*/

// Usage
async function getPageViews(options: {
  startDate: string;
  endDate: string;
  country?: string;
  limit?: number;
}) {
  return queryTinybird<{
    date: string;
    page_path: string;
    country: string;
    device_type: string;
    views: number;
    unique_sessions: number;
  }>({
    pipe: 'page_views',
    params: {
      start_date: options.startDate,
      end_date: options.endDate,
      ...(options.country && { country: options.country }),
      ...(options.limit && { limit: options.limit }),
    },
  });
}

Real-Time Dashboard

Server-sent events for live metrics:

// realtime-api.ts

import { clickhouse } from './clickhouse';

export async function handleRealtimeStream(
  request: Request
): Promise<Response> {
  const encoder = new TextEncoder();

  const stream = new ReadableStream({
    async start(controller) {
      const sendEvent = (data: object) => {
        controller.enqueue(
          encoder.encode(`data: ${JSON.stringify(data)}\n\n`)
        );
      };

      // Send initial data
      const initial = await getRealtimeMetrics();
      sendEvent({ type: 'initial', data: initial });

      // Poll for updates every 5 seconds
      const interval = setInterval(async () => {
        try {
          const metrics = await getRealtimeMetrics();
          sendEvent({ type: 'update', data: metrics });
        } catch (error) {
          sendEvent({ type: 'error', message: 'Failed to fetch metrics' });
        }
      }, 5000);

      // Cleanup on close
      request.signal.addEventListener('abort', () => {
        clearInterval(interval);
        controller.close();
      });
    },
  });

  return new Response(stream, {
    headers: {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
      Connection: 'keep-alive',
    },
  });
}

async function getRealtimeMetrics() {
  const result = await clickhouse.query({
    query: `
      SELECT
        count() as events_last_minute,
        uniqExact(session_id) as active_sessions,
        countIf(name = 'page_view') as page_views_last_minute,
        topK(5)(page_path) as top_pages
      FROM events
      WHERE timestamp >= now() - INTERVAL 1 MINUTE
        AND is_bot = 0
    `,
    format: 'JSONEachRow',
  });

  const rows = await result.json();
  return rows[0];
}

// Client-side hook
function useRealtimeMetrics() {
  const [metrics, setMetrics] = useState<RealtimeMetrics | null>(null);
  const [error, setError] = useState<string | null>(null);

  useEffect(() => {
    const eventSource = new EventSource('/api/analytics/realtime');

    eventSource.onmessage = (event) => {
      const data = JSON.parse(event.data);

      if (data.type === 'error') {
        setError(data.message);
      } else {
        setMetrics(data.data);
        setError(null);
      }
    };

    eventSource.onerror = () => {
      setError('Connection lost');
    };

    return () => {
      eventSource.close();
    };
  }, []);

  return { metrics, error };
}

Data Retention and Deletion

GDPR-compliant data lifecycle:

// data-lifecycle.ts

/**
 * Delete user data on request (Right to Erasure)
 */
export async function deleteUserData(
  hashedUserId: string
): Promise<{ deletedRows: number }> {
  // Use ALTER TABLE ... DELETE for ClickHouse
  const result = await clickhouse.query({
    query: `
      ALTER TABLE events
      DELETE WHERE user_id = {userId:String}
    `,
    query_params: {
      userId: hashedUserId,
    },
  });

  // Log deletion for audit
  await logDeletionRequest(hashedUserId);

  return { deletedRows: result.affectedRows };
}

/**
 * Export user data (Right to Access)
 */
export async function exportUserData(
  hashedUserId: string
): Promise<UserDataExport> {
  const events = await clickhouse.query({
    query: `
      SELECT
        name,
        timestamp,
        page_url,
        properties
      FROM events
      WHERE user_id = {userId:String}
      ORDER BY timestamp DESC
    `,
    query_params: {
      userId: hashedUserId,
    },
    format: 'JSONEachRow',
  });

  return {
    userId: hashedUserId,
    exportedAt: new Date().toISOString(),
    events: await events.json(),
  };
}

/**
 * Automated data retention cleanup
 */
export async function enforceRetentionPolicy(
  retentionDays: number = 90
): Promise<void> {
  // ClickHouse TTL handles this automatically, but we can force cleanup
  await clickhouse.query({
    query: `
      OPTIMIZE TABLE events FINAL
    `,
  });

  // Log cleanup for compliance
  await logRetentionCleanup(retentionDays);
}

/**
 * Scheduled job for retention
 */
// Cron: 0 3 * * * (daily at 3 AM)
export async function retentionJob(): Promise<void> {
  console.log('Starting retention policy enforcement...');

  const tables = ['events', 'sessions_daily', 'page_views_hourly'];

  for (const table of tables) {
    await clickhouse.query({
      query: `OPTIMIZE TABLE ${table} FINAL`,
    });
  }

  console.log('Retention policy enforced.');
}

Privacy Compliance Checklist

┌─────────────────────────────────────────────────────────────────┐
│                  Privacy Compliance Matrix                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  GDPR Requirements                              Status           │
│  ├── Lawful basis documented                    [  ]             │
│  ├── Consent mechanism implemented              [  ]             │
│  ├── Privacy policy updated                     [  ]             │
│  ├── Data minimization applied                  [  ]             │
│  ├── Purpose limitation enforced                [  ]             │
│  ├── Storage limitation (TTL)                   [  ]             │
│  ├── Right to access implemented                [  ]             │
│  ├── Right to erasure implemented               [  ]             │
│  ├── Data portability supported                 [  ]             │
│  └── DPA with sub-processors                    [  ]             │
│                                                                  │
│  Technical Measures                             Status           │
│  ├── IP anonymization                           [  ]             │
│  ├── User ID hashing                            [  ]             │
│  ├── PII stripping from URLs                    [  ]             │
│  ├── PII detection in properties                [  ]             │
│  ├── K-anonymity (k >= 5)                       [  ]             │
│  ├── Differential privacy (optional)            [  ]             │
│  ├── Data encryption at rest                    [  ]             │
│  ├── Data encryption in transit                 [  ]             │
│  ├── Audit logging                              [  ]             │
│  └── Automated data deletion                    [  ]             │
│                                                                  │
│  ePrivacy / Cookie Compliance                   Status           │
│  ├── No cookies for analytics (optional)        [  ]             │
│  ├── First-party only if cookies used           [  ]             │
│  ├── Cookie banner implemented                  [  ]             │
│  ├── Consent before tracking                    [  ]             │
│  └── Easy consent withdrawal                    [  ]             │
│                                                                  │
│  CCPA Requirements                              Status           │
│  ├── "Do Not Sell" respected                    [  ]             │
│  ├── Opt-out mechanism                          [  ]             │
│  ├── Data disclosure on request                 [  ]             │
│  └── No discrimination for opt-out              [  ]             │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Cost Analysis

First-party analytics costs at scale:

┌─────────────────────────────────────────────────────────────────┐
│                Monthly Cost Comparison (1M events/day)           │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Third-Party (Amplitude/Mixpanel)                               │
│  └── Growth plan: $2,000-5,000/month                            │
│                                                                  │
│  First-Party Stack                                              │
│  ├── ClickHouse Cloud (basic)         $200-400/month            │
│  ├── Edge functions (Cloudflare)      $50-100/month             │
│  ├── Storage (S3/R2)                  $20-50/month              │
│  └── Total:                           $270-550/month            │
│                                                                  │
│  Tinybird Stack                                                 │
│  ├── Tinybird Pro                     $400-800/month            │
│  ├── Edge functions                   $50-100/month             │
│  └── Total:                           $450-900/month            │
│                                                                  │
│  Self-Hosted ClickHouse                                         │
│  ├── 3-node cluster (VMs)             $300-600/month            │
│  ├── Engineering time                 Variable                  │
│  └── Total:                           $300-600 + eng time       │
│                                                                  │
│  Break-even typically at 500K-1M events/day                     │
│  Additional value: No vendor lock-in, full data control         │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Summary

Building privacy-respecting analytics requires:

  1. First-party collection — Events go to your servers, your domain
  2. Edge processing — Anonymize and enrich before storage
  3. OLAP storage — ClickHouse or Tinybird for fast aggregations
  4. Aggregation-only queries — Never expose individual user data
  5. K-anonymity — Don't report data below threshold
  6. Differential privacy — Add noise for sensitive queries
  7. Data lifecycle — TTL, right to erasure, data portability
  8. Consent integration — Track only with explicit permission

The result: analytics that answer product questions without becoming a liability or betraying user trust. Your users' data belongs on your servers, governed by your ethics, deleted when they ask.

Third-party analytics vendors built their business model on your users' data. You don't have to participate.

What did you think?

© 2026 Vidhya Sagar Thakur. All rights reserved.