Designing an Analytics Pipeline for Frontend Without Selling Your Users Out
Designing an Analytics Pipeline for Frontend Events Without Selling Your Users Out
First-party event collection, edge-based analytics with ClickHouse or Tinybird, privacy-preserving aggregation, and building analytics that respect users while delivering the insights product teams need.
The Problem with Third-Party Analytics
Google Analytics, Mixpanel, Amplitude, Segment—they all have the same fundamental issue: your users' behavioral data lives on someone else's servers, governed by someone else's privacy policy, monetized by someone else's business model.
┌─────────────────────────────────────────────────────────────────┐
│ Third-Party Analytics Data Flow │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Your User │
│ │ │
│ │ Clicks, scrolls, views │
│ ▼ │
│ Your App ──────────────────────────┐ │
│ │ │ │
│ │ analytics.track() │ (you think) │
│ ▼ ▼ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Third-Party Analytics Server │ │
│ ├─────────────────────────────────────────────────────┤ │
│ │ │ │
│ │ • User's full browsing history (cross-site) │ │
│ │ • Device fingerprint │ │
│ │ • IP-based location │ │
│ │ • Behavioral profile │ │
│ │ • Cross-referenced with other customers' data │ │
│ │ │ │
│ │ Sold to: advertisers, data brokers, "partners" │ │
│ │ │ │
│ └─────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
The real costs:
- Legal liability — GDPR, CCPA, ePrivacy violations
- User trust erosion — Privacy-conscious users bounce
- Ad blockers — 30-40% of traffic blocked
- Data dependency — Vendor holds your insights hostage
- Ethical burden — Complicity in surveillance capitalism
First-Party Architecture
The alternative: own your data pipeline end-to-end.
┌─────────────────────────────────────────────────────────────────┐
│ First-Party Analytics Architecture │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Browser │
│ ┌────────────────────────────────────────────┐ │
│ │ Event Collector SDK │ │
│ │ • Batching │ │
│ │ • Compression │ │
│ │ • Retry logic │ │
│ │ • PII stripping │ │
│ └───────────────────┬────────────────────────┘ │
│ │ │
│ │ POST /events (your domain) │
│ ▼ │
│ Edge (Cloudflare/Vercel/Fastly) │
│ ┌────────────────────────────────────────────┐ │
│ │ Edge Function │ │
│ │ • Rate limiting │ │
│ │ • IP anonymization │ │
│ │ • Geo enrichment │ │
│ │ • Session stitching │ │
│ │ • Bot filtering │ │
│ └───────────────────┬────────────────────────┘ │
│ │ │
│ │ Kafka / HTTP │
│ ▼ │
│ Ingest Layer │
│ ┌────────────────────────────────────────────┐ │
│ │ Stream Processor │ │
│ │ • Schema validation │ │
│ │ • Deduplication │ │
│ │ • Sessionization │ │
│ │ • User ID resolution │ │
│ └───────────────────┬────────────────────────┘ │
│ │ │
│ ▼ │
│ Storage (ClickHouse / Tinybird / BigQuery) │
│ ┌────────────────────────────────────────────┐ │
│ │ OLAP Database │ │
│ │ • Columnar storage │ │
│ │ • Time-series optimized │ │
│ │ • Real-time aggregations │ │
│ └───────────────────┬────────────────────────┘ │
│ │ │
│ ▼ │
│ Query Layer │
│ ┌────────────────────────────────────────────┐ │
│ │ API / Dashboard │ │
│ │ • Aggregated insights only │ │
│ │ • No individual user data exposed │ │
│ │ • Differential privacy queries │ │
│ └────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
Client-Side Event Collection
Build a lightweight SDK that respects privacy by design:
// analytics-sdk.ts
interface AnalyticsConfig {
endpoint: string;
batchSize: number;
flushInterval: number;
maxRetries: number;
sessionTimeout: number;
respectDNT: boolean;
anonymizeIP: boolean;
hashUserIds: boolean;
}
interface Event {
name: string;
properties: Record<string, unknown>;
timestamp: number;
sessionId: string;
pageUrl: string;
referrer: string;
}
interface QueuedEvent extends Event {
id: string;
retries: number;
}
const defaultConfig: AnalyticsConfig = {
endpoint: '/api/events',
batchSize: 10,
flushInterval: 5000,
maxRetries: 3,
sessionTimeout: 30 * 60 * 1000, // 30 minutes
respectDNT: true,
anonymizeIP: true,
hashUserIds: true,
};
export class Analytics {
private config: AnalyticsConfig;
private queue: QueuedEvent[] = [];
private flushTimer: ReturnType<typeof setInterval> | null = null;
private sessionId: string;
private lastActivity: number;
constructor(config: Partial<AnalyticsConfig> = {}) {
this.config = { ...defaultConfig, ...config };
this.sessionId = this.getOrCreateSession();
this.lastActivity = Date.now();
// Check Do Not Track
if (this.config.respectDNT && this.isDNTEnabled()) {
console.log('[Analytics] DNT enabled, tracking disabled');
return;
}
this.startFlushTimer();
this.setupVisibilityHandler();
this.setupUnloadHandler();
}
private isDNTEnabled(): boolean {
return (
navigator.doNotTrack === '1' ||
(window as any).doNotTrack === '1' ||
(navigator as any).msDoNotTrack === '1'
);
}
private getOrCreateSession(): string {
const stored = sessionStorage.getItem('analytics_session');
const lastActivity = sessionStorage.getItem('analytics_last_activity');
if (stored && lastActivity) {
const elapsed = Date.now() - parseInt(lastActivity, 10);
if (elapsed < this.config.sessionTimeout) {
return stored;
}
}
const newSession = this.generateSessionId();
sessionStorage.setItem('analytics_session', newSession);
return newSession;
}
private generateSessionId(): string {
// Use Web Crypto for proper randomness
const array = new Uint8Array(16);
crypto.getRandomValues(array);
return Array.from(array, (b) => b.toString(16).padStart(2, '0')).join('');
}
private updateActivity(): void {
this.lastActivity = Date.now();
sessionStorage.setItem(
'analytics_last_activity',
this.lastActivity.toString()
);
}
/**
* Hash user identifiers to prevent PII transmission
*/
private async hashIdentifier(value: string): Promise<string> {
const encoder = new TextEncoder();
const data = encoder.encode(value + this.config.endpoint); // Salt with endpoint
const hash = await crypto.subtle.digest('SHA-256', data);
const hashArray = Array.from(new Uint8Array(hash));
return hashArray.map((b) => b.toString(16).padStart(2, '0')).join('');
}
/**
* Strip PII from properties
*/
private sanitizeProperties(
properties: Record<string, unknown>
): Record<string, unknown> {
const piiPatterns = [
/email/i,
/phone/i,
/address/i,
/name/i,
/ssn/i,
/password/i,
/credit.?card/i,
/birth.?date/i,
];
const sanitized: Record<string, unknown> = {};
for (const [key, value] of Object.entries(properties)) {
// Skip if key matches PII pattern
if (piiPatterns.some((pattern) => pattern.test(key))) {
sanitized[key] = '[REDACTED]';
continue;
}
// Recursively sanitize nested objects
if (value && typeof value === 'object' && !Array.isArray(value)) {
sanitized[key] = this.sanitizeProperties(
value as Record<string, unknown>
);
} else {
sanitized[key] = value;
}
}
return sanitized;
}
/**
* Track an event
*/
async track(
name: string,
properties: Record<string, unknown> = {}
): Promise<void> {
if (this.config.respectDNT && this.isDNTEnabled()) {
return;
}
this.updateActivity();
const event: QueuedEvent = {
id: this.generateSessionId(),
name,
properties: this.sanitizeProperties(properties),
timestamp: Date.now(),
sessionId: this.sessionId,
pageUrl: this.sanitizeUrl(window.location.href),
referrer: this.sanitizeUrl(document.referrer),
retries: 0,
};
this.queue.push(event);
if (this.queue.length >= this.config.batchSize) {
await this.flush();
}
}
/**
* Remove query params that might contain PII
*/
private sanitizeUrl(url: string): string {
if (!url) return '';
try {
const parsed = new URL(url);
// Remove common PII query params
const piiParams = [
'email',
'token',
'key',
'password',
'auth',
'session',
'user',
'uid',
];
piiParams.forEach((param) => {
if (parsed.searchParams.has(param)) {
parsed.searchParams.set(param, '[REDACTED]');
}
});
return parsed.toString();
} catch {
return url.split('?')[0]; // Just return path if URL parsing fails
}
}
/**
* Track page view
*/
trackPageView(properties: Record<string, unknown> = {}): void {
this.track('page_view', {
...properties,
title: document.title,
path: window.location.pathname,
});
}
/**
* Flush events to server
*/
async flush(): Promise<void> {
if (this.queue.length === 0) return;
const batch = this.queue.splice(0, this.config.batchSize);
try {
const payload = {
events: batch.map(({ id, retries, ...event }) => event),
context: {
userAgent: navigator.userAgent,
language: navigator.language,
screenWidth: window.screen.width,
screenHeight: window.screen.height,
viewportWidth: window.innerWidth,
viewportHeight: window.innerHeight,
timezone: Intl.DateTimeFormat().resolvedOptions().timeZone,
},
};
const response = await fetch(this.config.endpoint, {
method: 'POST',
headers: {
'Content-Type': 'application/json',
},
body: JSON.stringify(payload),
keepalive: true, // Allows request to outlive page
});
if (!response.ok) {
throw new Error(`HTTP ${response.status}`);
}
} catch (error) {
// Re-queue failed events with retry count
const retriable = batch.filter(
(event) => event.retries < this.config.maxRetries
);
retriable.forEach((event) => {
event.retries++;
this.queue.unshift(event);
});
console.error('[Analytics] Flush failed:', error);
}
}
private startFlushTimer(): void {
this.flushTimer = setInterval(() => {
this.flush();
}, this.config.flushInterval);
}
private setupVisibilityHandler(): void {
document.addEventListener('visibilitychange', () => {
if (document.visibilityState === 'hidden') {
this.flush();
}
});
}
private setupUnloadHandler(): void {
// Use sendBeacon for reliability on page unload
window.addEventListener('pagehide', () => {
if (this.queue.length === 0) return;
const payload = JSON.stringify({
events: this.queue.map(({ id, retries, ...event }) => event),
context: {
userAgent: navigator.userAgent,
language: navigator.language,
timezone: Intl.DateTimeFormat().resolvedOptions().timeZone,
},
});
navigator.sendBeacon(this.config.endpoint, payload);
});
}
/**
* Identify user (hashed)
*/
async identify(userId: string): Promise<void> {
const hashedId = this.config.hashUserIds
? await this.hashIdentifier(userId)
: userId;
localStorage.setItem('analytics_user_id', hashedId);
this.track('identify', {
hashedUserId: hashedId,
});
}
/**
* Reset identity (logout)
*/
reset(): void {
localStorage.removeItem('analytics_user_id');
sessionStorage.removeItem('analytics_session');
this.sessionId = this.generateSessionId();
sessionStorage.setItem('analytics_session', this.sessionId);
}
/**
* Cleanup
*/
destroy(): void {
if (this.flushTimer) {
clearInterval(this.flushTimer);
}
this.flush();
}
}
// Singleton export
export const analytics = new Analytics();
Edge Function Processing
Process events at the edge for privacy and performance:
// edge-function.ts (Cloudflare Workers / Vercel Edge)
import { z } from 'zod';
const eventSchema = z.object({
name: z.string().max(100),
properties: z.record(z.unknown()).optional(),
timestamp: z.number(),
sessionId: z.string(),
pageUrl: z.string().max(2000),
referrer: z.string().max(2000).optional(),
});
const payloadSchema = z.object({
events: z.array(eventSchema).max(100),
context: z.object({
userAgent: z.string().max(500),
language: z.string().max(10),
screenWidth: z.number().optional(),
screenHeight: z.number().optional(),
viewportWidth: z.number().optional(),
viewportHeight: z.number().optional(),
timezone: z.string().optional(),
}),
});
interface EnrichedEvent {
// Original event data
name: string;
properties: Record<string, unknown>;
timestamp: number;
sessionId: string;
pageUrl: string;
referrer?: string;
// Edge-enriched data
country?: string;
region?: string;
city?: string;
deviceType: 'mobile' | 'tablet' | 'desktop';
browser: string;
os: string;
isBot: boolean;
receivedAt: number;
}
/**
* Anonymize IP - keep only country-level precision
*/
function anonymizeIP(ip: string): string {
if (ip.includes(':')) {
// IPv6: keep first 48 bits (3 groups)
const parts = ip.split(':');
return parts.slice(0, 3).join(':') + '::';
} else {
// IPv4: keep first 3 octets
const parts = ip.split('.');
return parts.slice(0, 3).join('.') + '.0';
}
}
/**
* Parse user agent for device info
*/
function parseUserAgent(ua: string): {
deviceType: 'mobile' | 'tablet' | 'desktop';
browser: string;
os: string;
} {
const isMobile = /Mobile|Android|iPhone|iPad/.test(ua);
const isTablet = /iPad|Tablet/.test(ua);
let browser = 'Other';
if (ua.includes('Chrome')) browser = 'Chrome';
else if (ua.includes('Firefox')) browser = 'Firefox';
else if (ua.includes('Safari')) browser = 'Safari';
else if (ua.includes('Edge')) browser = 'Edge';
let os = 'Other';
if (ua.includes('Windows')) os = 'Windows';
else if (ua.includes('Mac')) os = 'macOS';
else if (ua.includes('Linux')) os = 'Linux';
else if (ua.includes('Android')) os = 'Android';
else if (ua.includes('iOS') || ua.includes('iPhone')) os = 'iOS';
return {
deviceType: isTablet ? 'tablet' : isMobile ? 'mobile' : 'desktop',
browser,
os,
};
}
/**
* Detect bots
*/
function isBot(ua: string): boolean {
const botPatterns = [
/bot/i,
/crawler/i,
/spider/i,
/scraper/i,
/headless/i,
/puppet/i,
/selenium/i,
/phantomjs/i,
];
return botPatterns.some((pattern) => pattern.test(ua));
}
/**
* Rate limit by session
*/
const rateLimiter = new Map<string, { count: number; resetAt: number }>();
function checkRateLimit(
sessionId: string,
limit: number,
windowMs: number
): boolean {
const now = Date.now();
const entry = rateLimiter.get(sessionId);
if (!entry || entry.resetAt < now) {
rateLimiter.set(sessionId, { count: 1, resetAt: now + windowMs });
return true;
}
if (entry.count >= limit) {
return false;
}
entry.count++;
return true;
}
/**
* Edge function handler
*/
export default async function handler(request: Request): Promise<Response> {
// Only accept POST
if (request.method !== 'POST') {
return new Response('Method not allowed', { status: 405 });
}
// Parse and validate payload
let payload;
try {
const body = await request.json();
payload = payloadSchema.parse(body);
} catch (error) {
return new Response('Invalid payload', { status: 400 });
}
// Get session for rate limiting
const sessionId = payload.events[0]?.sessionId;
if (!sessionId) {
return new Response('Session ID required', { status: 400 });
}
// Rate limit: 100 events per minute per session
if (!checkRateLimit(sessionId, 100, 60000)) {
return new Response('Rate limited', { status: 429 });
}
// Get geo data from Cloudflare headers (or similar)
const country = request.headers.get('cf-ipcountry') || undefined;
const city = request.headers.get('cf-ipcity') || undefined;
const region = request.headers.get('cf-region') || undefined;
// Parse user agent
const deviceInfo = parseUserAgent(payload.context.userAgent);
// Check for bots
const botDetected = isBot(payload.context.userAgent);
// Filter out bot traffic
if (botDetected) {
// Still return 200 to not reveal detection
return new Response('OK', { status: 200 });
}
// Enrich events
const enrichedEvents: EnrichedEvent[] = payload.events.map((event) => ({
...event,
country,
city,
region,
...deviceInfo,
isBot: botDetected,
receivedAt: Date.now(),
}));
// Forward to ingest layer
try {
await forwardToIngest(enrichedEvents);
} catch (error) {
console.error('Ingest failed:', error);
// Don't fail the request - events are already validated
}
return new Response('OK', { status: 200 });
}
/**
* Forward to ClickHouse / Tinybird / Kafka
*/
async function forwardToIngest(events: EnrichedEvent[]): Promise<void> {
// Option 1: Direct ClickHouse insert
await insertToClickHouse(events);
// Option 2: Tinybird Events API
// await insertToTinybird(events);
// Option 3: Kafka producer
// await publishToKafka(events);
}
async function insertToClickHouse(events: EnrichedEvent[]): Promise<void> {
const CLICKHOUSE_URL = process.env.CLICKHOUSE_URL!;
const CLICKHOUSE_USER = process.env.CLICKHOUSE_USER!;
const CLICKHOUSE_PASSWORD = process.env.CLICKHOUSE_PASSWORD!;
// Format as NDJSON for ClickHouse
const ndjson = events.map((e) => JSON.stringify(e)).join('\n');
const response = await fetch(
`${CLICKHOUSE_URL}/?query=INSERT INTO events FORMAT JSONEachRow`,
{
method: 'POST',
headers: {
'Content-Type': 'application/x-ndjson',
'X-ClickHouse-User': CLICKHOUSE_USER,
'X-ClickHouse-Key': CLICKHOUSE_PASSWORD,
},
body: ndjson,
}
);
if (!response.ok) {
throw new Error(`ClickHouse insert failed: ${response.status}`);
}
}
async function insertToTinybird(events: EnrichedEvent[]): Promise<void> {
const TINYBIRD_TOKEN = process.env.TINYBIRD_TOKEN!;
const TINYBIRD_DATASOURCE = process.env.TINYBIRD_DATASOURCE!;
const ndjson = events.map((e) => JSON.stringify(e)).join('\n');
const response = await fetch(
`https://api.tinybird.co/v0/events?name=${TINYBIRD_DATASOURCE}`,
{
method: 'POST',
headers: {
Authorization: `Bearer ${TINYBIRD_TOKEN}`,
'Content-Type': 'application/x-ndjson',
},
body: ndjson,
}
);
if (!response.ok) {
throw new Error(`Tinybird insert failed: ${response.status}`);
}
}
ClickHouse Schema Design
Optimized for time-series analytics:
-- events.sql
-- Main events table with partitioning and ordering
CREATE TABLE events
(
-- Event identification
event_id UUID DEFAULT generateUUIDv4(),
name LowCardinality(String),
-- Timestamps
timestamp DateTime64(3),
received_at DateTime64(3) DEFAULT now64(3),
event_date Date DEFAULT toDate(timestamp),
-- Session and user
session_id String,
user_id String DEFAULT '', -- Hashed, optional
-- Page context
page_url String,
page_path String MATERIALIZED extractURLPath(page_url),
page_host String MATERIALIZED extractURLHost(page_url),
referrer String DEFAULT '',
referrer_host String MATERIALIZED extractURLHost(referrer),
-- Geographic (country-level only for privacy)
country LowCardinality(String) DEFAULT '',
region LowCardinality(String) DEFAULT '',
-- Device
device_type LowCardinality(String), -- mobile, tablet, desktop
browser LowCardinality(String),
os LowCardinality(String),
-- Screen
screen_width UInt16 DEFAULT 0,
screen_height UInt16 DEFAULT 0,
viewport_width UInt16 DEFAULT 0,
viewport_height UInt16 DEFAULT 0,
-- Event properties (flexible schema)
properties String DEFAULT '{}', -- JSON string
-- Flags
is_bot UInt8 DEFAULT 0
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(event_date)
ORDER BY (name, event_date, session_id, timestamp)
TTL event_date + INTERVAL 90 DAY -- Auto-delete after 90 days
SETTINGS index_granularity = 8192;
-- Materialized view for page views aggregation
CREATE MATERIALIZED VIEW page_views_hourly
ENGINE = SummingMergeTree()
PARTITION BY toYYYYMM(event_date)
ORDER BY (event_date, event_hour, page_path, country, device_type)
AS SELECT
event_date,
toStartOfHour(timestamp) as event_hour,
page_path,
country,
device_type,
count() as views,
uniqExact(session_id) as unique_sessions
FROM events
WHERE name = 'page_view'
GROUP BY event_date, event_hour, page_path, country, device_type;
-- Materialized view for session metrics
CREATE MATERIALIZED VIEW sessions_daily
ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMM(event_date)
ORDER BY (event_date, country, device_type)
AS SELECT
event_date,
country,
device_type,
uniqState(session_id) as unique_sessions,
countState() as total_events,
avgState(
dateDiff('second',
min(timestamp) OVER (PARTITION BY session_id),
max(timestamp) OVER (PARTITION BY session_id)
)
) as avg_session_duration
FROM events
GROUP BY event_date, country, device_type;
-- Funnel analysis table
CREATE TABLE funnels
(
funnel_id String,
session_id String,
event_date Date,
step_1_at Nullable(DateTime64(3)),
step_2_at Nullable(DateTime64(3)),
step_3_at Nullable(DateTime64(3)),
step_4_at Nullable(DateTime64(3)),
step_5_at Nullable(DateTime64(3)),
completed UInt8 DEFAULT 0
)
ENGINE = ReplacingMergeTree()
PARTITION BY toYYYYMM(event_date)
ORDER BY (funnel_id, session_id, event_date);
Privacy-Preserving Query Layer
Build APIs that only expose aggregated data:
// analytics-api.ts
import { createClient } from '@clickhouse/client';
const clickhouse = createClient({
host: process.env.CLICKHOUSE_URL,
username: process.env.CLICKHOUSE_USER,
password: process.env.CLICKHOUSE_PASSWORD,
});
interface DateRange {
start: Date;
end: Date;
}
interface PageViewsResponse {
path: string;
views: number;
uniqueSessions: number;
}
interface AggregatedMetrics {
totalViews: number;
uniqueSessions: number;
avgSessionDuration: number;
bounceRate: number;
topPages: PageViewsResponse[];
deviceBreakdown: { device: string; percentage: number }[];
countryBreakdown: { country: string; sessions: number }[];
}
/**
* Minimum threshold for k-anonymity
* Don't return data if fewer than K users
*/
const K_ANONYMITY_THRESHOLD = 5;
/**
* Get aggregated metrics with privacy protection
*/
export async function getAggregatedMetrics(
dateRange: DateRange
): Promise<AggregatedMetrics> {
const [overview, topPages, devices, countries] = await Promise.all([
getOverviewMetrics(dateRange),
getTopPages(dateRange),
getDeviceBreakdown(dateRange),
getCountryBreakdown(dateRange),
]);
return {
...overview,
topPages,
deviceBreakdown: devices,
countryBreakdown: countries,
};
}
async function getOverviewMetrics(dateRange: DateRange): Promise<{
totalViews: number;
uniqueSessions: number;
avgSessionDuration: number;
bounceRate: number;
}> {
const result = await clickhouse.query({
query: `
SELECT
count() as total_views,
uniqExact(session_id) as unique_sessions,
avgIf(session_duration, session_duration > 0) as avg_duration,
countIf(event_count = 1) / count() as bounce_rate
FROM (
SELECT
session_id,
count() as event_count,
dateDiff('second', min(timestamp), max(timestamp)) as session_duration
FROM events
WHERE event_date BETWEEN {start:Date} AND {end:Date}
AND is_bot = 0
GROUP BY session_id
)
`,
query_params: {
start: dateRange.start.toISOString().split('T')[0],
end: dateRange.end.toISOString().split('T')[0],
},
format: 'JSONEachRow',
});
const rows = await result.json<{
total_views: number;
unique_sessions: number;
avg_duration: number;
bounce_rate: number;
}>();
const data = rows[0];
return {
totalViews: data.total_views,
uniqueSessions: data.unique_sessions,
avgSessionDuration: Math.round(data.avg_duration),
bounceRate: Math.round(data.bounce_rate * 100) / 100,
};
}
async function getTopPages(
dateRange: DateRange,
limit: number = 10
): Promise<PageViewsResponse[]> {
const result = await clickhouse.query({
query: `
SELECT
page_path as path,
count() as views,
uniqExact(session_id) as unique_sessions
FROM events
WHERE event_date BETWEEN {start:Date} AND {end:Date}
AND name = 'page_view'
AND is_bot = 0
GROUP BY page_path
HAVING unique_sessions >= {threshold:UInt32} -- k-anonymity
ORDER BY views DESC
LIMIT {limit:UInt32}
`,
query_params: {
start: dateRange.start.toISOString().split('T')[0],
end: dateRange.end.toISOString().split('T')[0],
limit,
threshold: K_ANONYMITY_THRESHOLD,
},
format: 'JSONEachRow',
});
return result.json();
}
async function getDeviceBreakdown(
dateRange: DateRange
): Promise<{ device: string; percentage: number }[]> {
const result = await clickhouse.query({
query: `
SELECT
device_type as device,
round(count() * 100.0 / sum(count()) OVER (), 2) as percentage
FROM events
WHERE event_date BETWEEN {start:Date} AND {end:Date}
AND is_bot = 0
GROUP BY device_type
HAVING count() >= {threshold:UInt32}
ORDER BY percentage DESC
`,
query_params: {
start: dateRange.start.toISOString().split('T')[0],
end: dateRange.end.toISOString().split('T')[0],
threshold: K_ANONYMITY_THRESHOLD,
},
format: 'JSONEachRow',
});
return result.json();
}
async function getCountryBreakdown(
dateRange: DateRange,
limit: number = 10
): Promise<{ country: string; sessions: number }[]> {
const result = await clickhouse.query({
query: `
SELECT
country,
uniqExact(session_id) as sessions
FROM events
WHERE event_date BETWEEN {start:Date} AND {end:Date}
AND is_bot = 0
AND country != ''
GROUP BY country
HAVING sessions >= {threshold:UInt32} -- k-anonymity
ORDER BY sessions DESC
LIMIT {limit:UInt32}
`,
query_params: {
start: dateRange.start.toISOString().split('T')[0],
end: dateRange.end.toISOString().split('T')[0],
limit,
threshold: K_ANONYMITY_THRESHOLD,
},
format: 'JSONEachRow',
});
return result.json();
}
/**
* Funnel analysis - always aggregated
*/
export async function analyzeFunnel(
dateRange: DateRange,
steps: string[]
): Promise<{ step: string; users: number; dropoff: number }[]> {
if (steps.length < 2 || steps.length > 5) {
throw new Error('Funnel must have 2-5 steps');
}
const stepConditions = steps
.map(
(step, i) =>
`sumIf(1, step_${i + 1} = 1) as step_${i + 1}_users`
)
.join(',\n');
const result = await clickhouse.query({
query: `
WITH funnel_data AS (
SELECT
session_id,
${steps
.map(
(step, i) =>
`max(if(name = '${step}', 1, 0)) as step_${i + 1}`
)
.join(',\n')}
FROM events
WHERE event_date BETWEEN {start:Date} AND {end:Date}
AND name IN (${steps.map((s) => `'${s}'`).join(', ')})
AND is_bot = 0
GROUP BY session_id
)
SELECT
${stepConditions}
FROM funnel_data
`,
query_params: {
start: dateRange.start.toISOString().split('T')[0],
end: dateRange.end.toISOString().split('T')[0],
},
format: 'JSONEachRow',
});
const rows = await result.json<Record<string, number>>();
const data = rows[0];
// Apply k-anonymity and calculate dropoffs
return steps.map((step, i) => {
const users = data[`step_${i + 1}_users`];
const prevUsers = i === 0 ? users : data[`step_${i}_users`];
const dropoff = i === 0 ? 0 : Math.round((1 - users / prevUsers) * 100);
return {
step,
users: users >= K_ANONYMITY_THRESHOLD ? users : 0, // Hide if below threshold
dropoff,
};
});
}
Differential Privacy for Sensitive Queries
When k-anonymity isn't enough, add noise:
// differential-privacy.ts
/**
* Laplace mechanism for differential privacy
*/
export function laplaceMechanism(
trueValue: number,
sensitivity: number,
epsilon: number
): number {
// Laplace distribution with scale = sensitivity / epsilon
const scale = sensitivity / epsilon;
const u = Math.random() - 0.5;
const noise = -scale * Math.sign(u) * Math.log(1 - 2 * Math.abs(u));
return Math.round(trueValue + noise);
}
/**
* Apply differential privacy to query results
*/
export function applyDifferentialPrivacy<T extends Record<string, number>>(
results: T[],
numericFields: (keyof T)[],
options: {
epsilon: number; // Privacy budget (lower = more private)
sensitivity: number; // Max impact of single user
minThreshold: number; // Don't report values below this
}
): T[] {
const { epsilon, sensitivity, minThreshold } = options;
return results
.map((row) => {
const noisyRow = { ...row };
for (const field of numericFields) {
const trueValue = row[field] as number;
const noisyValue = laplaceMechanism(trueValue, sensitivity, epsilon);
// Suppress if below threshold after noise
(noisyRow as Record<string, number>)[field as string] =
noisyValue >= minThreshold ? Math.max(0, noisyValue) : 0;
}
return noisyRow;
})
.filter((row) => {
// Remove rows where all numeric fields are 0
return numericFields.some(
(field) => (row[field] as number) > 0
);
});
}
/**
* Privacy-preserving unique count using HyperLogLog
*/
export async function approximateUniqueCount(
clickhouse: any,
dateRange: DateRange,
dimension: string
): Promise<number> {
const result = await clickhouse.query({
query: `
SELECT uniqHLL12(session_id) as approx_unique
FROM events
WHERE event_date BETWEEN {start:Date} AND {end:Date}
AND is_bot = 0
${dimension ? `AND ${dimension}` : ''}
`,
query_params: {
start: dateRange.start.toISOString().split('T')[0],
end: dateRange.end.toISOString().split('T')[0],
},
format: 'JSONEachRow',
});
const rows = await result.json<{ approx_unique: number }>();
// HyperLogLog already provides privacy through approximation
return rows[0].approx_unique;
}
Consent Management Integration
// consent.ts
type ConsentCategory = 'necessary' | 'analytics' | 'marketing';
interface ConsentState {
necessary: boolean; // Always true
analytics: boolean;
marketing: boolean;
timestamp: number;
version: string;
}
const CONSENT_VERSION = '1.0.0';
const CONSENT_KEY = 'privacy_consent';
export function getConsent(): ConsentState | null {
const stored = localStorage.getItem(CONSENT_KEY);
if (!stored) return null;
try {
return JSON.parse(stored);
} catch {
return null;
}
}
export function setConsent(
categories: Partial<Record<ConsentCategory, boolean>>
): ConsentState {
const consent: ConsentState = {
necessary: true, // Always required
analytics: categories.analytics ?? false,
marketing: categories.marketing ?? false,
timestamp: Date.now(),
version: CONSENT_VERSION,
};
localStorage.setItem(CONSENT_KEY, JSON.stringify(consent));
// Emit event for analytics SDK to react
window.dispatchEvent(
new CustomEvent('consent-updated', { detail: consent })
);
return consent;
}
/**
* Check if analytics is allowed
*/
export function canTrack(): boolean {
const consent = getConsent();
// No consent recorded = no tracking (GDPR compliant default)
if (!consent) return false;
return consent.analytics === true;
}
/**
* React hook for consent state
*/
export function useConsent() {
const [consent, setConsentState] = useState<ConsentState | null>(
getConsent()
);
useEffect(() => {
const handler = (e: CustomEvent<ConsentState>) => {
setConsentState(e.detail);
};
window.addEventListener(
'consent-updated',
handler as EventListener
);
return () => {
window.removeEventListener(
'consent-updated',
handler as EventListener
);
};
}, []);
const updateConsent = useCallback(
(categories: Partial<Record<ConsentCategory, boolean>>) => {
const newConsent = setConsent(categories);
setConsentState(newConsent);
},
[]
);
return { consent, updateConsent, canTrack: consent?.analytics ?? false };
}
/**
* Consent-aware analytics wrapper
*/
export class ConsentAwareAnalytics {
private analytics: Analytics;
private queuedEvents: Array<{ name: string; properties: Record<string, unknown> }> = [];
constructor(analytics: Analytics) {
this.analytics = analytics;
// Listen for consent changes
window.addEventListener('consent-updated', ((e: CustomEvent<ConsentState>) => {
if (e.detail.analytics) {
this.flushQueue();
} else {
this.queuedEvents = []; // Clear queue if consent withdrawn
}
}) as EventListener);
}
track(name: string, properties: Record<string, unknown> = {}): void {
if (canTrack()) {
this.analytics.track(name, properties);
} else {
// Queue for potential future consent
this.queuedEvents.push({ name, properties });
}
}
private flushQueue(): void {
for (const event of this.queuedEvents) {
this.analytics.track(event.name, event.properties);
}
this.queuedEvents = [];
}
}
Tinybird Integration
Serverless analytics with Tinybird for simpler ops:
// tinybird.ts
const TINYBIRD_HOST = 'https://api.tinybird.co';
const TINYBIRD_TOKEN = process.env.TINYBIRD_TOKEN!;
interface TinybirdQuery {
pipe: string;
params?: Record<string, string | number>;
}
export async function queryTinybird<T>(query: TinybirdQuery): Promise<T[]> {
const url = new URL(`${TINYBIRD_HOST}/v0/pipes/${query.pipe}.json`);
if (query.params) {
for (const [key, value] of Object.entries(query.params)) {
url.searchParams.append(key, String(value));
}
}
const response = await fetch(url.toString(), {
headers: {
Authorization: `Bearer ${TINYBIRD_TOKEN}`,
},
});
if (!response.ok) {
throw new Error(`Tinybird query failed: ${response.status}`);
}
const result = await response.json();
return result.data;
}
// Example: Tinybird pipe definition (page_views.pipe)
/*
DESCRIPTION >
Page views aggregation with privacy controls
NODE page_views_aggregated
SQL >
SELECT
toDate(timestamp) as date,
page_path,
country,
device_type,
count() as views,
uniqExact(session_id) as unique_sessions
FROM events
WHERE timestamp >= now() - INTERVAL 90 DAY
AND is_bot = 0
GROUP BY date, page_path, country, device_type
HAVING unique_sessions >= 5 -- k-anonymity
NODE endpoint
SQL >
SELECT *
FROM page_views_aggregated
WHERE date >= {{Date(start_date, '2024-01-01')}}
AND date <= {{Date(end_date, '2024-12-31')}}
{% if defined(country) %}
AND country = {{String(country, '')}}
{% end %}
ORDER BY views DESC
LIMIT {{Int32(limit, 100)}}
*/
// Usage
async function getPageViews(options: {
startDate: string;
endDate: string;
country?: string;
limit?: number;
}) {
return queryTinybird<{
date: string;
page_path: string;
country: string;
device_type: string;
views: number;
unique_sessions: number;
}>({
pipe: 'page_views',
params: {
start_date: options.startDate,
end_date: options.endDate,
...(options.country && { country: options.country }),
...(options.limit && { limit: options.limit }),
},
});
}
Real-Time Dashboard
Server-sent events for live metrics:
// realtime-api.ts
import { clickhouse } from './clickhouse';
export async function handleRealtimeStream(
request: Request
): Promise<Response> {
const encoder = new TextEncoder();
const stream = new ReadableStream({
async start(controller) {
const sendEvent = (data: object) => {
controller.enqueue(
encoder.encode(`data: ${JSON.stringify(data)}\n\n`)
);
};
// Send initial data
const initial = await getRealtimeMetrics();
sendEvent({ type: 'initial', data: initial });
// Poll for updates every 5 seconds
const interval = setInterval(async () => {
try {
const metrics = await getRealtimeMetrics();
sendEvent({ type: 'update', data: metrics });
} catch (error) {
sendEvent({ type: 'error', message: 'Failed to fetch metrics' });
}
}, 5000);
// Cleanup on close
request.signal.addEventListener('abort', () => {
clearInterval(interval);
controller.close();
});
},
});
return new Response(stream, {
headers: {
'Content-Type': 'text/event-stream',
'Cache-Control': 'no-cache',
Connection: 'keep-alive',
},
});
}
async function getRealtimeMetrics() {
const result = await clickhouse.query({
query: `
SELECT
count() as events_last_minute,
uniqExact(session_id) as active_sessions,
countIf(name = 'page_view') as page_views_last_minute,
topK(5)(page_path) as top_pages
FROM events
WHERE timestamp >= now() - INTERVAL 1 MINUTE
AND is_bot = 0
`,
format: 'JSONEachRow',
});
const rows = await result.json();
return rows[0];
}
// Client-side hook
function useRealtimeMetrics() {
const [metrics, setMetrics] = useState<RealtimeMetrics | null>(null);
const [error, setError] = useState<string | null>(null);
useEffect(() => {
const eventSource = new EventSource('/api/analytics/realtime');
eventSource.onmessage = (event) => {
const data = JSON.parse(event.data);
if (data.type === 'error') {
setError(data.message);
} else {
setMetrics(data.data);
setError(null);
}
};
eventSource.onerror = () => {
setError('Connection lost');
};
return () => {
eventSource.close();
};
}, []);
return { metrics, error };
}
Data Retention and Deletion
GDPR-compliant data lifecycle:
// data-lifecycle.ts
/**
* Delete user data on request (Right to Erasure)
*/
export async function deleteUserData(
hashedUserId: string
): Promise<{ deletedRows: number }> {
// Use ALTER TABLE ... DELETE for ClickHouse
const result = await clickhouse.query({
query: `
ALTER TABLE events
DELETE WHERE user_id = {userId:String}
`,
query_params: {
userId: hashedUserId,
},
});
// Log deletion for audit
await logDeletionRequest(hashedUserId);
return { deletedRows: result.affectedRows };
}
/**
* Export user data (Right to Access)
*/
export async function exportUserData(
hashedUserId: string
): Promise<UserDataExport> {
const events = await clickhouse.query({
query: `
SELECT
name,
timestamp,
page_url,
properties
FROM events
WHERE user_id = {userId:String}
ORDER BY timestamp DESC
`,
query_params: {
userId: hashedUserId,
},
format: 'JSONEachRow',
});
return {
userId: hashedUserId,
exportedAt: new Date().toISOString(),
events: await events.json(),
};
}
/**
* Automated data retention cleanup
*/
export async function enforceRetentionPolicy(
retentionDays: number = 90
): Promise<void> {
// ClickHouse TTL handles this automatically, but we can force cleanup
await clickhouse.query({
query: `
OPTIMIZE TABLE events FINAL
`,
});
// Log cleanup for compliance
await logRetentionCleanup(retentionDays);
}
/**
* Scheduled job for retention
*/
// Cron: 0 3 * * * (daily at 3 AM)
export async function retentionJob(): Promise<void> {
console.log('Starting retention policy enforcement...');
const tables = ['events', 'sessions_daily', 'page_views_hourly'];
for (const table of tables) {
await clickhouse.query({
query: `OPTIMIZE TABLE ${table} FINAL`,
});
}
console.log('Retention policy enforced.');
}
Privacy Compliance Checklist
┌─────────────────────────────────────────────────────────────────┐
│ Privacy Compliance Matrix │
├─────────────────────────────────────────────────────────────────┤
│ │
│ GDPR Requirements Status │
│ ├── Lawful basis documented [ ] │
│ ├── Consent mechanism implemented [ ] │
│ ├── Privacy policy updated [ ] │
│ ├── Data minimization applied [ ] │
│ ├── Purpose limitation enforced [ ] │
│ ├── Storage limitation (TTL) [ ] │
│ ├── Right to access implemented [ ] │
│ ├── Right to erasure implemented [ ] │
│ ├── Data portability supported [ ] │
│ └── DPA with sub-processors [ ] │
│ │
│ Technical Measures Status │
│ ├── IP anonymization [ ] │
│ ├── User ID hashing [ ] │
│ ├── PII stripping from URLs [ ] │
│ ├── PII detection in properties [ ] │
│ ├── K-anonymity (k >= 5) [ ] │
│ ├── Differential privacy (optional) [ ] │
│ ├── Data encryption at rest [ ] │
│ ├── Data encryption in transit [ ] │
│ ├── Audit logging [ ] │
│ └── Automated data deletion [ ] │
│ │
│ ePrivacy / Cookie Compliance Status │
│ ├── No cookies for analytics (optional) [ ] │
│ ├── First-party only if cookies used [ ] │
│ ├── Cookie banner implemented [ ] │
│ ├── Consent before tracking [ ] │
│ └── Easy consent withdrawal [ ] │
│ │
│ CCPA Requirements Status │
│ ├── "Do Not Sell" respected [ ] │
│ ├── Opt-out mechanism [ ] │
│ ├── Data disclosure on request [ ] │
│ └── No discrimination for opt-out [ ] │
│ │
└─────────────────────────────────────────────────────────────────┘
Cost Analysis
First-party analytics costs at scale:
┌─────────────────────────────────────────────────────────────────┐
│ Monthly Cost Comparison (1M events/day) │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Third-Party (Amplitude/Mixpanel) │
│ └── Growth plan: $2,000-5,000/month │
│ │
│ First-Party Stack │
│ ├── ClickHouse Cloud (basic) $200-400/month │
│ ├── Edge functions (Cloudflare) $50-100/month │
│ ├── Storage (S3/R2) $20-50/month │
│ └── Total: $270-550/month │
│ │
│ Tinybird Stack │
│ ├── Tinybird Pro $400-800/month │
│ ├── Edge functions $50-100/month │
│ └── Total: $450-900/month │
│ │
│ Self-Hosted ClickHouse │
│ ├── 3-node cluster (VMs) $300-600/month │
│ ├── Engineering time Variable │
│ └── Total: $300-600 + eng time │
│ │
│ Break-even typically at 500K-1M events/day │
│ Additional value: No vendor lock-in, full data control │
│ │
└─────────────────────────────────────────────────────────────────┘
Summary
Building privacy-respecting analytics requires:
- First-party collection — Events go to your servers, your domain
- Edge processing — Anonymize and enrich before storage
- OLAP storage — ClickHouse or Tinybird for fast aggregations
- Aggregation-only queries — Never expose individual user data
- K-anonymity — Don't report data below threshold
- Differential privacy — Add noise for sensitive queries
- Data lifecycle — TTL, right to erasure, data portability
- Consent integration — Track only with explicit permission
The result: analytics that answer product questions without becoming a liability or betraying user trust. Your users' data belongs on your servers, governed by your ethics, deleted when they ask.
Third-party analytics vendors built their business model on your users' data. You don't have to participate.
What did you think?