Designing an Analytics Pipeline for Frontend Events Without Selling Your Users Out
Most analytics setups are a privacy nightmare disguised as business intelligence. You drop a Google Analytics snippet, maybe Mixpanel for product analytics, Hotjar for session recording, and suddenly you've handed your users' behavioral data to three different companies who will cross-reference it, build advertising profiles, and sell insights to your competitors.
This guide covers building a first-party analytics pipeline that gives you better data, faster queries, and complete data ownership, and that doesn't require a cookie banner in most jurisdictions.
The Problem with Third-Party Analytics
┌─────────────────────────────────────────────────────────────────────┐
│ THIRD-PARTY ANALYTICS FLOW │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ Your App │
│ ┌──────────┐ │
│ │ User │──────┬──────────────────────────────────────────┐ │
│ │ Browser │ │ │ │
│ └──────────┘ │ │ │
│ ▼ ▼ │
│ ┌─────────────┐ ┌─────────────┐ ┌──────────────┐ │
│ │ Google │ │ Mixpanel │ │ Hotjar │ │
│ │ Analytics │ │ │ │ │ │
│ └──────┬──────┘ └──────┬──────┘ └──────┬───────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ Cross-Site Tracking / Ad Networks │ │
│ │ (Your data is now their product) │ │
│ └─────────────────────────────────────────────────┘ │
│ │
│ Issues: │
│ • 45KB+ JavaScript payload per vendor │
│ • Blocked by ~40% of technical users (ad blockers) │
│ • GDPR requires consent banners │
│ • Data sampled/aggregated (you don't own raw events) │
│ • Cross-site tracking enables competitor intelligence │
│ │
└─────────────────────────────────────────────────────────────────────┘
What You Lose with GA4
- Raw event access - GA4 samples data and limits exports
- Real-time queries - 24-48 hour processing delay for many reports
- User privacy - Google uses your data for advertising
- Ad-blocker resistance - ~40% of technical users block GA
- Query flexibility - Limited to GA's predefined dimensions
First-Party Analytics Architecture
┌─────────────────────────────────────────────────────────────────────┐
│ FIRST-PARTY ANALYTICS FLOW │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────┐ ┌─────────────────┐ ┌───────────────────┐ │
│ │ User │────▶│ Your Edge │────▶│ Your Database │ │
│ │ Browser │ │ (Same Domain) │ │ (ClickHouse) │ │
│ └──────────┘ └─────────────────┘ └───────────────────┘ │
│ │ │ │ │
│ │ │ ▼ │
│ │ ┌───────┴───────┐ ┌─────────────┐ │
│ │ │ Privacy Layer │ │ Dashboard │ │
│ │ │ • IP Hashing │ │ (Grafana) │ │
│ │ │ • No cookies │ └─────────────┘ │
│ │ │ • Aggregation │ │
│ │ └───────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Lightweight Tracker (<2KB gzipped) │ │
│ │ • No external requests │ │
│ │ • Beacon API for reliability │ │
│ │ • Automatic page view tracking │ │
│ │ • Custom event API │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
│ Benefits: │
│ • Rarely blocked by ad blockers (same domain) │
│ • No cookie banner in most jurisdictions (no cookies, no PII) │
│ • Sub-second query latency │
│ • Complete data ownership │
│ • No sampling (a failed send can still drop a batch) │
│ │
└─────────────────────────────────────────────────────────────────────┘
Building a Lightweight Tracker
The tracker needs to be tiny, reliable, and privacy-respecting. Here's a production implementation:
Core Tracker (~1.5KB gzipped)
// lib/analytics/tracker.ts
interface AnalyticsEvent {
name: string;
properties?: Record<string, string | number | boolean>;
timestamp?: number;
}
interface TrackerConfig {
endpoint: string;
flushInterval?: number;
maxBatchSize?: number;
debug?: boolean;
}
interface PageContext {
url: string;
referrer: string;
title: string;
path: string;
search: string;
screenWidth: number;
screenHeight: number;
language: string;
timezone: string;
}
class AnalyticsTracker {
private config: Required<TrackerConfig>;
private queue: AnalyticsEvent[] = [];
private sessionId: string;
private flushTimer: number | null = null;
private isVisible = true;
constructor(config: TrackerConfig) {
this.config = {
flushInterval: 5000,
maxBatchSize: 10,
debug: false,
...config,
};
// Session ID lives in memory only: a full reload or a new tab starts a new session
this.sessionId = this.generateSessionId();
this.setupVisibilityTracking();
this.setupUnloadTracking();
this.startFlushTimer();
}
private generateSessionId(): string {
// Cryptographically random, no PII
const array = new Uint8Array(16);
crypto.getRandomValues(array);
return Array.from(array, b => b.toString(16).padStart(2, '0')).join('');
}
private getPageContext(): PageContext {
return {
url: window.location.href,
referrer: document.referrer,
title: document.title,
path: window.location.pathname,
search: window.location.search,
screenWidth: window.screen.width,
screenHeight: window.screen.height,
language: navigator.language,
timezone: Intl.DateTimeFormat().resolvedOptions().timeZone,
};
}
private setupVisibilityTracking(): void {
document.addEventListener('visibilitychange', () => {
this.isVisible = document.visibilityState === 'visible';
if (!this.isVisible) {
// Flush immediately when tab becomes hidden
this.flush();
}
});
}
private setupUnloadTracking(): void {
// Use pagehide for reliable unload tracking
window.addEventListener('pagehide', () => {
this.flush(true); // Force beacon
});
}
private startFlushTimer(): void {
if (this.flushTimer) return;
this.flushTimer = window.setInterval(() => {
if (this.queue.length > 0) {
this.flush();
}
}, this.config.flushInterval);
}
track(name: string, properties?: Record<string, string | number | boolean>): void {
const event: AnalyticsEvent = {
name,
properties,
timestamp: Date.now(),
};
this.queue.push(event);
if (this.config.debug) {
console.log('[Analytics]', event);
}
// Flush immediately if batch is full
if (this.queue.length >= this.config.maxBatchSize) {
this.flush();
}
}
pageview(additionalProps?: Record<string, string | number | boolean>): void {
this.track('pageview', {
...this.getPageContext(),
...additionalProps,
});
}
private flush(useBeacon = false): void {
if (this.queue.length === 0) return;
const payload = {
sessionId: this.sessionId,
events: [...this.queue],
sentAt: Date.now(),
};
this.queue = [];
const body = JSON.stringify(payload);
// Use Beacon API for reliability during page unload
if (useBeacon && navigator.sendBeacon) {
const blob = new Blob([body], { type: 'application/json' });
const success = navigator.sendBeacon(this.config.endpoint, blob);
if (!success) {
// Fallback to fetch if beacon fails
this.sendWithFetch(body);
}
} else {
this.sendWithFetch(body);
}
}
private async sendWithFetch(body: string): Promise<void> {
try {
await fetch(this.config.endpoint, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body,
keepalive: true, // Allows request to outlive page
});
} catch (error) {
if (this.config.debug) {
console.error('[Analytics] Failed to send:', error);
}
// Events are lost - acceptable tradeoff for simplicity
// Could implement localStorage retry queue if needed
}
}
// Explicit session boundary (e.g., after auth state change)
resetSession(): void {
this.sessionId = this.generateSessionId();
}
}
// Singleton export
let tracker: AnalyticsTracker | null = null;
export function initAnalytics(config: TrackerConfig): AnalyticsTracker {
if (tracker) return tracker;
tracker = new AnalyticsTracker(config);
return tracker;
}
export function track(
name: string,
properties?: Record<string, string | number | boolean>
): void {
tracker?.track(name, properties);
}
export function pageview(
properties?: Record<string, string | number | boolean>
): void {
tracker?.pageview(properties);
}
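The comment in sendWithFetch mentions a localStorage retry queue as a possible extension. The sketch below shows one way that could look. `RetryQueue`, `KeyValueStore`, `createMemoryStore`, the `analytics:retry` key, and the 50-item cap are illustrative assumptions and not part of the tracker above.

```typescript
// Hypothetical retry queue; all names and limits here are assumptions.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

class RetryQueue {
  private store: KeyValueStore;
  private key: string;
  private maxItems: number;

  constructor(store: KeyValueStore, key = 'analytics:retry', maxItems = 50) {
    this.store = store;
    this.key = key;
    this.maxItems = maxItems;
  }

  // Read stored request bodies, treating missing or corrupt data as an empty queue
  private read(): string[] {
    try {
      const raw = this.store.getItem(this.key);
      const parsed: unknown = raw ? JSON.parse(raw) : [];
      return Array.isArray(parsed)
        ? parsed.filter((p): p is string => typeof p === 'string')
        : [];
    } catch {
      return [];
    }
  }

  // Save a failed request body, dropping the oldest entries beyond maxItems
  enqueue(body: string): void {
    const items = [...this.read(), body].slice(-this.maxItems);
    this.store.setItem(this.key, JSON.stringify(items));
  }

  // Remove and return everything queued so the caller can resend it
  drain(): string[] {
    const items = this.read();
    this.store.removeItem(this.key);
    return items;
  }
}

// In-memory stand-in for localStorage, useful in tests or non-browser runtimes
function createMemoryStore(): KeyValueStore {
  const data = new Map<string, string>();
  return {
    getItem: (k) => data.get(k) ?? null,
    setItem: (k, v) => { data.set(k, v); },
    removeItem: (k) => { data.delete(k); },
  };
}
```

In a browser, `localStorage` satisfies `KeyValueStore`, so a failed send could call `enqueue(body)` and a later successful flush could resend whatever `drain()` returns. Persisting events does mean writing to client storage, which may change the consent picture in some jurisdictions.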
React Integration
// lib/analytics/react.tsx
'use client';

import { useEffect } from 'react';
import { usePathname, useSearchParams } from 'next/navigation';
import { initAnalytics, pageview, track } from './tracker';

export function AnalyticsProvider({ children }: { children: React.ReactNode }) {
  const pathname = usePathname();
  const searchParams = useSearchParams();

  // A single effect covers both the initial pageview and route changes.
  // Two separate effects would both fire on mount and record the first page twice.
  // initAnalytics returns the existing singleton on repeat calls, so this is safe.
  useEffect(() => {
    initAnalytics({
      endpoint: '/api/analytics/collect',
      debug: process.env.NODE_ENV === 'development',
    });
    pageview();
  }, [pathname, searchParams]);

  return <>{children}</>;
}
// Property values must be primitives to match the tracker's track() signature
type EventProperties = Record<string, string | number | boolean>;

// Hook for tracking events in components
export function useTrack() {
  return {
    track,
    trackClick: (element: string, properties?: EventProperties) => {
      track('click', { element, ...properties });
    },
    trackForm: (formName: string, action: 'start' | 'submit' | 'error') => {
      track('form', { formName, action });
    },
    trackTiming: (name: string, durationMs: number) => {
      track('timing', { name, durationMs });
    },
  };
}

// Declarative click tracking
export function TrackClick({
  name,
  properties,
  children,
}: {
  name: string;
  properties?: EventProperties;
  children: React.ReactElement;
}) {
  const { trackClick } = useTrack();
  return (
    <span
      onClick={() => trackClick(name, properties)}
      style={{ display: 'contents' }}
    >
      {children}
    </span>
  );
}
Auto-Tracking Web Vitals
// lib/analytics/vitals.ts
import { track } from './tracker';
interface WebVitalsMetric {
name: 'CLS' | 'FCP' | 'FID' | 'INP' | 'LCP' | 'TTFB';
value: number;
rating: 'good' | 'needs-improvement' | 'poor';
}
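export function trackWebVitals(): void {
  if (typeof window === 'undefined' || typeof PerformanceObserver === 'undefined') return;
  // Hand-rolled observers keep the bundle small; the web-vitals library is the
  // more accurate option. INP replaced FID as a Core Web Vital in March 2024,
  // so FID is mainly useful for comparison with older data.
  observeLCP();
  observeFID();
  observeCLS();
  observeINP();
}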
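function observeLCP(): void {
  let lcp = 0;
  const observer = new PerformanceObserver((list) => {
    const entries = list.getEntries();
    const lastEntry = entries[entries.length - 1];
    // For LCP entries, startTime is renderTime when available, otherwise loadTime
    if (lastEntry) lcp = lastEntry.startTime;
  });
  observer.observe({ type: 'largest-contentful-paint', buffered: true });
  // The browser can emit several LCP candidates, so report only the final one
  document.addEventListener('visibilitychange', () => {
    if (document.visibilityState === 'hidden' && lcp > 0) {
      track('web_vital', {
        name: 'LCP',
        value: Math.round(lcp),
        rating: lcp <= 2500 ? 'good' : lcp <= 4000 ? 'needs-improvement' : 'poor',
      });
      lcp = 0; // Prevent duplicate reports on later visibility changes
      observer.disconnect();
    }
  });
}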
function observeFID(): void {
const observer = new PerformanceObserver((list) => {
const entry = list.getEntries()[0] as PerformanceEventTiming;
const value = entry.processingStart - entry.startTime;
track('web_vital', {
name: 'FID',
value: Math.round(value),
rating: value <= 100 ? 'good' : value <= 300 ? 'needs-improvement' : 'poor',
});
});
observer.observe({ type: 'first-input', buffered: true });
}
function observeCLS(): void {
let clsValue = 0;
let sessionEntries: PerformanceEntry[] = [];
const observer = new PerformanceObserver((list) => {
for (const entry of list.getEntries() as (PerformanceEntry & {
hadRecentInput?: boolean;
value?: number;
})[]) {
if (!entry.hadRecentInput) {
sessionEntries.push(entry);
clsValue += entry.value || 0;
}
}
});
observer.observe({ type: 'layout-shift', buffered: true });
// Report on visibility change
document.addEventListener('visibilitychange', () => {
if (document.visibilityState === 'hidden' && clsValue > 0) {
track('web_vital', {
name: 'CLS',
value: Math.round(clsValue * 1000) / 1000,
rating: clsValue <= 0.1 ? 'good' : clsValue <= 0.25 ? 'needs-improvement' : 'poor',
});
}
});
}
function observeINP(): void {
  let maxINP = 0;
  const observer = new PerformanceObserver((list) => {
    for (const entry of list.getEntries() as PerformanceEventTiming[]) {
      // INP covers the whole interaction (input delay + handlers + next paint),
      // which is entry.duration rather than processingEnd - processingStart
      if (entry.duration > maxINP) {
        maxINP = entry.duration;
      }
    }
  });
  observer.observe({ type: 'event', buffered: true });
  // Approximation: this reports the slowest interaction seen, while the official
  // metric discards outliers on pages with many interactions
  document.addEventListener('visibilitychange', () => {
    if (document.visibilityState === 'hidden' && maxINP > 0) {
      track('web_vital', {
        name: 'INP',
        value: Math.round(maxINP),
        rating: maxINP <= 200 ? 'good' : maxINP <= 500 ? 'needs-improvement' : 'poor',
      });
    }
  });
}
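Each observer above repeats the same nested ternary to turn a value into a rating. A small lookup table is one way to keep those thresholds in a single place. The sketch below reuses the threshold numbers from the observers; `VITAL_THRESHOLDS` and `rateVital` are hypothetical names introduced only for illustration.

```typescript
type Rating = 'good' | 'needs-improvement' | 'poor';

// [upper bound for "good", upper bound for "needs-improvement"]
// Values are the same ones used by the observers above.
const VITAL_THRESHOLDS: Record<string, [number, number]> = {
  LCP: [2500, 4000],
  FID: [100, 300],
  CLS: [0.1, 0.25],
  INP: [200, 500],
};

// Returns 'unknown' for metrics without a configured threshold
function rateVital(name: string, value: number): Rating | 'unknown' {
  const bounds = VITAL_THRESHOLDS[name];
  if (!bounds) return 'unknown';
  if (value <= bounds[0]) return 'good';
  return value <= bounds[1] ? 'needs-improvement' : 'poor';
}
```

With a helper like this, each `track('web_vital', ...)` call could pass `rating: rateVital('LCP', value)` instead of an inline ternary.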
Edge Collection Endpoint
The collection endpoint runs at the edge for minimal latency and handles privacy transformations before data reaches storage.
Next.js Edge API Route
// app/api/analytics/collect/route.ts
import { NextRequest, NextResponse } from 'next/server';
// node:crypto's createHash is not available in the Edge runtime,
// so the visitor ID below is hashed with the Web Crypto API instead.
export const runtime = 'edge';

interface IncomingEvent {
  name: string;
  properties?: Record<string, unknown>;
  timestamp: number;
}

interface IncomingPayload {
  sessionId: string;
  events: IncomingEvent[];
  sentAt: number;
}

interface ProcessedEvent {
  session_id: string;
  visitor_id: string;
  event_name: string;
  event_properties: string; // JSON string
  timestamp: string;
  received_at: string;
  // Extracted dimensions for fast querying
  path: string;
  referrer_domain: string;
  country: string;
  device_type: string;
  browser: string;
  os: string;
}

// Privacy-preserving visitor ID (daily rotating; raw IP and user agent are never stored)
async function generateVisitorId(request: NextRequest): Promise<string> {
  const ip = request.headers.get('x-forwarded-for')?.split(',')[0]?.trim() || 'unknown';
  const userAgent = request.headers.get('user-agent') || 'unknown';
  const date = new Date().toISOString().split('T')[0]; // Daily rotation (UTC)
  // The secret salt is what makes brute-forcing the small IPv4 space impractical
  const input = `${ip}:${userAgent}:${date}:${process.env.ANALYTICS_SALT}`;
  const digest = await crypto.subtle.digest('SHA-256', new TextEncoder().encode(input));
  const hex = Array.from(new Uint8Array(digest), b => b.toString(16).padStart(2, '0')).join('');
  return hex.slice(0, 16);
}

function extractReferrerDomain(referrer: string): string {
  if (!referrer) return 'direct';
  try {
    return new URL(referrer).hostname;
  } catch {
    return 'invalid';
  }
}

function parseUserAgent(ua: string): { deviceType: string; browser: string; os: string } {
  // Simplified UA parsing - use ua-parser-js for production
  const isTablet = /iPad|Tablet/.test(ua) || (/Android/.test(ua) && !/Mobile/.test(ua));
  const deviceType = isTablet
    ? 'tablet'
    : /Mobile|Android|iPhone/.test(ua) ? 'mobile' : 'desktop';

  // Order matters: Edge UAs also contain "Chrome" and "Safari",
  // and Chrome UAs contain "Safari"
  let browser = 'unknown';
  if (/Edg(e|A|iOS)?\//.test(ua)) browser = 'edge';
  else if (ua.includes('Firefox')) browser = 'firefox';
  else if (ua.includes('Chrome')) browser = 'chrome';
  else if (ua.includes('Safari')) browser = 'safari';

  // Order matters: iOS UAs contain "Mac OS X" and Android UAs contain "Linux"
  let os = 'unknown';
  if (/iPhone|iPad|iPod/.test(ua)) os = 'ios';
  else if (ua.includes('Android')) os = 'android';
  else if (ua.includes('Windows')) os = 'windows';
  else if (ua.includes('Mac')) os = 'macos';
  else if (ua.includes('Linux')) os = 'linux';

  return { deviceType, browser, os };
}

function getCountryFromRequest(request: NextRequest): string {
  // Vercel and Cloudflare each set their own country header
  return request.headers.get('x-vercel-ip-country')
    || request.headers.get('cf-ipcountry')
    || 'unknown';
}

export async function POST(request: NextRequest) {
  try {
    const payload: IncomingPayload = await request.json();

    // Basic validation
    if (!payload.sessionId || !Array.isArray(payload.events)) {
      return NextResponse.json({ error: 'Invalid payload' }, { status: 400 });
    }
    if (payload.events.length > 100) {
      return NextResponse.json({ error: 'Too many events' }, { status: 400 });
    }

    const visitorId = await generateVisitorId(request);
    const userAgent = request.headers.get('user-agent') || '';
    const { deviceType, browser, os } = parseUserAgent(userAgent);
    const country = getCountryFromRequest(request);
    const receivedAt = new Date().toISOString();

    const processedEvents: ProcessedEvent[] = payload.events.map(event => {
      const properties = event.properties || {};
      return {
        session_id: payload.sessionId,
        visitor_id: visitorId,
        event_name: event.name,
        event_properties: JSON.stringify(properties),
        timestamp: new Date(event.timestamp).toISOString(),
        received_at: receivedAt,
        // Extract common dimensions
        path: String(properties.path || ''),
        referrer_domain: extractReferrerDomain(String(properties.referrer || '')),
        country,
        device_type: deviceType,
        browser,
        os,
      };
    });

    // Send to ClickHouse or buffer
    await sendToClickHouse(processedEvents);
    return NextResponse.json({ success: true });
  } catch (error) {
    console.error('Analytics collection error:', error);
    return NextResponse.json({ error: 'Internal error' }, { status: 500 });
  }
}
async function sendToClickHouse(events: ProcessedEvent[]): Promise<void> {
  // Option 1: Direct insert (simple, higher latency per request)
  // Option 2: Buffer to Kafka/Redis and batch insert (complex, lower latency)
  // Option 3: Use Tinybird for managed ingestion (easiest)
  //
  // The body is newline-delimited JSON, so CLICKHOUSE_URL is assumed to carry the
  // insert statement as a URL-encoded query parameter, for example
  // ?query=INSERT INTO analytics.events FORMAT JSONEachRow
  // ISO 8601 timestamps may also need date_time_input_format=best_effort.
  const response = await fetch(process.env.CLICKHOUSE_URL!, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Basic ${Buffer.from(
        `${process.env.CLICKHOUSE_USER}:${process.env.CLICKHOUSE_PASSWORD}`
      ).toString('base64')}`,
    },
    body: events.map(e => JSON.stringify(e)).join('\n'),
  });
  if (!response.ok) {
    throw new Error(`ClickHouse insert failed: ${response.status}`);
  }
}
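The handler above only checks that `sessionId` is present and that `events` is an array, so a malformed timestamp or a non-string event name would surface later as a 500. A stricter runtime guard could look like the sketch below. `isValidPayload`, `isValidEvent`, `isRecord`, and the 64-character name limit are assumptions added for illustration; the 100-event cap and the 32-character hex session ID come from the handler and tracker shown earlier.

```typescript
interface IncomingEvent {
  name: string;
  properties?: Record<string, unknown>;
  timestamp: number;
}

interface IncomingPayload {
  sessionId: string;
  events: IncomingEvent[];
  sentAt: number;
}

const MAX_EVENTS = 100;          // same cap the handler enforces
const MAX_NAME_LENGTH = 64;      // assumed limit, tune to your event naming
const MAX_DATE_MS = 8.64e15;     // largest magnitude a JavaScript Date can hold

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === 'object' && v !== null && !Array.isArray(v);
}

function isValidEvent(v: unknown): v is IncomingEvent {
  if (!isRecord(v)) return false;
  const nameOk =
    typeof v.name === 'string' && v.name.length > 0 && v.name.length <= MAX_NAME_LENGTH;
  // Out-of-range timestamps would make new Date(ts).toISOString() throw
  const timeOk =
    typeof v.timestamp === 'number' &&
    Number.isFinite(v.timestamp) &&
    Math.abs(v.timestamp) <= MAX_DATE_MS;
  const propsOk = v.properties === undefined || isRecord(v.properties);
  return nameOk && timeOk && propsOk;
}

function isValidPayload(v: unknown): v is IncomingPayload {
  if (!isRecord(v)) return false;
  return (
    typeof v.sessionId === 'string' &&
    /^[0-9a-f]{32}$/.test(v.sessionId) && // the tracker emits 16 random bytes as hex
    typeof v.sentAt === 'number' &&
    Array.isArray(v.events) &&
    v.events.length <= MAX_EVENTS &&
    v.events.every(isValidEvent)
  );
}
```

In the route, the parsed body would be typed as `unknown` and rejected with a 400 when `isValidPayload` returns false, which keeps bad input out of the 500 path.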
ClickHouse Schema Design
ClickHouse is purpose-built for analytics workloads with columnar storage and vectorized execution.
Table Schema
-- Main events table with optimized schema
CREATE TABLE analytics.events
(
-- Identifiers (not PII)
session_id String,
visitor_id String, -- Daily rotating hash
-- Event data
event_name LowCardinality(String),
event_properties String, -- JSON for flexibility
-- Timestamps
timestamp DateTime64(3),
received_at DateTime64(3),
-- Pre-extracted dimensions for fast filtering
path String,
referrer_domain LowCardinality(String),
country LowCardinality(String),
device_type LowCardinality(String),
browser LowCardinality(String),
os LowCardinality(String),
-- Materialized columns for common extractions
date Date MATERIALIZED toDate(timestamp),
hour UInt8 MATERIALIZED toHour(timestamp)
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (event_name, date, visitor_id, timestamp)
TTL timestamp + INTERVAL 2 YEAR DELETE
SETTINGS index_granularity = 8192;
-- Projection for time-series queries (pageviews over time)
ALTER TABLE analytics.events
ADD PROJECTION pageviews_by_day
(
SELECT
date,
path,
count() as pageviews,
uniq(visitor_id) as unique_visitors
GROUP BY date, path
);
-- Projection for referrer analysis
ALTER TABLE analytics.events
ADD PROJECTION referrer_stats
(
SELECT
date,
referrer_domain,
count() as visits,
uniq(visitor_id) as unique_visitors
GROUP BY date, referrer_domain
);
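-- Materialized view for real-time dashboard
-- AggregatingMergeTree stores partial aggregate states, so unique visitors merge
-- correctly. A SummingMergeTree would add up per-insert uniq() results and overcount.
-- Read with countMerge(event_count) and uniqMerge(unique_visitors), grouped by minute.
CREATE MATERIALIZED VIEW analytics.events_realtime
ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMMDD(minute)
ORDER BY (event_name, path, minute)
AS SELECT
    toStartOfMinute(timestamp) AS minute,
    event_name,
    path,
    countState() AS event_count,
    uniqState(visitor_id) AS unique_visitors
FROM analytics.events
GROUP BY minute, event_name, path;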
Common Queries
-- Pageviews for last 7 days
SELECT
date,
count() as pageviews,
uniq(visitor_id) as unique_visitors,
uniq(session_id) as sessions
FROM analytics.events
WHERE event_name = 'pageview'
AND date >= today() - 7
GROUP BY date
ORDER BY date;
-- Top pages
SELECT
path,
count() as pageviews,
uniq(visitor_id) as unique_visitors,
avg(JSONExtractFloat(event_properties, 'scrollDepth')) as avg_scroll_depth
FROM analytics.events
WHERE event_name = 'pageview'
AND date >= today() - 30
GROUP BY path
ORDER BY pageviews DESC
LIMIT 20;
-- Funnel analysis (independent event counts per step; step order per visitor is not enforced)
SELECT
countIf(event_name = 'pageview' AND path = '/pricing') as viewed_pricing,
countIf(event_name = 'click' AND JSONExtractString(event_properties, 'element') = 'start_trial') as clicked_trial,
countIf(event_name = 'form' AND JSONExtractString(event_properties, 'formName') = 'signup') as started_signup,
countIf(event_name = 'signup_complete') as completed_signup
FROM analytics.events
WHERE date >= today() - 30;
-- Session duration calculation
SELECT
session_id,
min(timestamp) as session_start,
max(timestamp) as session_end,
dateDiff('second', min(timestamp), max(timestamp)) as duration_seconds,
count() as event_count
FROM analytics.events
WHERE date = today()
GROUP BY session_id
HAVING event_count > 1
ORDER BY duration_seconds DESC
LIMIT 100;
-- Geographic distribution
SELECT
country,
count() as pageviews,
uniq(visitor_id) as unique_visitors
FROM analytics.events
WHERE event_name = 'pageview'
AND date >= today() - 30
GROUP BY country
ORDER BY unique_visitors DESC;
-- Web Vitals percentiles
SELECT
JSONExtractString(event_properties, 'name') as metric,
quantile(0.5)(JSONExtractFloat(event_properties, 'value')) as p50,
quantile(0.75)(JSONExtractFloat(event_properties, 'value')) as p75,
quantile(0.95)(JSONExtractFloat(event_properties, 'value')) as p95,
quantile(0.99)(JSONExtractFloat(event_properties, 'value')) as p99
FROM analytics.events
WHERE event_name = 'web_vital'
AND date >= today() - 7
GROUP BY metric;
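The funnel query above counts each step independently, so a visitor who clicks the trial button without ever viewing pricing still counts toward the later step. To make the difference concrete, the sketch below computes an ordered, per-visitor funnel in memory, for example over a small export. `FunnelEvent`, `funnelCounts`, and the step labels are hypothetical names introduced for illustration.

```typescript
interface FunnelEvent {
  visitorId: string;
  step: string;       // a label derived from event_name and properties
  timestamp: number;  // epoch milliseconds
}

// Count how many visitors reached each step, requiring the steps to occur in order
function funnelCounts(events: FunnelEvent[], steps: string[]): number[] {
  const progress = new Map<string, number>(); // visitorId -> steps completed so far
  const sorted = [...events].sort((a, b) => a.timestamp - b.timestamp);

  for (const e of sorted) {
    const done = progress.get(e.visitorId) ?? 0;
    // Advance only when the event matches the next expected step
    if (done < steps.length && e.step === steps[done]) {
      progress.set(e.visitorId, done + 1);
    }
  }

  // A visitor who completed n steps counts toward steps 0..n-1
  const counts: number[] = steps.map(() => 0);
  progress.forEach((done) => {
    for (let i = 0; i < done; i++) {
      counts[i] = (counts[i] ?? 0) + 1;
    }
  });
  return counts;
}
```

For large tables the same semantics are better computed inside the database; the in-memory version is only meant to show what "ordered per visitor" means.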
Tinybird Alternative (Managed ClickHouse)
Tinybird provides managed ClickHouse with a simpler API, built-in endpoints, and automatic scaling.
┌─────────────────────────────────────────────────────────────────────┐
│ TINYBIRD ARCHITECTURE │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────┐ ┌─────────────────┐ ┌───────────────────┐ │
│ │ Edge │────▶│ Tinybird │────▶│ Pre-built │ │
│ │ Worker │ │ Events API │ │ Endpoints │ │
│ └──────────┘ └─────────────────┘ └───────────────────┘ │
│ │ │ │
│ ▼ ▼ │
│ ┌───────────────┐ ┌─────────────┐ │
│ │ Data Source │ │ Dashboard │ │
│ │ (Auto-schema)│ │ (JSON API) │ │
│ └───────────────┘ └─────────────┘ │
│ │ │
│ ▼ │
│ ┌───────────────┐ │
│ │ Materialized │ │
│ │ Views (Pipes)│ │
│ └───────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────┘
Tinybird Data Source
// datasources/events.datasource
SCHEMA >
session_id String,
visitor_id String,
event_name String,
event_properties String,
timestamp DateTime64(3),
received_at DateTime64(3),
path String,
referrer_domain String,
country String,
device_type String,
browser String,
os String
ENGINE "MergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(timestamp)"
ENGINE_SORTING_KEY "event_name, toDate(timestamp), visitor_id, timestamp"
ENGINE_TTL "timestamp + INTERVAL 2 YEAR"
Tinybird Endpoints (Pipes)
-- pipes/pageviews_over_time.pipe
NODE daily_pageviews
SQL >
%
SELECT
toDate(timestamp) as date,
count() as pageviews,
uniq(visitor_id) as unique_visitors,
uniq(session_id) as sessions
FROM events
WHERE event_name = 'pageview'
AND timestamp >= now() - INTERVAL {{Int32(days, 7)}} DAY
GROUP BY date
ORDER BY date
NODE endpoint
SQL >
SELECT * FROM daily_pageviews
-- pipes/top_pages.pipe
NODE top_pages
SQL >
%
SELECT
path,
count() as pageviews,
uniq(visitor_id) as unique_visitors
FROM events
WHERE event_name = 'pageview'
AND timestamp >= now() - INTERVAL {{Int32(days, 30)}} DAY
GROUP BY path
ORDER BY pageviews DESC
LIMIT {{Int32(limit, 20)}}
NODE endpoint
SQL >
SELECT * FROM top_pages
-- pipes/realtime_visitors.pipe
NODE active_visitors
SQL >
SELECT
uniq(visitor_id) as active_visitors,
count() as events_last_5min
FROM events
WHERE timestamp >= now() - INTERVAL 5 MINUTE
NODE endpoint
SQL >
SELECT * FROM active_visitors
Tinybird Client
// lib/analytics/tinybird.ts
const TINYBIRD_TOKEN = process.env.TINYBIRD_TOKEN!;
const TINYBIRD_HOST = 'https://api.tinybird.co';
export async function ingestEvents(events: ProcessedEvent[]): Promise<void> {
const response = await fetch(`${TINYBIRD_HOST}/v0/events?name=events`, {
method: 'POST',
headers: {
'Authorization': `Bearer ${TINYBIRD_TOKEN}`,
'Content-Type': 'application/json',
},
body: events.map(e => JSON.stringify(e)).join('\n'),
});
if (!response.ok) {
throw new Error(`Tinybird ingest failed: ${await response.text()}`);
}
}
export async function queryEndpoint<T>(
pipe: string,
params: Record<string, string | number> = {}
): Promise<T[]> {
const searchParams = new URLSearchParams();
for (const [key, value] of Object.entries(params)) {
searchParams.set(key, String(value));
}
const response = await fetch(
`${TINYBIRD_HOST}/v0/pipes/${pipe}.json?${searchParams}`,
{
headers: {
'Authorization': `Bearer ${TINYBIRD_TOKEN}`,
},
}
);
if (!response.ok) {
throw new Error(`Tinybird query failed: ${await response.text()}`);
}
const data = await response.json();
return data.data;
}
// Usage
const pageviews = await queryEndpoint<{
date: string;
pageviews: number;
unique_visitors: number;
}>('pageviews_over_time', { days: 7 });
const realtimeVisitors = await queryEndpoint<{
active_visitors: number;
events_last_5min: number;
}>('realtime_visitors');
Privacy-Preserving Aggregation
True privacy goes beyond "we don't sell your data." It means collecting only what's necessary and making re-identification impractical.
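Collecting only what's necessary applies to URLs as well: query strings and fragments can carry reset tokens, email addresses, or search terms. One option is to scrub URLs before they leave the browser or before they are stored. The sketch below keeps an allowlist of query parameters and drops everything else. `scrubUrl` and the `ALLOWED_PARAMS` list are assumptions chosen for illustration.

```typescript
// Query parameters considered safe to keep; this allowlist is an assumption
const ALLOWED_PARAMS = new Set(['utm_source', 'utm_medium', 'utm_campaign']);

function scrubUrl(rawUrl: string): string {
  const url = new URL(rawUrl);

  // Rebuild the query string from allowlisted parameters only
  const kept = new URLSearchParams();
  url.searchParams.forEach((value, key) => {
    if (ALLOWED_PARAMS.has(key)) kept.append(key, value);
  });
  url.search = kept.toString();

  // Fragments and embedded credentials can also carry sensitive values
  url.hash = '';
  url.username = '';
  url.password = '';
  return url.toString();
}
```

Applied to `url` and `referrer` in the page context, this keeps campaign attribution while leaving tokens and free-text parameters out of the pipeline.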
Differential Privacy for Small Datasets
// lib/analytics/differential-privacy.ts
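// Add Laplacian noise to protect small groups
function addLaplacianNoise(value: number, sensitivity: number, epsilon: number): number {
  const scale = sensitivity / epsilon;
  // u must stay strictly inside (-0.5, 0.5); u = -0.5 would evaluate Math.log(0)
  let u = Math.random() - 0.5;
  while (u === -0.5) u = Math.random() - 0.5;
  const noise = -scale * Math.sign(u) * Math.log(1 - 2 * Math.abs(u));
  // Counts cannot be negative; clamping is post-processing and keeps the guarantee
  return Math.max(0, Math.round(value + noise));
}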
// Apply k-anonymity threshold
function applyKAnonymity<T extends Record<string, unknown>>(
data: T[],
countField: keyof T,
threshold: number = 5
): T[] {
return data.filter(row => (row[countField] as number) >= threshold);
}
// Privacy-safe aggregation pipeline
export async function getPrivacyPreservingStats(
clickhouse: ClickHouseClient
): Promise<PrivateStats> {
const rawData = await clickhouse.query(`
SELECT
country,
count() as raw_count,
uniq(visitor_id) as raw_visitors
FROM analytics.events
WHERE date >= today() - 30
GROUP BY country
`);
// Apply k-anonymity (suppress small groups)
const filtered = applyKAnonymity(rawData, 'raw_visitors', 5);
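// Add differential privacy noise (epsilon = 1.0 for moderate privacy).
// Sensitivity 1 fits raw_visitors; raw_count counts events, so one visitor can
// change it by more than 1. Cap events per visitor or raise its sensitivity.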
const privatized = filtered.map(row => ({
country: row.country,
count: addLaplacianNoise(row.raw_count, 1, 1.0),
visitors: addLaplacianNoise(row.raw_visitors, 1, 1.0),
}));
return {
countries: privatized,
suppressedCountries: rawData.length - filtered.length,
};
}
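Randomized code is hard to check by eye, so it can help to make the random source injectable. The sketch below restates the Laplace mechanism with that change so the noise can be tested deterministically. `laplaceNoise`, `privatizeCount`, and the `random` parameter are hypothetical names and not part of the module above.

```typescript
// Sample Laplace(0, scale) noise by inverse transform sampling.
// `random` must return values in [0, 1), like Math.random.
function laplaceNoise(scale: number, random: () => number = Math.random): number {
  // Shift to (-0.5, 0.5); nudging 0 up to EPSILON avoids Math.log(0)
  const u = Math.max(random(), Number.EPSILON) - 0.5;
  return -scale * Math.sign(u) * Math.log(1 - 2 * Math.abs(u));
}

// Add noise to a count. Smaller epsilon means more noise and stronger privacy.
function privatizeCount(
  trueCount: number,
  epsilon: number,
  sensitivity = 1,
  random: () => number = Math.random,
): number {
  const noisy = trueCount + laplaceNoise(sensitivity / epsilon, random);
  // Clamp to zero: counts cannot be negative, and post-processing keeps the guarantee
  return Math.max(0, Math.round(noisy));
}
```

With `random` fixed at 0.5 the noise is zero, and values above or below 0.5 push the count up or down respectively, which makes the behavior straightforward to unit test.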
Session-Level Aggregation (No User Tracking)
// lib/analytics/session-aggregation.ts
// Instead of tracking users across sessions, aggregate at session level
interface SessionAggregate {
date: string;
total_sessions: number;
avg_duration_seconds: number;
avg_pages_per_session: number;
bounce_rate: number;
// Device breakdown (percentages, not counts)
device_distribution: {
desktop: number;
mobile: number;
tablet: number;
};
}
// This query cannot identify individual users
const SESSION_AGGREGATION_QUERY = `
WITH session_stats AS (
SELECT
toDate(min(timestamp)) as date,
session_id,
dateDiff('second', min(timestamp), max(timestamp)) as duration,
countIf(event_name = 'pageview') as page_count, -- count pageviews only, so clicks and vitals don't skew bounce rate
any(device_type) as device_type
FROM analytics.events
WHERE timestamp >= today() - 30
GROUP BY session_id
)
SELECT
date,
count() as total_sessions,
avg(duration) as avg_duration_seconds,
avg(page_count) as avg_pages_per_session,
countIf(page_count = 1) / count() as bounce_rate,
countIf(device_type = 'desktop') / count() as desktop_pct,
countIf(device_type = 'mobile') / count() as mobile_pct,
countIf(device_type = 'tablet') / count() as tablet_pct
FROM session_stats
GROUP BY date
ORDER BY date
`;
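The query returns flat columns such as `desktop_pct`, while `SessionAggregate` nests them under `device_distribution`. A small mapper bridges the two. The sketch below assumes rows arrive as parsed JSON; `SessionAggregateRow` and `toSessionAggregate` are hypothetical names. ClickHouse's JSON formats quote 64-bit integers as strings by default, so numeric fields are coerced with `Number`.

```typescript
interface SessionAggregate {
  date: string;
  total_sessions: number;
  avg_duration_seconds: number;
  avg_pages_per_session: number;
  bounce_rate: number;
  device_distribution: { desktop: number; mobile: number; tablet: number };
}

// Shape of one row as returned by the aggregation query (assumed JSON output)
interface SessionAggregateRow {
  date: string;
  total_sessions: number | string; // UInt64 may arrive as a quoted string
  avg_duration_seconds: number | string;
  avg_pages_per_session: number | string;
  bounce_rate: number | string;
  desktop_pct: number | string;
  mobile_pct: number | string;
  tablet_pct: number | string;
}

function toSessionAggregate(row: SessionAggregateRow): SessionAggregate {
  return {
    date: row.date,
    total_sessions: Number(row.total_sessions),
    avg_duration_seconds: Number(row.avg_duration_seconds),
    avg_pages_per_session: Number(row.avg_pages_per_session),
    bounce_rate: Number(row.bounce_rate),
    device_distribution: {
      desktop: Number(row.desktop_pct),
      mobile: Number(row.mobile_pct),
      tablet: Number(row.tablet_pct),
    },
  };
}
```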
Building the Dashboard
A minimal dashboard using React and the Tinybird/ClickHouse endpoints.
Dashboard API Routes
// app/api/analytics/dashboard/route.ts
import { NextRequest, NextResponse } from 'next/server';
import { queryClickHouse } from '@/lib/clickhouse';
export async function GET(request: NextRequest) {
const { searchParams } = new URL(request.url);
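// Clamp to a sane range: the value is interpolated into SQL below, and parseInt can return NaN
const parsedDays = parseInt(searchParams.get('days') || '7', 10);
const days = Number.isFinite(parsedDays) ? Math.min(Math.max(parsedDays, 1), 365) : 7;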
const [
overview,
pageviews,
topPages,
referrers,
devices,
webVitals,
] = await Promise.all([
getOverview(days),
getPageviewsOverTime(days),
getTopPages(days),
getTopReferrers(days),
getDeviceBreakdown(days),
getWebVitals(days),
]);
return NextResponse.json({
overview,
pageviews,
topPages,
referrers,
devices,
webVitals,
});
}
async function getOverview(days: number) {
return queryClickHouse(`
SELECT
count() as total_pageviews,
uniq(visitor_id) as unique_visitors,
uniq(session_id) as total_sessions,
count() / uniq(session_id) as pages_per_session
FROM analytics.events
WHERE event_name = 'pageview'
AND date >= today() - ${days}
`);
}
async function getPageviewsOverTime(days: number) {
return queryClickHouse(`
SELECT
toDate(timestamp) as date,
count() as pageviews,
uniq(visitor_id) as visitors
FROM analytics.events
WHERE event_name = 'pageview'
AND date >= today() - ${days}
GROUP BY date
ORDER BY date
`);
}
// ... other query functions
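The query functions above interpolate `days` directly into the SQL string. ClickHouse's HTTP interface also supports server-side parameters: the SQL uses `{name:Type}` placeholders and each value is sent as a `param_name` query parameter. The sketch below builds such a request URL. `buildQueryUrl` and the `http://localhost:8123` base URL in the usage note are assumptions, and how it plugs into `queryClickHouse` depends on that helper, which is not shown in this article.

```typescript
// Build a ClickHouse HTTP URL with typed query parameters.
// Placeholders in `sql` look like {days:UInt16}; values travel as param_days=7.
function buildQueryUrl(
  baseUrl: string,
  sql: string,
  params: Record<string, string | number>,
): string {
  const url = new URL(baseUrl);
  url.searchParams.set('query', sql);
  url.searchParams.set('default_format', 'JSON');
  for (const [name, value] of Object.entries(params)) {
    url.searchParams.set(`param_${name}`, String(value));
  }
  return url.toString();
}

const OVERVIEW_SQL = `
  SELECT count() AS total_pageviews
  FROM analytics.events
  WHERE event_name = 'pageview'
    AND date >= today() - {days:UInt16}
`;
```

A call such as `buildQueryUrl('http://localhost:8123', OVERVIEW_SQL, { days: 7 })` yields a URL that can be passed to `fetch` along with the same authorization header used for inserts. Because the server parses `days` as `UInt16`, a non-numeric value is rejected instead of being spliced into the query text.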
Dashboard Component
// app/dashboard/analytics/page.tsx
'use client';
import { useState } from 'react';
import { useQuery } from '@tanstack/react-query';
import {
LineChart, Line, XAxis, YAxis, Tooltip, ResponsiveContainer,
BarChart, Bar, PieChart, Pie, Cell
} from 'recharts';
interface DashboardData {
overview: {
total_pageviews: number;
unique_visitors: number;
total_sessions: number;
pages_per_session: number;
};
pageviews: Array<{ date: string; pageviews: number; visitors: number }>;
topPages: Array<{ path: string; pageviews: number; unique_visitors: number }>;
referrers: Array<{ referrer_domain: string; visits: number }>;
devices: Array<{ device_type: string; percentage: number }>;
webVitals: Array<{ metric: string; p50: number; p75: number; p95: number }>;
}
export default function AnalyticsDashboard() {
const [days, setDays] = useState(7);
const { data, isLoading } = useQuery<DashboardData>({
queryKey: ['analytics', days],
queryFn: () => fetch(`/api/analytics/dashboard?days=${days}`).then(r => r.json()),
refetchInterval: 60000, // Refresh every minute
});
if (isLoading) return <DashboardSkeleton />;
if (!data) return <div>Failed to load analytics</div>;
return (
<div className="p-6 space-y-6">
{/* Period Selector */}
<div className="flex gap-2">
{[7, 30, 90].map(d => (
<button
key={d}
onClick={() => setDays(d)}
className={`px-3 py-1 rounded ${
days === d ? 'bg-blue-600 text-white' : 'bg-gray-100'
}`}
>
{d}d
</button>
))}
</div>
{/* Overview Cards */}
<div className="grid grid-cols-4 gap-4">
<MetricCard
label="Pageviews"
value={data.overview.total_pageviews.toLocaleString()}
/>
<MetricCard
label="Unique Visitors"
value={data.overview.unique_visitors.toLocaleString()}
/>
<MetricCard
label="Sessions"
value={data.overview.total_sessions.toLocaleString()}
/>
<MetricCard
label="Pages/Session"
value={data.overview.pages_per_session.toFixed(1)}
/>
</div>
{/* Pageviews Chart */}
<div className="bg-white rounded-lg p-4 shadow">
<h3 className="font-semibold mb-4">Pageviews Over Time</h3>
<ResponsiveContainer width="100%" height={300}>
<LineChart data={data.pageviews}>
<XAxis dataKey="date" />
<YAxis />
<Tooltip />
<Line
type="monotone"
dataKey="pageviews"
stroke="#3b82f6"
strokeWidth={2}
/>
<Line
type="monotone"
dataKey="visitors"
stroke="#10b981"
strokeWidth={2}
/>
</LineChart>
</ResponsiveContainer>
</div>
{/* Two-column layout */}
<div className="grid grid-cols-2 gap-4">
{/* Top Pages */}
<div className="bg-white rounded-lg p-4 shadow">
<h3 className="font-semibold mb-4">Top Pages</h3>
<table className="w-full">
<thead>
<tr className="text-left text-gray-500 text-sm">
<th>Page</th>
<th className="text-right">Views</th>
<th className="text-right">Visitors</th>
</tr>
</thead>
<tbody>
{data.topPages.map(page => (
<tr key={page.path} className="border-t">
<td className="py-2 truncate max-w-xs">{page.path}</td>
<td className="text-right">{page.pageviews.toLocaleString()}</td>
<td className="text-right">{page.unique_visitors.toLocaleString()}</td>
</tr>
))}
</tbody>
</table>
</div>
{/* Top Referrers */}
<div className="bg-white rounded-lg p-4 shadow">
<h3 className="font-semibold mb-4">Top Referrers</h3>
<ResponsiveContainer width="100%" height={250}>
<BarChart data={data.referrers} layout="vertical">
<XAxis type="number" />
<YAxis dataKey="referrer_domain" type="category" width={120} />
<Tooltip />
<Bar dataKey="visits" fill="#3b82f6" />
</BarChart>
</ResponsiveContainer>
</div>
</div>
{/* Web Vitals */}
<div className="bg-white rounded-lg p-4 shadow">
<h3 className="font-semibold mb-4">Core Web Vitals</h3>
<div className="grid grid-cols-4 gap-4">
{data.webVitals.map(vital => (
<VitalCard
key={vital.metric}
name={vital.metric}
p75={vital.p75}
/>
))}
</div>
</div>
</div>
);
}
function MetricCard({ label, value }: { label: string; value: string }) {
  return (
    <div className="bg-white rounded-lg p-4 shadow">
      <div className="text-sm text-gray-500">{label}</div>
      <div className="text-2xl font-bold">{value}</div>
    </div>
  );
}
function VitalCard({ name, p75 }: { name: string; p75: number }) {
  // Thresholds are in milliseconds, except CLS, which is unitless.
  // p75 <= good is "good"; p75 > poor is "poor"; anything between needs improvement.
  // FID was replaced by INP as a Core Web Vital in March 2024; it is kept here
  // so historical data still renders.
  const thresholds: Record<string, { good: number; poor: number }> = {
    LCP: { good: 2500, poor: 4000 },
    FID: { good: 100, poor: 300 },
    CLS: { good: 0.1, poor: 0.25 },
    INP: { good: 200, poor: 500 },
  };
  const threshold = thresholds[name];
  // Metrics without a known threshold fall back to a neutral "unknown" style
  const rating = !threshold
    ? 'unknown'
    : p75 <= threshold.good
      ? 'good'
      : p75 <= threshold.poor
        ? 'needs-improvement'
        : 'poor';
  const colors = {
    good: 'text-green-600 bg-green-50',
    'needs-improvement': 'text-yellow-600 bg-yellow-50',
    poor: 'text-red-600 bg-red-50',
    unknown: 'text-gray-600 bg-gray-50',
  };
  return (
    <div className={`rounded-lg p-4 ${colors[rating]}`}>
      <div className="text-sm font-medium">{name}</div>
      <div className="text-xl font-bold">
        {name === 'CLS' ? p75.toFixed(3) : `${Math.round(p75)}ms`}
      </div>
      <div className="text-xs">p75</div>
    </div>
  );
}
Handling High Scale
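For applications handling millions of events per day, inserting one row per request does not hold up: ClickHouse performs best with fewer, larger inserts, so events should be buffered and written in batches.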
Event Buffering with Redis
// lib/analytics/buffer.ts
// ProcessedEvent and sendToClickHouse are assumed to be in scope
// (defined elsewhere in this guide).
import { Redis } from '@upstash/redis';

const redis = new Redis({
  url: process.env.UPSTASH_REDIS_URL!,
  token: process.env.UPSTASH_REDIS_TOKEN!,
});

const BUFFER_KEY = 'analytics:buffer';
const BATCH_SIZE = 1000;
const MAX_BUFFER_AGE_MS = 10000; // 10 seconds; should match the cron interval below

export async function bufferEvent(event: ProcessedEvent): Promise<void> {
  // RPUSH returns the new list length, so no separate LLEN round trip is needed
  const bufferSize = await redis.rpush(BUFFER_KEY, JSON.stringify(event));
  if (bufferSize >= BATCH_SIZE) {
    await flushBuffer();
  }
}

export async function flushBuffer(): Promise<number> {
  // LPOP with a count removes up to BATCH_SIZE items in a single atomic command
  // (a loop of single LPOPs is neither atomic nor cheap: one round trip per event)
  const events = await redis.lpop<unknown[]>(BUFFER_KEY, BATCH_SIZE);
  if (!events || events.length === 0) return 0;

  // @upstash/redis may already have deserialized the JSON, so handle both shapes
  const parsed = events.map(e =>
    (typeof e === 'string' ? JSON.parse(e) : e) as ProcessedEvent
  );
  await sendToClickHouse(parsed);
  return parsed.length;
}

// Cron job (run every 10 seconds). The schedule itself bounds how long an
// event can wait, so flush whenever anything is pending.
export async function flushAgedEvents(): Promise<void> {
  const bufferSize = await redis.llen(BUFFER_KEY);
  if (bufferSize > 0) {
    await flushBuffer();
  }
}
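One gap worth noting in the buffer above: once a batch has been popped, a failed ClickHouse insert loses it. The sketch below shows one way to drain the buffer in batches and put a failed batch back at the head of the list. `ListStore`, `MemoryListStore`, and `drainWithRequeue` are hypothetical names introduced for illustration; in production the interface would be backed by the Redis client rather than an in-memory array.

```typescript
// Hypothetical minimal list interface (an assumption, not part of any library).
// A Redis-backed version would use LPOP with a count and LPUSH; note that
// LPUSH inserts items one at a time at the head, so items must be reversed
// first to restore their original order.
export interface ListStore {
  popFront(count: number): Promise<string[]>;
  pushFront(items: string[]): Promise<void>;
}

export type Sink = (events: unknown[]) => Promise<void>;

// Drains up to maxBatches batches. If the sink throws, the popped batch is
// restored to the front of the list and the error is rethrown.
export async function drainWithRequeue(
  store: ListStore,
  sink: Sink,
  batchSize: number,
  maxBatches: number = 10
): Promise<number> {
  let flushed = 0;
  for (let i = 0; i < maxBatches; i++) {
    const raw = await store.popFront(batchSize);
    if (raw.length === 0) break;
    try {
      await sink(raw.map(r => JSON.parse(r)));
      flushed += raw.length;
    } catch (err) {
      await store.pushFront(raw);
      throw err;
    }
  }
  return flushed;
}

// In-memory stand-in for local experiments and tests.
export class MemoryListStore implements ListStore {
  items: string[];
  constructor(items: string[] = []) {
    this.items = items;
  }
  async popFront(count: number): Promise<string[]> {
    return this.items.splice(0, count);
  }
  async pushFront(items: string[]): Promise<void> {
    this.items.unshift(...items);
  }
}
```

Two caveats apply to this sketch. A malformed entry would be re-queued on every attempt, so a real implementation would probably divert unparseable items to a dead-letter list. And pop-then-requeue is still not crash-safe: if the worker dies between the pop and the requeue, the batch is lost.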
Kafka for Enterprise Scale
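// lib/analytics/kafka.ts
// ProcessedEvent and sendToClickHouse are assumed to be in scope
// (defined elsewhere in this guide).
import { Kafka, Producer, Consumer, EachMessagePayload } from 'kafkajs';

const kafka = new Kafka({
  clientId: 'analytics-pipeline',
  brokers: process.env.KAFKA_BROKERS!.split(','),
  ssl: true,
  sasl: {
    mechanism: 'scram-sha-256',
    username: process.env.KAFKA_USERNAME!,
    password: process.env.KAFKA_PASSWORD!,
  },
});

// Producer (edge workers send here)
const producer: Producer = kafka.producer();
let producerReady: Promise<void> | null = null;

export async function produceEvent(event: ProcessedEvent): Promise<void> {
  // kafkajs requires connect() before send(); connect once and reuse the connection
  if (!producerReady) producerReady = producer.connect();
  await producerReady;
  await producer.send({
    topic: 'analytics-events',
    messages: [
      {
        key: event.session_id, // same session -> same partition, so order is kept
        value: JSON.stringify(event),
        timestamp: String(Date.now()),
      },
    ],
  });
}

// Consumer (batch worker)
const consumer: Consumer = kafka.consumer({
  groupId: 'analytics-clickhouse-writer'
});

export async function startConsumer(): Promise<void> {
  await consumer.connect();
  await consumer.subscribe({ topic: 'analytics-events', fromBeginning: false });

  let batch: ProcessedEvent[] = [];
  let lastFlush = Date.now();

  const flush = async (): Promise<void> => {
    if (batch.length === 0) return;
    const toSend = batch;
    batch = [];
    lastFlush = Date.now();
    await sendToClickHouse(toSend);
  };

  // eachMessage only runs when a message arrives, so a timer is needed
  // to flush a partial batch when the topic goes quiet
  setInterval(() => {
    if (Date.now() - lastFlush > 5000) void flush().catch(console.error);
  }, 1000);

  // Note: with auto-commit, offsets can be committed before a buffered batch
  // is written, so a crash may lose up to one batch.
  await consumer.run({
    eachMessage: async ({ message }: EachMessagePayload) => {
      if (!message.value) return; // skip empty messages
      batch.push(JSON.parse(message.value.toString()) as ProcessedEvent);
      // Flush on batch size or time
      if (batch.length >= 1000 || Date.now() - lastFlush > 5000) {
        await flush();
      }
    },
  });
}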
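The size-or-time flush rule used by the consumer can also be isolated into a small class, which makes it testable without a broker. The following is a minimal sketch under that assumption; `EventBatcher` and its method names are hypothetical, and the clock is passed in explicitly so the behavior is deterministic.

```typescript
// Hypothetical helper: collects items and hands them to onFlush when either
// maxSize items have accumulated or the oldest item is at least maxAgeMs old.
export class EventBatcher<T> {
  private items: T[] = [];
  private firstAt: number | null = null;
  private readonly maxSize: number;
  private readonly maxAgeMs: number;
  private readonly onFlush: (batch: T[]) => void;

  constructor(maxSize: number, maxAgeMs: number, onFlush: (batch: T[]) => void) {
    this.maxSize = maxSize;
    this.maxAgeMs = maxAgeMs;
    this.onFlush = onFlush;
  }

  // Adds an item; flushes immediately once the size limit is reached.
  add(item: T, now: number = Date.now()): void {
    if (this.items.length === 0) this.firstAt = now;
    this.items.push(item);
    if (this.items.length >= this.maxSize) this.flush();
  }

  // Intended to be called periodically (for example from setInterval) so
  // that a partial batch is still flushed when no new items arrive.
  tick(now: number = Date.now()): void {
    if (this.firstAt !== null && now - this.firstAt >= this.maxAgeMs) {
      this.flush();
    }
  }

  // Hands the current batch to onFlush and resets the buffer.
  flush(): void {
    if (this.items.length === 0) return;
    const batch = this.items;
    this.items = [];
    this.firstAt = null;
    this.onFlush(batch);
  }

  get size(): number {
    return this.items.length;
  }
}
```

Measuring age from the first item in the batch, rather than from the last flush, means a batch never waits longer than `maxAgeMs` regardless of how long the stream was idle beforehand.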
Comparison: GA4 vs First-Party
┌─────────────────────────────────────────────────────────────────────┐
│                        ANALYTICS COMPARISON                         │
├──────────────────────┬─────────────────────┬────────────────────────┤
│ Aspect               │ Google Analytics 4  │ First-Party Pipeline   │
├──────────────────────┼─────────────────────┼────────────────────────┤
│ Data Ownership       │ Google owns it      │ You own it completely  │
├──────────────────────┼─────────────────────┼────────────────────────┤
│ Privacy              │ Cross-site tracking │ No PII, daily rotation │
├──────────────────────┼─────────────────────┼────────────────────────┤
│ Cookie Consent       │ Required (GDPR)     │ Not required*          │
├──────────────────────┼─────────────────────┼────────────────────────┤
│ Ad-blocker Bypass    │ ~40% blocked        │ Rarely blocked         │
├──────────────────────┼─────────────────────┼────────────────────────┤
│ Query Latency        │ 24-48h processing   │ Sub-second             │
├──────────────────────┼─────────────────────┼────────────────────────┤
│ Raw Data Access      │ Sampled, limited    │ Complete (your TTL)    │
├──────────────────────┼─────────────────────┼────────────────────────┤
│ Custom Dimensions    │ 50 event, 25 user   │ Unlimited              │
├──────────────────────┼─────────────────────┼────────────────────────┤
│ JavaScript Size      │ 45KB+ (gtag.js)     │ <2KB                   │
├──────────────────────┼─────────────────────┼────────────────────────┤
│ Setup Complexity     │ Low                 │ Medium                 │
├──────────────────────┼─────────────────────┼────────────────────────┤
│ Cost at 1M/mo        │ Free (with limits)  │ ~$50-200/month         │
├──────────────────────┼─────────────────────┼────────────────────────┤
│ Cost at 100M/mo      │ GA360: ~$150K/year  │ ~$500-2000/month       │
├──────────────────────┼─────────────────────┼────────────────────────┤
│ Cross-device Track   │ Yes (invasive)      │ No (by design)         │
├──────────────────────┼─────────────────────┼────────────────────────┤
│ Real-time Dashboard  │ Limited             │ Full flexibility       │
└──────────────────────┴─────────────────────┴────────────────────────┘
* When no PII is collected and no cross-site tracking occurs, most
privacy regulations don't require consent. Consult legal counsel.
Production Checklist
Tracker
- Bundle size under 2KB gzipped
- Beacon API for reliable page unload tracking
- Automatic pageview tracking on route change
- Session ID rotation on auth state change
- Web Vitals collection
- Error boundary integration
- Debug mode for development
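For the Beacon API item, a common pattern is to try `sendBeacon` first and fall back to `fetch` with `keepalive: true` when the browser declines to queue the payload. The sketch below illustrates that selection logic only; `Transport` and `sendEvents` are hypothetical names, and the browser functions are injected so the logic can be exercised outside a browser.

```typescript
// Hypothetical injection point for the two browser APIs. In a browser this
// could be wired up as:
//   { sendBeacon: navigator.sendBeacon.bind(navigator), fetch: window.fetch.bind(window) }
export interface Transport {
  sendBeacon?: (url: string, body: string) => boolean;
  fetch: (
    url: string,
    init: {
      method: string;
      body: string;
      keepalive: boolean;
      headers: Record<string, string>;
    }
  ) => Promise<unknown>;
}

// Sends a batch of events and reports which transport was used.
export function sendEvents(
  url: string,
  events: object[],
  transport: Transport
): 'beacon' | 'fetch' | 'none' {
  if (events.length === 0) return 'none';
  const body = JSON.stringify(events);

  // sendBeacon returns false when the browser could not queue the data
  // (for example, when the payload is too large)
  if (transport.sendBeacon && transport.sendBeacon(url, body)) {
    return 'beacon';
  }

  // keepalive lets the request outlive the page during unload
  transport
    .fetch(url, {
      method: 'POST',
      body,
      keepalive: true,
      headers: { 'Content-Type': 'application/json' },
    })
    .catch(() => {
      // Analytics must never break the page; failures are dropped silently
    });
  return 'fetch';
}
```

Passing a string to `sendBeacon` sends it as `text/plain`, so the collection endpoint in this sketch would need to parse the body as JSON regardless of the content type.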
Collection Endpoint
- Edge deployment for minimal latency
- IP hashing with daily salt rotation
- No cookies set
- Rate limiting per IP
- Payload validation and sanitization
- Batch insert buffering
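The "IP hashing with daily salt rotation" item can be sketched as follows. This is an illustration under stated assumptions, not necessarily the exact scheme used elsewhere in this guide: `dailySalt` and `visitorId` are hypothetical names, the hash inputs (IP, user agent, site ID) are one plausible choice, and the salt is derived from a server-side secret plus the UTC date.

```typescript
import { createHash, createHmac } from 'node:crypto';

// Derives a per-day salt from a server-side secret and the UTC calendar date.
// Assumption: deriving the salt this way is a convenience; anyone holding the
// secret can recompute past salts. Generating a random salt each day and
// discarding the old one gives a stronger guarantee.
export function dailySalt(secret: string, date: Date): string {
  const day = date.toISOString().slice(0, 10); // YYYY-MM-DD in UTC
  return createHmac('sha256', secret).update(day).digest('hex');
}

// Produces a short visitor identifier that is stable within one UTC day and
// changes the next day. The raw IP is never stored.
export function visitorId(
  ip: string,
  userAgent: string,
  siteId: string,
  secret: string,
  date: Date = new Date()
): string {
  return createHash('sha256')
    .update(dailySalt(secret, date))
    .update('|')
    .update(siteId)
    .update('|')
    .update(ip)
    .update('|')
    .update(userAgent)
    .digest('hex')
    .slice(0, 16); // 64 bits is ample for counting daily uniques
}
```

Because the identifier changes at midnight UTC, a visitor who returns the next day is counted as new, which is the trade-off that keeps multi-day tracking out of the data set.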
Storage
- ClickHouse or Tinybird for OLAP queries
- Proper partitioning (monthly)
- TTL for data retention
- Projections for common query patterns
- Backup strategy
Privacy
- No PII in events
- Daily visitor ID rotation
- k-anonymity for small segments
- Data retention policy
- GDPR data export capability
- Documented privacy practices
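For the k-anonymity item, one simple approach is to suppress any segment with fewer than k visitors before it reaches the dashboard. The sketch below assumes that approach; `SegmentRow` and `suppressSmallSegments` are hypothetical names, and the right value of k depends on your traffic and risk tolerance.

```typescript
export interface SegmentRow {
  segment: string;
  visitors: number;
}

// Keeps segments with at least k visitors. Smaller segments are merged into
// a single "Other" row, which is itself dropped if it would still fall below k.
export function suppressSmallSegments(
  rows: SegmentRow[],
  k: number,
  otherLabel: string = 'Other'
): SegmentRow[] {
  const kept: SegmentRow[] = [];
  let suppressed = 0;

  for (const row of rows) {
    if (row.visitors >= k) {
      kept.push(row);
    } else {
      suppressed += row.visitors;
    }
  }

  // Only expose the merged bucket when it is large enough on its own
  if (suppressed >= k) {
    kept.push({ segment: otherLabel, visitors: suppressed });
  }
  return kept;
}
```

A report that also shows an overall total can still leak a dropped bucket by subtraction, so totals may need rounding or the same suppression applied.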
Dashboard
- Real-time visitor count
- Pageviews over time
- Top pages and referrers
- Geographic distribution
- Device breakdown
- Web Vitals monitoring
- Funnel visualization
- Export capability
When to Still Use GA4
First-party analytics isn't always the right choice:
- You need cross-device tracking - If tracking users across devices is a business requirement, you need cookies and consent anyway
- Marketing attribution - GA4's integration with Google Ads is unmatched for paid acquisition analysis
- Zero engineering bandwidth - If you can't maintain infrastructure, GA4's free tier works
- You need benchmarks - GA4 provides industry benchmarks; first-party can't
The hybrid approach works well: first-party for product analytics (accurate, privacy-respecting), GA4 for marketing attribution (consent-gated, marketing team's domain).
Summary
Building a first-party analytics pipeline requires more upfront investment than dropping in a GA snippet, but the benefits compound:
- Better data - Far fewer events lost to ad blockers, and no GA4 sampling or export limits
- Faster insights - Sub-second queries vs 24-48 hour delays
- Complete ownership - Your data stays yours
- User trust - No privacy-hostile tracking
- Cost efficiency - ClickHouse at scale is dramatically cheaper than GA360
The architecture presented here (lightweight tracker, edge collection with privacy transforms, ClickHouse storage, and custom dashboards) scales from zero to millions of events per day while keeping your users' trust intact.