System Design & Architecture · Part 0 of 9
Designing for Failure, Not Against It
Circuit breakers, graceful degradation, and fallback UI patterns — how frontend architects need to think beyond the happy path
The Uncomfortable Truth
Your application will fail. Not might fail—will fail. Services go down. Networks flake. APIs return garbage. Third-party scripts hang. Databases hit capacity. CDNs have bad days.
The difference between applications that users trust and applications that users abandon isn't whether they fail—it's how they fail.
Most frontend code is written for the happy path: data loads, APIs respond, everything works. But production reality is messier. The best frontend architects I've worked with spend as much time designing for failure as they do designing for success.
This post is about shifting your mindset from "how do I prevent failure" to "how do I fail gracefully."
The Failure Spectrum
┌───────────────────────────────────────────────────────────────────────────┐
│                         FAILURE SEVERITY SPECTRUM                         │
├──────────────────┬──────────────────┬──────────────────┬──────────────────┤
│ Catastrophic     │ Degraded         │ Minor            │ Hidden           │
├──────────────────┼──────────────────┼──────────────────┼──────────────────┤
│ White Screen     │ Feature Broken   │ Stale Data       │ Silent Retry     │
│ of Death         │                  │                  │                  │
├──────────────────┼──────────────────┼──────────────────┼──────────────────┤
│ User impact:     │ User impact:     │ User impact:     │ User impact:     │
│ Cannot use app   │ Cannot use       │ Slight delay,    │ None noticed     │
│ at all           │ specific feature │ data updates     │                  │
├──────────────────┼──────────────────┼──────────────────┼──────────────────┤
│ User reaction:   │ User reaction:   │ User reaction:   │ User reaction:   │
│ Leave, never     │ Frustrated,      │ Mildly annoyed,  │ "Works great!"   │
│ come back        │ complain         │ accepts it       │                  │
├──────────────────┴──────────────────┴──────────────────┴──────────────────┤
│ Goal: Move failures from left to right                                    │
└───────────────────────────────────────────────────────────────────────────┘
Failure Categories
┌─────────────────────────────────────────────────────────────────────────────┐
│ TYPES OF FRONTEND FAILURES │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Network Failures │
│ ├── Complete network loss │
│ ├── Intermittent connectivity │
│ ├── High latency (slow but connected) │
│ ├── DNS failures │
│ └── SSL/TLS errors │
│ │
│ API Failures │
│ ├── 5xx server errors │
│ ├── 4xx client errors (unexpected) │
│ ├── Timeout (no response) │
│ ├── Malformed response (invalid JSON) │
│ ├── Unexpected response shape │
│ └── Rate limiting (429) │
│ │
│ Third-Party Failures │
│ ├── Analytics script hanging │
│ ├── Chat widget not loading │
│ ├── CDN serving stale/wrong assets │
│ ├── Auth provider down │
│ └── Payment processor unavailable │
│ │
│ Client-Side Failures │
│ ├── JavaScript error (crash) │
│ ├── Memory exhaustion │
│ ├── Storage quota exceeded │
│ ├── Browser API not available │
│ └── State corruption │
│ │
│ Data Failures │
│ ├── Stale cache │
│ ├── Inconsistent data │
│ ├── Missing required fields │
│ ├── Type mismatches │
│ └── Referential integrity issues │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Error Boundaries: Your First Line of Defense
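Error boundaries stop a rendering error in one component from unmounting the entire React tree. They catch errors thrown during rendering, in lifecycle methods, and in constructors of the components below them. They do not catch errors in event handlers, asynchronous code (timers, promise rejections), or server-side rendering, so those paths still need their own handling.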
Basic Error Boundary
// components/error-boundary.tsx
'use client';
import React, { Component, ReactNode } from 'react';
interface Props {
children: ReactNode;
fallback?: ReactNode;
onError?: (error: Error, errorInfo: React.ErrorInfo) => void;
}
interface State {
hasError: boolean;
error: Error | null;
}
export class ErrorBoundary extends Component<Props, State> {
constructor(props: Props) {
super(props);
this.state = { hasError: false, error: null };
}
static getDerivedStateFromError(error: Error): State {
return { hasError: true, error };
}
componentDidCatch(error: Error, errorInfo: React.ErrorInfo) {
// Log to error tracking service
console.error('Error boundary caught:', error, errorInfo);
this.props.onError?.(error, errorInfo);
}
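  render() {
    if (this.state.hasError) {
      // `fallback={null}` means "render nothing on failure". A `??` check would
      // treat null as missing and show the default UI, so test for undefined.
      if (this.props.fallback !== undefined) {
        return this.props.fallback;
      }
      return (
        <DefaultErrorFallback
          error={this.state.error}
          reset={() => this.setState({ hasError: false, error: null })}
        />
      );
    }
    return this.props.children;
  }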
}
function DefaultErrorFallback({
error,
reset,
}: {
error: Error | null;
reset: () => void;
}) {
return (
<div className="p-6 rounded-lg bg-red-50 border border-red-200">
<h2 className="text-lg font-semibold text-red-800">Something went wrong</h2>
<p className="mt-2 text-red-600">
{error?.message || 'An unexpected error occurred'}
</p>
<button
onClick={reset}
className="mt-4 px-4 py-2 bg-red-600 text-white rounded hover:bg-red-700"
>
Try again
</button>
</div>
);
}
Strategic Error Boundary Placement
// Don't just wrap the whole app - be strategic
// app/layout.tsx
export default function RootLayout({ children }: { children: React.ReactNode }) {
return (
<html>
<body>
{/* Critical: Header should always work */}
<ErrorBoundary fallback={<MinimalHeader />}>
<Header />
</ErrorBoundary>
{/* Main content can fail independently */}
<ErrorBoundary fallback={<MainContentError />}>
{children}
</ErrorBoundary>
{/* Non-critical: Footer can fail silently */}
<ErrorBoundary fallback={null}>
<Footer />
</ErrorBoundary>
{/* Third-party widgets should never crash the app */}
<ErrorBoundary fallback={null}>
<ChatWidget />
</ErrorBoundary>
</body>
</html>
);
}
// Page-level boundaries
// app/dashboard/page.tsx
export default function DashboardPage() {
return (
<div className="grid grid-cols-3 gap-4">
{/* Each widget fails independently */}
<ErrorBoundary fallback={<WidgetError title="Revenue" />}>
<RevenueWidget />
</ErrorBoundary>
<ErrorBoundary fallback={<WidgetError title="Users" />}>
<UsersWidget />
</ErrorBoundary>
<ErrorBoundary fallback={<WidgetError title="Orders" />}>
<OrdersWidget />
</ErrorBoundary>
</div>
);
}
Next.js App Router Error Handling
// app/dashboard/error.tsx
'use client';
import { useEffect } from 'react';
export default function DashboardError({
error,
reset,
}: {
error: Error & { digest?: string };
reset: () => void;
}) {
useEffect(() => {
// Log to error tracking
reportError(error);
}, [error]);
return (
<div className="flex flex-col items-center justify-center min-h-[400px]">
<h2 className="text-xl font-semibold">Dashboard temporarily unavailable</h2>
<p className="mt-2 text-gray-600">
We're having trouble loading your dashboard.
</p>
<div className="mt-6 space-x-4">
<button
onClick={reset}
className="px-4 py-2 bg-blue-600 text-white rounded"
>
Try again
</button>
<a href="/" className="px-4 py-2 border rounded">
Go home
</a>
</div>
{process.env.NODE_ENV === 'development' && (
<pre className="mt-4 p-4 bg-gray-100 rounded text-sm overflow-auto max-w-full">
{error.message}
</pre>
)}
</div>
);
}
// app/dashboard/loading.tsx
// Provides instant feedback while data loads
export default function DashboardLoading() {
return (
<div className="grid grid-cols-3 gap-4">
<WidgetSkeleton />
<WidgetSkeleton />
<WidgetSkeleton />
</div>
);
}
Circuit Breakers: Stop the Bleeding
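Circuit breakers prevent cascading failures by stopping requests to a failing service for a cooldown period. While the circuit is closed, requests flow normally. After repeated failures it opens, and calls fail fast or use a fallback. Once the cooldown elapses it goes half-open and lets a few trial requests through to decide whether to close again.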
Frontend Circuit Breaker
// lib/circuit-breaker.ts
type CircuitState = 'closed' | 'open' | 'half-open';
interface CircuitBreakerOptions {
failureThreshold: number; // Failures before opening
resetTimeout: number; // ms before trying again
monitorInterval: number; // Window for counting failures
halfOpenRequests: number; // Requests to test when half-open
}
interface CircuitBreakerState {
state: CircuitState;
failures: number;
lastFailure: number | null;
halfOpenSuccesses: number;
}
export class CircuitBreaker {
private state: CircuitBreakerState;
private options: CircuitBreakerOptions;
private stateChangeCallbacks: ((state: CircuitState) => void)[] = [];
constructor(options: Partial<CircuitBreakerOptions> = {}) {
this.options = {
failureThreshold: 5,
resetTimeout: 30000, // 30 seconds
monitorInterval: 60000, // 1 minute
halfOpenRequests: 3,
...options,
};
this.state = {
state: 'closed',
failures: 0,
lastFailure: null,
halfOpenSuccesses: 0,
};
}
async execute<T>(fn: () => Promise<T>, fallback?: () => T): Promise<T> {
// Check if circuit should transition
this.checkStateTransition();
if (this.state.state === 'open') {
if (fallback) {
return fallback();
}
throw new CircuitOpenError('Circuit breaker is open');
}
try {
const result = await fn();
this.onSuccess();
return result;
} catch (error) {
this.onFailure();
if (fallback && this.state.state === 'open') {
return fallback();
}
throw error;
}
}
private checkStateTransition() {
const now = Date.now();
if (this.state.state === 'open') {
// Check if we should try again
if (
this.state.lastFailure &&
now - this.state.lastFailure > this.options.resetTimeout
) {
this.transitionTo('half-open');
}
}
// Reset failure count if outside monitoring window
if (
this.state.lastFailure &&
now - this.state.lastFailure > this.options.monitorInterval
) {
this.state.failures = 0;
}
}
private onSuccess() {
if (this.state.state === 'half-open') {
this.state.halfOpenSuccesses++;
if (this.state.halfOpenSuccesses >= this.options.halfOpenRequests) {
this.transitionTo('closed');
}
} else {
this.state.failures = 0;
}
}
private onFailure() {
this.state.failures++;
this.state.lastFailure = Date.now();
if (this.state.state === 'half-open') {
this.transitionTo('open');
} else if (this.state.failures >= this.options.failureThreshold) {
this.transitionTo('open');
}
}
private transitionTo(newState: CircuitState) {
if (this.state.state !== newState) {
this.state.state = newState;
this.state.halfOpenSuccesses = 0;
if (newState === 'closed') {
this.state.failures = 0;
}
this.stateChangeCallbacks.forEach(cb => cb(newState));
}
}
onStateChange(callback: (state: CircuitState) => void) {
this.stateChangeCallbacks.push(callback);
return () => {
this.stateChangeCallbacks = this.stateChangeCallbacks.filter(
cb => cb !== callback
);
};
}
getState(): CircuitState {
return this.state.state;
}
}
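// Exported so callers (for example a retry predicate) can check `instanceof CircuitOpenError`
export class CircuitOpenError extends Error {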
constructor(message: string) {
super(message);
this.name = 'CircuitOpenError';
}
}
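To make the state transitions easier to follow, here is a minimal sketch that replays them with an injected clock. `TinyBreaker`, `peek`, and `record` are hypothetical names for this illustration only. It is a simplification of the class above (no monitoring window, and a single successful probe closes the circuit), not a replacement for it.

```typescript
// Simplified, synchronous model of the breaker's state machine.
// TinyBreaker is illustrative only; it is not the CircuitBreaker class above.
type BreakerState = 'closed' | 'open' | 'half-open';

class TinyBreaker {
  private current: BreakerState = 'closed';
  private failures = 0;
  private openedAt = 0;
  private failureThreshold: number;
  private resetTimeoutMs: number;
  private now: () => number;

  constructor(failureThreshold: number, resetTimeoutMs: number, now: () => number) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now; // injected clock keeps the walk-through deterministic
  }

  // State that a new request would see right now.
  peek(): BreakerState {
    if (this.current === 'open' && this.now() - this.openedAt > this.resetTimeoutMs) {
      this.current = 'half-open'; // cooldown elapsed: allow a probe request
    }
    return this.current;
  }

  // Record the outcome of one request and return the resulting state.
  record(ok: boolean): BreakerState {
    const seen = this.peek();
    if (seen === 'open') return seen; // call was short-circuited, nothing to record
    if (ok) {
      this.failures = 0;
      this.current = 'closed';
    } else {
      this.failures++;
      if (seen === 'half-open' || this.failures >= this.failureThreshold) {
        this.current = 'open';
        this.openedAt = this.now();
      }
    }
    return this.current;
  }
}

// Three failures open the circuit; after the 30s cooldown one good probe closes it.
let fakeTime = 0;
const breaker = new TinyBreaker(3, 30_000, () => fakeTime);
const trace: BreakerState[] = [];
trace.push(breaker.record(false)); // closed (1 failure)
trace.push(breaker.record(false)); // closed (2 failures)
trace.push(breaker.record(false)); // open (threshold reached)
fakeTime += 10_000;
trace.push(breaker.record(true));  // open (still cooling down, call skipped)
fakeTime += 25_000;
trace.push(breaker.peek());        // half-open (35s since opening)
trace.push(breaker.record(true));  // closed (probe succeeded)
console.log(trace.join(' -> '));
// closed -> closed -> open -> open -> half-open -> closed
```

Injecting the clock is a testing convenience: the same idea lets you unit-test the real breaker without waiting for `resetTimeout` to elapse.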
Circuit Breaker Per Service
// lib/api/circuits.ts
import { CircuitBreaker, CircuitOpenError } from '../circuit-breaker'; // CircuitOpenError must be exported there
// Create a circuit breaker for each external dependency
export const circuits = {
userService: new CircuitBreaker({
failureThreshold: 5,
resetTimeout: 30000,
}),
paymentService: new CircuitBreaker({
failureThreshold: 3, // More sensitive for payments
resetTimeout: 60000, // Longer cooldown
}),
analyticsService: new CircuitBreaker({
failureThreshold: 10, // More tolerant for non-critical
resetTimeout: 10000, // Quick retry
}),
searchService: new CircuitBreaker({
failureThreshold: 5,
resetTimeout: 15000,
}),
};
// Usage with React Query
import { useQuery } from '@tanstack/react-query';
export function useUserProfile(userId: string) {
return useQuery({
queryKey: ['user', userId],
queryFn: () =>
circuits.userService.execute(
() => fetchUserProfile(userId),
() => getCachedUserProfile(userId) // Fallback to cache
),
retry: (failureCount, error) => {
// Don't retry if circuit is open
if (error instanceof CircuitOpenError) return false;
return failureCount < 3;
},
});
}
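The `getCachedUserProfile` fallback above is not defined in this post. One way it could work is a small "last known good" cache that is filled on every successful fetch and read only when the circuit is open. The sketch below assumes that approach; `LastKnownGood`, `remember`, and `recall` are hypothetical names, and the one-minute maximum age is an arbitrary choice.

```typescript
// Hypothetical "last known good" cache for circuit-breaker fallbacks.
interface CachedEntry<T> {
  value: T;
  storedAt: number;
}

class LastKnownGood<T> {
  private entries = new Map<string, CachedEntry<T>>();
  private maxAgeMs: number;
  private now: () => number;

  constructor(maxAgeMs: number, now: () => number = () => Date.now()) {
    this.maxAgeMs = maxAgeMs;
    this.now = now;
  }

  // Call after every successful fetch; returns the value for easy chaining.
  remember(key: string, value: T): T {
    this.entries.set(key, { value, storedAt: this.now() });
    return value;
  }

  // Throws when nothing usable is cached, so the caller still sees a failure
  // instead of silently rendering nothing.
  recall(key: string): T {
    const entry = this.entries.get(key);
    if (!entry || this.now() - entry.storedAt > this.maxAgeMs) {
      throw new Error(`No fresh cached value for ${key}`);
    }
    return entry.value;
  }
}

// Demonstration with a fake clock.
let clock = 0;
const profiles = new LastKnownGood<{ id: string; displayName: string }>(60_000, () => clock);
profiles.remember('u1', { id: 'u1', displayName: 'Ada' });

clock = 30_000;
const hit = profiles.recall('u1').displayName; // still within the 60s window

clock = 120_000;
let expired = false;
try {
  profiles.recall('u1');
} catch {
  expired = true; // too old to trust, so the fallback itself fails
}
console.log(hit, expired); // Ada true
```

Bounding the age matters: a fallback that serves arbitrarily old data turns a visible outage into a silent correctness problem.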
Circuit Breaker UI Integration
// components/service-status.tsx
'use client';
import { useEffect, useState } from 'react';
import { circuits } from '@/lib/api/circuits';
type ServiceStatus = 'healthy' | 'degraded' | 'down';
export function ServiceStatusIndicator() {
const [statuses, setStatuses] = useState<Record<string, ServiceStatus>>({});
useEffect(() => {
// Subscribe to circuit state changes
const unsubscribes = Object.entries(circuits).map(([name, circuit]) =>
circuit.onStateChange((state) => {
setStatuses(prev => ({
...prev,
[name]: state === 'closed' ? 'healthy' :
state === 'half-open' ? 'degraded' : 'down',
}));
})
);
return () => unsubscribes.forEach(unsub => unsub());
}, []);
const overallStatus = Object.values(statuses).some(s => s === 'down')
? 'down'
: Object.values(statuses).some(s => s === 'degraded')
? 'degraded'
: 'healthy';
// Only show if there's an issue
if (overallStatus === 'healthy') return null;
return (
<div className={`
fixed bottom-4 right-4 p-4 rounded-lg shadow-lg
${overallStatus === 'down' ? 'bg-red-100' : 'bg-yellow-100'}
`}>
<p className="font-medium">
{overallStatus === 'down'
? 'Some features are currently unavailable'
: 'Some features may be slower than usual'}
</p>
</div>
);
}
Graceful Degradation Patterns
Feature Hierarchy
// lib/features/degradation.tsx (.tsx because the registry below contains JSX fallbacks)
type FeatureTier = 'critical' | 'important' | 'enhancement' | 'optional';
interface Feature {
name: string;
tier: FeatureTier;
dependencies: string[]; // Other features this depends on
fallback?: () => React.ReactNode;
}
const featureRegistry: Feature[] = [
// Critical: App doesn't work without these
{ name: 'auth', tier: 'critical', dependencies: [] },
{ name: 'navigation', tier: 'critical', dependencies: [] },
{ name: 'core-content', tier: 'critical', dependencies: ['auth'] },
// Important: Core functionality, but has fallbacks
{ name: 'search', tier: 'important', dependencies: [], fallback: () => <BasicSearch /> },
{ name: 'notifications', tier: 'important', dependencies: ['auth'], fallback: () => null },
{ name: 'real-time-updates', tier: 'important', dependencies: [], fallback: () => null },
// Enhancement: Nice to have
{ name: 'recommendations', tier: 'enhancement', dependencies: ['auth'] },
{ name: 'analytics-dashboard', tier: 'enhancement', dependencies: [] },
{ name: 'advanced-filters', tier: 'enhancement', dependencies: ['search'] },
// Optional: Can fail silently
{ name: 'chat-widget', tier: 'optional', dependencies: [] },
{ name: 'feedback-form', tier: 'optional', dependencies: [] },
{ name: 'social-share', tier: 'optional', dependencies: [] },
];
// Degradation levels based on system health
type DegradationLevel = 'full' | 'reduced' | 'minimal' | 'emergency';
function getEnabledFeatures(level: DegradationLevel): string[] {
const tiersByLevel: Record<DegradationLevel, FeatureTier[]> = {
full: ['critical', 'important', 'enhancement', 'optional'],
reduced: ['critical', 'important', 'enhancement'],
minimal: ['critical', 'important'],
emergency: ['critical'],
};
const enabledTiers = tiersByLevel[level];
return featureRegistry
.filter(f => enabledTiers.includes(f.tier))
.map(f => f.name);
}
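Note that `getEnabledFeatures` filters by tier only and never reads the `dependencies` field. If you want a feature to switch off when something it depends on is unavailable, one possible extension is sketched below. `resolveEnabled` and its `unavailable` parameter are assumptions made for this example, not part of the registry code above.

```typescript
type Tier = 'critical' | 'important' | 'enhancement' | 'optional';

interface FeatureSpec {
  name: string;
  tier: Tier;
  dependencies: string[];
}

// Returns features in the allowed tiers, minus anything explicitly unavailable,
// minus anything whose dependencies (direct or transitive) are not enabled.
function resolveEnabled(
  registry: FeatureSpec[],
  allowedTiers: Tier[],
  unavailable: string[] = []
): string[] {
  const enabled = new Set(
    registry
      .filter(f => allowedTiers.includes(f.tier) && !unavailable.includes(f.name))
      .map(f => f.name)
  );

  // Keep pruning until a full pass removes nothing (handles dependency chains).
  let changed = true;
  while (changed) {
    changed = false;
    for (const feature of registry) {
      const missingDependency = feature.dependencies.some(d => !enabled.has(d));
      if (enabled.has(feature.name) && missingDependency) {
        enabled.delete(feature.name);
        changed = true;
      }
    }
  }

  // Preserve registry order in the result.
  return registry.filter(f => enabled.has(f.name)).map(f => f.name);
}

const sampleRegistry: FeatureSpec[] = [
  { name: 'auth', tier: 'critical', dependencies: [] },
  { name: 'search', tier: 'important', dependencies: [] },
  { name: 'advanced-filters', tier: 'enhancement', dependencies: ['search'] },
  { name: 'recommendations', tier: 'enhancement', dependencies: ['auth'] },
];
const reducedTiers: Tier[] = ['critical', 'important', 'enhancement'];

// Search is down, so advanced-filters is dropped along with it.
const whenSearchIsDown = resolveEnabled(sampleRegistry, reducedTiers, ['search']);
console.log(whenSearchIsDown); // [ 'auth', 'recommendations' ]
```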
Degradation-Aware Components
// contexts/degradation-context.tsx
'use client';
import { createContext, useContext, useState, useEffect } from 'react';
interface DegradationContextType {
level: DegradationLevel;
isFeatureEnabled: (featureName: string) => boolean;
getFeatureFallback: (featureName: string) => React.ReactNode | null;
}
const DegradationContext = createContext<DegradationContextType | null>(null);
export function DegradationProvider({ children }: { children: React.ReactNode }) {
const [level, setLevel] = useState<DegradationLevel>('full');
useEffect(() => {
// Check system health periodically
const checkHealth = async () => {
try {
// fetch() has no `timeout` option; abort through a signal instead
const response = await fetch('/api/health', { signal: AbortSignal.timeout(5000) });
const health = await response.json();
if (health.status === 'critical') {
setLevel('emergency');
} else if (health.status === 'degraded') {
setLevel('minimal');
} else if (health.status === 'partial') {
setLevel('reduced');
} else {
setLevel('full');
}
} catch {
// If health check fails, degrade
setLevel('reduced');
}
};
checkHealth();
const interval = setInterval(checkHealth, 30000);
return () => clearInterval(interval);
}, []);
const enabledFeatures = getEnabledFeatures(level);
const isFeatureEnabled = (name: string) => enabledFeatures.includes(name);
const getFeatureFallback = (name: string) => {
const feature = featureRegistry.find(f => f.name === name);
return feature?.fallback?.() ?? null;
};
return (
<DegradationContext.Provider value={{ level, isFeatureEnabled, getFeatureFallback }}>
{children}
</DegradationContext.Provider>
);
}
export function useDegradation() {
const context = useContext(DegradationContext);
if (!context) throw new Error('useDegradation must be used within DegradationProvider');
return context;
}
// Hook for feature gating
export function useFeature(featureName: string) {
const { isFeatureEnabled, getFeatureFallback } = useDegradation();
return {
isEnabled: isFeatureEnabled(featureName),
fallback: getFeatureFallback(featureName),
};
}
// Component wrapper
export function Feature({
name,
children,
fallback,
}: {
name: string;
children: React.ReactNode;
fallback?: React.ReactNode;
}) {
const { isEnabled, fallback: registeredFallback } = useFeature(name);
if (!isEnabled) {
return fallback ?? registeredFallback ?? null;
}
return <>{children}</>;
}
Using Degradation in UI
// app/dashboard/page.tsx
import { Feature } from '@/contexts/degradation-context';
export default function Dashboard() {
return (
<div className="grid grid-cols-12 gap-6">
{/* Critical: Always shown */}
<div className="col-span-8">
<MainContent />
</div>
{/* Important: Falls back to simpler version */}
<div className="col-span-4">
<Feature name="notifications" fallback={<NotificationBadge />}>
<NotificationsFeed />
</Feature>
</div>
{/* Enhancement: Hidden when degraded */}
<Feature name="recommendations">
<div className="col-span-12">
<Recommendations />
</div>
</Feature>
{/* Enhancement: Hidden when degraded */}
<Feature name="analytics-dashboard">
<div className="col-span-12">
<AnalyticsDashboard />
</div>
</Feature>
{/* Optional: Fails silently */}
<Feature name="chat-widget">
<ChatWidget />
</Feature>
</div>
);
}
Fallback UI Patterns
Loading States That Don't Lie
// components/data-states.tsx
interface DataStateProps<T> {
data: T | undefined;
isLoading: boolean;
isError: boolean;
error?: Error;
// Different states
loadingComponent?: React.ReactNode;
errorComponent?: React.ReactNode | ((error: Error) => React.ReactNode);
emptyComponent?: React.ReactNode;
// Render function
children: (data: T) => React.ReactNode;
}
export function DataState<T>({
data,
isLoading,
isError,
error,
loadingComponent,
errorComponent,
emptyComponent,
children,
}: DataStateProps<T>) {
// Loading
if (isLoading && !data) {
return <>{loadingComponent ?? <DefaultLoading />}</>;
}
// Error with no data
if (isError && !data) {
const errorUI = typeof errorComponent === 'function'
? errorComponent(error!)
: errorComponent ?? <DefaultError error={error} />;
return <>{errorUI}</>;
}
// Empty state
if (!data || (Array.isArray(data) && data.length === 0)) {
return <>{emptyComponent ?? <DefaultEmpty />}</>;
}
// Has data (might be stale if isError is true)
return (
<div className="relative">
{children(data)}
{/* Show stale indicator if data exists but refresh failed */}
{isError && data && (
<StaleDataIndicator
message="Showing cached data. Refresh failed."
onRetry={() => {/* trigger refetch */}}
/>
)}
</div>
);
}
// Usage
function UserList() {
const { data, isLoading, isError, error, refetch } = useUsers();
return (
<DataState
data={data}
isLoading={isLoading}
isError={isError}
error={error}
loadingComponent={<UserListSkeleton />}
errorComponent={(err) => (
<UserListError error={err} onRetry={refetch} />
)}
emptyComponent={<NoUsersFound />}
>
{(users) => (
<ul>
{users.map(user => <UserCard key={user.id} user={user} />)}
</ul>
)}
</DataState>
);
}
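The branching inside `DataState` is easier to test when it is pulled out into a pure function. The sketch below mirrors the order of checks in the component above; `resolveViewState`, `ViewState`, and `QuerySnapshot` are names invented for this example.

```typescript
type ViewState = 'loading' | 'error' | 'empty' | 'stale' | 'ready';

interface QuerySnapshot<T> {
  data: T | undefined;
  isLoading: boolean;
  isError: boolean;
}

// Same decision order as the DataState component: data on screen always wins
// over loading and error flags, which only matter when there is nothing to show.
function resolveViewState<T>(query: QuerySnapshot<T>): ViewState {
  const hasData = query.data != null;
  if (!hasData) {
    if (query.isLoading) return 'loading';
    if (query.isError) return 'error';
    return 'empty';
  }
  if (Array.isArray(query.data) && query.data.length === 0) return 'empty';
  return query.isError ? 'stale' : 'ready';
}

const snapshots: QuerySnapshot<number[]>[] = [
  { data: undefined, isLoading: true, isError: false },  // first load
  { data: undefined, isLoading: false, isError: true },  // first load failed
  { data: [], isLoading: false, isError: false },        // loaded, nothing there
  { data: [1, 2], isLoading: false, isError: true },     // refresh failed, old data kept
  { data: [1, 2], isLoading: false, isError: false },    // normal
  { data: [1, 2], isLoading: true, isError: false },     // background refetch
];
const viewStates = snapshots.map(s => resolveViewState(s));
console.log(viewStates);
// [ 'loading', 'error', 'empty', 'stale', 'ready', 'ready' ]
```

The last case is the one that prevents content from flashing back to a skeleton during a background refresh.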
Skeleton Screens That Match Reality
// components/skeletons/user-card-skeleton.tsx
// Skeletons should match the actual component layout
export function UserCardSkeleton() {
return (
<div className="flex items-center p-4 border rounded-lg animate-pulse">
{/* Avatar placeholder - same size as real avatar */}
<div className="w-12 h-12 rounded-full bg-gray-200" />
<div className="ml-4 flex-1">
{/* Name placeholder - approximate width */}
<div className="h-4 w-32 bg-gray-200 rounded" />
{/* Email placeholder - different width */}
<div className="h-3 w-48 bg-gray-200 rounded mt-2" />
</div>
{/* Action button placeholder */}
<div className="w-20 h-8 bg-gray-200 rounded" />
</div>
);
}
// Generate skeleton list matching expected count
export function UserListSkeleton({ count = 5 }: { count?: number }) {
return (
<div className="space-y-3">
{Array.from({ length: count }).map((_, i) => (
<UserCardSkeleton key={i} />
))}
</div>
);
}
// Skeleton that knows about screen size
export function ResponsiveGridSkeleton() {
// Match actual grid layout
return (
<div className="grid grid-cols-1 md:grid-cols-2 lg:grid-cols-3 gap-4">
{Array.from({ length: 6 }).map((_, i) => (
<CardSkeleton key={i} />
))}
</div>
);
}
Progressive Enhancement for Critical Actions
// components/checkout-button.tsx
'use client';
import { useState } from 'react';
interface CheckoutButtonProps {
cartId: string;
amount: number;
}
export function CheckoutButton({ cartId, amount }: CheckoutButtonProps) {
  // mutateAsync returns a promise that rejects on failure. Plain mutate() returns
  // nothing, so a try/catch around it would never run.
  const { mutateAsync, isPending } = useCheckout();
  const [showFallback, setShowFallback] = useState(false);
  const handleCheckout = async () => {
    try {
      await mutateAsync({ cartId });
    } catch {
      // If checkout fails, show fallback options
      setShowFallback(true);
    }
  };
if (showFallback) {
return (
<CheckoutFallback
cartId={cartId}
amount={amount}
onRetry={() => {
setShowFallback(false);
handleCheckout();
}}
/>
);
}
return (
<button
onClick={handleCheckout}
disabled={isPending}
className="w-full py-3 bg-blue-600 text-white rounded-lg disabled:opacity-50"
>
{isPending ? 'Processing...' : `Pay ${formatCurrency(amount)}`}
</button>
);
}
function CheckoutFallback({
cartId,
amount,
onRetry,
}: {
cartId: string;
amount: number;
onRetry: () => void;
}) {
return (
<div className="p-4 border rounded-lg space-y-4">
<p className="text-red-600">
We're having trouble processing your payment.
</p>
<div className="space-y-2">
<button
onClick={onRetry}
className="w-full py-2 bg-blue-600 text-white rounded"
>
Try again
</button>
<a
href={`/checkout/manual?cart=${cartId}`}
className="block w-full py-2 text-center border rounded"
>
Use manual checkout
</a>
<button
onClick={() => saveCartForLater(cartId)}
className="w-full py-2 text-gray-600"
>
Save cart for later
</button>
</div>
<p className="text-sm text-gray-500">
Your cart has been saved. You can also{' '}
<a href="/contact" className="text-blue-600">contact support</a>.
</p>
</div>
);
}
Retry Strategies
Smart Retry Logic
// lib/retry.ts
interface RetryOptions {
maxAttempts: number;
baseDelay: number;
maxDelay: number;
backoffMultiplier: number;
retryCondition?: (error: Error, attempt: number) => boolean;
onRetry?: (error: Error, attempt: number, delay: number) => void;
}
const defaultOptions: RetryOptions = {
maxAttempts: 3,
baseDelay: 1000,
maxDelay: 30000,
backoffMultiplier: 2,
};
export async function withRetry<T>(
fn: () => Promise<T>,
options: Partial<RetryOptions> = {}
): Promise<T> {
const opts = { ...defaultOptions, ...options };
let lastError: Error;
let attempt = 0;
while (attempt < opts.maxAttempts) {
try {
return await fn();
} catch (error) {
lastError = error instanceof Error ? error : new Error(String(error));
attempt++;
// Check if we should retry
if (attempt >= opts.maxAttempts) break;
if (opts.retryCondition && !opts.retryCondition(lastError, attempt)) break;
if (!isRetryable(lastError)) break;
// Calculate delay with exponential backoff + jitter
const delay = Math.min(
opts.baseDelay * Math.pow(opts.backoffMultiplier, attempt - 1),
opts.maxDelay
);
const jitter = delay * 0.2 * Math.random();
const totalDelay = delay + jitter;
opts.onRetry?.(lastError, attempt, totalDelay);
await sleep(totalDelay);
}
}
throw lastError!;
}
function isRetryable(error: Error): boolean {
// Don't retry client errors (4xx except 429)
if ('status' in error) {
const status = (error as any).status;
if (status >= 400 && status < 500 && status !== 429) {
return false;
}
}
// Don't retry validation errors
if (error.name === 'ValidationError') return false;
// Don't retry auth errors
if (error.name === 'AuthenticationError') return false;
// Retry everything else
return true;
}
function sleep(ms: number): Promise<void> {
return new Promise(resolve => setTimeout(resolve, ms));
}
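To see what delays the backoff formula in `withRetry` actually produces, here is the same calculation pulled out into a pure function with an injectable random source. `backoffDelay` and `jitterRatio` are names introduced for this sketch; the 0.2 ratio matches the hard-coded jitter above.

```typescript
interface BackoffOptions {
  baseDelay: number;
  maxDelay: number;
  backoffMultiplier: number;
  jitterRatio: number; // fraction of the delay added as random jitter
}

// `attempt` is 1-based: the wait before retry number `attempt`.
function backoffDelay(
  attempt: number,
  opts: BackoffOptions,
  random: () => number = Math.random
): number {
  const exponential = opts.baseDelay * Math.pow(opts.backoffMultiplier, attempt - 1);
  const capped = Math.min(exponential, opts.maxDelay);
  return capped + capped * opts.jitterRatio * random();
}

const backoffDefaults: BackoffOptions = {
  baseDelay: 1000,
  maxDelay: 30000,
  backoffMultiplier: 2,
  jitterRatio: 0.2,
};

// With jitter forced to zero the schedule is the bare exponential curve,
// capped at maxDelay from the sixth attempt on.
const schedule = [1, 2, 3, 4, 5, 6].map(n => backoffDelay(n, backoffDefaults, () => 0));
console.log(schedule); // [ 1000, 2000, 4000, 8000, 16000, 30000 ]

// With real jitter each delay lands in [capped, capped * 1.2).
const jittered = backoffDelay(3, backoffDefaults);
console.log(jittered >= 4000 && jittered < 4800); // true
```

With the default `maxAttempts: 3`, only the first two delays are ever used, so the worst case adds roughly 3 to 3.6 seconds of waiting before the final error surfaces. The jitter spreads retries from many clients so they do not all hit a recovering service at the same instant.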
React Query with Smart Retry
// lib/api/query-client.ts
import { QueryClient, useQuery, keepPreviousData } from '@tanstack/react-query';
export const queryClient = new QueryClient({
defaultOptions: {
queries: {
retry: (failureCount, error) => {
// Don't retry on auth errors
if ((error as any)?.status === 401) return false;
// Don't retry on not found
if ((error as any)?.status === 404) return false;
// Don't retry on validation errors
if ((error as any)?.status === 400) return false;
// Retry up to 3 times for other errors
return failureCount < 3;
},
retryDelay: (attemptIndex) => {
// Exponential backoff with jitter
const delay = Math.min(1000 * 2 ** attemptIndex, 30000);
return delay + delay * 0.2 * Math.random();
},
staleTime: 5 * 60 * 1000, // 5 minutes
gcTime: 30 * 60 * 1000, // 30 minutes (formerly cacheTime)
},
mutations: {
retry: (failureCount, error) => {
// Be more conservative with mutations
if ((error as any)?.status >= 400) return false;
return failureCount < 2;
},
},
},
});
// Per-query retry customization
function useUserProfile(userId: string) {
return useQuery({
queryKey: ['user', userId],
queryFn: () => fetchUserProfile(userId),
retry: 3,
retryDelay: attemptIndex => Math.min(1000 * 2 ** attemptIndex, 10000),
// Treat data as fresh for 10 minutes (no background refetch in that window)
staleTime: 10 * 60 * 1000,
// Keep showing the previous result while a new query key (e.g. another userId) loads
placeholderData: keepPreviousData,
});
}
Offline Support
Basic Offline Detection
// hooks/use-online-status.ts
import { useSyncExternalStore } from 'react';
function subscribe(callback: () => void) {
window.addEventListener('online', callback);
window.addEventListener('offline', callback);
return () => {
window.removeEventListener('online', callback);
window.removeEventListener('offline', callback);
};
}
function getSnapshot() {
return navigator.onLine;
}
function getServerSnapshot() {
return true; // Assume online for SSR
}
export function useOnlineStatus() {
return useSyncExternalStore(subscribe, getSnapshot, getServerSnapshot);
}
// Usage with UI indicator
export function OnlineStatusIndicator() {
const isOnline = useOnlineStatus();
if (isOnline) return null;
return (
<div className="fixed top-0 inset-x-0 bg-yellow-500 text-white text-center py-2 z-50">
You're offline. Some features may be unavailable.
</div>
);
}
Offline-First Mutations
// hooks/use-offline-mutation.ts
import { useMutation, useQueryClient } from '@tanstack/react-query';
import { useOnlineStatus } from './use-online-status';
interface OfflineMutationOptions<T, V> {
mutationFn: (variables: V) => Promise<T>;
onMutate: (variables: V) => { optimisticUpdate: unknown; rollback: () => void };
queueKey: string;
}
export function useOfflineMutation<T, V>({
mutationFn,
onMutate,
queueKey,
}: OfflineMutationOptions<T, V>) {
const queryClient = useQueryClient();
const isOnline = useOnlineStatus();
return useMutation({
mutationFn: async (variables: V) => {
if (!isOnline) {
// Queue for later
queueMutation(queueKey, variables);
throw new OfflineError('Queued for when online');
}
return mutationFn(variables);
},
onMutate: async (variables) => {
// Apply optimistic update regardless of online status
const { optimisticUpdate, rollback } = onMutate(variables);
return { rollback };
},
onError: (error, variables, context) => {
if (error instanceof OfflineError) {
// Don't rollback - keep optimistic update until online
return;
}
// Rollback on real errors
context?.rollback();
},
onSettled: () => {
// Process queue when online
if (isOnline) {
processQueue(queueKey, mutationFn);
}
},
});
}
// Queue management
const mutationQueue = new Map<string, unknown[]>();
function queueMutation(key: string, mutation: unknown) {
const queue = mutationQueue.get(key) ?? [];
queue.push(mutation);
mutationQueue.set(key, queue);
// Persist to localStorage for page refreshes
localStorage.setItem(`mutation-queue:${key}`, JSON.stringify(queue));
}
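async function processQueue<T, V>(
  key: string,
  mutationFn: (v: V) => Promise<T>
) {
  const queue = mutationQueue.get(key) ?? [];
  while (queue.length > 0) {
    try {
      await mutationFn(queue[0] as V);
      queue.shift(); // Remove only after the server accepted it
    } catch (error) {
      console.error('Failed to process queued mutation:', error);
      // Stop here: the failed mutation and everything after it stay queued
      break;
    }
  }
  if (queue.length === 0) {
    mutationQueue.delete(key);
    localStorage.removeItem(`mutation-queue:${key}`);
  } else {
    // Persist what is left so a later attempt can pick it up
    mutationQueue.set(key, queue);
    localStorage.setItem(`mutation-queue:${key}`, JSON.stringify(queue));
  }
}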
class OfflineError extends Error {
constructor(message: string) {
super(message);
this.name = 'OfflineError';
}
}
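The queue above is written to `localStorage` but never read back, so queued mutations would still be lost on a page refresh. A minimal sketch of the missing rehydration step follows. `loadPersistedQueue`, `KeyValueStore`, and `createMemoryStore` are hypothetical names; the in-memory store exists only so the example runs outside a browser.

```typescript
// Minimal subset of the Web Storage API; window.localStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

const QUEUE_PREFIX = 'mutation-queue:'; // same prefix the queue code above writes

// Reads a persisted queue, tolerating missing or corrupt entries.
function loadPersistedQueue(store: KeyValueStore, key: string): unknown[] {
  const raw = store.getItem(QUEUE_PREFIX + key);
  if (raw === null) return [];
  try {
    const parsed: unknown = JSON.parse(raw);
    return Array.isArray(parsed) ? parsed : [];
  } catch {
    // Corrupt entry: drop it rather than crash during startup
    store.removeItem(QUEUE_PREFIX + key);
    return [];
  }
}

// In-memory stand-in for localStorage, used only for this demonstration.
function createMemoryStore(): KeyValueStore {
  const map = new Map<string, string>();
  return {
    getItem: k => map.get(k) ?? null,
    setItem: (k, v) => { map.set(k, v); },
    removeItem: k => { map.delete(k); },
  };
}

const demoStore = createMemoryStore();
demoStore.setItem('mutation-queue:todos', JSON.stringify([{ id: 1 }, { id: 2 }]));
demoStore.setItem('mutation-queue:broken', '{not json');

const restored = loadPersistedQueue(demoStore, 'todos');
const fromBroken = loadPersistedQueue(demoStore, 'broken');
const fromMissing = loadPersistedQueue(demoStore, 'never-written');
console.log(restored.length, fromBroken.length, fromMissing.length); // 2 0 0
console.log(demoStore.getItem('mutation-queue:broken')); // null (corrupt entry removed)
```

In the browser this would run once at startup, for example `mutationQueue.set(key, loadPersistedQueue(localStorage, key))`, before the first call to `processQueue`.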
Monitoring and Learning from Failures
Client-Side Error Tracking
// lib/error-tracking.ts
interface ErrorContext {
userId?: string;
sessionId?: string;
route?: string;
component?: string;
action?: string;
metadata?: Record<string, unknown>;
}
interface ErrorReport {
error: {
message: string;
name: string;
stack?: string;
};
context: ErrorContext;
browser: {
userAgent: string;
language: string;
online: boolean;
memory?: number;
};
timestamp: number;
}
class ErrorTracker {
private queue: ErrorReport[] = [];
private flushInterval: number = 10000;
private maxQueueSize: number = 50;
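  constructor() {
    // Skip setup during server-side rendering, where window and document do not exist
    if (typeof window === 'undefined') return;
    // Periodic flush
    setInterval(() => this.flush(), this.flushInterval);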
// Flush on page hide
document.addEventListener('visibilitychange', () => {
if (document.visibilityState === 'hidden') {
this.flush();
}
});
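    // Global error handler
    window.addEventListener('error', (event) => {
      // event.error is null for cross-origin script errors, so fall back to the message
      this.capture(event.error ?? new Error(event.message), { component: 'window' });
    });
    // Unhandled promise rejections
    window.addEventListener('unhandledrejection', (event) => {
      const reason = event.reason;
      this.capture(
        // Keep the original Error (and its stack) when there is one
        reason instanceof Error ? reason : new Error(String(reason ?? 'Unhandled rejection')),
        { component: 'promise' }
      );
    });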
}
capture(error: Error, context: Partial<ErrorContext> = {}) {
const report: ErrorReport = {
error: {
message: error.message,
name: error.name,
stack: error.stack,
},
context: {
userId: getCurrentUserId(),
sessionId: getSessionId(),
route: window.location.pathname,
...context,
},
browser: {
userAgent: navigator.userAgent,
language: navigator.language,
online: navigator.onLine,
memory: (performance as any).memory?.usedJSHeapSize,
},
timestamp: Date.now(),
};
this.queue.push(report);
// Flush if queue is full
if (this.queue.length >= this.maxQueueSize) {
this.flush();
}
}
private async flush() {
if (this.queue.length === 0) return;
const reports = [...this.queue];
this.queue = [];
try {
// Use sendBeacon for reliability on page unload
const blob = new Blob([JSON.stringify(reports)], {
type: 'application/json',
});
if (navigator.sendBeacon) {
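        // sendBeacon returns false instead of throwing when the payload cannot be queued
        if (!navigator.sendBeacon('/api/errors', blob)) {
          throw new Error('sendBeacon could not queue the error reports');
        }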
} else {
await fetch('/api/errors', {
method: 'POST',
body: JSON.stringify(reports),
keepalive: true,
});
}
} catch {
// Re-queue on failure (with limit)
this.queue.unshift(...reports.slice(0, 10));
}
}
}
export const errorTracker = new ErrorTracker();
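// Usage with Error Boundary (in a .tsx file): report through the existing onError prop.
// The ErrorBoundary props have no `name`, so the wrapper takes it instead.
export function TrackedErrorBoundary({
  name,
  children,
}: {
  name: string;
  children: React.ReactNode;
}) {
  return (
    <ErrorBoundary
      onError={(error, errorInfo) =>
        errorTracker.capture(error, {
          component: name,
          metadata: { componentStack: errorInfo.componentStack },
        })
      }
    >
      {children}
    </ErrorBoundary>
  );
}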
Failure Analytics Dashboard
// Tracking patterns for failure analysis
// Track all fetch failures
const originalFetch = window.fetch;
window.fetch = async function (...args) {
const start = performance.now();
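// fetch accepts a string, a URL, or a Request; only Request has a .url property
const input = args[0];
const url = typeof input === 'string' ? input : input instanceof URL ? input.href : input.url;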
try {
const response = await originalFetch.apply(this, args);
// Track failed responses
if (!response.ok) {
trackApiFailure({
url,
status: response.status,
duration: performance.now() - start,
});
}
return response;
} catch (error) {
// Track network failures
trackApiFailure({
url,
status: 0, // Network error
duration: performance.now() - start,
error: error instanceof Error ? error.message : 'Unknown',
});
throw error;
}
};
// Aggregate metrics
interface FailureMetrics {
endpoint: string;
failureRate: number;
avgLatency: number;
errorsByType: Record<number, number>;
lastFailure: number;
}
// Use this data to:
// 1. Identify problematic endpoints
// 2. Trigger circuit breakers proactively
// 3. Alert when failure rate exceeds threshold
// 4. A/B test retry strategies
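As a rough illustration of how the `FailureMetrics` shape could be computed, the sketch below aggregates raw call events per endpoint. It assumes that successful calls are recorded too (the fetch wrapper above only tracks failures, so it would need extending to compute a rate). `ApiCallEvent`, `summarizeCalls`, and `endpointsOverThreshold` are hypothetical names.

```typescript
interface ApiCallEvent {
  endpoint: string;
  statusCode: number; // 0 means network error, as in the wrapper above
  durationMs: number;
  at: number;         // timestamp in ms
}

interface EndpointMetrics {
  endpoint: string;
  failureRate: number;
  avgLatency: number;
  errorsByType: Record<number, number>;
  lastFailure: number | null;
}

function summarizeCalls(calls: ApiCallEvent[]): EndpointMetrics[] {
  // Group events by endpoint
  const byEndpoint = new Map<string, ApiCallEvent[]>();
  for (const call of calls) {
    const bucket = byEndpoint.get(call.endpoint) ?? [];
    bucket.push(call);
    byEndpoint.set(call.endpoint, bucket);
  }

  const summary: EndpointMetrics[] = [];
  byEndpoint.forEach((group, endpoint) => {
    const failed = group.filter(c => c.statusCode === 0 || c.statusCode >= 400);
    const errorsByType: Record<number, number> = {};
    for (const f of failed) {
      errorsByType[f.statusCode] = (errorsByType[f.statusCode] ?? 0) + 1;
    }
    summary.push({
      endpoint,
      failureRate: failed.length / group.length,
      avgLatency: group.reduce((sum, c) => sum + c.durationMs, 0) / group.length,
      errorsByType,
      lastFailure: failed.length > 0 ? Math.max(...failed.map(f => f.at)) : null,
    });
  });
  return summary;
}

// Endpoints whose failure rate is above the alerting threshold.
function endpointsOverThreshold(metrics: EndpointMetrics[], threshold: number): string[] {
  return metrics.filter(m => m.failureRate > threshold).map(m => m.endpoint);
}

const sampleCalls: ApiCallEvent[] = [
  { endpoint: '/api/orders', statusCode: 200, durationMs: 100, at: 1 },
  { endpoint: '/api/orders', statusCode: 500, durationMs: 300, at: 2 },
  { endpoint: '/api/orders', statusCode: 500, durationMs: 200, at: 3 },
  { endpoint: '/api/orders', statusCode: 0, durationMs: 400, at: 4 },
  { endpoint: '/api/users', statusCode: 200, durationMs: 80, at: 5 },
  { endpoint: '/api/users', statusCode: 200, durationMs: 120, at: 6 },
];
const endpointSummary = summarizeCalls(sampleCalls);
console.log(endpointSummary[0]);
// { endpoint: '/api/orders', failureRate: 0.75, avgLatency: 250,
//   errorsByType: { '0': 1, '500': 2 }, lastFailure: 4 }
console.log(endpointsOverThreshold(endpointSummary, 0.5)); // [ '/api/orders' ]
```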
Quick Reference
Failure Handling Checklist
## For Every Data-Fetching Component
### Loading States
□ Skeleton matches actual component layout
□ Loading state is shown immediately (no flash)
□ Long-running operations show progress
### Error States
□ Error boundary catches render errors
□ Fetch errors show actionable message
□ Retry option is available
□ Fallback content is meaningful
### Empty States
□ Distinguish between "loading" and "empty"
□ Empty state guides user to action
□ No confusing "No results" during load
### Stale Data
□ Show stale data with indicator vs. blocking
□ Background refresh doesn't clear content
□ Cache invalidation is intentional
## For Critical User Actions
### Mutations
□ Optimistic update where appropriate
□ Clear feedback during processing
□ Rollback on failure
□ Retry mechanism for transient failures
### Forms
□ Validation before submission
□ Preserve input on failure
□ Clear error messages per field
□ Alternative submission methods
### Payments/Sensitive
□ Never lose user data on failure
□ Multiple fallback paths
□ Customer support escape hatch
□ Transaction status is always queryable
Error Message Guidelines
┌─────────────────────────────────────────────────────────────────────────────┐
│ ERROR MESSAGE PRINCIPLES │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ❌ Bad: "Error: ECONNREFUSED" │
│ ✅ Good: "We couldn't connect to our servers. Please check your internet │
│ connection and try again." │
│ │
│ ❌ Bad: "Something went wrong" │
│ ✅ Good: "We couldn't load your orders. [Try again] or [Contact support]" │
│ │
│ ❌ Bad: "Error 500" │
│ ✅ Good: "Our servers are having trouble. Your data is safe. │
│ We're working on it." │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ Every error message should: │
│ 1. Explain what happened (in human terms) │
│ 2. Reassure (their data/action isn't lost, if true) │
│ 3. Guide (what they can do next) │
│ 4. Offer alternatives (different paths to goal) │
│ │
│ Never: │
│ • Show stack traces to users │
│ • Use technical jargon │
│ • Blame the user │
│ • Leave them with no options │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
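One way to apply these principles consistently is to translate raw failures into user-facing messages in a single place. The sketch below assumes a simple shape for the failure (`FailureInfo`) and the result (`UserFacingError`); both types, the function name, and the exact wording are illustrative and should be adapted to your product's voice.

```typescript
type ErrorAction = 'retry' | 'contact-support' | 'sign-in';

interface UserFacingError {
  message: string;
  actions: ErrorAction[];
}

interface FailureInfo {
  resourceLabel: string; // human wording, e.g. "your orders"
  statusCode?: number;   // HTTP status, if a response arrived
  offline?: boolean;     // true when the browser reports no connectivity
}

// Maps a technical failure to a message that explains, reassures, and guides.
// No status codes, error names, or stack traces reach the user.
function toUserFacingError(failure: FailureInfo): UserFacingError {
  if (failure.offline) {
    return {
      message: `We couldn't load ${failure.resourceLabel} because you appear to be offline. Check your connection and try again.`,
      actions: ['retry'],
    };
  }
  const code = failure.statusCode ?? 0;
  if (code === 401 || code === 403) {
    return {
      message: `Please sign in again to see ${failure.resourceLabel}.`,
      actions: ['sign-in'],
    };
  }
  if (code === 429) {
    return {
      message: `We're receiving a lot of requests right now. Please wait a moment and try again.`,
      actions: ['retry'],
    };
  }
  if (code >= 500) {
    return {
      message: `Our servers are having trouble. Your data is safe. We're working on it.`,
      actions: ['retry', 'contact-support'],
    };
  }
  // Network errors without a response and anything unexpected
  return {
    message: `We couldn't load ${failure.resourceLabel}.`,
    actions: ['retry', 'contact-support'],
  };
}

const serverDown = toUserFacingError({ resourceLabel: 'your orders', statusCode: 500 });
const offlineCase = toUserFacingError({ resourceLabel: 'your orders', offline: true });
console.log(serverDown.message);
console.log(offlineCase.actions); // [ 'retry' ]
```

Only claim "Your data is safe" where that is actually true for the failing operation, in line with principle 2 above.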
Closing Thoughts
Designing for failure isn't pessimism—it's engineering maturity. The best applications aren't the ones that never fail. They're the ones that fail gracefully, recover automatically, and keep users informed and productive.
The mental shift required:
- Assume failure, not success. Every API call can fail. Every third-party script can hang. Every user can lose connectivity. Design for this first.
- Degrade gracefully. Not all features are equally important. When things break, maintain core functionality while gracefully hiding broken parts.
- Fail visibly but helpfully. Users should know something is wrong, but they should also know what they can do about it. Every error is a UX opportunity.
- Recover automatically. Retry with backoff. Circuit breakers that reset. Queues that process when online. The system should heal itself when possible.
- Learn from failures. Track everything. Analyze patterns. Use failure data to improve. The goal isn't zero failures; it's continuous improvement in how you handle them.
The applications that earn user trust aren't the perfect ones. They're the ones that handle imperfection with grace.
Your users will forgive failures. They won't forgive failures that waste their time, lose their data, or leave them confused. Design accordingly.