
Designing for Failure, Not Against It

Circuit breakers, graceful degradation, and fallback UI patterns — how frontend architects need to think beyond the happy path


The Uncomfortable Truth

Your application will fail. Not might fail—will fail. Services go down. Networks flake. APIs return garbage. Third-party scripts hang. Databases hit capacity. CDNs have bad days.

The difference between applications that users trust and applications that users abandon isn't whether they fail—it's how they fail.

Most frontend code is written for the happy path: data loads, APIs respond, everything works. But production reality is messier. The best frontend architects I've worked with spend as much time designing for failure as they do designing for success.

This post is about shifting your mindset from "how do I prevent failure" to "how do I fail gracefully."


The Failure Spectrum

┌─────────────────────────────────────────────────────────────────────────────┐
│                    FAILURE SEVERITY SPECTRUM                                 │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  Catastrophic          Degraded              Minor              Hidden      │
│  ────────────────────────────────────────────────────────────────────────   │
│                                                                              │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐  │
│  │             │    │             │    │             │    │             │  │
│  │  White      │    │  Feature    │    │  Stale      │    │  Silent     │  │
│  │  Screen     │    │  Broken     │    │  Data       │    │  Retry      │  │
│  │  of Death   │    │             │    │             │    │             │  │
│  │             │    │             │    │             │    │             │  │
│  └─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘  │
│                                                                              │
│  User impact:        User impact:        User impact:        User impact:   │
│  Cannot use app      Cannot use          Slight delay,       None noticed   │
│  at all             specific feature     data updates                       │
│                                                                              │
│  User reaction:      User reaction:      User reaction:      User reaction: │
│  Leave, never        Frustrated,         Mildly annoyed,     "Works great!" │
│  come back           complain            accepts it                         │
│                                                                              │
│  Goal: Move failures from left to right                                     │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Failure Categories

┌─────────────────────────────────────────────────────────────────────────────┐
│                    TYPES OF FRONTEND FAILURES                                │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  Network Failures                                                            │
│  ├── Complete network loss                                                  │
│  ├── Intermittent connectivity                                              │
│  ├── High latency (slow but connected)                                      │
│  ├── DNS failures                                                           │
│  └── SSL/TLS errors                                                         │
│                                                                              │
│  API Failures                                                                │
│  ├── 5xx server errors                                                      │
│  ├── 4xx client errors (unexpected)                                         │
│  ├── Timeout (no response)                                                  │
│  ├── Malformed response (invalid JSON)                                      │
│  ├── Unexpected response shape                                              │
│  └── Rate limiting (429)                                                    │
│                                                                              │
│  Third-Party Failures                                                        │
│  ├── Analytics script hanging                                               │
│  ├── Chat widget not loading                                                │
│  ├── CDN serving stale/wrong assets                                         │
│  ├── Auth provider down                                                     │
│  └── Payment processor unavailable                                          │
│                                                                              │
│  Client-Side Failures                                                        │
│  ├── JavaScript error (crash)                                               │
│  ├── Memory exhaustion                                                      │
│  ├── Storage quota exceeded                                                 │
│  ├── Browser API not available                                              │
│  └── State corruption                                                       │
│                                                                              │
│  Data Failures                                                               │
│  ├── Stale cache                                                            │
│  ├── Inconsistent data                                                      │
│  ├── Missing required fields                                                │
│  ├── Type mismatches                                                        │
│  └── Referential integrity issues                                           │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
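Each category calls for a different response (retry, back off, show cached data, or surface an error), so it pays to know which one you're in before deciding. Below is a minimal sketch of a classifier for fetch-based calls. It assumes two things about your own API layer. First, it throws errors carrying a numeric status for non-2xx responses. Second, it throws errors named ValidationError when a response fails shape validation, which is the same name the retry helper later in this post checks. Everything else relies on standard fetch behavior.

```typescript
export type FailureKind =
  | 'network'          // fetch rejected before any response arrived
  | 'timeout'          // aborted by AbortSignal.timeout()
  | 'rate-limited'     // HTTP 429
  | 'server'           // HTTP 5xx
  | 'client'           // other HTTP 4xx
  | 'malformed'        // body was not valid JSON
  | 'unexpected-shape' // valid JSON, wrong structure
  | 'unknown';

// Read a numeric `status` (e.g. from an HTTP error your API layer throws)
function statusOf(error: unknown): number | undefined {
  if (typeof error === 'object' && error !== null && 'status' in error) {
    const { status } = error as { status: unknown };
    return typeof status === 'number' ? status : undefined;
  }
  return undefined;
}

function nameOf(error: unknown): string | undefined {
  if (typeof error === 'object' && error !== null && 'name' in error) {
    const { name } = error as { name: unknown };
    return typeof name === 'string' ? name : undefined;
  }
  return undefined;
}

export function classifyFailure(error: unknown): FailureKind {
  const status = statusOf(error);
  if (status !== undefined) {
    if (status === 429) return 'rate-limited';
    if (status >= 500) return 'server';
    if (status >= 400) return 'client';
  }
  // Assumed convention: response validation throws errors named 'ValidationError'
  if (nameOf(error) === 'ValidationError') return 'unexpected-shape';
  // response.json() rejects with a SyntaxError when the body isn't valid JSON
  if (error instanceof SyntaxError) return 'malformed';
  // AbortSignal.timeout() aborts with an error named 'TimeoutError'
  if (nameOf(error) === 'TimeoutError') return 'timeout';
  // fetch() rejects with a TypeError when the request never gets a response
  if (error instanceof TypeError) return 'network';
  return 'unknown';
}
```

Network, timeout, 5xx and 429 are usually worth retrying with backoff. Client errors and shape problems generally are not, and they point toward a fallback or an error message instead.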

Error Boundaries: Your First Line of Defense

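Error boundaries keep one failing component from taking down the entire app. They catch errors thrown during rendering, in lifecycle methods, and in constructors anywhere in the tree below them. They do not catch errors in event handlers, asynchronous code (timers, promise callbacks), server-side rendering, or the boundary itself, so those still need their own handling.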

Basic Error Boundary

// components/error-boundary.tsx
'use client';

import React, { Component, ReactNode } from 'react';

interface Props {
  children: ReactNode;
  fallback?: ReactNode;
  onError?: (error: Error, errorInfo: React.ErrorInfo) => void;
}

interface State {
  hasError: boolean;
  error: Error | null;
}

export class ErrorBoundary extends Component<Props, State> {
  constructor(props: Props) {
    super(props);
    this.state = { hasError: false, error: null };
  }

  static getDerivedStateFromError(error: Error): State {
    return { hasError: true, error };
  }

  componentDidCatch(error: Error, errorInfo: React.ErrorInfo) {
    // Log to error tracking service
    console.error('Error boundary caught:', error, errorInfo);
    this.props.onError?.(error, errorInfo);
  }

  render() {
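    if (this.state.hasError) {
      // Compare against undefined rather than using ??, so that
      // fallback={null} renders nothing instead of the default fallback
      if (this.props.fallback !== undefined) return this.props.fallback;
      return (
        <DefaultErrorFallback
          error={this.state.error}
          reset={() => this.setState({ hasError: false, error: null })}
        />
      );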
    }

    return this.props.children;
  }
}

function DefaultErrorFallback({
  error,
  reset,
}: {
  error: Error | null;
  reset: () => void;
}) {
  return (
    <div className="p-6 rounded-lg bg-red-50 border border-red-200">
      <h2 className="text-lg font-semibold text-red-800">Something went wrong</h2>
      <p className="mt-2 text-red-600">
        {error?.message || 'An unexpected error occurred'}
      </p>
      <button
        onClick={reset}
        className="mt-4 px-4 py-2 bg-red-600 text-white rounded hover:bg-red-700"
      >
        Try again
      </button>
    </div>
  );
}
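As written, the boundary can only recover through the Try again button in the default fallback. A custom fallback prop gets no reset callback. One common refinement, popularized by the react-error-boundary package, is a resetKeys prop: the boundary clears its error whenever any key changes, for example the current route. The comparison is a pure function. This sketch assumes a hypothetical resetKeys?: unknown[] prop that you would add to Props.

```typescript
// Decide whether an errored boundary should reset, given the previous and
// next values of a (hypothetical) resetKeys prop. Keys are compared with
// Object.is, the same equality React uses for hook dependency arrays.
export function resetKeysChanged(
  prev: readonly unknown[] = [],
  next: readonly unknown[] = []
): boolean {
  if (prev.length !== next.length) return true;
  return prev.some((key, i) => !Object.is(key, next[i]));
}

// Intended call site inside ErrorBoundary:
//
// componentDidUpdate(prevProps: Props) {
//   if (this.state.hasError &&
//       resetKeysChanged(prevProps.resetKeys, this.props.resetKeys)) {
//     this.setState({ hasError: false, error: null });
//   }
// }
```

Passing something like the current pathname as the only key means navigating away from a crashed widget's page automatically gives it a fresh start.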

Strategic Error Boundary Placement

// Don't just wrap the whole app - be strategic
// app/layout.tsx
import { ErrorBoundary } from '@/components/error-boundary';

export default function RootLayout({ children }: { children: React.ReactNode }) {
  return (
    <html lang="en">
      <body>
        {/* Critical: Header should always work */}
        <ErrorBoundary fallback={<MinimalHeader />}>
          <Header />
        </ErrorBoundary>

        {/* Main content can fail independently */}
        <ErrorBoundary fallback={<MainContentError />}>
          {children}
        </ErrorBoundary>

        {/* Non-critical: Footer can fail silently */}
        <ErrorBoundary fallback={null}>
          <Footer />
        </ErrorBoundary>

        {/* Third-party widgets should never crash the app */}
        <ErrorBoundary fallback={null}>
          <ChatWidget />
        </ErrorBoundary>
      </body>
    </html>
  );
}

// Page-level boundaries
// app/dashboard/page.tsx
import { ErrorBoundary } from '@/components/error-boundary';

export default function DashboardPage() {
  return (
    <div className="grid grid-cols-3 gap-4">
      {/* Each widget fails independently */}
      <ErrorBoundary fallback={<WidgetError title="Revenue" />}>
        <RevenueWidget />
      </ErrorBoundary>

      <ErrorBoundary fallback={<WidgetError title="Users" />}>
        <UsersWidget />
      </ErrorBoundary>

      <ErrorBoundary fallback={<WidgetError title="Orders" />}>
        <OrdersWidget />
      </ErrorBoundary>
    </div>
  );
}

Next.js App Router Error Handling

// app/dashboard/error.tsx
'use client';

import { useEffect } from 'react';

export default function DashboardError({
  error,
  reset,
}: {
  error: Error & { digest?: string };
  reset: () => void;
}) {
  useEffect(() => {
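    // Log to error tracking. In browsers, the global reportError() re-raises
    // the error as a window "error" event; swap in your SDK's capture call if needed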
    reportError(error);
  }, [error]);

  return (
    <div className="flex flex-col items-center justify-center min-h-[400px]">
      <h2 className="text-xl font-semibold">Dashboard temporarily unavailable</h2>
      <p className="mt-2 text-gray-600">
        We're having trouble loading your dashboard.
      </p>
      <div className="mt-6 space-x-4">
        <button
          onClick={reset}
          className="px-4 py-2 bg-blue-600 text-white rounded"
        >
          Try again
        </button>
        <a href="/" className="px-4 py-2 border rounded">
          Go home
        </a>
      </div>
      {process.env.NODE_ENV === 'development' && (
        <pre className="mt-4 p-4 bg-gray-100 rounded text-sm overflow-auto max-w-full">
          {error.message}
        </pre>
      )}
    </div>
  );
}

// app/dashboard/loading.tsx
// Provides instant feedback while data loads
export default function DashboardLoading() {
  return (
    <div className="grid grid-cols-3 gap-4">
      <WidgetSkeleton />
      <WidgetSkeleton />
      <WidgetSkeleton />
    </div>
  );
}

Circuit Breakers: Stop the Bleeding

Circuit breakers prevent cascading failures by stopping requests to failing services.
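The pattern is a small state machine with three states. In the closed state requests go through. In the open state the breaker fails fast. In the half-open state it lets probe requests through to test whether the service has recovered. Before the full class, here is the transition table on its own as a sketch. The event names are illustrative and not taken from any library.

```typescript
type CircuitState = 'closed' | 'open' | 'half-open';

// Illustrative names for the moments the breaker changes state
type CircuitEvent =
  | 'threshold-reached' // too many failures while closed
  | 'timeout-elapsed'   // cooldown over while open
  | 'probe-succeeded'   // enough half-open requests succeeded
  | 'probe-failed';     // a half-open request failed

const transitions: Record<CircuitState, Partial<Record<CircuitEvent, CircuitState>>> = {
  closed: { 'threshold-reached': 'open' },
  open: { 'timeout-elapsed': 'half-open' },
  'half-open': { 'probe-succeeded': 'closed', 'probe-failed': 'open' },
};

// Events that don't apply in the current state leave it unchanged
export function nextState(state: CircuitState, event: CircuitEvent): CircuitState {
  return transitions[state][event] ?? state;
}
```

Keeping the table this small is the point. Every path back to closed goes through half-open, so a recovering service is never hit with full traffic all at once.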

Frontend Circuit Breaker

// lib/circuit-breaker.ts

type CircuitState = 'closed' | 'open' | 'half-open';

interface CircuitBreakerOptions {
  failureThreshold: number;      // Consecutive failures before opening
  resetTimeout: number;          // ms to stay open before probing again
  monitorInterval: number;       // ms without failures after which the count resets
  halfOpenRequests: number;      // Successes needed in half-open before closing
}

interface CircuitBreakerState {
  state: CircuitState;
  failures: number;
  lastFailure: number | null;
  halfOpenSuccesses: number;
}

export class CircuitBreaker {
  private state: CircuitBreakerState;
  private options: CircuitBreakerOptions;
  private stateChangeCallbacks: ((state: CircuitState) => void)[] = [];

  constructor(options: Partial<CircuitBreakerOptions> = {}) {
    this.options = {
      failureThreshold: 5,
      resetTimeout: 30000,        // 30 seconds
      monitorInterval: 60000,     // 1 minute
      halfOpenRequests: 3,
      ...options,
    };

    this.state = {
      state: 'closed',
      failures: 0,
      lastFailure: null,
      halfOpenSuccesses: 0,
    };
  }

  async execute<T>(fn: () => Promise<T>, fallback?: () => T | Promise<T>): Promise<T> {
    // Check if circuit should transition
    this.checkStateTransition();

    if (this.state.state === 'open') {
      if (fallback) {
        return fallback();
      }
      throw new CircuitOpenError('Circuit breaker is open');
    }

    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();

      if (fallback && this.state.state === 'open') {
        return fallback();
      }

      throw error;
    }
  }

  private checkStateTransition() {
    const now = Date.now();

    if (this.state.state === 'open') {
      // Check if we should try again
      if (
        this.state.lastFailure &&
        now - this.state.lastFailure > this.options.resetTimeout
      ) {
        this.transitionTo('half-open');
      }
    }

    // Reset failure count if outside monitoring window
    if (
      this.state.lastFailure &&
      now - this.state.lastFailure > this.options.monitorInterval
    ) {
      this.state.failures = 0;
    }
  }

  private onSuccess() {
    if (this.state.state === 'half-open') {
      this.state.halfOpenSuccesses++;

      if (this.state.halfOpenSuccesses >= this.options.halfOpenRequests) {
        this.transitionTo('closed');
      }
    } else {
      this.state.failures = 0;
    }
  }

  private onFailure() {
    this.state.failures++;
    this.state.lastFailure = Date.now();

    if (this.state.state === 'half-open') {
      this.transitionTo('open');
    } else if (this.state.failures >= this.options.failureThreshold) {
      this.transitionTo('open');
    }
  }

  private transitionTo(newState: CircuitState) {
    if (this.state.state !== newState) {
      this.state.state = newState;
      this.state.halfOpenSuccesses = 0;

      if (newState === 'closed') {
        this.state.failures = 0;
      }

      this.stateChangeCallbacks.forEach(cb => cb(newState));
    }
  }

  onStateChange(callback: (state: CircuitState) => void) {
    this.stateChangeCallbacks.push(callback);
    return () => {
      this.stateChangeCallbacks = this.stateChangeCallbacks.filter(
        cb => cb !== callback
      );
    };
  }

  getState(): CircuitState {
    return this.state.state;
  }
}

export class CircuitOpenError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'CircuitOpenError';
  }
}

Circuit Breaker Per Service

// lib/api/circuits.ts
import { CircuitBreaker, CircuitOpenError } from '../circuit-breaker';

// Create a circuit breaker for each external dependency
export const circuits = {
  userService: new CircuitBreaker({
    failureThreshold: 5,
    resetTimeout: 30000,
  }),

  paymentService: new CircuitBreaker({
    failureThreshold: 3,      // More sensitive for payments
    resetTimeout: 60000,      // Longer cooldown
  }),

  analyticsService: new CircuitBreaker({
    failureThreshold: 10,     // More tolerant for non-critical
    resetTimeout: 10000,      // Quick retry
  }),

  searchService: new CircuitBreaker({
    failureThreshold: 5,
    resetTimeout: 15000,
  }),
};

// Usage with React Query (fetchUserProfile and getCachedUserProfile are your own API and cache helpers)
import { useQuery } from '@tanstack/react-query';

export function useUserProfile(userId: string) {
  return useQuery({
    queryKey: ['user', userId],
    queryFn: () =>
      circuits.userService.execute(
        () => fetchUserProfile(userId),
        () => getCachedUserProfile(userId) // Fallback to cache
      ),
    retry: (failureCount, error) => {
      // Don't retry if circuit is open
      if (error instanceof CircuitOpenError) return false;
      return failureCount < 3;
    },
  });
}
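There is one subtlety when wrapping fetch. It only rejects on network-level failures. A 503 response resolves normally, so circuit.execute(() => fetch(url)) would record it as a success and the breaker would never open. Below is a minimal sketch of a guard that turns non-2xx responses into thrown errors. It uses a hypothetical HttpError class whose status field the isRetryable check later in this post can read.

```typescript
// Hypothetical error type; the `status` field lets retry logic skip 4xx errors
export class HttpError extends Error {
  readonly status: number;
  constructor(status: number, statusText = '') {
    super(`HTTP ${status} ${statusText}`.trim());
    this.name = 'HttpError';
    this.status = status;
  }
}

// Structural type so the guard accepts a real Response or a test double
interface ResponseLike {
  ok: boolean; // true for 200-299
  status: number;
  statusText: string;
}

// Throw on non-2xx so the circuit breaker counts it as a failure
export function assertOk<R extends ResponseLike>(response: R): R {
  if (!response.ok) {
    throw new HttpError(response.status, response.statusText);
  }
  return response;
}

// Usage inside a fetcher (the URL is illustrative):
// async function fetchUserProfile(id: string) {
//   const res = assertOk(await fetch(`/api/users/${id}`));
//   return res.json();
// }
```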

Circuit Breaker UI Integration

// components/service-status.tsx
'use client';

import { useEffect, useState } from 'react';
import { circuits } from '@/lib/api/circuits';

type ServiceStatus = 'healthy' | 'degraded' | 'down';

export function ServiceStatusIndicator() {
  const [statuses, setStatuses] = useState<Record<string, ServiceStatus>>({});

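  useEffect(() => {
    const toStatus = (state: string): ServiceStatus =>
      state === 'closed' ? 'healthy' : state === 'half-open' ? 'degraded' : 'down';

    // onStateChange only fires on transitions, so seed with the current states
    setStatuses(
      Object.fromEntries(
        Object.entries(circuits).map(
          ([name, circuit]) => [name, toStatus(circuit.getState())] as const
        )
      )
    );

    // Subscribe to circuit state changes
    const unsubscribes = Object.entries(circuits).map(([name, circuit]) =>
      circuit.onStateChange((state) => {
        setStatuses(prev => ({ ...prev, [name]: toStatus(state) }));
      })
    );

    return () => unsubscribes.forEach(unsub => unsub());
  }, []);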

  const overallStatus = Object.values(statuses).some(s => s === 'down')
    ? 'down'
    : Object.values(statuses).some(s => s === 'degraded')
    ? 'degraded'
    : 'healthy';

  // Only show if there's an issue
  if (overallStatus === 'healthy') return null;

  return (
    <div className={`
      fixed bottom-4 right-4 p-4 rounded-lg shadow-lg
      ${overallStatus === 'down' ? 'bg-red-100' : 'bg-yellow-100'}
    `}>
      <p className="font-medium">
        {overallStatus === 'down'
          ? 'Some features are currently unavailable'
          : 'Some features may be slower than usual'}
      </p>
    </div>
  );
}

Graceful Degradation Patterns

Feature Hierarchy

// lib/features/degradation.tsx

type FeatureTier = 'critical' | 'important' | 'enhancement' | 'optional';

interface Feature {
  name: string;
  tier: FeatureTier;
  dependencies: string[];  // Other features this depends on
  fallback?: () => React.ReactNode;
}

export const featureRegistry: Feature[] = [
  // Critical: App doesn't work without these
  { name: 'auth', tier: 'critical', dependencies: [] },
  { name: 'navigation', tier: 'critical', dependencies: [] },
  { name: 'core-content', tier: 'critical', dependencies: ['auth'] },

  // Important: Core functionality, but has fallbacks
  { name: 'search', tier: 'important', dependencies: [], fallback: () => <BasicSearch /> },
  { name: 'notifications', tier: 'important', dependencies: ['auth'], fallback: () => null },
  { name: 'real-time-updates', tier: 'important', dependencies: [], fallback: () => null },

  // Enhancement: Nice to have
  { name: 'recommendations', tier: 'enhancement', dependencies: ['auth'] },
  { name: 'analytics-dashboard', tier: 'enhancement', dependencies: [] },
  { name: 'advanced-filters', tier: 'enhancement', dependencies: ['search'] },

  // Optional: Can fail silently
  { name: 'chat-widget', tier: 'optional', dependencies: [] },
  { name: 'feedback-form', tier: 'optional', dependencies: [] },
  { name: 'social-share', tier: 'optional', dependencies: [] },
];

// Degradation levels based on system health
export type DegradationLevel = 'full' | 'reduced' | 'minimal' | 'emergency';

export function getEnabledFeatures(level: DegradationLevel): string[] {
  const tiersByLevel: Record<DegradationLevel, FeatureTier[]> = {
    full: ['critical', 'important', 'enhancement', 'optional'],
    reduced: ['critical', 'important', 'enhancement'],
    minimal: ['critical', 'important'],
    emergency: ['critical'],
  };

  const enabledTiers = tiersByLevel[level];
  return featureRegistry
    .filter(f => enabledTiers.includes(f.tier))
    .map(f => f.name);
}

Degradation-Aware Components

// contexts/degradation-context.tsx
'use client';

import { createContext, useContext, useState, useEffect } from 'react';
import {
  featureRegistry,
  getEnabledFeatures,
  type DegradationLevel,
} from '@/lib/features/degradation';

interface DegradationContextType {
  level: DegradationLevel;
  isFeatureEnabled: (featureName: string) => boolean;
  getFeatureFallback: (featureName: string) => React.ReactNode | null;
}

const DegradationContext = createContext<DegradationContextType | null>(null);

export function DegradationProvider({ children }: { children: React.ReactNode }) {
  const [level, setLevel] = useState<DegradationLevel>('full');

  useEffect(() => {
    // Check system health periodically
    const checkHealth = async () => {
      try {
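        // fetch() has no timeout option; abort via AbortSignal.timeout() instead
        const response = await fetch('/api/health', { signal: AbortSignal.timeout(5000) });
        if (!response.ok) throw new Error(`Health check failed: ${response.status}`);
        const health = await response.json();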

        if (health.status === 'critical') {
          setLevel('emergency');
        } else if (health.status === 'degraded') {
          setLevel('minimal');
        } else if (health.status === 'partial') {
          setLevel('reduced');
        } else {
          setLevel('full');
        }
      } catch {
        // If health check fails, degrade
        setLevel('reduced');
      }
    };

    checkHealth();
    const interval = setInterval(checkHealth, 30000);
    return () => clearInterval(interval);
  }, []);

  const enabledFeatures = getEnabledFeatures(level);

  const isFeatureEnabled = (name: string) => enabledFeatures.includes(name);

  const getFeatureFallback = (name: string) => {
    const feature = featureRegistry.find(f => f.name === name);
    return feature?.fallback?.() ?? null;
  };

  return (
    <DegradationContext.Provider value={{ level, isFeatureEnabled, getFeatureFallback }}>
      {children}
    </DegradationContext.Provider>
  );
}

export function useDegradation() {
  const context = useContext(DegradationContext);
  if (!context) throw new Error('useDegradation must be used within DegradationProvider');
  return context;
}

// Hook for feature gating
export function useFeature(featureName: string) {
  const { isFeatureEnabled, getFeatureFallback } = useDegradation();

  return {
    isEnabled: isFeatureEnabled(featureName),
    fallback: getFeatureFallback(featureName),
  };
}

// Component wrapper
export function Feature({
  name,
  children,
  fallback,
}: {
  name: string;
  children: React.ReactNode;
  fallback?: React.ReactNode;
}) {
  const { isEnabled, fallback: registeredFallback } = useFeature(name);

  if (!isEnabled) {
    return <>{fallback ?? registeredFallback ?? null}</>;
  }

  return <>{children}</>;
}
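The checkHealth function in the provider trusts the response body. Any status other than the three it checks for selects 'full', and that includes a missing field or a body of the wrong shape. Below is a stricter sketch that validates the body and treats anything unrecognized like a failed check. It assumes the endpoint reports 'healthy' when all is well; that value does not appear in the original code.

```typescript
type DegradationLevel = 'full' | 'reduced' | 'minimal' | 'emergency';

// Map a parsed /api/health body to a degradation level, failing toward 'reduced'
export function levelFromHealth(body: unknown): DegradationLevel {
  const status =
    typeof body === 'object' && body !== null && 'status' in body
      ? (body as { status: unknown }).status
      : undefined;

  switch (status) {
    case 'healthy': // assumed "all clear" value
      return 'full';
    case 'partial':
      return 'reduced';
    case 'degraded':
      return 'minimal';
    case 'critical':
      return 'emergency';
    default:
      // Missing field, unknown value, or wrong shape: same as a failed check
      return 'reduced';
  }
}
```

Inside checkHealth, the if/else chain would collapse to setLevel(levelFromHealth(await response.json())). The existing catch still covers network errors and invalid JSON.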

Using Degradation in UI

// app/dashboard/page.tsx
import { Feature } from '@/contexts/degradation-context';

export default function Dashboard() {
  return (
    <div className="grid grid-cols-12 gap-6">
      {/* Critical: Always shown */}
      <div className="col-span-8">
        <MainContent />
      </div>

      {/* Important: Falls back to simpler version */}
      <div className="col-span-4">
        <Feature name="notifications" fallback={<NotificationBadge />}>
          <NotificationsFeed />
        </Feature>
      </div>

      {/* Enhancement: hidden at 'minimal' and 'emergency' */}
      <Feature name="recommendations">
        <div className="col-span-12">
          <Recommendations />
        </div>
      </Feature>

      {/* Enhancement: hidden at 'minimal' and 'emergency' */}
      <Feature name="analytics-dashboard">
        <div className="col-span-12">
          <AnalyticsDashboard />
        </div>
      </Feature>

      {/* Optional: first to go; hidden at every level below 'full' */}
      <Feature name="chat-widget">
        <ChatWidget />
      </Feature>
    </div>
  );
}
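getEnabledFeatures filters by tier alone, so the dependencies field in the registry is never consulted. In the registry above, every feature depends only on features of the same or a more critical tier, so nothing breaks today. Still, nothing stops a future enhancement from depending on an optional feature that has already been switched off. Here is a sketch of a dependency-aware variant. It disables a feature when any of its dependencies is disabled, unknown, or part of a cycle. The types are redeclared so that it runs on its own.

```typescript
type FeatureTier = 'critical' | 'important' | 'enhancement' | 'optional';
type DegradationLevel = 'full' | 'reduced' | 'minimal' | 'emergency';

interface FeatureSpec {
  name: string;
  tier: FeatureTier;
  dependencies: string[];
}

const tiersByLevel: Record<DegradationLevel, FeatureTier[]> = {
  full: ['critical', 'important', 'enhancement', 'optional'],
  reduced: ['critical', 'important', 'enhancement'],
  minimal: ['critical', 'important'],
  emergency: ['critical'],
};

export function resolveEnabledFeatures(
  features: FeatureSpec[],
  level: DegradationLevel
): Set<string> {
  const allowedTiers = new Set(tiersByLevel[level]);
  const byName = new Map(features.map((f) => [f.name, f] as const));
  const memo = new Map<string, boolean>();

  const isEnabled = (name: string, visiting: Set<string>): boolean => {
    const cached = memo.get(name);
    if (cached !== undefined) return cached;

    const feature = byName.get(name);
    // Unknown dependency or dependency cycle: fail closed
    if (!feature || visiting.has(name)) return false;

    visiting.add(name);
    const enabled =
      allowedTiers.has(feature.tier) &&
      feature.dependencies.every((dep) => isEnabled(dep, visiting));
    visiting.delete(name);

    memo.set(name, enabled);
    return enabled;
  };

  return new Set(features.filter((f) => isEnabled(f.name, new Set())).map((f) => f.name));
}
```

Failing closed on unknown names also catches typos in the dependencies arrays. A misspelled dependency hides the feature in every mode instead of silently ignoring the constraint.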

Fallback UI Patterns

Loading States That Don't Lie

// components/data-states.tsx

interface DataStateProps<T> {
  data: T | undefined;
  isLoading: boolean;
  isError: boolean;
  error?: Error | null;            // React Query reports "no error" as null
  // Different states
  loadingComponent?: React.ReactNode;
  errorComponent?: React.ReactNode | ((error: Error) => React.ReactNode);
  emptyComponent?: React.ReactNode;
  // Called from the stale-data banner (e.g. pass React Query's refetch)
  onRetry?: () => void;
  // Render function
  children: (data: T) => React.ReactNode;
}

export function DataState<T>({
  data,
  isLoading,
  isError,
  error,
  loadingComponent,
  errorComponent,
  emptyComponent,
  onRetry,
  children,
}: DataStateProps<T>) {
  // Loading
  if (isLoading && !data) {
    return <>{loadingComponent ?? <DefaultLoading />}</>;
  }

  // Error with no data
  if (isError && !data) {
    const errorUI = typeof errorComponent === 'function'
      ? errorComponent(error ?? new Error('Unknown error'))
      : errorComponent ?? <DefaultError error={error} />;
    return <>{errorUI}</>;
  }

  // Empty state
  if (!data || (Array.isArray(data) && data.length === 0)) {
    return <>{emptyComponent ?? <DefaultEmpty />}</>;
  }

  // Has data (might be stale if isError is true)
  return (
    <div className="relative">
      {children(data)}

      {/* Show stale indicator if data exists but refresh failed */}
      {isError && (
        <StaleDataIndicator
          message="Showing cached data. Refresh failed."
          onRetry={onRetry}
        />
      )}
    </div>
  );
}

// Usage
function UserList() {
  const { data, isLoading, isError, error, refetch } = useUsers();

  return (
    <DataState
      data={data}
      isLoading={isLoading}
      isError={isError}
      error={error}
      loadingComponent={<UserListSkeleton />}
      errorComponent={(err) => (
        <UserListError error={err} onRetry={refetch} />
      )}
      emptyComponent={<NoUsersFound />}
    >
      {(users) => (
        <ul>
          {users.map(user => <UserCard key={user.id} user={user} />)}
        </ul>
      )}
    </DataState>
  );
}
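The order of the checks in DataState is the whole design. Existing data always wins over a loading or error flag, and that is what keeps a failed background refresh from blanking the screen. Pulled out as a pure function, the precedence is easy to unit test. This is a sketch. Unlike the component, it treats only null and undefined as missing, so a legitimate 0 or empty string still renders.

```typescript
export type ViewState = 'loading' | 'error' | 'empty' | 'ready' | 'stale';

export function resolveViewState<T>(
  data: T | null | undefined,
  isLoading: boolean,
  isError: boolean
): ViewState {
  const missing = data === undefined || data === null;

  if (isLoading && missing) return 'loading'; // first load, nothing to show yet
  if (isError && missing) return 'error';     // failed with nothing cached
  if (missing || (Array.isArray(data) && data.length === 0)) return 'empty';
  // Data exists: show it, flagged as stale if the latest refresh failed
  return isError ? 'stale' : 'ready';
}
```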

Skeleton Screens That Match Reality

// components/skeletons/user-card-skeleton.tsx

// Skeletons should match the actual component layout
export function UserCardSkeleton() {
  return (
    <div className="flex items-center p-4 border rounded-lg animate-pulse">
      {/* Avatar placeholder - same size as real avatar */}
      <div className="w-12 h-12 rounded-full bg-gray-200" />

      <div className="ml-4 flex-1">
        {/* Name placeholder - approximate width */}
        <div className="h-4 w-32 bg-gray-200 rounded" />

        {/* Email placeholder - different width */}
        <div className="h-3 w-48 bg-gray-200 rounded mt-2" />
      </div>

      {/* Action button placeholder */}
      <div className="w-20 h-8 bg-gray-200 rounded" />
    </div>
  );
}

// Generate skeleton list matching expected count
export function UserListSkeleton({ count = 5 }: { count?: number }) {
  return (
    <div className="space-y-3">
      {Array.from({ length: count }).map((_, i) => (
        <UserCardSkeleton key={i} />
      ))}
    </div>
  );
}

// Skeleton that knows about screen size
export function ResponsiveGridSkeleton() {
  // Match actual grid layout
  return (
    <div className="grid grid-cols-1 md:grid-cols-2 lg:grid-cols-3 gap-4">
      {Array.from({ length: 6 }).map((_, i) => (
        <CardSkeleton key={i} />
      ))}
    </div>
  );
}

Progressive Enhancement for Critical Actions

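// components/checkout-button.tsx
'use client';

import { useState } from 'react';

interface CheckoutButtonProps {
  cartId: string;
  amount: number;
}

export function CheckoutButton({ cartId, amount }: CheckoutButtonProps) {
  // Assuming useCheckout returns a React Query mutation: mutateAsync returns a
  // promise that rejects on failure, while plain mutate() returns void and never
  // throws, so the catch below would never run
  const { mutateAsync, isPending } = useCheckout();
  const [showFallback, setShowFallback] = useState(false);

  const handleCheckout = async () => {
    try {
      await mutateAsync({ cartId });
    } catch {
      // If checkout fails, show fallback options
      setShowFallback(true);
    }
  };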

  if (showFallback) {
    return (
      <CheckoutFallback
        cartId={cartId}
        amount={amount}
        onRetry={() => {
          setShowFallback(false);
          handleCheckout();
        }}
      />
    );
  }

  return (
    <button
      onClick={handleCheckout}
      disabled={isPending}
      className="w-full py-3 bg-blue-600 text-white rounded-lg disabled:opacity-50"
    >
      {isPending ? 'Processing...' : `Pay ${formatCurrency(amount)}`}
    </button>
  );
}

function CheckoutFallback({
  cartId,
  amount,
  onRetry,
}: {
  cartId: string;
  amount: number;
  onRetry: () => void;
}) {
  return (
    <div className="p-4 border rounded-lg space-y-4">
      <p className="text-red-600">
        We're having trouble processing your payment.
      </p>

      <div className="space-y-2">
        <button
          onClick={onRetry}
          className="w-full py-2 bg-blue-600 text-white rounded"
        >
          Try again
        </button>

        <a
          href={`/checkout/manual?cart=${cartId}`}
          className="block w-full py-2 text-center border rounded"
        >
          Use manual checkout
        </a>

        <button
          onClick={() => saveCartForLater(cartId)}
          className="w-full py-2 text-gray-600"
        >
          Save cart for later
        </button>
      </div>

      <p className="text-sm text-gray-500">
        Your cart has been saved. You can also{' '}
        <a href="/contact" className="text-blue-600">contact support</a>.
      </p>
    </div>
  );
}

Retry Strategies

Smart Retry Logic

// lib/retry.ts

interface RetryOptions {
  maxAttempts: number;
  baseDelay: number;
  maxDelay: number;
  backoffMultiplier: number;
  retryCondition?: (error: Error, attempt: number) => boolean;
  onRetry?: (error: Error, attempt: number, delay: number) => void;
}

const defaultOptions: RetryOptions = {
  maxAttempts: 3,
  baseDelay: 1000,
  maxDelay: 30000,
  backoffMultiplier: 2,
};

export async function withRetry<T>(
  fn: () => Promise<T>,
  options: Partial<RetryOptions> = {}
): Promise<T> {
  const opts = { ...defaultOptions, ...options };
  let lastError: Error;
  let attempt = 0;

  while (attempt < opts.maxAttempts) {
    try {
      return await fn();
    } catch (error) {
      lastError = error instanceof Error ? error : new Error(String(error));
      attempt++;

      // Check if we should retry
      if (attempt >= opts.maxAttempts) break;
      if (opts.retryCondition && !opts.retryCondition(lastError, attempt)) break;
      if (!isRetryable(lastError)) break;

      // Calculate delay with exponential backoff + jitter
      const delay = Math.min(
        opts.baseDelay * Math.pow(opts.backoffMultiplier, attempt - 1),
        opts.maxDelay
      );
      const jitter = delay * 0.2 * Math.random();
      const totalDelay = delay + jitter;

      opts.onRetry?.(lastError, attempt, totalDelay);

      await sleep(totalDelay);
    }
  }

  throw lastError!;
}

function isRetryable(error: Error): boolean {
  // Don't retry client errors (4xx except 429)
  if ('status' in error) {
    const status = (error as any).status;
    if (status >= 400 && status < 500 && status !== 429) {
      return false;
    }
  }

  // Don't retry validation errors
  if (error.name === 'ValidationError') return false;

  // Don't retry auth errors
  if (error.name === 'AuthenticationError') return false;

  // Retry everything else
  return true;
}

function sleep(ms: number): Promise<void> {
  return new Promise(resolve => setTimeout(resolve, ms));
}
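If you pull the delay calculation out of withRetry and inject the random source, the schedule becomes easy to check. The sketch below uses the same formula as the loop above: exponential growth capped at maxDelay, plus up to 20% jitter. The jitter is added after the cap, so the longest possible wait is 1.2 times maxDelay rather than maxDelay itself.

```typescript
interface BackoffOptions {
  baseDelay: number;         // ms before the first retry
  maxDelay: number;          // cap on the exponential part
  backoffMultiplier: number;
}

// attempt is 1-based, as in withRetry: 1 is the wait before the first retry
export function backoffDelay(
  attempt: number,
  opts: BackoffOptions,
  random: () => number = Math.random
): number {
  const delay = Math.min(
    opts.baseDelay * Math.pow(opts.backoffMultiplier, attempt - 1),
    opts.maxDelay
  );
  // Jitter spreads out retries from many clients so they don't arrive in lockstep
  return delay + delay * 0.2 * random();
}
```

With the defaults (1 s base, multiplier 2, 30 s cap) and jitter disabled, the waits run 1, 2, 4, 8 and 16 seconds, then stay at 30.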

React Query with Smart Retry

// lib/api/query-client.ts
import { QueryClient, useQuery, keepPreviousData } from '@tanstack/react-query';

export const queryClient = new QueryClient({
  defaultOptions: {
    queries: {
      retry: (failureCount, error) => {
        // Don't retry on auth errors
        if ((error as any)?.status === 401) return false;

        // Don't retry on not found
        if ((error as any)?.status === 404) return false;

        // Don't retry on validation errors
        if ((error as any)?.status === 400) return false;

        // Retry up to 3 times for other errors
        return failureCount < 3;
      },
      retryDelay: (attemptIndex) => {
        // Exponential backoff with jitter
        const delay = Math.min(1000 * 2 ** attemptIndex, 30000);
        return delay + delay * 0.2 * Math.random();
      },
      staleTime: 5 * 60 * 1000, // 5 minutes
      gcTime: 30 * 60 * 1000,   // 30 minutes (formerly cacheTime)
    },
    mutations: {
      retry: (failureCount, error) => {
        // Be more conservative with mutations
        if ((error as any)?.status >= 400) return false;
        return failureCount < 2;
      },
    },
  },
});

// Per-query retry customization
function useUserProfile(userId: string) {
  return useQuery({
    queryKey: ['user', userId],
    queryFn: () => fetchUserProfile(userId),
    retry: 3,
    retryDelay: attemptIndex => Math.min(1000 * 2 ** attemptIndex, 10000),
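    // Treat cached data as fresh for 10 minutes, so mounts and window
    // focus don't trigger refetches in that window
    staleTime: 10 * 60 * 1000,
    // Keep showing the previous user's data while a new userId loads. (On a
    // failed refetch, React Query already keeps the last successful data for the same key.)
    placeholderData: keepPreviousData,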
  });
}
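There is one catch with the per-query override. retry: 3 replaces the default retry function entirely, so this query will retry the 401s and 404s that the defaults were written to skip. The sketch below builds the predicate once and reuses it with different limits. makeRetryPredicate is a hypothetical helper, and the status codes mirror the defaults above.

```typescript
const NON_RETRYABLE_STATUSES = new Set([400, 401, 404]);

// Read a numeric `status` off an unknown error, if there is one
function statusOf(error: unknown): number | undefined {
  if (typeof error === 'object' && error !== null && 'status' in error) {
    const { status } = error as { status: unknown };
    return typeof status === 'number' ? status : undefined;
  }
  return undefined;
}

// Matches the (failureCount, error) => boolean shape React Query accepts for `retry`
export function makeRetryPredicate(maxRetries: number) {
  return (failureCount: number, error: unknown): boolean => {
    const status = statusOf(error);
    if (status !== undefined && NON_RETRYABLE_STATUSES.has(status)) return false;
    return failureCount < maxRetries;
  };
}
```

The defaults would then use retry: makeRetryPredicate(3), and the profile query could pass the same predicate instead of a bare number. That keeps the status rules in one place.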

Offline Support

Basic Offline Detection

// hooks/use-online-status.tsx
'use client';

import { useSyncExternalStore } from 'react';

function subscribe(callback: () => void) {
  window.addEventListener('online', callback);
  window.addEventListener('offline', callback);
  return () => {
    window.removeEventListener('online', callback);
    window.removeEventListener('offline', callback);
  };
}

function getSnapshot() {
  return navigator.onLine;
}

function getServerSnapshot() {
  return true; // Assume online for SSR
}

export function useOnlineStatus() {
  return useSyncExternalStore(subscribe, getSnapshot, getServerSnapshot);
}

// Usage with UI indicator
export function OnlineStatusIndicator() {
  const isOnline = useOnlineStatus();

  if (isOnline) return null;

  return (
    <div className="fixed top-0 inset-x-0 bg-yellow-500 text-white text-center py-2 z-50">
      You're offline. Some features may be unavailable.
    </div>
  );
}
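`navigator.onLine` only reports whether the device has a network connection; a `true` value does not guarantee that your API is reachable (captive portals, VPNs, or a backend that is down all look "online"). One common complement is to probe a lightweight endpoint with a timeout. This is a sketch under assumptions: the `/api/health` URL, the assumption that it accepts `HEAD`, the 3-second timeout, and the `checkConnectivity` name are all hypothetical, and the fetch function is injectable so the probe can be tested.

```typescript
// Minimal fetch-like signature so the probe can be exercised with a stub
type Fetcher = (
  url: string,
  init: { method: string; cache: 'no-store'; signal: AbortSignal },
) => Promise<{ ok: boolean }>;

// Returns true only if the (hypothetical) health endpoint answers with a 2xx
// status before the timeout. Any network error or abort counts as offline.
export async function checkConnectivity(
  url = '/api/health',
  timeoutMs = 3000,
  fetcher: Fetcher = (u, init) => fetch(u, init),
): Promise<boolean> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const response = await fetcher(url, {
      method: 'HEAD',
      cache: 'no-store', // skip the HTTP cache so the request really hits the network
      signal: controller.signal,
    });
    return response.ok;
  } catch {
    return false;
  } finally {
    clearTimeout(timer);
  }
}
```

Rather than polling on a tight interval, you might run the probe when the browser fires `online` after an `offline` event, or when a request fails with a network error.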

Offline-First Mutations

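// hooks/use-offline-mutation.ts
import { useEffect, useRef } from 'react';
import { useMutation } from '@tanstack/react-query';
import { useOnlineStatus } from './use-online-status';

interface OfflineMutationOptions<T, V> {
  mutationFn: (variables: V) => Promise<T>;
  onMutate: (variables: V) => { optimisticUpdate: unknown; rollback: () => void };
  queueKey: string;
}

export function useOfflineMutation<T, V>({
  mutationFn,
  onMutate,
  queueKey,
}: OfflineMutationOptions<T, V>) {
  const isOnline = useOnlineStatus();

  // Keep the latest mutationFn without re-running the effect on every render
  const mutationFnRef = useRef(mutationFn);
  mutationFnRef.current = mutationFn;

  // Drain the queue whenever we come back online (and on mount, in case a
  // previous page load left queued mutations in localStorage)
  useEffect(() => {
    if (isOnline) {
      void processQueue(queueKey, (v: V) => mutationFnRef.current(v));
    }
  }, [isOnline, queueKey]);

  return useMutation({
    // The default networkMode ('online') pauses mutations while offline;
    // 'always' lets this hook run mutationFn and queue the work itself
    networkMode: 'always',
    // Retrying an OfflineError would queue the same mutation again
    retry: false,

    mutationFn: async (variables: V) => {
      // Check connectivity at call time rather than from the render closure
      if (!navigator.onLine) {
        // Queue for later
        queueMutation(queueKey, variables);
        throw new OfflineError('Queued for when online');
      }
      return mutationFn(variables);
    },

    onMutate: async (variables: V) => {
      // Apply optimistic update regardless of online status
      const { rollback } = onMutate(variables);
      return { rollback };
    },

    onError: (error, _variables, context) => {
      if (error instanceof OfflineError) {
        // Don't roll back: keep the optimistic update until the queue is processed
        return;
      }
      // Roll back on real errors
      context?.rollback();
    },
  });
}

// Queue management: in memory, mirrored to localStorage so queued
// mutations survive a page refresh
const mutationQueue = new Map<string, unknown[]>();
const draining = new Set<string>();

function storageKey(key: string) {
  return `mutation-queue:${key}`;
}

function readQueue(key: string): unknown[] {
  const cached = mutationQueue.get(key);
  if (cached) return cached;
  // Rehydrate from localStorage after a refresh
  const stored = localStorage.getItem(storageKey(key));
  const queue: unknown[] = stored ? JSON.parse(stored) : [];
  mutationQueue.set(key, queue);
  return queue;
}

function writeQueue(key: string, queue: unknown[]) {
  mutationQueue.set(key, queue);
  if (queue.length > 0) {
    localStorage.setItem(storageKey(key), JSON.stringify(queue));
  } else {
    localStorage.removeItem(storageKey(key));
  }
}

function queueMutation(key: string, variables: unknown) {
  writeQueue(key, [...readQueue(key), variables]);
}

async function processQueue<T, V>(
  key: string,
  mutationFn: (v: V) => Promise<T>
) {
  // Don't drain the same queue twice at once (e.g. effects re-running)
  if (draining.has(key)) return;
  draining.add(key);

  try {
    while (readQueue(key).length > 0) {
      const [next] = readQueue(key);
      try {
        await mutationFn(next as V);
      } catch (error) {
        console.error('Failed to process queued mutation:', error);
        // Keep this item and everything after it queued for the next attempt
        // (a mutation that always fails will block the queue; consider a retry limit)
        return;
      }
      // Remove the item only after it succeeded
      writeQueue(key, readQueue(key).slice(1));
    }
  } finally {
    draining.delete(key);
  }
}

class OfflineError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'OfflineError';
  }
}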

Monitoring and Learning from Failures

Client-Side Error Tracking

// lib/error-tracking.ts

interface ErrorContext {
  userId?: string;
  sessionId?: string;
  route?: string;
  component?: string;
  action?: string;
  metadata?: Record<string, unknown>;
}

interface ErrorReport {
  error: {
    message: string;
    name: string;
    stack?: string;
  };
  context: ErrorContext;
  browser: {
    userAgent: string;
    language: string;
    online: boolean;
    memory?: number;
  };
  timestamp: number;
}

class ErrorTracker {
  private queue: ErrorReport[] = [];
  private flushInterval: number = 10000;
  private maxQueueSize: number = 50;

  constructor() {
    // Periodic flush
    setInterval(() => this.flush(), this.flushInterval);

    // Flush on page hide
    document.addEventListener('visibilitychange', () => {
      if (document.visibilityState === 'hidden') {
        this.flush();
      }
    });

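    // Global error handler. event.error is null for cross-origin
    // "Script error." events, so fall back to the event message
    window.addEventListener('error', (event) => {
      const error = event.error instanceof Error ? event.error : new Error(event.message);
      this.capture(error, { component: 'window' });
    });

    // Unhandled promise rejections: keep the original Error (and its stack)
    // when there is one, since the reason can be any value
    window.addEventListener('unhandledrejection', (event) => {
      const reason = event.reason;
      const error =
        reason instanceof Error
          ? reason
          : new Error(typeof reason === 'string' ? reason : 'Unhandled rejection');
      this.capture(error, { component: 'promise' });
    });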
  }

  capture(error: Error, context: Partial<ErrorContext> = {}) {
    const report: ErrorReport = {
      error: {
        message: error.message,
        name: error.name,
        stack: error.stack,
      },
      context: {
        userId: getCurrentUserId(),
        sessionId: getSessionId(),
        route: window.location.pathname,
        ...context,
      },
      browser: {
        userAgent: navigator.userAgent,
        language: navigator.language,
        online: navigator.onLine,
        memory: (performance as any).memory?.usedJSHeapSize, // non-standard, Chromium-only
      },
      timestamp: Date.now(),
    };

    this.queue.push(report);

    // Flush if queue is full
    if (this.queue.length >= this.maxQueueSize) {
      this.flush();
    }
  }

  private async flush() {
    if (this.queue.length === 0) return;

    const reports = [...this.queue];
    this.queue = [];

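    const body = JSON.stringify(reports);

    try {
      // Prefer sendBeacon: it survives page unload. It returns false when the
      // browser refuses to queue the payload, so fall back to fetch in that case
      const queued =
        typeof navigator.sendBeacon === 'function' &&
        navigator.sendBeacon('/api/errors', new Blob([body], { type: 'application/json' }));

      if (!queued) {
        // keepalive lets the request outlive the page, like sendBeacon
        const response = await fetch('/api/errors', {
          method: 'POST',
          headers: { 'Content-Type': 'application/json' },
          body,
          keepalive: true,
        });
        if (!response.ok) throw new Error(`Error report rejected: ${response.status}`);
      }
    } catch {
      // Re-queue on failure, keeping at most 10 reports to bound memory
      this.queue.unshift(...reports.slice(0, 10));
    }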
  }
}

export const errorTracker = new ErrorTracker();

// Usage with Error Boundary
class TrackedErrorBoundary extends ErrorBoundary {
  componentDidCatch(error: Error, errorInfo: React.ErrorInfo) {
    // Call the base implementation only if the base class defines one
    super.componentDidCatch?.(error, errorInfo);

    errorTracker.capture(error, {
      component: this.props.name,
      metadata: {
        componentStack: errorInfo.componentStack,
      },
    });
  }
}
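The global `error` and `unhandledrejection` handlers can receive values that are not `Error` instances: `ErrorEvent.error` is `null` for cross-origin "Script error." events, and a promise can reject with a string or a plain object. A small normalizer gives `capture` a real `Error` in every case, keeps the original stack when there is one, and can be reused in ordinary `catch` blocks. `toError` is an illustrative helper, not part of any library.

```typescript
// Convert any thrown or rejected value into an Error without losing information
export function toError(value: unknown, fallbackMessage = 'Unknown error'): Error {
  if (value instanceof Error) return value; // keeps the original stack
  if (value === null || value === undefined) return new Error(fallbackMessage);
  if (typeof value === 'string') return new Error(value);
  if (typeof value === 'object' && 'message' in value) {
    const message = (value as { message: unknown }).message;
    if (typeof message === 'string') return new Error(message);
  }
  // Last resort: serialize so the report still says something useful
  try {
    const json = JSON.stringify(value);
    return new Error(json === undefined ? fallbackMessage : json);
  } catch {
    // e.g. circular objects or BigInt values
    return new Error(fallbackMessage);
  }
}

// Possible wiring inside the ErrorTracker constructor:
// window.addEventListener('error', (event) =>
//   this.capture(toError(event.error ?? event.message), { component: 'window' }));
// window.addEventListener('unhandledrejection', (event) =>
//   this.capture(toError(event.reason, 'Unhandled rejection'), { component: 'promise' }));
```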

Failure Analytics Dashboard

// Tracking patterns for failure analysis

// Track all fetch failures
const originalFetch = window.fetch.bind(window);
window.fetch = async (input: RequestInfo | URL, init?: RequestInit) => {
  const start = performance.now();
  // input can be a string, a URL object, or a Request
  const url = input instanceof Request ? input.url : String(input);

  try {
    const response = await originalFetch(input, init);

    // Track failed responses
    if (!response.ok) {
      trackApiFailure({
        url,
        status: response.status,
        duration: performance.now() - start,
      });
    }

    return response;
  } catch (error) {
    // Track network failures
    trackApiFailure({
      url,
      status: 0, // Network error
      duration: performance.now() - start,
      error: error instanceof Error ? error.message : 'Unknown',
    });
    throw error;
  }
};

// Aggregate metrics
interface FailureMetrics {
  endpoint: string;
  failureRate: number;
  avgLatency: number;
  errorsByType: Record<number, number>; // HTTP status (0 = network error) -> count
  lastFailure: number;
}

// Use this data to:
// 1. Identify problematic endpoints
// 2. Trigger circuit breakers proactively
// 3. Alert when failure rate exceeds threshold
// 4. A/B test retry strategies
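The list above says what `FailureMetrics` is for but not how to compute it. One possible aggregation is sketched below under stated assumptions: `failureRate` needs a denominator, so successes must be recorded too (the fetch wrapper above only reports failures); URLs are collapsed to a rough endpoint key by dropping the query string and replacing numeric path segments with `:id`; and the alert threshold and minimum sample size are arbitrary example values. `FailureAggregator`, `RequestSample`, and `toEndpointKey` are hypothetical names, and the `FailureMetrics` interface is repeated so the block stands alone.

```typescript
// Repeated from above so this block stands alone
interface FailureMetrics {
  endpoint: string;
  failureRate: number;
  avgLatency: number;
  errorsByType: Record<number, number>; // HTTP status (0 = network error) -> count
  lastFailure: number; // ms timestamp, 0 if the endpoint never failed
}

// One observed request; successes must be recorded too, or failureRate is meaningless
interface RequestSample {
  url: string;
  status: number; // 0 for network errors
  ok: boolean;
  duration: number; // ms
  timestamp: number; // ms since epoch
}

// Collapse "/api/users/42?x=1" into "/api/users/:id" so similar calls share a bucket
export function toEndpointKey(url: string): string {
  const path = url.split('?')[0];
  return path.replace(/\/\d+(?=\/|$)/g, '/:id');
}

interface Bucket {
  total: number;
  failures: number;
  latencySum: number;
  errorsByType: Record<number, number>;
  lastFailure: number;
}

export class FailureAggregator {
  private buckets = new Map<string, Bucket>();

  record(sample: RequestSample): void {
    const endpoint = toEndpointKey(sample.url);
    const bucket = this.buckets.get(endpoint) ?? {
      total: 0,
      failures: 0,
      latencySum: 0,
      errorsByType: {},
      lastFailure: 0,
    };
    bucket.total += 1;
    bucket.latencySum += sample.duration;
    if (!sample.ok) {
      bucket.failures += 1;
      bucket.errorsByType[sample.status] = (bucket.errorsByType[sample.status] ?? 0) + 1;
      bucket.lastFailure = Math.max(bucket.lastFailure, sample.timestamp);
    }
    this.buckets.set(endpoint, bucket);
  }

  metrics(endpoint: string): FailureMetrics | undefined {
    const bucket = this.buckets.get(endpoint);
    if (!bucket) return undefined;
    return {
      endpoint,
      failureRate: bucket.failures / bucket.total,
      avgLatency: bucket.latencySum / bucket.total,
      errorsByType: { ...bucket.errorsByType },
      lastFailure: bucket.lastFailure,
    };
  }

  // Endpoints above the failure-rate threshold, ignoring ones with too few
  // samples to judge (both defaults are arbitrary example values)
  alerting(threshold = 0.25, minSamples = 20): FailureMetrics[] {
    const alerts: FailureMetrics[] = [];
    for (const [endpoint, bucket] of this.buckets) {
      if (bucket.total < minSamples) continue;
      const m = this.metrics(endpoint);
      if (m && m.failureRate > threshold) alerts.push(m);
    }
    return alerts;
  }
}
```

Counts here accumulate for the lifetime of the page; a sliding time window would make the rate reflect recent behavior, which matters if the alert list is used to open a circuit breaker proactively.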

Quick Reference

Failure Handling Checklist

## For Every Data-Fetching Component

### Loading States
□ Skeleton matches actual component layout
□ Loading state is shown immediately (no flash)
□ Long-running operations show progress

### Error States
□ Error boundary catches render errors
□ Fetch errors show actionable message
□ Retry option is available
□ Fallback content is meaningful

### Empty States
□ Distinguish between "loading" and "empty"
□ Empty state guides user to action
□ No confusing "No results" during load

### Stale Data
□ Show stale data with an indicator instead of blocking the UI
□ Background refresh doesn't clear content
□ Cache invalidation is intentional

## For Critical User Actions

### Mutations
□ Optimistic update where appropriate
□ Clear feedback during processing
□ Rollback on failure
□ Retry mechanism for transient failures

### Forms
□ Validation before submission
□ Preserve input on failure
□ Clear error messages per field
□ Alternative submission methods

### Payments/Sensitive
□ Never lose user data on failure
□ Multiple fallback paths
□ Customer support escape hatch
□ Transaction status is always queryable
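Several checklist items (distinguish "loading" from "empty", show stale data instead of blocking, keep content during a background refresh) are easier to enforce when the UI state is a discriminated union rather than a handful of booleans, because the compiler then forces every state to be handled. A sketch; `ViewState`, `toViewState`, and `viewFor` are illustrative names, and `viewFor` returns labels as a stand-in for the JSX you would actually render.

```typescript
// Each variant carries only the data that is valid in that state
type ViewState<T> =
  | { kind: 'loading' }
  | { kind: 'empty' }
  | { kind: 'error'; message: string }
  | { kind: 'ready'; items: T[]; stale: boolean };

// Derive the state from raw fetch results. Cached items always win over an
// error, so a failed background refresh never clears content.
export function toViewState<T>(items: T[] | undefined, error: string | undefined): ViewState<T> {
  if (items === undefined) {
    // Nothing loaded yet: never report "empty" before the first response
    return error !== undefined ? { kind: 'error', message: error } : { kind: 'loading' };
  }
  if (items.length > 0) {
    // Show what we have, flagged as stale if the latest refresh failed
    return { kind: 'ready', items, stale: error !== undefined };
  }
  return error !== undefined ? { kind: 'error', message: error } : { kind: 'empty' };
}

// Stand-in for rendering: says which UI to show for each state
export function viewFor<T>(state: ViewState<T>): string {
  switch (state.kind) {
    case 'loading':
      return 'skeleton';
    case 'empty':
      return 'empty-state';
    case 'error':
      return `error: ${state.message}`;
    case 'ready':
      return state.stale ? `list(${state.items.length}, stale)` : `list(${state.items.length})`;
    default: {
      // Exhaustiveness check: a new variant without a case fails to compile here
      const unhandled: never = state;
      return unhandled;
    }
  }
}
```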

Error Message Guidelines

┌─────────────────────────────────────────────────────────────────────────────┐
│                    ERROR MESSAGE PRINCIPLES                                  │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  ❌ Bad: "Error: ECONNREFUSED"                                              │
│  ✅ Good: "We couldn't connect to our servers. Please check your internet   │
│           connection and try again."                                        │
│                                                                              │
│  ❌ Bad: "Something went wrong"                                             │
│  ✅ Good: "We couldn't load your orders. [Try again] or [Contact support]" │
│                                                                              │
│  ❌ Bad: "Error 500"                                                        │
│  ✅ Good: "Our servers are having trouble. Your data is safe.              │
│           We're working on it."                                             │
│                                                                              │
│  ─────────────────────────────────────────────────────────────────────      │
│                                                                              │
│  Every error message should:                                                │
│  1. Explain what happened (in human terms)                                 │
│  2. Reassure (their data/action isn't lost, if true)                       │
│  3. Guide (what they can do next)                                          │
│  4. Offer alternatives (different paths to goal)                           │
│                                                                              │
│  Never:                                                                      │
│  • Show stack traces to users                                               │
│  • Use technical jargon                                                     │
│  • Blame the user                                                           │
│  • Leave them with no options                                               │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
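These principles can be encoded in a single mapping function so individual components don't each invent their own wording. A sketch, assuming errors reach the UI as an HTTP status (with `0` for network failures, the same convention as the fetch wrapper earlier) plus the name of the thing that failed to load; the message text reuses the examples from the box, and `UserFacingError` / `toUserFacingError` are hypothetical names.

```typescript
type ErrorAction = 'retry' | 'contact-support';

interface UserFacingError {
  message: string; // explains what happened, in plain language
  reassurance?: string; // only set when it is actually true
  actions: ErrorAction[]; // what the user can do next
}

export function toUserFacingError(
  status: number,
  resource: string,
  dataIsSafe = false,
): UserFacingError {
  if (status === 0) {
    // Network failure: no response reached us
    return {
      message: "We couldn't connect to our servers. Please check your internet connection and try again.",
      actions: ['retry'],
    };
  }
  if (status >= 500) {
    return {
      message: "Our servers are having trouble. We're working on it.",
      // Reassure only when the caller knows nothing was lost
      reassurance: dataIsSafe ? 'Your data is safe.' : undefined,
      actions: ['retry', 'contact-support'],
    };
  }
  // Anything else: say what failed and offer more than one way forward
  return {
    message: `We couldn't load your ${resource}.`,
    actions: ['retry', 'contact-support'],
  };
}
```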

Closing Thoughts

Designing for failure isn't pessimism—it's engineering maturity. The best applications aren't the ones that never fail. They're the ones that fail gracefully, recover automatically, and keep users informed and productive.

The mental shift required:

  1. Assume failure, not success. Every API call can fail. Every third-party script can hang. Every user can lose connectivity. Design for this first.

  2. Degrade gracefully. Not all features are equally important. When things break, keep core functionality working and hide or disable the parts that failed.

  3. Fail visibly but helpfully. Users should know something is wrong, but they should also know what they can do about it. Every error is a UX opportunity.

  4. Recover automatically. Retry with backoff. Circuit breakers that reset. Queues that process when online. The system should heal itself when possible.

  5. Learn from failures. Track everything. Analyze patterns. Use failure data to improve. The goal isn't zero failures—it's continuous improvement in how you handle them.

The applications that earn user trust aren't the perfect ones. They're the ones that handle imperfection with grace.


Your users will forgive failures. They won't forgive failures that waste their time, lose their data, or leave them confused. Design accordingly.


© 2026 Vidhya Sagar Thakur. All rights reserved.