API Gateway System Design
Real-World Problem Context
Your company runs 40 microservices. The mobile app needs to authenticate, then call the product service, the pricing service, the recommendation service, and the inventory service — all for a single product page. Each service has its own host, its own auth mechanism, its own rate limits. The mobile client is making 4-5 HTTP calls per screen, eating battery and data. On top of that, your team just shipped a breaking change to the pricing API, and now every client is broken.
This is exactly the problem an API gateway solves. It sits between your clients and your services, acting as a single entry point that handles authentication, routing, rate limiting, protocol translation, and response aggregation — so your clients talk to one endpoint and your services stay decoupled.
Problem Statement
Without a gateway, every client must:
- Know the address of every service
- Handle authentication with each service independently
- Deal with different response formats and API versions
- Make multiple round-trips for a single screen
- Implement retry logic, circuit breaking, and timeout handling
This creates tight coupling between clients and services, makes cross-cutting concerns (auth, logging, rate limiting) inconsistent, and makes it nearly impossible to change your backend topology without breaking clients.
The core challenge: how do you provide a unified, stable API surface to clients while allowing backend services to evolve independently?
Potential Solutions
1. Simple Reverse Proxy (Nginx / Envoy)
Route requests based on URL path to different backend services:
# nginx.conf — basic API gateway
# Note: limit_req_zone must be declared in the http context,
# outside any server block.
limit_req_zone $binary_remote_addr zone=api:10m rate=100r/s;

upstream product_service {
    server product-svc:8080;
    server product-svc-2:8080;
}

upstream pricing_service {
    server pricing-svc:8080;
}

server {
    listen 443 ssl;
    server_name api.example.com;

    # Route by path prefix; apply the shared rate limit per location
    location /api/v1/products {
        limit_req zone=api burst=50 nodelay;
        proxy_pass http://product_service;
        proxy_set_header X-Request-ID $request_id;
        proxy_connect_timeout 5s;
        proxy_read_timeout 30s;
    }

    location /api/v1/pricing {
        limit_req zone=api burst=50 nodelay;
        proxy_pass http://pricing_service;
        proxy_set_header X-Request-ID $request_id;
    }
}
2. Dedicated API Gateway (Kong, AWS API Gateway, Apigee)
Full-featured gateway with plugins for auth, rate limiting, transformation:
# Kong declarative config
_format_version: "3.0"
services:
  - name: product-service
    url: http://product-svc:8080
    routes:
      - name: products-route
        paths: ["/api/v1/products"]
        strip_path: false
    plugins:
      - name: jwt
        config:
          claims_to_verify: ["exp"]
      - name: rate-limiting
        config:
          minute: 60
          policy: redis
          redis_host: redis
      - name: request-transformer
        config:
          add:
            headers: ["X-Internal-Source:gateway"]
  - name: pricing-service
    url: http://pricing-svc:8080
    routes:
      - name: pricing-route
        paths: ["/api/v1/pricing"]
    plugins:
      - name: jwt
      - name: rate-limiting
        config:
          minute: 120
3. Custom Gateway with Request Aggregation
Build a gateway that composes multiple service calls into a single response:
from fastapi import FastAPI, Depends, HTTPException
from fastapi.security import HTTPBearer
import asyncio

import httpx
import jwt  # PyJWT

app = FastAPI()
security = HTTPBearer()

PUBLIC_KEY = "..."  # RSA public key used to verify tokens

async def verify_token(credentials = Depends(security)):
    """Centralized auth — services don't need to validate tokens."""
    token = credentials.credentials
    try:
        # Validate JWT signature and expiration, extract claims
        claims = jwt.decode(token, PUBLIC_KEY, algorithms=["RS256"])
    except jwt.InvalidTokenError:
        raise HTTPException(status_code=401, detail="invalid or expired token")
    return claims

@app.get("/api/v1/product-page/{product_id}")
async def get_product_page(product_id: str, user = Depends(verify_token)):
    """
    Aggregate data from 3 services into a single response.
    Client makes 1 call instead of 3.
    """
    async with httpx.AsyncClient(timeout=5.0) as client:
        # Fan out to services in parallel
        product_task = client.get(f"http://product-svc:8080/products/{product_id}")
        pricing_task = client.get(f"http://pricing-svc:8080/pricing/{product_id}")
        inventory_task = client.get(f"http://inventory-svc:8080/stock/{product_id}")
        product_resp, pricing_resp, inventory_resp = await asyncio.gather(
            product_task, pricing_task, inventory_task,
            return_exceptions=True,
        )

    # Compose response — graceful degradation if a non-critical service fails
    if isinstance(product_resp, Exception):
        raise HTTPException(status_code=502, detail="product service unavailable")
    result = {"product": product_resp.json()}
    if not isinstance(pricing_resp, Exception):
        result["pricing"] = pricing_resp.json()
    else:
        result["pricing"] = {"error": "temporarily unavailable"}
    if not isinstance(inventory_resp, Exception):
        result["inventory"] = inventory_resp.json()
    else:
        result["inventory"] = {"error": "temporarily unavailable"}
    return result
4. Backend for Frontend (BFF) Pattern
Separate gateways per client type:
┌──────────┐ ┌───────────────┐
│ Mobile │────▶│ Mobile BFF │──┐
│ App │ │ (lightweight │ │
└──────────┘ │ responses) │ │
└───────────────┘ │ ┌─────────────┐
├──▶│ Product Svc │
┌──────────┐ ┌───────────────┐ │ ├─────────────┤
│ Web │────▶│ Web BFF │──┤ │ Pricing Svc │
│ Browser │ │ (rich │ │ ├─────────────┤
└──────────┘ │ responses) │ │ │ Inventory │
└───────────────┘ │ └─────────────┘
│
┌──────────┐ ┌───────────────┐ │
│ 3rd Party│────▶│ Public API │──┘
│ Devs │ │ Gateway │
└──────────┘ │ (versioned) │
└───────────────┘
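The heart of the BFF pattern is that each gateway shapes the same upstream data for its own client. A minimal sketch of that shaping step — the record, field names, and view functions are all illustrative, not a real service contract:

```python
# The same upstream product record, shaped differently per client.
# FULL_PRODUCT is a hypothetical product-service response.
FULL_PRODUCT = {
    "id": "p-123",
    "name": "Trail Shoe",
    "description": "A very long marketing description ...",
    "images": ["hero.jpg", "side.jpg", "sole.jpg"],
    "specs": {"weight_g": 280, "drop_mm": 6},
}

def mobile_bff_view(product: dict) -> dict:
    """Mobile BFF: trim the payload — small screens, metered data."""
    return {
        "id": product["id"],
        "name": product["name"],
        "image": product["images"][0],  # single hero image only
    }

def web_bff_view(product: dict) -> dict:
    """Web BFF: rich response — full description, all images, specs."""
    return dict(product)
```

The mobile view drops the description and extra images entirely, which is exactly the kind of per-client decision that would otherwise leak into either the client or the shared services.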
Trade-offs & Considerations
Approach          Pros                           Cons                          Best When
──────────────────────────────────────────────────────────────────────────────────────────
Reverse Proxy     Simple, fast, battle-tested;   No aggregation, limited       Few services,
(Nginx/Envoy)     low latency overhead           transformation                simple routing
Managed Gateway   Rich plugin ecosystem,         Vendor lock-in, cost at       Medium teams,
(Kong/AWS)        managed infra, dashboards      scale, latency overhead       standard patterns
Custom Gateway    Full control, aggregation,     Maintenance burden,           Complex aggregation,
(Code)            exact business logic           another service to deploy     unique requirements
BFF Pattern       Optimal per-client UX,         Multiple gateways to          Multiple client
                  independent deployment         maintain, code duplication    types (mobile/web)
Best Practices
- Keep the gateway thin — route, authenticate, rate-limit. Don't put business logic in the gateway. It should be a pass-through, not a monolith.
- Centralize cross-cutting concerns — authentication, logging, request tracing, CORS, and rate limiting belong in the gateway, not in every service.
- Set aggressive timeouts — the gateway should time out faster than the client. If a backend is slow, fail fast and return a partial response.
- Version your APIs at the gateway — route /api/v1/products to service-v1 and /api/v2/products to service-v2. Clients don't know about the internal routing.
- Implement circuit breakers — if a backend service is failing, stop sending traffic to it. Return cached/default responses instead of cascading the failure.
- Cache aggressively at the edge — GET responses with Cache-Control headers can be served from the gateway's cache, eliminating backend calls entirely.
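The circuit-breaker practice above can be sketched as a small state machine: closed (count failures), open (fail fast for a cooldown), half-open (one trial call decides). This is an illustrative minimum, not a production implementation — real gateways track per-backend state and expose it as metrics:

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker sketch.

    Closed: calls pass through; failures are counted.
    Open:   calls fail fast until recovery_timeout elapses.
    Half-open: one trial call decides whether to close or re-open.
    """

    def __init__(self, failure_threshold=3, recovery_timeout=30.0):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, fallback=None):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.recovery_timeout:
                return fallback  # fail fast: don't touch the backend
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()  # trip the breaker
            return fallback
        self.failures = 0  # success closes the circuit again
        return result
```

Usage: `breaker.call(fetch_pricing, product_id, fallback={"error": "temporarily unavailable"})` — the fallback is exactly the cached/default response the bullet describes.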
Step-by-Step Approach
Step 1: Start with a reverse proxy (Nginx or Envoy)
├── Route by path prefix to backend services
├── Add TLS termination
└── Configure health checks
Step 2: Add authentication
├── JWT validation at the gateway
├── Forward user claims as headers to services
└── Services trust the gateway (internal network only)
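Step 2's "validate once, forward claims as headers" can be sketched with a simplified HMAC-signed token. Real deployments use standard JWTs (RS256, via a library such as PyJWT, as in the custom-gateway example earlier); this stdlib-only version just shows the gateway-side flow, and the header names are illustrative:

```python
import base64
import hashlib
import hmac
import json

SECRET = b"demo-secret"  # hypothetical shared secret; real setups use RS256 key pairs

def b64url(data: bytes) -> bytes:
    return base64.urlsafe_b64encode(data).rstrip(b"=")

def sign_token(claims: dict) -> str:
    """Mint a compact payload.signature token (simplified JWT shape)."""
    payload = b64url(json.dumps(claims, sort_keys=True).encode())
    sig = b64url(hmac.new(SECRET, payload, hashlib.sha256).digest())
    return (payload + b"." + sig).decode()

def verify_and_forward(token: str) -> dict:
    """Gateway step: verify the signature once, then turn claims into
    headers that internal services can trust without re-validating."""
    payload, sig = token.encode().split(b".")
    expected = b64url(hmac.new(SECRET, payload, hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        raise PermissionError("invalid token signature")
    padded = payload + b"=" * (-len(payload) % 4)
    claims = json.loads(base64.urlsafe_b64decode(padded))
    return {"X-User-Id": claims["sub"], "X-User-Role": claims.get("role", "user")}
```

Because services only see these headers over the internal network, they never need the verification key — which is the point of centralizing auth at the gateway.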
Step 3: Add rate limiting and throttling
├── Per-client rate limits (API key based)
├── Global rate limits per endpoint
└── Use Redis for distributed rate limit counters
Step 4: Add observability
├── Inject X-Request-ID for distributed tracing
├── Log request/response metadata (not bodies)
└── Expose metrics: latency, error rate, throughput
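Step 4's request-ID injection and metadata-only logging amount to a few lines at the gateway's edge. A sketch — the header name `X-Request-ID` matches the nginx config earlier, and the log fields are illustrative:

```python
import json
import time
import uuid

def with_request_id(headers: dict) -> dict:
    """Reuse the client's X-Request-ID if present, otherwise mint one,
    so every hop in the call chain logs the same correlation id."""
    out = dict(headers)
    out.setdefault("X-Request-ID", str(uuid.uuid4()))
    return out

def log_request(method: str, path: str, status: int,
                started: float, headers: dict) -> str:
    """Emit one structured log line per request — metadata only,
    never request or response bodies."""
    record = {
        "request_id": headers["X-Request-ID"],
        "method": method,
        "path": path,
        "status": status,
        "latency_ms": round((time.time() - started) * 1000, 1),
    }
    return json.dumps(record)
```

Because the same `X-Request-ID` is forwarded to every backend, grepping one id across all service logs reconstructs the full fan-out of a single client request.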
Step 5: Add response aggregation (if needed)
├── Identify screens that call multiple services
├── Build composite endpoints in the gateway
└── Fan out in parallel, merge responses
Step 6: Add resilience
├── Circuit breakers per backend service
├── Retry with exponential backoff for idempotent GETs
└── Fallback responses for non-critical data
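Step 6's retry policy can be sketched as a small wrapper. The `fetch` callable and parameters are illustrative; the key constraint is the one the step names — only retry idempotent GETs, since retrying a write could duplicate its side effect:

```python
import random
import time

def get_with_retry(fetch, retries: int = 3, base_delay: float = 0.1,
                   sleep=time.sleep):
    """Retry an idempotent GET with exponential backoff and jitter.

    `fetch` is any zero-argument callable that raises on failure.
    `sleep` is injectable so tests don't actually wait.
    """
    for attempt in range(retries + 1):
        try:
            return fetch()
        except Exception:
            if attempt == retries:
                raise  # out of attempts: surface the failure
            # delays grow 0.1s, 0.2s, 0.4s, ... with jitter to avoid
            # synchronized retry storms against a recovering backend
            sleep(base_delay * (2 ** attempt) * (1 + random.random()))
```

In practice this wrapper sits inside the circuit breaker, so a backend that keeps failing stops being retried at all once the breaker opens.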
Conclusion
An API gateway is the front door to your microservices architecture. Start simple with a reverse proxy handling routing and TLS, then layer on auth, rate limiting, and observability as your system grows. Avoid the trap of putting business logic in the gateway — it should remain a thin orchestration layer. For complex client needs, consider the BFF pattern with separate gateways per client type. The key decision is build vs. buy: managed gateways (Kong, AWS API Gateway) save time but cost money and flexibility; custom gateways give full control but add maintenance burden.