Ambassador Pattern System Design
Real-World Problem Context
Your application connects to a cloud-managed database. It needs connection retry logic with exponential backoff, TLS termination, circuit breaking when the database is unhealthy, and query logging for debugging. Every microservice in your fleet needs these same capabilities. Initially each team copies the same boilerplate. Then team A updates the retry logic but team B doesn't. Team C adds circuit breaking but team D's version has a bug. Six months later, you have 15 services with 15 slightly different database connectivity implementations, and nobody knows which version has the latest fixes.
The Ambassador pattern extracts cross-cutting connectivity concerns into a separate proxy process that runs alongside your application. Every service gets the same ambassador — same retries, same circuit breaking, same logging — without embedding that logic in application code.
Problem Statement
Distributed applications need resilient connectivity: retries, timeouts, circuit breaking, TLS, authentication, logging, and monitoring. These concerns are identical across services but get implemented inconsistently when embedded in application code. Different languages make it worse — your Go service and your Python service need the same retry logic written twice.
The core challenge: how do you provide consistent, language-agnostic connectivity features (retries, circuit breaking, TLS, observability) to all services without duplicating logic across every codebase?
Potential Solutions
1. Sidecar Ambassador for Database Connections
Run a proxy alongside your application that handles all database connectivity concerns:
Without ambassador: With ambassador:
┌──────────────────────┐ ┌──────────────────────┐
│ Application │ │ Application │
│ ┌────────────────┐ │ │ │
│ │ Retry logic │ │ │ connect to │
│ │ Circuit breaker│ │ │ localhost:5432 │
│ │ TLS config │ │ │ (plain TCP) │
│ │ Auth refresh │ │ └──────────┬───────────┘
│ │ Logging │ │ │ localhost
│ └────────┬───────┘ │ ┌──────────▼───────────┐
│ │ │ │ Ambassador Proxy │
└───────────┼───────────┘ │ ✓ Retry + backoff │
│ │ ✓ Circuit breaker │
│ TLS │ ✓ TLS termination │
▼ │ ✓ Auth token refresh │
┌──────────────────┐ │ ✓ Query logging │
│ Cloud Database │ │ ✓ Metrics export │
└──────────────────┘ └──────────┬───────────┘
│ TLS + Auth
▼
┌──────────────────┐
│ Cloud Database │
└──────────────────┘
Application thinks it's connecting to a local database.
Ambassador handles all the hard networking stuff.
# Kubernetes: ambassador as a sidecar container
apiVersion: apps/v1
kind: Deployment
metadata:
name: order-service
spec:
template:
spec:
containers:
# Main application
- name: order-service
image: order-service:v2
env:
- name: DB_HOST
value: "localhost" # Connects to ambassador, not DB directly
- name: DB_PORT
value: "5432"
# Ambassador sidecar
- name: db-ambassador
image: db-ambassador:v1
ports:
- containerPort: 5432
env:
- name: UPSTREAM_HOST
value: "mydb.us-east-1.rds.amazonaws.com"
- name: UPSTREAM_PORT
value: "5432"
- name: RETRY_MAX
value: "3"
- name: CIRCUIT_BREAKER_THRESHOLD
value: "5"
- name: TLS_ENABLED
value: "true"
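For custom protocols with no off-the-shelf proxy, the ambassador can be a small TCP forwarder. Below is a minimal sketch using only Python's standard library — the function names (`handle_client`, `pipe`, `run_ambassador`) and the retry policy are illustrative, not a production implementation, and TLS/auth are omitted:

```python
import asyncio

async def pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter):
    """Copy bytes from reader to writer until EOF, then half-close the write side."""
    while data := await reader.read(4096):
        writer.write(data)
        await writer.drain()
    if writer.can_write_eof():
        writer.write_eof()

async def handle_client(client_reader, client_writer,
                        upstream_host: str, upstream_port: int, max_retries: int = 3):
    """Connect to the upstream with exponential backoff, then relay both directions."""
    for attempt in range(max_retries):
        try:
            up_reader, up_writer = await asyncio.open_connection(upstream_host, upstream_port)
            break
        except OSError:
            if attempt == max_retries - 1:
                client_writer.close()  # give up: upstream unreachable
                return
            await asyncio.sleep(2 ** attempt)  # 1s, 2s, 4s...
    await asyncio.gather(pipe(client_reader, up_writer),
                         pipe(up_reader, client_writer))
    up_writer.close()
    client_writer.close()

async def run_ambassador(listen_port: int, upstream_host: str, upstream_port: int):
    """The application connects to localhost:listen_port; we forward upstream."""
    server = await asyncio.start_server(
        lambda r, w: handle_client(r, w, upstream_host, upstream_port),
        "127.0.0.1", listen_port)
    async with server:
        await server.serve_forever()
```

A real database ambassador would additionally wrap the upstream connection in TLS (e.g. via the `ssl` parameter of `asyncio.open_connection`) and refresh credentials — the forwarding and retry skeleton stays the same.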
2. Ambassador for External API Calls
Proxy outbound HTTP requests through an ambassador that adds auth, retries, and observability:
# Ambassador proxy (runs as sidecar on port 8081)
import asyncio
import time
from uuid import uuid4

import httpx
from fastapi import FastAPI, Request, Response
from fastapi.responses import JSONResponse

app = FastAPI()

# Circuit breaker state
circuit = {"failures": 0, "state": "closed", "last_failure": 0}

@app.api_route("/{path:path}", methods=["GET", "POST", "PUT", "DELETE"])
async def proxy(request: Request, path: str):
    upstream = request.headers.get("X-Upstream-Host", "api.payment-provider.com")
    url = f"https://{upstream}/{path}"

    # Circuit breaker check
    if circuit["state"] == "open":
        if time.time() - circuit["last_failure"] < 30:
            return JSONResponse({"error": "circuit open"}, status_code=503)
        circuit["state"] = "half-open"

    # Retry with backoff
    for attempt in range(3):
        try:
            # Add auth headers (get_fresh_token is your token-refresh helper)
            headers = dict(request.headers)
            headers["Authorization"] = f"Bearer {get_fresh_token()}"
            headers["X-Request-ID"] = request.headers.get("X-Request-ID", str(uuid4()))

            # Forward request
            start = time.time()
            async with httpx.AsyncClient() as client:
                response = await client.request(
                    method=request.method,
                    url=url,
                    headers=headers,
                    content=await request.body(),
                    timeout=10.0,
                )
            latency = time.time() - start

            # Metrics (metrics is your metrics client, e.g. a Prometheus wrapper)
            metrics.observe("ambassador_request_duration", latency,
                            labels={"upstream": upstream, "status": response.status_code})

            # Circuit breaker: reset on success
            circuit["failures"] = 0
            circuit["state"] = "closed"
            return Response(
                content=response.content,
                status_code=response.status_code,
                headers=dict(response.headers),
            )
        except (httpx.ConnectTimeout, httpx.ReadTimeout):
            circuit["failures"] += 1
            if circuit["failures"] >= 5:
                circuit["state"] = "open"
                circuit["last_failure"] = time.time()
            if attempt < 2:
                await asyncio.sleep(2 ** attempt)  # Exponential backoff
                continue
            raise
# Application code is simple — no retry/auth/circuit logic:
"""
response = requests.post(
"http://localhost:8081/v1/charges", # ← ambassador, not payment provider
headers={"X-Upstream-Host": "api.payment-provider.com"},
json={"amount": 9999, "currency": "usd"}
)
"""
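The inline circuit-breaker dict above works, but the closed → open → half-open state machine is easier to reason about (and to unit-test) as its own class. A sketch — the class name and defaults are illustrative, and the injectable `clock` exists only to make the reset timeout testable:

```python
import time

class CircuitBreaker:
    """Minimal three-state circuit breaker: closed -> open -> half-open."""

    def __init__(self, failure_threshold=5, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.state = "closed"
        self.opened_at = 0.0

    def allow_request(self) -> bool:
        """Gate each outbound call; open circuits reject until the timeout elapses."""
        if self.state == "open":
            if self.clock() - self.opened_at >= self.reset_timeout:
                self.state = "half-open"  # let one probe request through
                return True
            return False
        return True

    def record_success(self):
        self.failures = 0
        self.state = "closed"

    def record_failure(self):
        self.failures += 1
        # Open on threshold, or immediately if the half-open probe failed
        if self.failures >= self.failure_threshold or self.state == "half-open":
            self.state = "open"
            self.opened_at = self.clock()
```

In the proxy handler this replaces the `circuit` dict: call `allow_request()` before forwarding, `record_success()` after a good response, and `record_failure()` in the timeout handler.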
3. Cloud-Specific Ambassadors (Real-World Examples)
Google Cloud SQL Auth Proxy:
┌────────────┐ ┌───────────────────┐ ┌──────────────┐
│ Application │──────▶│ Cloud SQL Proxy │──────▶│ Cloud SQL DB │
│ (localhost │ │ (sidecar) │ │ (TLS + IAM) │
│ :5432) │ │ - IAM auth │ │ │
└────────────┘ │ - TLS encryption │ └──────────────┘
│ - Connection pool │
│ - Auto-reconnect │
└───────────────────┘
AWS RDS Proxy:
┌────────────┐ ┌───────────────────┐ ┌──────────────┐
│ Lambda / │──────▶│ RDS Proxy │──────▶│ RDS Database │
│ Application │ │ (managed service) │ │ │
└────────────┘ │ - Connection pool │ └──────────────┘
│ - IAM auth │
│ - Failover │
└───────────────────┘
Envoy as Ambassador:
┌────────────┐ ┌───────────────────┐ ┌──────────────┐
│ Application │──────▶│ Envoy sidecar │──────▶│ External APIs│
│ (localhost │ │ - mTLS │ │ │
│ :8081) │ │ - Retries │ └──────────────┘
└────────────┘ │ - Circuit breaker │
│ - Rate limiting │
│ - Load balancing │
│ - Observability │
└───────────────────┘
# Envoy ambassador configuration
static_resources:
listeners:
- name: local_listener
address:
socket_address:
address: 127.0.0.1
port_value: 8081
filter_chains:
- filters:
- name: envoy.filters.network.http_connection_manager
typed_config:
route_config:
virtual_hosts:
- name: payment_api
routes:
- match: { prefix: "/" }
route:
cluster: payment_upstream
retry_policy:
retry_on: "5xx,connect-failure"
num_retries: 3
timeout: 10s
clusters:
- name: payment_upstream
connect_timeout: 5s
type: STRICT_DNS
lb_policy: ROUND_ROBIN
circuit_breakers:
thresholds:
- max_connections: 100
max_pending_requests: 50
max_retries: 3
load_assignment:
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: api.payment-provider.com
port_value: 443
transport_socket:
name: envoy.transport_sockets.tls
Trade-offs & Considerations
Approach Complexity Resource Overhead Language Agnostic Consistency
──────────────────────────────────────────────────────────────────────────────────────────
Library (in-process) Low None No (per language) Varies
Ambassador sidecar Medium CPU + memory per pod Yes High
Managed proxy (RDS) Low Service cost Yes High
Service mesh High CPU + memory + CRDs Yes High
Ambassador vs. Service Mesh:
Ambassador:                               Service Mesh (Istio/Linkerd):
- Opt-in sidecar, per connection          - Auto-injected sidecar in every pod
- Hand-configured                         - Centrally managed (control plane)
- Specific to external calls              - Handles ALL inter-service traffic
- Simple, targeted                        - Complex, comprehensive
Use ambassador when:
- You need to proxy specific external connections (database, 3rd party API)
- You don't want the overhead of a full service mesh
- You have 5-20 services, not 200
Use service mesh when:
- You need mTLS between ALL services
- You need traffic management across the entire fleet
- You have 50+ services
Best Practices
- Keep the ambassador focused — an ambassador should handle connectivity concerns (retries, TLS, auth, circuit breaking), not business logic. If you're putting request transformation or routing logic in it, you might need an API gateway instead.
- Make the application unaware of the ambassador — the app connects to localhost:5432 and doesn't know whether it's talking to a local proxy or a real database. This keeps application code simple and the ambassador swappable.
- Share one ambassador image across all services — the whole point is consistency. If each team builds its own ambassador, you've recreated the original problem.
- Set resource limits on the sidecar — in Kubernetes, the ambassador sidecar consumes CPU and memory. Set resources.limits to prevent it from starving the main application.
- Health check the ambassador, not just the app — if the ambassador sidecar crashes, the application loses connectivity. Include ambassador health in your readiness probes.
- Centralize ambassador configuration — use ConfigMaps or a configuration service so all ambassadors get the same retry/timeout/circuit-breaker settings. Don't hardcode settings per deployment.
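The centralized-configuration practice might look like this in Kubernetes — a single ConfigMap (names here are illustrative) that every ambassador sidecar consumes, so retry and circuit-breaker settings change in one place:

```yaml
# One ConfigMap, shared by every service's ambassador sidecar
apiVersion: v1
kind: ConfigMap
metadata:
  name: ambassador-config
data:
  RETRY_MAX: "3"
  RETRY_BACKOFF_BASE_MS: "1000"
  CIRCUIT_BREAKER_THRESHOLD: "5"
  CIRCUIT_BREAKER_RESET_SECONDS: "30"
  TLS_ENABLED: "true"
---
# In each Deployment, replace the per-pod env list on the sidecar with:
#   - name: db-ambassador
#     image: db-ambassador:v1
#     envFrom:
#       - configMapRef:
#           name: ambassador-config
```

Rolling out a new retry policy then becomes a ConfigMap update plus a restart, instead of editing N deployment manifests.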
Step-by-Step Approach
Step 1: Identify the connectivity concern
├── Database connections needing TLS + auth + retries?
├── External API calls needing retry + circuit breaker?
└── Multiple services duplicating the same connectivity code?
Step 2: Choose or build the ambassador
├── Cloud database? → Use vendor proxy (Cloud SQL Proxy, RDS Proxy)
├── HTTP APIs? → Envoy sidecar with retry/circuit config
├── Custom protocol? → Build a lightweight proxy
└── Already have a service mesh? → Use it instead
Step 3: Deploy as a sidecar
├── Same pod as the application (shared network namespace)
├── Application connects to localhost:PORT
├── Ambassador forwards to upstream with resilience logic
└── Set resource limits and health checks
Step 4: Remove connectivity logic from application code
├── Remove retry logic → ambassador handles it
├── Remove TLS configuration → ambassador handles it
├── Remove auth token refresh → ambassador handles it
└── Application code becomes: connect to localhost, send request
Step 5: Add observability
├── Ambassador exports metrics (request count, latency, errors)
├── Dashboard: upstream health per service
├── Alert: circuit breaker open events
└── Logs: request traces with correlation IDs
Step 6: Standardize across fleet
├── One ambassador image, versioned and tested
├── Configuration via ConfigMap (not baked into image)
├── Upgrade path: roll out new ambassador version across all services
└── Monitor: ensure all services run the same ambassador version
Conclusion
The Ambassador pattern moves connectivity concerns — retries, circuit breaking, TLS, authentication, logging — out of application code and into a sidecar proxy. This gives you consistency (every service gets the same resilience logic), language independence (the Go service and the Python service use the same ambassador), and simplicity (application code just connects to localhost). Use vendor-provided proxies for cloud databases (Cloud SQL Proxy, RDS Proxy), Envoy for HTTP/gRPC external calls, or build a lightweight custom proxy for specialized protocols. The key decision: if you need these capabilities across ALL inter-service traffic, a service mesh is more appropriate. If you need them for specific external connections, the ambassador pattern is simpler and more targeted.