Ambassador Pattern System Design
Real-World Problem Context
Your application connects to a cloud-managed database. It needs connection retry logic with exponential backoff, TLS termination, circuit breaking when the database is unhealthy, and query logging for debugging. Every microservice in your fleet needs these same capabilities. Initially each team copies the same boilerplate. Then team A updates the retry logic but team B doesn't. Team C adds circuit breaking but team D's version has a bug. Six months later, you have 15 services with 15 slightly different database connectivity implementations, and nobody knows which version has the latest fixes.
The Ambassador pattern extracts cross-cutting connectivity concerns into a separate proxy process that runs alongside your application. Every service gets the same ambassador — same retries, same circuit breaking, same logging — without embedding that logic in application code.
Problem Statement
Distributed applications need resilient connectivity: retries, timeouts, circuit breaking, TLS, authentication, logging, and monitoring. These concerns are identical across services but get implemented inconsistently when embedded in application code. Different languages make it worse — your Go service and your Python service need the same retry logic written twice.
The core challenge: how do you provide consistent, language-agnostic connectivity features (retries, circuit breaking, TLS, observability) to all services without duplicating logic across every codebase?
Potential Solutions
1. Sidecar Ambassador for Database Connections
Run a proxy alongside your application that handles all database connectivity concerns:
Without ambassador: With ambassador:
┌──────────────────────┐ ┌──────────────────────┐
│ Application │ │ Application │
│ ┌────────────────┐ │ │ │
│ │ Retry logic │ │ │ connect to │
│ │ Circuit breaker│ │ │ localhost:5432 │
│ │ TLS config │ │ │ (plain TCP) │
│ │ Auth refresh │ │ └──────────┬───────────┘
│ │ Logging │ │ │ localhost
│ └────────┬───────┘ │ ┌──────────▼───────────┐
│ │ │ │ Ambassador Proxy │
└───────────┼───────────┘ │ ✓ Retry + backoff │
│ │ ✓ Circuit breaker │
│ TLS │ ✓ TLS termination │
▼ │ ✓ Auth token refresh │
┌──────────────────┐ │ ✓ Query logging │
│ Cloud Database │ │ ✓ Metrics export │
└──────────────────┘ └──────────┬───────────┘
│ TLS + Auth
▼
┌──────────────────┐
│ Cloud Database │
└──────────────────┘
Application thinks it's connecting to a local database.
Ambassador handles all the hard networking stuff.
# Kubernetes: ambassador as a sidecar container
apiVersion: apps/v1
kind: Deployment
metadata:
name: order-service
spec:
template:
spec:
containers:
# Main application
- name: order-service
image: order-service:v2
env:
- name: DB_HOST
value: "localhost" # Connects to ambassador, not DB directly
- name: DB_PORT
value: "5432"
# Ambassador sidecar
- name: db-ambassador
image: db-ambassador:v1
ports:
- containerPort: 5432
env:
- name: UPSTREAM_HOST
value: "mydb.us-east-1.rds.amazonaws.com"
- name: UPSTREAM_PORT
value: "5432"
- name: RETRY_MAX
value: "3"
- name: CIRCUIT_BREAKER_THRESHOLD
value: "5"
- name: TLS_ENABLED
value: "true"
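For custom protocols with no off-the-shelf proxy, the ambassador can be a small TCP forwarder. Below is a minimal sketch using only Python's standard library — the function names (`handle_client`, `pipe`, `run_ambassador`) and the retry policy are illustrative, not a production implementation, and TLS/auth are omitted:

```python
import asyncio

async def pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter):
    """Copy bytes from reader to writer until EOF, then half-close the write side."""
    while data := await reader.read(4096):
        writer.write(data)
        await writer.drain()
    if writer.can_write_eof():
        writer.write_eof()

async def handle_client(client_reader, client_writer,
                        upstream_host: str, upstream_port: int, max_retries: int = 3):
    """Connect to the upstream with exponential backoff, then relay both directions."""
    for attempt in range(max_retries):
        try:
            up_reader, up_writer = await asyncio.open_connection(upstream_host, upstream_port)
            break
        except OSError:
            if attempt == max_retries - 1:
                client_writer.close()  # give up: upstream unreachable
                return
            await asyncio.sleep(2 ** attempt)  # 1s, 2s, 4s...
    await asyncio.gather(pipe(client_reader, up_writer),
                         pipe(up_reader, client_writer))
    up_writer.close()
    client_writer.close()

async def run_ambassador(listen_port: int, upstream_host: str, upstream_port: int):
    """The application connects to localhost:listen_port; we forward upstream."""
    server = await asyncio.start_server(
        lambda r, w: handle_client(r, w, upstream_host, upstream_port),
        "127.0.0.1", listen_port)
    async with server:
        await server.serve_forever()
```

A real database ambassador would additionally wrap the upstream connection in TLS (e.g. via the `ssl` parameter of `asyncio.open_connection`) and refresh credentials — the forwarding and retry skeleton stays the same.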
2. Ambassador for External API Calls
Proxy outbound HTTP requests through an ambassador that adds auth, retries, and observability:
# Ambassador proxy (runs as sidecar on port 8081)
import asyncio
import time
from uuid import uuid4

import httpx
from fastapi import FastAPI, Request, Response
from fastapi.responses import JSONResponse

app = FastAPI()

# Circuit breaker state
circuit = {"failures": 0, "state": "closed", "last_failure": 0}

@app.api_route("/{path:path}", methods=["GET", "POST", "PUT", "DELETE"])
async def proxy(request: Request, path: str):
    upstream = request.headers.get("X-Upstream-Host", "api.payment-provider.com")
    url = f"https://{upstream}/{path}"

    # Circuit breaker check
    if circuit["state"] == "open":
        if time.time() - circuit["last_failure"] < 30:
            return JSONResponse({"error": "circuit open"}, status_code=503)
        circuit["state"] = "half-open"

    # Retry with backoff
    for attempt in range(3):
        try:
            # Add auth headers (get_fresh_token is your token-refresh helper)
            headers = dict(request.headers)
            headers["Authorization"] = f"Bearer {get_fresh_token()}"
            headers["X-Request-ID"] = request.headers.get("X-Request-ID", str(uuid4()))

            # Forward request
            start = time.time()
            async with httpx.AsyncClient() as client:
                response = await client.request(
                    method=request.method,
                    url=url,
                    headers=headers,
                    content=await request.body(),
                    timeout=10.0,
                )
            latency = time.time() - start

            # Metrics (metrics is your metrics client, e.g. a Prometheus wrapper)
            metrics.observe("ambassador_request_duration", latency,
                            labels={"upstream": upstream, "status": response.status_code})

            # Circuit breaker: reset on success
            circuit["failures"] = 0
            circuit["state"] = "closed"
            return Response(
                content=response.content,
                status_code=response.status_code,
                headers=dict(response.headers),
            )
        except (httpx.ConnectTimeout, httpx.ReadTimeout):
            circuit["failures"] += 1
            if circuit["failures"] >= 5:
                circuit["state"] = "open"
                circuit["last_failure"] = time.time()
            if attempt < 2:
                await asyncio.sleep(2 ** attempt)  # Exponential backoff
                continue
            raise
# Application code is simple — no retry/auth/circuit logic:
"""
response = requests.post(
"http://localhost:8081/v1/charges", # ← ambassador, not payment provider
headers={"X-Upstream-Host": "api.payment-provider.com"},
json={"amount": 9999, "currency": "usd"}
)
"""
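The inline circuit-breaker dict above works, but the closed → open → half-open state machine is easier to reason about (and to unit-test) as its own class. A sketch — the class name and defaults are illustrative, and the injectable `clock` exists only to make the reset timeout testable:

```python
import time

class CircuitBreaker:
    """Minimal three-state circuit breaker: closed -> open -> half-open."""

    def __init__(self, failure_threshold=5, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.state = "closed"
        self.opened_at = 0.0

    def allow_request(self) -> bool:
        """Gate each outbound call; open circuits reject until the timeout elapses."""
        if self.state == "open":
            if self.clock() - self.opened_at >= self.reset_timeout:
                self.state = "half-open"  # let one probe request through
                return True
            return False
        return True

    def record_success(self):
        self.failures = 0
        self.state = "closed"

    def record_failure(self):
        self.failures += 1
        # Open on threshold, or immediately if the half-open probe failed
        if self.failures >= self.failure_threshold or self.state == "half-open":
            self.state = "open"
            self.opened_at = self.clock()
```

In the proxy handler this replaces the `circuit` dict: call `allow_request()` before forwarding, `record_success()` after a good response, and `record_failure()` in the timeout handler.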
3. Cloud-Specific Ambassadors (Real-World Examples)
Google Cloud SQL Auth Proxy:
┌────────────┐ ┌───────────────────┐ ┌──────────────┐
│ Application │──────▶│ Cloud SQL Proxy │──────▶│ Cloud SQL DB │
│ (localhost │ │ (sidecar) │ │ (TLS + IAM) │
│ :5432) │ │ - IAM auth │ │ │
└────────────┘ │ - TLS encryption │ └──────────────┘
│ - Connection pool │
│ - Auto-reconnect │
└───────────────────┘
AWS RDS Proxy:
┌────────────┐ ┌───────────────────┐ ┌──────────────┐
│ Lambda / │──────▶│ RDS Proxy │──────▶│ RDS Database │
│ Application │ │ (managed service) │ │ │
└────────────┘ │ - Connection pool │ └──────────────┘
│ - IAM auth │
│ - Failover │
└───────────────────┘
Envoy as Ambassador:
┌────────────┐ ┌───────────────────┐ ┌──────────────┐
│ Application │──────▶│ Envoy sidecar │──────▶│ External APIs│
│ (localhost │ │ - mTLS │ │ │
│ :8081) │ │ - Retries │ └──────────────┘
└────────────┘ │ - Circuit breaker │
│ - Rate limiting │
│ - Load balancing │
│ - Observability │
└───────────────────┘
# Envoy ambassador configuration
static_resources:
listeners:
- name: local_listener
address:
socket_address:
address: 127.0.0.1
port_value: 8081
filter_chains:
- filters:
- name: envoy.filters.network.http_connection_manager
typed_config:
route_config:
virtual_hosts:
- name: payment_api
routes:
- match: { prefix: "/" }
route:
cluster: payment_upstream
retry_policy:
retry_on: "5xx,connect-failure"
num_retries: 3
timeout: 10s
clusters:
- name: payment_upstream
connect_timeout: 5s
type: STRICT_DNS
lb_policy: ROUND_ROBIN
circuit_breakers:
thresholds:
- max_connections: 100
max_pending_requests: 50
max_retries: 3
load_assignment:
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: api.payment-provider.com
port_value: 443
transport_socket:
name: envoy.transport_sockets.tls
Trade-offs & Considerations
Approach Complexity Resource Overhead Language Agnostic Consistency
──────────────────────────────────────────────────────────────────────────────────────────
Library (in-process) Low None No (per language) Varies
Ambassador sidecar Medium CPU + memory per pod Yes High
Managed proxy (RDS) Low Service cost Yes High
Service mesh High CPU + memory + CRDs Yes High
Ambassador vs. Service Mesh:
Ambassador:                               Service Mesh (Istio/Linkerd):
- Opt-in sidecar, per connection          - Auto-injected sidecar in every pod
- Hand-configured                         - Centrally managed (control plane)
- Specific to external calls              - Handles ALL inter-service traffic
- Simple, targeted                        - Complex, comprehensive
Use ambassador when:
- You need to proxy specific external connections (database, 3rd party API)
- You don't want the overhead of a full service mesh
- You have 5-20 services, not 200
Use service mesh when:
- You need mTLS between ALL services
- You need traffic management across the entire fleet
- You have 50+ services
Best Practices
- Keep the ambassador focused — an ambassador should handle connectivity concerns (retries, TLS, auth, circuit breaking), not business logic. If you're putting request transformation or routing logic in it, you might need an API gateway instead.
- Make the application unaware of the ambassador — the app connects to localhost:5432 and doesn't know whether it's talking to a local proxy or a real database. This keeps application code simple and the ambassador swappable.
- Share one ambassador image across all services — the whole point is consistency. If each team builds its own ambassador, you've recreated the original problem.
- Set resource limits on the sidecar — in Kubernetes, the ambassador sidecar consumes CPU and memory. Set resources.limits to prevent it from starving the main application.
- Health check the ambassador, not just the app — if the ambassador sidecar crashes, the application loses connectivity. Include ambassador health in your readiness probes.
- Centralize ambassador configuration — use ConfigMaps or a configuration service so all ambassadors get the same retry/timeout/circuit-breaker settings. Don't hardcode settings per deployment.
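The centralized-configuration practice might look like this in Kubernetes — a single ConfigMap (names here are illustrative) that every ambassador sidecar consumes, so retry and circuit-breaker settings change in one place:

```yaml
# One ConfigMap, shared by every service's ambassador sidecar
apiVersion: v1
kind: ConfigMap
metadata:
  name: ambassador-config
data:
  RETRY_MAX: "3"
  RETRY_BACKOFF_BASE_MS: "1000"
  CIRCUIT_BREAKER_THRESHOLD: "5"
  CIRCUIT_BREAKER_RESET_SECONDS: "30"
  TLS_ENABLED: "true"
---
# In each Deployment, replace the per-pod env list on the sidecar with:
#   - name: db-ambassador
#     image: db-ambassador:v1
#     envFrom:
#       - configMapRef:
#           name: ambassador-config
```

Rolling out a new retry policy then becomes a ConfigMap update plus a restart, instead of editing N deployment manifests.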
Step-by-Step Approach
Step 1: Identify the connectivity concern
├── Database connections needing TLS + auth + retries?
├── External API calls needing retry + circuit breaker?
└── Multiple services duplicating the same connectivity code?
Step 2: Choose or build the ambassador
├── Cloud database? → Use vendor proxy (Cloud SQL Proxy, RDS Proxy)
├── HTTP APIs? → Envoy sidecar with retry/circuit config
├── Custom protocol? → Build a lightweight proxy
└── Already have a service mesh? → Use it instead
Step 3: Deploy as a sidecar
├── Same pod as the application (shared network namespace)
├── Application connects to localhost:PORT
├── Ambassador forwards to upstream with resilience logic
└── Set resource limits and health checks
Step 4: Remove connectivity logic from application code
├── Remove retry logic → ambassador handles it
├── Remove TLS configuration → ambassador handles it
├── Remove auth token refresh → ambassador handles it
└── Application code becomes: connect to localhost, send request
Step 5: Add observability
├── Ambassador exports metrics (request count, latency, errors)
├── Dashboard: upstream health per service
├── Alert: circuit breaker open events
└── Logs: request traces with correlation IDs
Step 6: Standardize across fleet
├── One ambassador image, versioned and tested
├── Configuration via ConfigMap (not baked into image)
├── Upgrade path: roll out new ambassador version across all services
└── Monitor: ensure all services run the same ambassador version
Conclusion
The Ambassador pattern moves connectivity concerns — retries, circuit breaking, TLS, authentication, logging — out of application code and into a sidecar proxy. This gives you consistency (every service gets the same resilience logic), language independence (the Go service and the Python service use the same ambassador), and simplicity (application code just connects to localhost). Use vendor-provided proxies for cloud databases (Cloud SQL Proxy, RDS Proxy), Envoy for HTTP/gRPC external calls, or build a lightweight custom proxy for specialized protocols. The key decision: if you need these capabilities across ALL inter-service traffic, a service mesh is more appropriate. If you need them for specific external connections, the ambassador pattern is simpler and more targeted.