Async Processing System Design: Decoupling Work from Request-Response Cycles
Introduction
Async processing means moving work out of the synchronous request-response path. Instead of making the user wait while you send an email, resize an image, or update a search index, you acknowledge the request immediately and process the work in the background. This improves response times, enables retry logic, absorbs traffic spikes, and lets you scale producers and consumers independently.
The core pattern: the request handler enqueues a task and returns immediately. A separate worker process dequeues and processes it later — "later" being milliseconds to minutes depending on the queue depth.
When to Go Async
Synchronous (keep in request path):
✅ Reading data the user needs in the response
✅ Validating input
✅ Deducting payment (user needs confirmation)
✅ Anything the response depends on
Asynchronous (move out of request path):
✅ Sending emails and notifications
✅ Image/video processing
✅ Updating search indexes
✅ Generating reports or PDFs
✅ Syncing data to analytics systems
✅ Calling slow third-party APIs
✅ Any work that can tolerate seconds-to-minutes delay
Architecture Patterns
Simple Producer-Consumer
┌──────────┐ ┌───────────┐ ┌──────────┐
│ API │────▶│ Queue │────▶│ Worker │
│ Server │ │ (SQS/ │ │ Process │
│ │ │ RabbitMQ)│ │ │
│ Enqueue │ │ │ │ Dequeue │
│ task │ │ Buffered │ │ & process│
└──────────┘ └───────────┘ └──────────┘
API server returns 202 Accepted immediately.
Worker processes the task at its own pace.
Queue absorbs spikes — if 1000 requests arrive in 1 second,
the worker processes them at whatever rate it can sustain.
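The pattern above can be sketched in-process with the standard library. This is a minimal illustration, not a production setup: `queue.Queue` stands in for SQS or RabbitMQ, and the handler and worker names are invented for the example. The shape is the same — the "API" enqueues and returns, and a separate worker drains the queue at its own pace.

```python
# Minimal in-process sketch of the producer-consumer pattern.
# queue.Queue stands in for SQS/RabbitMQ; a thread stands in for
# a separate worker process.
import queue
import threading

task_queue = queue.Queue()
results = []

def api_handler(task):
    """Enqueue and return immediately -- the 202 Accepted path."""
    task_queue.put(task)
    return {"status": "accepted"}

def worker():
    """Dequeue and process at the worker's own pace."""
    while True:
        task = task_queue.get()
        if task is None:  # sentinel: shut down
            task_queue.task_done()
            break
        results.append(f"processed:{task}")
        task_queue.task_done()

t = threading.Thread(target=worker)
t.start()

# A burst of requests: all are acknowledged instantly, drained later.
for i in range(5):
    api_handler(f"task-{i}")

task_queue.put(None)  # stop the worker
task_queue.join()
t.join()
```

The key property to notice: `api_handler` never blocks on processing, only on the enqueue itself.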
Fan-Out for Parallel Processing
Single event → Multiple consumers process different aspects:
Order placed event
├──▶ Queue: email-notifications → Worker: Send confirmation email
├──▶ Queue: inventory-updates → Worker: Decrement stock
├──▶ Queue: analytics-events → Worker: Update dashboards
└──▶ Queue: search-index → Worker: Update product availability
Each consumer is independently scalable.
Email worker slow? Add more email workers. Analytics worker doesn't
affect email delivery.
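The fan-out step itself is simple: publish one event, and every consumer queue receives its own copy. A hedged in-process sketch, using the queue names from the diagram (real systems would use SNS→SQS, RabbitMQ exchanges, or Kafka topics):

```python
# Fan-out sketch: a single "order placed" event is copied onto several
# independent queues, one per concern. queue.Queue stands in for real
# message queues.
import queue

TOPICS = ["email-notifications", "inventory-updates",
          "analytics-events", "search-index"]
queues = {name: queue.Queue() for name in TOPICS}

def publish(event):
    """Fan out: every consumer queue gets its own copy of the event."""
    for q in queues.values():
        q.put(dict(event))  # copy, so consumers stay independent

publish({"event": "order_placed", "order_id": 42})

# Each worker drains only its own queue; a slow analytics consumer
# never delays the email consumer.
received = {name: q.get_nowait() for name, q in queues.items()}
```

Because each queue holds its own copy, backlog on one topic cannot block delivery on another — which is exactly the independence the diagram promises.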
Task with Status Tracking
import uuid
import json
from datetime import datetime

class TaskQueue:
    def __init__(self, queue_client, db):
        self.queue = queue_client
        self.db = db

    def submit(self, task_type, payload):
        """Submit a task and return a task ID for status polling."""
        task_id = str(uuid.uuid4())
        # Store task metadata for status queries
        self.db.execute(
            "INSERT INTO tasks (id, type, status, created_at) VALUES (%s,%s,%s,%s)",
            (task_id, task_type, "pending", datetime.utcnow())
        )
        # Enqueue the task
        self.queue.send_message(json.dumps({
            "task_id": task_id,
            "type": task_type,
            "payload": payload,
        }))
        return task_id

    def get_status(self, task_id):
        """Check task progress."""
        row = self.db.query("SELECT status, result, error FROM tasks WHERE id=%s", (task_id,))
        return row

# API handler
def create_report(request):
    task_id = task_queue.submit("generate_report", {
        "user_id": request.user_id,
        "date_range": request.json["date_range"],
    })
    # Return immediately with a task ID
    return {"task_id": task_id, "status_url": f"/tasks/{task_id}"}, 202

# Client polls for status:
# GET /tasks/abc-123 → {"status": "processing", "progress": 45}
# GET /tasks/abc-123 → {"status": "complete", "result_url": "/reports/abc-123"}
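The worker completes the loop by updating the same status record as it progresses. A hedged sketch of that half, with a plain dict standing in for the tasks table (a real worker would UPDATE the row that submit() inserted):

```python
# Worker side of the status-tracking pattern. task_status is a stand-in
# for the tasks table; the status transitions mirror what the polling
# client sees: pending -> processing -> complete.
import json

task_status = {}  # task_id -> {"status": ..., "result": ...}

def handle_message(raw):
    msg = json.loads(raw)
    task_id = msg["task_id"]
    task_status[task_id] = {"status": "processing", "result": None}
    # ... do the actual work (generate the report) ...
    result_url = f"/reports/{task_id}"
    task_status[task_id] = {"status": "complete", "result": result_url}

handle_message(json.dumps({
    "task_id": "abc-123",
    "type": "generate_report",
    "payload": {"user_id": 1},
}))
```

Writing "processing" before starting the work means the polling client never sees a stale "pending" for a task that is actually running.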
Message Queue Patterns
At-Least-Once Delivery
Most queues guarantee at-least-once delivery — your worker might receive the same message twice:
class Worker:
    def process_message(self, message):
        task = json.loads(message.body)
        # Idempotency check: have we already processed this?
        if self.db.exists("processed_tasks", task["task_id"]):
            message.delete()  # Already done, acknowledge and skip
            return
        # Process the task
        result = self.do_work(task)
        # Mark as processed (idempotency key) BEFORE deleting from queue
        self.db.insert("processed_tasks", {
            "task_id": task["task_id"],
            "completed_at": datetime.utcnow(),
        })
        # Acknowledge (delete from queue)
        message.delete()
Dead Letter Queue (DLQ)
When a message fails repeatedly, move it to a DLQ for investigation:
Main Queue → Worker attempts processing
→ Failure (3 times) → Dead Letter Queue
DLQ contains poison messages that couldn't be processed.
An operator reviews them, fixes the bug, and replays them.
AWS SQS configuration:
{
  "maxReceiveCount": 3,
  "deadLetterTargetArn": "arn:aws:sqs:us-east-1:123:my-queue-dlq"
}
After 3 failed attempts, message moves to DLQ automatically.
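For queues without built-in redrive, the same policy can be applied at the application level: count attempts per message and park it after the limit. A hedged sketch, with `queue.Queue` standing in for the main queue and DLQ and a deliberately failing handler simulating a poison message:

```python
# Application-level DLQ routing sketch. After MAX_ATTEMPTS failures the
# message is parked on a dead-letter queue instead of retrying forever.
import queue

MAX_ATTEMPTS = 3
main_q, dlq = queue.Queue(), queue.Queue()

def process(msg):
    raise ValueError("poison message")  # simulate a permanent failure

def consume_once():
    msg = main_q.get()
    try:
        process(msg)
        return  # success: message is consumed
    except Exception:
        msg["attempts"] += 1
        if msg["attempts"] >= MAX_ATTEMPTS:
            dlq.put(msg)       # park for operator review
        else:
            main_q.put(msg)    # requeue for another try

main_q.put({"body": "bad-payload", "attempts": 0})
while not main_q.empty():
    consume_once()
# The message is now on the DLQ with attempts == 3.
```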
Visibility Timeout and Acknowledgment
SQS model:
1. Worker receives message (message becomes invisible to other workers)
2. Visibility timeout = 30 seconds
3. Worker processes the message
4. Worker deletes the message (acknowledgment)
If worker crashes before deleting:
→ After 30 seconds, message becomes visible again
→ Another worker picks it up (at-least-once delivery)
If processing takes longer than 30 seconds:
→ Message becomes visible → ANOTHER worker picks it up
→ Now two workers process the same message!
Fix: Extend visibility timeout during processing
Or: Set visibility timeout > max expected processing time
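The "extend during processing" fix is usually a background heartbeat that periodically pushes the timeout out while the worker is busy. A hedged sketch: in a real SQS worker the `extend` callback would call `change_message_visibility` with the message's receipt handle; here it is an injected function so the loop is self-contained.

```python
# Heartbeat sketch: while the main thread processes a message, a
# background thread periodically extends its visibility timeout so no
# other worker picks it up mid-processing.
import threading
import time

class VisibilityHeartbeat:
    def __init__(self, extend, interval):
        self._extend = extend          # e.g. wraps change_message_visibility
        self._interval = interval      # extend well before the timeout expires
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Event.wait returns False on timeout, True once stop is set.
        while not self._stop.wait(self._interval):
            self._extend()

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, *exc):
        self._stop.set()
        self._thread.join()

extensions = []
with VisibilityHeartbeat(lambda: extensions.append(30), interval=0.05):
    time.sleep(0.2)  # stand-in for slow message processing
# The heartbeat fired several times while "processing" ran.
```

The context manager guarantees the heartbeat stops when processing finishes or raises, so a failed message becomes visible again for redelivery.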
Scaling Workers
Worker scaling strategy:
Scale based on queue depth and processing rate.
queue_depth = 10,000 messages
processing_rate = 100 messages/second/worker
target_drain_time = 60 seconds
workers_needed = queue_depth / (processing_rate × target_drain_time)
workers_needed = 10,000 / (100 × 60) ≈ 2 workers
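That sizing rule is a one-liner worth encoding, rounding up so the drain target is actually met (the function name is ours, not a standard API):

```python
import math

def workers_needed(queue_depth, per_worker_rate, target_drain_seconds):
    """Workers required to drain queue_depth within the target window,
    given each worker sustains per_worker_rate messages/second."""
    return math.ceil(queue_depth / (per_worker_rate * target_drain_seconds))

# The example above: 10,000 messages, 100 msg/s/worker, 60 s target -> 2.
print(workers_needed(10_000, 100, 60))
```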
Kubernetes HPA can scale on custom metrics:
- Queue depth (SQS ApproximateNumberOfMessages)
- Processing latency
- Consumer lag (Kafka)
# Kubernetes: scale workers based on SQS queue depth
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: task-worker
  minReplicas: 2
  maxReplicas: 50
  metrics:
  - type: External
    external:
      metric:
        name: sqs_queue_depth
        selector:
          matchLabels:
            queue: task-queue
      target:
        type: AverageValue
        averageValue: "100"  # Target 100 messages per worker
Async Processing Pitfalls
1. Fire-and-forget without retry
Task fails silently. Nobody knows. Data is lost.
Fix: DLQ, monitoring on queue depth and error rate.
2. No idempotency
Message delivered twice → user gets two emails, or payment charged twice.
Fix: Idempotency keys. Check before processing.
3. Queue as a database
Storing millions of messages for days → queue backs up → memory issues.
Fix: Process messages promptly. Use a database for long-term storage.
4. No backpressure
Producers enqueue faster than consumers can process → unbounded queue growth.
Fix: Rate-limit producers, auto-scale consumers, set queue size limits.
5. Ordering assumptions
SQS standard queues don't guarantee order.
Processing message 3 before message 1 can cause incorrect state.
Fix: Use FIFO queues when order matters, or design for out-of-order.
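The backpressure fix from pitfall 4 can be as simple as making the queue bounded and failing fast at the edge. A sketch, with `queue.Queue(maxsize=...)` standing in for a broker-side queue limit and invented handler names:

```python
# Producer-side backpressure sketch: a bounded queue rejects new work
# when full, so the API can shed load (429) instead of letting the
# backlog grow without limit.
import queue

task_queue = queue.Queue(maxsize=2)

def submit(task):
    try:
        task_queue.put_nowait(task)
        return 202  # accepted for async processing
    except queue.Full:
        return 429  # tell the caller to back off and retry later

codes = [submit(i) for i in range(3)]
```

Rejecting early is usually better than accepting work you cannot finish: the client gets an honest signal and can retry with backoff.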
Key Takeaways
- Move non-essential work out of the request path — return 202 Accepted immediately; process emails, reports, and indexing asynchronously
- Queues decouple producers and consumers — producers and consumers scale independently; the queue absorbs traffic spikes
- Design workers to be idempotent — at-least-once delivery means your worker WILL see duplicates; use idempotency keys to handle them safely
- Dead letter queues catch poison messages — after N failures, park the message for investigation instead of retrying forever
- Scale workers on queue depth — more messages waiting = more workers needed; use HPA with custom queue metrics
- Provide task status for long-running work — return a task ID and a status endpoint so clients can poll for completion instead of waiting
- Monitor queue depth and consumer lag — a growing queue means consumers can't keep up; alert before it becomes a problem