System Design & Architecture
Part 2 of 9
Designing for Scale: When to Use Microservices vs Monolith (And the Honest Tradeoffs)
The architecture decision that can make or break your engineering organization
The Uncomfortable Truth
Here's a confession that might surprise you: most systems that "needed" microservices didn't. And many monoliths that were "too slow" just needed better architecture, not more services.
I've seen teams spend two years migrating to microservices only to realize they traded one set of problems for a much harder set. I've also seen monoliths collapse under their own weight because teams refused to acknowledge they'd outgrown their architecture.
The honest truth? Both architectures can scale. Both can fail. The difference isn't the architecture itself—it's whether it matches your team's size, skills, and actual problems.
Let me share what I've learned from building both, failing at both, and eventually getting it right.
Understanding the Spectrum
First, let's kill a myth: it's not binary. There's a spectrum of architectural choices:
┌─────────────────────────────────────────────────────────────────────────────┐
│ THE ARCHITECTURE SPECTRUM │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Monolith ───────────────────────────────────────────────► Microservices │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Simple │ │ Modular │ │ Service- │ │ Mini- │ │ Micro- │ │
│ │ Monolith │ │ Monolith │ │ Oriented │ │ Services │ │ Services │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
│ │
│ 1 deployment 1 deployment 3-5 services 10-20 svc 50+ services │
│ tight coupling loose modules clear APIs per team per function │
│ 1-5 devs 5-20 devs 20-50 devs 50-100 dev 100+ devs │
│ │
│ Complexity: ●○○○○ ●●○○○ ●●●○○ ●●●●○ ●●●●● │
│ Ops burden: ●○○○○ ●○○○○ ●●○○○ ●●●○○ ●●●●● │
│ Autonomy: ●○○○○ ●●○○○ ●●●○○ ●●●●○ ●●●●● │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Most successful companies I've worked with land somewhere in the middle, not at the extremes.
The Monolith: Unfairly Maligned
What a Good Monolith Looks Like
A well-architected monolith isn't a "big ball of mud." It's a cohesive, modular system:
// A modular monolith structure
src/
├── modules/
│ ├── users/
│ │ ├── domain/ # Business logic
│ │ │ ├── entities/
│ │ │ ├── services/
│ │ │ └── events/
│ │ ├── application/ # Use cases
│ │ │ ├── commands/
│ │ │ └── queries/
│ │ ├── infrastructure/ # External concerns
│ │ │ ├── repositories/
│ │ │ └── adapters/
│ │ └── api/ # HTTP/GraphQL handlers
│ │ ├── routes/
│ │ └── middleware/
│ │
│ ├── orders/
│ │ └── ... (same structure)
│ │
│ ├── inventory/
│ │ └── ... (same structure)
│ │
│ └── payments/
│ └── ... (same structure)
│
├── shared/ # Cross-cutting concerns
│ ├── kernel/ # Shared domain primitives
│ ├── infrastructure/ # Common infra (db, cache)
│ └── utils/ # Pure utilities
│
└── app.ts # Composition root
Module Boundaries That Work
// modules/users/index.ts - The module's public API
// This is ALL other modules can access
export { UserService } from './application/user-service';
export { User, UserId } from './domain/entities/user';
export { UserCreatedEvent } from './domain/events';
// Types for cross-module communication
export interface UserReference {
id: UserId;
email: string;
name: string;
}
// The module facade - clean interface for other modules
export class UsersModule {
constructor(
private readonly userService: UserService,
private readonly eventBus: EventBus
) {}
async getUser(id: UserId): Promise<UserReference | null> {
const user = await this.userService.findById(id);
if (!user) return null;
return {
id: user.id,
email: user.email,
name: user.name
};
}
async createUser(data: CreateUserDto): Promise<UserReference> {
const user = await this.userService.create(data);
await this.eventBus.publish(new UserCreatedEvent(user));
return {
id: user.id,
email: user.email,
name: user.name
};
}
}
Enforcing Module Boundaries
// eslint-plugin-module-boundaries.ts
// Prevent modules from reaching into each other's internals
const moduleDirectories = [
'users', 'orders', 'inventory', 'payments'
];
module.exports = {
rules: {
'no-cross-module-imports': {
create(context) {
return {
ImportDeclaration(node) {
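const importPath = node.source.value;
// context.filename exists in ESLint 8.40+; older versions only have getFilename()
const currentFile = context.filename ?? context.getFilename();
const currentModule = moduleDirectories.find(
m => currentFile.includes(`/modules/${m}/`)
);
// Note: this matches alias or absolute paths that contain "/modules/<name>/".
// Relative imports such as '../../orders/domain/order' must be resolved first.
const importedModule = moduleDirectories.find(
m => importPath.includes(`/modules/${m}/`)
);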
if (currentModule && importedModule &&
currentModule !== importedModule) {
// Only allow importing from module's index.ts
if (!importPath.endsWith(`/modules/${importedModule}`) &&
!importPath.endsWith(`/modules/${importedModule}/index`)) {
context.report({
node,
message: `Cross-module import must use public API. ` +
`Import from 'modules/${importedModule}' instead.`
});
}
}
}
};
}
}
}
};
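The rule's core decision can be pulled out into a pure function, which makes it easy to unit test outside ESLint. This is a minimal sketch, not part of any published plugin: `MODULES`, `owningModule`, and `isAllowedImport` are hypothetical names, and it assumes import paths are absolute or alias-based so that they contain `/modules/<name>`.

```typescript
// Hypothetical module list; mirrors the directories used in the rule above.
const MODULES = ['users', 'orders', 'inventory', 'payments'];

// Returns the module a path belongs to, or undefined for paths outside modules/.
function owningModule(path: string): string | undefined {
  const name = /\/modules\/([^/]+)/.exec(path)?.[1];
  return name !== undefined && MODULES.indexOf(name) >= 0 ? name : undefined;
}

// True when `importPath` may be imported from `currentFile`.
// Assumes absolute or alias-style import paths that contain "/modules/<name>".
function isAllowedImport(currentFile: string, importPath: string): boolean {
  const from = owningModule(currentFile);
  const to = owningModule(importPath);
  // Imports inside one module, or involving non-module code, are not restricted.
  if (!from || !to || from === to) return true;
  // Across modules, only the public entry point (module root or its index) is allowed.
  return new RegExp(`/modules/${to}(/index)?$`).test(importPath);
}
```

Keeping the decision separate from the AST visitor means the same check could also back a CI script or a dependency-graph test.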
The Real Advantages of Monoliths
┌─────────────────────────────────────────────────────────────────────────────┐
│ MONOLITH ADVANTAGES (HONEST) │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ✓ Simplicity │
│ • One codebase to understand │
│ • One deployment pipeline │
│ • One place to debug │
│ • IDE "go to definition" works across everything │
│ │
│ ✓ Performance │
│ • In-process function calls (nanoseconds vs milliseconds) │
│ • No network serialization overhead │
│ • Easy to optimize hot paths │
│ • Shared connection pools │
│ │
│ ✓ Data Consistency │
│ • ACID transactions across all data │
│ • No distributed transaction complexity │
│ • Referential integrity via foreign keys │
│ • Easy to maintain data invariants │
│ │
│ ✓ Development Speed (for small teams) │
│ • One PR = one feature │
│ • Easy refactoring across boundaries │
│ • No service version compatibility issues │
│ • Simple local development │
│ │
│ ✓ Operational Simplicity │
│ • One thing to monitor │
│ • One thing to scale │
│ • One log stream │
│ • Simpler infrastructure │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
The Real Problems With Monoliths
When They Actually Break Down
┌─────────────────────────────────────────────────────────────────────────────┐
│ WHEN MONOLITHS STRUGGLE │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Team Scale Problems │
│ ├── 30+ engineers = merge conflicts daily │
│ ├── Release trains become complex │
│ ├── Testing time grows non-linearly │
│ └── Code ownership becomes unclear │
│ │
│ Deployment Bottlenecks │
│ ├── All-or-nothing deploys │
│ ├── One bug blocks all features │
│ ├── Long build/test cycles (30+ min) │
│ └── Rollback affects everything │
│ │
│ Scaling Constraints │
│ ├── Can't scale components independently │
│ ├── Memory-hungry features affect everything │
│ ├── CPU-intensive tasks can't be isolated │
│ └── Different SLAs hard to maintain │
│ │
│ Technology Lock-in │
│ ├── Stuck with one language/framework │
│ ├── Library upgrades are all-or-nothing │
│ ├── Can't adopt better tools for specific problems │
│ └── Technical debt accumulates │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Scaling a Monolith Before Splitting
Before jumping to microservices, try these first:
// 1. Horizontal scaling with load balancing
// Works for most read-heavy workloads
# nginx.conf (the upstream block goes inside the http block)
upstream monolith {
least_conn;
server app1:3000 weight=1;
server app2:3000 weight=1;
server app3:3000 weight=1;
server app4:3000 weight=1;
}
// 2. Read replicas for database
// config/database.ts
import { Pool } from 'pg';
const readPool = new Pool({
host: process.env.DB_READ_REPLICA,
max: 50,
});
const writePool = new Pool({
host: process.env.DB_PRIMARY,
max: 20,
});
export class Database {
query(sql: string, params: any[], readonly = false) {
const pool = readonly ? readPool : writePool;
return pool.query(sql, params);
}
}
// 3. Background job processing
// Extract heavy work from request path
import { Queue, Worker } from 'bullmq';
const emailQueue = new Queue('emails');
const reportQueue = new Queue('reports');
// API endpoint - fast response
app.post('/api/reports', async (req, res) => {
const job = await reportQueue.add('generate', {
userId: req.user.id,
params: req.body
});
res.json({
status: 'processing',
jobId: job.id
});
});
// 4. Caching layer
import Redis from 'ioredis';
class CacheLayer {
constructor(private redis: Redis) {}
async cached<T>(
key: string,
ttlSeconds: number,
factory: () => Promise<T>
): Promise<T> {
const cached = await this.redis.get(key);
if (cached) return JSON.parse(cached);
const value = await factory();
await this.redis.setex(key, ttlSeconds, JSON.stringify(value));
return value;
}
}
// 5. Database query optimization
// Often the biggest win before any architecture change
class OrderRepository {
constructor(private db: Database) {}
async getOrdersWithDetails(userId: string) {
// Before: N+1 queries
// After: Single optimized query with proper indexes
return this.db.query(`
SELECT
o.*,
json_agg(DISTINCT oi.*) as items,
json_agg(DISTINCT p.*) as products
FROM orders o
LEFT JOIN order_items oi ON oi.order_id = o.id
LEFT JOIN products p ON p.id = oi.product_id
WHERE o.user_id = $1
GROUP BY o.id
ORDER BY o.created_at DESC
`, [userId], true); // readonly = true routes the query to the read replica
}
}
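One gap in the `cached` helper from step 4: when a hot key expires, every concurrent request misses at once and runs `factory` in parallel. A common mitigation is to share one in-flight promise per key. The sketch below uses an in-memory `Map` instead of Redis so that it runs on its own; `SingleFlightCache` and the injectable `now` clock are illustrative names, not from any library.

```typescript
// Minimal in-memory sketch of "single-flight" caching: concurrent misses for
// the same key share one factory call. A Map stands in for Redis here.
class SingleFlightCache {
  private values = new Map<string, { value: unknown; expiresAt: number }>();
  private inFlight = new Map<string, Promise<unknown>>();

  // `now` is injectable so expiry can be tested without real timers.
  constructor(private now: () => number = () => Date.now()) {}

  async cached<T>(
    key: string,
    ttlSeconds: number,
    factory: () => Promise<T>
  ): Promise<T> {
    const hit = this.values.get(key);
    if (hit && hit.expiresAt > this.now()) return hit.value as T;

    // Reuse a load that is already running for this key.
    const pending = this.inFlight.get(key);
    if (pending) return pending as Promise<T>;

    const load = factory().then(
      value => {
        this.values.set(key, { value, expiresAt: this.now() + ttlSeconds * 1000 });
        this.inFlight.delete(key);
        return value;
      },
      error => {
        // Failures are not cached; the next caller retries the factory.
        this.inFlight.delete(key);
        throw error;
      }
    );
    this.inFlight.set(key, load);
    return load;
  }
}
```

This only deduplicates within one process. With several app instances behind the load balancer, each instance can still trigger one load per expired key, which is usually acceptable.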
Microservices: The Honest Assessment
What You're Actually Signing Up For
┌─────────────────────────────────────────────────────────────────────────────┐
│ MICROSERVICES REALITY CHECK │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ What They Promise │ What They Actually Require │
│ ────────────────────────────────────────────────────────────────────── │
│ Independent deployments │ Versioned APIs, backward compatibility │
│ Team autonomy │ Clear ownership, contract testing │
│ Scale independently │ Service mesh, load balancers per service │
│ Technology diversity │ Polyglot ops expertise, more tooling │
│ Fault isolation │ Circuit breakers, bulkheads, retries │
│ Faster development │ Only after ~6 months of infrastructure │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ New Problems You'll Have: │
│ • Distributed tracing (why is this request slow?) │
│ • Service discovery (where is service X?) │
│ • Data consistency (saga patterns, eventual consistency) │
│ • Network failures (timeouts, retries, idempotency) │
│ • Testing across services (integration, contract, e2e) │
│ • Local development (running 15 services on laptop) │
│ • Debugging (logs across 10 services) │
│ • Deployment orchestration (what order? health checks?) │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
The Infrastructure Tax
Before writing your first microservice, you need:
// 1. Service discovery
// services/registry.ts
interface ServiceRegistry {
register(service: ServiceInfo): Promise<void>;
discover(serviceName: string): Promise<ServiceInstance[]>;
healthCheck(instance: ServiceInstance): Promise<boolean>;
}
// Consul, etcd, or Kubernetes DNS
// 2. API Gateway
// gateway/routes.ts
const routes = {
'/api/users/*': {
service: 'user-service',
rateLimit: { requests: 100, window: '1m' },
auth: 'required'
},
'/api/orders/*': {
service: 'order-service',
rateLimit: { requests: 50, window: '1m' },
auth: 'required'
},
'/api/products/*': {
service: 'product-service',
rateLimit: { requests: 200, window: '1m' },
auth: 'optional'
}
};
// 3. Distributed tracing
// instrumentation/tracing.ts
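import { trace, context, propagation, SpanStatusCode } from '@opentelemetry/api';
export function tracedFetch(
url: string,
options: RequestInit = {}
): Promise<Response> {
const span = trace.getTracer('http').startSpan(`HTTP ${options.method ?? 'GET'}`);
// Assumes headers are passed as a plain object
const headers: Record<string, string> = {
...(options.headers as Record<string, string> | undefined)
};
// Inject trace headers (traceparent/tracestate) via the globally registered
// propagator, which is W3C Trace Context by default in the OpenTelemetry SDK
propagation.inject(trace.setSpan(context.active(), span), headers);
return fetch(url, { ...options, headers })
.then(response => {
// SpanStatusCode is UNSET = 0, OK = 1, ERROR = 2, so use the enum, not raw numbers
span.setStatus({ code: response.ok ? SpanStatusCode.OK : SpanStatusCode.ERROR });
return response;
})
.catch(error => {
span.setStatus({ code: SpanStatusCode.ERROR, message: String(error) });
throw error;
})
.finally(() => span.end());
}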
// 4. Circuit breaker
// resilience/circuit-breaker.ts
class CircuitOpenError extends Error {}
class CircuitBreaker {
private failures = 0;
private lastFailure?: Date;
private state: 'closed' | 'open' | 'half-open' = 'closed';
constructor(
private threshold: number = 5,
private resetTimeout: number = 30000
) {}
async execute<T>(fn: () => Promise<T>): Promise<T> {
if (this.state === 'open') {
if (Date.now() - this.lastFailure!.getTime() > this.resetTimeout) {
this.state = 'half-open';
} else {
throw new CircuitOpenError();
}
}
try {
const result = await fn();
this.onSuccess();
return result;
} catch (error) {
this.onFailure();
throw error;
}
}
private onSuccess() {
this.failures = 0;
this.state = 'closed';
}
private onFailure() {
this.failures++;
this.lastFailure = new Date();
if (this.failures >= this.threshold) {
this.state = 'open';
}
}
}
// 5. Centralized logging
// logging/structured.ts
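import pino from 'pino';
import { trace, context } from '@opentelemetry/api';
const logger = pino({
formatters: {
level: (label) => ({ level: label }),
},
mixin() {
return {
service: process.env.SERVICE_NAME,
version: process.env.SERVICE_VERSION,
// Read the trace ID from the active span; undefined outside a traced request
traceId: trace.getSpan(context.active())?.spanContext().traceId
};
}
});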
// Ship to centralized logging (ELK, Datadog, etc.)
// 6. Health checks
// health/checks.ts
app.get('/health', async (req, res) => {
const checks = await Promise.allSettled([
checkDatabase(),
checkRedis(),
checkDependentServices()
]);
const healthy = checks.every(c => c.status === 'fulfilled');
res.status(healthy ? 200 : 503).json({
status: healthy ? 'healthy' : 'unhealthy',
checks: checks.map((c, i) => ({
name: ['database', 'redis', 'dependencies'][i],
status: c.status
}))
});
});
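The circuit breaker in step 4 is easier to trust once its closed, open, and half-open transitions have been exercised without waiting 30 seconds. The sketch below restates the same state machine in compact form with an injectable clock. `TestableBreaker` and the `now` parameter are illustrative additions, and unlike the listing above it records the time the circuit opened rather than the time of the last failure.

```typescript
type BreakerState = 'closed' | 'open' | 'half-open';

class TestableBreaker {
  state: BreakerState = 'closed';
  private failures = 0;
  private openedAt = 0;

  constructor(
    private threshold: number,
    private resetTimeoutMs: number,
    private now: () => number = () => Date.now() // injectable for tests
  ) {}

  async execute<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === 'open') {
      if (this.now() - this.openedAt < this.resetTimeoutMs) {
        throw new Error('circuit open'); // fail fast without calling fn
      }
      this.state = 'half-open'; // timeout elapsed: allow one trial call
    }
    try {
      const result = await fn();
      this.failures = 0;
      this.state = 'closed';
      return result;
    } catch (error) {
      this.failures++;
      // A failed trial call in half-open reopens the circuit immediately.
      if (this.state === 'half-open' || this.failures >= this.threshold) {
        this.state = 'open';
        this.openedAt = this.now();
      }
      throw error;
    }
  }
}
```

With a fake clock, a unit test can open the circuit, confirm that calls are rejected without reaching the dependency, advance time, and confirm that one successful trial call closes it again.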
Data Management Complexity
This is where microservices get hard:
// The Challenge: Cross-service transactions
// Scenario: Create order (Order Service) + Reserve inventory (Inventory Service)
// + Charge payment (Payment Service)
// Solution: Saga Pattern (Choreography)
// order-service/events.ts
class OrderService {
async createOrder(data: CreateOrderDto): Promise<Order> {
const order = await this.repository.create({
...data,
status: 'pending'
});
// Publish event to start the saga
await this.eventBus.publish(new OrderCreatedEvent({
orderId: order.id,
items: order.items,
userId: order.userId,
total: order.total
}));
return order;
}
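// Handle saga completion/failure
// (A handler for InventoryReservationFailedEvent is also needed so the order
// does not stay 'pending' forever; it is omitted here for brevity)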
@Subscribe(PaymentCompletedEvent)
async onPaymentCompleted(event: PaymentCompletedEvent) {
await this.repository.update(event.orderId, {
status: 'confirmed',
paymentId: event.paymentId
});
await this.eventBus.publish(new OrderConfirmedEvent({
orderId: event.orderId
}));
}
@Subscribe(PaymentFailedEvent)
async onPaymentFailed(event: PaymentFailedEvent) {
await this.repository.update(event.orderId, {
status: 'failed',
failureReason: event.reason
});
// Trigger compensating transactions
await this.eventBus.publish(new OrderCancelledEvent({
orderId: event.orderId,
reason: 'payment_failed'
}));
}
}
// inventory-service/events.ts
class InventoryService {
@Subscribe(OrderCreatedEvent)
async onOrderCreated(event: OrderCreatedEvent) {
try {
const reservation = await this.reserveStock(
event.items,
event.orderId
);
await this.eventBus.publish(new InventoryReservedEvent({
orderId: event.orderId,
reservationId: reservation.id
}));
} catch (error) {
await this.eventBus.publish(new InventoryReservationFailedEvent({
orderId: event.orderId,
reason: error instanceof Error ? error.message : String(error)
}));
}
}
// Compensating transaction
@Subscribe(OrderCancelledEvent)
async onOrderCancelled(event: OrderCancelledEvent) {
await this.releaseReservation(event.orderId);
}
}
┌─────────────────────────────────────────────────────────────────────────────┐
│ SAGA PATTERN FLOW │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Happy Path: │
│ ┌─────────┐ ┌───────────┐ ┌─────────┐ ┌─────────┐ │
│ │ Order │───►│ Inventory │───►│ Payment │───►│ Order │ │
│ │ Created │ │ Reserved │ │ Charged │ │Confirmed│ │
│ └─────────┘ └───────────┘ └─────────┘ └─────────┘ │
│ │
│ Failure Path (Payment fails): │
│ ┌─────────┐ ┌───────────┐ ┌─────────┐ │
│ │ Order │───►│ Inventory │───►│ Payment │ │
│ │ Created │ │ Reserved │ │ Failed │ │
│ └─────────┘ └───────────┘ └────┬────┘ │
│ │ │
│ Compensating ▼ │
│ ◄──────────────────────────────────── │
│ │ │
│ ┌─────────┐ ┌───────────┐ ┌─────────┐ │
│ │ Order │◄───│ Inventory │◄───│ Order │ │
│ │ Failed │ │ Released │ │Cancelled│ │
│ └─────────┘ └───────────┘ └─────────┘ │
│ │
│ Key Challenge: Every step needs a compensating action │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
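The "every step needs a compensating action" rule can be made concrete with a small runner that records completed steps and undoes them in reverse order when a later step fails. Note that this sketch is orchestration-style (one coordinator) rather than the event choreography shown above, chosen only because it fits in one file. `SagaStep`, `SagaResult`, and `runSaga` are hypothetical names.

```typescript
interface SagaStep {
  name: string;
  action: () => Promise<void>;
  compensate: () => Promise<void>; // undoes `action` after it has succeeded
}

interface SagaResult {
  ok: boolean;
  completed: string[];   // steps whose action succeeded, in order
  compensated: string[]; // steps that were undone, in undo order
  failedStep?: string;
}

async function runSaga(steps: SagaStep[]): Promise<SagaResult> {
  const done: SagaStep[] = [];
  for (const step of steps) {
    try {
      await step.action();
      done.push(step);
    } catch (error) {
      const compensated: string[] = [];
      // Undo in reverse order: the most recent side effect is rolled back first.
      for (const finished of done.slice().reverse()) {
        await finished.compensate();
        compensated.push(finished.name);
      }
      return {
        ok: false,
        completed: done.map(s => s.name),
        compensated,
        failedStep: step.name
      };
    }
  }
  return { ok: true, completed: done.map(s => s.name), compensated: [] };
}
```

The sketch assumes compensations always succeed. In a real system they can fail too, so they need to be idempotent and retried until they complete.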
The Modular Monolith: Best of Both Worlds?
This is my recommended starting point for most teams:
// Structure: Monolith deployment, microservice architecture
src/
├── modules/
│ ├── users/
│ ├── orders/
│ ├── inventory/
│ └── payments/
├── shared/
└── infrastructure/
├── messaging/ # In-process event bus (easy to switch to real queue)
├── database/ # Shared connection, but logical separation
└── http/ # Single entry point
// Messaging abstraction that works in-process AND distributed
// infrastructure/messaging/event-bus.ts
interface EventBus {
publish<T extends DomainEvent>(event: T): Promise<void>;
subscribe<T extends DomainEvent>(
eventType: new (...args: any[]) => T,
handler: (event: T) => Promise<void>
): void;
}
// In-process implementation (start here)
class InProcessEventBus implements EventBus {
private handlers = new Map<string, Function[]>();
async publish<T extends DomainEvent>(event: T): Promise<void> {
const handlers = this.handlers.get(event.constructor.name) || [];
await Promise.all(handlers.map(h => h(event)));
}
subscribe<T extends DomainEvent>(
eventType: new (...args: any[]) => T,
handler: (event: T) => Promise<void>
): void {
const name = eventType.name;
const existing = this.handlers.get(name) || [];
this.handlers.set(name, [...existing, handler]);
}
}
// Distributed implementation (switch when ready)
class RabbitMQEventBus implements EventBus {
constructor(private channel: Channel) {}
async publish<T extends DomainEvent>(event: T): Promise<void> {
await this.channel.publish(
'domain_events',
event.constructor.name,
Buffer.from(JSON.stringify(event))
);
}
subscribe<T extends DomainEvent>(
eventType: new (...args: any[]) => T,
handler: (event: T) => Promise<void>
): void {
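const queue = `${process.env.SERVICE_NAME}.${eventType.name}`;
// Declare the exchange before binding to it; binding to a missing exchange fails
this.channel.assertExchange('domain_events', 'topic', { durable: true });
this.channel.assertQueue(queue, { durable: true });
this.channel.bindQueue(queue, 'domain_events', eventType.name);
this.channel.consume(queue, async (msg) => {
if (!msg) return;
try {
// Note: the handler receives a plain parsed object, not a class instance
await handler(JSON.parse(msg.content.toString()));
this.channel.ack(msg);
} catch (error) {
// Reject without requeue so a poison message cannot loop forever;
// configure a dead-letter exchange on the queue to keep failed messages
this.channel.nack(msg, false, false);
}
});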
}
}
// Database: Logical separation within same database
// Each module owns its tables via schema/prefix
// modules/users/infrastructure/repository.ts
class UserRepository {
constructor(private db: Database) {}
// Only accesses users_* tables
async findById(id: string): Promise<User | null> {
// Assumes a pg-style result object whose matching rows are in `rows`
const { rows } = await this.db.query(
'SELECT * FROM users_accounts WHERE id = $1',
[id]
);
return rows[0] ? User.fromRow(rows[0]) : null;
}
}
// modules/orders/infrastructure/repository.ts
class OrderRepository {
constructor(private db: Database) {}
// Only accesses orders_* tables
// References users by ID, not by join
async findByUserId(userId: string): Promise<Order[]> {
// Assumes a pg-style result object whose matching rows are in `rows`
const { rows } = await this.db.query(
'SELECT * FROM orders_orders WHERE user_id = $1',
[userId]
);
return rows.map(row => Order.fromRow(row));
}
}
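Because the orders module references users by ID rather than joining across tables, combining the two happens in application code through the users module's public API. Here is a minimal sketch with in-memory stand-ins. `UsersApi`, `OrdersQuery`, and `OrderSummary` are hypothetical names, and the batched `getUsers` method is an assumption added to avoid one lookup per order.

```typescript
interface UserReference { id: string; email: string; name: string; }
interface OrderRow { id: string; userId: string; total: number; }
interface OrderSummary extends OrderRow { customerName: string; }

// The only surface of the users module that orders is allowed to depend on.
interface UsersApi {
  getUsers(ids: string[]): Promise<UserReference[]>;
}

class OrdersQuery {
  constructor(private orders: OrderRow[], private users: UsersApi) {}

  async listWithCustomers(): Promise<OrderSummary[]> {
    // Collect distinct user IDs so the users module is asked once, not per order.
    const ids = this.orders
      .map(o => o.userId)
      .filter((id, i, all) => all.indexOf(id) === i);

    const byId = new Map<string, UserReference>();
    (await this.users.getUsers(ids)).forEach(u => byId.set(u.id, u));

    // A missing user is tolerated: there is no foreign key between the modules.
    return this.orders.map(o => {
      const user = byId.get(o.userId);
      return { ...o, customerName: user ? user.name : 'unknown' };
    });
  }
}
```

The cost of giving up the join is visible here: the code has to handle a user that no longer exists, which a foreign key would have ruled out. The payoff is that `UsersApi` can later be backed by an HTTP client without changing `OrdersQuery`.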
Migration Path: Modular Monolith to Microservices
┌─────────────────────────────────────────────────────────────────────────────┐
│ EXTRACTION PATH: MODULAR MONOLITH → MICROSERVICES │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Phase 1: Module Isolation (You Are Here) │
│ ┌────────────────────────────────────────┐ │
│ │ Monolith │ │
│ │ ┌────────┐ ┌────────┐ ┌────────┐ │ │
│ │ │ Users │ │ Orders │ │Payments│ │ Single deployment │
│ │ └────────┘ └────────┘ └────────┘ │ In-process events │
│ │ Shared Database │ Same DB, different schemas │
│ └────────────────────────────────────────┘ │
│ │
│ Phase 2: Database Separation │
│ ┌────────────────────────────────────────┐ │
│ │ Monolith │ │
│ │ ┌────────┐ ┌────────┐ ┌────────┐ │ │
│ │ │ Users │ │ Orders │ │Payments│ │ Still single deployment │
│ │ └───┬────┘ └───┬────┘ └───┬────┘ │ In-process events │
│ └──────┼──────────┼──────────┼──────────┘ Separate databases │
│ ▼ ▼ ▼ │
│ [DB:U] [DB:O] [DB:P] │
│ │
│ Phase 3: Event Externalization │
│ ┌────────────────────────────────────────┐ │
│ │ Monolith │ │
│ │ ┌────────┐ ┌────────┐ ┌────────┐ │ │
│ │ │ Users │ │ Orders │ │Payments│ │ Still single deployment │
│ │ └───┬────┘ └───┬────┘ └───┬────┘ │ External message queue │
│ └──────┼──────────┼──────────┼──────────┘ Can extract any module now │
│ │ │ │ │
│ └──────────┼──────────┘ │
│ ▼ │
│ [Message Queue] │
│ │
│ Phase 4: Extract First Service │
│ │
│ ┌─────────────────────────┐ ┌──────────┐ │
│ │ Monolith │ │ Payments │ ◄── Extracted │
│ │ ┌────────┐ ┌────────┐ │ │ Service │ │
│ │ │ Users │ │ Orders │ │ └────┬─────┘ │
│ │ └───┬────┘ └───┬────┘ │ │ │
│ └──────┼──────────┼──────┘ │ │
│ │ │ │ │
│ └──────────┼──────────────────┘ │
│ ▼ │
│ [Message Queue] │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
// Phase 2: Database per module (still single deployment)
// configuration/databases.ts
export const databases = {
users: new Database({
connectionString: process.env.USERS_DB_URL,
schema: 'users'
}),
orders: new Database({
connectionString: process.env.ORDERS_DB_URL,
schema: 'orders'
}),
payments: new Database({
connectionString: process.env.PAYMENTS_DB_URL,
schema: 'payments'
})
};
// Phase 3: Swap event bus implementation
// configuration/event-bus.ts
export function createEventBus(): EventBus {
if (process.env.EVENT_BUS === 'rabbitmq') {
return new RabbitMQEventBus(createRabbitMQConnection());
}
return new InProcessEventBus();
}
// Phase 4: Extract module to separate service
// This module is now its own deployable
// payments-service/app.ts
import express from 'express';
const app = express();
const eventBus = new RabbitMQEventBus(connection);
const paymentsModule = new PaymentsModule(database, eventBus);
// Same code, different deployment
app.use('/api/payments', paymentsModule.routes);
app.listen(process.env.PORT);
The Decision Framework
Primary Decision Factors
┌─────────────────────────────────────────────────────────────────────────────┐
│ ARCHITECTURE DECISION TREE │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ How many engineers? │
│ │ │
│ ┌───────────────┼───────────────┐ │
│ ▼ ▼ ▼ │
│ < 10 10-50 > 50 │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ Monolith Modular Consider │
│ (definitely) Monolith Microservices │
│ │ │ │
│ │ │ │
│ ▼ ▼ │
│ ┌─────────────────────────────┐ │
│ │ Do you have these skills? │ │
│ │ • Distributed systems │ │
│ │ • DevOps/Platform team │ │
│ │ • Observability expertise │ │
│ └─────────────┬───────────────┘ │
│ │ │
│ ┌─────────────┼─────────────┐ │
│ ▼ ▼ ▼ │
│ No Some Yes │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ Modular Start with Microservices │
│ Monolith Modular, could work │
│ extract later │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Detailed Assessment Matrix
// Self-assessment tool
interface ArchitectureFactors {
teamSize: number;
deploymentFrequency: 'daily' | 'weekly' | 'monthly';
domainComplexity: 'low' | 'medium' | 'high';
scalingNeeds: 'uniform' | 'varied';
teamDistribution: 'colocated' | 'remote' | 'distributed-orgs';
opsMaturity: 'low' | 'medium' | 'high';
dataConsistency: 'strict' | 'eventual-ok';
}
function recommendArchitecture(factors: ArchitectureFactors): string {
let monolithScore = 0;
let microservicesScore = 0;
// Team size
if (factors.teamSize < 10) monolithScore += 3;
else if (factors.teamSize < 30) monolithScore += 1;
else if (factors.teamSize < 60) microservicesScore += 1;
else microservicesScore += 3;
// Deployment frequency
if (factors.deploymentFrequency === 'monthly') monolithScore += 1;
if (factors.deploymentFrequency === 'daily') microservicesScore += 2;
// Domain complexity
if (factors.domainComplexity === 'high') microservicesScore += 1;
// Scaling needs
if (factors.scalingNeeds === 'varied') microservicesScore += 2;
else monolithScore += 1;
// Team distribution
if (factors.teamDistribution === 'distributed-orgs') microservicesScore += 2;
if (factors.teamDistribution === 'colocated') monolithScore += 1;
// Ops maturity
if (factors.opsMaturity === 'low') monolithScore += 3;
if (factors.opsMaturity === 'high') microservicesScore += 1;
// Data consistency
if (factors.dataConsistency === 'strict') monolithScore += 2;
if (factors.dataConsistency === 'eventual-ok') microservicesScore += 1;
// Decision
const diff = microservicesScore - monolithScore;
if (diff < -2) return 'Simple Monolith';
if (diff < 2) return 'Modular Monolith';
if (diff < 5) return 'Service-Oriented Architecture (3-7 services)';
return 'Microservices (with strong platform investment)';
}
Real-World Examples
When a Monolith Was Right
┌─────────────────────────────────────────────────────────────────────────────┐
│ CASE STUDY: B2B SaaS Product │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Situation: │
│ • Team size: 15 engineers │
│ • Product: Enterprise project management tool │
│ • Traffic: ~100 req/sec average, 500 req/sec peak │
│ • Revenue: $5M ARR, growing 50% YoY │
│ │
│ Initial Push for Microservices: │
│ • VP of Engineering from "microservices-native" company │
│ • Concern about "scaling for the future" │
│ • Desire to "modernize" the architecture │
│ │
│ What We Did Instead: │
│ • Stayed with monolith │
│ • Invested in modular architecture │
│ • Added read replicas for database │
│ • Implemented background job processing │
│ • Proper caching layer │
│ │
│ Results After 2 Years: │
│ • Team grew to 35, still monolith (now modular) │
│ • Revenue $18M ARR │
│ • 10 deploys/day average │
│ • P99 latency < 100ms │
│ • Infrastructure cost: $15K/month │
│ │
│ Key Insight: The "scaling problems" were actually code problems │
│ that microservices would have hidden, not solved. │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
When Microservices Made Sense
┌─────────────────────────────────────────────────────────────────────────────┐
│ CASE STUDY: Consumer Marketplace │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Situation: │
│ • Team size: 150 engineers across 20 teams │
│ • Product: Two-sided marketplace (buyers + sellers) │
│ • Traffic: 10,000 req/sec average, 100K+ during sales │
│ • 8 distinct business domains with different SLAs │
│ │
│ Problems with Monolith: │
│ • Deploy queue was 2 days long │
│ • One team's bug blocked everyone │
│ • Search indexing CPU spikes affected checkout │
│ • Test suite took 3 hours │
│ • Database schema conflicts between teams │
│ │
│ Migration Strategy: │
│ • 18-month gradual extraction │
│ • Started with highest-pain domains (Search, Payments) │
│ • Built platform team first (6 months before extraction) │
│ • Kept monolith for low-change domains │
│ │
│ Final Architecture: │
│ • 12 core services (not 150) │
│ • Team-owned services (2-3 teams per service) │
│ • Legacy monolith still exists for admin tools │
│ • Kubernetes platform with service mesh │
│ │
│ Results: │
│ • Deploy time: 15 minutes (per service) │
│ • Search team deploys 20x/day independently │
│ • Checkout availability: 99.99% │
│ • Infrastructure cost: 3x higher (but team velocity 4x) │
│ │
│ Key Insight: The organizational pain drove the decision, │
│ not technical scaling needs. │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Anti-Patterns to Avoid
Monolith Anti-Patterns
// 1. The Big Ball of Mud
// ❌ No module boundaries, everything imports everything
import { User } from '../../../users/models/user';
import { Order } from '../../orders/models/order';
import { sendEmail } from '../../../utils/email';
import { calculateTax } from '../../../payments/tax';
import { logEvent } from '../../../analytics/logger';
// Fix: Enforce module boundaries, use public APIs only
// 2. The Shared Database Anti-Pattern
// ❌ Multiple modules writing to same tables
// Users module
await db.query('UPDATE orders SET status = ? WHERE user_id = ?', [...]);
// Orders module
await db.query('UPDATE orders SET status = ? WHERE id = ?', [...]);
// Fix: Each module owns its tables exclusively
// 3. The Synchronous Everything
// ❌ All operations in request path
app.post('/api/orders', async (req, res) => {
const order = await createOrder(req.body);
await sendConfirmationEmail(order); // 500ms
await updateInventory(order); // 200ms
await notifyWarehouse(order); // 300ms
await updateAnalytics(order); // 100ms
await syncToERP(order); // 800ms
res.json(order); // Total: 2+ seconds
});
// Fix: Use background jobs for non-critical operations
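The fix for anti-pattern 3 amounts to splitting the handler into work the response depends on and work that can run later. This is a minimal sketch with an in-memory array standing in for a real queue such as BullMQ. `Job`, `JobQueue`, `placeOrder`, and the choice of which steps count as critical are assumptions for illustration.

```typescript
interface Order { id: string; items: string[]; }
interface Job { name: string; orderId: string; }

// Stand-in for a durable queue; a real one would persist jobs and retry failures.
class JobQueue {
  readonly jobs: Job[] = [];
  add(name: string, orderId: string): void {
    this.jobs.push({ name, orderId });
  }
}

// Only steps whose outcome the customer must see stay in the request path.
async function placeOrder(
  items: string[],
  deps: {
    createOrder: (items: string[]) => Promise<Order>;
    updateInventory: (order: Order) => Promise<void>;
    queue: JobQueue;
  }
): Promise<Order> {
  const order = await deps.createOrder(items);
  await deps.updateInventory(order); // treated as critical here to avoid overselling

  // Everything else is enqueued and handled by background workers.
  const deferred = ['confirmation-email', 'notify-warehouse', 'analytics', 'erp-sync'];
  for (const name of deferred) {
    deps.queue.add(name, order.id);
  }
  return order;
}
```

Using the timings from the example above, the request path drops from roughly 2 seconds to order creation plus the 200ms inventory update. The trade is that the deferred steps can now fail after the customer has already received a success response, so they need retries and monitoring.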
Microservices Anti-Patterns
// 1. The Distributed Monolith
// ❌ Microservices that must deploy together
// Service A
const userResponse = await fetch('http://user-service/users/123');
const user = await userResponse.json();
// But Service A hardcodes assumptions about user structure
// and breaks when User Service changes
// Fix: Use contracts, version APIs, design for failure
// 2. The Nano-Service
// ❌ Services that are too small
// "String Validation Service" - seriously, I've seen this
app.post('/validate/email', (req, res) => {
const isValid = /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(req.body.email);
res.json({ valid: isValid });
});
// Fix: Services should represent business capabilities, not functions
// 3. The Chatty Services
// ❌ Multiple round trips for one operation
async function getOrderDetails(orderId: string) {
// fetch resolves to a Response; the body must be parsed before fields are read
const order = await fetch(`http://order-service/orders/${orderId}`)
.then(r => r.json());
const user = await fetch(`http://user-service/users/${order.userId}`)
.then(r => r.json());
const items = await Promise.all(
order.items.map(i =>
fetch(`http://product-service/products/${i.productId}`).then(r => r.json())
)
);
const shipping = await fetch(
`http://shipping-service/rates?orderId=${orderId}`
).then(r => r.json());
// 4+ network calls for one page
}
// Fix: BFF pattern, data aggregation service, or denormalization
// 4. The Shared Database
// ❌ Multiple services using same database
// User Service writes to users table
// Order Service reads from users table directly
// Fix: Each service owns its data, communicate via APIs/events
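For the distributed monolith fix in item 1, one inexpensive form of contract is a tolerant reader: the consumer validates only the fields it actually uses and ignores everything else, so additive changes in the user service do not break it. The field names, `UserContract`, and `parseUserContract` below are hypothetical.

```typescript
// The subset of the user payload that this consumer depends on.
interface UserContract { id: string; email: string; }

type ParseResult =
  | { ok: true; user: UserContract }
  | { ok: false; error: string };

// Validates only the required fields; unknown extra fields are ignored on purpose.
function parseUserContract(payload: unknown): ParseResult {
  if (typeof payload !== 'object' || payload === null) {
    return { ok: false, error: 'payload is not an object' };
  }
  const data = payload as { [key: string]: unknown };
  const id = data['id'];
  const email = data['email'];
  if (typeof id !== 'string') {
    return { ok: false, error: 'missing or invalid "id"' };
  }
  if (typeof email !== 'string') {
    return { ok: false, error: 'missing or invalid "email"' };
  }
  return { ok: true, user: { id, email } };
}
```

Returning a result object instead of throwing pushes the caller to decide what a contract violation means, for example falling back to cached data, which is the "design for failure" half of the fix.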
Making the Transition
Signs You've Outgrown Your Monolith
┌─────────────────────────────────────────────────────────────────────────────┐
│ MONOLITH STRESS INDICATORS │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Team Friction: │
│ □ Merge conflicts are daily occurrence │
│ □ Feature branches live for weeks │
│ □ Teams wait on each other to deploy │
│ □ Unclear code ownership │
│ □ "Don't touch that code" warnings common │
│ │
│ Deployment Pain: │
│ □ Deploy takes > 1 hour │
│ □ Rollbacks affect unrelated features │
│ □ Fear of deploying on Fridays │
│ □ Need "deployment windows" │
│ □ One team blocks others frequently │
│ │
│ Technical Constraints: │
│ □ Test suite > 30 minutes │
│ □ Memory usage requires very large instances │
│ □ Different components need different scaling │
│ □ Library/framework upgrade is 6+ month project │
│ □ Cannot meet different SLAs for different features │
│ │
│ If you checked 5+, consider extraction. But try modular first! │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Signs You Extracted Too Early
┌─────────────────────────────────────────────────────────────────────────────┐
│ PREMATURE MICROSERVICES INDICATORS │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Complexity Explosion: │
│ □ More time on infrastructure than features │
│ □ Debugging requires checking 5+ services │
│ □ Local development is painful │
│ □ "It worked locally" is common │
│ □ Nobody understands the full system │
│ │
│ Operational Burden: │
│ □ Outages in services you've never heard of │
│ □ On-call is stressful due to distributed complexity │
│ □ Monitoring dashboards are overwhelming │
│ □ Network issues cause cascading failures │
│ □ "Service mesh" solved problems you didn't have │
│ │
│ Velocity Drop: │
│ □ Simple features require changes to multiple services │
│ □ Contract changes are contentious │
│ □ Teams build duplicate functionality │
│ □ "Cross-cutting" changes are impossible │
│ □ Shipping slower than when you had a monolith │
│ │
│ If you checked 5+, consider consolidating services. │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
The Honest Tradeoffs Summary
┌─────────────────────────────────────────────────────────────────────────────┐
│ THE HONEST COMPARISON │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ MONOLITH │ MICROSERVICES │
│ ──────────────────────────────────────────────────────────────────── │
│ │
│ Development Speed │
│ Small team (<15): ████████ │ ███░░░░░ │
│ Large team (50+): ███░░░░░ │ ██████░░ │
│ │
│ Operational Complexity │
│ Infrastructure: ██░░░░░░ │ ████████ │
│ Debugging: ██░░░░░░ │ ███████░ │
│ Monitoring: ██░░░░░░ │ ███████░ │
│ │
│ Scaling │
│ Horizontal (uniform): ███████░ │ ████████ │
│ Component-specific: ██░░░░░░ │ ████████ │
│ │
│ Team Autonomy │
│ Independence: ██░░░░░░ │ ████████ │
│ Tech choices: ██░░░░░░ │ ███████░ │
│ │
│ Data Management │
│ Consistency: ████████ │ ███░░░░░ │
│ Transactions: ████████ │ ██░░░░░░ │
│ │
│ Cost │
│ Infrastructure: ██░░░░░░ │ ██████░░ │
│ Engineering overhead: ██░░░░░░ │ ███████░ │
│ Initial investment: █░░░░░░░ │ ████████ │
│ │
│ Risk Profile │
│ Single point of failure: ████████ │ ███░░░░░ │
│ Cascading failures: ██░░░░░░ │ ██████░░ │
│ Blast radius of bugs: ████████ │ ██░░░░░░ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
My Recommendations
For Most Teams (The Safe Path)
1. Start with a modular monolith
• Module boundaries from day one
• Event-driven communication between modules
• Single deployment, simple operations
2. Invest in scalability fundamentals first
• Caching
• Background jobs
• Read replicas
• CDN for static assets
3. Extract services only when you feel specific pain
• Deployment bottlenecks for one team
• Scaling needs for a specific component
• Different tech requirements
4. Keep it minimal
• 3-7 services covers most cases
• Each service = team-sized scope
• Shared platform infrastructure
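Step 1 above (module boundaries plus event-driven communication inside a single deployment) can be sketched roughly like this. The event names, the `EventBus` class, and both modules are hypothetical, and the bus is a synchronous in-process one with no persistence or retries, so treat it as an illustration rather than a production design.

```typescript
// Hypothetical events and modules; a synchronous in-process bus, nothing more.

type AppEvents = {
  "order.placed": { orderId: string; userId: string; totalCents: number };
  "user.registered": { userId: string; email: string };
};

class EventBus {
  private readonly handlers = new Map<string, Array<(payload: unknown) => void>>();

  subscribe<K extends keyof AppEvents>(
    event: K,
    handler: (payload: AppEvents[K]) => void
  ): void {
    const list = this.handlers.get(event) ?? [];
    // Stored untyped internally; the public signatures keep callers type-safe.
    list.push(handler as (payload: unknown) => void);
    this.handlers.set(event, list);
  }

  publish<K extends keyof AppEvents>(event: K, payload: AppEvents[K]): void {
    for (const handler of this.handlers.get(event) ?? []) {
      handler(payload);
    }
  }
}

// Orders module: publishes events and knows nothing about who listens.
class OrdersModule {
  private readonly bus: EventBus;

  constructor(bus: EventBus) {
    this.bus = bus;
  }

  placeOrder(orderId: string, userId: string, totalCents: number): void {
    // ...persist the order in this module's own tables (omitted)...
    this.bus.publish("order.placed", { orderId, userId, totalCents });
  }
}

// Notifications module: reacts to order events without importing OrdersModule.
class NotificationsModule {
  readonly sent: string[] = [];

  constructor(bus: EventBus) {
    bus.subscribe("order.placed", (e) => {
      this.sent.push(`Receipt for ${e.orderId} queued for user ${e.userId}`);
    });
  }
}
```

If Notifications ever needs to become its own service, the subscription is the seam: swap the in-process bus for a message broker and the Orders module doesn't change.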
For Large Organizations (50+ Engineers)
1. Accept the infrastructure investment
• Dedicated platform team (10-15% of engineering)
• Kubernetes or similar orchestration
• Service mesh for observability
• Centralized logging and tracing
2. Define clear service boundaries
• Domain-driven design
• One team owns one service
• Clear contracts between services
3. Build the platform before extracting
• CI/CD pipelines
• Service templates
• Monitoring dashboards
• Developer documentation
4. Extract gradually
• Strangler fig pattern
• Start with highest-pain domains
• Keep the monolith for stable features
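For the gradual extraction in step 4, the core of a strangler fig setup is a routing decision in front of the monolith. Here's a small sketch of that decision. The route table, the upstream URLs, and the `resolveUpstream` function are all assumptions for illustration; in practice this logic usually lives in your API gateway or reverse proxy config rather than in application code.

```typescript
// Hypothetical route table and upstream URLs, for illustration only.

interface Route {
  prefix: string; // path prefix of a domain that has been extracted
  upstream: string; // base URL of the new service
}

const MONOLITH_UPSTREAM = "http://monolith.internal";

// Extracted domains get added here one at a time as they migrate.
const extractedRoutes: Route[] = [
  { prefix: "/api/billing", upstream: "http://billing-service.internal" },
  { prefix: "/api/search", upstream: "http://search-service.internal" },
];

// `pathname` is the URL path without the query string.
function resolveUpstream(pathname: string, routes: Route[] = extractedRoutes): string {
  let best: Route | undefined;
  for (const route of routes) {
    // Match whole path segments so "/api/billingx" doesn't hit "/api/billing".
    const matches =
      pathname === route.prefix || pathname.startsWith(route.prefix + "/");
    // Longest prefix wins, so a more specific route can override a broader one.
    if (matches && (!best || route.prefix.length > best.prefix.length)) {
      best = route;
    }
  }
  // Anything not extracted yet keeps going to the monolith.
  return (best ? best.upstream : MONOLITH_UPSTREAM) + pathname;
}
```

Extracting a domain then means adding one entry to the table, and rolling back means deleting it. Stable features never leave the monolith because they never get a route.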
Quick Reference
Architecture Decision Checklist
## Before Choosing Microservices, Ask:
Team Questions:
□ Do we have 30+ engineers who step on each other's toes?
□ Do we have dedicated platform/DevOps engineers?
□ Do teams have clear domain ownership?
□ Can teams operate independently?
Technical Questions:
□ Do different components need different scaling?
□ Do we need different tech stacks for different problems?
□ Is our test suite > 30 minutes?
□ Do we have strong API versioning discipline?
Operational Questions:
□ Do we have distributed tracing?
□ Do we have centralized logging?
□ Can we handle increased infrastructure complexity?
□ Do we have on-call processes that can handle distributed systems?
If you answered "no" to more than half:
→ Start with a modular monolith
If you answered "yes" to most:
→ Microservices might be appropriate
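If you want to run the checklist with your team, the scoring rule is simple enough to write down. This is a sketch with two assumptions of mine: I read "most" as at least 75% yes answers (the checklist doesn't give a number), and I added a "borderline" result for the gap in between. The `recommend` function and its result names are hypothetical.

```typescript
type Recommendation = "modular-monolith" | "borderline" | "microservices-might-fit";

// ASSUMPTION: "most" is read as at least 75% yes answers; the checklist gives no number.
const MOST_THRESHOLD = 0.75;

function recommend(answers: boolean[]): Recommendation {
  if (answers.length === 0) {
    throw new Error("Answer at least one question");
  }
  const yesCount = answers.filter((a) => a).length;
  const noCount = answers.length - yesCount;

  // "No" to more than half: start with a modular monolith.
  if (noCount > answers.length / 2) {
    return "modular-monolith";
  }
  // "Yes" to most: microservices might be appropriate.
  if (yesCount >= answers.length * MOST_THRESHOLD) {
    return "microservices-might-fit";
  }
  // ASSUMPTION: anything in between is a judgment call.
  return "borderline";
}

// Example: one answer per checklist question, in order.
const team = [false, true, false, false];
const technical = [true, false, false, true];
const operational = [false, false, true, false];

// 4 yes, 8 no
console.log(recommend([...team, ...technical, ...operational])); // "modular-monolith"
```

The number matters less than the conversation it forces. If the team can't agree on an answer to a question, that disagreement is usually the real finding.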
The Golden Rules
1. Conway's Law is real
Your architecture will mirror your org structure.
Design both together.
2. You can always extract later
It's easier to split a well-designed monolith
than to merge poorly-designed microservices.
3. Complexity is never free
Every service you add increases operational burden.
Make sure it's worth it.
4. Measure before you migrate
Know your actual bottlenecks.
Most "scaling problems" are code problems.
5. Team size drives architecture
Small team → monolith
Large team → consider services
This is the #1 factor.
Closing Thoughts
The microservices vs. monolith debate misses the point. The real question is: what architecture enables your team to ship value to customers fastest, with acceptable operational burden?
For a 10-person startup, that's almost certainly a monolith. For a 500-person enterprise, it's probably some form of services. For most companies in between, it's a modular monolith that extracts services only when there's clear pain.
The best architecture is the one you can actually operate. A perfectly designed microservices architecture is worthless if your team spends all their time debugging distributed systems instead of building features.
Start simple. Stay simple as long as possible. Add complexity only when the pain of simplicity exceeds the cost of complexity.
And remember: the goal isn't to have a "modern" architecture. The goal is to build software that solves problems for your users. Everything else is a means to that end.
Build for your actual needs, not your imagined future scale. The companies that "scale" didn't get there by over-engineering early—they got there by shipping fast and adapting when real problems emerged.