Async Runtime Internals: Futures, Wakers, Executors & How async/await Actually Works Under the Hood
Why Understanding Async Runtime Internals Matters
Every time you write await fetch() in JavaScript or future.await in Rust, machinery invisible to you orchestrates suspension, scheduling, and resumption of your code. Async runtimes — Tokio (Rust), libuv (Node.js), asyncio (Python), Netty (Java) — are the engines that make non-blocking I/O appear synchronous. Understanding how Futures/Promises are state machines compiled from async functions, how Wakers notify the executor when I/O is ready, and how executors schedule tasks across thread pools explains why your async code sometimes deadlocks, why await doesn't block threads, and how a single runtime handles millions of concurrent tasks with a handful of OS threads.
What happens when you write async/await:
Source code: Compiled state machine:
┌─────────────────────┐ ┌──────────────────────────────┐
│ async function fetch │ │ enum FetchState { │
│ url: string │ │ Start, │
│ ) { │ ──► │ AwaitingDns { url }, │
│ const ip = │ │ AwaitingConnect { ip }, │
│ await dns(url); │ │ AwaitingResponse { conn }, │
│ const conn = │ │ Done(Response), │
│ await connect(ip)│ │ } │
│ const data = │ │ │
│ await read(conn);│ │ impl Future for FetchState { │
│ return data; │ │ fn poll() → Poll { │
│ } │ │ match self.state { │
└─────────────────────┘ │ Start → { ... } │
│ AwaitingDns → { ... } │
Syntactic sugar │ ... │
for a state machine │ } │
│ } │
│ } │
└──────────────────────────────┘
The async runtime:
┌──────────────────────────────────────────────────────┐
│ Executor (e.g., Tokio, libuv) │
│ │
│ Task Queue: [Future₁] [Future₂] [Future₃] ... │
│ │
│ Thread Pool: T₁ T₂ T₃ T₄ │
│ │ │ │ │ │
│ Each thread: polls a Future from the queue │
│ poll() returns: │
│ Ready(value) → task is done, deliver result │
│ Pending → task needs I/O, park it │
│ register Waker with I/O system │
│ when I/O ready → Waker wakes │
│ task → re-queued for polling │
│ │
│ Reactor (I/O driver): │
│ epoll_wait() → detects ready I/O → calls Wakers │
│ Waker pushes task back into executor queue │
└──────────────────────────────────────────────────────┘
Async Runtime Architecture
┌─────────────────────────────────────────────────────────┐
│ ASYNC RUNTIME LAYERS │
│ │
│ ┌───────────────────────────────────────────────────┐ │
│ │ Layer 4: User Code (async/await syntax) │ │
│ │ │ │
│ │ async function handleRequest(req) { │ │
│ │ const user = await db.query(req.userId); │ │
│ │ return { name: user.name }; │ │
│ │ } │ │
│ └───────────────────────┬───────────────────────────┘ │
│ │ compiled to │
│ ┌───────────────────────▼───────────────────────────┐ │
│ │ Layer 3: Future / Promise (state machine) │ │
│ │ │ │
│ │ State 0: initial → call db.query() │ │
│ │ State 1: waiting for query → yield Pending │ │
│ │ State 2: query done → construct response │ │
│ │ State 3: return Ready(response) │ │
│ └───────────────────────┬───────────────────────────┘ │
│ │ scheduled by │
│ ┌───────────────────────▼───────────────────────────┐ │
│ │ Layer 2: Executor (task scheduler) │ │
│ │ │ │
│ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ │
│ │ │ Worker₁ │ │ Worker₂ │ │ Worker₃ │ │ │
│ │ └────┬────┘ └────┬────┘ └────┬────┘ │ │
│ │ │ │ │ │ │
│ │ Task Queue: [task₁] [task₂] [task₃] [task₄] │ │
│ └───────────────────────┬───────────────────────────┘ │
│ │ I/O events from │
│ ┌───────────────────────▼───────────────────────────┐ │
│ │ Layer 1: Reactor (I/O event driver) │ │
│ │ │ │
│ │ epoll / kqueue / IOCP │ │
│ │ Monitors all pending I/O operations │ │
│ │ When ready → invokes Waker → task re-queued │ │
│ └───────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
Building an Async Runtime from Scratch
Future / Promise State Machine
/*
A Future represents a value that isn't available yet.
In Rust:
trait Future {
type Output;
fn poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Self::Output>;
}
enum Poll<T> {
Ready(T),
Pending,
}
In JavaScript:
Promise is an object with .then() handlers.
The engine maintains a microtask queue.
The key insight: async/await is compiled into a state machine.
Each `await` point becomes a state transition.
async fn example() -> String {
let a = step1().await;  // State 0 → State 1
let b = step2(a).await; // State 1 → State 2
format!("{a}{b}")       // State 2 → Done
}
Becomes:
enum ExampleFuture {
State0,
State1 { step1_future: Step1Future },
State2 { a: String, step2_future: Step2Future },
Done,
}
*/
// Poll result type (Rust-inspired)
type Poll<T> = { status: 'ready'; value: T } | { status: 'pending' };
function Ready<T>(value: T): Poll<T> {
return { status: 'ready', value };
}
function Pending<T>(): Poll<T> {
return { status: 'pending' };
}
// Waker: mechanism to re-schedule a task when it's ready
interface Waker {
wake(): void;
}
// Context passed to poll() — primarily carries the Waker
interface PollContext {
waker: Waker;
}
// The Future trait
interface Future<T> {
poll(cx: PollContext): Poll<T>;
}
// Example: A future that resolves after a delay
class TimerFuture implements Future<void> {
private deadline: number;
private registered: boolean = false;
constructor(delayMs: number) {
this.deadline = Date.now() + delayMs;
}
poll(cx: PollContext): Poll<void> {
if (Date.now() >= this.deadline) {
return Ready(undefined);
}
// Not ready yet — register waker to be called at deadline
if (!this.registered) {
this.registered = true;
const waker = cx.waker;
const remaining = this.deadline - Date.now();
setTimeout(() => waker.wake(), remaining);
}
return Pending();
}
}
// Chained future: models `let b = await step1(); await step2(b);`
class ChainFuture<A, B> implements Future<B> {
private first: Future<A>;
private thenFn: (a: A) => Future<B>;
private state: 'first' | 'second' = 'first';
private secondFuture: Future<B> | null = null;
constructor(first: Future<A>, thenFn: (a: A) => Future<B>) {
this.first = first;
this.thenFn = thenFn;
}
poll(cx: PollContext): Poll<B> {
switch (this.state) {
case 'first': {
const result = this.first.poll(cx);
if (result.status === 'pending') return Pending();
// First future resolved — create second
this.secondFuture = this.thenFn(result.value);
this.state = 'second';
// Fall through to poll second immediately
}
case 'second': {
return this.secondFuture!.poll(cx);
}
}
}
}
// JoinAll: poll multiple futures concurrently (like Promise.all)
class JoinAll<T> implements Future<T[]> {
private futures: Future<T>[];
private results: (T | undefined)[];
private completed: boolean[];
private doneCount: number = 0;
constructor(futures: Future<T>[]) {
this.futures = futures;
this.results = new Array(futures.length);
this.completed = new Array(futures.length).fill(false);
}
poll(cx: PollContext): Poll<T[]> {
for (let i = 0; i < this.futures.length; i++) {
if (this.completed[i]) continue;
const result = this.futures[i].poll(cx);
if (result.status === 'ready') {
this.results[i] = result.value;
this.completed[i] = true;
this.doneCount++;
}
}
if (this.doneCount === this.futures.length) {
return Ready(this.results as T[]);
}
return Pending();
}
}
// Select: return first completed future (like Promise.race)
class Select<T> implements Future<{ index: number; value: T }> {
private futures: Future<T>[];
private done: boolean = false;
constructor(futures: Future<T>[]) {
this.futures = futures;
}
poll(cx: PollContext): Poll<{ index: number; value: T }> {
if (this.done) {
throw new Error('Select already resolved');
}
for (let i = 0; i < this.futures.length; i++) {
const result = this.futures[i].poll(cx);
if (result.status === 'ready') {
this.done = true;
return Ready({ index: i, value: result.value });
}
}
return Pending();
}
}
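To see the combinator pattern in action, here is a minimal standalone sketch. The `ready` helper and the functional `chain` are illustrative stand-ins for the classes above, re-declared so the snippet runs on its own:

```typescript
// Minimal re-declarations so this snippet is self-contained.
type Poll<T> = { status: 'ready'; value: T } | { status: 'pending' };
interface Waker { wake(): void }
interface PollContext { waker: Waker }
interface Future<T> { poll(cx: PollContext): Poll<T> }

// ready: a future that resolves immediately with a value.
const ready = <T>(value: T): Future<T> => ({
  poll: () => ({ status: 'ready', value }),
});

// chain: functional equivalent of ChainFuture above.
function chain<A, B>(first: Future<A>, thenFn: (a: A) => Future<B>): Future<B> {
  let second: Future<B> | null = null;
  return {
    poll(cx: PollContext): Poll<B> {
      if (second === null) {
        const r = first.poll(cx);
        if (r.status === 'pending') return { status: 'pending' };
        second = thenFn(r.value); // first done — advance to the next state
      }
      return second.poll(cx);
    },
  };
}

const noopCx: PollContext = { waker: { wake() {} } };
const f = chain(ready(2), (a) => ready(a * 21));
const result = f.poll(noopCx);
console.log(result); // { status: 'ready', value: 42 }
```

Because both inner futures are immediately ready, one poll drives the chain all the way to completion — exactly the fall-through behavior ChainFuture's switch implements.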
Task & Waker System
/*
A Task wraps a Future with scheduling metadata.
The Waker is the glue between I/O readiness and the executor:
1. Future returns Pending → registers its Waker with I/O system
2. I/O becomes ready → Waker.wake() is called
3. wake() pushes the Task back into the executor's run queue
4. Executor polls the Task again
This is "demand-driven" / "lazy" execution:
- Futures only make progress when polled
- Polls only happen when woken
- No busy-waiting, no unnecessary CPU usage
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Executor │ │ Task │ │ Future │
│ │ │ │ │ │
│ poll(task)──┼────►│ poll(cx) ───┼────►│ poll(cx) │
│ │ │ │ │ → Pending │
│ │ │ │ │ register │
│ │ │ │ │ waker with │
│ │ │ │ │ I/O system │
│ │ │ │ └──────────────┘
│ │ │ │ │
│ │ └──────────────┘ │
│ │ │ I/O ready
│ │ ┌──────────────┐ │
│ ◄───────────┼─────┤ Waker.wake()◄───────────┘
│ re-queue │ └──────────────┘
│ task │
└──────────────┘
*/
type TaskId = number;
enum TaskState {
Queued, // In run queue, waiting to be polled
Running, // Currently being polled by a worker
Waiting, // Returned Pending, waiting for waker
Completed, // Returned Ready, done
}
class Task<T = unknown> {
readonly id: TaskId;
private future: Future<T>;
private state: TaskState = TaskState.Queued;
private result: T | null = null;
private onComplete: ((value: T) => void) | null = null;
private waker: Waker | null = null;
constructor(id: TaskId, future: Future<T>) {
this.id = id;
this.future = future;
}
getState(): TaskState { return this.state; }
// Create a Waker that re-queues this task
createWaker(requeue: (task: Task<T>) => void): Waker {
const task = this;
this.waker = {
wake() {
if (task.state === TaskState.Waiting) {
task.state = TaskState.Queued;
requeue(task);
}
}
};
return this.waker;
}
// Poll the underlying future
poll(cx: PollContext): Poll<T> {
this.state = TaskState.Running;
const result = this.future.poll(cx);
if (result.status === 'ready') {
this.state = TaskState.Completed;
this.result = result.value;
if (this.onComplete) {
this.onComplete(result.value);
}
} else {
this.state = TaskState.Waiting;
}
return result;
}
// Register completion callback
then(callback: (value: T) => void): void {
if (this.state === TaskState.Completed) {
callback(this.result as T);
} else {
this.onComplete = callback;
}
}
}
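The wake/re-queue handshake that Task enables can be exercised with a hand-rolled gate future and a tiny run queue. All names here (Gate, queue, pollOnce) are illustrative; the types are re-declared so the snippet runs standalone:

```typescript
type Poll<T> = { status: 'ready'; value: T } | { status: 'pending' };
interface Waker { wake(): void }
interface PollContext { waker: Waker }

// Gate: stays Pending until open() is called, which fires the stored waker —
// the same contract a TCP-read future has with the reactor.
class Gate {
  private opened = false;
  private waker: Waker | null = null;
  poll(cx: PollContext): Poll<string> {
    if (this.opened) return { status: 'ready', value: 'go' };
    this.waker = cx.waker; // remember whom to notify
    return { status: 'pending' };
  }
  open(): void {
    this.opened = true;
    this.waker?.wake(); // "I/O is ready" — re-queue the task
  }
}

const queue: Array<() => void> = []; // stand-in for the executor's run queue
const observed: string[] = [];
const gate = new Gate();
const cx: PollContext = { waker: { wake: () => queue.push(pollOnce) } };
function pollOnce(): void { observed.push(gate.poll(cx).status); }

pollOnce();       // 1st poll: Pending, waker stored
gate.open();      // readiness arrives: wake() re-queues the poll
queue.shift()!(); // 2nd poll: Ready
console.log(observed); // [ 'pending', 'ready' ]
```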
Executor (Task Scheduler)
/*
The Executor is responsible for:
1. Receiving spawned tasks
2. Polling tasks when they're ready
3. Managing worker threads (in multi-threaded runtimes)
Executor strategies:
Single-threaded (Node.js, Python asyncio):
One thread runs the event loop, polls all tasks sequentially.
Simple but CPU-bound work blocks everything.
Multi-threaded / Work-stealing (Tokio, Go runtime):
Multiple worker threads, each with a local task queue.
Idle workers steal from busy workers' queues.
Better CPU utilization, more complex.
┌─────────────────────────────────────────────┐
│ Multi-threaded Executor │
│ │
│ Global Queue: [task₅] [task₆] │
│ │
│ Worker 1: │
│ Local Queue: [task₁] [task₂] │
│ Currently polling: task₃ │
│ │
│ Worker 2: │
│ Local Queue: [task₄] │
│ Idle → steal from Worker 1's queue │
│ │
│ Worker 3: │
│ Local Queue: [empty] │
│ Idle → check global queue │
└─────────────────────────────────────────────┘
*/
class SingleThreadExecutor {
private runQueue: Task[] = [];
private allTasks: Map<TaskId, Task> = new Map();
private nextTaskId: number = 1;
private running: boolean = false;
private stats = {
totalSpawned: 0,
totalCompleted: 0,
totalPolls: 0,
spuriousWakes: 0,
};
// Spawn a new task (equivalent to tokio::spawn or go func())
spawn<T>(future: Future<T>): Task<T> {
const task = new Task<T>(this.nextTaskId++, future);
this.allTasks.set(task.id, task);
this.runQueue.push(task);
this.stats.totalSpawned++;
return task;
}
// Block on a future until completion (equivalent to block_on)
blockOn<T>(future: Future<T>): T {
const task = this.spawn(future);
let result: T | undefined;
let done = false;
task.then((value) => {
result = value;
done = true;
});
this.running = true;
// Note: this synchronous loop only finishes if wakes happen
// synchronously (e.g. a self-waking future). Timer/I-O callbacks
// delivered by the host event loop cannot fire while we spin here.
while (this.running && !done) {
this.tickOnce();
}
return result as T;
}
// Run all tasks to completion
run(): void {
this.running = true;
while (this.running && this.runQueue.length > 0) {
this.tickOnce();
}
}
// Process one task from the queue
private tickOnce(): void {
const task = this.runQueue.shift();
if (!task) return;
// Create waker that re-queues this task
const waker = task.createWaker((t) => {
this.runQueue.push(t);
});
const cx: PollContext = { waker };
this.stats.totalPolls++;
const result = task.poll(cx);
if (result.status === 'ready') {
this.stats.totalCompleted++;
this.allTasks.delete(task.id);
}
// If Pending, task will be re-queued when waker.wake() is called
}
stop(): void {
this.running = false;
}
getStats() {
return {
...this.stats,
pendingTasks: this.allTasks.size,
queueLength: this.runQueue.length,
};
}
}
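A compact way to watch the executor's poll/re-queue cycle is a future that self-wakes. This is a sketch: Countdown and the bare loop below are illustrative, not part of the executor above, and the types are re-declared so the snippet runs standalone:

```typescript
// Minimal re-declarations so the snippet is self-contained.
type Poll<T> = { status: 'ready'; value: T } | { status: 'pending' };
interface Waker { wake(): void }
interface PollContext { waker: Waker }
interface Future<T> { poll(cx: PollContext): Poll<T> }

// Countdown: returns Pending n times, waking itself on each poll
// (much like a cooperative yield), then resolves.
class Countdown implements Future<number> {
  constructor(private remaining: number) {}
  poll(cx: PollContext): Poll<number> {
    if (this.remaining === 0) return { status: 'ready', value: 0 };
    this.remaining--;
    cx.waker.wake(); // self-wake: re-queue immediately
    return { status: 'pending' };
  }
}

// A bare-bones version of SingleThreadExecutor's loop: pop, poll,
// and rely on the waker to re-queue pending futures.
const runQueue: Future<number>[] = [new Countdown(3)];
let polls = 0;
while (runQueue.length > 0) {
  const fut = runQueue.shift()!;
  const cx: PollContext = { waker: { wake: () => runQueue.push(fut) } };
  polls++;
  if (fut.poll(cx).status === 'ready') break;
}
console.log(polls); // 4 — three Pending polls, then Ready
```

Note that without the self-wake the loop would exit after one poll with the future stranded — futures only run when something wakes them.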
// Multi-threaded executor with work-stealing
class MultiThreadExecutor {
private globalQueue: Task[] = [];
private workers: WorkerState[];
private nextTaskId: number = 1;
private allTasks: Map<TaskId, Task> = new Map();
constructor(numWorkers: number = 4) {
this.workers = Array.from({ length: numWorkers }, (_, i) => ({
id: i,
localQueue: [] as Task[],
currentTask: null as Task | null,
pollCount: 0,
}));
}
spawn<T>(future: Future<T>): Task<T> {
const task = new Task<T>(this.nextTaskId++, future);
this.allTasks.set(task.id, task);
this.globalQueue.push(task);
return task;
}
// Simulate one tick of a specific worker
tickWorker(workerId: number): void {
const worker = this.workers[workerId];
let task: Task | undefined;
// 1. Check local queue first
task = worker.localQueue.shift();
// 2. If empty, check global queue
if (!task) {
task = this.globalQueue.shift();
}
// 3. If still empty, try work-stealing from another worker
if (!task) {
task = this.stealFrom(workerId);
}
if (!task) return; // Nothing to do
worker.currentTask = task;
worker.pollCount++;
const waker = task.createWaker((t) => {
// Re-awoken tasks go to the waking worker's local queue
worker.localQueue.push(t);
});
const result = task.poll({ waker });
if (result.status === 'ready') {
this.allTasks.delete(task.id);
}
worker.currentTask = null;
}
private stealFrom(thiefId: number): Task | undefined {
// Try to steal from the busiest worker (longest local queue)
let bestVictim = -1;
let bestSize = 0;
for (let i = 0; i < this.workers.length; i++) {
if (i === thiefId) continue;
if (this.workers[i].localQueue.length > bestSize) {
bestSize = this.workers[i].localQueue.length;
bestVictim = i;
}
}
if (bestVictim >= 0 && bestSize > 1) {
// Steal half of the victim's queue
const victim = this.workers[bestVictim];
const stealCount = Math.floor(victim.localQueue.length / 2);
const stolen = victim.localQueue.splice(0, stealCount);
const first = stolen.shift();
this.workers[thiefId].localQueue.push(...stolen);
return first;
}
return undefined;
}
getStats(): {
globalQueueLength: number;
workers: Array<{ id: number; localQueueLength: number; pollCount: number }>;
totalTasks: number;
} {
return {
globalQueueLength: this.globalQueue.length,
workers: this.workers.map(w => ({
id: w.id,
localQueueLength: w.localQueue.length,
pollCount: w.pollCount,
})),
totalTasks: this.allTasks.size,
};
}
}
interface WorkerState {
id: number;
localQueue: Task[];
currentTask: Task | null;
pollCount: number;
}
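The steal-half arithmetic inside stealFrom can be isolated and checked on its own. stealHalf is a hypothetical helper mirroring the class's logic (splice half from the front, run the first stolen task, keep the rest locally):

```typescript
// stealHalf: take half the victim's queue from the front (its FIFO end),
// return the first stolen task to run now, keep the rest in the thief's
// local queue. Mirrors MultiThreadExecutor.stealFrom above.
function stealHalf<T>(victimQueue: T[], thiefQueue: T[]): T | undefined {
  if (victimQueue.length <= 1) return undefined; // leave the victim its last task
  const stolen = victimQueue.splice(0, Math.floor(victimQueue.length / 2));
  const next = stolen.shift();
  thiefQueue.push(...stolen);
  return next;
}

const victim = ['t1', 't2', 't3', 't4', 't5'];
const thief: string[] = [];
const next = stealHalf(victim, thief);
// next === 't1'; victim is left with ['t3','t4','t5']; thief holds ['t2']
```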
Reactor (I/O Event Driver)
/*
The Reactor bridges I/O readiness (epoll/kqueue) with the
executor (task scheduler). It's the "I/O driver" of the runtime.
When a Future needs I/O:
1. Future registers interest with the reactor (fd + events + waker)
2. Future returns Pending
3. Reactor's event loop calls epoll_wait()
4. When fd becomes ready, reactor invokes the registered Waker
5. Waker re-queues the task in the executor
6. Executor polls the future again → now I/O is ready → returns Ready
┌──────────────────────────────────────────────────┐
│ Reactor │
│ │
│ Registration Table: │
│ fd=5 → Waker for Task₁ (EPOLLIN) │
│ fd=8 → Waker for Task₂ (EPOLLOUT) │
│ fd=12 → Waker for Task₃ (EPOLLIN) │
│ │
│ Event Loop (runs on dedicated thread or inline): │
│ while (true) { │
│ events = epoll_wait(epfd, timeout) │
│ for (fd, event_mask) in events: │
│ waker = registrations[fd] │
│ waker.wake() → re-queues task in executor │
│ } │
└──────────────────────────────────────────────────┘
*/
interface IoRegistration {
fd: number;
events: number; // EPOLLIN, EPOLLOUT, etc.
waker: Waker;
}
class Reactor {
private registrations: Map<number, IoRegistration> = new Map();
private pendingEvents: Map<number, number> = new Map(); // fd → event_mask
private running: boolean = false;
// Register interest in I/O events for a fd
register(fd: number, events: number, waker: Waker): void {
this.registrations.set(fd, { fd, events, waker });
// Check if there's already a pending event
const pending = this.pendingEvents.get(fd);
if (pending && (pending & events)) {
// I/O already ready — wake immediately
waker.wake();
this.pendingEvents.delete(fd);
}
}
// Deregister interest
deregister(fd: number): void {
this.registrations.delete(fd);
}
// Simulate I/O event arrival (in real runtime, comes from epoll_wait)
deliverEvent(fd: number, eventMask: number): void {
const reg = this.registrations.get(fd);
if (reg && (reg.events & eventMask)) {
// Wake the registered task
reg.waker.wake();
// Remove registration (task will re-register if needed)
this.registrations.delete(fd);
} else {
// No registration yet — store as pending
this.pendingEvents.set(fd, (this.pendingEvents.get(fd) ?? 0) | eventMask);
}
}
// Process events (called by executor in single-threaded mode,
// or runs on dedicated reactor thread in multi-threaded mode)
processEvents(events: Array<{ fd: number; events: number }>): number {
let woken = 0;
for (const event of events) {
const reg = this.registrations.get(event.fd);
if (reg && (reg.events & event.events)) {
reg.waker.wake();
this.registrations.delete(event.fd);
woken++;
}
}
return woken;
}
getRegistrationCount(): number {
return this.registrations.size;
}
}
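One subtlety worth exercising is the race that register guards against: the I/O event arriving before any waker is registered. A standalone sketch of that path, with plain maps standing in for the Reactor's fields:

```typescript
const EPOLLIN = 0x001;
interface Waker { wake(): void }
const registrations = new Map<number, { events: number; waker: Waker }>();
const pendingEvents = new Map<number, number>(); // fd → parked event mask

// deliverEvent: same shape as Reactor.deliverEvent above.
function deliverEvent(fd: number, mask: number): void {
  const reg = registrations.get(fd);
  if (reg && (reg.events & mask)) {
    registrations.delete(fd); // one-shot registration
    reg.waker.wake();
  } else {
    pendingEvents.set(fd, (pendingEvents.get(fd) ?? 0) | mask); // park it
  }
}

// register: if the event already arrived, wake immediately.
function register(fd: number, events: number, waker: Waker): void {
  registrations.set(fd, { events, waker });
  const parked = pendingEvents.get(fd);
  if (parked !== undefined && (parked & events)) {
    pendingEvents.delete(fd);
    waker.wake();
  }
}

let wakes = 0;
deliverEvent(5, EPOLLIN);                      // event arrives first — parked
register(5, EPOLLIN, { wake: () => wakes++ }); // registration finds it
console.log(wakes, pendingEvents.size); // 1 0
```

Without the pendingEvents table, an event landing in that window would be lost and the task would sleep forever — a classic lost-wakeup bug.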
// Async I/O operations built on the reactor
class AsyncTcpStream {
private fd: number;
private reactor: Reactor;
constructor(fd: number, reactor: Reactor) {
this.fd = fd;
this.reactor = reactor;
}
// Returns a Future that resolves when data is available
read(): Future<string> {
const fd = this.fd;
const reactor = this.reactor;
let registered = false;
let data: string | null = null;
return {
poll(cx: PollContext): Poll<string> {
// Check if data is available. (A real implementation would issue a
// non-blocking read() syscall here and fill `data` after the wake;
// this sketch only shows the registration side.)
if (data !== null) {
return Ready(data);
}
// Register with reactor for read readiness
if (!registered) {
reactor.register(fd, 0x001 /* EPOLLIN */, cx.waker);
registered = true;
}
return Pending();
}
};
}
// Returns a Future that resolves when write buffer is available
write(payload: string): Future<number> {
const fd = this.fd;
const reactor = this.reactor;
let registered = false;
return {
poll(cx: PollContext): Poll<number> {
// Try non-blocking write
// (In real implementation, check if socket buffer has space)
if (!registered) {
reactor.register(fd, 0x004 /* EPOLLOUT */, cx.waker);
registered = true;
return Pending();
}
// Woken up — write should succeed now
return Ready(payload.length);
}
};
}
}
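End to end, the read path looks like this. In the sketch below, socketBuffers is an illustrative stand-in for the kernel's receive buffer, and the waker re-polls inline (in a real runtime the executor, not the waker itself, performs the re-poll):

```typescript
type Poll<T> = { status: 'ready'; value: T } | { status: 'pending' };
interface Waker { wake(): void }
interface PollContext { waker: Waker }

const wakers = new Map<number, Waker>();         // reactor registration table
const socketBuffers = new Map<number, string>(); // stand-in for kernel buffers

// A leaf read future: ready if data is buffered, otherwise register + Pending.
function readFuture(fd: number) {
  return {
    poll(cx: PollContext): Poll<string> {
      const data = socketBuffers.get(fd);
      if (data !== undefined) {
        socketBuffers.delete(fd);
        return { status: 'ready', value: data };
      }
      wakers.set(fd, cx.waker); // register interest in readability
      return { status: 'pending' };
    },
  };
}

const fut = readFuture(5);
const log: string[] = [];
const cx: PollContext = {
  waker: {
    wake: () => { // inline re-poll; a real executor would re-queue the task
      const r = fut.poll(cx);
      if (r.status === 'ready') log.push(r.value);
    },
  },
};

console.log(fut.poll(cx).status); // pending — waker registered
socketBuffers.set(5, 'hello');    // "kernel" buffer now has data
wakers.get(5)!.wake();            // reactor delivers readiness
console.log(log);                 // [ 'hello' ]
```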
Complete Runtime Assembly
/*
Putting it all together: a minimal but complete async runtime.
This mirrors the architecture of:
- Tokio (Rust): multi-threaded executor + mio reactor
- Node.js (libuv): single-threaded executor + epoll/kqueue reactor
- Go runtime: multi-threaded scheduler + netpoller
Usage (spawn takes a Future in this model; a real runtime's spawn
accepts the state machine compiled from an async function):
const runtime = new AsyncRuntime();
const task = runtime.spawn(runtime.sleep(100));
task.then(() => console.log('slept'));
runtime.run(); // first poll registers the timer
// then drive tick() until the timer fires
*/
class AsyncRuntime {
private executor: SingleThreadExecutor;
private reactor: Reactor;
private timers: Map<number, { deadline: number; waker: Waker }> = new Map();
private nextTimerHandle: number = 1;
constructor() {
this.executor = new SingleThreadExecutor();
this.reactor = new Reactor();
}
spawn<T>(future: Future<T>): Task<T> {
return this.executor.spawn(future);
}
// Create a timer future
sleep(durationMs: number): Future<void> {
const runtime = this;
const deadline = Date.now() + durationMs;
let timerHandle: number | null = null;
return {
poll(cx: PollContext): Poll<void> {
if (Date.now() >= deadline) {
if (timerHandle !== null) {
runtime.timers.delete(timerHandle);
}
return Ready(undefined);
}
if (timerHandle === null) {
timerHandle = runtime.nextTimerHandle++;
runtime.timers.set(timerHandle, {
deadline,
waker: cx.waker,
});
}
return Pending();
}
};
}
// Main run loop
run(): void {
this.executor.run();
}
// Tick: process timers + I/O, then run ready tasks
tick(): void {
// Check expired timers
const now = Date.now();
for (const [handle, timer] of this.timers) {
if (timer.deadline <= now) {
timer.waker.wake();
this.timers.delete(handle);
}
}
// Run ready tasks
this.executor.run();
}
getReactor(): Reactor {
return this.reactor;
}
getStats(): {
executor: ReturnType<SingleThreadExecutor['getStats']>;
pendingTimers: number;
pendingIo: number;
} {
return {
executor: this.executor.getStats(),
pendingTimers: this.timers.size,
pendingIo: this.reactor.getRegistrationCount(),
};
}
}
Comparison Table
┌──────────────────┬───────────────┬───────────────┬───────────────┬───────────────┐
│ │ Tokio (Rust) │ Node.js │ Go Runtime │ Python asyncio│
│ │ │ (libuv) │ │ │
├──────────────────┼───────────────┼───────────────┼───────────────┼───────────────┤
│ Executor model │ Multi-thread │ Single-thread │ Multi-thread │ Single-thread │
│ │ work-stealing │ event loop │ work-stealing │ event loop │
│ Future type │ Pin<Box<dyn │ Promise │ Goroutine │ Coroutine │
│ │ Future>> │ │ (stackful) │ (async def) │
│ Execution model │ Poll-based │ Callback │ Preemptive │ Poll-based │
│ │ (lazy) │ (eager) │ (scheduled) │ (lazy) │
│ I/O driver │ mio (epoll/ │ libuv (epoll/ │ netpoller │ selector │
│ │ kqueue) │ kqueue/IOCP) │ (epoll/kqueue)│ (epoll/kqueue)│
│ Worker threads │ CPU cores │ 1 (+libuv │ CPU cores │ 1 │
│ │ (default) │ thread pool) │ (GOMAXPROCS) │ │
│ Coroutine type │ Stackless │ N/A (callback)│ Stackful │ Stackless │
│ │ (state machine│ │ (2KB-1GB │ (generator │
│ │ on heap) │ │ growable │ based) │
│ │ │ │ stack) │ │
│ Cancellation │ Drop future │ AbortController│ context.Done │ task.cancel() │
│ Zero-cost? │ Yes (no heap │ No (Promises │ No (goroutine │ No (coroutine │
│ │ alloc if │ always heap │ stack always │ objects on │
│ │ optimized) │ allocated) │ allocated) │ heap) │
│ Backpressure │ Built-in │ Manual (stream│ Channel │ Manual │
│ │ (poll-based) │ .pause/.resume│ buffering │ │
│ Max tasks │ Millions │ Millions │ Millions │ Thousands │
│ │ (tiny futures)│ (tiny Promise)│ (goroutines) │ (GIL limited) │
└──────────────────┴───────────────┴───────────────┴───────────────┴───────────────┘
Interview Questions
Q1: Explain the difference between stackful and stackless coroutines.
Stackless coroutines (Rust async, Python async def, C# async, JavaScript async functions) are compiled into state machines. Each await point becomes a state variant. Local variables that live across await points are stored in the state machine struct (heap-allocated only when boxed or spawned). Size = only what's needed, so a suspended coroutine can be as small as a few bytes. The compiler performs the transformation at compile time.

Stackful coroutines (Go goroutines, Java virtual threads, Lua coroutines) have their own call stack. The runtime allocates a stack (Go: starts at 2KB, growable to 1GB) and the coroutine can suspend at any point in the call stack — even deep inside library functions. This is more flexible (no function coloring problem) but uses more memory per coroutine.

Trade-offs: stackless is more memory-efficient and can be zero-cost (Rust can inline and optimize the state machine), but requires "infecting" the entire call chain with async (function coloring). Stackful is more ergonomic (any function can yield) but has higher per-coroutine overhead and requires a runtime for stack management.
Q2: How does the Waker mechanism work and why is it necessary?
The Waker is the mechanism by which a pending Future tells the executor "I'm ready to make progress, poll me again." When a Future returns Pending, it must store the Waker (from the Context parameter) somewhere that will be notified when the underlying I/O completes. For example, a TCP read future registers its Waker with the reactor (I/O driver). When epoll_wait reports that the socket is readable, the reactor calls waker.wake(), which pushes the associated Task back into the executor's run queue.

Why it's necessary: without Wakers, the executor would have to busy-poll all pending futures repeatedly, wasting CPU. With Wakers, the executor only polls futures that are actually ready — the "demand-driven" or "wake-on-ready" model. It's also what allows futures to be composed: JoinAll can register wakers for all child futures, and any child becoming ready wakes the parent.

Implementation: in Rust, Waker wraps a RawWaker — a data pointer plus a vtable whose wake function typically clones an Arc and pushes the task handle onto a queue. In Tokio, waking increments a reference count and pushes the task handle to a worker's run queue.
Q3: What is the "function coloring" problem with async/await?
The function coloring problem (Bob Nystrom, 2015): in languages with stackless async, functions are "colored" — sync (regular) and async. Async functions can call sync functions, but sync functions cannot call async functions without becoming async themselves. This creates a viral effect: if a low-level function becomes async, every caller up the chain must also become async. For example, in Rust: fn process() cannot call async fn fetch() — it must become async fn process(), and its callers must also become async, all the way to main.

Consequences: (1) Library authors must provide both sync and async versions of APIs (doubling code). (2) Trait implementations become complex (async traits in Rust required #[async_trait] for years). (3) Mixing sync and async code requires explicit bridges (block_on, spawn_blocking).

Go avoids this: goroutines are stackful — any function can yield without changing its signature. There's no async keyword; the runtime handles scheduling transparently. The trade-off is higher per-goroutine memory and less opportunity for compile-time optimization.
Q4: How does Tokio's multi-threaded executor work?
Tokio uses a work-stealing scheduler inspired by Go's runtime and Cilk. Each worker thread has a local run queue (lock-free, bounded, LIFO for cache locality). When a task is spawned, it goes to the spawning worker's local queue. Workers poll tasks in this order: (1) local queue (LIFO — recently spawned tasks are hot in cache); (2) global queue (checked periodically, not every tick — typically once per 61 local polls to amortize lock cost); (3) steal from other workers (dequeue from the opposite end — FIFO — to take older/colder tasks).

Reactor integration: Tokio has a dedicated I/O driver (or integrates it into worker threads). When a task registers with the reactor and I/O becomes ready, the reactor's waker pushes the task to its associated worker's queue.

Budget system: Tokio uses a "coop budget" — each task gets ~128 poll iterations before being forcefully yielded. This prevents one task from monopolizing a worker thread (starvation). If a future polls a sub-future in a tight loop, the budget decrement forces a yield back to the executor.

Blocking: spawn_blocking sends work to a separate thread pool (up to 512 threads by default) for CPU-heavy or blocking I/O work that would stall the async workers.
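The coop-budget idea can be sketched as follows. This is illustrative, not Tokio's implementation: the budget here is 4 instead of ~128 to keep the trace short, and the "executor" is a bare loop:

```typescript
const BUDGET = 4; // illustrative; Tokio's per-poll budget is ~128

// A greedy future that wants 10 units of work per poll, made cooperative:
// when the budget runs out it self-wakes and returns 'pending', handing
// the worker thread back to the executor.
function pollGreedy(state: { count: number }, wake: () => void): 'ready' | 'pending' {
  let budget = BUDGET;
  while (state.count < 10) {
    if (budget === 0) { wake(); return 'pending'; } // forced yield
    budget--;
    state.count++;
  }
  return 'ready';
}

const work = { count: 0 };
let polls = 0;
let queued = true;
while (queued) {
  queued = false; // will be set again by wake()
  polls++;
  if (pollGreedy(work, () => { queued = true; }) === 'ready') break;
}
console.log(polls, work.count); // 3 10
```

Between those three polls, a real executor would run other queued tasks — that interleaving is exactly what the budget buys.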
Q5: Why does Go not need async/await?
Go uses stackful coroutines (goroutines) managed by a sophisticated runtime scheduler. Every goroutine has its own stack (starts at 2KB, grows dynamically). When a goroutine performs I/O (network call, file read, channel operation), the Go runtime transparently: (1) registers the fd with the netpoller (Go's internal reactor using epoll/kqueue); (2) parks the goroutine (removes it from the OS thread); (3) schedules another goroutine on that OS thread; (4) when I/O is ready, the netpoller wakes the goroutine and re-schedules it. This happens without any syntactic annotation — every call is potentially a yield point. The compiler inserts preemption points at function calls and loop back-edges (since Go 1.14, asynchronous preemption via signals).

GMP model: G (goroutines) are scheduled onto M (OS threads) via P (processors/contexts). P holds the local run queue. M:N scheduling means millions of G's map to a few M's.

Trade-off vs async/await: no function coloring and simpler code, but higher per-goroutine memory (2KB vs bytes), no zero-cost abstraction (a stack is always allocated), and less control over scheduling (no select!-style macro-level composition). Go's approach works because the runtime is deeply integrated with the compiler and standard library.
Key Takeaways
- async/await compiles to state machines: Each await point becomes a state variant. Local variables across await points are stored in the state struct. The compiler transforms your sequential code into a poll()-based Future.
- Futures are lazy — they do nothing until polled: Unlike JavaScript Promises (which start executing immediately), Rust Futures and Python coroutines only make progress when the executor calls poll().
- The Waker is the glue between I/O readiness and task scheduling: When a Future returns Pending, it stores the Waker. When I/O becomes ready (epoll event), the Waker is invoked, pushing the task back into the executor's run queue.
- The Reactor monitors I/O via epoll/kqueue and dispatches Wakers: It's the bridge between OS-level I/O events and the userspace task scheduler. One reactor thread can monitor millions of file descriptors.
- Executors schedule tasks, not threads: A multi-threaded executor (Tokio) uses work-stealing across a small thread pool (typically = CPU cores), managing millions of tasks on a handful of OS threads.
- Stackless coroutines (Rust, JS, Python) cause function coloring: Async functions can't be called from sync contexts without special bridges. Stackful coroutines (Go goroutines) avoid this but use more memory per coroutine.
- Backpressure is built into poll-based systems: If a consumer Future isn't being polled, its producer won't be polled either — no unbounded buffering. Callback-based systems (Node.js streams) need manual backpressure.
- Work-stealing executors optimize for cache locality: Tasks run on the same thread that spawned them (LIFO local queue). Stealing takes from the opposite end (FIFO) so cold tasks migrate, not hot ones.
- Tokio's coop budget prevents task starvation: Each task gets ~128 sub-polls before a forced yield, preventing one busy future from monopolizing a worker thread.
- Go's goroutine scheduler (GMP model) is an async runtime in disguise: It has an executor (P's local run queue + work-stealing), a reactor (netpoller using epoll), and cooperative/preemptive scheduling — just without the async keyword.