Atomics, SharedArrayBuffer, and True Parallelism in JavaScript
JavaScript has always been single-threaded—until SharedArrayBuffer. With shared memory between workers, you can build truly parallel algorithms, but you also inherit all the complexity of concurrent programming: data races, memory ordering, and synchronization primitives. This article covers how to use these low-level tools correctly and when they genuinely change what's possible in JavaScript applications.
The Shared Memory Model
┌─────────────────────────────────────────────────────────────────────────────┐
│ SHAREDARRAYBUFFER ARCHITECTURE │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ WITHOUT SharedArrayBuffer (postMessage): │
│ ──────────────────────────────────────── │
│ │
│ Main Thread Worker Thread │
│ ┌─────────────┐ ┌─────────────┐ │
│ │ ArrayBuffer │ │ ArrayBuffer │ │
│ │ [1,2,3,4] │ │ [1,2,3,4] │ ← COPY (structured clone) │
│ └──────┬──────┘ └──────┬──────┘ │
│ │ │ │
│ │ postMessage(arr) │ │
│ │ ─────────────────────▶│ │
│ │ (serialize, │ │
│ │ copy memory, │ │
│ │ deserialize) │ │
│ │
│ WITH SharedArrayBuffer: │
│ ───────────────────────── │
│ │
│ Main Thread Worker Thread │
│ ┌─────────────┐ ┌─────────────┐ │
│ │ View │ │ View │ │
│ │ Int32Array │ │ Int32Array │ │
│ └──────┬──────┘ └──────┬──────┘ │
│ │ │ │
│ └───────────┬───────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────┐ │
│ │ SharedArrayBuffer │ ← SAME MEMORY │
│ │ [1, 2, 3, 4] │ │
│ │ (shared between │ │
│ │ all threads) │ │
│ └─────────────────────────┘ │
│ │
│ Changes by one thread are visible to all threads! │
│ But this creates data race potential... │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Basic SharedArrayBuffer Usage
// Main thread
const sharedBuffer = new SharedArrayBuffer(1024); // 1KB shared memory
const view = new Int32Array(sharedBuffer);
// Initialize data
view[0] = 42;
view[1] = 100;
// Send to worker (buffer is NOT copied, just the reference)
worker.postMessage({ buffer: sharedBuffer });
// Worker can now read AND WRITE the same memory
// Changes are immediately visible to main thread
// worker.js
self.onmessage = function(e) {
const view = new Int32Array(e.data.buffer);
console.log(view[0]); // 42 - reading shared memory
view[0] = 99; // Main thread sees this change!
};
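The copy-versus-alias difference is easy to see without spinning up a worker. In this sketch, `structuredClone` stands in for what postMessage's structured clone does to a plain ArrayBuffer (assumes Node 17+ or a modern browser for `structuredClone`):

```javascript
// Plain ArrayBuffer: the structured clone algorithm copies the bytes,
// so the clone no longer tracks the original after a write.
const plain = new Int32Array(new ArrayBuffer(4));
plain[0] = 1;
const cloned = new Int32Array(structuredClone(plain.buffer));
plain[0] = 2;
console.log(cloned[0]); // 1 - independent copy

// SharedArrayBuffer: every view aliases the same underlying memory,
// which is what a worker receives via postMessage.
const sab = new SharedArrayBuffer(4);
const viewA = new Int32Array(sab);
const viewB = new Int32Array(sab);
viewA[0] = 2;
console.log(viewB[0]); // 2 - same bytes
```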
The Data Race Problem
// WITHOUT Atomics - DATA RACE!
// Main thread
const view = new Int32Array(sharedBuffer);
view[0] = 0;
// Spawn 4 workers, each increments view[0] by 1000000
// Expected result: 4000000
// Actual result: UNDEFINED (usually less than 4000000)
// Worker code:
for (let i = 0; i < 1000000; i++) {
view[0]++; // NOT ATOMIC!
}
// What happens internally for view[0]++:
// 1. Read current value from memory
// 2. Add 1 to the value
// 3. Write new value to memory
// Race condition timeline:
// Thread A: Read value (100)
// Thread B: Read value (100) ← Both see 100
// Thread A: Write 101
// Thread B: Write 101 ← Lost update! Should be 102
// This is why we need Atomics
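The lost-update interleaving above can be reproduced deterministically in a single thread by splitting the read-modify-write by hand. This is a sketch of the failure mode, not a real race:

```javascript
const sab = new SharedArrayBuffer(4);
const view = new Int32Array(sab);
view[0] = 100;

// Simulate two threads that both read BEFORE either writes:
const readA = view[0]; // "Thread A" reads 100
const readB = view[0]; // "Thread B" reads 100
view[0] = readA + 1;   // A writes 101
view[0] = readB + 1;   // B writes 101, so A's increment is lost
console.log(view[0]);  // 101, not 102
```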
Atomics: Safe Shared Memory Operations
// Atomics provide thread-safe operations
const view = new Int32Array(sharedBuffer);
// Atomic read and write
Atomics.store(view, 0, 42); // Atomic write
const value = Atomics.load(view, 0); // Atomic read
// Atomic arithmetic
Atomics.add(view, 0, 5); // view[0] += 5, atomically
Atomics.sub(view, 0, 3); // view[0] -= 3, atomically
// Atomic bitwise operations
Atomics.and(view, 0, 0xFF); // view[0] &= 0xFF
Atomics.or(view, 0, 0x100); // view[0] |= 0x100
Atomics.xor(view, 0, 0x55); // view[0] ^= 0x55
// Atomic exchange
const old = Atomics.exchange(view, 0, 99); // Set to 99, return old value
// Compare-and-swap (CAS) - foundation of lock-free algorithms
const oldValue = Atomics.compareExchange(
view,
0, // index
expected, // expected current value
newValue // value to set if current === expected
);
// Returns actual old value (compare with expected to know if swap happened)
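The return-value convention is easy to verify in isolation (single-threaded sketch):

```javascript
const view = new Int32Array(new SharedArrayBuffer(4));
Atomics.store(view, 0, 10);

// Expected value matches the current value: swap happens, old value returned
console.log(Atomics.compareExchange(view, 0, 10, 20)); // 10
console.log(Atomics.load(view, 0));                    // 20

// Expected value is stale: no swap, actual current value returned
console.log(Atomics.compareExchange(view, 0, 10, 30)); // 20
console.log(Atomics.load(view, 0));                    // still 20
```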
Compare-and-Exchange: The Building Block
// CAS is how you build lock-free data structures
function atomicIncrement(view, index) {
while (true) {
const current = Atomics.load(view, index);
const next = current + 1;
// Try to swap current → next
const actual = Atomics.compareExchange(view, index, current, next);
if (actual === current) {
// Swap succeeded! No other thread modified it
return next;
}
// Another thread modified it, retry with new value
}
}
// This is what Atomics.add does internally, but you can use CAS
// for more complex operations:
function atomicMultiply(view, index, multiplier) {
while (true) {
const current = Atomics.load(view, index);
const next = current * multiplier;
if (Atomics.compareExchange(view, index, current, next) === current) {
return next;
}
// Retry - another thread interfered
}
}
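From a single thread the CAS loop always succeeds on the first try, which makes a quick sanity check easy (the retry loop is reproduced here so the snippet is self-contained):

```javascript
// Same CAS retry loop as above
function atomicIncrement(view, index) {
  while (true) {
    const current = Atomics.load(view, index);
    // If no other thread changed the slot, this swap succeeds
    if (Atomics.compareExchange(view, index, current, current + 1) === current) {
      return current + 1;
    }
    // Otherwise loop and retry with the fresh value
  }
}

const view = new Int32Array(new SharedArrayBuffer(4));
Atomics.store(view, 0, 41);
console.log(atomicIncrement(view, 0)); // 42
console.log(Atomics.load(view, 0));    // 42
```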
Memory Ordering and Happens-Before
┌─────────────────────────────────────────────────────────────────────────────┐
│ MEMORY ORDERING GUARANTEES │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ JavaScript uses Sequential Consistency for Atomics: │
│ ───────────────────────────────────────────────── │
│ All atomic operations appear to execute in SOME sequential order │
│ that is consistent with the program order of each thread. │
│ │
│ HAPPENS-BEFORE RELATIONSHIPS: │
│ │
│ 1. Within a thread, operations happen in program order │
│ │
│ 2. An Atomics.store() HAPPENS-BEFORE any Atomics.load() that reads │
│ the value stored │
│ │
│ 3. Atomics.wait() HAPPENS-BEFORE the matching Atomics.notify() │
│ │
│ Thread A Thread B │
│ ───────── ───────── │
│ │
│ data = 42; │
│ Atomics.store(flag, 0, 1); ─────────┐ │
│ │ HAPPENS-BEFORE │
│ ┌─────────┘ │
│ ▼ │
│ while(Atomics.load(flag,0) === 0); │
│ console.log(data); // GUARANTEED to see 42 │
│ │
│ WITHOUT Atomics, Thread B might see stale 'data' even after flag is set! │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Atomics.wait and Atomics.notify: Thread Synchronization
// Atomics.wait() - Block thread until condition is met
// Atomics.notify() - Wake up waiting threads
// IMPORTANT: In browsers, Atomics.wait() throws on the main thread,
// which is not allowed to block; call it from a worker.
// (Node.js permits blocking waits on the main thread as well.)
// Worker code (waiter):
const view = new Int32Array(sharedBuffer);
// Wait for view[0] to become non-zero
// Blocks the thread entirely (no event loop)
const result = Atomics.wait(view, 0, 0); // Wait while value === 0
// result can be:
// 'ok' - was woken by notify
// 'not-equal' - value wasn't 0 when we checked
// 'timed-out' - timeout expired
// With timeout:
Atomics.wait(view, 0, 0, 1000); // Wait max 1000ms
// Main thread (notifier):
const view = new Int32Array(sharedBuffer);
view[0] = 1; // Set the value
// Wake up one waiting thread
const wokenCount = Atomics.notify(view, 0, 1);
// Wake up all waiting threads
Atomics.notify(view, 0, Infinity);
// Atomics.notify returns the number of waiters actually woken
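Because Node.js allows the main thread to block, two of the three return values can be observed deterministically without a second thread (observing 'ok' requires another thread to call Atomics.notify):

```javascript
const view = new Int32Array(new SharedArrayBuffer(4)); // view[0] is 0

// The expected value (1) doesn't match the current value (0),
// so wait returns immediately without blocking:
console.log(Atomics.wait(view, 0, 1)); // 'not-equal'

// The expected value matches, nobody notifies us, and the 50ms
// timeout fires:
console.log(Atomics.wait(view, 0, 0, 50)); // 'timed-out'
```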
Building a Mutex with Atomics
// A simple mutex (mutual exclusion lock)
class Mutex {
private lockView: Int32Array;
private lockIndex = 0;
private UNLOCKED = 0;
private LOCKED = 1;
constructor(sharedBuffer: SharedArrayBuffer, byteOffset: number = 0) {
this.lockView = new Int32Array(sharedBuffer, byteOffset, 1);
// Initialize to unlocked state
Atomics.store(this.lockView, this.lockIndex, this.UNLOCKED);
}
// NOTE: lock() blocks via Atomics.wait, so in browsers it must be
// called from a worker, never the main thread
lock(): void {
while (true) {
// Try to acquire lock (CAS: UNLOCKED → LOCKED)
const oldValue = Atomics.compareExchange(
this.lockView,
this.lockIndex,
this.UNLOCKED,
this.LOCKED
);
if (oldValue === this.UNLOCKED) {
// Successfully acquired lock
return;
}
// Lock is held by another thread, wait
Atomics.wait(this.lockView, this.lockIndex, this.LOCKED);
// When woken, loop and try again
}
}
unlock(): void {
// Release lock
Atomics.store(this.lockView, this.lockIndex, this.UNLOCKED);
// Wake one waiting thread
Atomics.notify(this.lockView, this.lockIndex, 1);
}
// Try to acquire without blocking
tryLock(): boolean {
const oldValue = Atomics.compareExchange(
this.lockView,
this.lockIndex,
this.UNLOCKED,
this.LOCKED
);
return oldValue === this.UNLOCKED;
}
}
// Usage in worker:
const mutex = new Mutex(sharedBuffer, 0);
const data = new Int32Array(sharedBuffer, 4);
function criticalSection() {
mutex.lock();
try {
// Only one thread can be here at a time
const current = data[0];
// ... complex operation ...
data[0] = current + 1;
} finally {
mutex.unlock();
}
}
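tryLock is the easiest piece to verify without spinning up workers. This is a plain-JavaScript reduction of the class above, assuming the same layout (lock word at Int32 index 0):

```javascript
const lockView = new Int32Array(new SharedArrayBuffer(4));
const UNLOCKED = 0, LOCKED = 1;

function tryLock() {
  // Acquire only if currently unlocked
  return Atomics.compareExchange(lockView, 0, UNLOCKED, LOCKED) === UNLOCKED;
}
function unlock() {
  Atomics.store(lockView, 0, UNLOCKED);
  Atomics.notify(lockView, 0, 1); // no-op here: nobody is waiting
}

console.log(tryLock()); // true  - lock acquired
console.log(tryLock()); // false - already held, CAS fails
unlock();
console.log(tryLock()); // true  - acquirable again
```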
Building a Lock-Free Queue
// A simple lock-free single-producer, single-consumer queue
class SPSCQueue {
private buffer: Int32Array;
private capacity: number;
// Layout: [head, tail, data...]. Head and tail live in the shared
// buffer itself so both threads see them. One slot is kept empty to
// distinguish a full queue from an empty one, so `capacity` slots
// hold at most capacity - 1 items.
private HEAD_INDEX = 0;
private TAIL_INDEX = 1;
private DATA_START = 2;
constructor(sharedBuffer: SharedArrayBuffer, capacity: number) {
this.buffer = new Int32Array(sharedBuffer);
this.capacity = capacity;
// Initialize head and tail to 0
Atomics.store(this.buffer, this.HEAD_INDEX, 0);
Atomics.store(this.buffer, this.TAIL_INDEX, 0);
}
// Producer calls this
push(value: number): boolean {
const tail = Atomics.load(this.buffer, this.TAIL_INDEX);
const head = Atomics.load(this.buffer, this.HEAD_INDEX);
const nextTail = (tail + 1) % this.capacity;
if (nextTail === head) {
// Queue is full
return false;
}
// Write the value
this.buffer[this.DATA_START + tail] = value;
// Publish the new tail (release semantics)
Atomics.store(this.buffer, this.TAIL_INDEX, nextTail);
// Wake any waiting consumer
Atomics.notify(this.buffer, this.TAIL_INDEX, 1);
return true;
}
// Consumer calls this
pop(): number | null {
const head = Atomics.load(this.buffer, this.HEAD_INDEX);
const tail = Atomics.load(this.buffer, this.TAIL_INDEX);
if (head === tail) {
// Queue is empty
return null;
}
// Read the value
const value = this.buffer[this.DATA_START + head];
// Advance head
const nextHead = (head + 1) % this.capacity;
Atomics.store(this.buffer, this.HEAD_INDEX, nextHead);
return value;
}
// Consumer can wait for data
popBlocking(): number {
while (true) {
const value = this.pop();
if (value !== null) {
return value;
}
// Wait for producer to publish
const tail = Atomics.load(this.buffer, this.TAIL_INDEX);
Atomics.wait(this.buffer, this.TAIL_INDEX, tail);
}
}
}
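A compact plain-JavaScript version of the same ring buffer makes the full/empty behavior easy to check from one thread. Note the one-slot-empty convention: a queue with `capacity` slots holds at most capacity - 1 items.

```javascript
// Layout: [head, tail, data...]; one slot stays empty to tell full from empty
function makeQueue(capacity) {
  const buf = new Int32Array(new SharedArrayBuffer((capacity + 2) * 4));
  return {
    push(value) {
      const tail = Atomics.load(buf, 1);
      const next = (tail + 1) % capacity;
      if (next === Atomics.load(buf, 0)) return false; // full
      buf[2 + tail] = value;
      Atomics.store(buf, 1, next); // publish the write
      return true;
    },
    pop() {
      const head = Atomics.load(buf, 0);
      if (head === Atomics.load(buf, 1)) return null; // empty
      const value = buf[2 + head];
      Atomics.store(buf, 0, (head + 1) % capacity);
      return value;
    },
  };
}

const q = makeQueue(4); // holds at most 3 items
console.log(q.push(1), q.push(2), q.push(3)); // true true true
console.log(q.push(4));                       // false - full
console.log(q.pop(), q.pop(), q.pop());       // 1 2 3
console.log(q.pop());                         // null - empty
```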
Real-World Use Cases
Use Case 1: Parallel Image Processing
// Main thread
async function processImageParallel(imageData: ImageData): Promise<ImageData> {
const { width, height, data } = imageData;
// Create shared buffer for image data
const sharedBuffer = new SharedArrayBuffer(data.length);
const sharedView = new Uint8ClampedArray(sharedBuffer);
sharedView.set(data);
// Divide work among workers
const numWorkers = navigator.hardwareConcurrency || 4;
const rowsPerWorker = Math.ceil(height / numWorkers);
const workers = [];
const promises = [];
for (let i = 0; i < numWorkers; i++) {
const worker = new Worker('image-worker.js');
workers.push(worker);
const startRow = i * rowsPerWorker;
const endRow = Math.min((i + 1) * rowsPerWorker, height);
const promise = new Promise((resolve) => {
worker.onmessage = () => resolve(undefined);
worker.postMessage({
buffer: sharedBuffer,
width,
startRow,
endRow
});
});
promises.push(promise);
}
// Wait for all workers to complete
await Promise.all(promises);
// Create result (data is already in sharedView)
const result = new ImageData(width, height);
result.data.set(sharedView);
// Cleanup workers
workers.forEach(w => w.terminate());
return result;
}
// image-worker.js
self.onmessage = function(e) {
const { buffer, width, startRow, endRow } = e.data;
const view = new Uint8ClampedArray(buffer);
for (let y = startRow; y < endRow; y++) {
for (let x = 0; x < width; x++) {
const i = (y * width + x) * 4;
// Example: Grayscale conversion
const gray = view[i] * 0.299 + view[i+1] * 0.587 + view[i+2] * 0.114;
view[i] = gray;
view[i+1] = gray;
view[i+2] = gray;
// Alpha unchanged
}
}
self.postMessage('done');
};
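The per-pixel kernel is pure arithmetic, so it can be sanity-checked on a tiny buffer without any workers (one opaque red pixel; Uint8ClampedArray rounds the luma result on assignment):

```javascript
const view = new Uint8ClampedArray(new SharedArrayBuffer(4));
view.set([255, 0, 0, 255]); // one opaque red pixel (RGBA)

const i = 0;
const gray = view[i] * 0.299 + view[i + 1] * 0.587 + view[i + 2] * 0.114;
view[i] = gray;     // 76.245 rounds to 76 on assignment
view[i + 1] = gray;
view[i + 2] = gray;
// Alpha (view[i + 3]) unchanged

console.log(Array.from(view)); // [76, 76, 76, 255]
```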
Use Case 2: Parallel Data Processing in Node.js
// main.js
const { Worker, isMainThread, parentPort, workerData } = require('worker_threads');
if (isMainThread) {
async function parallelSum(numbers: number[]): Promise<number> {
// Create shared buffer for numbers
const sharedBuffer = new SharedArrayBuffer(numbers.length * 4);
const view = new Int32Array(sharedBuffer);
numbers.forEach((n, i) => view[i] = n);
// Create buffer for partial sums
const numWorkers = 4;
const resultsBuffer = new SharedArrayBuffer(numWorkers * 4);
const results = new Int32Array(resultsBuffer);
const workers = [];
const chunkSize = Math.ceil(numbers.length / numWorkers);
for (let i = 0; i < numWorkers; i++) {
const start = i * chunkSize;
const end = Math.min(start + chunkSize, numbers.length);
const worker = new Worker(__filename, {
workerData: {
dataBuffer: sharedBuffer,
resultsBuffer,
workerIndex: i,
start,
end
}
});
workers.push(new Promise((resolve, reject) => {
worker.on('message', resolve);
worker.on('error', reject);
}));
}
await Promise.all(workers);
// Sum partial results
let total = 0;
for (let i = 0; i < numWorkers; i++) {
total += results[i];
}
return total;
}
// Test it
const numbers = Array.from({ length: 10000000 }, (_, i) => i);
parallelSum(numbers).then(console.log);
} else {
// Worker code
const { dataBuffer, resultsBuffer, workerIndex, start, end } = workerData;
const data = new Int32Array(dataBuffer);
const results = new Int32Array(resultsBuffer);
let sum = 0;
for (let i = start; i < end; i++) {
sum += data[i];
}
// Write result atomically
Atomics.store(results, workerIndex, sum);
parentPort.postMessage('done');
}
Use Case 3: Real-Time Audio Processing
// Audio worklet with shared buffer for zero-copy audio processing
// main.js
async function setupAudioProcessing() {
const audioContext = new AudioContext();
await audioContext.audioWorklet.addModule('processor.js');
// Shared buffer for audio parameters
const paramBuffer = new SharedArrayBuffer(16);
const params = new Float32Array(paramBuffer);
// Create processor node
const processorNode = new AudioWorkletNode(audioContext, 'shared-processor', {
processorOptions: { paramBuffer }
});
// Connect to output
const source = audioContext.createOscillator();
source.connect(processorNode);
processorNode.connect(audioContext.destination);
// Update parameters from main thread (no message passing!)
// NOTE: Atomics.store throws on a Float32Array view; Atomics only
// accept integer typed arrays. Plain aligned 32-bit writes are used
// here instead. For strict atomicity, reinterpret the float bits
// through an Int32Array view over the same buffer.
function setVolume(value: number) {
  params[0] = value; // plain write, picked up by the worklet thread
}
function setFrequency(value: number) {
  params[1] = value;
}
return { setVolume, setFrequency };
}
// processor.js (AudioWorklet)
class SharedProcessor extends AudioWorkletProcessor {
private params: Float32Array;
constructor(options) {
super();
this.params = new Float32Array(options.processorOptions.paramBuffer);
}
process(inputs, outputs, parameters) {
const output = outputs[0];
// Plain reads: Atomics.load rejects Float32Array views, so the
// float parameters are read directly from the shared view
const volume = this.params[0];
const frequency = this.params[1]; // available for pitch-based effects
for (let channel = 0; channel < output.length; channel++) {
const outputChannel = output[channel];
for (let i = 0; i < outputChannel.length; i++) {
// Apply volume from shared buffer (updated in real-time)
outputChannel[i] = inputs[0][channel][i] * volume;
}
}
return true;
}
}
registerProcessor('shared-processor', SharedProcessor);
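Because Atomics reject Float32Array views, a strictly atomic float parameter can be published by reinterpreting its bits as an Int32 over the same buffer. This is a sketch using a per-thread scratch DataView for the bit cast:

```javascript
const paramBuffer = new SharedArrayBuffer(16);
const asInts = new Int32Array(paramBuffer);       // Atomics-compatible view
const scratch = new DataView(new ArrayBuffer(4)); // local, not shared

function storeFloat(index, value) {
  scratch.setFloat32(0, value, true);
  Atomics.store(asInts, index, scratch.getInt32(0, true)); // atomic publish
}
function loadFloat(index) {
  scratch.setInt32(0, Atomics.load(asInts, index), true);
  return scratch.getFloat32(0, true);
}

storeFloat(0, 0.5);
console.log(loadFloat(0)); // 0.5 (exactly representable in float32)
```

Each thread keeps its own scratch DataView; only the Int32Array view over the SharedArrayBuffer is shared, so every read and write of the parameter slot goes through Atomics.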
When NOT to Use SharedArrayBuffer
// SharedArrayBuffer adds complexity. Use it only when:
// 1. postMessage copying overhead is a measurable bottleneck
// 2. You need true parallelism (not just concurrency)
// 3. You understand the synchronization requirements
// DON'T use for:
// ❌ Simple worker communication
// Use postMessage with Transferable objects instead
worker.postMessage(arrayBuffer, [arrayBuffer]); // Transfer, not copy
// ❌ Sharing small amounts of data
// The synchronization overhead exceeds the copying cost
// ❌ Data that's read-only
// Just copy it; no synchronization needed for reads
// ❌ When you're not sure about thread safety
// Bugs are subtle and hard to reproduce
// DO use for:
// ✅ Large datasets processed in parallel
// ✅ Real-time applications (audio, video)
// ✅ Compute-intensive algorithms (physics, ML inference)
// ✅ When profiling shows postMessage is a bottleneck
Security Considerations
// SharedArrayBuffer was disabled in browsers after Spectre/Meltdown
// It's re-enabled with Cross-Origin Isolation:
// Required HTTP headers:
// Cross-Origin-Opener-Policy: same-origin
// Cross-Origin-Embedder-Policy: require-corp
// In your server:
res.setHeader('Cross-Origin-Opener-Policy', 'same-origin');
res.setHeader('Cross-Origin-Embedder-Policy', 'require-corp');
// Check if available:
if (typeof SharedArrayBuffer !== 'undefined') {
// Can use SharedArrayBuffer
} else {
// Fall back to postMessage with transfers
}
// Cross-origin resources need:
// Cross-Origin-Resource-Policy: cross-origin
// Or be served from the same origin
Key Takeaways
- SharedArrayBuffer enables true parallelism: multiple threads can read and write the same memory, enabling parallel algorithms impossible with postMessage.
- Atomics prevent data races: without Atomics, concurrent reads/writes produce undefined results. Use Atomics for ALL shared memory access.
- CAS is the foundation of lock-free programming: Atomics.compareExchange enables building complex synchronization primitives without locks.
- Atomics.wait/notify provide blocking synchronization: like condition variables in other languages. In browsers they may only block in workers, never on the main thread.
- Sequential consistency is guaranteed: atomic operations have well-defined memory ordering. Non-atomic operations on shared memory do not.
- Cross-origin isolation is required in browsers: the COOP and COEP security headers must be set for SharedArrayBuffer to be available.
- Use sparingly and measure first: SharedArrayBuffer adds complexity. Only use it when postMessage overhead is a proven bottleneck.
- Real use cases exist: parallel image/video processing, real-time audio, and compute-intensive algorithms all benefit from true parallelism.
SharedArrayBuffer and Atomics bring systems programming to JavaScript. With great power comes great responsibility—you're now dealing with the same concurrency challenges that C++ and Rust programmers face. Use these tools when parallelism genuinely solves your problem, and always default to simpler message-passing when it's sufficient.