BLOG
Inside a Custom Distributed Task Scheduler: Redis Locks, Heartbeats, and Zero-Poll Concurrency
PUBLISHED
January 18, 2026
READ TIME
22 min read
SUMMARY
A deep technical breakdown of a production distributed task scheduler built without BullMQ: instance heartbeats with recursive setTimeout, controller election over Postgres, in-process locks using a Map of resolve callbacks, global Redis SET NX locks with pub/sub notification instead of polling, Postgres-backed durable tasks that survive instance crashes, and the full background price pipeline as a working end-to-end example.
The platform I work on is a real-time financial intelligence application. It needs continuous background work: prices need to stay fresh for tickers no one is actively viewing, stock profiles need to refresh periodically, financial data needs to update as companies file quarterly reports. Each of these jobs needs to run on exactly one instance at a time and recover cleanly if an instance crashes mid-task.
The standard answer for this is BullMQ. But the team made a deliberate call to skip it. BullMQ introduces a new system to reason about: a queue abstraction with its own workers, processors, concurrency model, and failure semantics sitting on top of Redis. The requirements here were well-understood enough that the same guarantees could be built directly from primitives, with no new dependency.
The result is a custom scheduler built into the platform's core. It covers the same ground as BullMQ for this use case: periodic jobs, distributed locking, durable tasks with retry, instance health tracking, and rate limiting. It is the foundation that every piece of background work in the platform runs on top of.
I've spent a lot of time reading through how it works and using it while building features on top of the platform. This post is a detailed breakdown of the architecture: how each piece is built and why the design decisions were made the way they were.
Two Roles: Controller and Worker
Every backend instance starts as one of two roles, or both:
The controller is responsible for the global task loop: running interval and cron tasks, monitoring instance health, and assigning queued work to available worker instances. There is one controller in the system at any time.
Workers execute tasks sent to them. They join task rooms, process the payloads, and report success or failure back through the database.
An instance can be both. In a single-server deployment, one instance is both controller and worker. In a scaled deployment, a dedicated controller instance manages the cluster and separate worker instances do the execution.
Instance Registration and the Heartbeat
The first thing watch() does is register the instance in Postgres with a crypto.randomUUID() ID, then start a heartbeat loop that runs every 1 second:
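In sketch form, it looks roughly like this (the function names and the injected DB write are illustrative, not the platform's actual API):

```typescript
// Sketch of registration plus the heartbeat loop. The Postgres write is
// abstracted as an injected async function; helper names are assumptions.
import { randomUUID } from "node:crypto";

const HEARTBEAT_MS = 1_000;
const MAX_FAILURES = 30;

function startHeartbeat(
  writeHeartbeat: (instanceId: string) => Promise<void>,
  activeTaskCount: () => number,
  intervalMs = HEARTBEAT_MS,
): { instanceId: string; stop: () => void } {
  const instanceId = randomUUID(); // registered in Postgres under this ID
  let failures = 0;
  let stopped = false;
  let timer: ReturnType<typeof setTimeout>;

  const beat = async () => {
    if (stopped) return;
    try {
      await writeHeartbeat(instanceId); // e.g. UPDATE instances SET last_seen = now()
      failures = 0;
    } catch {
      failures++;
      // Exit only when nothing is in flight; keep trying while work runs.
      if (failures >= MAX_FAILURES && activeTaskCount() === 0) {
        process.exit(1);
      }
    }
    // Recursive setTimeout: the next beat is scheduled only after this
    // write finishes, so slow writes can never cause overlapping beats.
    timer = setTimeout(beat, intervalMs);
  };

  timer = setTimeout(beat, intervalMs);
  return { instanceId, stop: () => { stopped = true; clearTimeout(timer); } };
}
```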
The heartbeat uses recursive setTimeout rather than setInterval. This is deliberate: if the DB write takes 200ms, the next heartbeat fires 1 second after the write completes, not 1 second after it started. With setInterval, slow writes could cause heartbeats to overlap. With recursive setTimeout, they are always sequential.
After 30 consecutive failures, if no tasks are in flight, the process exits. If tasks are still running, the instance keeps trying. It does not exit while work is in progress. _workerTasksAbortControllers is a Set<AbortController> that holds an entry for each active task, and its size is the count of currently executing work.
The Controller Check Loop
The controller runs a check loop every 2 seconds:
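A minimal sketch of that loop, with the storage calls reduced to an injected interface (the dependency names are illustrative stand-ins for the real Postgres and Redis queries):

```typescript
// Sketch of the controller's check loop; each dependency is a stand-in
// for a real query described in the steps below.
interface ControllerDeps {
  deleteInstancesOlderThan(ms: number): Promise<void>;     // step 1
  findInstancesSeenWithin(ms: number): Promise<string[]>;  // step 2
  publishHealthyInstances(ids: string[]): Promise<void>;   // step 3
  reassignIdleTasks(healthy: string[]): Promise<void>;     // step 4
}

const CHECK_MS = 2_000;

function startControllerLoop(deps: ControllerDeps, intervalMs = CHECK_MS) {
  let timer: ReturnType<typeof setTimeout>;
  const check = async () => {
    await deps.deleteInstancesOlderThan(30_000);              // dead: silent 30s+
    const healthy = await deps.findInstancesSeenWithin(2_000); // alive right now
    await deps.publishHealthyInstances(healthy);              // cache in Redis
    await deps.reassignIdleTasks(healthy);                    // instance_id IS NULL
    timer = setTimeout(check, intervalMs);
  };
  timer = setTimeout(check, intervalMs);
  return () => clearTimeout(timer);
}
```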
Step 1 cleans up dead instances. Any instance that has not heartbeated in 30 seconds is gone. Step 2 finds instances that have heartbeated within the last 2 seconds, a stricter window that verifies they are currently alive. Step 3 writes the healthy list to Redis so other parts of the system can look up available workers without hitting Postgres. Step 4 picks up any tasks that were abandoned when their assigned instance died.
An idle task is an InstanceTask row where instance_id is null. This happens when an instance crashes mid-task: the failure handler sets instance_id = null, which flags the task for reassignment. The controller picks these up and routes them to a new worker.
Task Registration
Before the scheduler starts, tasks are registered by type:
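Something along these lines; the option shape here is an assumption for illustration, not the platform's actual signature:

```typescript
// Sketch of the registration API. TaskOptions is an assumed shape.
type TaskType = "startup" | "startup-worker" | "interval" | "cron";

interface TaskOptions {
  type: TaskType;
  intervalMs?: number; // delay between completions, for "interval" tasks
  cron?: string;       // cron expression, for "cron" tasks
  name?: string;       // wraps execution in a global distributed lock
  handler: (abort: AbortController) => Promise<void>;
}

const registry = new Map<string, TaskOptions>();

function registerTask(id: string, options: TaskOptions): void {
  if (registry.has(id)) throw new Error(`task ${id} already registered`);
  registry.set(id, options);
}
```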
Four task types exist: startup (runs once on boot), startup-worker (runs once on boot, workers only), interval (runs on a loop with a fixed delay between completions), and cron (runs on a cron schedule). The name option wraps execution in a global distributed lock so only one instance runs it at a time. All tasks receive an AbortController from the scheduler.
Interval Scheduling: Recursive setTimeout
Interval tasks use recursive setTimeout, not setInterval:
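The core of the runner can be sketched in a few lines (error handling here is an assumption; the real scheduler's behavior on a thrown handler is not shown):

```typescript
// Sketch of the interval runner: each iteration is scheduled only after
// the previous handler resolves, so a slow handler delays the next run
// instead of overlapping with it.
function runInterval(
  handler: (abort: AbortController) => Promise<void>,
  intervalMs: number,
): AbortController {
  const abort = new AbortController();
  const tick = async () => {
    if (abort.signal.aborted) return;
    try {
      await handler(abort);
    } catch {
      // Assumed: one failed run should not kill the loop.
    }
    if (!abort.signal.aborted) setTimeout(tick, intervalMs);
  };
  setTimeout(tick, intervalMs);
  return abort;
}
```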
The next iteration is scheduled after the current one completes. If the handler takes 8 seconds and the interval is 3 seconds, the next run fires 3 seconds after completion, not 3 seconds after the previous one started. Overlapping executions cannot accumulate. The abort signal propagates cleanly.
Named interval tasks pass through runInSequenceGlobal. This means the interval for update-stock-profiles will try to acquire a distributed lock on every tick. If another instance is already running the same job, the current instance waits. This provides automatic single-instance execution without any explicit leader election.
Cron Scheduling: 60-Second Windows
Cron tasks work differently. The controller runs an outer setInterval every 60 seconds that calls checkAllTasks(). For each cron task, the scheduler parses the expression and checks how long until the next fire time. If the next scheduled fire time is more than 60 seconds away, the task is skipped. If it is within the current minute window, it runs. Each cron task fires once per scheduled window with ±60 second precision.
Cron tasks also pass through runInSequenceGlobal. A weekly cron job that runs on Sunday at 3 AM will only execute on one instance even if the cluster has ten.
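The window check itself reduces to a small predicate. Cron parsing is abstracted here as an injected nextFireTime function (a real implementation would use a cron library):

```typescript
// Sketch of the 60-second cron window check described above.
const WINDOW_MS = 60_000;

function shouldFireThisWindow(
  nextFireTime: (now: Date) => Date,
  now: Date = new Date(),
): boolean {
  const msUntilFire = nextFireTime(now).getTime() - now.getTime();
  // Fire only when the next scheduled time falls inside the current window;
  // more than 60 seconds away means skip and re-check next minute.
  return msUntilFire >= 0 && msUntilFire < WINDOW_MS;
}
```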
The Local Sequence Lock
runInSequenceLocal is an in-process lock. No Redis. No network. It uses two Maps:
_locks maps a unique lock key to a grantLock callback, specifically the resolve function of the promise the waiter is blocking on. _locksLastSequence tracks a counter per sequence ID for lane assignment.
With maxConcurrency = N, incoming requests are distributed across N lanes using a counter modulo N. Each lane is an independent sequential queue. N lanes means N operations can run concurrently, each lane serializing its own callers.
When a caller is first in their lane's queue, the lock is granted immediately. Otherwise, the caller registers its resolve as the grantLock callback and waits. On completion, the holder deletes its entry from the map, finds the next entry in the same lane, and calls its resolve function. No polling. No setInterval. No busy waiting. Each handoff is a single function call at zero additional cost.
maxConcurrencyMsWindow adds rate limiting on top. After each operation, the holder waits out the remainder of the window before releasing, so every operation occupies its lane for at least the full window. If the window is 1000ms and concurrency is 10, each of the 10 lanes completes at most one operation per second, bounding throughput to at most maxConcurrency operations per maxConcurrencyMsWindow milliseconds. This is how the SEC EDGAR rate limiter works: maxConcurrency: 10, maxConcurrencyMsWindow: 1000 enforces exactly 10 requests per second.
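Here is a minimal in-memory sketch of the whole mechanism: lanes assigned by a counter modulo maxConcurrency, waiters parked as resolve callbacks in a Map, and the rate window applied before release. Key formats and helper details are assumptions for illustration:

```typescript
// In-process sequence lock sketch: no Redis, no polling, no timers for
// handoff. A waiter parks its promise's resolve in _locks; the finishing
// holder deletes its own entry and calls the next waiter's resolve.
const _locks = new Map<string, () => void>();          // lockKey -> grantLock
const _locksLastSequence = new Map<string, number>();  // sequenceId -> counter

async function runInSequenceLocal<T>(
  sequenceId: string,
  fn: () => Promise<T>,
  maxConcurrency = 1,
  maxConcurrencyMsWindow = 0,
): Promise<T> {
  const seq = (_locksLastSequence.get(sequenceId) ?? 0) + 1;
  _locksLastSequence.set(sequenceId, seq);
  const lane = seq % maxConcurrency;        // counter modulo N picks the lane
  const lanePrefix = `${sequenceId}#${lane}#`;
  const lockKey = `${lanePrefix}${seq}`;

  const hasAhead = [..._locks.keys()].some((k) => k.startsWith(lanePrefix));
  if (hasAhead) {
    // Park our resolve as the grantLock callback and wait for the handoff.
    await new Promise<void>((resolve) => _locks.set(lockKey, resolve));
  } else {
    _locks.set(lockKey, () => {}); // first in lane: lock granted immediately
  }

  const started = Date.now();
  try {
    return await fn();
  } finally {
    // Rate limit: hold the lane until the full window has elapsed.
    const remaining = maxConcurrencyMsWindow - (Date.now() - started);
    if (remaining > 0) await new Promise((r) => setTimeout(r, remaining));
    _locks.delete(lockKey);
    // Handoff is a single function call: wake the oldest waiter in the lane.
    const next = [..._locks.keys()]
      .filter((k) => k.startsWith(lanePrefix))
      .sort((a, b) =>
        Number(a.slice(lanePrefix.length)) - Number(b.slice(lanePrefix.length)))[0];
    if (next) _locks.get(next)!();
  }
}
```

Callers in the same lane are served strictly in arrival order, and the only cost per handoff is one map lookup and one function call.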
The Global Sequence Lock
runInSequenceGlobal is the distributed equivalent. It uses Redis SET NX EX for acquisition and Redis pub/sub for notification.
SET NX EX is atomic. If two instances call it simultaneously, exactly one succeeds. The other gets null back and enters the wait loop.
The wait is not polling. Before the loop, each waiter subscribes to a Redis pub/sub channel named after its own unique lock key. When the current lock holder finishes, it looks up all waiters by calling getRoomIds(lockGroup:*), sorts them by the timestamp embedded in their key, and emits to the first one. The waiter wakes up, acquires the lock, and runs its handler. Ordering is FIFO, guaranteed by the timestamp in the lock key.
The 5-second setTimeout fallback handles the case where the pub/sub notification is lost. Without it, a missed notify would block the waiter indefinitely. Under normal conditions, the promise resolves via the room notification, not the timeout.
Lock renewal keeps the TTL alive during long operations. The lock has a 15-second TTL. The renewal interval runs every 7.5 seconds. This means two renewals can fail before the lock expires. If the process crashes without releasing the lock, the TTL ensures automatic expiry and another instance can acquire it within 15 seconds.
With maxConcurrency > 1, lane assignment uses an atomic Redis INCR counter. Two concurrent callers on different instances get different sequence numbers and therefore different lanes. Each lane is an independent SET NX lock. N lanes means N concurrent holders across the entire cluster.
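The acquisition pattern for a single lane can be sketched against a tiny in-memory stand-in for Redis (SET NX EX plus pub/sub). The class below is not a real Redis client, and the function names are illustrative; it exists only to show the acquire, wait-on-notify, renew, and release-then-notify flow:

```typescript
// In-memory stand-in for the Redis operations the global lock needs.
class FakeRedis {
  private keys = new Map<string, number>();    // key -> expiry (ms epoch)
  private subs = new Map<string, () => void>(); // channel -> waiter callback

  setNxEx(key: string, ttlMs: number): boolean {
    const exp = this.keys.get(key);
    if (exp !== undefined && exp > Date.now()) return false; // already held
    this.keys.set(key, Date.now() + ttlMs);
    return true;
  }
  expire(key: string, ttlMs: number): void {
    if (this.keys.has(key)) this.keys.set(key, Date.now() + ttlMs);
  }
  del(key: string): void { this.keys.delete(key); }
  subscribe(channel: string, cb: () => void): void { this.subs.set(channel, cb); }
  unsubscribe(channel: string): void { this.subs.delete(channel); }
  publish(channel: string): void { this.subs.get(channel)?.(); }
  channels(prefix: string): string[] {
    return [...this.subs.keys()].filter((c) => c.startsWith(prefix));
  }
}

const TTL_MS = 15_000;

async function runInSequenceGlobal<T>(
  redis: FakeRedis,
  lockGroup: string,
  fn: () => Promise<T>,
  fallbackMs = 5_000, // safety net for a lost pub/sub notification
): Promise<T> {
  // The timestamp embedded in the key gives waiters a FIFO order.
  const myChannel = `${lockGroup}:${Date.now()}:${Math.random()}`;
  while (!redis.setNxEx(lockGroup, TTL_MS)) {
    await new Promise<void>((resolve) => {
      const timer = setTimeout(resolve, fallbackMs);
      redis.subscribe(myChannel, () => { clearTimeout(timer); resolve(); });
    });
    redis.unsubscribe(myChannel);
  }
  // Renew at half the TTL so a long fn keeps the lock alive.
  const renew = setInterval(() => redis.expire(lockGroup, TTL_MS), TTL_MS / 2);
  try {
    return await fn();
  } finally {
    clearInterval(renew);
    redis.del(lockGroup);
    // Notify the longest-waiting subscriber (oldest timestamp sorts first).
    const [next] = redis.channels(`${lockGroup}:`).sort();
    if (next) redis.publish(next);
  }
}
```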
Durable Worker Tasks
Beyond periodic jobs, the scheduler supports durable tasks: units of work persisted in Postgres that survive instance crashes.
The task is written to Postgres first. This is the guarantee: the task exists regardless of what happens next. The ROOMS.emit is the delivery: it notifies the assigned worker immediately. If the assigned instance is unreachable, the DB record survives and the controller's idle task scan picks it up.
On success: delete the task. On failure: set instance_id = null. The task becomes idle. The controller's _checkForIdleTasks scans for rows with instance_id: null every 2 seconds, increments the retry counter, and reassigns. When retries exceeds max_retries, the task is routed to a fail room where the caller's onTaskFail handler decides what final failure means.
_getAvailableInstanceId reads the healthy instance list from Redis and shuffles it before selecting, distributing assignments across workers. If no healthy workers are available, instance_id is set to null and the task waits in the idle queue until a worker comes online.
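The lifecycle can be sketched with an in-memory Map standing in for the Postgres table and the room emit reduced to a comment; field and function names mirror the description above but are assumptions, not the real schema:

```typescript
// Durable-task lifecycle sketch: persist first, deliver second, and let
// the idle scan reassign anything whose instance died.
interface WorkerTask {
  id: string;
  payload: unknown;
  instanceId: string | null; // null = idle, awaiting reassignment
  retries: number;
  maxRetries: number;
}

const tasks = new Map<string, WorkerTask>(); // stand-in for the Postgres table

function createWorkerTask(
  id: string, payload: unknown, instanceId: string | null, maxRetries = 3,
): WorkerTask {
  // Persist first: the row is the guarantee, the emit is only the delivery.
  const task: WorkerTask = { id, payload, instanceId, retries: 0, maxRetries };
  tasks.set(id, task);
  // ROOMS.emit(instanceId, task) would notify the assigned worker here.
  return task;
}

function onTaskSuccess(id: string): void {
  tasks.delete(id); // success: the row simply goes away
}

function onTaskFailure(id: string): void {
  const task = tasks.get(id);
  if (task) task.instanceId = null; // becomes idle; the scan picks it up
}

function checkForIdleTasks(healthyInstances: string[]): WorkerTask[] {
  const failed: WorkerTask[] = [];
  for (const task of tasks.values()) {
    if (task.instanceId !== null) continue;
    task.retries++;
    if (task.retries > task.maxRetries) {
      tasks.delete(task.id);
      failed.push(task); // would be routed to the fail room
      continue;
    }
    // A shuffle of the healthy list would go here; random pick for the sketch.
    task.instanceId =
      healthyInstances[Math.floor(Math.random() * healthyInstances.length)] ?? null;
  }
  return failed;
}
```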
Room Tasks: Ephemeral Fan-Out
createRoomTask is the stateless sibling of createWorkerTask. No database row. No retry. It distributes work across all available workers in one call:
If there are 3 workers and 9 tasks, each worker gets 3. The tasks are chunks of the input array, not independent retried units. Room tasks are for fire-and-forget fan-out operations where durability is not required.
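The split itself is simple array chunking, sketched here with an assumed helper name:

```typescript
// Sketch of the fan-out split: contiguous chunks of the input array,
// one chunk per available worker, no persistence and no retry.
function chunkAcrossWorkers<T>(items: T[], workerIds: string[]): Map<string, T[]> {
  const perWorker = Math.ceil(items.length / workerIds.length);
  const assignments = new Map<string, T[]>();
  workerIds.forEach((id, i) => {
    assignments.set(id, items.slice(i * perWorker, (i + 1) * perWorker));
  });
  return assignments;
}
```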
Graceful Shutdown
SCHEDULER.unwatch() waits for all in-flight tasks to drain before shutting down. It polls every 30ms until both the active task set and the local lock map are empty. The scheduler also aborts the global signal controller, which propagates into every handler via the abort parameter. Tasks that check abort.signal.aborted stop early. Tasks that do not check it run to completion before the process exits.
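Reduced to its essentials (the parameter names are illustrative), the drain looks like this:

```typescript
// Sketch of the shutdown drain: abort first, then poll every 30ms until
// both the active task set and the local lock map are empty.
async function unwatch(
  activeTasks: Set<unknown>,
  localLocks: Map<string, unknown>,
  abort: AbortController,
): Promise<void> {
  abort.abort(); // propagates into every handler via its abort parameter
  while (activeTasks.size > 0 || localLocks.size > 0) {
    await new Promise((r) => setTimeout(r, 30));
  }
}
```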
How Everything Connects
Here is how these pieces fit together in the background price pipeline as a concrete example:
The interval runs every 3 seconds. The name option wraps the entire worker in a runInSequenceGlobal lock, so only one instance runs the sweep at any time. Inside the sweep, each individual snapshot fetch is wrapped in a second runInSequenceGlobal with maxConcurrency: 2, meaning at most 2 REST calls in flight globally. Two nested distributed locks. The outer one serializes the sweep across instances. The inner one rate-limits the API calls within the sweep.
What This Architecture Gets Right
The most interesting design decision is the notify-not-poll pattern in runInSequenceGlobal. The naive distributed lock implementation polls Redis every N milliseconds to check if the key has been released. This wastes CPU, adds unnecessary Redis load, and introduces fixed latency proportional to the polling interval. Subscribing to a pub/sub channel and waking on notification means the waiting instance responds immediately when the lock is available. The 5-second fallback timeout exists purely as a safety net. Under normal operation, it never fires.
The two-tier lock design (local and global) is worth having because not everything needs cross-instance coordination. The SEC EDGAR rate limiter only needs per-process limiting; using a Redis lock for it would add unnecessary network round trips. runInSequenceLocal handles it with zero Redis calls. runInSequenceGlobal handles the cases that need true cluster-wide serialization.
The lock TTL renewal is a necessary component that many distributed lock tutorials omit. A lock with a fixed TTL that does not get renewed will expire if the operation runs longer than the TTL. The holder then loses the lock while still inside the critical section, and another instance can acquire it simultaneously, which is exactly the race the lock was supposed to prevent. Renewing at half the TTL interval gives two renewal windows before expiry.
The durable task design uses Postgres as the source of truth rather than Redis. This means tasks survive Redis restarts, flushes, or network partitions. A Redis queue (what BullMQ uses) is faster but less durable. For tasks that represent real work with real failure modes, Postgres persistence is the right tradeoff.
Reading deeply through this codebase has made me want to build something like this from scratch, not because this implementation needs replacing, but because the only way to truly internalize these patterns is to work through the failure modes yourself. The primitives are approachable: a heartbeat loop, a SET NX Redis lock, a pub/sub waiter map, a Postgres task table with a nullable instance_id. The interesting work is in the edge cases. What happens when the lock TTL expires mid-operation? When a pub/sub notification is dropped? A minimal reimplementation, even in a toy project, would surface all of that and make every decision in this post something you made rather than something you read about.