maxConcurrency Scheduling Issue (issue #10097)

The Problem

With describe.concurrent and maxConcurrency: 5, users expect at most 5 tests' resources to be held simultaneously. Instead, all N concurrent tests fire their beforeEach hooks (allocating resources) before any test body runs — producing N simultaneous resource allocations regardless of maxConcurrency.

Root cause: the current scheduler is effectively breadth-first (BFS) over the task tree. All work at depth d completes before any work at depth d+1 begins.

The Task Tree

Every test run is a recursive tree where every node has a lifecycle (not just leaf tests):

File
└── Suite  [aroundAll / beforeAll → children → afterAll]
    ├── Suite  [aroundAll / beforeAll → children → afterAll]
    │   ├── Test  [aroundEach / beforeEach → body → afterEach]
    │   └── Test  [aroundEach / beforeEach → body → afterEach]
    └── Test  [aroundEach / beforeEach → body → afterEach]

Resources can be allocated at any level of this tree (not just in beforeEach). A suite's beforeAll opens resources for the whole suite. Nested concurrent suites can each open their own beforeAll resources. The BFS problem applies at every level, not just at the leaf (test) level.
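To make the tree concrete, here is a minimal sketch of how it could be modeled. The type names (`TestNode`, `SuiteNode`, `TaskNode`) and the `countLifecycles` helper are hypothetical and only illustrate the shape described above; they are not Vitest's internal types.

```typescript
// Hypothetical model of the task tree; names are illustrative only.
interface TestNode {
  kind: 'test'
  name: string
}

interface SuiteNode {
  kind: 'suite'
  name: string
  concurrent: boolean // whether children are dispatched as a concurrent group
  children: TaskNode[]
}

type TaskNode = TestNode | SuiteNode

// Every node, suite or test, has its own setup/teardown lifecycle, so every
// node is a place where a resource may be held.
function countLifecycles(node: TaskNode): number {
  if (node.kind === 'test') return 1
  return 1 + node.children.reduce((sum, child) => sum + countLifecycles(child), 0)
}

// The same shape as the diagram: an outer suite holding a nested suite
// (two tests) and one direct test.
const tree: SuiteNode = {
  kind: 'suite',
  name: 'outer',
  concurrent: true,
  children: [
    {
      kind: 'suite',
      name: 'inner',
      concurrent: true,
      children: [
        { kind: 'test', name: 'a' },
        { kind: 'test', name: 'b' },
      ],
    },
    { kind: 'test', name: 'c' },
  ],
}

console.log(countLifecycles(tree)) // 5 (two suites plus three tests)
```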

BFS vs DFS

Current (BFS): In a concurrent group of N children, all N subtrees start simultaneously. Their operations queue into a flat FIFO. Result: all beforeEach hooks across all N tests complete before any test body runs.

Desired (DFS with bounded parallelism): In a concurrent group, start at most maxConcurrency children at a time. When one child's entire subtree completes (including all its hooks at every level), free that slot to the next sibling. This keeps at most maxConcurrency subtrees genuinely in-flight at any level of the tree.

Correctness Claim

At any concurrent group in the tree, at most maxConcurrency children's subtrees are simultaneously in-flight.

"In-flight" means: the subtree has started (its first before-hook has begun) but not yet completed (its last after-hook has not yet finished). This invariant holds recursively at every level of the tree.

This claim is:

Sufficient to bound resource ownership at every level
Deadlock-free: each subtree holds exactly one "slot" in its parent group and never requests a second slot from that same group, so slots are not nested across levels and no cyclic wait can form
General: applies uniformly to suites with beforeAll/afterAll/aroundAll and tests with beforeEach/afterEach/aroundEach, at any depth

Abstract Solution Shape

The fix lives at the concurrent dispatch point — wherever the tree fans out into concurrent children. Instead of launching all children simultaneously, launch at most K, and when one subtree fully completes, start the next:

runConcurrentGroup(children, K):
  run at most K children simultaneously
  when a child's full subtree completes → start next child

Applied recursively at every concurrent group in the tree, this naturally produces DFS-ordered execution with bounded parallelism.

No special-casing of leaf tests needed. No changes to the hook call chain needed. The tree structure itself enforces the invariant.
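The pseudocode above could be realized in several ways. The following is one possible sketch using a fixed pool of workers; `runChild` is a hypothetical callback assumed to resolve only once the child's entire subtree (all hooks included) has finished, and `k` is assumed to be at least 1. Error handling is deliberately left out.

```typescript
// Sketch only: one way to express runConcurrentGroup(children, K).
async function runConcurrentGroup<T>(
  children: T[],
  k: number,
  runChild: (child: T) => Promise<void>,
): Promise<void> {
  let next = 0

  // Each worker owns one slot: it runs a child to completion, then picks up
  // the next unstarted sibling. At most `k` subtrees are therefore in flight.
  async function worker(): Promise<void> {
    while (next < children.length) {
      const child = children[next++]
      await runChild(child)
    }
  }

  const workers = Array.from({ length: Math.min(k, children.length) }, () => worker())
  await Promise.all(workers)
}

// Usage: measure the peak number of children in flight at once.
async function measurePeak(total: number, k: number): Promise<number> {
  let inFlight = 0
  let peak = 0
  const ids = Array.from({ length: total }, (_, i) => i)
  await runConcurrentGroup(ids, k, async () => {
    inFlight++
    peak = Math.max(peak, inFlight)
    await Promise.resolve() // stand-in for hooks and the test body
    inFlight--
  })
  return peak
}
```

Because a nested concurrent suite would call `runConcurrentGroup` again from inside `runChild`, the same bound applies at each level without any extra bookkeeping.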

Implementation Plan

Core change: per-group limiter at concurrent dispatch

In runSuite, where a concurrent group is dispatched, wrap each child with a short-lived limiter:

// before
await Promise.all(tasksGroup.map(c => runSuiteChild(c, runner)))

// after
const groupLimiter = limitConcurrency(runner.config.maxConcurrency)
await Promise.all(tasksGroup.map(c => groupLimiter(() => runSuiteChild(c, runner))))

Each child holds one slot in groupLimiter for the entire duration of its subtree. The slot is released only when runSuiteChild resolves (after all hooks at all levels within that subtree have completed). When a slot frees, the next waiting sibling starts.

The limiter instance is created per concurrent group and becomes garbage once the group finishes. Multiple instances exist simultaneously only when concurrent groups are nested or run in parallel; each is scoped to its own group, so there is no cross-level interference.
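For readers unfamiliar with this kind of helper, here is a minimal sketch of a limiter with the call shape used in the snippet above (`limitConcurrency(max)` returning a function that wraps a thunk). It is an assumption about the helper's behavior for illustration, not a copy of Vitest's source.

```typescript
// Minimal sketch of a FIFO concurrency limiter; not Vitest's implementation.
type Limiter = <T>(fn: () => Promise<T>) => Promise<T>

function limitConcurrency(max: number): Limiter {
  let active = 0
  const waiting: Array<() => void> = []

  return async function limit<T>(fn: () => Promise<T>): Promise<T> {
    if (active >= max) {
      // No free slot: park until a finishing task hands its slot over.
      await new Promise<void>((resolve) => waiting.push(resolve))
    } else {
      active++
    }
    try {
      // The slot is held for the full duration of fn, i.e. the whole subtree.
      return await fn()
    } finally {
      const next = waiting.shift()
      if (next) next() // hand the slot directly to the oldest waiter
      else active-- // nobody is waiting: free the slot
    }
  }
}
```

Creating one such limiter per concurrent group means the `active` counter and the `waiting` queue are private to that group, which is what keeps levels from interfering with each other.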

Why a single global instance would deadlock

A global instance with subtree-scoped holding would deadlock: a parent concurrent group holds K slots (one per in-flight child), and when those children contain their own concurrent groups, the children try to acquire more slots from the same exhausted pool. Per-group instances avoid this entirely — each level has its own independent pool.
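The deadlock argument can be checked with a toy model. In the sketch below, `SlotPool` and `nestedGroupCanProgress` are hypothetical names; the model only counts slots and assumes the parent group has at least `k` children, each holding its slot until its whole subtree finishes.

```typescript
// Toy model, not Vitest code: a pool that only tracks how many slots are taken.
class SlotPool {
  private used = 0
  private readonly size: number

  constructor(size: number) {
    this.size = size
  }

  // Returns true if a slot was free and is now taken, false if the caller
  // would have to wait.
  tryAcquire(): boolean {
    if (this.used >= this.size) return false
    this.used++
    return true
  }
}

// The parent group fills all `k` slots. Each child then dispatches its own
// nested concurrent group. The result says whether a nested child can obtain
// a slot, i.e. whether any subtree can make progress.
function nestedGroupCanProgress(k: number, sharedPool: boolean): boolean {
  const parentPool = new SlotPool(k)
  for (let i = 0; i < k; i++) parentPool.tryAcquire()

  const nestedPool = sharedPool ? parentPool : new SlotPool(k)
  return nestedPool.tryAcquire()
}

console.log(nestedGroupCanProgress(5, true)) // false: nested children wait forever
console.log(nestedGroupCanProgress(5, false)) // true: the nested group has its own pool
```

In the shared case no slot can ever be released, because every holder is waiting for a nested child that cannot start.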

What happens to the existing per-operation `limitMaxConcurrency`

The current global limitMaxConcurrency wraps individual hook calls and the test body. The new model separates two orthogonal concerns:

1. Subtree-level concurrency (the fix): a per-group limiter at dispatch bounds how many sibling subtrees are in-flight simultaneously. This is the resource-ownership guarantee.
2. Within-lifecycle hook parallelism: when sequence.hooks = 'parallel', multiple hooks within a single lifecycle run concurrently. This is independent of subtree concurrency and can use its own limiter (or be left unbounded, since the number of hooks per lifecycle is small and fixed).

The existing global limitMaxConcurrency conflated the two concerns. After the fix, it can be removed or narrowed to concern (2) only.

Walkthrough: How the Fix Solves the BFS Problem

Concrete example: 400 concurrent tests, maxConcurrency=5, each test does beforeEach(alloc) → body → afterEach(release).

Old behavior (global FIFO, no dispatch bounding)

All 400 runTest calls start immediately via Promise.all. Each enqueues into the global FIFO:

Queue: [t1.bE, t2.bE, t3.bE, ..., t400.bE]   ← all 400 enqueued upfront
running: t1–t5

t1.bE completes → slot freed → t6.bE starts
t1 enqueues t1.body at the BACK of the queue:
Queue: [t7.bE, t8.bE, ..., t400.bE, t1.body]

→ t1.body sits behind 394 other beforeEach items
→ by the time t1.body runs, all 400 beforeEach hooks have started
→ 400 resources allocated simultaneously
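The queue trace above can be reproduced with a small deterministic simulation. This is a simplified model, not the real scheduler: it assumes every operation takes the same time (so operations finish in the order they started), that a resource counts as held from the end of beforeEach to the end of afterEach, and that `k` is at least 1. The function name is hypothetical.

```typescript
const PHASES = ['beforeEach', 'body', 'afterEach'] as const

// Simulates a single global FIFO of individual operations with `k` slots and
// returns the peak number of resources held at once.
function peakHeldWithPerOperationFifo(tests: number, k: number): number {
  // Each entry is [testId, phaseIndex]. All beforeEach ops are enqueued upfront.
  const queue: Array<[number, number]> = []
  for (let t = 0; t < tests; t++) queue.push([t, 0])

  const running: Array<[number, number]> = []
  let held = 0
  let peak = 0

  while (queue.length > 0 || running.length > 0) {
    // Fill free slots from the front of the global queue.
    while (running.length < k && queue.length > 0) running.push(queue.shift()!)

    // The oldest running operation completes.
    const [test, phase] = running.shift()!
    if (PHASES[phase] === 'beforeEach') {
      held++
      peak = Math.max(peak, held)
    }
    if (PHASES[phase] === 'afterEach') held--

    // The test's next operation goes to the BACK of the global queue.
    if (phase + 1 < PHASES.length) queue.push([test, phase + 1])
  }
  return peak
}

console.log(peakHeldWithPerOperationFifo(400, 5)) // 400
```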

New behavior (per-group limiter at dispatch, K=5)

Only 5 runTest calls start at all. The other 395 wait on the group limiter before even beginning:

groupLimiter slots: [t1, t2, t3, t4, t5]   ← t6–t400 haven't started at all

Each of t1–t5 runs its lifecycle as a sequential chain:
  t1: beforeEach → body → afterEach
  t2: beforeEach → body → afterEach
  ...

When t1 fully completes → slot freed → t6 starts its lifecycle
At most 5 resources held at any time.
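A small deterministic simulation illustrates the bound. It is a simplified model rather than the real scheduler: in-flight lifecycles advance one phase at a time in round-robin order, a resource counts as held from the end of beforeEach to the end of afterEach, and `k` is assumed to be at least 1. The function name is hypothetical.

```typescript
// Simulates slot-per-lifecycle dispatch and returns the peak number of
// resources held at once.
function peakHeldWithGroupDispatch(tests: number, k: number): number {
  const phases = ['beforeEach', 'body', 'afterEach'] as const
  let nextTest = 0
  // Each entry is a whole in-flight lifecycle: [testId, phaseIndex].
  const inFlight: Array<[number, number]> = []
  let held = 0
  let peak = 0

  while (nextTest < tests || inFlight.length > 0) {
    // A waiting test starts only when a whole-lifecycle slot is free.
    while (inFlight.length < k && nextTest < tests) inFlight.push([nextTest++, 0])

    // Advance the oldest in-flight lifecycle by one phase.
    const [test, phase] = inFlight.shift()!
    if (phases[phase] === 'beforeEach') {
      held++
      peak = Math.max(peak, held)
    }
    if (phases[phase] === 'afterEach') held--

    // Unfinished lifecycles keep their slot and continue with the next phase.
    if (phase + 1 < phases.length) inFlight.push([test, phase + 1])
  }
  return peak
}

console.log(peakHeldWithGroupDispatch(400, 5)) // 5
```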

Why it works

Two properties hold together:

1. Only K tests are in flight at any moment. The rest have not started, so their beforeEach has not been called yet.
2. Within each started test, the lifecycle is a sequential chain: t1.body follows immediately after t1.beforeEach with nothing in between, not after 394 other items.

The bounding happens before the lifecycle begins, not inside it. There is no global queue for individual operations to race in.

Two Models: Right Math, Wrong Domain

PR #9653 introduced a per-operation global FIFO limiter to fix beforeAll throttling (#8367). It had its own clean mathematical claim:

"At most K operations execute simultaneously."

This is correct and uniform — every hook call and test body competes equally for K slots. No special cases. But it was modeling at the wrong level of abstraction for the domain.

The mismatch: users reason about maxConcurrency in terms of test lifecycles ("at most K tests running at once, so at most K database connections open"). They don't think in terms of individual hook/body calls. A test lifecycle spans many sequential operations — beforeEach, body, afterEach — and the per-operation model makes no promise about how many lifecycles are simultaneously open. As shown in the walkthrough above, all 400 lifecycles can be open at once even with K=5.

The per-operation model was also not simple in practice. To make it safe, it required:

A "leaf only takes lock" discipline (only leaf operations acquire slots, not wrappers)
Explicit setup/teardown slot acquisition in callAroundHooks to manage the aroundEach phases
Careful avoidance of nested acquisition to prevent deadlock

All of that complexity was compensating for the fact that the model was correct but misaligned with the domain.

The per-group dispatch model aligns implementation with mental model:

"At any concurrent group, at most K subtrees are simultaneously in-flight."

This is what users mean when they set maxConcurrency. The complexity in callAroundHooks evaporates because the invariant is now enforced at the right level — before a lifecycle begins, not inside it.

hi-ogawa/maxconcurrency-scheduling-10097.md

dbousamra commented Apr 9, 2026
