[core] Fix abort signal not reflected in subsequent step (replay-ordering flake) #2412
VaguelySerious wants to merge 2 commits into
An AbortController aborted from a step is reflected into the workflow VM via a hook_received event, but _setAborted was deferred behind an async hydrateStepReturnValue (reason decrypt/deserialize) on the promiseQueue. Unlike step-result and hook-payload deliveries, the abort delivery did not participate in ctx.pendingDeliveries, so scheduleWhenIdle (which the suspension handler uses to gate dehydration of queued step arguments) could fire while the abort was still in flight. A step dispatched right after the abort that received controller.signal then had its arguments serialized with aborted=false.

Bump pendingDeliveries around the abort delivery so the suspension waits for _setAborted to land, matching the existing step/hook-payload pattern. This fixes the intermittent abortFromStepWorkflow E2E failure (stepSawAborted=false).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
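The race described in this commit can be reproduced in isolation. The following is a minimal sketch, not the runtime's code: `DeliveryCtx`, `flushIfIdle`, `deliverAbort`, `observeAtIdle`, and the `track` flag are hypothetical names invented for illustration, and a plain `{ aborted }` object stands in for the real signal. Only the idea of counting the abort as a pending delivery so the idle gate waits for it comes from the commit message.

```typescript
// Hypothetical stand-in for the workflow context: a counter of in-flight
// deliveries plus callbacks waiting for that counter to reach zero.
interface DeliveryCtx {
  pendingDeliveries: number;
  idleCallbacks: Array<() => void>;
}

// Run `fn` immediately if nothing is in flight, otherwise queue it.
function scheduleWhenIdle(ctx: DeliveryCtx, fn: () => void): void {
  if (ctx.pendingDeliveries === 0) {
    fn();
  } else {
    ctx.idleCallbacks.push(fn);
  }
}

// Fire queued callbacks once the last pending delivery has landed.
function flushIfIdle(ctx: DeliveryCtx): void {
  if (ctx.pendingDeliveries !== 0) return;
  for (const cb of ctx.idleCallbacks.splice(0)) cb();
}

// Deliver an abort behind an async hydration step. With `track` set, the
// delivery is counted as pending for its whole duration (the fixed behavior).
async function deliverAbort(
  ctx: DeliveryCtx,
  signal: { aborted: boolean },
  hydrateReason: () => Promise<void>,
  track: boolean,
): Promise<void> {
  if (track) ctx.pendingDeliveries++;
  try {
    await hydrateReason();
    signal.aborted = true; // stands in for _setAborted
  } finally {
    if (track) {
      ctx.pendingDeliveries--;
      flushIfIdle(ctx);
    }
  }
}

// Returns the `aborted` value that the idle gate observed when it fired.
async function observeAtIdle(track: boolean): Promise<boolean> {
  const ctx: DeliveryCtx = { pendingDeliveries: 0, idleCallbacks: [] };
  const signal = { aborted: false };
  // Simulated hydration latency that spans a macrotask.
  const slowHydrate = () =>
    new Promise<void>((resolve) => setTimeout(resolve, 5));

  const delivery = deliverAbort(ctx, signal, slowHydrate, track);
  const gate = new Promise<boolean>((resolve) =>
    scheduleWhenIdle(ctx, () => resolve(signal.aborted)),
  );

  const seen = await gate;
  await delivery;
  return seen;
}
```

In this sketch, `observeAtIdle(false)` resolves to `false` because the gate fires before the simulated hydration finishes, while `observeAtIdle(true)` resolves to `true` because the gate is held until the abort has landed.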
🦋 Changeset detected
Latest commit: 6c1164f
The changes in this PR will be included in the next version bump. This PR includes changesets to release 16 packages.
🧪 E2E Test Results
✅ All tests passed
Details by category (all passing): ▲ Vercel Production, 💻 Local Development, 📦 Local Production, 🐘 Local Postgres, 🪟 Windows, 📋 Other
📊 Benchmark Results
[Result tables not preserved. Each benchmark was run on 💻 Local Development and ▲ Production (Vercel), with observability links for Nitro, Next.js (Turbopack), and Express.]
Benchmarks covered:
- workflow with no steps; workflow with 1 step
- workflow with 10, 25, and 50 sequential steps
- Promise.all with 10, 25, and 50 concurrent steps
- Promise.race with 10, 25, and 50 concurrent steps
- workflow with 10, 25, and 50 sequential data payload steps (10KB)
- workflow with 10, 25, and 50 concurrent data payload steps (10KB)
- Stream benchmarks (include TTFB metrics): workflow with stream; stream pipeline with 5 transform steps (1MB); 10 parallel streams (1MB each); fan-out fan-in 10 streams (1MB each)
Summary sections: Fastest Framework by World and Fastest World by Framework (winner determined by most benchmark wins).
❌ Some benchmark jobs failed. Check the workflow run for details.
The abortFromStep E2E flake had a second, dominant cause beyond the suspension-gate race: a step that aborts a controller records the durable hook_received event via an op that was flushed in the *background* (safeWaitUntil), not awaited before step_completed. So the workflow continuation enqueued by step_completed could run (resolving Promise.all and dispatching a later step with the controller.signal) before hook_received existed, serializing a stale aborted=false signal.

Add a preCompletionOps bucket to the step context for ops that must be durable before the step's terminal event. Route the abort hook resume there (keeping the real-time stream write in the background ops), and await it before step_completed/step_failed in both the queue step handler and the inline step executor.

Combined with the pendingDeliveries suspension-gate fix, the workflow continuation now both has the hook_received in its log and waits for its hydration before serializing downstream step arguments.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
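The ordering guarantee this commit adds can be illustrated with a small event log. This is a sketch under assumptions rather than the actual handler: `StepCtx`, `Op`, `runStep`, and the `routeAbortToPreCompletion` flag are hypothetical, the event log is a plain string array, and starting an op without awaiting it stands in for the background flush (`safeWaitUntil`). It shows only that ops in a pre-completion bucket are awaited before the terminal event is written.

```typescript
type Op = () => Promise<void>;

// Hypothetical step context with the two buckets described in the commit.
interface StepCtx {
  ops: Op[]; // background ops: started, not awaited before the terminal event
  preCompletionOps: Op[]; // must be durable before the terminal event
}

const delay = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs one simulated step and returns the order in which events were written.
async function runStep(routeAbortToPreCompletion: boolean): Promise<string[]> {
  const log: string[] = [];
  const ctx: StepCtx = { ops: [], preCompletionOps: [] };

  // The step body aborts a controller; recording the durable event has
  // simulated write latency.
  const recordAbort: Op = async () => {
    await delay(5);
    log.push("hook_received");
  };
  (routeAbortToPreCompletion ? ctx.preCompletionOps : ctx.ops).push(recordAbort);

  // Background ops are kicked off but not awaited here.
  const background = Promise.all(ctx.ops.map((op) => op()));

  // Drain the pre-completion bucket before writing the terminal event.
  await Promise.all(ctx.preCompletionOps.map((op) => op()));
  log.push("step_completed");

  // Awaited only so the demo returns a complete log.
  await background;
  return log;
}
```

With the abort routed to the background bucket, the sketch logs `step_completed` before `hook_received`, which is the window in which a continuation could serialize a stale signal. With it routed to `preCompletionOps`, `hook_received` is always logged first.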
Problem
The E2E test `abortFromStepWorkflow: step abort cancels an in-flight sibling step` flakes with `stepSawAborted === false`. The workflow aborts a controller from one step (`abortFromStep`), then, after the parallel work settles, passes `controller.signal` to a subsequent step (`checkSignalState`), which reads `aborted: false`. (`workflowAborted` and the in-flight sibling's stream-based cancel are unaffected, which is the tell.)

Root cause (two parts)
When a step aborts a controller, the abort is reflected into the workflow VM via a `hook_received` event. Two independent ordering gaps let `checkSignalState` serialize a stale, non-aborted signal:
1. Durable abort flushed in the background. The step-side `abort()` records the durable `hook_received` event via an op pushed into `ctx.ops`, which is flushed in the background (`safeWaitUntil`) and is not awaited before `step_completed` is written or before the workflow continuation is enqueued. So the continuation can resolve `Promise.all` and dispatch `checkSignalState` before `hook_received` exists. This is the dominant cause.
2. Suspension gate ignored the abort delivery. Even when `hook_received` is in the log, the consumer's `_setAborted` is deferred behind `await hydrateStepReturnValue(...)` (async reason decrypt/deserialize) on the `promiseQueue`. Step-result and hook-payload deliveries bump `ctx.pendingDeliveries` so the suspension handler waits for them before dehydrating the next step's arguments; the abort delivery did not, so the suspension could serialize `checkSignalState`'s `signal` before the abort landed.

Fix
- Add a `preCompletionOps` bucket to the step context for operations that must be durable before the step's terminal event. Route the abort's hook resume there (keeping the real-time stream write in the background `ops`, since it must reach the in-flight sibling ASAP). Await it before `step_completed`/`step_failed` in both the queue step handler and the inline step executor.
- Bump `ctx.pendingDeliveries` around the abort delivery so the suspension waits for `_setAborted` before dehydrating downstream step arguments.

Together: the continuation now both has `hook_received` in its event log and waits for its hydration before serializing the next step's `signal`.

Tests
- `abort-replay-ordering.test.ts`: injects hydration latency past a macrotask; asserts the suspension/idle gate only fires once the signal has aborted, and the in-flight abort is counted as a pending delivery.
- `abort-controller-step.test.ts`: new test exercising the real `reviveAbortController` via `hydrateStepArguments`; asserts the durable hook resume is routed to `preCompletionOps` (not the background `ops`) and fires with the correct payload when that bucket is drained.

Both fail against the pre-fix code and pass with the fix. Existing abort/step/suspension/workflow suites pass (542 tests in the runtime+workflow sweep).