[world-vercel] Switch event endpoints to v4 wire format#2055
Conversation
Mirrors the v4 server-side handlers landing in workflow-server. The
v4 wire format moves event metadata into x-wf-* request/response
headers and treats payloads as opaque user-data bytes (streamed
end-to-end). The SDK passes Uint8Array bytes through unchanged at
this layer; higher-level world-vercel adapter glue handles CBOR.
Adds:
- packages/world-vercel/src/frames.ts: encoder + async-iterable
decoder for the length-prefixed binary frame format used by the
v4 list-events response.
- packages/world-vercel/src/events-v4.ts: three new functions:
* createWorkflowRunEventV4 — POST with x-wf-* headers + payload
bytes, returns event/run ids and timestamp from response
headers.
* getEventV4 — GET single event, returns metadata + body bytes.
* getWorkflowRunEventsV4 — GET list, parses frame stream, returns
events + pagination cursor.
- V4_HEADERS exported as the canonical name map; mirrors the
server-side constant.
V4 client characteristics:
- Required runId in URL for run_created too (no /runs/null/events
shortcut; the runId is part of the S3 key the server allocates).
Higher-level callers generate the ULID client-side.
- Payload bytes flow through without CBOR encode/decode on this
layer. Callers CBOR-encode for parity with v3 if they want.
- Pagination cursor surfaces in the LIST response — eliminates the
per-large-payload /refs round-trip used by v2/v3.
Tests (10 new in src/frames.test.ts, no new e2e):
- Canonical wire layout round-trip.
- Multi-frame round-trip with pagination cursor.
- Decoder survives 1-byte chunk delivery (matching spike B's chunk-
boundary robustness requirement).
- 64 KB body split across many small chunks.
- Bodies containing 0xff padding don't mis-frame.
- Back-to-back frames in a single chunk.
- Truncated stream raises.
- Meta CBOR types (numbers, booleans, arrays) preserved.
The world-vercel adapter still defaults to the v3 path; v4 is exposed
for direct callers and a follow-up PR will switch the adapter over
once the matching server-side PR is on staging.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
🦋 Changeset detectedLatest commit: 9bd6543 The changes in this PR will be included in the next version bump. This PR includes changesets to release 17 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
🧪 E2E Test Results❌ Some tests failed Summary
❌ Failed Tests▲ Vercel Production (1 failed)astro (1 failed):
Details by Category❌ ▲ Vercel Production
✅ 💻 Local Development
✅ 📦 Local Production
✅ 🐘 Local Postgres
✅ 🪟 Windows
✅ 📋 Other
❌ Some E2E test jobs failed:
Check the workflow run for details. |
📊 Benchmark Results
workflow with no steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Next.js (Turbopack) | Nitro workflow with 1 step💻 Local Development
▲ Production (Vercel)
🔍 Observability: Next.js (Turbopack) | Express | Nitro workflow with 10 sequential steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Next.js (Turbopack) | Nitro workflow with 25 sequential steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Next.js (Turbopack) | Nitro workflow with 50 sequential steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Nitro | Next.js (Turbopack) Promise.all with 10 concurrent steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Nitro | Next.js (Turbopack) Promise.all with 25 concurrent steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Nitro | Next.js (Turbopack) Promise.all with 50 concurrent steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Next.js (Turbopack) | Express | Nitro Promise.race with 10 concurrent steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Next.js (Turbopack) | Express Promise.race with 25 concurrent steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Next.js (Turbopack) | Express Promise.race with 50 concurrent steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Next.js (Turbopack) | Express workflow with 10 sequential data payload steps (10KB)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Nitro | Next.js (Turbopack) workflow with 25 sequential data payload steps (10KB)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Nitro | Next.js (Turbopack) workflow with 50 sequential data payload steps (10KB)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Express | Next.js (Turbopack) workflow with 10 concurrent data payload steps (10KB)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Next.js (Turbopack) | Nitro workflow with 25 concurrent data payload steps (10KB)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Express | Next.js (Turbopack) workflow with 50 concurrent data payload steps (10KB)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Nitro | Next.js (Turbopack) Stream Benchmarks (includes TTFB metrics)workflow with stream💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Next.js (Turbopack) | Express stream pipeline with 5 transform steps (1MB)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Express | Next.js (Turbopack) 10 parallel streams (1MB each)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Next.js (Turbopack) | Express fan-out fan-in 10 streams (1MB each)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Next.js (Turbopack) | Nitro SummaryFastest Framework by WorldWinner determined by most benchmark wins
Fastest World by FrameworkWinner determined by most benchmark wins
Column Definitions
Worlds:
|
Sets WORKFLOW_SERVER_URL_OVERRIDE in packages/world-vercel/src/utils.ts to https://workflow-server-git-peter-v4.vercel.sh so that e2e tests running off this SDK branch exercise the v4-enabled workflow-server preview instead of production. The override is the inline mechanism documented at the constant — when set, it wins over both the default (https://vercel-workflow.com) and the VERCEL_WORKFLOW_SERVER_URL env var. The same pattern is used in v4 testing on the workflow- server side: CI rewrites this string on PR branches. Reset to '' before merging to main. Companion to vercel/workflow-server#439. Updates four tests in utils.test.ts that previously assumed the override is empty. Each affected assertion gets a comment noting what the expectation looks like on main; flipping back to the main behavior is a one-line edit per test when the override is reset. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ped-error parity
- Feed undici response bodies into decodeFrames as AsyncIterables instead
of converting via dynamic import('node:stream').Readable.toWeb — the
dynamic import resolves to an empty namespace in Next.js webpack server
bundles and crashed every events.list call (E2E Vercel Prod nextjs-webpack).
- Restore the v3 makeRequest typed-error contract on all v4 endpoints:
409 EntityConflictError, 410 RunExpiredError, 425 TooEarlyError(retryAfter),
429 ThrottleError(retryAfter), else WorkflowWorldError with status/code.
The previous mapping returned plain Errors for 404/410/425, which broke
the hook 404 -> HookNotFoundError translation, terminal-run handling, and
step retry pacing.
- Honor config.dispatcher on v4 calls (was silently ignored).
- Drop the redundant setAuthHeader: getHttpConfig already sets Authorization
and degrades gracefully outside a Vercel OIDC context.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The queue enforces the retry delay, but the backend also persists the timestamp on the step entity for premature-delivery pacing and observability. The v4 split was silently dropping it. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
VaguelySerious
left a comment
There was a problem hiding this comment.
AI review: no blocking issues.
Three correctness regressions found during review were fixed in d4d5663 and 7a798a2 (webpack-incompatible node:stream dynamic import in the frame-read path, loss of the v3 typed-error contract for 404/410/425/retryAfter, and step_retrying.retryAfter being dropped from the wire). Remaining findings below are forward-compatibility notes and nits.
| data as unknown as { eventData?: Record<string, unknown> } | ||
| ).eventData ?? {}) as Record<string, unknown>; | ||
| const payloadField = PAYLOAD_FIELD_BY_EVENT_TYPE[data.eventType]; | ||
| const meta: SplitEventData['meta'] = {}; |
There was a problem hiding this comment.
AI Review: Note
Forward-compat: this allowlist is now the wire contract for eventData. v3 serialized the entire eventData object, so adding a field to an event schema in @workflow/world was automatically transmitted. On v4, any new field must be added here (and accepted by the backend) or it is silently dropped — step_retrying.retryAfter was exactly this failure mode (fixed in 7a798a2), and it produced no type error, no test failure, and no runtime warning.
Suggestion for a follow-up: a contract test that walks the AnyEventRequest union from @workflow/world and asserts every eventData key is either the event type's payload field or covered by this meta split, so the next schema addition fails loudly instead of silently.
There was a problem hiding this comment.
(AI) Documented in 65a7234 — the map now carries a warning comment marking it (plus the meta allowlist) as the eventData wire contract, with step_retrying.retryAfter as the cautionary tale. The walk-the-AnyEventRequest-union contract test is a good idea; deferring to a follow-up since it needs a representative fixture per event type.
| input.payload ?? new Uint8Array(0) | ||
| ); | ||
|
|
||
| const url = `${baseUrl}/v4/runs/${encodeURIComponent(input.runId)}/events`; |
There was a problem hiding this comment.
AI Review: Note
No protocol fallback. The adapter is all-in on the v4 endpoints: if they are ever unavailable (backend rollback or partial deploy), every event operation throws WorkflowWorldError(404) with no v3 degradation path. That is presumably a deliberate trade (skew protection pins deployments to their SDK, and the backend keeps v2/v3 routes for older SDKs), but it makes the v4 route surface a permanent compatibility commitment for every deployment shipped with this version. Worth stating explicitly in the PR description / release notes.
There was a problem hiding this comment.
(AI) Now stated explicitly in the PR description under "Behavioral notes": no v3 degradation path, deployments are pinned to their SDK, and the v4 route surface becomes a permanent compatibility commitment for deployments shipped with this version.
|
|
||
| // For resolveData='none', strip eventData entirely. Reuse the world- | ||
| // side helper so behavior stays in sync with other backends. | ||
| return resolveData === 'none' ? stripEventDataRefs(event, 'none') : event; |
There was a problem hiding this comment.
AI Review: Note
resolveData: 'none' still downloads every payload. The v4 list stream always carries resolved payload bytes in the frame body, and the client strips them here after the transfer. The v3 lazy-ref path returned descriptors only, so metadata-only listings (observability/CLI polling) paid no payload bandwidth. For runs with large step I/O this is a regression in transfer volume on the busiest read path. Consider a follow-up wire-level flag for body-less listing.
There was a problem hiding this comment.
(AI) Acknowledged in the PR description under "Behavioral notes" — accepted trade for eliminating the per-event resolution round-trip. A wire-level body-less listing flag is the right follow-up for metadata-only consumers (observability/CLI polling).
| // back into Date instances — the workflow runtime calls .getTime() on | ||
| // these and would otherwise crash. safeParse: pass the event through | ||
| // unchanged if it doesn't match a known shape (legacy / mid-rollout). | ||
| const parsed = EventSchema.safeParse(raw); |
There was a problem hiding this comment.
AI Review: Nit
safeParse falling back to the raw event is the right call for unknown/future event types, but it also masks parse failures on known types — e.g. a future regression in date coercion would pass through silently here and only surface later as a .getTime() crash inside the runtime. Consider logging (debug-level) when parsing fails for an eventType that is in the known set.
There was a problem hiding this comment.
(AI) Done in 65a7234 — buildEventFromV4 now emits a console.debug breadcrumb when EventSchema.safeParse fails for an eventType that is in the known set, so a coercion regression surfaces at the parse site instead of as a .getTime() crash downstream.
| }; | ||
| data: events, | ||
| cursor: result.next ?? null, | ||
| hasMore: Boolean(result.next), |
There was a problem hiding this comment.
AI Review: Nit
hasMore: Boolean(result.next) — key-paginated backends can return a cursor even when the next page turns out empty, so consumers may see hasMore: true followed by an empty page. v3 returned the server's explicit hasMore. Harmless for the runtime (it loops until the cursor is exhausted), but worth knowing for paginating UI consumers.
There was a problem hiding this comment.
(AI) Keeping as-is for now: the runtime (the dominant consumer of this path) loops until the cursor is exhausted, so a trailing empty page is harmless there. Flagged for paginating UI consumers if one shows up.
TooTallNate
left a comment
There was a problem hiding this comment.
Comment — strong implementation; two behavioral divergences from main to resolve before merge
I read the full diff, built @workflow/world-vercel, and ran the package test suite locally (135 tests pass, including the new frame and v4-client suites). The core of this change — the framed wire protocol, the codec, and the typed-error parity — is excellent, careful work. I have two correctness concerns that I believe are still open (both originally surfaced by the automated review and, as far as I can tell, not yet resolved in code), plus a few smaller notes.
What's very good
- The frame codec (
frames.ts) is robust.refill()/take()handle arbitrary chunk boundaries — splits mid-u32-prefix and mid-CBOR-meta — and the test suite proves it with 1-byte chunking, a 64 KB body across 37-byte chunks, bodies full of0xffbytes that could fool a length-scanning parser, and back-to-back frames in one chunk. Thebuffer.slice()(notsubarray()) at the body yield is correct and commented — the yielded body owns its bytes. - Webpack-safety is handled deliberately. Feeding undici's response body straight into
decodeFramesas anAsyncIterable(rather than going throughReadable.toWeb) is the right call, and there's a dedicated regression test that drives the decoder from an async generator ofBufferchunks, mirroring how undici delivers in production. The comments explaining why not to convert vianode:streamwill save the next person a bad afternoon. - Typed-error parity is exhaustively tested.
throwForErrorResponsepreserves the 409→EntityConflict, 410→RunExpired, 425→TooEarly(+retryAfter), 429→Throttle(+retryAfter), else→WorkflowWorldError(status) contract that the runtime branches on, andevents-v4.test.tsasserts each one plus the non-JSON-body message path. This is the part most likely to cause silent control-flow regressions if it drifted, so I'm glad it's nailed down. - The deletion of
refs.tsis clean — I grepped for every exported symbol (resolveRefDescriptors,hydrateEventRefs,collectPendingRefs,eventDataRefFieldMap,RefWithRunId) and found no remaining references anywhere inpackages/. - The changeset is correctly scoped (
@workflow/world-vercel: minor); the only non-world-vercel file changes in the diff are version-bump churn from a merged release commit and will evaporate on rebase.
Concern 1 (blocking): v1Compat now throws for hook_received / wait_completed on legacy runs
createWorkflowRunEventInner only handles run_created and run_cancelled under v1Compat, and throws for everything else:
throw new Error(
`world-vercel: v1Compat=true is only supported for run_created ` +
`and run_cancelled, not ${data.eventType}`
);On main, the v1Compat branch has a catch-all fallback that POSTs any other event type to the legacy /v1/runs/:id/events endpoint. The runtime still relies on that fallback for legacy (spec-version-1) runs:
packages/core/src/runtime/resume-hook.ts:160callsworld.events.create(hook.runId, { eventType: 'hook_received', … }, { v1Compat })wherev1Compat = isLegacySpecVersion(hook.specVersion).packages/core/src/runtime/runs.ts:197(wakeUpRun) callsworld.events.create(runId, { eventType: 'wait_completed', … }, { v1Compat: compatMode })for legacy runs.
For a legacy run, both now hit the throw instead of the legacy POST that main performs. A webhook delivered to — or a wait completing on — a pre-event-sourcing run would fail. run_cancelled (via cancelRun) is covered, but these two aren't.
Either restore the legacy event POST path for non-lifecycle event types under v1Compat, or update those callers so they don't route legacy hook_received/wait_completed through this adapter. If the position is that spec-version-1 runs can't reach a v5 SDK in practice and this path is intentionally dead, that's a legitimate call — but it should be stated explicitly (and ideally the throw message should say so), because today it's a silent behavioral narrowing vs main.
Concern 2 (should-fix): createWorkflowRunEvent ignores resolveData for the returned event
On main, the create path returns event: stripEventAndLegacyRefs(wireResult.event, resolveData), so a caller passing resolveData: 'none' doesn't get payload fields back. The v4 path returns body.event verbatim:
return {
event: body.event as Event | undefined,
…
};This diverges from the Storage contract and can return input/result/error payload bytes to a caller that explicitly asked for none. In practice the eventsNeedingResolve types read run/step rather than the returned event, so the live blast radius may be zero today — but it's a contract divergence that's cheap to close (thread resolveData through and strip, mirroring buildEventFromV4's resolveData === 'none' handling). At minimum worth a comment if it's deliberately not stripped.
Smaller notes (non-blocking)
resolveData: 'none'still transfers full payloads. The list stream always carries resolved payload bytes in the frame body, and the client strips them after transfer. The previous lazy path returned descriptors only, so metadata-only listings (observability / CLI polling) paid no payload bandwidth. For runs with large step I/O this is a transfer-volume regression — probably an acceptable trade for eliminating the per-event resolution round-trip, but worth acknowledging in the PR description since it's a real change in egress characteristics.safeParsefallback can mask date-coercion regressions on known types. InbuildEventFromV4, falling back to the raw event whenEventSchema.safeParsefails is the right behavior for unknown/future event types, but for a known type a future coercion regression would pass through silently here and only surface downstream as a.getTime()crash in the runtime. A debug-level log when parsing fails for an event type that's in the known set would turn a confusing runtime crash into an obvious breadcrumb.- No protocol fallback if the v4 endpoints are ever unavailable (every op throws
WorkflowWorldError(404)with no degradation path). Presumably deliberate given deployment/SDK pinning, but it does mean a partial backend rollback is not survivable from the SDK side. Flagging for awareness, not asking for a change. - The forward-compat shape of
PAYLOAD_FIELD_BY_EVENT_TYPE/ the meta allowlist is now effectively the wire contract foreventData: any new event field must be added to the split logic (and accepted by the backend) or it's silently dropped.step_retrying.retryAfterwas exactly that failure mode and is fixed here — but it's worth a code comment at the allowlist making explicit that this is now a contract surface, so the next field-adder doesn't rediscover it the hard way.
CI
The red nextjs-webpack jobs (Local Postgres / Local Prod / Local Dev, "stable lazyDiscovery disabled") are pre-existing on main — the latest main Tests run fails the same three jobs with the identical root cause (Dynamic require of "stream" is not supported → Failed to collect page data for /.well-known/workflow/v1/flow during next build). This PR doesn't introduce or widen that: undici is already statically imported into the world-vercel bundle on main via http-client.ts, so adding import { request } from 'undici' to events-v4.ts doesn't change the bundled dependency surface. Not a blocker for this PR, but it does mean nextjs-webpack build E2E can't currently vouch for this change end-to-end — worth getting that baseline green separately.
Bottom line
The protocol implementation and its tests are great. I'd like to see Concern 1 resolved (or explicitly declared dead-path with a clear message) and Concern 2 addressed or consciously waived before this merges, since both are silent divergences from main's behavior on the legacy/resolveData paths. Happy to re-review quickly once those are settled.
…ata on create - v1Compat with hook_received / wait_completed (legacy spec-version-1 runs) again POSTs to the legacy v1 events endpoint instead of throwing — resumeHook and wakeUpRun still rely on this path. - The event returned from createWorkflowRunEvent is stripped per the caller's resolveData, restoring the v3 Storage contract. - Debug-level breadcrumb when EventSchema parsing fails for a known event type (the raw fallback is meant for unknown/future types). - Document that the v4 payload/meta split is the eventData wire contract: new schema fields must be added on both sides. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
… truncated list streams
- POSTs now use the aliased route with the event type as the trailing
path segment so it is visible in backend access logs / traces without
decoding the frame; the frame meta stays authoritative.
- LIST responses that end without the {_end: 1} sentinel frame now
throw instead of returning a silently-truncated page with
hasMore=false — the read is idempotent and safe to retry.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
|
(AI) @TooTallNate thanks for the thorough review — both concerns are addressed in this push:
Smaller notes, same push: Also in 9bc87e3: event POSTs now use the backend's new |
pranaygp
left a comment
There was a problem hiding this comment.
Reviewed the full diff and cross-checked it against the server side (workflow-server origin/main). Verified: the /events/:eventType POST alias and meta.retryAfter parsing are both merged server-side (#504, #505); throwForErrorResponse preserves the full v3 typed-error mapping the runtime branches on; the splitEventDataForV4 meta allowlist covers every field of every user-creatable event schema in @workflow/world (the only uncovered field is hook_conflict.conflictingRunId, which is world-created and only travels the read path, where the frame meta carries full eventData anyway); run/step entities in the POST response carry real Dates over CBOR (electrodb getters convert before encode), so skipping zod there is safe; the truncation sentinel guard and webpack-safe stream reads look right and are well tested; no remaining imports of the deleted refs.ts; unit tests pass locally (141/141).
One medium (latent, not currently reachable) finding about date coercion on the POST response events array, plus two small nits — all inline. Previously-raised items from earlier rounds (hasMore derivation, resolveData:'none' bandwidth, no-fallback commitment, wire-contract allowlist) are addressed or consciously accepted, so I haven't re-raised them.
| cursor: wireResult.cursor, | ||
| hasMore: wireResult.hasMore, | ||
| hook: body.hook as EventResult['hook'], | ||
| events: body.events as EventResult['events'], |
There was a problem hiding this comment.
Medium (latent): body.events — and body.event above — skip the EventSchema date coercion that this PR adds for the GET/LIST path.
On v3 the POST response was parsed through EventResultResolveWireSchema, whose events: z.array(EventSchema) coerced nested eventData dates. Here they're cast verbatim from CBOR. The preloaded-events optimization on run_started is fed server-side from a queryByRunId Dynamo read (workflow-server lib/data/events.ts, the non-resilient-start path), and Dynamo-stored eventData date fields come back as ISO strings — the electrodb eventData attribute's set converts Date → toISOString() and there is no inverse getter. That's exactly the situation buildEventFromV4 runs EventSchema.safeParse for.
It's unreachable today only because the server populates events solely on the first successful start (the alreadyRunning branch deliberately returns no events), and a first-start history contains only run_created/run_started, which have no eventData date fields. But the consumer is preloadedEvents in runtime.ts, which replay reads verbatim — including now >= (e.eventData.resumeAt as Date).getTime() (runtime.ts:803). If the TTFB optimization is ever extended to re-enqueues (the sleep-resume case, where a wait_created is guaranteed present), this becomes a replay crash with no SDK change to flag it.
Suggest mapping each entry of body.events (and body.event) through the same EventSchema.safeParse coercion used in buildEventFromV4, so create-response events and listed events are guaranteed the same shape.
There was a problem hiding this comment.
Done in 349a750 — extracted the EventSchema coercion into coerceEventDates (shared with buildEventFromV4) and mapped both body.event and body.events through it, so the POST response and the GET/LIST path now return identically-coerced events. Test added: a run_started response whose preloaded wait_created event carries ISO-string createdAt/resumeAt comes back with real Dates.
| events: wireResult.events, | ||
| cursor: wireResult.cursor, | ||
| hasMore: wireResult.hasMore, | ||
| hook: body.hook as EventResult['hook'], |
There was a problem hiding this comment.
Nit: body.wait is dropped. The v4 server explicitly includes wait: result.wait in the CBOR response body, and CreateEventV4Result['body'] declares it — but it doesn't make it into the returned EventResult, which does have a wait?: Wait field (for wait_created/wait_completed). v3 had the same gap (its wire schemas had no wait), so this is parity-preserving, but since the v4 response now carries it, threading it through is a one-liner: wait: body.wait as EventResult['wait'],. Fine as a follow-up if you want this PR to stay strictly wire-format-only.
There was a problem hiding this comment.
Done in 349a750 — wait: body.wait as EventResult['wait'] is now threaded through, with a test asserting the wait entity survives a wait_created create.
| config: APIConfig | undefined, | ||
| opName: string | ||
| ): Promise<ListEventsV4Result> { | ||
| const { headers } = await getHttpConfig(config); |
There was a problem hiding this comment.
Nit: each LIST call resolves getHttpConfig twice — once in getWorkflowRunEventsV4 / getEventsByCorrelationIdV4 (for baseUrl) and again here (for headers). Each invocation can hit getVercelOidcToken(). Consider resolving once at the call site and passing { baseUrl, headers } into consumeListFrameStream, which also removes the (theoretical) window where the two calls could disagree.
There was a problem hiding this comment.
Done in 349a750 — consumeListFrameStream now takes the caller's resolved headers, so each LIST call resolves getHttpConfig (and the OIDC token) exactly once.
…le auth resolve on LIST Address review feedback on the v4 events client: - Run the POST response's `event` and preloaded `events` (run_started TTFB optimization) through the same EventSchema date coercion the GET/LIST path uses. These can be read back from DynamoDB server-side, where nested eventData dates (wait_created.resumeAt, step_retrying.retryAfter) are ISO strings — v3 coerced them via its zod wire schemas, and the runtime calls .getTime() on them during replay. Extracted the coercion into coerceEventDates, shared with buildEventFromV4. - Thread `body.wait` through to EventResult.wait — the v4 server includes the wait entity in the CBOR response body. - Resolve getHttpConfig once per LIST call and pass headers into consumeListFrameStream instead of resolving twice (baseUrl at the call site, headers inside). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
* origin/main: [core] V2: unify wait+step queue dispatch in suspension processing (#1925) fix(world-local,world-postgres): make duplicate hook_created idempotent (#2295) docs(observability): remove MVP implementation detail bullet (#2367) fix: settle aborted parallel steps before completing abortParallelWorkflow (#2244) Add native v4 workflow attribute events (#2226) Version Packages (beta) (#2326) [ci] Fix flaky windows unit tests (#2359) Capture Vercel runtime logs when e2e Vercel Prod lanes fail (#2356) Fix e2e failure reporting under vitest 4 and preserve fetch error causes (#2355) [core] Fix process crash from rejected waitUntil promises (#2336) [core] Remove duplicate `waitUntil` for suspension handler async operations (#2345) Prevent local tests from hanging (#2338) feat(core): add optional namespace for queue topic prefix (#2305) [web-shared] Show precise durations in the new trace viewer (#2335) Validate unique workflow step IDs at build time (#2018) Move run attributes into their own detail card (#2327) [core] Forward-port stream reconnect to getReadable level (#2318) [docs] Add "Step executed multiple times" error page (#2310) Fix flickering on the detail panel when navigating the trace viewer (#2325)
The merge from main brings native workflow attributes (attr_set events, initial attributes on run_created / resilient run_started). Their eventData fields are structured metadata, so they ride in the v4 frame meta: splitEventDataForV4 now passes through `attributes`, `changes`, `writer`, and `allowReservedAttributes`, and the wire client accepts them on CreateEventV4Input. Without this the v4 split silently dropped them — the exact wire-contract hazard documented on PAYLOAD_FIELD_BY_EVENT_TYPE. Server counterpart: vercel/workflow-server#516 (parses the same fields out of the v4 meta). The attributes e2e tests on Vercel lanes stay red until that deploys. splitEventDataForV4 is now exported for unit tests, which lock in the attribute-field coverage of the meta allowlist. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
|
Status update after taking over the branch:
🤖 Generated with Claude Code |
pranaygp
left a comment
There was a problem hiding this comment.
Approving. CI is fully green (111 pass / 9 skip, including all 12 E2E Vercel Prod lanes), the server-side dependency (workflow-server#516, which parses the native-attribute fields out of the v4 frame meta) is merged and deployed, and every prior review thread is resolved.
Disclosure: I co-authored part of this branch (the POST-response date coercion + shared coerceEventDates, the wait passthrough, single getHttpConfig resolve per LIST, and the native run-attribute wire fields on both SDK and server). Reviewed the full diff including those additions; 146 world-vercel unit tests pass and typecheck is clean locally. Since I'm a contributor, a second independent pass is still welcome, but I see nothing blocking.
Remove specific backend storage technology names (S3, the backing store) and internal server file paths from the v4 event-path comments and error messages, keeping only the general wire contract. Open-source code should describe behavior, not Vercel backend internals. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The v4 wire split routes each eventData field to either the frame body (payload) or the frame meta. Previously a field added to a @workflow/world event schema that wasn't routed here was silently dropped on the wire with no type error, test failure, or warning (step_retrying.retryAfter hit this). Add a compile-time exhaustiveness guard: EventDataField is derived from the CreateEventSchema discriminated union (via AnyEventRequest), and assertEventDataWireContractExhaustive fails the build — naming the field — if any schema field is routed to neither the payload map nor the meta allowlist, or if an allowlisted meta field no longer exists in the schema. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-authored-by: Peter Wielander <mittgfu@gmail.com> Signed-off-by: Peter Wielander <mittgfu@gmail.com>
|
Mostly polished comments, and added some more typeguards so that we don't accidentally forget to add a field to the v4 wire format |
|
Backport PR opened against |
Switches the
world-verceladapter's event endpoints from v2/v3 to v4Changes
The adapter's
createWorkflowRunEvent/getEvent/getWorkflowRunEventskeep their public signatures and theEventResult/Event/PaginatedResponse<Event>shapes the workflow runtime consumes. What changes is what's on the wire under those calls:[u32_be meta_len][cbor meta][u32_be body_len][bytes]. The CBOR meta block carries structured event metadata (eventType, deploymentId, executionContext, …); the body is the opaque user payload. POSTs target the/v4/runs/:runId/events/:eventTypealias — the trailing segment is an observability hint (access logs / traces) and the frame meta stays authoritative.dehydrateRunInput/dehydrateStepReturnValue/ etc. before invokingevents.create, and the bytes pass through unchanged on both write and read paths — no double-CBOR-wrap.EventResult. For event types the runtime reads immediately (run_created,run_started,step_started), the server'sremoteRefBehavior=resolvelands the materialized entity in the same response so the runtime doesn't need a follow-upruns.get. The returnedeventis stripped per the caller'sresolveData, same as v3.application/vnd.workflow.v4-frames). One frame per event with CBOR metadata + raw payload bytes inline, followed by a sentinel{_end:1, next?:cursor}frame. The per-event/refsround-trip used by the v3 client is gone. A stream that ends without the sentinel throws (truncation guard) — the read is idempotent and safe to retry.createdAt/resumeAt/retryAfterround-trip via cbor-x's native Date tag, but events read back from the backing store arrive as ISO strings. The SDK runs each event throughEventSchema.safeParseafter building it from a frame so per-event-typez.coerce.date()lifts the strings back intoDateinstances — the runtime calls.getTime()on these and would otherwise crash. A parse failure on a known event type leaves a debug-level breadcrumb instead of failing silently.Behavioral notes
resolveData: 'none'listings still transfer payload bytes. The v4 list stream always carries resolved payload bytes in the frame body; the client strips them after transfer. The old lazy-refs path returned descriptors only, so metadata-only listings paid no payload bandwidth. Accepted trade for eliminating the per-event resolution round-trip; a wire-level body-less listing flag is a possible follow-up for metadata-only consumers.eventDatawire contract. v3 serialized the wholeeventDataobject; on v4, a new field added to an event schema must also be added to the client's split logic and accepted by the backend's parser, or it is silently dropped. Documented with a warning comment at the map.What goes away
packages/world-vercel/src/refs.ts— deleted. The/refshydration path is no longer used.hydrateEventRefs/collectPendingRefs/eventDataRefFieldMapand the wire schemas (EventResultResolveWireSchema,EventResultLazyWireSchema,EventWithRefsSchema) — deleted fromevents.ts.createWorkflowRunEvent— the server still respectsremoteRefBehavior(passed in the frame meta foreventsNeedingResolvetypes) and bakes the resolution decision into its CBOR response, so the SDK has nothing to do.What stays
v1Compatpath increateWorkflowRunEvent— still uses/v1endpoints for legacy (pre-event-sourcing) runs, including the catch-all legacy POST forhook_received(resumeHook) andwait_completed(wakeUpRun). v4 doesn't cover these.validateUlidTimestamponrun_created, theHookNotFoundErrortranslation on hook 404s, and thestripEventDataRefspath forresolveData='none'.events-v4.tsis an internal helper module — not re-exported from the package's public API.Test plan
packages/world-vercel/src/frames.test.ts)events-v4.test.ts)v1Compatlegacy fallback,resolveDatastripping on create (events.test.ts)resilient start: addTenWorkflow completes when run_created returns 500)