Idempotency follow-ups: atomic keyed `start()`, post-completion dedupe, and attribute-based run lookup (#2376)

Follow-up to the hook-based run idempotency work in #2015, #2373, and #2011. Those PRs ship the in-flight story: deterministic hook tokens as run idempotency keys, `hook.getConflict()` resolving with the conflicting `Run`, and code-driven conflict-handling strategies (reject, adopt result, inspect, signal via `resumeHook()`, supersede via `cancel()`). This issue tracks the structural gaps that remain before the idempotency story is rock-solid.

## Where we stand vs. comparable frameworks

Idempotency is a lifecycle, and each framework has to answer the same four questions, one per phase:

| Phase | Temporal | Inngest | DBOS | **Workflow (today)** |
|---|---|---|---|---|
| **Admission** (dedupe atomic with start?) | Yes — workflow ID enforced server-side at start | Yes — `idempotency` key dedupes scheduling | Yes — workflow ID is effectively a DB primary key | **No** — `start()` always creates a run; the claim happens inside the run body |
| **In-flight conflict policy** | Enum: Fail / UseExisting / TerminateExisting | None (first wins) | Implicit UseExisting | **Code** — `getConflict()` hands the duplicate the owner's `Run` |
| **Post-completion memory** | RejectDuplicate, bounded by namespace retention | Fixed 24h TTL per key | Durable record | **None** — hook released at terminal state |
| **Result reuse** | No (rejects; caller queries the closed run) | Yes, within window | Yes — same ID returns stored result | Only while the owner is running (`conflict.returnValue`) |

**Where we're ahead:** in-flight expressiveness. Policy-as-code beats a static enum ("inspect the owner's status, then decide" or "forward this request's payload to the owner" are inexpressible as configuration). And because the duplicate run is itself durable, the conflict-handling logic gets retries, replay, and observability — in admission-time systems that logic lives in a crashable client.
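To make the contrast concrete, here is a minimal sketch of a conflict policy written as ordinary code. It assumes a hypothetical `OwnerView` snapshot (run ID, status, start time, and a digest of the owner's input) and a hypothetical `decide()` helper; neither is the SDK's `Run` type or the `getConflict()` API, and the 10-minute staleness threshold is arbitrary. The point is that branching on elapsed time and on input equality is plain control flow, which a Fail / UseExisting / TerminateExisting enum cannot express.

```typescript
// Hypothetical snapshot of the conflicting owner run; the SDK's real `Run`
// exposes status and results through its own (possibly async) accessors.
type OwnerStatus = "running" | "completed" | "failed" | "cancelled";

interface OwnerView {
  runId: string;
  status: OwnerStatus;
  startedAtMs: number;
  inputDigest: string; // hypothetical: hash of the owner's input payload
}

interface DuplicateRequest {
  inputDigest: string; // hash of this request's payload
}

type Decision =
  | { kind: "adopt"; runId: string } // same request: reuse the owner's result
  | { kind: "signal"; runId: string } // different payload: forward it to the owner
  | { kind: "supersede"; runId: string } // owner looks stuck: cancel and reclaim
  | { kind: "reject"; reason: string };

// Arbitrary illustrative threshold, not an SDK default.
const STALE_AFTER_MS = 10 * 60 * 1000;

function decide(owner: OwnerView, req: DuplicateRequest, nowMs: number): Decision {
  // Hooks are released at terminal state, so a live conflict should report a
  // running owner; anything else means the lease has already ended.
  if (owner.status !== "running") {
    return { kind: "reject", reason: `owner ${owner.runId} is ${owner.status}` };
  }
  // Policy that depends on the owner's age: not expressible as a static enum.
  if (nowMs - owner.startedAtMs > STALE_AFTER_MS) {
    return { kind: "supersede", runId: owner.runId };
  }
  // Policy that depends on the payloads: same input adopts, new input is forwarded.
  return owner.inputDigest === req.inputDigest
    ? { kind: "adopt", runId: owner.runId }
    : { kind: "signal", runId: owner.runId };
}
```

In a real duplicate run each branch would map onto one of the strategies already shipped: adopt via `conflict.returnValue`, signal via `resumeHook()`, supersede via `cancel()`. Because this logic runs inside a durable run, a crash between deciding and acting is retried rather than lost.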

**Where we're behind:** admission atomicity and post-completion memory. Note that neither Temporal nor Inngest offers *unbounded* memory (Temporal's reject-duplicate is retention-bounded; Inngest's window is 24h), so the target is a retention window, not forever.

## Gaps

1. **Admission isn't atomic (root cause of the rest).** The duplicate run is created, billed, queued, and executed before it discovers it's a duplicate, and routes need a resume-with-retry loop to bridge the window between `start()` returning and the run registering its hook. The docs already note that a native atomic start-and-hook-registration API is planned.
2. **No post-completion dedupe.** The hook is a lease and the lease dies with the run. A retried request arriving seconds after the owner completes starts fresh duplicate-sensitive work. (Temporal: retention-bounded reject; DBOS: returns the stored result.)
3. **Attributes are writable but not queryable.** Runs can set (`experimental_setAttributes`) and even seed (`CreateWorkflowRunParams.attributes`) attributes, but `ListWorkflowRunsParams` only filters by `workflowName`/`status` — nothing can *find* a run by attribute, so attributes can't yet serve as post-completion memory.
4. **Optimistic-strategy races (acceptable, but inherent).** Supersede (cancel-and-reclaim) can lose the reclaim to a third arrival (ABA-shaped; the documented retry loop handles it); signal-the-owner can hit `HookNotFoundError` if the owner completes mid-forward.
5. **Flat token namespace.** `order:123` collides across unrelated workflows sharing a key scheme. Same property as Temporal; a documented prefix convention is probably sufficient.
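Gaps 1 and 4 both push routes toward the same shape of client code: retry the resume only on `HookNotFoundError`, with a bounded budget. The sketch below models that with an in-memory registry so it runs standalone. `InMemoryHooks`, `resumeWithRetry()`, `isHookNotFound()`, and the local `HookNotFoundError` class are hypothetical stand-ins, not SDK exports, and the linear backoff numbers are arbitrary.

```typescript
// Hypothetical stand-in for the SDK's HookNotFoundError (name taken from this
// issue; the real class and its import path may differ).
class HookNotFoundError extends Error {
  readonly token: string;
  constructor(token: string) {
    super(`no hook registered for token ${token}`);
    this.name = "HookNotFoundError";
    this.token = token;
  }
}

// Name-based check so it still works if the class is transpiled to ES5.
function isHookNotFound(err: unknown): boolean {
  return err instanceof Error && err.name === "HookNotFoundError";
}

// Hypothetical in-memory hook registry standing in for the World.
class InMemoryHooks {
  private readonly handlers = new Map<string, (payload: unknown) => void>();

  register(token: string, handler: (payload: unknown) => void): void {
    this.handlers.set(token, handler);
  }

  release(token: string): void {
    this.handlers.delete(token);
  }

  async resume(token: string, payload: unknown): Promise<void> {
    const handler = this.handlers.get(token);
    if (!handler) throw new HookNotFoundError(token);
    handler(payload);
  }
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retries only HookNotFoundError, sleeping baseDelayMs * attempt between tries;
// any other error, or running out of attempts, is rethrown.
// Resolves with the number of attempts it took.
async function resumeWithRetry(
  hooks: InMemoryHooks,
  token: string,
  payload: unknown,
  maxAttempts = 5,
  baseDelayMs = 10,
): Promise<number> {
  for (let attempt = 1; ; attempt++) {
    try {
      await hooks.resume(token, payload);
      return attempt;
    } catch (err) {
      if (!isHookNotFound(err) || attempt >= maxAttempts) throw err;
      await sleep(baseDelayMs * attempt);
    }
  }
}
```

The bounded budget matters for gap 4: if the owner completed mid-forward, its hook will never come back, so once retries are exhausted the route should look the owner up (for example via `getRun`) instead of looping.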

## Recommendations (in order)

1. **P0 — atomic keyed `start()`** with `{ run, created }` return semantics and a retention-bounded uniqueness window. Closes gaps 1 and 2 at once and yields DBOS-style result reuse (`created === false` + completed → `run.returnValue`) while keeping policies in code: no policy enum — the caller inspects the existing run and decides, which subsumes both of Temporal's knobs (conflict policy + reuse policy).
2. **P1 — `runs.list` attribute filtering in the World contract**, then document the attribute pattern as the post-completion bridge: hook claim = in-flight mutex; attribute (`idempotency: <key>`) = retention-bounded memory; a duplicate that wins the token after the owner finished queries completed runs by attribute in a step and adopts the prior result via `getRun(prior.runId).returnValue`. Must be documented as **advisory** (the query and subsequent work aren't atomic — residual race in the just-completed window) and retention-bounded. Stays useful after keyed start lands, for richer queries.
3. **Document the two patterns that work today** so users aren't stranded: the *entity pattern* (a long-lived run per key looping `for await` on its hook — strict serialization at the cost of one perpetual run per key and deployment pinning) and the *app-record pattern* (store `runId` under the domain key in your own DB inside a step; replays resolve via `getRun`).
4. **Non-goal:** a standalone lease/TTL primitive or a `dispose: false` hook option. A claim that outlives its run is a leak generator; keyed start with a retention window does the same job with better semantics.
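The P0 semantics can be pinned down with a small in-memory model. Everything here is hypothetical: `KeyedRuns`, its `start(key)` and `complete()` methods, and the `handle()` caller are illustrative names, not the planned API. Admission is atomic only because the lookup and insert run in one synchronous step on a single-threaded `Map`; a real World would need a unique constraint or conditional write for the same guarantee. Retention is counted from a run's terminal state, so in-flight runs never expire.

```typescript
type RunStatus = "running" | "completed" | "failed" | "cancelled";

interface RunRecord<T> {
  runId: string;
  status: RunStatus;
  returnValue?: T;
  finishedAtMs?: number; // set when the run reaches a terminal state
}

interface KeyedStartResult<T> {
  run: RunRecord<T>;
  created: boolean; // false: the key was already claimed within the window
}

// Hypothetical in-memory model of keyed start with a retention window.
class KeyedRuns<T> {
  private readonly byKey = new Map<string, RunRecord<T>>();
  private readonly retentionMs: number;
  private readonly now: () => number;
  private nextId = 1;

  constructor(retentionMs: number, now: () => number = Date.now) {
    this.retentionMs = retentionMs;
    this.now = now;
  }

  // Lookup and insert happen in one synchronous step, so no second caller can
  // slip in between them; that is what "atomic admission" means here.
  start(key: string): KeyedStartResult<T> {
    const existing = this.byKey.get(key);
    if (existing && !this.isExpired(existing)) {
      return { run: existing, created: false };
    }
    const run: RunRecord<T> = { runId: `run_${this.nextId++}`, status: "running" };
    this.byKey.set(key, run);
    return { run, created: true };
  }

  complete(runId: string, value: T): void {
    for (const run of this.byKey.values()) {
      if (run.runId === runId) {
        run.status = "completed";
        run.returnValue = value;
        run.finishedAtMs = this.now();
        return;
      }
    }
    throw new Error(`unknown run ${runId}`);
  }

  // In-flight runs never expire; terminal runs are remembered for retentionMs.
  private isExpired(run: RunRecord<T>): boolean {
    return run.finishedAtMs !== undefined && this.now() - run.finishedAtMs > this.retentionMs;
  }
}

// Caller-side policy instead of a policy enum: inspect what start() returned.
function handle<T>({ run, created }: KeyedStartResult<T>): string {
  if (created) return "started new run";
  if (run.status === "completed") return `reused result ${String(run.returnValue)}`;
  if (run.status === "running") return `attached to in-flight ${run.runId}`;
  return `prior run ${run.status}; caller decides whether to wait out the window`;
}
```

`handle()` is where the policy lives: a completed run inside the window gives DBOS-style reuse via `returnValue`, and a running one can be attached to. This model keeps a failed prior run blocking its key for the whole window; whether failures should free the key early is a choice the real design still has to make (Temporal exposes it as a separate reuse-policy option).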

🤖 Generated with [Claude Code](https://claude.com/claude-code)
