Skip to content

fix(parallelism): salt sub-app IDs with sequence_id to prevent stale replay (#761)#778

Open
elijahbenizzy wants to merge 3 commits into
mainfrom
fix/761-mapstates-stale-replay-on-resume
Open

fix(parallelism): salt sub-app IDs with sequence_id to prevent stale replay (#761)#778
elijahbenizzy wants to merge 3 commits into
mainfrom
fix/761-mapstates-stale-replay-on-resume

Conversation

@elijahbenizzy
Copy link
Copy Markdown
Contributor

Fixes #761

Root cause

TaskBasedParallelAction (and subclasses MapStates, MapActions, MapActionsAndStates) generated sub-application IDs that were deterministic in (parent_app_id, i, j) only -- no per-invocation discriminator. When the parent application was built with initialize_from(...), the cascaded state initializer would, on the second invocation of the same parallel action, see a sub-app ID matching the first invocation's persisted state and silently hydrate it instead of running the action.

Result: every invocation after the first replayed the first invocation's outputs. Silent wrong results.

Fresh runs (no initialize_from) were unaffected because the cascaded initializer was None, so sub-apps never tried to load existing state even though their IDs still collided.

Fix

In TaskBasedParallelAction.run_and_update, salt each task's application_id with the parent context's sequence_id. The parent's sequence_id increments on every action step, so it uniquely identifies which invocation of the parent action we are in. Applied via a small _salt_task_app_id helper in both sync and async generator paths, so every subclass inherits the fix automatically without overriding tasks().

This matches the workaround pattern the issue reporter (hippalectryon-0) verified.

Regression test

tests/core/test_parallelism.py::test_map_states_reexecutes_on_repeated_invocations_with_initializer -- builds an app with initialize_from(...) and a MapStates action, invokes it three times in a row using a counter inside the sub-action, and asserts the outputs differ across invocations and the counter advanced as expected. Confirmed it fails on main and passes with the fix.

Test plan

  • New regression test passes with the fix
  • New regression test fails on main (without the fix), confirming it actually catches the bug
  • All 19 existing tests in tests/core/test_parallelism.py still pass

Backwards-compat note for release notes

This changes the sub-application IDs produced by every TaskBasedParallelAction subclass. Anyone who has persisted sub-app state under the old (deterministic-without-salt) IDs and depended on resuming from those exact IDs will see new IDs after upgrading. Worth flagging in the release notes for the next release.

…replay (#761)

Sub-application IDs in `TaskBasedParallelAction` (and its subclasses
`MapStates`, `MapActions`, `MapActionsAndStates`) were deterministic in
`(parent_app_id, i, j)` only, with no per-invocation discriminator. When
the parent application was built with `initialize_from(...)`, the
cascaded state initializer would, on the second invocation of the same
parallel action, see a sub-app ID matching the first invocation's
persisted state and silently hydrate it instead of re-running the
action. The result: every invocation after the first replayed the first
invocation's outputs.

Fresh runs (no `initialize_from`) were unaffected because the cascaded
initializer was None, so sub-apps never tried to load existing state
even though their IDs still collided.

Fix: in `TaskBasedParallelAction.run_and_update`, salt each task's
`application_id` with the parent context's `sequence_id` (which
increments per parent action step, so it uniquely identifies a given
invocation). Applied via a small `_salt_task_app_id` helper in both the
sync and async generator paths so all subclasses inherit the fix
without needing to override `tasks()`.

Also adds a regression test that exercises the full failure mode:
builds an app with `initialize_from(...)` and a `MapStates`, invokes it
three times, and asserts the per-invocation outputs differ.

Note: this changes sub-app IDs across versions for any
`TaskBasedParallelAction` subclass; persisted sub-app state from the
old scheme will not be matched by the new IDs.
@github-actions github-actions Bot added area/core Application, State, Graph, Actions area/streaming Streaming actions, parallel streams labels May 14, 2026
@elijahbenizzy
Copy link
Copy Markdown
Contributor Author

For 0.43.0 release notes — breaking change to surface:

The sub-application ID scheme for TaskBasedParallelAction (MapStates, MapActions, MapActionsAndStates) now incorporates the parent's sequence_id. Persisted sub-app state from earlier versions will be orphaned on resume — the sub-actions will re-execute fresh rather than load stale results. This is the fix for #761; the old scheme would have silently returned stale data instead.

Also documented inline in the _salt_task_app_id docstring (commit b234025) so anyone reading the code finds the rationale + version-impact note in one place.

Practical impact summary:

  • Fresh runs on the new version: no change in behavior
  • New runs on the new version that get persisted, then resumed: works normally with new IDs
  • Resumes of mid-flight pre-upgrade state: parallel sub-actions re-execute on first resume (one-time cost), then proceed normally
  • No data loss, no silent wrong answers (which is what the bug was)

Comment thread burr/core/parallelism.py
# Salt sub-app IDs with the parent sequence_id so repeated invocations
# don't collide and silently replay prior persisted state (#761).
task_generator = (
_salt_task_app_id(task, context.sequence_id) for task in task_generator
Copy link
Copy Markdown
Contributor

@skrawcz skrawcz May 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to enable configuring this? Is there a use case where you want the prior behavior?

@skrawcz
Copy link
Copy Markdown
Contributor

skrawcz commented May 15, 2026

Looks good. Why test failures though?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/core Application, State, Graph, Actions area/streaming Streaming actions, parallel streams

Projects

None yet

Development

Successfully merging this pull request may close these issues.

MapStates replays stale sub-state on repeated invocations when parent has a cascading state_initializer

2 participants