fix(parallelism): salt sub-app IDs with sequence_id to prevent stale replay (#761)#778
fix(parallelism): salt sub-app IDs with sequence_id to prevent stale replay (#761)#778elijahbenizzy wants to merge 3 commits into
Conversation
…replay (#761) Sub-application IDs in `TaskBasedParallelAction` (and its subclasses `MapStates`, `MapActions`, `MapActionsAndStates`) were deterministic in `(parent_app_id, i, j)` only, with no per-invocation discriminator. When the parent application was built with `initialize_from(...)`, the cascaded state initializer would, on the second invocation of the same parallel action, see a sub-app ID matching the first invocation's persisted state and silently hydrate it instead of re-running the action. The result: every invocation after the first replayed the first invocation's outputs. Fresh runs (no `initialize_from`) were unaffected because the cascaded initializer was None, so sub-apps never tried to load existing state even though their IDs still collided. Fix: in `TaskBasedParallelAction.run_and_update`, salt each task's `application_id` with the parent context's `sequence_id` (which increments per parent action step, so it uniquely identifies a given invocation). Applied via a small `_salt_task_app_id` helper in both the sync and async generator paths so all subclasses inherit the fix without needing to override `tasks()`. Also adds a regression test that exercises the full failure mode: builds an app with `initialize_from(...)` and a `MapStates`, invokes it three times, and asserts the per-invocation outputs differ. Note: this changes sub-app IDs across versions for any `TaskBasedParallelAction` subclass; persisted sub-app state from the old scheme will not be matched by the new IDs.
|
For 0.43.0 release notes — breaking change to surface:
Also documented inline in the Practical impact summary:
|
| # Salt sub-app IDs with the parent sequence_id so repeated invocations | ||
| # don't collide and silently replay prior persisted state (#761). | ||
| task_generator = ( | ||
| _salt_task_app_id(task, context.sequence_id) for task in task_generator |
There was a problem hiding this comment.
Do we want to enable configuring this? Is there a use case where you want the prior behavior?
|
Looks good. Why test failures though? |
Fixes #761
Root cause
TaskBasedParallelAction(and subclassesMapStates,MapActions,MapActionsAndStates) generated sub-application IDs that were deterministic in(parent_app_id, i, j)only -- no per-invocation discriminator. When the parent application was built withinitialize_from(...), the cascaded state initializer would, on the second invocation of the same parallel action, see a sub-app ID matching the first invocation's persisted state and silently hydrate it instead of running the action.Result: every invocation after the first replayed the first invocation's outputs. Silent wrong results.
Fresh runs (no
initialize_from) were unaffected because the cascaded initializer wasNone, so sub-apps never tried to load existing state even though their IDs still collided.Fix
In
TaskBasedParallelAction.run_and_update, salt each task'sapplication_idwith the parent context'ssequence_id. The parent'ssequence_idincrements on every action step, so it uniquely identifies which invocation of the parent action we are in. Applied via a small_salt_task_app_idhelper in both sync and async generator paths, so every subclass inherits the fix automatically without overridingtasks().This matches the workaround pattern the issue reporter (hippalectryon-0) verified.
Regression test
tests/core/test_parallelism.py::test_map_states_reexecutes_on_repeated_invocations_with_initializer-- builds an app withinitialize_from(...)and aMapStatesaction, invokes it three times in a row using a counter inside the sub-action, and asserts the outputs differ across invocations and the counter advanced as expected. Confirmed it fails onmainand passes with the fix.Test plan
main(without the fix), confirming it actually catches the bugtests/core/test_parallelism.pystill passBackwards-compat note for release notes
This changes the sub-application IDs produced by every
TaskBasedParallelActionsubclass. Anyone who has persisted sub-app state under the old (deterministic-without-salt) IDs and depended on resuming from those exact IDs will see new IDs after upgrading. Worth flagging in the release notes for the next release.