Skip to content

execution: fix split scheduler preservation for late subscribers (P2300)#7165

Open
arpittkhandelwal wants to merge 2 commits intoTheHPXProject:masterfrom
arpittkhandelwal:fix/split-scheduler-preservation
Open

execution: fix split scheduler preservation for late subscribers (P2300)#7165
arpittkhandelwal wants to merge 2 commits intoTheHPXProject:masterfrom
arpittkhandelwal:fix/split-scheduler-preservation

Conversation

@arpittkhandelwal
Copy link
Copy Markdown
Contributor

Problem Statement

This PR addresses a P2300 correctness violation in the split algorithm. Previously, the split sender failed to preserve the associated scheduler when a receiver connected after the predecessor had already completed (the "late subscriber" scenario). In these cases, the completion signal was fired inline, bypassing the execution context guaranteed by the sender's completion scheduler.

Proposed Changes

  • Virtualized Completion Hooks: Introduced a virtual void schedule_completion(continuation_type&&) method to the shared_state base class. This allows subclasses to reroute completion signals through the appropriate execution context.

  • Scheduler-Aware shared_state: Implemented shared_state_scheduler, a new subclass that captures the attached scheduler. Overrode schedule_completion to dispatch stored continuations via hpx::execution::experimental::schedule(sched).

  • Safe Asynchronous Management: Implemented a self-owning schedule_op_holder to manage the lifetime of the schedule() operation state.

  • Memory Safety: Adopted the standard HPX allocator pattern, rebinding the shared_state allocator to handle internal task metadata without raw new/delete.

  • Race Prevention: Added an intrusive_ptr owner guard before calling start() to prevent use-after-free if a scheduler executes synchronously.

  • CPO & Dispatch Refactoring: Updated split_t overloads to support generic schedulers, enabling both automatic scheduler discovery and explicit injection.

  • Cleaned up constructor SFINAE in split_sender to handle no_scheduler, run_loop_scheduler, and generic Scheduler types without ambiguity.

Verification Results
New Test Suite: Added algorithm_split_scheduler.cpp which specifically targets the late-subscriber race condition.
Regression Testing: Verified that the legacy no_scheduler and run_loop paths remain unaffected.
Performance: Used HPX_NO_UNIQUE_ADDRESS and intrusive pointers to keep metadata overhead at an absolute minimum.

@codacy-production
Copy link
Copy Markdown

codacy-production bot commented Apr 11, 2026

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

TIP This summary will be updated as you push new changes. Give us feedback

When split(scheduler, sender) is used and the predecessor has already
completed (predecessor_done == true) by the time add_continuation() is
called by a new subscriber, the downstream completion signal was fired
inline on whatever thread called add_continuation(). This violated the
P2300 get_completion_scheduler<CPO> contract, which requires that
completions on the set_value and set_stopped signals are dispatched by
the scheduler passed to split.

The code itself acknowledged this with a TODO comment:
  // TODO: Should this preserve the scheduler? It does not
  // if we call set_* inline.

Fix:

* Add a virtual schedule_completion(continuation_type&&) method to
  shared_state with a default implementation that fires inline (preserving
  existing behaviour for the scheduler-free case).

* Replace the two "fire inline" paths inside add_continuation (the
  predecessor_done fast-path and the lock-then-done path) with calls to
  schedule_completion, so all completion dispatch goes through a single
  overridable hook.

* Add shared_state_scheduler<Sched> — a new subclass of shared_state —
  that overrides schedule_completion to post the continuation through
  schedule(sched). The operation state is kept alive via a self-owning
  intrusive_ptr-based holder (mirroring the pattern in start_detached.hpp),
  so the async lifetime is correct regardless of how quickly the thread
  pool processes the work item.

* Add a second constructor overload to split_sender for generic
  (non-run_loop) schedulers that allocates shared_state_scheduler instead
  of plain shared_state.

* Add algorithm_split_scheduler unit test that covers:
  - Basic split with no scheduler (regression guard)
  - split with thread_pool_scheduler: late subscriber receives value on
    the pool, not inline
  - Multiple concurrent late subscribers all receive the value
  - ensure_started (eager submission) is unaffected

No behavioural change for the scheduler-free split or the run_loop split;
only the generic-scheduler path gains the new subclass.

Signed-off-by: arpittkhandelwal <arpitkhandelwal810@gmail.com>
@arpittkhandelwal arpittkhandelwal force-pushed the fix/split-scheduler-preservation branch from 5d7786a to 4afbe1b Compare April 11, 2026 12:28
@StellarBot
Copy link
Copy Markdown

Performance test report

HPX Performance

Comparison

BENCHMARKFORK_JOIN_EXECUTORPARALLEL_EXECUTORSCHEDULER_EXECUTOR
For Each(=)(=)(=)

Info

PropertyBeforeAfter
HPX Datetime2026-03-09T14:08:29+00:002026-04-11T12:55:48+00:00
HPX Commit0eeca869deb045
Envfile
Compiler/opt/apps/llvm/18.1.8/bin/clang++ 18.1.8/opt/apps/llvm/18.1.8/bin/clang++ 18.1.8
Datetime2026-03-09T09:15:24.034803-05:002026-04-11T08:05:19.438544-05:00
Hostnamemedusa08.rostam.cct.lsu.edumedusa08.rostam.cct.lsu.edu
Clusternamerostamrostam

Comparison

BENCHMARKNO-EXECUTOR
Future Overhead - Create Thread Hierarchical - Latch++

Info

PropertyBeforeAfter
HPX Datetime2026-03-09T14:08:29+00:002026-04-11T12:55:48+00:00
HPX Commit0eeca869deb045
Envfile
Compiler/opt/apps/llvm/18.1.8/bin/clang++ 18.1.8/opt/apps/llvm/18.1.8/bin/clang++ 18.1.8
Datetime2026-03-09T09:17:15.638328-05:002026-04-11T08:07:02.860839-05:00
Hostnamemedusa08.rostam.cct.lsu.edumedusa08.rostam.cct.lsu.edu
Clusternamerostamrostam

Comparison

BENCHMARKFORK_JOIN_EXECUTOR_DEFAULT_FORK_JOIN_POLICY_ALLOCATORPARALLEL_EXECUTOR_DEFAULT_PARALLEL_POLICY_ALLOCATORSCHEDULER_EXECUTOR_DEFAULT_SCHEDULER_EXECUTOR_ALLOCATOR
Stream Benchmark - Add(=)---
Stream Benchmark - Scale(=)----
Stream Benchmark - Triad(=)---
Stream Benchmark - Copy=+++++

Info

PropertyBeforeAfter
HPX Datetime2026-03-09T18:50:37+00:002026-04-11T12:55:48+00:00
HPX Commitba89f5d9deb045
Envfile
Compiler/opt/apps/llvm/18.1.8/bin/clang++ 18.1.8/opt/apps/llvm/18.1.8/bin/clang++ 18.1.8
Datetime2026-03-09T17:49:10.837937-05:002026-04-11T08:07:23.108751-05:00
Hostnamemedusa08.rostam.cct.lsu.edumedusa08.rostam.cct.lsu.edu
Clusternamerostamrostam

Explanation of Symbols

SymbolMEANING
=No performance change (confidence interval within ±1%)
(=)Probably no performance change (confidence interval within ±2%)
(+)/(-)Very small performance improvement/degradation (≤1%)
+/-Small performance improvement/degradation (≤5%)
++/--Large performance improvement/degradation (≤10%)
+++/---Very large performance improvement/degradation (>10%)
?Probably no change, but quite large uncertainty (confidence interval with ±5%)
??Unclear result, very large uncertainty (±10%)
???Something unexpected…

@hkaiser hkaiser added category: senders/receivers Implementations of the p0443r14 / p2300 + p1897 proposals type: enhancement type: compatibility issue labels Apr 11, 2026
@hkaiser
Copy link
Copy Markdown
Contributor

hkaiser commented Apr 11, 2026

@arpittkhandelwal What's the difference to #6911?

@arpittkhandelwal
Copy link
Copy Markdown
Contributor Author

@arpittkhandelwal What's the difference to #6911?

I think the main difference between this PR and #6911 is all about where the 'source of truth' for the scheduler lives. PR #6911 takes a receiver-centric approach—it essentially looks downstream at the moment of connection to see if the receiver has a preferred scheduler. This is a nice safety net, but it doesn't quite address the core problem: the split(scheduler, sender) call itself is currently ignoring the scheduler we specifically gave it whenever a subscriber arrives late.
In this PR, I’ve gone with a sender-centric approach to make sure we actually follow the algorithm's contract. By virtualizing the completion dispatch through a schedule_completion hook in the shared state, we get a few big wins:

Better Compliance: The split_sender now strictly follows the P2300 contract by completing on the provided scheduler, even if the receiver doesn't have an environment or a specific scheduler of its own.
Proper Discovery: It allows the sender to correctly report its scheduler through get_completion_scheduler.
Efficiency: Instead of using start_detached for every late subscriber (which adds extra allocations), we use a specialized shared_state_scheduler. This fits much better into the existing HPX shared-state patterns and keeps things efficient.
Basically, while #6911 is a good generic fix, this implementation feels like the architectural step needed to make HPX's split truly P2300 compliant.

@hkaiser
Copy link
Copy Markdown
Contributor

hkaiser commented Apr 11, 2026

@isidorostsa could you have a look, please?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

category: senders/receivers Implementations of the p0443r14 / p2300 + p1897 proposals type: compatibility issue type: enhancement

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants