[cuda.core] Add support for host_launch (host callback nodes / host function launches) #2058

@rparolin

Feature Request

Add a host_launch (or equivalent) API to cuda.core that allows scheduling
Python callables (or C function pointers) to execute on the host as part of a
stream's work order. This is the cuLaunchHostFunc / cudaLaunchHostFunc
path (and its graph-node counterpart, host nodes via cuGraphAddHostNode).

Motivation

cuda.core currently exposes launch(...) for device kernels but has no
symmetric primitive for host work. Mixed host/device work ordering therefore
cannot be expressed in pure cuda.core terms: users must drop to
cuda.bindings for cuLaunchHostFunc, which breaks the cuda.core
abstraction boundary (streams, events, graphs).

Common use cases:

  • Logging / progress callbacks ordered against GPU work without host-side
    stream synchronization.
  • Triggering Python-side state transitions (e.g. buffer release, metric
    updates) at a specific point in a stream.
  • Host nodes in CUDA graphs for workflows that need host-side compute or
    notification steps between kernels.
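As a rough illustration of the "state transition" use case, the sketch below shows the shape such a callback might take. It assumes nothing about the eventual cuda.core API: a plain Python thread stands in for the driver's internal callback thread, and the names `released`, `on_buffer_done`, and `driver-callback` are hypothetical. Because cuLaunchHostFunc callbacks must not make CUDA API calls, the callback only records the event in a thread-safe queue and leaves the real work to the application thread.

```python
import queue
import threading

# Thread-safe handoff from the callback to the application thread.
released = queue.Queue()


def on_buffer_done(buffer_id):
    # Keep the callback tiny: record what happened, make no CUDA calls,
    # and do not block.
    released.put((buffer_id, threading.current_thread().name))


# Stand-in for the driver's internal thread (assumption: no CUDA involved here).
driver_thread = threading.Thread(
    target=on_buffer_done, args=(7,), name="driver-callback"
)
driver_thread.start()
driver_thread.join()

# Back on the application thread, where touching CUDA-owning objects is allowed.
buffer_id, thread_name = released.get(timeout=1)
print(buffer_id, thread_name)
```

The point of the queue is that the callback itself stays trivially safe; whatever releases the buffer or updates metrics runs later, on a thread the application controls.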

Proposed Scope

  • A top-level host_launch(stream, fn, *args, **kwargs) (or
    stream.launch_host(fn, ...)) that wraps cuLaunchHostFunc.
  • A corresponding graph node type (HostNode) added to
    cuda.core.graph._subclasses, alongside the existing EmptyNode,
    MemcpyNode, etc.
  • Clear documentation of the callback threading / reentrancy restrictions
    imposed by the CUDA driver (host functions run on an internal driver
    thread and must not make any CUDA API calls).
  • An example under cuda_core/examples/ demonstrating a host callback
    ordered between two kernels.
  • API reference entries in cuda_core/docs/source/api.rst.
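To make the intended ordering semantics concrete, here is a minimal pure-Python sketch of the proposed host_launch(stream, fn, *args, **kwargs) shape. It is not the cuda.core implementation and does not touch CUDA: `MockStream` is a hypothetical stand-in that runs submitted work in FIFO order on one worker thread, and the "kernels" are ordinary Python callables. A real wrapper around cuLaunchHostFunc would also need to keep the callable and its arguments alive until the callback has run, which this sketch gets for free from the queue.

```python
import functools
import queue
import threading


class MockStream:
    """Hypothetical stand-in for a stream: FIFO work on one worker thread."""

    def __init__(self):
        self._work = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            item = self._work.get()
            try:
                if item is None:  # sentinel: shut the worker down
                    return
                item()
            finally:
                self._work.task_done()

    def submit(self, fn):
        self._work.put(fn)

    def sync(self):
        # Block until everything submitted so far has finished.
        self._work.join()

    def close(self):
        self._work.put(None)
        self._worker.join()


def host_launch(stream, fn, *args, **kwargs):
    """Signature taken from the proposal; the body here is only a mock."""
    # Bind the arguments now so the queue holds a reference until fn runs.
    stream.submit(functools.partial(fn, *args, **kwargs))


stream = MockStream()
trace = []
stream.submit(lambda: trace.append("kernel_a"))     # pretend kernel launch
host_launch(stream, trace.append, "host_callback")  # host work, stream-ordered
stream.submit(lambda: trace.append("kernel_b"))     # pretend kernel launch
stream.sync()
stream.close()
print(trace)  # ['kernel_a', 'host_callback', 'kernel_b']
```

The caller never blocks between the three submissions, yet the host callback is observed strictly between the two "kernels", which is the behavior the requested example under cuda_core/examples/ would demonstrate on a real stream.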

Related

  • Driver API: cuLaunchHostFunc, cuGraphAddHostNode
  • Runtime API: cudaLaunchHostFunc
  • Part of cuda.core feature audit gap list (Nov 2025).

Metadata

Labels: cuda.core, feature, triage