Description
Today, `LaunchConfig` only supports the `cuLaunchKernel` driver API to launch kernels on a single GPU. When extending to broader use cases that need inter-SM or multi-GPU synchronization, one would need to use `cuLaunchCooperativeKernel` to launch kernels safely and without deadlock. To support this, one could extend `LaunchConfig(..., launch_attr=None)` with an optional `launch_attr` argument that sets the cuda-python equivalent of `CUlaunchAttribute`.
Background:
This issue came out of the discussion in NVIDIA/numba-cuda#128 (comment). The existing implementation of the CUDA driver bindings in `numba-cuda` uses `cuLaunchCooperativeKernel` or `cuLaunchKernel` depending on whether `grid.sync()` appears in the kernel. To migrate that code to `cuda.core`, one would need the ability to select the launch API variant at runtime based on the `LaunchConfig`.
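
To make the request concrete, here is a minimal pure-Python sketch of the runtime selection described above. It does not use or reproduce the real `cuda.core` API: `LaunchConfigSketch`, `LaunchAttr`, and `select_launch_api` are hypothetical names invented for illustration, and the function only returns the name of the driver entry point a launcher might pick rather than calling the driver.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional, Sequence, Tuple


class LaunchAttr(Enum):
    """Hypothetical stand-in for a CUlaunchAttribute-like value."""
    COOPERATIVE = auto()


@dataclass(frozen=True)
class LaunchConfigSketch:
    """Hypothetical stand-in for LaunchConfig; not the cuda.core class."""
    grid: Tuple[int, int, int]
    block: Tuple[int, int, int]
    # Optional attributes; None keeps today's default behaviour.
    launch_attr: Optional[Sequence[LaunchAttr]] = None


def select_launch_api(config: LaunchConfigSketch) -> str:
    """Return the name of the driver API a launcher could choose."""
    attrs = config.launch_attr or ()
    if LaunchAttr.COOPERATIVE in attrs:
        return "cuLaunchCooperativeKernel"
    return "cuLaunchKernel"


default_cfg = LaunchConfigSketch(grid=(4, 1, 1), block=(128, 1, 1))
coop_cfg = LaunchConfigSketch(
    grid=(4, 1, 1),
    block=(128, 1, 1),
    launch_attr=(LaunchAttr.COOPERATIVE,),
)

print(select_launch_api(default_cfg))
print(select_launch_api(coop_cfg))
```

With this shape, a caller such as numba-cuda could set the cooperative attribute whenever it detects `grid.sync()` in a kernel, and existing callers that pass no `launch_attr` would keep the current `cuLaunchKernel` path.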