basic CUDA <> CPU or CUDA <> CUDA rdma Support #372

Open: dstaay-fb wants to merge 3 commits into main

dstaay-fb
Contributor

Summary:
RDMA support for CUDA <> CUDA and CUDA <> CPU comms

Key changes

  • Using CUDA APIs, we can detect whether a given pointer is mapped to a CUDA device or to the CPU.
  • If the data pointer is CUDA memory, the code uses DMA registration to register it with the NIC; by using cuMemGetHandleForAddressRange, we avoid passing CUDA allocation handles around directly.
  • If the data pointer is CPU memory, we use a standard ibv MR; note that I switched to standard registration instead of registering the entire memory space (security concern raised by mariusae).
  • Refactored the test infra to support named NIC devices and different compute targets (cuda:X or cpu).
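
For readers unfamiliar with the pointer check described in the first bullet, here is a minimal sketch of how it could look, not the code in this PR. It queries the CUDA driver API function cuPointerGetAttribute through ctypes. The helper names (load_cuda_driver, classify_pointer, registration_path) and the "dmabuf" / "ibv_mr" labels are hypothetical and exist only for illustration. As a simplification, any driver error, including a missing libcuda, is treated as "cpu".

```python
import ctypes

# Constants from the CUDA driver API header (cuda.h).
CUDA_SUCCESS = 0
CU_POINTER_ATTRIBUTE_MEMORY_TYPE = 2
CU_MEMORYTYPE_DEVICE = 2


def load_cuda_driver():
    """Return an initialized libcuda handle, or None if CUDA is unavailable."""
    try:
        lib = ctypes.CDLL("libcuda.so.1")
    except OSError:
        return None  # no driver library on this machine
    lib.cuInit.argtypes = [ctypes.c_uint]
    lib.cuInit.restype = ctypes.c_int
    lib.cuPointerGetAttribute.argtypes = [
        ctypes.POINTER(ctypes.c_uint),  # out: attribute value
        ctypes.c_int,                   # CUpointer_attribute
        ctypes.c_uint64,                # CUdeviceptr
    ]
    lib.cuPointerGetAttribute.restype = ctypes.c_int
    if lib.cuInit(0) != CUDA_SUCCESS:
        return None  # driver present but not usable
    return lib


def classify_pointer(addr, lib=None):
    """Return "cuda" if addr is device memory known to the driver, else "cpu".

    Simplification (assumption of this sketch): every error is treated as "cpu".
    """
    if lib is None:
        lib = load_cuda_driver()
    if lib is None:
        return "cpu"
    mem_type = ctypes.c_uint(0)
    status = lib.cuPointerGetAttribute(
        ctypes.byref(mem_type), CU_POINTER_ATTRIBUTE_MEMORY_TYPE, addr
    )
    if status == CUDA_SUCCESS and mem_type.value == CU_MEMORYTYPE_DEVICE:
        return "cuda"
    return "cpu"


def registration_path(kind):
    """Pick a (hypothetically named) registration strategy for a buffer kind."""
    # "dmabuf": export the range via cuMemGetHandleForAddressRange, then register.
    # "ibv_mr": standard memory-region registration of just this buffer.
    return "dmabuf" if kind == "cuda" else "ibv_mr"


if __name__ == "__main__":
    host_buf = ctypes.create_string_buffer(64)
    kind = classify_pointer(ctypes.addressof(host_buf))
    print(kind, registration_path(kind))
```

A plain host buffer such as the one in the usage snippet takes the CPU path. On the CUDA path, the handle exported by cuMemGetHandleForAddressRange would presumably be passed to an rdma-core call such as ibv_reg_dmabuf_mr, though the PR text does not name the registration call it uses.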

This implementation is relatively naive, and I will iterate accordingly.

To Do: add unit test for cuda/cuda

Differential Revision: D77408653

Summary: Exposes basic CUDA bindings to monarch for RDMA support.

Differential Revision: D77404103

Summary: Exposes rdmacore bindings, including basic ibv verbs along with mlx5dv providers, to monarch for RDMA.

Differential Revision: D77408652
facebook-github-bot added the CLA Signed label on Jun 27, 2025
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D77408653

Labels: CLA Signed, fb-exported