
AMD/ROCm integration for DGL #7922

Open
mukh1l wants to merge 98 commits into dmlc:master from ROCm-LS:amd-integration


mukh1l commented May 20, 2026


Summary

This pull request contributes the AMD/ROCm integration work from the ROCm/dgl fork back to upstream DGL.

  • Adds ROCm/HIP build and runtime support across core libraries, GraphBolt, and related tooling
  • Updates CMake, build scripts, and dependency installation for AMD GPU targets
  • Includes documentation updates (README, install instructions, Docker references)
  • 98 commits ahead of dmlc/dgl master (~197 files changed)

Source

Branch amd-integration on ROCm/dgl, created from develop.

Test plan

  • Build DGL with ROCm/HIP enabled on supported AMD hardware
  • Run core Python unit tests on ROCm
  • Verify GraphBolt build and sampling paths on ROCm
  • Confirm CUDA builds remain unaffected
  • Review CI/build script changes for Linux ROCm environments

Made with Cursor

awelling2801 and others added 30 commits July 9, 2025 14:21
Co-authored-by: mukh1l <mmallaiy@amd.com>
Co-authored-by: awelling2801 <anuya.welling@amd.com>
Co-authored-by: Tiffany Mintz <Tiffany.Mintz@amd.com>
Co-authored-by: Vicky Tsang <vicky.tsang@amd.com>
Co-authored-by: Radha Krishna Srimanthula <radha.srimanthula@amd.com>
This commit also adds an include that was missing during some builds
and causing failures.
…cess. Currently crashing on missing <cuda_fp16.h>
Turning on more tests, mostly in graphbolt/impl. Still only one failure
(other than the missing torch_geometric dependency) so far. It's a
failure in test_hetero_cached_feature where the miss rate is slightly
too high. Still have to turn on the dataloader tests.
The hipification of HugeCTR was in progress on the ROCm/dgl default
branch but was missing a few small tweaks, especially the
initialization of warp_position in nv_gpu_cache.cu.
Pulling the changes directly from the prehipified source in the nod-ai
branch and then making minor tweaks.
Removed skip macros for
- tests/python/common/cuda/test_gpu_cache.py
- tests/python/pytorch/cuda/test_nccl.py
- Fixes the test failure in test_hetero_cached_feature.py caused by
  ROCm's constraints on atomics (i.e. ROCm doesn't support 8-bit atomics)
- Cleaning up comments and unused files from self-review
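The commit message does not say how the 8-bit atomic limitation was worked around. One common technique is to emulate a byte-wide atomic with a compare-and-swap (CAS) loop on the enclosing 32-bit word. The Python model below is a minimal sketch of that technique, assuming a lock-backed `compare_and_swap` stands in for the hardware CAS instruction. `Word32`, `atomic_add_u8`, and `read_u8` are hypothetical names, not DGL or HIP APIs, and this is not necessarily what the PR does.

```python
import threading


class Word32:
    """A 32-bit word with a compare-and-swap primitive.

    The lock only models the atomicity of a hardware CAS instruction;
    it says nothing about GPU performance.
    """

    def __init__(self, value=0):
        self._value = value & 0xFFFFFFFF
        self._lock = threading.Lock()

    def load(self):
        return self._value

    def compare_and_swap(self, expected, desired):
        """Store `desired` if the word still holds `expected`; return the value seen."""
        with self._lock:
            seen = self._value
            if seen == expected:
                self._value = desired & 0xFFFFFFFF
            return seen


def atomic_add_u8(words, index, delta):
    """Atomically add `delta` (mod 256) to byte `index` using only 32-bit CAS.

    Returns the byte's previous value, mirroring atomicAdd's return convention.
    """
    word = words[index // 4]
    shift = (index % 4) * 8  # logical byte lane inside the 32-bit word
    mask = 0xFF << shift
    old = word.load()
    while True:
        old_byte = (old & mask) >> shift
        new_byte = (old_byte + delta) & 0xFF  # wrap like an 8-bit unsigned add
        desired = (old & ~mask) | (new_byte << shift)  # leave other lanes intact
        seen = word.compare_and_swap(old, desired)
        if seen == old:
            return old_byte
        old = seen  # another thread changed the word; retry with its value


def read_u8(words, index):
    return (words[index // 4].load() >> ((index % 4) * 8)) & 0xFF


# Usage: 4 threads each add 1 to byte 5, 100 times, in an 8-byte buffer.
words = [Word32() for _ in range(2)]


def worker():
    for _ in range(100):
        atomic_add_u8(words, 5, 1)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(read_u8(words, 5))  # 400 mod 256 = 144
print(read_u8(words, 4))  # neighbouring byte in the same word is untouched: 0
```

The retry loop is what makes the narrow update safe: a concurrent write to any byte of the same word causes the CAS to fail, so no neighbouring lane is ever overwritten with a stale value.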
This PR consolidates graphbolt/CMakeLists.txt as much as possible so
that CUDA and ROCm configurations do not have redundant CMake variables.

It also switches the configuration of ROCm architectures to the more standard
CMAKE_HIP_ARCHITECTURES.
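CMAKE_HIP_ARCHITECTURES (available since CMake 3.21) takes a CMake list, i.e. a semicolon-separated string of GPU targets. The sketch below builds a generic configure command that sets it; `hip_configure_command` is a hypothetical helper, it uses no DGL-specific options, and `gfx90a`/`gfx942` are only example targets.

```python
import shlex


def hip_configure_command(source_dir, build_dir, archs):
    """Return a CMake configure command targeting the given AMD GPU archs."""
    if not archs:
        raise ValueError("at least one HIP architecture is required")
    return [
        "cmake",
        "-S", source_dir,
        "-B", build_dir,
        # CMake lists are semicolon-separated strings.
        "-DCMAKE_HIP_ARCHITECTURES=" + ";".join(archs),
    ]


cmd = hip_configure_command(".", "build", ["gfx90a", "gfx942"])
print(shlex.join(cmd))
# cmake -S . -B build '-DCMAKE_HIP_ARCHITECTURES=gfx90a;gfx942'
```

When the command is typed into a shell, the semicolon must be quoted (as `shlex.join` does here); otherwise the shell treats it as a command separator and CMake only sees the first architecture.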
jamesETsmith and others added 29 commits November 12, 2025 10:56
There are some challenges with the new hipco installation. For example,
the libhipcxx CMake config files get installed in /opt/rocm/lib/rapids/cmake
instead of /opt/rocm/lib/cmake.
Co-authored-by: Geoffrey Martin-Noble <GMNGeoffrey@gmail.com>
[Tests] Unskip tests that work on rocm
Allow setting CC and CXX for sub-project builds
[DOC] Update readme with PyPI links
[Feature] Bump hipCollections version
[Feature] Updating dgl ROCm support
adding changes to mark rocm/dgl deprecated, retaining original readme…
mukh1l marked this pull request as ready for review June 22, 2026 15:33

Labels

None yet

Projects

None yet

5 participants