AMD/ROCm integration for DGL by mukh1l · Pull Request #7922 · dmlc/dgl

mukh1l · 2026-05-20T16:08:04Z

Summary

This pull request contributes the AMD/ROCm integration work from the ROCm/dgl fork back to upstream DGL.

- Adds ROCm/HIP build and runtime support across core libraries, GraphBolt, and related tooling
- Updates CMake, build scripts, and dependency installation for AMD GPU targets
- Includes documentation updates (README, install instructions, Docker references)
- 98 commits ahead of dmlc/dgl master (~197 files changed)

Source

Branch amd-integration on ROCm/dgl, created from develop.

Test plan

- Build DGL with ROCm/HIP enabled on supported AMD hardware
- Run core Python unit tests on ROCm
- Verify GraphBolt build and sampling paths on ROCm
- Confirm CUDA builds remain unaffected
- Review CI/build script changes for Linux ROCm environments

Made with Cursor

Turning on more tests, mostly in graphbolt/impl. Still only one failure (other than the missing torch_geometric dependency) so far. It's a failure in test_hetero_cached_feature where the miss rate is slightly too high. Still have to turn on the dataloader tests.


awelling2801 and others added 30 commits July 9, 2025 14:21

ROCm port of DGL

45c2969

Co-authored-by: mukh1l <mmallaiy@amd.com>
Co-authored-by: awelling2801 <anuya.welling@amd.com>
Co-authored-by: Tiffany Mintz <Tiffany.Mintz@amd.com>
Co-authored-by: Vicky Tsang <vicky.tsang@amd.com>
Co-authored-by: Radha Krishna Srimanthula <radha.srimanthula@amd.com>

ROCm port of DGL

41be1be

Co-authored-by: mukh1l <mmallaiy@amd.com>
Co-authored-by: awelling2801 <anuya.welling@amd.com>
Co-authored-by: Tiffany Mintz <Tiffany.Mintz@amd.com>
Co-authored-by: Vicky Tsang <vicky.tsang@amd.com>
Co-authored-by: Radha Krishna Srimanthula <radha.srimanthula@amd.com>

Making the build scripts compatible with ninja

c258631

This commit also adds an include that was missing during some builds and causing failures.

Fixing cmake warning and switching graphbolt to drive build process through cmake rather than directly with make

89fba27

First commit, turning on graphbolt in cmake and turning on tests in a single file (and the test utils).

3feb943

Graphbolt still failing to build, but it's getting further in the process. Currently crashing on missing <cuda_fp16.h>

42c2479

With some manual tweaking graphbolt is compiling, but the python tests crash with missing symbols errors

b62b72c

Compilation is working for graphbolt and tests are running (but failing with import problems)

8f4e894

Unblocking some of the graphbolt tests for rocm builds

8701a6b

Adding macros and shims to graphbolt/include

ab2ce60

First step toward using header shims and macros instead of hipify_python

2d0128a

Found the bug causing the undefined symbol problem. It was a mismatch in warp size macros

3b22ecc

Cleaning up hipification on HugeCTR

76de204

The hipification of HugeCTR was in progress on the ROCm/dgl default branch, but was missing a few small tweaks, most notably the initialization of warp_position in nv_gpu_cache.cu.

Redoing the HugeCTR hipification

a87af66

Pulling the changes directly from the prehipified source in the nod-ai branch and then making minor tweaks.

Adding script to install graphbolt deps.sh

b729cac

Forgot to commit changes to graphbolt/test_dataloader.py

8d2fc96

Removing mold from linker flags and adding preset for building with LLVM host compiler

bb27f9c

Removing mold from graphbolt builds too

96fdb2a

Turning on more tests

5154f2a

Removed skip macros for:
- tests/python/common/cuda/test_gpu_cache.py
- tests/python/pytorch/cuda/test_nccl.py

Merge branch 'develop' into feat/jamesETsmith/bump_from_upstream

7174dc2

Merge pull request #2 from ROCm/feat/jamesETsmith/bump_from_upstream

69ceab3

[Feature] Updating with Upstream

Merge branch 'develop' of github.com:ROCm/dgl into feat/jamesETsmith/graphbolt_macros

1b2510a

Addressing self review comments and fixing one test

8300a1c

- Fixes the test failure in test_hetero_cached_feature.py caused by ROCm's constraints on atomics (i.e. it doesn't support 8-bit atomics)
- Cleaning up comments and unused files from self review
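
The commit message does not say how the 8-bit atomic limitation was worked around. One common approach on hardware without byte-wide atomics, shown here only as an illustration and not as the change made in this commit, is a compare-and-swap loop on the aligned 32-bit word that contains the byte. The sketch below models that bit arithmetic in plain Python (single-threaded, so the CAS succeeds on the first attempt); `Word32Memory`, `compare_and_swap`, and `atomic_add_u8` are hypothetical names, and little-endian byte order is an assumption.

```python
class Word32Memory:
    """Toy memory made of 32-bit words; stands in for GPU global memory."""

    def __init__(self, num_words: int) -> None:
        self.words = [0] * num_words

    def compare_and_swap(self, index: int, expected: int, desired: int) -> int:
        """Store `desired` if the word equals `expected`; return the old word."""
        old = self.words[index]
        if old == expected:
            self.words[index] = desired & 0xFFFFFFFF
        return old


def atomic_add_u8(mem: Word32Memory, byte_addr: int, value: int) -> int:
    """Add `value` to one byte (wrapping mod 256) using only 32-bit CAS.

    Returns the byte's previous value. Assumes little-endian byte order.
    """
    word_index = byte_addr // 4          # aligned word holding the byte
    shift = (byte_addr % 4) * 8          # bit offset of the byte in that word
    mask = 0xFF << shift
    while True:
        old_word = mem.words[word_index]
        old_byte = (old_word & mask) >> shift
        new_byte = (old_byte + value) & 0xFF
        # Clear the target byte, then splice in the updated value.
        new_word = (old_word & ~mask & 0xFFFFFFFF) | (new_byte << shift)
        # Retry if another writer changed the word in the meantime.
        if mem.compare_and_swap(word_index, old_word, new_word) == old_word:
            return old_byte
```

The neighbouring three bytes in the word are carried through unchanged, which is what makes the wider CAS safe to use for a narrower update.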

Applying lintrunner

8f42ca8

Skipping cooperative tests for rocm in graphbolt/test_dataloader.py
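
A minimal sketch of skipping a test only on ROCm builds, assuming PyTorch's `torch.version.hip` attribute (a version string on ROCm builds, `None` otherwise). The helper `is_rocm_build`, the test class, and the skip reason are hypothetical; the actual test file may use a different mechanism such as pytest markers.

```python
import unittest


def is_rocm_build() -> bool:
    """Return True when the installed PyTorch targets ROCm/HIP."""
    try:
        import torch
    except ImportError:
        return False
    return getattr(torch.version, "hip", None) is not None


class DataLoaderSmokeTest(unittest.TestCase):
    @unittest.skipIf(is_rocm_build(), "cooperative path skipped on ROCm")
    def test_cooperative(self) -> None:
        # Placeholder body; a real test would exercise the dataloader.
        self.assertTrue(True)

    def test_regular(self) -> None:
        # Runs on every backend.
        self.assertTrue(True)
```

Keeping the backend check in one helper means a skipped test can be re-enabled by deleting a single decorator once the feature works on ROCm.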

c9897bd

Updating the Dockerfile.rocm and readme instructions

d7747c9

Consolidating GPU config in graphbolt

c95b33d

This PR consolidates graphbolt/CMakeLists.txt as much as possible so that the CUDA and ROCm configurations do not have redundant CMake variables. It also switches the configuration of ROCm architectures to the more standard CMAKE_HIP_ARCHITECTURES.
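
`CMAKE_HIP_ARCHITECTURES` is CMake's standard variable for HIP targets and takes a semicolon-separated list. A minimal sketch of assembling a configure command around it; the helper name `hip_configure_args`, the directory arguments, and the example architectures are assumptions, and DGL-specific options are left out because the commit message does not name them.

```python
from typing import List, Sequence


def hip_configure_args(
    source_dir: str, build_dir: str, archs: Sequence[str]
) -> List[str]:
    """Build a `cmake` argument list that targets the given AMD GPU archs."""
    if not archs:
        raise ValueError("at least one architecture (e.g. 'gfx90a') is required")
    return [
        "cmake",
        "-S", source_dir,
        "-B", build_dir,
        # CMake expects a semicolon-separated list for *_ARCHITECTURES.
        "-DCMAKE_HIP_ARCHITECTURES=" + ";".join(archs),
    ]
```

Passing the result as a list to `subprocess.run` avoids having to quote the semicolons for a shell.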

Moving dockerfile to docker directory and trimming it down.

c8d2f29

Forgot to check in the move of the graphbolt deps script

364d631

jamesETsmith and others added 29 commits November 12, 2025 10:56

Cleaning up graphbolt install script

5984306

There are some challenges with the new hipco installation. For example, the libhipcxx cmake config gets installed in /opt/rocm/lib/rapids/cmake instead of /opt/rocm/lib/cmake.

Formatting changes and adding dockerignore

c8c5c16

Updating readme with pypi install instructions, links to docker hub, and explicit mention that this is a fork.

bdd482c

Adding links to rocprim PR and better comments in the script to install graphbolt dependencies

7da392a

Update tests/python/common/test_traversal.py

79d5033

Co-authored-by: Geoffrey Martin-Noble <GMNGeoffrey@gmail.com>

Cleaning up lintrunner errors

b7aa167

Cleared lintrunner errors

1b60684

Cleared lintrunner errors

fdba576

Merge pull request #14 from ROCm/dev/awelling/skip_tests

9d57679

[Tests] Unskip tests that work on rocm

Allow setting CC and CXX for sub-project builds

8dff97c

Merge pull request #19 from diptorupd/improve/build_scripts

ccaedce

Allow setting CC and CXX for sub-project builds
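
A minimal sketch of the idea, assuming the sub-project is configured by invoking `cmake` from a script: when `CC` or `CXX` is set in the environment, forward it explicitly as `CMAKE_C_COMPILER` / `CMAKE_CXX_COMPILER`. `compiler_args` is a hypothetical name, and the actual build scripts may handle this differently.

```python
import os
from typing import List, Mapping, Optional


def compiler_args(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Translate CC/CXX from the environment into CMake cache arguments."""
    env = os.environ if env is None else env
    args: List[str] = []
    if env.get("CC"):
        args.append("-DCMAKE_C_COMPILER=" + env["CC"])
    if env.get("CXX"):
        args.append("-DCMAKE_CXX_COMPILER=" + env["CXX"])
    return args
```

Unset or empty variables produce no arguments, so CMake falls back to its own compiler detection.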

PR ready

61b841e

Minor change

3b2a3b2

Fixed typo in readme

a38a18a

Merge pull request #17 from ROCm/dev/jamesETsmith/update_readme

4fbbf39

[DOC] Update readme with pypi links

Merge branch 'develop' of github.com:rocm/dgl into dev/jamesETsmith/bump_hipco

241301a

Attempt at fixing the linting errors

c61cc21

Adding issue numbers to the TODOs in graphbolt dependency script

8b797d2

Fixing linting

24fba8d

Merge pull request #16 from ROCm/dev/jamesETsmith/bump_hipco

63eed62

[Feature] Bump hipCollections version

Merge branch 'develop' of github.com:ROCm/dgl into develop

f27d36a

Addressing comments on PR

ed3f23a

Addressing latest set of comments

9729b7b

Forgot to include one reverted change to CMakeLists.txt

9a82d43

Making install_graphbolt_deps.sh executable

cc9e550

Addressing James' comments

324eaf1

Merge pull request #20 from gcapodagAMD/develop

9a3bfc1

[Feature] Updating dgl ROCm support

adding changes to mark rocm/dgl deprecated, retaining original readme as a separate file

131871c

Merge pull request #25 from ROCm/muk/deprec-update

637d192

adding changes to mark rocm/dgl deprecated, retaining original readme…

mukh1l marked this pull request as ready for review June 22, 2026 15:33

mukh1l wants to merge 98 commits into dmlc:master from ROCm-LS:amd-integration


5 participants
