AMD/ROCm integration for DGL #7922
Open
mukh1l wants to merge 98 commits into
Co-authored-by: mukh1l <mmallaiy@amd.com> Co-authored-by: awelling2801 <anuya.welling@amd.com> Co-authored-by: Tiffany Mintz <Tiffany.Mintz@amd.com> Co-authored-by: Vicky Tsang <vicky.tsang@amd.com> Co-authored-by: Radha Krishna Srimanthula <radha.srimanthula@amd.com>
This commit also adds an include that was missing during some builds and causing failures.
…rough cmake rather than directly with make
… single file (and the test utils).
…cess. Currently crashing on missing <cuda_fp16.h>
…s crash with missing symbols errors
…ng with import problems
… in warp size macros
Turning on more tests, mostly in graphbolt/impl. Still only one failure (other than the missing torch_geometric dependency) so far. It's a failure in test_hetero_cached_feature where the miss rate is slightly too high. Still have to turn on the dataloader tests.
The hipification of HugeCTR was in progress on the ROCm/dgl default branch, but was missing a few small tweaks, most notably the initialization of warp_position in nv_gpu_cache.cu.
Pulling the changes directly from the prehipified source in the nod-ai branch and then making minor tweaks.
…LVM host compiler
Removed skip macros for:
- tests/python/common/cuda/test_gpu_cache.py
- tests/python/pytorch/cuda/test_nccl.py
[Feature] Updating with Upstream
- Fixes the test failure in test_hetero_cached_feature.py caused by ROCm's constraints on atomics (i.e. it doesn't support 8-bit atomics)
- Cleans up comments and unused files found during self review
This PR consolidates graphbolt/CMakeLists.txt as much as possible so that the CUDA and ROCm configurations do not have redundant CMake variables. It also switches the configuration of ROCm archs to the more standard CMAKE_HIP_ARCHITECTURES.
There are some challenges with the new hipco installation. For example, the libhipcxx CMake config gets installed in /opt/rocm/lib/rapids/cmake instead of /opt/rocm/lib/cmake.
…and explicit mention that this is a fork.
… install graphbolt dependencies
Co-authored-by: Geoffrey Martin-Noble <GMNGeoffrey@gmail.com>
[Tests] Unskip tests that work on rocm
Allow setting CC and CXX for sub-project builds
[DOC] Update readme with PyPI links
[Feature] Bump hipCollections version
[Feature] Updating dgl ROCm support
… as a separate file
adding changes to mark rocm/dgl deprecated, retaining original readme…
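For readers unfamiliar with the 8-bit atomics constraint mentioned in the test_hetero_cached_feature fix above: a common workaround on targets without byte-wide atomics is to run a compare-and-swap on the enclosing 32-bit word and touch only the bits of the byte of interest. The sketch below is a host-side C++ illustration of that general technique, not the code in this PR; the helper name atomic_exchange_byte is hypothetical, and a device-side version would presumably use the equivalent 32-bit HIP atomic instead of std::atomic.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from this PR): atomically replace one byte inside
// a 32-bit word using only a 32-bit compare-and-swap.
// `byte_index` selects bits [8*byte_index, 8*byte_index + 7] of the word value.
// Returns the byte that was stored there before the exchange.
inline std::uint8_t atomic_exchange_byte(std::atomic<std::uint32_t>& word,
                                         unsigned byte_index,
                                         std::uint8_t value) {
  assert(byte_index < 4);
  const unsigned shift = byte_index * 8u;
  const std::uint32_t mask = std::uint32_t{0xFF} << shift;
  const std::uint32_t bits = static_cast<std::uint32_t>(value) << shift;

  std::uint32_t observed = word.load(std::memory_order_relaxed);
  // On failure, compare_exchange_weak refreshes `observed` with the current
  // word, so each retry recomputes the desired value from fresh data and
  // leaves the three neighbouring bytes untouched.
  while (!word.compare_exchange_weak(observed, (observed & ~mask) | bits)) {
  }
  return static_cast<std::uint8_t>((observed & mask) >> shift);
}
```

The trade-off of this approach is extra contention: writers to different bytes of the same word now compete on one CAS, which may matter for hot counters.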
Summary
This pull request contributes the AMD/ROCm integration work from the ROCm/dgl fork back to upstream DGL.
Source
Branch amd-integration on ROCm/dgl, created from develop.
Test plan
Made with Cursor