feat: Low Precision Allreduce for PCIe based GPU #3851
Conversation
Force-pushed from 86140b5 to 4613735
@dongxuy04 @yuxianq Hi Dongxu, Yuxian, could you help review this PR? Thanks
Force-pushed from 3ab2d21 to ae1d6d4
Force-pushed from 38ddfa1 to dd8ee0d
Force-pushed from c0f6e3a to 09bcfdc
Force-pushed from a4de314 to 7816b41
/bot run --disable-fail-fast --add-multi-gpu-test
Just some nits. LGTM.
Force-pushed from 7816b41 to 222c4a9
/bot run --disable-fail-fast --add-multi-gpu-test
PR_Github #4678 [ run ] triggered by Bot
Force-pushed from 60d0eec to 9a1e741
/bot run --stage-list "H100_PCIe-PyTorch-1"
PR_Github #4929 [ run ] triggered by Bot
/bot run --disable-fail-fast --add-multi-gpu-test
PR_Github #4939 [ run ] triggered by Bot
PR_Github #4929 [ run ] completed with state
PR_Github #4939 [ run ] completed with state
/bot run --disable-fail-fast --only-multi-gpu-test
PR_Github #4980 [ run ] triggered by Bot
PR_Github #4980 [ run ] completed with state
Force-pushed from 9a1e741 to 4f2c648
/bot run --disable-fail-fast --add-multi-gpu-test
PR_Github #5072 [ run ] triggered by Bot
PR_Github #5072 [ run ] completed with state
/bot run --stage-list "B200_PCIe-PackageSanityCheck-DLFW"
PR_Github #5112 [ run ] triggered by Bot
PR_Github #5112 [ run ] completed with state
Signed-off-by: Hui Kang <[email protected]>
Force-pushed from 4f2c648 to 8614f2c
/bot reuse-pipeline
PR_Github #5150 [ reuse-pipeline ] triggered by Bot
PR_Github #5150 [ reuse-pipeline ] completed with state
Hi @kanghui0204 @hyukn, we found that multi-GPU test pipelines are not triggered as expected in this PR due to
Low Precision Allreduce for PCIe-based GPUs
[feat] Support low-precision allreduce for PCIe-based GPUs
Description
This PR adds a customized allreduce to TensorRT-LLM. The new allreduce performs communication between PCIe-based GPUs using low-precision quantization, which reduces the amount of data transferred over PCIe and thus accelerates the allreduce.
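For orientation only, the sketch below illustrates the general "quantize, communicate in int8, dequantize, reduce" idea behind a low-precision allreduce, using an all-gather based scheme over `torch.distributed`. It is not the PR's implementation (the actual kernel is a custom allreduce inside TensorRT-LLM); the function name and the symmetric per-rank int8 quantization are assumptions made for illustration.

```python
# Conceptual sketch of a low-precision allreduce (NOT the PR's kernel).
# Assumes torch.distributed is already initialized (e.g. NCCL backend).
import torch
import torch.distributed as dist


def lowprecision_allreduce(x: torch.Tensor) -> torch.Tensor:
    """All-gather based allreduce whose communication is int8-quantized."""
    world_size = dist.get_world_size()

    # Symmetric int8 quantization of the local tensor with one scale per rank.
    scale = (x.abs().max().clamp(min=1e-8) / 127.0).reshape(1)
    q = torch.clamp((x / scale).round(), -127, 127).to(torch.int8)

    # Exchange the int8 payloads plus the tiny scales; the int8 traffic is
    # 4x smaller than fp32, which is where the PCIe bandwidth saving comes from.
    q_list = [torch.empty_like(q) for _ in range(world_size)]
    s_list = [torch.empty_like(scale) for _ in range(world_size)]
    dist.all_gather(q_list, q)
    dist.all_gather(s_list, scale)

    # Dequantize every rank's contribution and reduce locally.
    out = torch.zeros_like(x)
    for q_i, s_i in zip(q_list, s_list):
        out += q_i.to(x.dtype) * s_i
    return out
```

The quantization introduces a bounded error per rank, so the result is an approximation of the full-precision allreduce; a production kernel would typically use a reduce-scatter/all-gather pattern rather than a plain all-gather to keep per-rank traffic constant.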
Test Coverage
I added a unit test for this feature:
tests/unittest/_torch/multi_gpu/test_lowprecision_allreduce.py
. I hope this test can run on L40 or L20 nodes; otherwise, the low-precision allreduce will fall back to NCCL, making the test meaningless and causing it to fail.
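As a rough illustration of the kind of check such a multi-GPU test can perform (not the contents of test_lowprecision_allreduce.py), the sketch below spawns one process per GPU, compares the quantized-communication allreduce from the sketch above against a plain fp32 NCCL allreduce, and allows a tolerance that absorbs the quantization error. All names, ports, and tolerances here are assumptions.

```python
# Hypothetical test sketch; `lowprecision_allreduce` refers to the sketch above.
import os

import pytest
import torch
import torch.distributed as dist
import torch.multiprocessing as mp


def _worker(rank: int, world_size: int):
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29511"
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    x = torch.randn(1 << 16, device="cuda")
    ref = x.clone()
    dist.all_reduce(ref)                     # full-precision reference
    out = lowprecision_allreduce(x)          # quantized-communication path
    # Loose tolerance: each rank contributes at most ~scale/2 of rounding error.
    assert torch.allclose(out, ref, atol=5e-2 * world_size)
    dist.destroy_process_group()


@pytest.mark.skipif(torch.cuda.device_count() < 2, reason="needs >= 2 GPUs")
def test_lowprecision_allreduce_matches_nccl():
    world_size = torch.cuda.device_count()
    mp.spawn(_worker, args=(world_size,), nprocs=world_size, join=True)
```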