
Conversation

@kanghui0204 (Collaborator)

Last PR: #3851
Last revert PR: #4340

@kanghui0204 kanghui0204 requested review from QiJune and hyukn May 15, 2025 02:52
@hyukn (Collaborator) commented May 15, 2025

/bot run --stage-list "DGX_H100-4_GPUs-PyTorch-Others-1"

@tensorrt-cicd (Collaborator)

PR_Github #5257 [ run ] triggered by Bot

@hyukn (Collaborator) commented May 15, 2025

/bot kill

@tensorrt-cicd (Collaborator)

PR_Github #5313 [ kill ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #5257 [ run ] completed with state ABORTED

@tensorrt-cicd (Collaborator)

PR_Github #5313 [ kill ] completed with state SUCCESS
Successfully killed previous jobs for commit 8614f2c

@kanghui0204 kanghui0204 force-pushed the low_precision_allreduce_for_pcie branch from 8614f2c to fb827d8 Compare May 15, 2025 16:01
@hyukn (Collaborator) commented May 16, 2025

/bot run --stage-list "DGX_H100-4_GPUs-PyTorch-Others-1"

@tensorrt-cicd (Collaborator)

PR_Github #5420 [ run ] triggered by Bot

@kanghui0204 kanghui0204 force-pushed the low_precision_allreduce_for_pcie branch from fb827d8 to b3e1189 Compare May 16, 2025 02:17
@EmmaQiaoCh (Collaborator)

/bot run --stage-list "DGX_H100-4_GPUs-PyTorch-Others-1"

@tensorrt-cicd (Collaborator)

PR_Github #5420 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3956 (Partly Tested) completed with status: 'SUCCESS'

@tensorrt-cicd (Collaborator)

PR_Github #5437 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #5437 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3968 (Partly Tested) completed with status: 'SUCCESS'

@kanghui0204 kanghui0204 force-pushed the low_precision_allreduce_for_pcie branch from b3e1189 to 60360e1 Compare May 16, 2025 05:45
@hyukn (Collaborator) commented May 16, 2025

/bot run --disable-fail-fast --add-multi-gpu-test

@tensorrt-cicd (Collaborator)

PR_Github #5460 [ run ] triggered by Bot

@yizhang-nv (Member)

Does this set of kernels support CUDA graphs? If the barrier flag is captured, graph replay may cause issues, since we depend on the value of the barrier flag and the comm buffer to ensure that every GPU reaches the same barrier.

For the captured value, if the model has an odd number of allreduce ops, it may select the same peer comm buffer here:
https://github.com/NVIDIA/TensorRT-LLM/pull/4344/files#diff-fd189077a08106939fbdaf23180ba0bc7d81d76279c632db9097381e8440b2c9R1422
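The hazard described in the comment above can be sketched with a toy simulation (plain Python, not the actual TensorRT-LLM kernel code; the two-buffer ping-pong scheme, `flag % 2` buffer selection, and the function names are assumptions for illustration). With an eagerly incremented flag, consecutive allreduce ops alternate comm buffers; if the flag values are frozen at CUDA graph capture time and the graph contains an odd number of allreduce ops, the last op of one replay and the first op of the next replay land on the same peer comm buffer.

```python
# Toy model (NOT the actual kernel logic): buffer index = flag % NUM_BUFFERS.
NUM_BUFFERS = 2  # assumed ping-pong comm-buffer scheme


def eager_buffers(start_flag, num_ops, num_iters):
    """Flag increments live on every call: buffers keep alternating."""
    flag = start_flag
    seq = []
    for _ in range(num_iters):
        for _ in range(num_ops):
            seq.append(flag % NUM_BUFFERS)
            flag += 1
    return seq


def captured_buffers(start_flag, num_ops, num_iters):
    """Flag values frozen at capture: every replay reuses the same sequence."""
    captured = [(start_flag + i) % NUM_BUFFERS for i in range(num_ops)]
    return captured * num_iters


def has_back_to_back_reuse(seq):
    """Two consecutive ops on the same buffer = potential data hazard."""
    return any(a == b for a, b in zip(seq, seq[1:]))


ops_per_graph = 3  # odd number of allreduce ops per captured graph

print("eager   :", eager_buffers(0, ops_per_graph, 2))     # [0, 1, 0, 1, 0, 1]
print("captured:", captured_buffers(0, ops_per_graph, 2))  # [0, 1, 0, 0, 1, 0]
# With an odd op count, replay 2 starts on the buffer replay 1 just finished
# with, so a peer may still be reading it.
```

With an even number of allreduce ops per graph, the frozen sequence happens to keep alternating across replays, which is consistent with the observation that the problem appears specifically for an odd op count.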

@kanghui0204 (Collaborator, Author)

> Does this set of kernels support CUDA graphs? If the barrier flag is captured, graph replay may cause issues, since we depend on the value of the barrier flag and the comm buffer to ensure that every GPU reaches the same barrier.
>
> For the captured value, if the model has an odd number of allreduce ops, it may select the same peer comm buffer here: https://github.com/NVIDIA/TensorRT-LLM/pull/4344/files#diff-fd189077a08106939fbdaf23180ba0bc7d81d76279c632db9097381e8440b2c9R1422

Do all our current kernels need to support CUDA graphs? I haven't tested these kernels on CUDA graphs.

@hyukn (Collaborator) commented May 16, 2025

/bot run --add-multi-gpu-test

@tensorrt-cicd (Collaborator)

PR_Github #5503 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #5460 [ run ] completed with state ABORTED

@tensorrt-cicd (Collaborator)

PR_Github #5503 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #4010 completed with status: 'FAILURE'

@kanghui0204 kanghui0204 force-pushed the low_precision_allreduce_for_pcie branch from 22e54f5 to 6ed9c56 Compare May 17, 2025 09:29
@kanghui0204 (Collaborator, Author)

/bot run --add-multi-gpu-test

@EmmaQiaoCh (Collaborator)

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #5596 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #5596 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #4082 completed with status: 'FAILURE'

@kanghui0204 kanghui0204 force-pushed the low_precision_allreduce_for_pcie branch 2 times, most recently from bd92a55 to 7097900 Compare May 19, 2025 00:22
@hyukn (Collaborator) commented May 19, 2025

/bot run --add-multi-gpu-test

@tensorrt-cicd (Collaborator)

PR_Github #5644 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #5644 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #4124 completed with status: 'FAILURE'

@kanghui0204 kanghui0204 force-pushed the low_precision_allreduce_for_pcie branch from 7097900 to ec197d9 Compare May 19, 2025 04:21
@hyukn (Collaborator) commented May 19, 2025

/bot run --add-multi-gpu-test

@tensorrt-cicd (Collaborator)

PR_Github #5673 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #5673 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #4144 completed with status: 'SUCCESS'

@hyukn (Collaborator) commented May 19, 2025

Pipeline passed. Merging this PR.

@hyukn hyukn merged commit 6f3922f into NVIDIA:main May 19, 2025
3 checks passed