[AUTOGENERATED] [release/2.1] Cherry-pick PR-1767 #1777

rocm-mici · 2024-12-06T17:34:59Z

Cherry-pick of #1767

… kernel (pytorch#140259) (#1767)

It was reported that the backward pass of layer norm on AMD was slightly less accurate than the equivalent NVIDIA implementation. On AMD we call into a helper kernel, `cuLoadWriteStridedInputs`, which processes strided input and accumulates the partial gradients into shared memory. In this kernel (pytorch#87635) we truncated `mean` and `rstd` from the T_ACC type to T, which causes numerical issues in the warp buffers created in this kernel. This PR uses the correct accumulator type for `mean` and `rstd`.

Note: Only AMD calls into this call stack for backward layer norm, so this was not an issue for NVIDIA.

Pull Request resolved: pytorch#140259
Approved by: https://github.com/jianyuh

(cherry picked from commit 001f736)

Fixes #ISSUE_NUMBER
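To illustrate the kind of error described in the commit message above, here is a minimal host-side C++ sketch. It is not the kernel's code. It assumes T is a 16-bit type such as bfloat16 (emulated here by rounding a `float`), that T_ACC is `float`, and that each accumulated term has the common layer norm form `dy * (x - mean) * rstd`. The names `round_to_bf16`, `grad_term`, `grad_term_truncated`, and `check_truncation_example` are hypothetical helpers introduced only for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the narrow storage type T: round a float to the
// nearest bfloat16 value (round-to-nearest-even) and widen it back to float.
// NaN and infinity are not handled specially in this sketch.
inline float round_to_bf16(float x) {
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof(bits));
  bits += 0x7FFFu + ((bits >> 16) & 1u);  // rounding bias on the 16 dropped bits
  bits &= 0xFFFF0000u;                    // keep sign, exponent, 7 mantissa bits
  float out;
  std::memcpy(&out, &bits, sizeof(out));
  return out;
}

// One accumulated term, with mean and rstd kept in the accumulator type
// (assumed to be float here).
inline float grad_term(float dy, float x, float mean, float rstd) {
  return dy * (x - mean) * rstd;
}

// The same term after mean and rstd have been narrowed to the emulated T,
// which is the kind of truncation the commit message describes.
inline float grad_term_truncated(float dy, float x, float mean, float rstd) {
  return grad_term(dy, x, round_to_bf16(mean), round_to_bf16(rstd));
}

// Small self-check: an input equal to the mean should contribute nothing,
// but the narrowed mean shifts (x - mean) away from zero.
inline void check_truncation_example() {
  const float mean = 1.001f;  // not representable in bfloat16
  const float x = mean;
  assert(grad_term(1.0f, x, mean, 1.0f) == 0.0f);
  assert(grad_term_truncated(1.0f, x, mean, 1.0f) != 0.0f);
}
```

With `mean = 1.001f`, the emulated bfloat16 value is `1.0f`, so every `(x - mean)` factor picks up an offset of about 0.001. Summed over many elements in a shared-memory buffer, offsets like this are one plausible way such a truncation could show up as an accuracy gap.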

rocm-repo-management-api · 2024-12-06T17:51:07Z

Jenkins build for 895c9dbceb93651cb00293029e2c824284ebf091 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

rocm-mici mentioned this pull request Dec 6, 2024

[release/2.2] [ROCm] Correct numerical issues in layer norm backwards kernel (#140259) #1767

Merged

jithunnair-amd closed this Dec 10, 2024
