[release/2.2] [ROCm] Correct numerical issues in layer norm backwards kernel (#140259) #1767

jataylo · 2024-12-04T14:37:49Z

It was reported that the layer norm backward pass on AMD was slightly less accurate than the equivalent NVIDIA implementation.

On AMD, we call into a helper kernel, `cuLoadWriteStridedInputs`, which processes strided input and accumulates the partial gradients into shared memory.

In this kernel (pytorch#87635), we truncated `mean` and `rstd` from the accumulator type `T_ACC` to the data type `T`, which causes numerical errors in the warp buffers created in this kernel. This PR uses the correct accumulator type (`T_ACC`) for `mean` and `rstd`.
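To illustrate why the truncation matters, here is a minimal host-side Python sketch (not the HIP kernel itself). It assumes `T` is half precision and `T_ACC` is `float`, emulates both with the standard `struct` module's `e` and `f` formats, and measures the error in the normalized term `(x - mean) * rstd`, which the layer norm backward pass uses when accumulating partial gradients for the weight. The helper names (`to_half`, `to_float`, `row_stats`, `max_xhat_error`) and the input row are hypothetical.

```python
import math
import struct


def to_half(v: float) -> float:
    """Round to IEEE 754 binary16, standing in for T (e.g. at::Half)."""
    return struct.unpack("<e", struct.pack("<e", v))[0]


def to_float(v: float) -> float:
    """Round to IEEE 754 binary32, standing in for T_ACC (float)."""
    return struct.unpack("<f", struct.pack("<f", v))[0]


def row_stats(xs):
    """Reference mean and rstd of one row, computed in double precision."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return mean, 1.0 / math.sqrt(var + 1e-5)


def max_xhat_error(xs, mean, rstd, cast):
    """Worst absolute error in (x - mean) * rstd when mean/rstd are rounded by `cast`.

    The product is rounded to binary32 to approximate computing it in T_ACC.
    """
    m, r = cast(mean), cast(rstd)
    worst = 0.0
    for x in xs:
        approx = to_float((x - m) * r)
        exact = (x - mean) * rstd
        worst = max(worst, abs(approx - exact))
    return worst


# Hypothetical row: values exactly representable in half precision, with a
# mean that is large relative to their spread (where rounding the mean hurts).
xs = [100.0 + 0.0625 * (i % 7) for i in range(64)]
mean, rstd = row_stats(xs)

err_acc = max_xhat_error(xs, mean, rstd, to_float)  # mean/rstd kept as T_ACC
err_t = max_xhat_error(xs, mean, rstd, to_half)     # mean/rstd truncated to T
print(f"max error, T_ACC stats: {err_acc:.3e}")
print(f"max error, T stats:     {err_t:.3e}")
```

When the mean is large relative to the row's spread, rounding `mean` to half precision shifts every normalized value by roughly `(mean - to_half(mean)) * rstd`, while keeping the statistics in `float` leaves only binary32-level rounding error.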

Note: Only AMD calls into this call stack for the layer norm backward pass, so this was not an issue on NVIDIA.

Pull Request resolved: pytorch#140259
Approved by: https://github.com/jianyuh

(cherry picked from commit 001f736)

Fixes #ISSUE_NUMBER


rocm-repo-management-api · 2024-12-04T19:30:17Z

Jenkins build for 5ef76a334d756b4e912e48c71d11ba4afb2e2887 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

jithunnair-amd · 2024-12-06T17:33:21Z

cherry-pick --onto release/2.1


rocm-mici · 2024-12-06T17:35:00Z

Created branch release/2.1_cherry-pick_pr-1767 and #1777

inemankov · 2024-12-13T18:32:18Z

!cherry-pick --onto release/2.3 release/2.4 release/2.5

rocm-mici · 2024-12-13T18:40:01Z

Can't perform the cherry-pick keyword: unexpected error

inemankov · 2024-12-13T18:41:35Z

!cherry-pick --onto release/2.3 release/2.4 release/2.5


rocm-mici · 2024-12-13T18:44:27Z

Nothing to cherry-pick onto the release/2.3 branch

Nothing to cherry-pick onto the release/2.4 branch

Created branch release/2.5_cherry-pick_pr-1767 and #1794

jataylo requested review from pruthvistony and jithunnair-amd December 4, 2024 14:37

jataylo changed the title ~~[ROCm] Correct numerical issues in layer norm backwards kernel (#140259)~~ [release/2.2] [ROCm] Correct numerical issues in layer norm backwards kernel (#140259) Dec 4, 2024

pruthvistony approved these changes Dec 6, 2024


pruthvistony merged commit ce8fba1 into release/2.2 Dec 6, 2024
3 of 5 checks passed

pruthvistony deleted the rel22-picks-jack branch December 6, 2024 05:58

rocm-mici mentioned this pull request Dec 6, 2024

[AUTOGENERATED] [release/2.1] Cherry-pick PR-1767 #1777

Closed

rocm-mici mentioned this pull request Dec 13, 2024

[AUTOGENERATED] [release/2.5] Cherry-pick PR-1767 #1794

Draft
