fp8 backward #119
base: main_perf
Conversation
Force-pushed from 6b691eb to 297742b
I'm approving the PR because I can't see anything wrong with it. I just left some questions and cleanup suggestions.
@@ -43,6 +43,9 @@ python setup.py install
pytest tests/test_flash_attn_triton_amd.py
```

##### FP8
In our fork, we have modified the API to work with fp8. You provide tensors that are scaled to be in fp8 range and their associated descaling factors.
I think "scaled fp8 tensors" reads better than "tensors that are scaled to be in fp8 range"; the current wording could be read as an arbitrary data type scaled into the fp8 range.
Do you think it's worth mentioning the descaling factors' type in this README?
I am going to add more info to the README. This was just the start.
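For context, a minimal sketch of the scaling side of the API as described above; the helper name `scale_to_fp8` and the per-tensor granularity are assumptions (the kernel code below indexes descale factors per batch and head):

```python
import torch

def scale_to_fp8(t: torch.Tensor, fp8_dtype=torch.float8_e4m3fnuz):
    # Map the tensor's max |value| onto the fp8 dtype's largest representable value.
    fp8_max = torch.finfo(fp8_dtype).max
    amax = t.abs().amax().clamp(min=1e-12)
    t_fp8 = (t.float() * (fp8_max / amax)).to(fp8_dtype)
    descale = (amax / fp8_max).float()  # multiply by this to recover fp32 magnitudes
    return t_fp8, descale

# q_fp8, descale_q = scale_to_fp8(q)  # likewise for k, v and, for the backward, do;
# the fp8 tensors and their descale factors are then passed to the attention call.
```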
descale_do = tl.load(DESCALE_do + off_z * stride_descale_q_z + off_h)

# do is scaled in the fp8 range and o is in fp8 but should be the same scale as fp32
# TODO: descale do so that we can use it as fp32
I think this TODO comment is outdated since the descaling is already done.
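For readers of the thread, the resolved pattern presumably looks something like this (a sketch; `DO_ptrs` is a hypothetical pointer block, `descale_do` is the factor loaded above):

```python
do = tl.load(DO_ptrs)                     # values stored in the fp8 range
do_fp32 = do.to(tl.float32) * descale_do  # descale back to fp32 magnitudes
```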
ds = dscores_scaled * sm_scale
ds = tl.where(p_mask, ds, 0.0)

# print("p:", p)
Can we clean up these print statements before merging?
# print("delta_i:", delta_i) | ||
# print("ds:", ds) # NOTE:is almost the same between fp8 and fp16 | ||
descale_ds = descale_p | ||
ds_fp8_scaled = ds * (1.0/ descale_ds) |
We only need to compute ds_fp8_scaled for the fp8 kernel; otherwise it's unused. What do you think of computing ds_fp8_scaled inside an if IS_FP8: statement?
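Something like the following, assuming `IS_FP8` is (or can be made) a `tl.constexpr` kernel parameter so the dead branch is compiled out of the fp16 kernel (the commit notes mention constexpr flags being used elsewhere):

```python
if IS_FP8:
    # Only the fp8 path consumes ds_fp8_scaled, so keep it behind the flag.
    descale_ds = descale_p
    ds_fp8_scaled = ds * (1.0 / descale_ds)
    # ... fp8-specific dot products would use ds_fp8_scaled here ...
```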
@@ -553,6 +636,14 @@ def attention_prefill_backward_triton_impl(
print("use_exp2:", use_exp2)
print("sequence_parallel:", sequence_parallel)

is_fp8 = arch_supports_fp8() and q.dtype in {torch.float8_e4m3fnuz, torch.float8_e4m3fn, torch.float8_e5m2, torch.float8_e5m2fnuz}
if is_fp8:
I think this empty if is_fp8: statement can be removed. Do we need to print or debug anything inside it?
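As a related cleanup, the dtype check could be factored into a small helper (a sketch; `arch_supports_fp8` is the function already used in the diff above, while the helper name is hypothetical):

```python
import torch

FP8_DTYPES = {torch.float8_e4m3fnuz, torch.float8_e4m3fn,
              torch.float8_e5m2, torch.float8_e5m2fnuz}

def input_is_fp8(q: torch.Tensor) -> bool:
    # Check hardware support first so non-fp8 GPUs fall back cleanly.
    return arch_supports_fp8() and q.dtype in FP8_DTYPES
```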
Squashed commit message: this is a combination of 9 commits; "Enable BWD fp8" was itself a combination of 12 commits:
- Enable BWD fp8
- add backward test case
- save
- clean up
- disable ci
- lse is good
- dv matches
- reduce diff
- use do fp8 for dv
- kinda working
- group size is a constexpr
- clean up a bit
- everything except mqa/gqa works
- skip mqa cases
- 20 cases have nan on dropout
- save what you have
- disable tests failing
- enable tests
- per block descale_p and descale_ds use max(abs())
- clean up tests a bit more
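The "per block descale_p and descale_ds use max(abs())" item likely corresponds to a pattern like this inside the kernel (a sketch; the `FP8_MAX` constant and the epsilon guard are assumptions):

```python
FP8_MAX = 448.0                                # e4m3 max representable value (assumption)
p_amax = tl.maximum(tl.max(tl.abs(p)), 1e-12)  # per-block absolute max
descale_p = p_amax / FP8_MAX                   # carried alongside the scaled block
p_fp8_scaled = p * (FP8_MAX / p_amax)
```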
Force-pushed from 7d0277c to 4cd9e2a
add fp8 backward