
Conversation

@starcrown001
Contributor

PR Category

Operator Mechanism

PR Types

New features

Description

Add Blockmask Support to Flashmask

  • This update enables simultaneous use of the blockmask and flashmask masking methods; the two masks are combined with a logical OR, so any block masked by blockmask, or fully masked by flashmask, is excluded from computation (a sketch follows this list).
  • Blockmask and flashmask are not yet fully decoupled: behavior is unchanged when flashmask is used alone, but when blockmask is used alone, a flashmask tensor (which does not affect the results) must still be provided.
  • At present, only headdim=128 with blocksize=128 is supported.
  • Compared with the original blockmask implementation (https://github.com/mit-han-lab/Block-Sparse-Attention), this version achieves a 75%–150% improvement in forward performance and a 50%–90% improvement in backward performance on H800.
  • Comprehensive regression testing against the original flashmask operator covered both accuracy and performance; the impact on the original flashmask is negligible.
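
A rough NumPy sketch of the block-level combination rule above (the helper name, shapes, and the dense flashmask representation are hypothetical and only mirror the description; this is not the kernel code):

import numpy as np

BLOCK = 128  # the only supported block size at present

def block_skip_map(block_mask, dense_flash_mask):
    # block_mask:       (num_q_blocks, num_k_blocks) bool, True = block excluded by blockmask
    # dense_flash_mask: (seqlen_q, seqlen_k) bool, True = element masked by flashmask
    nq, nk = block_mask.shape
    # A block is fully masked by flashmask only if every element inside it is masked.
    flash_blocks = dense_flash_mask.reshape(nq, BLOCK, nk, BLOCK).all(axis=(1, 3))
    # Logical OR: skip a block if blockmask masks it, or flashmask fully masks it.
    return block_mask | flash_blocks

# Example with 2x2 blocks: flashmask fully masks block (0, 1), blockmask masks block (1, 0).
dense = np.zeros((2 * BLOCK, 2 * BLOCK), dtype=bool)
dense[:BLOCK, BLOCK:] = True
blocks = np.zeros((2, 2), dtype=bool)
blocks[1, 0] = True
print(block_skip_map(blocks, dense))  # blocks (0, 1) and (1, 0) are skipped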

@paddle-bot

paddle-bot bot commented Nov 13, 2025

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI first. See the Paddle CI Manual for details.

paddle-bot added the contributor (External developers) label on Nov 13, 2025
@CLAassistant

CLAassistant commented Nov 13, 2025

CLA assistant check
All committers have signed the CLA.

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


xiehaoyang does not seem to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@starcrown001
Contributor Author

/re-run all-failed

@codecov-commenter

codecov-commenter commented Nov 13, 2025

Codecov Report

❌ Patch coverage is 0% with 11 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@2d9355a). Learn more about missing BASE report.

Files with missing lines                          Patch %   Lines
python/paddle/nn/functional/flash_attention.py    0.00%     11 Missing ⚠️

❌ Your patch status has failed because the patch coverage (0.00%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             develop   #76407   +/-   ##
==========================================
  Coverage           ?    0.00%           
==========================================
  Files              ?        1           
  Lines              ?       11           
  Branches           ?        0           
==========================================
  Hits               ?        0           
  Misses             ?       11           
  Partials           ?        0           

☔ View full report in Codecov by Sentry.

if (softmax_lse_log2) {
dev_ctx.template Alloc<float>(softmax_lse_log2);
}
// std::cout << "dq_accum:" << dq_accum->dims() << std::endl;
Member

These unneeded comments can be removed.

Contributor Author

OK, I'll remove them and then push again.

dq_accum->Resize(common::make_ddim(
{num_heads, total_q_padded_rounded * head_size_rounded}));
}
// std::cout << "enter:" << dq_accum->dims() << std::endl;
Member

These unneeded comments can be removed.

dynload::flashmaskv2_bwd_params_set_n_block_dim(params_handle, 128);
dynload::flashmaskv2_bwd_params_set_block_mask_ptr(
params_handle,
const_cast<int32_t *>(block_mask_indices.data<int32_t>()));
Member

Is this const_cast necessary? Looking at the flash-attention repo, block_mask_ptr isn't actually a const pointer either?

Contributor Author

@starcrown001 commented Nov 14, 2025

This part was written following the earlier start_row_indices code. The pointer obtained here via .data<int32_t>() is not a const int32_t * pointer, so the const_cast to strip const shouldn't really be needed. Should the earlier start_row_indices pointers be handled the same way?

Member

I took a look at the code; it seems returning a non-const pointer is already supported:
https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/phi/core/dense_tensor.h#L215

const_cast<int32_t *>(block_mask_indices.data<int32_t>()));
// phi::funcs::SetConstant<Context, T> set_dq_zero;
// // dev_ctx.template Alloc<T>(dq);
// set_dq_zero(dev_ctx, dq, T{0});
Member

These unneeded comments can be removed.

dynload::flashmaskv2_fwd_params_set_n_block_dim(params_handle, 128);
dynload::flashmaskv2_fwd_params_set_block_mask_ptr(
params_handle,
const_cast<int32_t *>(block_mask_indices.data<int32_t>()));
Member

Can this const_cast be removed?


assert key.shape[3] == 128, (
"headdim must be 128 when using block_mask_attn"
)
Member

The block_mask feature currently only supports FA3 and does not support deterministic mode. Should asserts for the FA version and deterministic be added as well?

Contributor Author

Yes, those asserts are indeed needed; I'll add them.
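
For reference, a rough sketch of the extra checks being discussed, in the same style as the existing headdim assert; fa_version and deterministic are placeholder names, not the actual arguments in flash_attention.py:

def _check_block_mask_args(key, fa_version, deterministic):
    # Placeholder guard for the block_mask path; names are illustrative only.
    assert key.shape[3] == 128, "headdim must be 128 when using block_mask_attn"
    assert fa_version == 3, "block_mask_attn currently only supports FlashAttention-3 (fa3)"
    assert not deterministic, "block_mask_attn does not support deterministic mode"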

@starcrown001
Contributor Author

/re-run all-failed

5 similar comments
@starcrown001
Contributor Author

/re-run all-failed

@umiswing
Member

/re-run all-failed

@umiswing
Member

/re-run all-failed

@umiswing
Member

/re-run all-failed

@starcrown001
Contributor Author

/re-run all-failed

yuanlehome previously approved these changes Nov 18, 2025
Contributor

@yuanlehome left a comment

for op yaml

training (bool): Whether the module is in training mode. Default is True.
name (str, optional): Name of the operation. Default is None. Normally, users do not need to set this property.
For more information, refer to :ref:`api_guide_Name` .
block_mask_indices (tensor, optional):
Member

Is there an extra line here?

@starcrown001
Contributor Author

/re-run all-failed

add test

fix description

del test code
dynload::flashmaskv2_bwd_params_set_n_block_dim(params_handle, 128);
dynload::flashmaskv2_bwd_params_set_block_mask_ptr(
params_handle, (block_mask.data<int32_t>()));
auto ptr = block_mask.data<int32_t>();
Member

Where is this ptr used?

phi::float16,
phi::bfloat16) {}
phi::bfloat16) {
kernel->InputAt(4).SetBackend(phi::Backend::ALL_BACKEND); // block_mask
Member

What is the purpose of this setting?

Contributor

@yuanlehome left a comment

for op yaml

@GuoxiaWang GuoxiaWang merged commit 43c370f into PaddlePaddle:develop Nov 20, 2025
116 of 129 checks passed
fsylmxx pushed a commit to fsylmxx/Paddle that referenced this pull request Nov 20, 2025
* add blockmask

add test

fix description

del test code

* delete unused ptr