Does the Flex attention API accept a customized attention mask? #112

Open
jiagaoxiang opened this issue Feb 6, 2025 · 1 comment

jiagaoxiang commented Feb 6, 2025

Hi, after reading through the documentation, I am quite confused about the score_mod and block_mask arguments of the flex attention API. They seem to be callables. I am wondering: is there a way I can provide a customized attention mask to the flex attention API, just like the attn_mask argument in torch.nn.functional.scaled_dot_product_attention? The reason I am asking is that my attention mask is very irregular (as in the attached image, where white squares are masked positions); it is used in the encoder self-attention of the Llama 3.2 vision models (i.e., the aspect ratio mask).

Thank you!

[Image: irregular attention mask pattern; white squares are masked positions]
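
For reference, a minimal sketch of the dense-mask setup described above, assuming a boolean attn_mask where True means "attend"; the shapes and names are illustrative placeholders, not taken from the issue:

import torch
import torch.nn.functional as F

B, H, S_LEN, D = 2, 8, 1024, 64
q = torch.randn(B, H, S_LEN, D)
k = torch.randn(B, H, S_LEN, D)
v = torch.randn(B, H, S_LEN, D)

# Irregular [S_LEN, S_LEN] boolean mask (e.g. an aspect ratio mask):
# True = attend, False = masked out.
mask = torch.ones(S_LEN, S_LEN, dtype=torch.bool)

out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)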

Chillee (Contributor) commented Feb 7, 2025

If you really want to use a pre-existing attention mask, you can just write a mask_mod like:

from torch.nn.attention.flex_attention import create_block_mask

def mask_mod(b, h, q_idx, kv_idx):
    # mask is the precomputed [S_LEN, S_LEN] boolean tensor; True = attend, False = masked out.
    return mask[q_idx, kv_idx]
block_mask = create_block_mask(mask_mod, None, None, S_LEN, S_LEN)
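
The resulting block_mask is then passed to flex_attention; a hedged usage sketch, where q, k, v are assumed to be [B, H, S_LEN, head_dim] tensors (placeholders, not from the comment):

import torch
from torch.nn.attention.flex_attention import flex_attention

# Compiling flex_attention is the usual way to get good performance.
flex_attention = torch.compile(flex_attention)
out = flex_attention(q, k, v, block_mask=block_mask)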

But generally speaking, it'll be more efficient (and use less memory) to encode the behavior of your mask directly within mask_mod, rather than materializing a dense mask tensor first.
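
For example, here is a hedged sketch of encoding an aspect-ratio-style mask directly in mask_mod, assuming the valid (non-padded) tile tokens of each image sit at the start of the sequence and their count is known; num_valid_tokens and the example values are hypothetical, not from the comment:

import torch
from torch.nn.attention.flex_attention import create_block_mask

# Hypothetical per-image count of valid (non-padded) tokens, shape [B].
# On a GPU run, this tensor should live on the same device create_block_mask targets.
num_valid_tokens = torch.tensor([900, 512])
B, S_LEN = num_valid_tokens.shape[0], 1024

def aspect_ratio_mask_mod(b, h, q_idx, kv_idx):
    # Attend only among positions that fall inside valid (non-padded) tiles of image b.
    return (q_idx < num_valid_tokens[b]) & (kv_idx < num_valid_tokens[b])

block_mask = create_block_mask(aspect_ratio_mask_mod, B, None, S_LEN, S_LEN)

This avoids ever building the dense [S_LEN, S_LEN] mask in memory.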
