[CI Failure] Fix backend selection for encoder-only models #28534
Conversation
Force-pushed from c3c5c39 to 94f2d04
MatthewBonanni left a comment:
Down the road I'd like to make AttentionType an enum, but this LGTM!
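For reference, a minimal sketch of what that enum might look like (hypothetical; vLLM currently defines `AttentionType` as a class of string constants rather than an enum):

```python
import enum

class AttentionType(str, enum.Enum):
    """Possible enum form of vLLM's AttentionType string constants."""
    DECODER = "decoder"                   # decoder-only self-attention
    ENCODER = "encoder"                   # encoder self-attention (encoder-decoder models)
    ENCODER_ONLY = "encoder_only"         # encoder self-attention (encoder-only models, e.g. BERT)
    ENCODER_DECODER = "encoder_decoder"   # encoder/decoder cross-attention
```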
This pull request has merge conflicts that must be resolved before it can be merged.
Which backends actually do support encoder self-attention? I want to make sure this doesn't just kick over to another backend that doesn't support it and continue failing the tests. Please also make sure to run the previously failing CI tests if they aren't triggered automatically.
LucasWilkinson left a comment:
Overall looks good; thanks for fixing this! Please rebase.
Force-pushed from 799e129 to b6832ad
rebase
Thanks @LucasWilkinson and @MatthewBonanni for reviewing! Just rebased my PR. Could you folks please help add the ready label (I don't have permission) so I can run all the previously failing CIs? Thanks!
DONE!
Force-pushed from b6832ad to 028d538
I think we can probably ignore my comments for now, but we should consider them in a follow-up. Matt can probably tackle that if you don't have time.
Force-pushed from 028d538 to 939862f
Oops, sorry, just saw your comment @mgoin - but I changed the PR based on your comments (thanks)! I am OK either way. Please let me know, and then I can start running the previously failing CIs!
mgoin left a comment:
What are the attention backends that support running ENCODER and ENCODER_DECODER? I don't see them mentioned anywhere. cc @russellb @NickLucche
Regarding this, I'm not sure if I understand you correctly, but this PR focuses on fixing ENCODER_ONLY model support. I will defer this question to others. From my understanding,
Force-pushed from 972a6ae to fe7f580
flash_attn supports ENCODER_DECODER. flashinfer would support it with this change: #25098
Force-pushed from fe7f580 to 0c767bd
Signed-off-by: Huamin Li <[email protected]>
Force-pushed from 3f5e3f6 to b27f2ca
I manually triggered all 3 CI runs that were previously failing due to Language Models Test (Extended Pooling) - https://buildkite.com/vllm/ci/builds/38812/steps/canvas?jid=019a7c6c-315c-418a-8273-e5b946fbac0f
Purpose
After #24794, encoder-only models (e.g., BERT) fail to initialize because the TRITON_ATTN backend is selected by default, but it doesn't support encoder self-attention, causing an initialization error.
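A minimal reproduction sketch (the model name and pooling entrypoint here are illustrative, not taken from the failing CI tests):

```python
# Sketch: loading a BERT-based encoder-only model, which exercises
# ENCODER_ONLY attention. Before this fix, default backend selection
# could pick TRITON_ATTN and fail during engine initialization.
from vllm import LLM

llm = LLM(model="BAAI/bge-base-en-v1.5", task="embed")
out = llm.embed(["backend selection smoke test"])
print(len(out[0].outputs.embedding))
```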
This PR implements an opt-in approach for attention type support (see the sketches after this list):

1. Added a `supports_attn_type()` method to `AttentionBackend`:
   - Default behavior: only supports `DECODER` attention
   - Backends must explicitly override it to support `ENCODER_ONLY` or other attention types
   - This makes the system safe by default - new backends won't accidentally support encoder-only models
2. Threaded `attn_type` through the backend selection pipeline:
   - Added an `attn_type` parameter to `get_attn_backend()` and `validate_configuration()`
   - Modified `EncoderOnlyAttention` to pass `attn_type=AttentionType.ENCODER_ONLY`
   - Platform classes now validate attention type compatibility during backend selection
3. Declared supported attention types per backend:
   - FlexAttention: supports `DECODER` + `ENCODER_ONLY`
   - FlashAttention: supports `DECODER` + `ENCODER_ONLY`
   - CPU/TorchSDPA: supports all attention types
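A minimal sketch of the opt-in pattern, with simplified names and signatures (vLLM's real `AttentionBackend` and `AttentionType` differ in detail):

```python
class AttentionType:
    # vLLM models these as string constants today (see the enum suggestion above).
    DECODER = "decoder"
    ENCODER = "encoder"
    ENCODER_ONLY = "encoder_only"
    ENCODER_DECODER = "encoder_decoder"

class AttentionBackend:
    @classmethod
    def supports_attn_type(cls, attn_type: str) -> bool:
        # Safe default: only decoder self-attention. New backends must
        # explicitly override this method to opt in to anything else.
        return attn_type == AttentionType.DECODER

class FlashAttentionBackend(AttentionBackend):
    @classmethod
    def supports_attn_type(cls, attn_type: str) -> bool:
        # Explicit opt-in: decoder plus encoder-only self-attention.
        return attn_type in (AttentionType.DECODER, AttentionType.ENCODER_ONLY)

class TorchSDPABackend(AttentionBackend):
    @classmethod
    def supports_attn_type(cls, attn_type: str) -> bool:
        # CPU/TorchSDPA handles all attention types.
        return True
```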
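And a sketch of how `attn_type` could thread through selection; `get_attn_backend()` is a real vLLM function, but this signature and the `_candidate_backends()` helper are simplified/hypothetical:

```python
def get_attn_backend(head_size, dtype, attn_type=AttentionType.DECODER):
    # Walk candidate backends in priority order, skipping any that do not
    # declare support for the requested attention type.
    for backend_cls in _candidate_backends(head_size, dtype):  # hypothetical helper
        if backend_cls.supports_attn_type(attn_type):
            return backend_cls
    raise ValueError(f"No attention backend supports attn_type={attn_type!r}")

# EncoderOnlyAttention would then request a compatible backend explicitly:
# backend = get_attn_backend(head_size, dtype, attn_type=AttentionType.ENCODER_ONLY)
```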
Test Plan
Test Result