Fix CUDA index out of bounds for q_idx in VLM token type masking for Gemma3, PaliGemma, and example modular #41757
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
[For maintainers] Suggested jobs to run (before merge): run-slow: gemma3, paligemma
Makes sense to me! Can you also check `pytest -k test_generate_with_static_cache tests/models/gemma3/test_modeling_gemma3.py`, since it was supposed to fail for gemma3 in that case? Probably the test doesn't pass token type IDs, or it is already failing on main and we didn't notice it?
Thanks for your review, @zucchini-nlp. I have run the tests as requested and everything is OK:

`pytest -k test_generate_with_static_cache tests/models/gemma3/test_modeling_gemma3.py`

tests/models/gemma3/test_modeling_gemma3.py::Gemma3TextModelTest::test_generate_with_static_cache PASSED [ 50%]
tests/models/gemma3/test_modeling_gemma3.py::Gemma3Vision2TextModelTest::test_generate_with_static_cache PASSED [100%]
2 passed, 373 deselected, 2 warnings in 9.62s
Thanks
Oh right, have to approve to merge
Thank YOU, @zucchini-nlp, for your fast and efficient review!
Fix CUDA index out of bounds for q_idx in VLM token type masking for Gemma3, PaliGemma, and example modular (huggingface#41757)

* Fix CUDA index out of bounds for q_idx in Gemma3 token type masking
* Fix CUDA index out of bounds for q_idx in modular modeling_new_task_model
* Revert "Fix CUDA index out of bounds for q_idx in Gemma3 token type masking" (reverts commit f8e5c2a)
* Fix CUDA index out of bounds for q_idx in PaliGemma token type masking
* Fix CUDA index out of bounds for q_idx in Gemma3 token type masking
Fixes a CUDA index out of bounds error that occurs during generation with static caches when token type IDs are used for bidirectional image attention masking.
Background
After PR #41505 ("`generate` delegates default cache initialization to the model") changed cache initialization behavior in `generate()`, a latent bug in the VLM masking code was exposed. The error manifests as a CUDA device-side assert (`torch.AcceleratorError: CUDA error: device-side assert triggered`) during generation.

Bug
In the `token_type_ids_mask_function` inner mask, the code correctly handles out-of-bounds `kv_idx` values but fails to handle out-of-bounds `q_idx` values. The PR that originally fixed the bidirectional image masking added bounds checking for `kv_idx` but overlooked that `q_idx` needed the same protection.

During generation with static caches:
* the mask function receives `q_idx` and `kv_idx` values that can exceed `token_type_ids.shape[1]`
* indexing `token_type_ids[batch_idx, q_idx]` causes CUDA index out of bounds errors when `q_idx >= token_type_ids.shape[1]` (a minimal illustration follows this list)
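As a rough illustration of the failure mode, the snippet below reproduces the kind of out-of-range gather described above. The shapes and index values are hypothetical, not taken from the actual models:

```python
import torch

# Hypothetical shapes: one sequence of 8 prompt tokens, a static cache longer than that.
token_type_ids = torch.zeros(1, 8, dtype=torch.long, device="cuda")
batch_idx = torch.tensor(0, device="cuda")
q_idx = torch.tensor(12, device="cuda")  # a cache position beyond token_type_ids.shape[1]

# Advanced indexing with an out-of-range tensor index runs a CUDA gather kernel that
# fails its bounds check and is reported as "CUDA error: device-side assert triggered".
value = token_type_ids[batch_idx, q_idx]
torch.cuda.synchronize()  # the asynchronous device-side error surfaces here (or at the next CUDA call)
```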
The code comment on line 740 already acknowledged this issue. Bounds checking was implemented for `kv_idx`, but `q_idx` was overlooked.

Fix
This PR adds the same `torch.where` bounds-checking pattern for `q_idx` that already existed for `kv_idx`:
* a `safe_q_idx` to clamp indices within the valid range
* `torch.where` to mask out-of-bounds values with appropriate defaults (0 for `token_type_ids`, -1 for `image_group_ids`); a sketch of the resulting pattern is shown below
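For illustration, here is a minimal sketch of that pattern. It is not a copy of the actual Gemma3/PaliGemma code: the factory signature and the `inner_mask(batch_idx, head_idx, q_idx, kv_idx)` convention are assumptions based on the description in this PR.

```python
import torch

def token_type_ids_mask_function(token_type_ids: torch.Tensor, image_group_ids: torch.Tensor):
    """Sketch of a bidirectional-image mask factory using the bounds-checked pattern."""
    seq_len = token_type_ids.shape[1]

    def inner_mask(batch_idx, head_idx, q_idx, kv_idx):
        # Static-cache positions can exceed seq_len, so clamp both indices before gathering.
        safe_q_idx = torch.where(q_idx < seq_len, q_idx, 0)
        safe_kv_idx = torch.where(kv_idx < seq_len, kv_idx, 0)

        token_type_q = token_type_ids[batch_idx, safe_q_idx]
        token_type_kv = token_type_ids[batch_idx, safe_kv_idx]
        # Out-of-bounds positions fall back to 0, so they are treated as text tokens.
        token_type_q = torch.where(q_idx < seq_len, token_type_q, 0)
        token_type_kv = torch.where(kv_idx < seq_len, token_type_kv, 0)

        image_group_q = image_group_ids[batch_idx, safe_q_idx]
        image_group_kv = image_group_ids[batch_idx, safe_kv_idx]
        # Out-of-bounds positions fall back to -1, so they never match a real image group.
        image_group_q = torch.where(q_idx < seq_len, image_group_q, -1)
        image_group_kv = torch.where(kv_idx < seq_len, image_group_kv, -1)

        # Unmask (attend bidirectionally) only between image tokens of the same image.
        is_image_block = (token_type_q == 1) & (token_type_kv == 1)
        same_image_block = image_group_q == image_group_kv
        return is_image_block & same_image_block

    return inner_mask
```

With this guard, positions past `token_type_ids.shape[1]` behave like text tokens with no image group, so the inner mask simply returns False for them and the surrounding causal mask applies, instead of indexing out of bounds.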
Affected Models
* `Gemma3ForConditionalGeneration`
* `PaliGemmaForConditionalGeneration`
* Example modular model (`modeling_new_task_model.py`)

Testing
This PR fixes the downstream failing test in TRL; see the associated issue under Related Issues below.
Related Issues
* Regression exposed by the cache initialization refactor: "`generate` delegates default cache initialization to the model" #41505
* This PR completes the fix started in an earlier PR, which fixed `kv_idx` but missed `q_idx`.
* This PR will fix "CI fails with dev dependencies: torch.AcceleratorError: CUDA error: device-side assert triggered" (trl#4281).
CC: "`generate` delegates default cache initialization to the model" #41505