
Conversation

@albertvillanova (Member) commented Oct 21, 2025

Fix CUDA index out of bounds error that occurs during generation with static caches when using token type IDs for bidirectional image attention masking.

Background

After an earlier PR changed cache initialization behavior in generate(), a latent bug in the VLM masking code was exposed. The error manifests as:

CUDA error: device-side assert triggered
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [0,0,0], thread: [0,0,0]
Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.

Bug

In the token_type_ids_mask_function inner mask, the code correctly handles out-of-bounds kv_idx values but fails to handle out-of-bounds q_idx values.

The PR that originally fixed the bidirectional image masking added bounds checking for kv_idx, but overlooked that q_idx needed the same protection.

During generation with static caches:

  • Cache shapes can exceed the actual input sequence length (e.g., static cache of 2048 positions with 512 token input)
  • The masking function receives q_idx and kv_idx values that can both exceed token_type_ids.shape[1]
  • Direct indexing such as token_type_ids[batch_idx, q_idx] causes a CUDA index out of bounds error when q_idx >= token_type_ids.shape[1], as sketched below
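
For illustration, here is a minimal, standalone Python sketch of that failure mode; the shapes and variable names are toy values chosen for this example, not the actual Transformers code:

import torch

# Toy setup: pretend the static cache has 8 slots but the real input is only 4 tokens,
# so token_type_ids only covers positions 0..3.
token_type_ids = torch.tensor([[0, 1, 1, 0]])  # shape (batch, input_seq_len) == (1, 4)
batch_idx = torch.tensor(0)
q_idx = torch.tensor(6)  # a query position that exists only in the static cache

# Indexing with q_idx >= token_type_ids.shape[1] is out of bounds: on CPU it raises
# IndexError, while on CUDA it trips the asynchronous device-side assert shown above.
# token_type_ids[batch_idx, q_idx]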

The code comment on line 740 already acknowledged this issue:

"NOTE: static cache shape goes beyond input seq length, while token_type_ids.shape[1] == input seq length"

Bounds checking was implemented for kv_idx, but q_idx was overlooked.

Fix

This PR adds the same torch.where bounds-checking pattern for q_idx that already existed for kv_idx:

  1. Create safe_q_idx to clamp indices within the valid range
  2. Use the safe indices for tensor access
  3. Apply torch.where to mask out-of-bounds values with appropriate defaults (0 for token_type_ids, -1 for image_group_ids), as sketched below
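
As a rough illustration of that pattern, here is a self-contained sketch; the function name, signature, and simplified mask logic are illustrative assumptions rather than the exact mask closure used in the Transformers code:

import torch

def bidirectional_image_mask_sketch(token_type_ids, image_group_ids, batch_idx, q_idx, kv_idx):
    # token_type_ids: (batch, input_seq_len), 1 for image tokens, 0 for text tokens.
    # image_group_ids: (batch, input_seq_len), per-token image group id, -1 for text tokens.
    # q_idx / kv_idx: index tensors that, with a static cache, may exceed input_seq_len.
    seq_len = token_type_ids.shape[1]
    q_in_bounds = q_idx < seq_len
    kv_in_bounds = kv_idx < seq_len

    # 1. Clamp indices into the valid range so the gather itself never goes out of bounds.
    safe_q_idx = torch.where(q_in_bounds, q_idx, torch.zeros_like(q_idx))
    safe_kv_idx = torch.where(kv_in_bounds, kv_idx, torch.zeros_like(kv_idx))

    # 2. Gather with the safe indices.
    q_is_image = token_type_ids[batch_idx, safe_q_idx] == 1
    kv_is_image = token_type_ids[batch_idx, safe_kv_idx] == 1
    q_group = image_group_ids[batch_idx, safe_q_idx]
    kv_group = image_group_ids[batch_idx, safe_kv_idx]

    # 3. Mask out-of-bounds positions back to their defaults
    #    (0 / "not an image token" for token_type_ids, -1 for image_group_ids).
    q_is_image = torch.where(q_in_bounds, q_is_image, torch.zeros_like(q_is_image))
    kv_is_image = torch.where(kv_in_bounds, kv_is_image, torch.zeros_like(kv_is_image))
    q_group = torch.where(q_in_bounds, q_group, torch.full_like(q_group, -1))
    kv_group = torch.where(kv_in_bounds, kv_group, torch.full_like(kv_group, -1))

    # Image tokens attend bidirectionally within the same image group.
    return q_is_image & kv_is_image & (q_group == kv_group)

In the real mask function, this torch.where guard was already in place for kv_idx; the change described above extends it to q_idx so both index streams stay in bounds when the static cache is longer than the input.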

Affected Models

  • Gemma3ForConditionalGeneration
  • PaliGemmaForConditionalGeneration
  • Example modular transformer template (modeling_new_task_model.py)

Testing

This PR fixes the downstream failing test in TRL:

tests/test_grpo_trainer.py::TestGRPOTrainer::test_training_vlm_0_trl_internal_testing_tiny_Gemma3ForConditionalGeneration

Related Issues

  • CI fails with dev dependencies: torch.AcceleratorError: CUDA error: device-side assert triggered

@mellowpraful commented Oct 21, 2025 via email

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@github-actions (Contributor)

[For maintainers] Suggested jobs to run (before merge)

run-slow: gemma3, paligemma

@albertvillanova changed the title from "Fix CUDA index out of bounds for q_idx in Gemma3 token type masking" to "Fix CUDA index out of bounds for q_idx in VLM token type masking for Gemma3, PaliGemma, and example modular" on Oct 21, 2025
@zucchini-nlp (Member) left a comment

Makes sense to me! Can you also check pytest -k test_generate_with_static_cache tests/models/gemma3/test_modeling_gemma3.py, since it was supposed to fail for gemma3 in that case?

Prob the test doesn't pass token type ids, or it is already failing on main and we didn't notice it?

@albertvillanova (Member, Author)

Thanks for your review, @zucchini-nlp.

I have run the tests as requested and everything is OK:

pytest -k test_generate_with_static_cache tests/models/gemma3/test_modeling_gemma3.py

tests/models/gemma3/test_modeling_gemma3.py::Gemma3TextModelTest::test_generate_with_static_cache PASSED                                                                                            [ 50%]
tests/models/gemma3/test_modeling_gemma3.py::Gemma3Vision2TextModelTest::test_generate_with_static_cache PASSED                                                                                     [100%]

============================================================================== 2 passed, 373 deselected, 2 warnings in 9.62s ==============================================================================

@zucchini-nlp (Member)

Thanks

@zucchini-nlp (Member) left a comment

Oh right, have to approve to merge

@zucchini-nlp merged commit 9a27302 into huggingface:main on Oct 22, 2025
17 checks passed
@albertvillanova (Member, Author)

Thank YOU, @zucchini-nlp for your fast and efficient review!

ngazagna-qc pushed a commit to ngazagna-qc/transformers that referenced this pull request Oct 23, 2025
…Gemma3, PaliGemma, and example modular (huggingface#41757)

* Fix CUDA index out of bounds for q_idx in Gemma3 token type masking

* Fix CUDA index out of bounds for q_idx in modular modeling_new_task_model

* Revert "Fix CUDA index out of bounds for q_idx in Gemma3 token type masking"

This reverts commit f8e5c2a.

* Fix CUDA index out of bounds for q_idx in PaliGemma token type masking

* Fix CUDA index out of bounds for q_idx in Gemma3 token type masking


Development

Successfully merging this pull request may close these issues.

CI fails with dev dependencies: torch.AcceleratorError: CUDA error: device-side assert triggered
