
Conversation

GAD-cell

@GAD-cell GAD-cell commented Jun 23, 2025

Added a notebook for VLM GRPO. (I think I've corrected all the spelling errors as well.)
Works along with PR 2752
@danielhanchen

@danielhanchen
Contributor

@rolandtannous Could you check if this notebook works as expected thanks :)

@rolandtannous
Contributor

on it

@rolandtannous
Contributor

rolandtannous commented Jul 2, 2025

@GAD-cell @danielhanchen
I just tested this notebook on Colab using both a float16-only GPU (a T4) and a bfloat16-capable GPU (an A100).
The notebook failed on both with the same set of runtime exceptions:

1. In its current form, the notebook fails on both a T4 and an A100 Colab at the same post-SFT sample-generation cell, namely:
sample = dataset[0]

message = [
    {"role": "system",
     "content": f"""You are given a problem with an image.
         Think about the problem and provide your working out.
         Place it between {reasoning_start} and {reasoning_end}.
         Then, provide your solution between {solution_start}{solution_end}"""},
    {"role": "user",
     "content": [
         {"type": "image"},
         {"type": "text", "text": f"{sample['problem']}"},
     ]},
]

image = sample['image']

input_text = tokenizer.apply_chat_template(message, add_generation_prompt=True)
inputs = tokenizer(
    image,
    input_text,
    add_special_tokens=False,
    return_tensors="pt",
).to("cuda")


from transformers import TextStreamer
_ = model.generate(
    **inputs,
    temperature = 0.1,
    max_new_tokens = 1024,
    streamer = TextStreamer(tokenizer, skip_prompt = False),
)

The error is shown in the attached screenshot (Screen Shot 2025-07-02 at 7 04 41 PM).

This is caused by the Transformers version preinstalled in the Colab environment, which is 4.53.0.
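As a side note, a quick way to flag whether a runtime carries a version newer than the last known-good one (a hypothetical helper, not part of the notebook; naive comparison with no pre-release handling):

```python
def transformers_needs_pin(installed: str, last_known_good: str = "4.52.4") -> bool:
    # Naive dotted-version comparison; assumes plain "X.Y.Z" version strings.
    parse = lambda v: tuple(int(p) for p in v.split("."))
    return parse(installed) > parse(last_known_good)

print(transformers_needs_pin("4.53.0"))  # → True (the Colab default that triggered the failure)
```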

2. If I downgrade transformers to 4.52.4 by pinning the version in the install cell, as follows:

!pip install -U transformers==4.52.4

instead of

!pip install -U transformers

the error disappears, but then the notebook throws another CUDA device-side assert runtime exception in the compiled compute_loss method, on both Colab T4 and Colab A100, during GRPO training. The runtime error is shown in the attached screenshot (Screen Shot 2025-07-02 at 7 20 06 PM).

Not sure if this is due to some code that was committed and merged to unsloth while unslothai/unsloth#2752 was being worked on, but it's definitely worth revisiting.

3. Minor, but worth fixing: the reference to the '/content/' directory can lead to a permissions error on local systems where the user isn't root or requires sudo, in this cell:

dataset = load_dataset('MMInstruction/Clevr_CoGenT_TrainA_R1', split='train', cache_dir='/content/')

One way to solve this is to replace '/content/' with './content' or '~/content'.
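A portable variant might look like this (a sketch: the dataset name is from the notebook, the `~/content` location is an assumption; the actual `load_dataset` call is left commented out so the snippet has no heavyweight dependency):

```python
import os

# Resolve a user-writable cache directory instead of the Colab-only '/content/'.
cache_dir = os.path.expanduser("~/content")
os.makedirs(cache_dir, exist_ok=True)

# from datasets import load_dataset
# dataset = load_dataset('MMInstruction/Clevr_CoGenT_TrainA_R1',
#                        split='train', cache_dir=cache_dir)
print(cache_dir)
```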

@GAD-cell
Author

GAD-cell commented Jul 2, 2025

> (quoting @rolandtannous's full review above)

Oh, thank you for the review! It was working last week, so you're right, it must be a recent update that broke the notebook.
I'll check that and get back to you.
Noted on the cache_dir, thank you.

@GAD-cell
Author

GAD-cell commented Jul 2, 2025

Hey @rolandtannous.
I've identified the error in my code, which was indeed caused by recent updates in both unsloth and unsloth_zoo that I didn't notice.
My notebook should now work along with PR 2752 and PR 188. Can you confirm?

@rolandtannous
Contributor

@GAD-cell do you mind pushing the additional changes you made to the same PR, unslothai/unsloth#2752? That way we have the updated modifications in one consolidated file and can make sure there are no potential conflicts. Just switch locally to your 2752 branch, add the changes from PR 188, then recommit and push 2752. This will update PR 2752. Then close PR 188. Once that's done I'll go ahead and test.

@GAD-cell
Author

GAD-cell commented Jul 3, 2025

> (quoting the request above)

Sorry, maybe it wasn't clear, but PR 188 is for unsloth_zoo and PR 2752 is for unsloth, so I can't push them in the same PR.

@rolandtannous
Contributor

rolandtannous commented Jul 3, 2025

You're right, I completely missed that one. Slow morning.
Thanks

@rolandtannous
Contributor

Hello,

The CUDA device-side assert runtime error is still showing up on Colab T4 and Colab A100 during GRPO training (screenshot: Screen Shot 2025-07-03 at 12 04 51 PM).

Take your time to trace and debug. Verify any code changes by testing (test on Colab T4 if you don't have access to Colab A100).
Please ping me once you have a final working solution so I can verify.

Also: do you happen to have up-to-date forks of unsloth-zoo and unsloth that contain your changes? This would make testing easier and avoid any issues that might be caused by manually patching files.

Thank you

@GAD-cell
Author

GAD-cell commented Jul 3, 2025

> (quoting the message above)

@rolandtannous oh, okay, sorry about that. I just ran the notebook on a new VM and couldn't reproduce your error with my updated code on Colab A100 and Colab T4. So maybe your code didn't update correctly? If you used the same VM, don't forget to remove unsloth_compiled_cache. I'm checking again to see if I missed something.
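For reference, clearing the compiled cache on a reused VM is just this (assuming the cache directory sits in the working directory, as in Colab):

```shell
# Remove Unsloth's compiled kernel cache so stale patched code isn't reused.
rm -rf unsloth_compiled_cache
```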

Yes, I do have updated forks; just run these commands at the beginning of the notebook instead of the regular installation:

! pip install -U git+https://github.com/GAD-cell/unsloth.git@VLM_GRPO
! pip install -U git+https://github.com/GAD-cell/unsloth-zoo.git@VLM_GRPO

It should work (and also run the 'extra colab install' cell right after).

@rolandtannous
Contributor

rolandtannous commented Jul 3, 2025

hello @GAD-cell

I verified that the code updated correctly by examining the installed files after running the installation cells, so I don't think that's the issue.
You're not using the --no-deps switch here:

! pip install -U git+https://github.com/GAD-cell/unsloth.git@VLM_GRPO
! pip install -U git+https://github.com/GAD-cell/unsloth-zoo.git@VLM_GRPO

You're overwriting a lot of packages in the Colab environment, which makes installation take more time.
We try to avoid that by using --no-deps and keeping the code compatible with the preinstalled Colab environment, except for some package versions we enforce manually in the install cells.
Your fork installs should also go in "extra colab install" instead, to make sure your files overwrite the installed unsloth-zoo and unsloth.
Thank you for the forks. Will get back to you on this one :)

@GAD-cell
Author

GAD-cell commented Jul 3, 2025

@rolandtannous OK, yes, my bad on --no-deps.
I re-ran the notebook with the following installation: first run "extra colab install", then install my forks with --no-deps.
This worked for me.
Also, thank you for your time :)

@rolandtannous
Contributor

@GAD-cell it works when installing from your branch with --no-deps, on both T4 and A100 Colabs.

If someone wants to reproduce, these are the updated install cells I used:

%%capture
import os
if "COLAB_" not in "".join(os.environ.keys()):
    !pip install unsloth vllm
else:
    # [NOTE] Do the below ONLY in Colab! Use [[pip install unsloth vllm]]
    !pip install --no-deps unsloth vllm==0.8.5.post1

and

#@title Colab Extra Install (execute only in Colab) { display-mode: "form" }
%%capture
import os
if "COLAB_" not in "".join(os.environ.keys()):
    !pip install unsloth vllm
else:
    !pip install --no-deps unsloth vllm==0.8.5.post1
    # [NOTE] Do the below ONLY in Colab! Use [[pip install unsloth vllm]]
    # Skip restarting message in Colab
    import sys, re, requests; modules = list(sys.modules.keys())
    for x in modules: sys.modules.pop(x) if "PIL" in x or "google" in x else None
    !pip install --no-deps bitsandbytes accelerate xformers==0.0.29.post3 peft trl triton cut_cross_entropy 
    !pip install --force-reinstall --no-deps git+https://github.com/GAD-cell/unsloth-zoo.git@VLM_GRPO
    !pip install sentencepiece protobuf "datasets>=3.4.1" huggingface_hub hf_transfer

    #added for this specific notebook
    !pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124
    !pip install --no-deps -U transformers==4.52.4
    !pip install --no-deps -U accelerate
    !pip install --no-deps trl==0.18.2

    # vLLM requirements - vLLM breaks Colab due to reinstalling numpy
    f = requests.get("https://raw.githubusercontent.com/vllm-project/vllm/refs/heads/main/requirements/common.txt").content
    with open("vllm_requirements.txt", "wb") as file:
        file.write(re.sub(rb"(transformers|numpy|xformers)[^\n]{1,}\n", b"", f))
    !pip install -r vllm_requirements.txt
    !pip install --force-reinstall --no-deps git+https://github.com/GAD-cell/unsloth.git@VLM_GRPO

Note the --force-reinstall, which ensures the installed packages are overwritten by the version with the fixes.
This won't be necessary once the fixes are merged; it's only meant to allow proper testing pre-merge.

@danielhanchen confirmed to work now; can be merged.
Requires unslothai/unsloth#2752 and unslothai/unsloth-zoo#188 to be merged first.

Thank you for your contribution!
