
Conversation

GAD-cell

@GAD-cell GAD-cell commented Jun 23, 2025

Added a notebook for VLM GRPO. (I think I've corrected all the spelling errors as well.)
Works along with PR 2752
@danielhanchen

@danielhanchen
Contributor

@rolandtannous Could you check if this notebook works as expected thanks :)

@rolandtannous
Contributor

on it

@rolandtannous
Contributor

rolandtannous commented Jul 2, 2025

@GAD-cell @danielhanchen
I just tested this notebook on Colab using both a float16-only GPU (a T4) and a bfloat16-capable GPU (an A100).
The notebook failed on both with the same set of runtime exceptions:

1. In its current form, the notebook fails on both a T4 and an A100 Colab at the same post-SFT sample-generation cell, namely:
sample = dataset[0]

message = [
    {"role": "system",
     "content": f"""You are given a problem with an image.
         Think about the problem and provide your working out.
         Place it between {reasoning_start} and {reasoning_end}.
         Then, provide your solution between {solution_start}{solution_end}"""},
    {"role": "user",
     "content": [
         {"type": "image"},
         {"type": "text", "text": f"{sample['problem']}"},
     ]},
]

image = sample['image']

input_text = tokenizer.apply_chat_template(message, add_generation_prompt=True)
inputs = tokenizer(
    image,
    input_text,
    add_special_tokens=False,
    return_tensors="pt",
).to("cuda")


from transformers import TextStreamer
_ = model.generate(
    **inputs,
    temperature = 0.1,
    max_new_tokens = 1024,
    streamer = TextStreamer(tokenizer, skip_prompt = False),
)

The error is shown in the attached screenshot (Screen Shot 2025-07-02 at 7 04 41 PM).

This is caused by the Transformers version preinstalled in the Colab environment, which is 4.53.0.
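As a side note, a quick way to flag whether a runtime carries a version newer than the last known-good one (a hypothetical helper, not part of the notebook; naive comparison with no pre-release handling):

```python
def transformers_needs_pin(installed: str, last_known_good: str = "4.52.4") -> bool:
    # Naive dotted-version comparison; assumes plain "X.Y.Z" version strings.
    parse = lambda v: tuple(int(p) for p in v.split("."))
    return parse(installed) > parse(last_known_good)

print(transformers_needs_pin("4.53.0"))  # → True (the Colab default that triggered the failure)
```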

2. If I downgrade transformers to 4.52.4 by pinning the version in the install cell, as follows:

!pip install -U transformers==4.52.4

instead of

!pip install -U transformers

the error disappears, but then the notebook throws another CUDA device-side assert runtime exception in the compiled compute_loss method, on both Colab T4 and Colab A100, during GRPO training. The runtime error is shown in the attached screenshot (Screen Shot 2025-07-02 at 7 20 06 PM).

Not sure if this is due to some code that was committed and merged to unsloth while unslothai/unsloth#2752 was being worked on, but it's definitely worth revisiting.

3. Minor, but worth fixing: the reference to the '/content/' directory can lead to a permissions error on local systems where the user isn't root or requires sudo, in this cell:

dataset = load_dataset('MMInstruction/Clevr_CoGenT_TrainA_R1', split='train', cache_dir='/content/')

One way to solve this is to replace '/content/' with './content' or '~/content'.
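A portable variant might look like this (a sketch: the dataset name is from the notebook, the `~/content` location is an assumption; the actual `load_dataset` call is left commented out so the snippet has no heavyweight dependency):

```python
import os

# Resolve a user-writable cache directory instead of the Colab-only '/content/'.
cache_dir = os.path.expanduser("~/content")
os.makedirs(cache_dir, exist_ok=True)

# from datasets import load_dataset
# dataset = load_dataset('MMInstruction/Clevr_CoGenT_TrainA_R1',
#                        split='train', cache_dir=cache_dir)
print(cache_dir)
```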

@GAD-cell
Author

GAD-cell commented Jul 2, 2025

> (quoting @rolandtannous's full review above)

Oh, thank you for the review! It was working last week, so you're right, it must be a recent update that broke the notebook.
I'll check that and get back to you.
Noted on the cache_dir, thank you.

@GAD-cell
Author

GAD-cell commented Jul 2, 2025

Hey @rolandtannous.
I've identified the error in my code, which was indeed caused by recent updates in both unsloth and unsloth_zoo that I didn't notice.
My notebook should now work along with PR 2752 and PR 188. Can you confirm?

@rolandtannous
Contributor

@GAD-cell do you mind pushing the additional changes you made to the same PR, unslothai/unsloth#2752? That way we have the updated modifications in one consolidated file and can make sure there are no potential conflicts. Just switch locally to your 2752 branch, add the changes from PR 188, then recommit and push 2752. This will update PR 2752. Then close PR 188. Once that's done I'll go ahead and test.

@GAD-cell
Author

GAD-cell commented Jul 3, 2025

> (quoting the request above)

Sorry, maybe it wasn't clear, but PR 188 is for unsloth_zoo and PR 2752 is for unsloth, so I can't push them in the same PR.

@rolandtannous
Contributor

rolandtannous commented Jul 3, 2025

You're right, I completely missed that one. Slow morning.
Thanks

@rolandtannous
Contributor

Hello,

The CUDA device-side assert runtime error is still showing up on Colab T4 and Colab A100 during GRPO training (screenshot: Screen Shot 2025-07-03 at 12 04 51 PM).

Take your time to trace and debug. Verify any code changes by testing (test on Colab T4 if you don't have access to Colab A100).
Please ping me once you have a final working solution so I can verify.

Also: do you happen to have up-to-date forks of unsloth-zoo and unsloth that contain your changes? This would make testing easier and avoid any issues that might be caused by manually patching files.

Thank you

@GAD-cell
Author

GAD-cell commented Jul 3, 2025

> (quoting the message above)

@rolandtannous oh, okay, sorry about that. I just ran the notebook on a new VM and couldn't reproduce your error with my updated code on Colab A100 and Colab T4. So maybe your code didn't update correctly? If you used the same VM, don't forget to remove unsloth_compiled_cache. I'm checking again to see if I missed something.
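For reference, clearing the compiled cache on a reused VM is just this (assuming the cache directory sits in the working directory, as in Colab):

```shell
# Remove Unsloth's compiled kernel cache so stale patched code isn't reused.
rm -rf unsloth_compiled_cache
```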

Yes, I do have updated forks; just run these commands at the beginning of the notebook instead of the regular installation:

! pip install -U git+https://github.com/GAD-cell/unsloth.git@VLM_GRPO
! pip install -U git+https://github.com/GAD-cell/unsloth-zoo.git@VLM_GRPO

It should work (and also run the 'extra colab install' cell right after).

@rolandtannous
Contributor

rolandtannous commented Jul 3, 2025

hello @GAD-cell

I verified that the code updated correctly by examining the installed files after running the installation cells, so I don't think that's the issue.
You're not using the --no-deps switch here:

! pip install -U git+https://github.com/GAD-cell/unsloth.git@VLM_GRPO
! pip install -U git+https://github.com/GAD-cell/unsloth-zoo.git@VLM_GRPO

You're overwriting a lot of packages in the Colab environment, which makes installation take more time.
We try to avoid that by using --no-deps and keeping the code compatible with the preinstalled Colab environment, except for some package versions we enforce manually in the install cells.
Your fork installs should also go in "extra colab install" instead, to make sure your files overwrite the installed unsloth-zoo and unsloth.
Thank you for the forks. Will get back to you on this one :)

@GAD-cell
Author

GAD-cell commented Jul 3, 2025

@rolandtannous OK, yes, my bad on --no-deps.
I re-ran the notebook with the following installation: first run "extra colab install", then install my forks with --no-deps.
This worked for me.
Also, thank you for your time :)

@rolandtannous
Contributor

@GAD-cell it works when installing from your branch with --no-deps, on both T4 and A100 Colabs.

If someone wants to reproduce, these are the updated install cells I used:

%%capture
import os
if "COLAB_" not in "".join(os.environ.keys()):
    !pip install unsloth vllm
else:
    # [NOTE] Do the below ONLY in Colab! Use [[pip install unsloth vllm]]
    !pip install --no-deps unsloth vllm==0.8.5.post1

and

#@title Colab Extra Install (execute only in Colab) { display-mode: "form" }
%%capture
import os
if "COLAB_" not in "".join(os.environ.keys()):
    !pip install unsloth vllm
else:
    !pip install --no-deps unsloth vllm==0.8.5.post1
    # [NOTE] Do the below ONLY in Colab! Use [[pip install unsloth vllm]]
    # Skip restarting message in Colab
    import sys, re, requests; modules = list(sys.modules.keys())
    for x in modules: sys.modules.pop(x) if "PIL" in x or "google" in x else None
    !pip install --no-deps bitsandbytes accelerate xformers==0.0.29.post3 peft trl triton cut_cross_entropy 
    !pip install --force-reinstall --no-deps git+https://github.com/GAD-cell/unsloth-zoo.git@VLM_GRPO
    !pip install sentencepiece protobuf "datasets>=3.4.1" huggingface_hub hf_transfer

    #added for this specific notebook
    !pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124
    !pip install --no-deps -U transformers==4.52.4
    !pip install --no-deps -U accelerate
    !pip install --no-deps trl==0.18.2

    # vLLM requirements - vLLM breaks Colab due to reinstalling numpy
    f = requests.get("https://raw.githubusercontent.com/vllm-project/vllm/refs/heads/main/requirements/common.txt").content
    with open("vllm_requirements.txt", "wb") as file:
        file.write(re.sub(rb"(transformers|numpy|xformers)[^\n]{1,}\n", b"", f))
    !pip install -r vllm_requirements.txt
    !pip install --force-reinstall --no-deps git+https://github.com/GAD-cell/unsloth.git@VLM_GRPO

Note the --force-reinstall, which ensures the installed packages are overwritten by the version with the fixes.
This won't be necessary once the fixes are merged; it's only meant to allow proper testing pre-merge.

@danielhanchen confirmed to work now; can be merged.
Requires unslothai/unsloth#2752 and unslothai/unsloth-zoo#188 to be merged first.

Thank you for your contribution!
