GRPO unbalanced-memory #2805

### Reproduction

Memory usage differs across ranks (0-6), and it grows considerably as the number of training steps increases.

Step 0, using the original code:

![Image](https://github.com/user-attachments/assets/cea459ef-a998-4707-b78e-0d753ba32481)

Then I wrote an efficient GRPO loss kernel in Triton.

Step 0:

![Image](https://github.com/user-attachments/assets/de360884-491a-4c47-917e-78a7fb579207)

Step 5:

![Image](https://github.com/user-attachments/assets/2d148490-bb44-417c-8d7d-7befa2d853e7)

Step 20:

![Image](https://github.com/user-attachments/assets/1bd7930a-ce78-4183-9d6b-1e6e9ebcc6fd)
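
To make the screenshots above easier to compare, the per-rank numbers could also be logged as text. The following is a minimal sketch, not code from the original run. It assumes PyTorch with CUDA and an already initialized `torch.distributed` process group (as set up by the usual launchers), and the helper names `summarize_rank_memory` and `gather_allocated_bytes` are hypothetical.

```python
from typing import Dict, List


def summarize_rank_memory(allocated_bytes: List[int]) -> Dict[str, float]:
    """Summarize per-rank memory (in bytes) as GiB figures plus the max-min spread."""
    gib = [b / 2**30 for b in allocated_bytes]
    return {
        "min_gib": min(gib),
        "max_gib": max(gib),
        "spread_gib": max(gib) - min(gib),
    }


def gather_allocated_bytes() -> List[int]:
    """Collect the peak allocated bytes from every rank.

    Assumption: the process group is initialized and each rank has already
    selected its own CUDA device (e.g. via torch.cuda.set_device).
    """
    # Imported lazily so the summary helper stays usable without torch.
    import torch
    import torch.distributed as dist

    local = torch.tensor(
        [torch.cuda.max_memory_allocated()], dtype=torch.int64, device="cuda"
    )
    gathered = [torch.zeros_like(local) for _ in range(dist.get_world_size())]
    dist.all_gather(gathered, local)  # every rank receives all values
    return [int(t.item()) for t in gathered]


def log_memory(step: int) -> None:
    """Print one line per call on rank 0 with the per-rank values and spread."""
    import torch.distributed as dist

    values = gather_allocated_bytes()
    if dist.get_rank() == 0:
        summary = summarize_rank_memory(values)
        per_rank = ", ".join(f"{v / 2**30:.2f}" for v in values)
        print(f"step {step}: per-rank GiB [{per_rank}], spread {summary['spread_gib']:.2f} GiB")
```

Calling `log_memory(step)` at steps 0, 5 and 20 would give numbers comparable to the screenshots. These figures count only tensors allocated through PyTorch, so they can be lower than what `nvidia-smi` reports, because the caching allocator reserves more memory than it has allocated and other processes on the same GPU are not included.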

### System Info

- trl = 0.14.0
- torch = 2.5.1+cuda12.4
- vllm = 0.7.1

### Checklist

- [x] I have checked that my issue isn't already filed (see [open issues](https://github.com/huggingface/trl/issues?q=is%3Aissue))
- [x] I have included my system information
- [x] Any code provided is minimal, complete, and reproducible ([more on MREs](https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/creating-and-highlighting-code-blocks))
- [x] Any code provided is properly formatted in code blocks, (no screenshot, [more on code blocks](https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/creating-and-highlighting-code-blocks))
- [x] Any traceback provided is complete
