Issue on reward calculation #11

Luodian · 2025-02-07T03:30:15Z

A kind reminder about the issue from R1-V, posted here in case someone runs into a similar problem.

Sorry for making this mistake in our initial codebase. It may have contributed to our failed trial, as we
explained here:

0xvincii · 2025-02-07T11:49:10Z

We found that the problem is related to the KL computation: the policy and the reference policy do not receive the image as input. However, the consequences are still unclear (the results were actually fairly good in our experiments?), so we are waiting for more tests.
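
For readers trying to picture the failure mode, here is a minimal sketch, not the R1-V code. It assumes the per-token KL penalty uses the `exp(d) - d - 1` estimator that is common in GRPO-style trainers (the thread does not say which form is used), and `toy_token_logps` is a hypothetical stand-in for a vision-language model forward pass. It only shows that the KL term changes when the image is dropped from both forward passes.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over a list of floats.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def toy_token_logps(weights, token_ids, image_feature=None):
    # Hypothetical toy "model": vocabulary logits are `weights`, and an
    # image feature (if given) shifts them. Omitting it mimics running
    # the model text-only, as described in the comment above.
    logits = list(weights)
    if image_feature is not None:
        logits = [w + image_feature * (i + 1) for i, w in enumerate(weights)]
    logps = log_softmax(logits)
    return [logps[t] for t in token_ids]

def per_token_kl(policy_logps, ref_logps):
    # Assumed estimator: exp(d) - d - 1 with d = ref - policy (non-negative).
    return [math.exp(r - p) - (r - p) - 1.0
            for p, r in zip(policy_logps, ref_logps)]

policy_w = [0.2, -0.1, 0.5]   # made-up policy parameters
ref_w = [0.0, 0.0, 0.0]       # made-up reference parameters
tokens = [0, 2, 1]            # made-up completion token ids
image = 0.7                   # made-up scalar "image feature"

# Intended behaviour: both models see the same image.
kl_with_image = per_token_kl(
    toy_token_logps(policy_w, tokens, image),
    toy_token_logps(ref_w, tokens, image),
)

# Behaviour described in this issue: neither model sees the image.
kl_without_image = per_token_kl(
    toy_token_logps(policy_w, tokens),
    toy_token_logps(ref_w, tokens),
)

print(kl_with_image)
print(kl_without_image)
```

In this toy setup the two KL lists differ, so the penalty no longer measures divergence on the inputs the policy was actually sampled with. Whether that helps or hurts training is exactly what remains open above.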
