Issue on reward calculation #11

Open
Luodian opened this issue Feb 7, 2025 · 1 comment
Comments

Contributor

Luodian commented Feb 7, 2025

A reminder about the issue from R1-V, posted here in case anyone runs into a similar problem:

Deep-Agent/R1-V#20

Sorry for making this mistake in our initial codebase. It may have caused our failed trial, as we explained here:

Image


0xvincii commented Feb 7, 2025

We found that the problem is related to the KL computation: the policy and the reference policy do not receive the image as input. However, the consequences are still unclear (the results look fairly good in our experiments?); we are waiting for more tests.
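To make the point above concrete, here is a minimal toy sketch, not the code from this repository. It assumes the per-token KL penalty is the commonly used estimator `exp(ref - pol) - (ref - pol) - 1` on log-probabilities. The function `toy_logps`, its `sharpness` parameter, and the fixed 0.25 fallback probability are hypothetical stand-ins for a real model forward pass; they only illustrate that the KL term changes depending on whether the image is passed to both models.

```python
import math


def per_token_kl(policy_logps, ref_logps):
    """Per-token estimate of KL(policy || ref): exp(d) - d - 1 with d = ref - policy."""
    kl = []
    for lp, lr in zip(policy_logps, ref_logps):
        diff = lr - lp
        kl.append(math.exp(diff) - diff - 1.0)
    return kl


def toy_logps(tokens, image, sharpness):
    """Hypothetical stand-in for a forward pass that returns per-token log-probs.

    If an image (modelled here as a set of visible words) is given, tokens that
    appear in it get probability `sharpness`; every other token gets 0.25.
    """
    logps = []
    for tok in tokens:
        grounded = image is not None and tok in image
        logps.append(math.log(sharpness if grounded else 0.25))
    return logps


tokens = ["cat", "sits"]
image = {"cat"}

# Text-only conditioning: in this toy, both models ignore the image and agree.
kl_text_only = per_token_kl(
    toy_logps(tokens, None, sharpness=0.9),
    toy_logps(tokens, None, sharpness=0.6),
)

# Image passed to both the policy and the reference policy.
kl_with_image = per_token_kl(
    toy_logps(tokens, image, sharpness=0.9),
    toy_logps(tokens, image, sharpness=0.6),
)

print("KL without image:", kl_text_only)
print("KL with image:   ", kl_with_image)
```

In this toy the text-only KL is zero for every token, while the image-conditioned KL is positive on the grounded token. A real model will not behave this cleanly, but the sketch shows why a KL term computed without the image is not penalizing the same distributions that generated the completions.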
