
On the clevr_cogen_a_train dataset, the think result and the answer result are inconsistent #93

Open
xjx0524 opened this issue Feb 14, 2025 · 3 comments


xjx0524 commented Feb 14, 2025

Content: <think>
The image contains a few distinct objects:

  1. A large purple cylinder.
  2. A small purple cylinder.
  3. A matte yellow cube.
  4. A small gray metallic sphere.

All of these objects appear to be separate and distinct without overlap or attachment.
</think>
<answer>
5
</answer>
Solution: <answer> 5 </answer>

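I ran GRPO following the example code. Training is currently about 20% complete, and I have noticed many cases where the count in the think block is wrong but the answer is correct. How should this be understood?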
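To gauge how often this happens, one option is to compare the number of objects enumerated in the think block with the integer in the answer block. The following is a minimal sketch, assuming completions follow the `<think>...</think><answer>...</answer>` layout shown above and that the reasoning lists objects as a numbered list. The helper names (`extract_block`, `count_listed_objects`, `think_answer_mismatch`) are hypothetical and not part of the project's code.

```python
import re
from typing import Optional


def extract_block(tag: str, completion: str) -> Optional[str]:
    """Return the stripped text inside <tag>...</tag>, or None if absent."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>", completion, flags=re.DOTALL)
    return match.group(1).strip() if match else None


def count_listed_objects(think_text: str) -> int:
    """Count numbered list items such as '1. A large purple cylinder.'"""
    return len(re.findall(r"^[ \t]*\d+\.[ \t]+\S", think_text, flags=re.MULTILINE))


def think_answer_mismatch(completion: str) -> bool:
    """True when the reasoning lists a different number of objects than the answer."""
    think = extract_block("think", completion)
    answer = extract_block("answer", completion)
    if think is None or answer is None:
        return False  # malformed output; nothing to compare
    try:
        answer_count = int(answer)
    except ValueError:
        return False  # non-numeric answer; this heuristic does not apply
    listed = count_listed_objects(think)
    return listed > 0 and listed != answer_count


# The completion quoted in this issue.
sample = """<think>
The image contains a few distinct objects:

  1. A large purple cylinder.
  2. A small purple cylinder.
  3. A matte yellow cube.
  4. A small gray metallic sphere.

All of these objects appear to be separate and distinct without overlap or attachment.
</think>
<answer>
5
</answer>"""

print(count_listed_objects(extract_block("think", sample)))
print(think_answer_mismatch(sample))
```

This only catches reasoning written as a numbered list, so it is a rough lower bound on the mismatch rate rather than an exact measure.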


Racktic commented Feb 14, 2025

Same observation! It seems the model, surprisingly, just guesses the correct answer.

austingg commented

There is no supervised signal for the content of the reasoning, only format_reward and accuracy_reward. If you look at DeepSeek R1's CoT, it also contains mistakes when you ask it math problems.
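To illustrate the point, here is a simplified sketch of what a format reward and an accuracy reward could look like. These are stand-ins written for this thread, named after the two rewards mentioned above; the regular expression and the exact-match comparison are assumptions, not the repository's actual implementation. Neither function inspects what is written inside the think block.

```python
import re


def format_reward(completion: str) -> float:
    """1.0 if the output is exactly <think>...</think><answer>...</answer>, else 0.0."""
    pattern = r"\s*<think>.*?</think>\s*<answer>.*?</answer>\s*"
    return 1.0 if re.fullmatch(pattern, completion, flags=re.DOTALL) else 0.0


def _answer_of(text: str) -> str:
    """Extract the text inside <answer>...</answer>, falling back to the raw text."""
    match = re.search(r"<answer>(.*?)</answer>", text, flags=re.DOTALL)
    return match.group(1).strip() if match else text.strip()


def accuracy_reward(completion: str, solution: str) -> float:
    """1.0 if the final answer matches the solution; the reasoning is ignored."""
    return 1.0 if _answer_of(completion) == _answer_of(solution) else 0.0


solution = "<answer> 5 </answer>"

# Reasoning that lists only four objects, followed by the correct answer.
wrong_reasoning = "<think>\n1. A\n2. B\n3. C\n4. D\n</think>\n<answer>\n5\n</answer>"
# Reasoning that lists five objects, followed by the same answer.
right_reasoning = "<think>\n1. A\n2. B\n3. C\n4. D\n5. E\n</think>\n<answer>\n5\n</answer>"

for completion in (wrong_reasoning, right_reasoning):
    total = format_reward(completion) + accuracy_reward(completion, solution)
    print(total)
```

Under these assumptions both completions receive the same total reward, so the optimizer has no incentive to prefer the one whose reasoning agrees with its answer.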


m-Just commented Feb 17, 2025

Because the pre-trained model is already pretty good at directly answering such questions without CoT. See #72.

@austingg That's interesting. Would you mind sharing some examples of DeepSeek R1 CoT that is wrong but still leads to a correct answer? Or could you point me to where I can find such examples?


4 participants