On the clevr_cogen_a_train dataset, the result in <think> is inconsistent with the result in <answer> #93

xjx0524 · 2025-02-14T05:36:33Z

Content: <think>
The image contains a few distinct objects:

A large purple cylinder.

A small purple cylinder.

A matte yellow cube.

A small gray metallic sphere.

All of these objects appear to be separate and distinct without overlap or attachment.
</think>
<answer>
5
</answer>
Solution: <answer> 5 </answer>

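I ran GRPO using the example code. Training is about 20% done so far, and I've noticed that in many cases the count in <think> is wrong but the <answer> is correct. How should this be understood?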


Racktic · 2025-02-14T16:48:07Z

Same observation! It seems the model just happens to guess the correct answer.

austingg · 2025-02-17T07:03:25Z

There is no supervision signal for the content of the reasoning, only format_reward and accuracy_reward. If you look at DeepSeek R1's CoT, it also contains mistakes when you ask it math problems.
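To make this concrete, here is a minimal sketch of what a format reward and an accuracy reward of this kind could look like. The regexes and function bodies are assumptions for illustration, not the repository's actual implementation; only the names format_reward and accuracy_reward come from the comment above. Applied to the example at the top of this issue, both rewards come out as 1.0 even though the <think> block lists only four objects, because nothing ever reads the reasoning text.

```python
import re

# Hypothetical sketch: the real reward functions in the repo may differ.
# Format check: <think>...</think> followed by <answer>...</answer>, nothing else.
FORMAT_PATTERN = re.compile(r"^<think>.*?</think>\s*<answer>.*?</answer>$", re.DOTALL)
ANSWER_PATTERN = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def format_reward(completion: str) -> float:
    """Return 1.0 if the completion has the expected tag structure, else 0.0."""
    return 1.0 if FORMAT_PATTERN.match(completion.strip()) else 0.0


def accuracy_reward(completion: str, solution: str) -> float:
    """Compare only the <answer> contents; the <think> part is never inspected."""
    pred = ANSWER_PATTERN.search(completion)
    gold = ANSWER_PATTERN.search(solution)
    if pred is None or gold is None:
        return 0.0
    return 1.0 if pred.group(1).strip() == gold.group(1).strip() else 0.0


# The completion from this issue: four objects in <think>, but "5" in <answer>.
completion = """<think>
The image contains a few distinct objects:
A large purple cylinder.
A small purple cylinder.
A matte yellow cube.
A small gray metallic sphere.
</think>
<answer>
5
</answer>"""
solution = "<answer> 5 </answer>"

print(format_reward(completion), accuracy_reward(completion, solution))  # 1.0 1.0
```

Since the total reward is identical whether or not the reasoning agrees with the answer, GRPO has no gradient pressure toward consistent CoT; a correct final number is all that gets reinforced.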

m-Just · 2025-02-17T08:31:40Z

Because the pre-trained model is already pretty good at directly answering such questions without CoT. See #72.

@austingg That's interesting. Would you mind sharing some examples where DeepSeek R1 produces a wrong CoT but a correct answer? Or could you point me to where I can find such examples?
