Composition of reward model training data #409
Labels: question (Further information is requested)
Comments
Yes, it needs to be mixed.
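Thank you very much for your answer. May I also ask about the mixing ratio for the reward model data? Roughly what share is general-domain data and what share is medical data?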
10:1, with general data as the 10 (general : medical = 10 : 1).
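The 10:1 general-to-medical ratio from the reply above can be sketched as a simple sampling step before reward model training. This is a minimal illustration, not the project's own code: `mix_preference_data` and its parameters are hypothetical names, and each record is treated as an opaque dict (one preference pair), whatever fields the real dataset uses. All medical pairs are kept and general pairs are subsampled to about 10 per medical pair.

```python
import random
from typing import Dict, List


def mix_preference_data(
    general: List[Dict],
    medical: List[Dict],
    general_per_medical: int = 10,
    seed: int = 42,
) -> List[Dict]:
    """Hypothetical helper: mix general and medical preference pairs.

    Keeps every medical pair and samples up to `general_per_medical`
    general pairs per medical pair. If there are not enough general
    pairs, all of them are used (so the ratio ends up below 10:1).
    """
    rng = random.Random(seed)  # fixed seed for a reproducible mix
    n_general = min(len(general), general_per_medical * len(medical))
    mixed = rng.sample(general, n_general) + list(medical)
    rng.shuffle(mixed)  # interleave domains so batches are not single-domain
    return mixed


# Usage with dummy records (field names here are placeholders only).
general = [{"src": "general", "id": i} for i in range(1000)]
medical = [{"src": "medical", "id": i} for i in range(50)]
mixed = mix_preference_data(general, medical)
print(len(mixed), sum(r["src"] == "general" for r in mixed))
```

Subsampling the general data, rather than duplicating the medical data, keeps each medical pair seen once per epoch, which is one plausible way to reduce the overfitting described in the original question.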
Hello, could we discuss reward model training? If convenient, could you leave your contact information?
Hi, I only started looking into this recently and am not very familiar with it yet. If you like, you can add me on WeChat: Eren_139
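@shibing624 Does the reward model dataset not support ranked sets like InstructGPT, i.e. one prompt plus K responses ranked against each other? From what I can see, the data construction only handles preference-pair data.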
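One common workaround when a trainer only accepts preference pairs is to expand each ranking into pairwise comparisons, as InstructGPT does: a ranking of K responses yields K(K-1)/2 (chosen, rejected) pairs. The sketch below assumes responses are listed best first with no ties; `ranking_to_pairs` and the field names `question`, `response_chosen` and `response_rejected` are assumptions for illustration, so check them against the project's actual reward data format.

```python
from itertools import combinations
from typing import Dict, List


def ranking_to_pairs(prompt: str, ranked_responses: List[str]) -> List[Dict[str, str]]:
    """Hypothetical helper: expand one ranked list into all pairwise comparisons.

    `ranked_responses` must be ordered best first with no ties.
    Returns K * (K - 1) / 2 chosen/rejected pairs.
    """
    pairs = []
    # combinations yields index pairs (i, j) with i < j, so response i
    # is always ranked above response j and becomes the "chosen" one.
    for i, j in combinations(range(len(ranked_responses)), 2):
        pairs.append(
            {
                "question": prompt,
                "response_chosen": ranked_responses[i],
                "response_rejected": ranked_responses[j],
            }
        )
    return pairs


# Usage: 4 ranked responses expand into 6 preference pairs.
pairs = ranking_to_pairs("What causes a fever?", ["A", "B", "C", "D"])
for p in pairs:
    print(p["response_chosen"], ">", p["response_rejected"])
```

Note that pairs from the same prompt are strongly correlated. InstructGPT treats all comparisons from one prompt as a single batch element for that reason, so simply shuffling these pairs into the dataset may encourage overfitting.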
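Original question: When this project trained the medical reward model, did it use only medical-domain preference data, or was that data mixed with general-domain preference data? When I train a reward model on medical preference data alone, it overfits severely.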