Support long context dataset accuracy measurement. #230
The results you pasted -- are they from an actual benchmark run on 405b? Can you paste the full results (as a screenshot or paste link)?
No, this is from a mock run. I am working on the actual benchmarking.
Sounds good.
91fbb46 to 877f8b2
    return {"exact_match": round(score, 2)}


def qa_em(label, pred):
Please add a docstring to each function.
Done
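For illustration only, a documented exact-match helper in the spirit of this review request might look like the sketch below. Only the `qa_em(label, pred)` signature and the `{"exact_match": round(score, 2)}` return shape appear in the diff; the function bodies, the `normalize_answer` and `exact_match` helpers, and the percentage scaling are assumptions, not the code in this PR.

```python
import re
import string


def normalize_answer(text):
    """Lowercase, strip punctuation and English articles, collapse whitespace.

    Hypothetical helper: the PR's actual normalization is not shown.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def qa_em(label, pred):
    """Score one prediction by exact match.

    Args:
        label: Reference answer string.
        pred: Model-generated answer string.

    Returns:
        1.0 if the normalized strings are identical, otherwise 0.0.
    """
    return float(normalize_answer(label) == normalize_answer(pred))


def exact_match(labels, preds):
    """Average qa_em over a dataset and report it as a percentage.

    The 0-100 scaling is an assumption; an empty dataset scores 0.0.
    """
    if not labels:
        return {"exact_match": 0.0}
    score = 100.0 * sum(qa_em(l, p) for l, p in zip(labels, preds)) / len(labels)
    return {"exact_match": round(score, 2)}
```

Keeping the per-example scorer separate from the aggregation makes each docstring short and lets the scorer be tested on single pairs.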
4a56d13 to d0ef346
Looks good
The result should be like: