
Commit 9b60364

Merge pull request meta-llama#292 from Vasanthrs-dev/patch-2
Update eval_details.md
2 parents 18f515a + c7c6c2e

File tree

1 file changed: +1 −1 lines changed


eval_details.md

+1 −1
@@ -3,7 +3,7 @@ This document contains additional context on the settings and parameters for how
 ### Auto-eval benchmark notes
 #### MMLU
 - We are reporting macro averages for MMLU benchmarks. The micro average numbers for MMLU are: 65.4 and 67.4 for the 8B pre-trained and instruct-aligned models, 78.9 and 82.0 for the 70B pre-trained and instruct-aligned models
-- The pre-trained models are evaluated in the standard way by calualting the likelihood of each choice character. For the instruct-aligned models, we use a dialogue prompt (*user/assistant*) for the shots and ask the model to generate the best choice character as answer.
+- The pre-trained models are evaluated in the standard way by calculating the likelihood of each choice character. For the instruct-aligned models, we use a dialogue prompt (*user/assistant*) for the shots and ask the model to generate the best choice character as answer.
 #### AGI English
 - We use the default few-shot and prompt settings as specified [here](https://github.com/ruixiangcui/AGIEval). The score is averaged over the english subtasks.
 #### CommonSenseQA
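The MMLU note in the hunk above distinguishes macro averages (reported) from micro averages. The difference can be sketched with toy numbers; the subject names and counts below are illustrative placeholders, not the actual benchmark data or the reported scores:

```python
# Hypothetical per-subject results as (num_correct, num_questions).
# These subjects and counts are made up for illustration only.
subjects = {
    "abstract_algebra": (60, 100),
    "anatomy": (90, 135),
    "astronomy": (120, 150),
}

# Micro average: pool all questions across subjects, then divide once.
# Larger subjects therefore carry more weight.
total_correct = sum(c for c, _ in subjects.values())
total_questions = sum(n for _, n in subjects.values())
micro = total_correct / total_questions

# Macro average: compute accuracy per subject first, then take the
# unweighted mean, so every subject counts equally regardless of size.
per_subject = [c / n for c, n in subjects.values()]
macro = sum(per_subject) / len(per_subject)

print(f"micro={micro:.4f} macro={macro:.4f}")
```

The two averages only coincide when every subject has the same number of questions, which is why the document reports both sets of numbers separately.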
