Commit

Align with leaderboard score
chrisliu298 committed Sep 6, 2024
1 parent 97ff303 commit 9e37dc6
Showing 1 changed file with 10 additions and 9 deletions.
19 changes: 10 additions & 9 deletions README.md
@@ -35,14 +35,15 @@ During dataset curation, we adopt several tricks to achieve both performance imp

We evaluate our model on [RewardBench](https://huggingface.co/spaces/allenai/reward-bench) using the [official test script](https://github.com/allenai/reward-bench). As of September 2024, Skywork-Reward-Gemma-2-27B and Skywork-Reward-Llama-3.1-8B rank first and third on the RewardBench leaderboard.

-| Rank | Model | Chat | Chat Hard | Safety | Reasoning | Score |
-| :---: | --------------------------- | :---: | :-------: | :----: | :-------: | :---: |
-| 1 | Skywork-Reward-Gemma-2-27B | 95.8 | 91.4 | 92.0 | 96.1 | 93.8 |
-| 2 | SFR-LLaMa-3.1-70B-Judge-r | 96.9 | 84.8 | 92.2 | 97.6 | 92.8 |
-| 3 | Skywork-Reward-Llama-3.1-8B | 95.8 | 87.3 | 90.6 | 96.2 | 92.5 |
-| 4 | Nemotron-4-340B-Reward | 95.8 | 87.1 | 92.2 | 93.6 | 92.2 |
-| 5 | ArmoRM-Llama3-8B-v0.1 | 96.9 | 76.8 | 92.2 | 97.3 | 90.8 |
-| 6 | internlm2-20b-reward | 98.9 | 76.5 | 89.9 | 95.8 | 90.3 |
+| Rank | Model | Chat | Chat Hard | Safety | Reasoning | Score |
+| :---: | ------------------------------- | :---: | :-------: | :----: | :-------: | :---: |
+| 1 | Skywork-Reward-Gemma-2-27B | 95.8 | 91.4 | 92.0 | 96.1 | 93.8 |
+| 2 | SFR-LLaMa-3.1-70B-Judge-r | 96.9 | 84.8 | 92.2 | 97.6 | 92.8 |
+| 3 | Skywork-Reward-Llama-3.1-8B | 95.8 | 87.3 | 90.6 | 96.2 | 92.5 |
+| 4 | Nemotron-4-340B-Reward | 95.8 | 87.1 | 92.2 | 93.6 | 92.2 |
+| 5 | ArmoRM-Llama3-8B-v0.1 | 96.9 | 76.8 | 92.2 | 97.3 | 90.8 |
+| 6 | Salesforce/SFR-nemo-12B-Judge-r | 97.2 | 82.2 | 87.5 | 95.1 | 90.5 |
+| 7 | internlm2-20b-reward | 98.9 | 76.5 | 89.9 | 95.8 | 90.3 |

## Demo Code

@@ -128,4 +129,4 @@ If you find our work helpful, please feel free to cite us using the following Bi
howpublished={\url{https://huggingface.co/Skywork}},
url={https://huggingface.co/Skywork},
}
-```
+```
