clean repo
natolambert committed Feb 13, 2024
1 parent f441f9c commit 2fb495e
Showing 7 changed files with 42 additions and 133 deletions.
58 changes: 34 additions & 24 deletions README.md
@@ -19,47 +19,57 @@ Add the following to your `.bashrc`:
export HF_TOKEN="{your_token}"
```
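To sanity-check that the token is actually visible to downstream scripts after re-sourcing your shell, a minimal sketch (hypothetical helper, not part of this repo):

```python
import os

def get_hf_token() -> str:
    """Read the Hugging Face token from the environment, failing loudly if absent."""
    token = os.environ.get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is not set; add it to your .bashrc and re-source it.")
    return token

# Libraries such as huggingface_hub can then use it explicitly, e.g.:
#   from huggingface_hub import login
#   login(token=get_hf_token())
```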

### Older instructions
```
pip install -r requirements.txt
```
# Evaluating Models

If you run into issues, install `fastchat` partially (for `conversation.py`) along with the hub and dataset dependencies:
```
pip3 install "fschat[model_worker,webui]"
pip install huggingface_hub datasets
```
For reference configs, see `scripts/default_eval_configs.yaml`.
For reference on Chat Templates, many models follow the base / sft model terminology [here](https://github.com/lm-sys/FastChat/blob/main/fastchat/conversation.py).
A small model for debugging is available at `natolambert/gpt2-dummy-rm`.
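As a rough illustration of what a chat template does (a hand-rolled sketch of a Llama-2-style layout, not FastChat's actual implementation), the template renders role-tagged messages into a single prompt string before the reward model scores it:

```python
# Sketch only: real templates live in FastChat's conversation.py and differ per model.
def apply_llama2_style_template(messages):
    """Render a list of {"role", "content"} dicts into one prompt string."""
    prompt = ""
    for msg in messages:
        if msg["role"] == "user":
            prompt += f"[INST] {msg['content']} [/INST]"
        else:  # assistant turn
            prompt += f" {msg['content']} "
    return prompt

messages = [
    {"role": "user", "content": "What is a reward model?"},
    {"role": "assistant", "content": "A model that scores responses."},
]
print(apply_llama2_style_template(messages))
```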

The core scripts automatically evaluate our core evaluation set. To run these on [existing preference sets](https://huggingface.co/datasets/allenai/pref-test-sets), add the argument `--pref_sets`.

## Running Reward Models

### Models with chat templates
Note: the default `gpt2` debugging model has a randomly initialized reward head, which may cause numerical stability issues.
To run individual models with `scripts/run_rm.py`, use any of the following examples:
```
python scripts/run_rm.py --model=openbmb/UltraRM-13b --chat_template=billa --batch_size=8
python scripts/run_rm.py --model=OpenAssistant/oasst-rm-2.1-pythia-1.4b-epoch-2.5 --chat_template=oasst_pythia
python scripts/run_rm.py --model=OpenAssistant/oasst-rm-2-pythia-6.9b-epoch-1 --chat_template=oasst_pythia
python scripts/run_rm.py --model=OpenAssistant/reward-model-deberta-v3-large-v2 --chat_template=raw
python scripts/run_rm.py --model=weqweasdas/hh_rlhf_rm_open_llama_3b --chat_template=Robin
python scripts/run_rm.py --model=llm-blender/PairRM-hf
python scripts/run_rm.py --model=berkeley-nest/Starling-RM-7B-alpha --tokenizer=meta-llama/Llama-2-7b-chat-hf --chat_template=llama-2 --batch_size=16
python scripts/run_rm.py --model=stanfordnlp/SteamSHP-flan-t5-xl --batch_size=32
python scripts/run_rm.py --model=PKU-Alignment/beaver-7b-v1.0-reward --chat_template=pku-align --batch_size=16
python scripts/run_rm.py --model=PKU-Alignment/beaver-7b-v1.0-cost --chat_template=pku-align --batch_size=16
python scripts/run_rm.py --model=IDEA-CCNL/Ziya-LLaMA-7B-Reward --batch_size=32 --trust_remote_code --chat_template=Ziya
```
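The invocations above differ only in model, chat template, and batch size, so they are easy to script; a minimal sketch (hypothetical helper, not part of this repo) that builds and prints the commands before launching them:

```python
import shlex
import subprocess

# Hypothetical subset of (model, chat_template, batch_size) triples from above.
MODELS = [
    ("OpenAssistant/reward-model-deberta-v3-large-v2", "raw", 64),
    ("stanfordnlp/SteamSHP-flan-t5-xl", "tulu", 32),
]

def build_command(model: str, chat_template: str, batch_size: int) -> list:
    """Assemble the run_rm.py argv for one model."""
    return [
        "python", "scripts/run_rm.py",
        f"--model={model}",
        f"--chat_template={chat_template}",
        f"--batch_size={batch_size}",
    ]

for model, template, bs in MODELS:
    cmd = build_command(model, template, bs)
    print(shlex.join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment to actually launch each eval
```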

To run these models with AI2 infrastructure, run:
```
python scripts/submit_eval_jobs.py
```

## Running DPO Models

To run DPO models, pass both the policy model and its reference model:
```
python scripts/run_dpo.py --model=stabilityai/stablelm-zephyr-3b --ref_model=stabilityai/stablelm-3b-4e1t --batch_size=32
```
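For context on why a reference model is required: DPO-trained models expose an implicit reward, beta * (log pi_theta(y|x) - log pi_ref(y|x)), so scoring a preference pair needs log-probs from both models. A toy sketch with made-up numbers (the beta value here is hypothetical, not necessarily what `run_dpo.py` uses):

```python
BETA = 0.1  # hypothetical KL coefficient for illustration

def implicit_reward(policy_logprob: float, ref_logprob: float, beta: float = BETA) -> float:
    """DPO's implicit reward: beta * (log pi_theta(y|x) - log pi_ref(y|x))."""
    return beta * (policy_logprob - ref_logprob)

def prefers_chosen(chosen_policy, chosen_ref, rejected_policy, rejected_ref):
    """The chosen response wins when its implicit reward exceeds the rejected one's."""
    return implicit_reward(chosen_policy, chosen_ref) > implicit_reward(rejected_policy, rejected_ref)

# Toy sequence-level log-probs:
print(prefers_chosen(-12.0, -14.0, -15.0, -14.5))  # → True
```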

To run with the known test sets rather than our custom subsets, add the arg `--pref_sets`.

## Repository structure

```
├── README.md <- The top-level README for researchers using this project
├── analysis/ <- Directory of tools to analyze HERM results or other reward model properties
├── herm/ <- Core utils and modeling files
|   ├── models/            <- Standalone files for running existing reward models
|   └── *.py               <- HERM tools and utilities
├── scripts/ <- Scripts and configs to train and evaluate reward models
├── tests <- Unit tests
├── Dockerfile         <- Build file for reproducible and scalable research at AI2
├── LICENSE
├── Makefile <- Makefile with commands like `make style`
└── setup.py           <- Makes project pip installable (`pip install -e .`) so `herm` can be imported
```

## Maintenance

### Updating the docker image (consider removing this section when we publicly release HERM)
When updating this repo, the docker image should be rebuilt to include those changes.
For example, if you update `scripts/run_rm.py` and include a new package (or change a package version), you should rebuild the image and verify it still works on known models.

To update the image, run these commands in the root directory of this repo:
1. `docker build -t <local-image-name> . --platform linux/amd64`
98 changes: 0 additions & 98 deletions requirements.txt

This file was deleted.

6 changes: 6 additions & 0 deletions scripts/configs/README.md
@@ -0,0 +1,6 @@
# Configs for experiments

The following configs are supported:
1. `beaker_eval.yaml`: Config for internal AI2 tooling to correctly set up the compute environment.
2. `eval_configs.yaml`: Configs for models to reproduce results on `run_rm.py`/`run_dpo.py`.
3. [in progress] `training_configs.yaml`: Configs for training reward models.
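For reference, an entry in `eval_configs.yaml` has the following shape (values here mirror the UltraRM entry shown elsewhere in this diff):

```yaml
openbmb/UltraRM-13b:
  model: 'openbmb/UltraRM-13b'
  tokenizer: 'openbmb/UltraRM-13b'
  chat_template: 'billa'
  batch_size: 8
  trust_remote_code: False
```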
File renamed without changes.
@@ -4,21 +4,18 @@ openbmb/UltraRM-13b:
tokenizer: 'openbmb/UltraRM-13b'
chat_template: 'billa'
batch_size: 8
direct_load: True
trust_remote_code: False
OpenAssistant/oasst-rm-2.1-pythia-1.4b-epoch-2.5:
model: 'OpenAssistant/oasst-rm-2.1-pythia-1.4b-epoch-2.5'
tokenizer: 'OpenAssistant/oasst-rm-2.1-pythia-1.4b-epoch-2.5'
chat_template: 'oasst_pythia'
batch_size: 64
direct_load: True
trust_remote_code: False
OpenAssistant/oasst-rm-2-pythia-6.9b-epoch-1:
model: 'OpenAssistant/oasst-rm-2-pythia-6.9b-epoch-1'
tokenizer: 'OpenAssistant/oasst-rm-2-pythia-6.9b-epoch-1'
chat_template: 'oasst_pythia'
batch_size: 64
direct_load: True
trust_remote_code: False
OpenAssistant/reward-model-deberta-v3-large-v2:
model: 'OpenAssistant/reward-model-deberta-v3-large-v2'
@@ -39,40 +36,34 @@ llm-blender/PairRM-hf:
tokenizer: 'llm-blender/PairRM-hf'
chat_template: 'tulu'
batch_size: 64
direct_load: True
trust_remote_code: False
berkeley-nest/Starling-RM-7B-alpha:
model: 'berkeley-nest/Starling-RM-7B-alpha'
tokenizer: 'meta-llama/Llama-2-7b-chat-hf'
chat_template: 'llama-2'
batch_size: 16
direct_load: True
trust_remote_code: False
stanfordnlp/SteamSHP-flan-t5-xl:
model: 'stanfordnlp/SteamSHP-flan-t5-xl'
tokenizer: 'stanfordnlp/SteamSHP-flan-t5-xl'
chat_template: 'tulu'
batch_size: 32
direct_load: True
trust_remote_code: False
PKU-Alignment/beaver-7b-v1.0-reward:
model: 'PKU-Alignment/beaver-7b-v1.0-reward'
tokenizer: 'PKU-Alignment/beaver-7b-v1.0-reward'
chat_template: 'pku-align'
batch_size: 16
direct_load: True
trust_remote_code: False
PKU-Alignment/beaver-7b-v1.0-cost:
model: 'PKU-Alignment/beaver-7b-v1.0-cost'
tokenizer: 'PKU-Alignment/beaver-7b-v1.0-cost'
chat_template: 'pku-align'
batch_size: 16
direct_load: True
trust_remote_code: False
IDEA-CCNL/Ziya-LLaMA-7B-Reward:
model: 'IDEA-CCNL/Ziya-LLaMA-7B-Reward'
tokenizer: 'IDEA-CCNL/Ziya-LLaMA-7B-Reward'
chat_template: 'Ziya'
batch_size: 32
direct_load: True
trust_remote_code: True
File renamed without changes.
4 changes: 2 additions & 2 deletions scripts/submit_eval_jobs.py
@@ -21,9 +21,9 @@

today = date.today().strftime("%m%d%Y")

with open("beaker_configs/default_eval.yaml", "r") as f:
with open("scripts/configs/beaker_eval.yaml", "r") as f:
d1 = yaml.load(f.read(), Loader=yaml.FullLoader)
with open("scripts/default_eval_configs.yaml", "r") as f:
with open("scripts/configs/eval_configs.yaml", "r") as f:
configs = yaml.load(f.read(), Loader=yaml.FullLoader)
print(configs)

