clean repo
natolambert committed Feb 13, 2024
1 parent f441f9c commit 2fb495e
Showing 7 changed files with 42 additions and 133 deletions.
58 changes: 34 additions & 24 deletions README.md
@@ -19,47 +19,57 @@ Add the following to your `.bashrc`:
export HF_TOKEN="{your_token}"
```
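To sanity-check that the token is actually visible to downstream scripts after re-sourcing your shell, a minimal sketch (hypothetical helper, not part of this repo):

```python
import os

def get_hf_token() -> str:
    """Read the Hugging Face token from the environment, failing loudly if absent."""
    token = os.environ.get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is not set; add it to your .bashrc and re-source it.")
    return token

# Libraries such as huggingface_hub can then use it explicitly, e.g.:
#   from huggingface_hub import login
#   login(token=get_hf_token())
```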

### Older instructions
```
pip install -r requirements.txt
```
# Evaluating Models

If you run into issues, install `fastchat` partially (for `conversation.py`) along with the hub and dataset dependencies:
```
pip3 install "fschat[model_worker,webui]"
pip install huggingface_hub datasets
```
For reference configs, see `scripts/default_eval_configs.yaml`.
For reference on Chat Templates, many models follow the base / sft model terminology [here](https://github.com/lm-sys/FastChat/blob/main/fastchat/conversation.py).
A small model for debugging is available at `natolambert/gpt2-dummy-rm`.
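As a rough illustration of what a chat template does (a hand-rolled sketch of a Llama-2-style layout, not FastChat's actual implementation), the template renders role-tagged messages into a single prompt string before the reward model scores it:

```python
# Sketch only: real templates live in FastChat's conversation.py and differ per model.
def apply_llama2_style_template(messages):
    """Render a list of {"role", "content"} dicts into one prompt string."""
    prompt = ""
    for msg in messages:
        if msg["role"] == "user":
            prompt += f"[INST] {msg['content']} [/INST]"
        else:  # assistant turn
            prompt += f" {msg['content']} "
    return prompt

messages = [
    {"role": "user", "content": "What is a reward model?"},
    {"role": "assistant", "content": "A model that scores responses."},
]
print(apply_llama2_style_template(messages))
```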

The core scripts automatically evaluate our core evaluation set. To run these on [existing preference sets](https://huggingface.co/datasets/allenai/pref-test-sets), add the argument `--pref_sets`.

## Running Reward Models

### Models with chat templates
Note: the default `gpt2` debugging model has a randomly initialized reward head, which may cause numerical stability issues.
To run individual models with `scripts/run_rm.py`, use any of the following examples:
```
python scripts/run_rm.py --model=openbmb/UltraRM-13b --chat_template=billa --batch_size=8
python scripts/run_rm.py --model=OpenAssistant/oasst-rm-2.1-pythia-1.4b-epoch-2.5 --chat_template=oasst_pythia
python scripts/run_rm.py --model=OpenAssistant/oasst-rm-2-pythia-6.9b-epoch-1 --chat_template=oasst_pythia
python scripts/run_rm.py --model=OpenAssistant/reward-model-deberta-v3-large-v2 --chat_template=raw
python scripts/run_rm.py --model=weqweasdas/hh_rlhf_rm_open_llama_3b --chat_template=Robin
python scripts/run_rm.py --model=llm-blender/PairRM-hf
python scripts/run_rm.py --model=berkeley-nest/Starling-RM-7B-alpha --tokenizer=meta-llama/Llama-2-7b-chat-hf --chat_template=llama-2 --batch_size=16
python scripts/run_rm.py --model=stanfordnlp/SteamSHP-flan-t5-xl --batch_size=32
python scripts/run_rm.py --model=PKU-Alignment/beaver-7b-v1.0-reward --chat_template=pku-align --batch_size=16
python scripts/run_rm.py --model=PKU-Alignment/beaver-7b-v1.0-cost --chat_template=pku-align --batch_size=16
python scripts/run_rm.py --model=IDEA-CCNL/Ziya-LLaMA-7B-Reward --batch_size=32 --trust_remote_code --chat_template=Ziya
```
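The invocations above differ only in model, chat template, and batch size, so they are easy to script; a minimal sketch (hypothetical helper, not part of this repo) that builds and prints the commands before launching them:

```python
import shlex
import subprocess

# Hypothetical subset of (model, chat_template, batch_size) triples from above.
MODELS = [
    ("OpenAssistant/reward-model-deberta-v3-large-v2", "raw", 64),
    ("stanfordnlp/SteamSHP-flan-t5-xl", "tulu", 32),
]

def build_command(model: str, chat_template: str, batch_size: int) -> list:
    """Assemble the run_rm.py argv for one model."""
    return [
        "python", "scripts/run_rm.py",
        f"--model={model}",
        f"--chat_template={chat_template}",
        f"--batch_size={batch_size}",
    ]

for model, template, bs in MODELS:
    cmd = build_command(model, template, bs)
    print(shlex.join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment to actually launch each eval
```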

To run these models with AI2 infrastructure, run:
```
python scripts/submit_eval_jobs.py
```

## Running DPO Models

To run DPO models, pass both the policy model and its reference model:
```
python scripts/run_dpo.py --model=stabilityai/stablelm-zephyr-3b --ref_model=stabilityai/stablelm-3b-4e1t --batch_size=32
```
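For context on why a reference model is required: DPO-trained models expose an implicit reward, beta * (log pi_theta(y|x) - log pi_ref(y|x)), so scoring a preference pair needs log-probs from both models. A toy sketch with made-up numbers (the beta value here is hypothetical, not necessarily what `run_dpo.py` uses):

```python
BETA = 0.1  # hypothetical KL coefficient for illustration

def implicit_reward(policy_logprob: float, ref_logprob: float, beta: float = BETA) -> float:
    """DPO's implicit reward: beta * (log pi_theta(y|x) - log pi_ref(y|x))."""
    return beta * (policy_logprob - ref_logprob)

def prefers_chosen(chosen_policy, chosen_ref, rejected_policy, rejected_ref):
    """The chosen response wins when its implicit reward exceeds the rejected one's."""
    return implicit_reward(chosen_policy, chosen_ref) > implicit_reward(rejected_policy, rejected_ref)

# Toy sequence-level log-probs:
print(prefers_chosen(-12.0, -14.0, -15.0, -14.5))  # → True
```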

To run with the known test sets rather than our custom subsets, add the arg `--pref_sets`.

## Repository structure

```
├── README.md <- The top-level README for researchers using this project
├── analysis/ <- Directory of tools to analyze HERM results or other reward model properties
├── herm/ <- Core utils and modeling files
|   ├── models/            <- Standalone files for running existing reward models
|   └── *.py               <- HERM tools and utilities
├── scripts/ <- Scripts and configs to train and evaluate reward models
├── tests <- Unit tests
├── Dockerfile         <- Build file for reproducible and scalable research at AI2
├── LICENSE
├── Makefile <- Makefile with commands like `make style`
└── setup.py           <- Makes project pip installable (`pip install -e .`) so `herm` can be imported
```

## Maintenance

### Updating the docker image (consider removing this section when we publicly release HERM)
When updating this repo, the docker image should be rebuilt to include those changes.
For example, if you update `scripts/run_rm.py` and include a new package (or change a package version), you should rebuild the image and verify it still works on known models.

To update the image, run these commands in the root directory of this repo:
1. `docker build -t <local-image-name> . --platform linux/amd64`
98 changes: 0 additions & 98 deletions requirements.txt

This file was deleted.

6 changes: 6 additions & 0 deletions scripts/configs/README.md
@@ -0,0 +1,6 @@
# Configs for experiments

The following configs are supported:
1. `beaker_eval.yaml`: Config for internal AI2 tooling to correctly set up the compute environment.
2. `eval_configs.yaml`: Configs for models to reproduce results on `run_rm.py`/`run_dpo.py`.
3. [in progress] `training_configs.yaml`: Configs for training reward models.
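For reference, an entry in `eval_configs.yaml` has the following shape (values here mirror the UltraRM entry shown elsewhere in this diff):

```yaml
openbmb/UltraRM-13b:
  model: 'openbmb/UltraRM-13b'
  tokenizer: 'openbmb/UltraRM-13b'
  chat_template: 'billa'
  batch_size: 8
  trust_remote_code: False
```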
File renamed without changes.
@@ -4,21 +4,18 @@ openbmb/UltraRM-13b:
tokenizer: 'openbmb/UltraRM-13b'
chat_template: 'billa'
batch_size: 8
direct_load: True
trust_remote_code: False
OpenAssistant/oasst-rm-2.1-pythia-1.4b-epoch-2.5:
model: 'OpenAssistant/oasst-rm-2.1-pythia-1.4b-epoch-2.5'
tokenizer: 'OpenAssistant/oasst-rm-2.1-pythia-1.4b-epoch-2.5'
chat_template: 'oasst_pythia'
batch_size: 64
direct_load: True
trust_remote_code: False
OpenAssistant/oasst-rm-2-pythia-6.9b-epoch-1:
model: 'OpenAssistant/oasst-rm-2-pythia-6.9b-epoch-1'
tokenizer: 'OpenAssistant/oasst-rm-2-pythia-6.9b-epoch-1'
chat_template: 'oasst_pythia'
batch_size: 64
direct_load: True
trust_remote_code: False
OpenAssistant/reward-model-deberta-v3-large-v2:
model: 'OpenAssistant/reward-model-deberta-v3-large-v2'
@@ -39,40 +36,34 @@ llm-blender/PairRM-hf:
tokenizer: 'llm-blender/PairRM-hf'
chat_template: 'tulu'
batch_size: 64
direct_load: True
trust_remote_code: False
berkeley-nest/Starling-RM-7B-alpha:
model: 'berkeley-nest/Starling-RM-7B-alpha'
tokenizer: 'meta-llama/Llama-2-7b-chat-hf'
chat_template: 'llama-2'
batch_size: 16
direct_load: True
trust_remote_code: False
stanfordnlp/SteamSHP-flan-t5-xl:
model: 'stanfordnlp/SteamSHP-flan-t5-xl'
tokenizer: 'stanfordnlp/SteamSHP-flan-t5-xl'
chat_template: 'tulu'
batch_size: 32
direct_load: True
trust_remote_code: False
PKU-Alignment/beaver-7b-v1.0-reward:
model: 'PKU-Alignment/beaver-7b-v1.0-reward'
tokenizer: 'PKU-Alignment/beaver-7b-v1.0-reward'
chat_template: 'pku-align'
batch_size: 16
direct_load: True
trust_remote_code: False
PKU-Alignment/beaver-7b-v1.0-cost:
model: 'PKU-Alignment/beaver-7b-v1.0-cost'
tokenizer: 'PKU-Alignment/beaver-7b-v1.0-cost'
chat_template: 'pku-align'
batch_size: 16
direct_load: True
trust_remote_code: False
IDEA-CCNL/Ziya-LLaMA-7B-Reward:
model: 'IDEA-CCNL/Ziya-LLaMA-7B-Reward'
tokenizer: 'IDEA-CCNL/Ziya-LLaMA-7B-Reward'
chat_template: 'Ziya'
batch_size: 32
direct_load: True
trust_remote_code: True
File renamed without changes.
4 changes: 2 additions & 2 deletions scripts/submit_eval_jobs.py
@@ -21,9 +21,9 @@

today = date.today().strftime("%m%d%Y")

with open("beaker_configs/default_eval.yaml", "r") as f:
with open("scripts/configs/beaker_eval.yaml", "r") as f:
d1 = yaml.load(f.read(), Loader=yaml.FullLoader)
with open("scripts/default_eval_configs.yaml", "r") as f:
with open("scripts/configs/eval_configs.yaml", "r") as f:
configs = yaml.load(f.read(), Loader=yaml.FullLoader)
print(configs)

