@@ -79,6 +82,14 @@ python setup_data.py # populates saves/eval with evaluation results of the uploa
---

+
+### 🔄 Updated TOFU benchmark
+
+We've updated OpenUnlearning's TOFU benchmark target models to use a wider variety of newer architectures, with sizes ranging from 1B to 8B. These include LLaMA 3.2 1B, LLaMA 3.2 3B, LLaMA 3.1 8B, and the original LLaMA-2 7B from [the old version of TOFU](https://github.com/locuslab/tofu).
+
+For each architecture, we finetuned models on four different splits of the TOFU dataset: `full`, `retain90`, `retain95`, and `retain99`, for a total of 16 finetuned models. The `full` model serves as the target (the base model for unlearning), and the rest are retain models used as reference points for each forget split. These models are available on [HuggingFace](https://huggingface.co/collections/open-unlearning/tofu-new-models-67bcf636334ea81727573a9f0), and their paths can be set in the experiment configs or via command-line overrides.
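As a rough sketch, a run can be pointed at one of these models with a command-line override; the entry-point script, experiment name, and model repository ID below are assumptions, so substitute the actual paths from the HuggingFace collection or a local checkpoint:

```bash
# Hypothetical example: start a TOFU unlearning run from one of the new finetuned target models.
# src/train.py, the experiment name, and the HuggingFace model ID are assumptions; adjust to your setup.
python src/train.py experiment=unlearn/tofu/default \
  model=Llama-3.2-3B-Instruct \
  model.model_args.pretrained_model_name_or_path=open-unlearning/tofu_Llama-3.2-3B-Instruct_full
```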
+
+---
+
## 🧪 Running Experiments
We provide an easily configurable interface for running evaluations by leveraging Hydra configs. For more detailed documentation on running experiments, commonly overridden arguments, interfacing with configurations, distributed training, and simple finetuning of models, refer to [`docs/experiments.md`](docs/experiments.md).
- `experiment` - Path to the evaluation configuration [`configs/experiment/eval/tofu/default.yaml`](configs/experiment/eval/tofu/default.yaml).
- `model` - Sets up the model and tokenizer configs for the `Llama-3.2-1B-Instruct` model.
- `model.model_args.pretrained_model_name_or_path` - Overrides the default experiment config to evaluate a model from a HuggingFace ID (can use a local model checkpoint path as well).
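Putting these overrides together, an evaluation command might look roughly like the sketch below; the entry-point script, model ID, and `task_name` value are assumptions, so adjust them to your setup.

```bash
# A sketch of a TOFU evaluation run; the script path and HuggingFace model ID are assumptions.
python src/eval.py experiment=eval/tofu/default \
  model=Llama-3.2-1B-Instruct \
  model.model_args.pretrained_model_name_or_path=open-unlearning/tofu_Llama-3.2-1B-Instruct_full \
  task_name=tofu_eval_example
```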
For more details about creating and running evaluations, refer to [`docs/evaluation.md`](docs/evaluation.md).
### 📜 Running Baseline Experiments
The scripts below execute standard baseline unlearning experiments on the TOFU and MUSE datasets, evaluated using their corresponding benchmarks. The expected results for these are in [`docs/results.md`](docs/results.md).
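For reference, these are the baseline scripts listed in [`docs/results.md`](docs/results.md):

```bash
bash scripts/tofu_unlearn.sh
bash scripts/muse_unlearn.sh
```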
@@ -130,7 +142,7 @@ Adding a new component (trainer, evaluation metric, benchmark, model, or dataset
Please feel free to raise a pull request for any new features after setting up the environment in development mode.
```bash
-pip install .[flash-attn, dev]
+pip install .[dev]
```
## 📚 Further Documentation
@@ -152,11 +164,7 @@ Developed and maintained by Vineeth Dorna ([@Dornavineeth](https://github.com/Do
If you encounter any issues or have questions, feel free to raise an issue in the repository 🛠️.

-## 📝 Citation
-
-This repo is inspired from [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory). We acknowledge the [TOFU](https://github.com/locuslab/tofu) and [MUSE](https://github.com/jaechan-repo/muse_bench) benchmarks, which served as the foundation for our re-implementation.

+## 📝 Citing this work
If you use OpenUnlearning in your research, please cite:
@@ -176,7 +184,7 @@ If you use OpenUnlearning in your research, please cite:
}
```
<details>
-<summary>To cite other benchmarks used from OpenUnlearning</summary>
+<summary>Expand for BibTeX to cite other benchmarks used in OpenUnlearning</summary>
```bibtex
@article{shi2024muse,
@@ -188,8 +196,14 @@ If you use OpenUnlearning in your research, please cite:
```
</details>

+---
+
+### 🤝 Acknowledgments
+
+- This repo is inspired by [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory).
+- The [TOFU](https://github.com/locuslab/tofu) and [MUSE](https://github.com/jaechan-repo/muse_bench) benchmarks served as the foundation for our re-implementation.
---

-## 📄 License
+### 📄 License
This project is licensed under the MIT License. See the [`LICENSE`](LICENSE) file for details.
docs/components.md
@@ -142,10 +142,10 @@ A benchmark, aggregates various evaluation metrics into a suite, e.g. TOFU, MUSE
## Model
-To add a new model:
+To add a new model architecture:
### Implement and register a handler
-For all the models currently supported, HuggingFace's `AutoModelForCausalLM` and `AutoTokenizer` are used, and therefore the user doesn't need to add or register any handler.
+For all the models currently supported, HuggingFace's `AutoModelForCausalLM` and `AutoTokenizer` are used, and therefore the user doesn't need to create or register any handler.
__Note__: Currently, we do not support loading models modified with LoRA and related variants. If you wish to use such features, please define and register model handlers for this logic in [`src/model`](../src/model) and provide the config info as discussed next.
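As a rough sketch of what such a config might contain (the file path and all keys other than `model_args.pretrained_model_name_or_path` are assumptions based on common patterns, not confirmed by these docs):

```yaml
# Hypothetical sketch of a model config, e.g. configs/model/Llama-3.2-1B-Instruct.yaml.
# Only model_args.pretrained_model_name_or_path is referenced elsewhere in these docs;
# the remaining keys are assumptions.
model_args:
  pretrained_model_name_or_path: meta-llama/Llama-3.2-1B-Instruct
  torch_dtype: bfloat16
tokenizer_args:
  pretrained_model_name_or_path: meta-llama/Llama-3.2-1B-Instruct
```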
@@ -233,7 +233,12 @@ defaults: # load pre-defined configs for model, trainer, data format, datasets e
- override /eval: tofu
# Now, we have to further modify specific arguments from the defaults imported above
-# This enables to easily run multiple experiments varying hyper paramters, data splits, models etc
+# This enables easily running multiple experiments varying hyperparameters, data splits, models, etc.
+
+model:
+  model_args: # use our finetuned target models for the TOFU benchmark task
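    # e.g. (hypothetical model ID; substitute the actual HuggingFace repo or a local checkpoint path)
    # pretrained_model_name_or_path: open-unlearning/tofu_Llama-3.2-1B-Instruct_full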
- `--config-name=eval.yaml` - sets the task to [`configs/eval.yaml`](../configs/eval.yaml)
- `experiment=eval/tofu/default` - sets the experiment to use [`configs/eval/tofu/default.yaml`](../configs/eval/tofu/default.yaml)
- `model=Llama-3.2-3B-Instruct` - overrides the default (`Llama-3.2-1B-Instruct`) model config to use [`configs/model/Llama-3.2-3B-Instruct`](../configs/model/Llama-3.2-3B-Instruct.yaml).
Run the MUSE-Books benchmark evaluation on a checkpoint of a Phi-3.5 model:
**Note:** Evaluation runs are designed to work on only a single GPU (this includes running evaluation during training). To run an evaluation job, modify your command to make only one GPU visible (assuming one GPU is enough for inference):
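A minimal sketch of such a command; the entry-point script, experiment name, and checkpoint path are assumptions, so adjust them to your setup:

```bash
# Make only one GPU visible, then run the MUSE-Books evaluation on a Phi-3.5 checkpoint.
# The script path, experiment name, and checkpoint location are assumptions.
CUDA_VISIBLE_DEVICES=0 python src/eval.py experiment=eval/muse/default \
  data_split=Books \
  model=Phi-3.5-mini-instruct \
  model.model_args.pretrained_model_name_or_path=saves/finetune/phi3.5_muse_books_checkpoint
```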
docs/hydra.md
@@ -14,6 +14,7 @@ defaults:
# , setting up data structure for loading data during unlearning
- override /eval: muse # loads MUSE evaluation suite from eval/muse.yaml into the eval attribute

+# define variables
data_split: News
forget_split: forget
retain_split: retain1
@@ -54,23 +55,25 @@ trainer:
# optim: paged_adamw_32bit
# optim: adamw_torch

-task_name: ???
+task_name: ??? # "???" raises an error if this attribute is not set
```
+- **Structure & Attribute Access:** Configs are written in YAML and structured hierarchically like a dictionary. Attributes are accessed using dot notation: in code, `cfg.model.args.learning_rate`; on the command line, `model.args.learning_rate=1e-5`.

-- **Defaults & Overrides:** Base configurations are overridden using the `defaults` list.
+- **Defaults & Overrides:** Config files are included in one another using `defaults` lists and `override` directives.
+
+- **Command-Line Overrides:** Any parameter can be overridden directly from the command line. For instance:
- **Package Directives:** The `# @package` directive organizes configurations into namespaces for cleaner composition and specifies the configuration path. At the head of a YAML file, you might see directives like `# @package _global_` or more specific ones such as `# @package eval.muse.metrics.forget_knowmem_ROUGE` which inform Hydra exactly where the configuration parameters should be placed within the final composed config.
For example, refer to [`configs/eval/muse_metrics/forget_knowmem_ROUGE.yaml`](../configs/eval/muse_metrics/forget_knowmem_ROUGE.yaml).
- **Variable Substitution:** Variables are defined once and reused using the `${}` syntax:
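  A minimal sketch of how this looks in an experiment config; the attribute paths below are illustrative, not the exact keys:

```yaml
# Define a variable once near the top of the experiment config ...
forget_split: forget

# ... and reuse it below with ${}; the nesting here is illustrative.
eval:
  forget_split: ${forget_split}
task_name: muse_unlearn_${forget_split}   # substitution also works inside longer strings
```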
To understand the structure of an evaluation config and the available parameters for overriding, refer to: [`configs/experiment/examples/tofu_eval.yaml`](../configs/experiment/examples/tofu_eval.yaml).
docs/results.md
@@ -4,7 +4,7 @@
</div>

The scripts below execute standard baseline unlearning experiments on the TOFU and MUSE datasets, evaluated using their corresponding benchmarks.
```bash
bash scripts/tofu_unlearn.sh
bash scripts/muse_unlearn.sh
@@ -27,7 +27,8 @@ __Note:__
2. NPO in MUSE: for NPO, the MUSE implementation is inconsistent with the [original paper](https://github.com/licong-lin/negative-preference-optimization) as discussed [here](https://github.com/jaechan-repo/muse_bench/issues/2). This inconsistency is carried over into implementations like [SimNPO](https://github.com/OPTML-Group/Unlearn-Simple/issues/5). Here, we use the original NPO implementation with the same loss function expression across datasets.

-### TOFU unlearning on `Llama-2-7b-hf-chat`
+
+### TOFU unlearning on the `Llama-2-7b-hf-chat` architecture
<div style="overflow-x: auto; max-width: 100%;">
<table class="dataframe">
@@ -144,7 +145,7 @@ __Note:__
</div>

-### TOFU unlearning on `Llama-3.2-1B-Instruct`
+### TOFU unlearning on the `Llama-3.2-1B-Instruct` architecture
<div style="overflow-x: auto; max-width: 100%;">
<table class="dataframe">
@@ -261,7 +262,7 @@ __Note:__
</div>

-### MUSE unlearning on `Llama-2-7b-hf`
+### MUSE unlearning on the benchmark's target models