docs: Rewrite PEFT integration guide with comprehensive examples #4421
base: main
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
sergiopaniego left a comment:
Thanks for the update!! Super detailed 😄
> The notebooks and scripts in these examples show how to use Low Rank Adaptation (LoRA) to fine-tune models in a memory efficient manner. Most of PEFT methods supported in peft library but note that some PEFT methods such as Prompt tuning are not supported.
> For more information on LoRA, see the [original paper](https://huggingface.co/papers/2106.09685).
> TRL supports [PEFT](https://github.com/huggingface/peft) (Parameter-Efficient Fine-Tuning) methods for memory-efficient model training. PEFT enables fine-tuning large language models by training only a small number of additional parameters while keeping the base model frozen, significantly reducing computational costs and memory requirements.
You can add somewhere a link to this example notebook: https://github.com/huggingface/trl/blob/main/examples/notebooks/sft_trl_lora_qlora.ipynb
docs/source/peft_integration.md (Outdated)
> And if you want to load your model in 8bit precision:
> ## PEFT with Different Trainers
> TRL's trainers support PEFT configurations for various training paradigms. Below are detailed examples for each major trainer.
We could use
<hfoptions id="command_line">
<hfoption id="SFT">
...
</hfoption>
<hfoption id="DPO">
...
</hfoption>
</hfoptions>
in this section to reduce the number of sections and improve readability.
docs/source/peft_integration.md (Outdated)
> config.model_name,
> load_in_8bit=True,
> peft_config=lora_config,
> from datasets import load_dataset
We could focus only on the ideas needed for PEFT and simplify the rest to reduce the snippets.
For example, we could do:
training_args = SFTConfig(
...
)
and similarly for any part that is not strictly needed for the PEFT configuration.
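Concretely, the trimmed-down snippet could look something like this (a sketch that keeps only the PEFT-relevant arguments explicit; everything else uses defaults or is elided):

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("trl-lib/Capybara", split="train")  # dataset loading is not PEFT-specific

# Only what matters for PEFT: the adapter config and the (higher) learning rate.
peft_config = LoraConfig(r=16, lora_alpha=32, target_modules="all-linear")
training_args = SFTConfig(
    output_dir="Qwen2-0.5B-SFT-LoRA",
    learning_rate=2.0e-4,
)

trainer = SFTTrainer(
    model="Qwen/Qwen2-0.5B",
    args=training_args,
    train_dataset=dataset,
    peft_config=peft_config,
)
```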
> ## Resources
We could include here TRL notebooks, TRL examples, and recipes from the Cookbook (https://huggingface.co/learn/cookbook/index) that leverage PEFT.
> dataset = load_dataset("trl-lib/Capybara", split="train")
> # Configure LoRA
> peft_config = LoraConfig(
We actually have 3 different ways of adding the peft config to the trainer:
- We give the model_name to the Trainer and the peft_config
- We give the model instance and the peft_config
- We give the peft_model to the trainer directly, preparing it outside, without passing peft_config to the trainer.
We could add these details somewhere; a sketch of the three patterns follows below.
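A sketch of the three patterns, assuming `SFTTrainer` (the same applies to the other trainers); model, dataset, and hyperparameters are illustrative:

```python
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("trl-lib/Capybara", split="train")
peft_config = LoraConfig(r=16, lora_alpha=32, target_modules="all-linear")
args = SFTConfig(output_dir="Qwen2-0.5B-SFT-LoRA", learning_rate=2.0e-4)

# 1. Pass the model name and the peft_config: the trainer loads and wraps the model.
trainer = SFTTrainer(
    model="Qwen/Qwen2-0.5B", args=args, train_dataset=dataset, peft_config=peft_config
)

# 2. Pass a model instance and the peft_config: the trainer wraps the model you provide.
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B")
trainer = SFTTrainer(model=model, args=args, train_dataset=dataset, peft_config=peft_config)

# 3. Pass a ready-made PeftModel and no peft_config: you wrap the model yourself.
peft_model = get_peft_model(AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B"), peft_config)
trainer = SFTTrainer(model=peft_model, args=args, train_dataset=dataset)
```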
docs/source/peft_integration.md (Outdated)
> TRL's trainers support PEFT configurations for various training paradigms. Below are detailed examples for each major trainer.
> ### Supervised Fine-Tuning (SFT)
Instead of subsections, I'd write it with:
<hfoptions id="trainer">
<hfoption id="SFT">
```
# Code for SFT
```
</hfoption>
<hfoption id="DPO">
```
# Code for DPO
```
</hfoption>
</hfoptions>
docs/source/peft_integration.md (Outdated)
> # Training arguments
> training_args = SFTConfig(
>     output_dir="./Qwen2-0.5B-SFT-LoRA",
>     learning_rate=2.0e-4,
In my opinion, it is very important that all examples on this page contain an explicit learning rate (corresponding to 10x the trainer's default learning rate). Even better would be a small section explaining why, with a link to https://thinkingmachines.ai/blog/lora/.
And this one https://huggingface.co/docs/trl/lora_without_regret!
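A small way to make that explicit in every example, assuming `SFTConfig`'s default learning rate of 2e-5 (output directories are illustrative):

```python
from trl import SFTConfig

# Full fine-tuning: keep the trainer's default learning rate (2e-5 for SFTConfig).
full_ft_args = SFTConfig(output_dir="Qwen2-0.5B-SFT")

# LoRA fine-tuning: roughly 10x the full fine-tuning default, as argued in the linked posts.
lora_args = SFTConfig(output_dir="Qwen2-0.5B-SFT-LoRA", learning_rate=2.0e-4)
```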
docs/source/peft_integration.md (Outdated)
> #### Full Training (No PEFT)
>
> ```bash
> python trl/scripts/dpo.py \
>     --model_name_or_path Qwen/Qwen2-0.5B-Instruct \
>     --dataset_name trl-lib/ultrafeedback_binarized \
>     --learning_rate 5.0e-7 \
>     --per_device_train_batch_size 2 \
>     --gradient_accumulation_steps 8 \
>     --output_dir Qwen2-0.5B-DPO
> ```
I don't think these "No PEFT" sections are necessary
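If the "No PEFT" variants are dropped, the PEFT version of the quoted DPO example could stand on its own along these lines (a sketch; the LoRA settings and the ~10x learning-rate bump are illustrative, not the PR's exact code):

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B-Instruct")
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

trainer = DPOTrainer(
    model=model,
    # No ref_model needed: with PEFT, the base model with adapters disabled serves as the reference.
    args=DPOConfig(
        output_dir="Qwen2-0.5B-DPO-LoRA",
        learning_rate=5.0e-6,  # ~10x the full-training value above, per the learning-rate comment
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
    ),
    train_dataset=dataset,
    processing_class=tokenizer,
    peft_config=LoraConfig(r=16, lora_alpha=32, target_modules="all-linear"),
)
trainer.train()
```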
docs/source/peft_integration.md (Outdated)
> ## Troubleshooting
>
> ### Out of Memory Errors
>
> If you encounter OOM errors:
>
> 1. Enable QLoRA: `--load_in_4bit`
> 2. Reduce batch size: `--per_device_train_batch_size 1`
> 3. Increase gradient accumulation: `--gradient_accumulation_steps 16`
> 4. Enable gradient checkpointing: `--gradient_checkpointing`
> 5. Reduce LoRA rank: `--lora_r 8`
> 6. Reduce target modules: `--lora_target_modules q_proj v_proj`
>
> ### Slow Training
>
> If training is slow:
>
> 1. Increase batch size (if memory allows)
> 2. Use Flash Attention 2: `--attn_implementation flash_attention_2`
> 3. Use bf16: `--bf16`
> 4. Reduce gradient checkpointing frequency
Most of these are not specific to PEFT, so I recommend removing this section and adding these elements to reducing_memory_usage.md or speeding_up_training.md (can be done in a follow-up PR).
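The PEFT-specific levers from that list (QLoRA via 4-bit loading, a smaller LoRA rank, fewer target modules) could stay in the guide as one short example, roughly like this (a sketch; values are illustrative):

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

# QLoRA: load the frozen base model in 4-bit precision.
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-0.5B",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16),
)

trainer = SFTTrainer(
    model=model,
    args=SFTConfig(
        output_dir="Qwen2-0.5B-QLoRA",
        learning_rate=2.0e-4,
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        gradient_checkpointing=True,
    ),
    train_dataset=load_dataset("trl-lib/Capybara", split="train"),
    # A smaller rank and fewer target modules reduce adapter memory further.
    peft_config=LoraConfig(r=8, target_modules=["q_proj", "v_proj"]),
)
```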
Addressed Reviewer Feedback

Thank you for the detailed review! I've addressed all the comments:

✅ Completed Changes

Already Addressed

All changes committed in cbe38d7.
This PR addresses Issue huggingface#4376 by completely rewriting the PEFT integration documentation with:
- Comprehensive Learning Rate section with table and best practices
- Documentation of three PEFT configuration methods
- Enhanced Resources section with notebooks, examples, and Cookbook
- Updated code examples for SFT, DPO, GRPO, QLoRA, and Prompt Tuning
- Removed outdated sections per reviewer feedback
- Fixed import ordering and code simplification

All reviewer feedback from PR huggingface#4421 has been addressed.
@behroozazarkhalili could you review the conflicts? 😄
Incorporated content from PR huggingface#4436 (Multi-Adapter RL Training) and NPP section that were added to main after this PR branch was created.

Changes:
- Added Multi-Adapter RL Training subsection under PPO trainer section
- Added Naive Pipeline Parallelism (NPP) subsection under Multi-GPU Training
- Maintained consistent formatting with the rewritten documentation style

Resolves merge conflict between PR huggingface#4421 complete rewrite and additions from PR huggingface#4436 that were merged to main.
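Assuming the NPP subsection keeps the existing TRL recipe (sharding one model across the available GPUs with `device_map="auto"` and training adapters on top), a minimal sketch of that pattern:

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM
from trl import SFTConfig, SFTTrainer

# Naive pipeline parallelism: layers are split across GPUs at load time.
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B", device_map="auto")

trainer = SFTTrainer(
    model=model,
    args=SFTConfig(output_dir="Qwen2-0.5B-SFT-LoRA", learning_rate=2.0e-4),
    train_dataset=load_dataset("trl-lib/Capybara", split="train"),
    peft_config=LoraConfig(r=16, lora_alpha=32, target_modules="all-linear"),
)
```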
Force-pushed from 15292f7 to e09c67c.
Co-authored-by: Quentin Gallouédec <[email protected]>
> ### Proximal Policy Optimization (PPO)
> ## Multi-Adapter RL Training
> #### Multi-Adapter RL Training
Is this section still true? I can't find references to ppo_adapter_name, so I'd suggest reviewing it and removing it if outdated.
Co-authored-by: Sergio Paniego Blanco <[email protected]>
Co-authored-by: Kashif Rasul <[email protected]>
The ppo_adapter_name parameter documented in the Multi-Adapter RL section does not exist in the current codebase. The compute_reward_score() method handles adapter switching internally using rm_adapter_name and policy_adapter_name set during initialization.
Resolves #4376
This PR completely rewrites the PEFT integration documentation to address the concerns raised in #4376.
Changes
Documentation Structure
All examples have been verified against the current TRL codebase and official scripts.