docs: Rewrite PEFT integration guide with comprehensive examples #4421
base: main
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
sergiopaniego left a comment:
Thanks for the update!! Super detailed 😄
> The notebooks and scripts in these examples show how to use Low Rank Adaptation (LoRA) to fine-tune models in a memory efficient manner. Most of PEFT methods supported in peft library but note that some PEFT methods such as Prompt tuning are not supported.
> For more information on LoRA, see the [original paper](https://huggingface.co/papers/2106.09685).
> TRL supports [PEFT](https://github.com/huggingface/peft) (Parameter-Efficient Fine-Tuning) methods for memory-efficient model training. PEFT enables fine-tuning large language models by training only a small number of additional parameters while keeping the base model frozen, significantly reducing computational costs and memory requirements.
You can add somewhere a link to this example notebook: https://github.com/huggingface/trl/blob/main/examples/notebooks/sft_trl_lora_qlora.ipynb
docs/source/peft_integration.md (Outdated)
> And if you want to load your model in 8bit precision:
> ## PEFT with Different Trainers
> TRL's trainers support PEFT configurations for various training paradigms. Below are detailed examples for each major trainer.
We could use
<hfoptions id="command_line">
<hfoption id="SFT">
...
</hfoption>
<hfoption id="DPO">
...
</hfoption>
</hfoptions>
in this section to reduce the number of sections and improve readability.
docs/source/peft_integration.md (Outdated)
> config.model_name,
> load_in_8bit=True,
> peft_config=lora_config,
> from datasets import load_dataset
We could focus only on the ideas needed for PEFT and simplify the rest to reduce the snippets.
For example, we could do:
training_args = SFTConfig(
...
)
and similarly for any part that is not strictly needed for the PEFT configuration.
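Concretely, the trimmed-down snippet could look something like this (a sketch that keeps only the PEFT-relevant arguments explicit; everything else uses defaults or is elided):

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("trl-lib/Capybara", split="train")  # dataset loading is not PEFT-specific

# Only what matters for PEFT: the adapter config and the (higher) learning rate.
peft_config = LoraConfig(r=16, lora_alpha=32, target_modules="all-linear")
training_args = SFTConfig(
    output_dir="Qwen2-0.5B-SFT-LoRA",
    learning_rate=2.0e-4,
)

trainer = SFTTrainer(
    model="Qwen/Qwen2-0.5B",
    args=training_args,
    train_dataset=dataset,
    peft_config=peft_config,
)
```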
> ## Resources
We could include here TRL notebooks, TRL examples, and recipes from the Cookbook (https://huggingface.co/learn/cookbook/index) that leverage PEFT.
> dataset = load_dataset("trl-lib/Capybara", split="train")
> # Configure LoRA
> peft_config = LoraConfig(
We actually have 3 different ways of adding the peft config to the trainer:
- We give the model_name to the Trainer and the peft_config
- We give the model instance and the peft_config
- We give the peft_model to the trainer directly, preparing it outside, without passing peft_config to the trainer.
We could add these details somewhere; a sketch of the three patterns follows below.
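A sketch of the three patterns, assuming `SFTTrainer` (the same applies to the other trainers); model, dataset, and hyperparameters are illustrative:

```python
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("trl-lib/Capybara", split="train")
peft_config = LoraConfig(r=16, lora_alpha=32, target_modules="all-linear")
args = SFTConfig(output_dir="Qwen2-0.5B-SFT-LoRA", learning_rate=2.0e-4)

# 1. Pass the model name and the peft_config: the trainer loads and wraps the model.
trainer = SFTTrainer(
    model="Qwen/Qwen2-0.5B", args=args, train_dataset=dataset, peft_config=peft_config
)

# 2. Pass a model instance and the peft_config: the trainer wraps the model you provide.
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B")
trainer = SFTTrainer(model=model, args=args, train_dataset=dataset, peft_config=peft_config)

# 3. Pass a ready-made PeftModel and no peft_config: you wrap the model yourself.
peft_model = get_peft_model(AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B"), peft_config)
trainer = SFTTrainer(model=peft_model, args=args, train_dataset=dataset)
```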
docs/source/peft_integration.md (Outdated)
> TRL's trainers support PEFT configurations for various training paradigms. Below are detailed examples for each major trainer.
> ### Supervised Fine-Tuning (SFT)
Instead of subsections, I'd write it with:
<hfoptions id="trainer">
<hfoption id="SFT">
```
# Code for SFT
```
</hfoption>
<hfoption id="DPO">
```
# Code for DPO
```
</hfoption>
</hfoptions>
docs/source/peft_integration.md (Outdated)
> # Training arguments
> training_args = SFTConfig(
>     output_dir="./Qwen2-0.5B-SFT-LoRA",
>     learning_rate=2.0e-4,
In my opinion, it is very important that all examples on this page contain an explicit learning rate (corresponding to 10x the trainer's default learning rate). Even better would be a small section explaining why, with a link to https://thinkingmachines.ai/blog/lora/.
And this one https://huggingface.co/docs/trl/lora_without_regret!
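A small way to make that explicit in every example, assuming `SFTConfig`'s default learning rate of 2e-5 (output directories are illustrative):

```python
from trl import SFTConfig

# Full fine-tuning: keep the trainer's default learning rate (2e-5 for SFTConfig).
full_ft_args = SFTConfig(output_dir="Qwen2-0.5B-SFT")

# LoRA fine-tuning: roughly 10x the full fine-tuning default, as argued in the linked posts.
lora_args = SFTConfig(output_dir="Qwen2-0.5B-SFT-LoRA", learning_rate=2.0e-4)
```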
docs/source/peft_integration.md (Outdated)
> #### Full Training (No PEFT)
>
> ```bash
> python trl/scripts/dpo.py \
>     --model_name_or_path Qwen/Qwen2-0.5B-Instruct \
>     --dataset_name trl-lib/ultrafeedback_binarized \
>     --learning_rate 5.0e-7 \
>     --per_device_train_batch_size 2 \
>     --gradient_accumulation_steps 8 \
>     --output_dir Qwen2-0.5B-DPO
> ```
I don't think these "No PEFT" sections are necessary
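If the "No PEFT" variants are dropped, the PEFT version of the quoted DPO example could stand on its own along these lines (a sketch; the LoRA settings and the ~10x learning-rate bump are illustrative, not the PR's exact code):

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B-Instruct")
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

trainer = DPOTrainer(
    model=model,
    # No ref_model needed: with PEFT, the base model with adapters disabled serves as the reference.
    args=DPOConfig(
        output_dir="Qwen2-0.5B-DPO-LoRA",
        learning_rate=5.0e-6,  # ~10x the full-training value above, per the learning-rate comment
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
    ),
    train_dataset=dataset,
    processing_class=tokenizer,
    peft_config=LoraConfig(r=16, lora_alpha=32, target_modules="all-linear"),
)
trainer.train()
```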
docs/source/peft_integration.md (Outdated)
> ## Troubleshooting
>
> ### Out of Memory Errors
>
> If you encounter OOM errors:
>
> 1. Enable QLoRA: `--load_in_4bit`
> 2. Reduce batch size: `--per_device_train_batch_size 1`
> 3. Increase gradient accumulation: `--gradient_accumulation_steps 16`
> 4. Enable gradient checkpointing: `--gradient_checkpointing`
> 5. Reduce LoRA rank: `--lora_r 8`
> 6. Reduce target modules: `--lora_target_modules q_proj v_proj`
>
> ### Slow Training
>
> If training is slow:
>
> 1. Increase batch size (if memory allows)
> 2. Use Flash Attention 2: `--attn_implementation flash_attention_2`
> 3. Use bf16: `--bf16`
> 4. Reduce gradient checkpointing frequency
Most of these are not specific to PEFT, so I recommend removing this section and adding these elements to reducing_memory_usage.md or speeding_up_training.md (can be done in a follow-up PR).
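The PEFT-specific levers from that list (QLoRA via 4-bit loading, a smaller LoRA rank, fewer target modules) could stay in the guide as one short example, roughly like this (a sketch; values are illustrative):

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

# QLoRA: load the frozen base model in 4-bit precision.
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-0.5B",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16),
)

trainer = SFTTrainer(
    model=model,
    args=SFTConfig(
        output_dir="Qwen2-0.5B-QLoRA",
        learning_rate=2.0e-4,
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        gradient_checkpointing=True,
    ),
    train_dataset=load_dataset("trl-lib/Capybara", split="train"),
    # A smaller rank and fewer target modules reduce adapter memory further.
    peft_config=LoraConfig(r=8, target_modules=["q_proj", "v_proj"]),
)
```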
Addressed Reviewer Feedback

Thank you for the detailed review! I've addressed all the comments:

✅ Completed Changes

Already Addressed

All changes committed in cbe38d7.
This PR addresses Issue huggingface#4376 by completely rewriting the PEFT integration documentation with:
- Comprehensive Learning Rate section with table and best practices
- Documentation of three PEFT configuration methods
- Enhanced Resources section with notebooks, examples, and Cookbook
- Updated code examples for SFT, DPO, GRPO, QLoRA, and Prompt Tuning
- Removed outdated sections per reviewer feedback
- Fixed import ordering and code simplification

All reviewer feedback from PR huggingface#4421 has been addressed.
@behroozazarkhalili could you review the conflicts? 😄
Incorporated content from PR huggingface#4436 (Multi-Adapter RL Training) and NPP section that were added to main after this PR branch was created.

Changes:
- Added Multi-Adapter RL Training subsection under PPO trainer section
- Added Naive Pipeline Parallelism (NPP) subsection under Multi-GPU Training
- Maintained consistent formatting with the rewritten documentation style

Resolves merge conflict between PR huggingface#4421 complete rewrite and additions from PR huggingface#4436 that were merged to main.
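Assuming the NPP subsection keeps the existing TRL recipe (sharding one model across the available GPUs with `device_map="auto"` and training adapters on top), a minimal sketch of that pattern:

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM
from trl import SFTConfig, SFTTrainer

# Naive pipeline parallelism: layers are split across GPUs at load time.
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B", device_map="auto")

trainer = SFTTrainer(
    model=model,
    args=SFTConfig(output_dir="Qwen2-0.5B-SFT-LoRA", learning_rate=2.0e-4),
    train_dataset=load_dataset("trl-lib/Capybara", split="train"),
    peft_config=LoraConfig(r=16, lora_alpha=32, target_modules="all-linear"),
)
```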
Force-pushed from 15292f7 to e09c67c.
Co-authored-by: Quentin Gallouédec <[email protected]>
> ### Proximal Policy Optimization (PPO)
> ## Multi-Adapter RL Training
> #### Multi-Adapter RL Training
Is this section still true? I can't find references to ppo_adapter_name, so I'd suggest reviewing it and removing it if outdated.
Co-authored-by: Sergio Paniego Blanco <[email protected]>
Co-authored-by: Kashif Rasul <[email protected]>
The ppo_adapter_name parameter documented in the Multi-Adapter RL section does not exist in the current codebase. The compute_reward_score() method handles adapter switching internally using rm_adapter_name and policy_adapter_name set during initialization.
Resolves #4376
This PR completely rewrites the PEFT integration documentation to address the concerns raised in #4376.
Changes
Documentation Structure
All examples have been verified against the current TRL codebase and official scripts.