This repository contains the official implementation of the paper:
CoPL: Collaborative Preference Learning for Personalizing LLMs
Youngbin Choi, Seunghyuk Cho, Minjong Lee, MoonJeong Park, Yesong Ko, Jungseul Ok, Dongwoo Kim
EMNLP, 2025
Paper: arXiv:2503.01658
CoPL is a collaborative preference learning framework for personalizing LLMs: it learns user-specific preference representations and adapts to unseen users. The project consists of four stages:
- Dataset Generation - Generate user preference data
- User Representation Learning - Learn user embeddings
- RM Training - Train reward models
- Unseen Adaptation - Adapt to new users
# Install dependencies from requirements.txt
pip install -r requirements.txt
# Install custom PEFT package with MoLE support
cd peft-main
pip install -e .
cd ..

Generate user preference data using the following commands:
# PLM dataset
python data/datagen_plm.py --tokenizer_name google/gemma-2b-it --num_users 10000 --n_context 16 --seed 1111
python data/datagen_plm.py --tokenizer_name google/gemma-2b-it --num_users 10000 --n_context 16 --seed 1111 --AVG
# TLDR dataset
python data/datagen_tldr.py --tokenizer_name google/gemma-2b-it --num_users 10000 --n_context 8 --seed 1111
python data/datagen_tldr.py --tokenizer_name google/gemma-2b-it --num_users 10000 --n_context 8 --seed 1111 --AVG
# UltraFeedback-P dataset
python data/datagen_ufp.py --other_subsets UF-P-2 --tokenizer_name google/gemma-2b --model_name google/gemma-2b --num_users 10000
python data/datagen_ufp.py --other_subsets UF-P-4 --tokenizer_name google/gemma-2b --model_name google/gemma-2b --num_users 10000
python data/datagen_ufp.py --other_subsets UF-P-2 --tokenizer_name google/gemma-2b --model_name google/gemma-2b --num_users 10000 --AVG
python data/datagen_ufp.py --other_subsets UF-P-4 --tokenizer_name google/gemma-2b --model_name google/gemma-2b --num_users 10000 --AVG

The data is stored in the following format:
# Data format
{
'user': user_id,
'context': [(positive_item, negative_item), ...],
'context_unseen': [(positive_item, negative_item), ...],
'target': [(positive_item, negative_item), ...],
'user_type': user_type
}
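For a quick sanity check after generation, a record can be inspected with standard `pickle`. This is a minimal sketch, assuming the generation scripts write a pickled list of dicts in the format above; the file name mirrors the RM training example below and may differ in your setup.

```python
# Minimal sketch: inspect one record of a generated dataset.
# Assumes a pickled list of dicts in the format shown above; the file
# name mirrors the RM training example and is not guaranteed.
import pickle

with open("dataset/UF-P-2-10000-ALL.pkl", "rb") as f:
    data = pickle.load(f)

record = data[0]
print(record["user"], record["user_type"])
print(len(record["context"]), "context pairs")
print(len(record["context_unseen"]), "held-out context pairs")
print(len(record["target"]), "target pairs")
```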
Learn user embeddings using the CoPLGCF model:

python train_CoPL_gcf.py \
--data_path dataset/your_data.pkl \
--hidden_dim 512 \
--l 4 \
--num_epoch 100 \
--learning_rate 1e-4 \
--wandbon True
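After training, the learned embeddings feed into the RM stage. Below is a minimal sketch for loading them, assuming the script saves a `(num_users, hidden_dim)` tensor at the path used by the RM training command further down:

```python
# Minimal sketch: load learned user embeddings for the RM stage.
# Assumes train_CoPL_gcf.py saves a (num_users, hidden_dim) float tensor;
# the path matches --user_embeds_path in the RM training command below.
import torch

user_emb = torch.load("gcf_user_embeds/UF-P-2-ALL-user_emb.pt", map_location="cpu")
print(user_emb.shape)   # expected (num_users, hidden_dim), e.g. (10000, 512)
user_42 = user_emb[42]  # embedding that conditions the reward model for user 42
```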
Train personalized reward models using user embeddings:

bash scripts/CoPL_rm_train.sh

Or run directly:
torchrun --nproc_per_node=4 --master_port 4788 train_CoPL_rm.py \
--model_name google/gemma-2b-it \
--data_path dataset/UF-P-2-10000-ALL.pkl \
--user_embeds_path gcf_user_embeds/UF-P-2-ALL-user_emb.pt \
--log_dir logs/CoPL_RM \
--bf16 True \
--per_device_train_batch_size 16 \
--learning_rate 5e-5 \
--lora_r 8 \
--num_experts 8 \
--max_steps 1500 \
--deepspeed scripts/ds_config.json
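The reward model is trained on per-user preference pairs. The sketch below illustrates the standard pairwise Bradley-Terry objective typically used for reward-model training; function names and shapes are assumptions, not the exact code in `train_CoPL_rm.py`.

```python
# Illustrative sketch of a pairwise (Bradley-Terry) preference loss.
# Names and shapes are assumptions; see train_CoPL_rm.py for the actual training loop.
import torch
import torch.nn.functional as F

def preference_loss(reward_chosen: torch.Tensor,
                    reward_rejected: torch.Tensor) -> torch.Tensor:
    # reward_*: (batch,) scalar rewards for the preferred / rejected responses,
    # produced by the user-conditioned reward model
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy batch of 4 preference pairs.
chosen = torch.randn(4, requires_grad=True)
rejected = torch.randn(4, requires_grad=True)
loss = preference_loss(chosen, rejected)
loss.backward()
```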
Perform adaptation for new users:

python unseen_user_adaptation.py \
--data_path dataset/your_data.pkl \
--unseen_data_path dataset/unseen_data.pkl \
--model_path gcf_models/your_model.pt \
--save_path unseen_embeddings.pt \
--hidden_dim 512 \
--l 4

Adaptation Method (see the sketch after this list):
- User embedding generation using 2-hop neighbor information
- Inference based on the preference patterns of existing (seen) users
- Embedding aggregation through softmax-weighted averaging
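The aggregation step can be pictured as follows. This is a minimal sketch of softmax-weighted averaging over seen-user embeddings; the neighbor scoring and all names are assumptions, not the exact logic in `unseen_user_adaptation.py`.

```python
# Illustrative sketch: build an unseen user's embedding as a softmax-weighted
# average of seen users' embeddings. Scores would come from how well each seen
# user's preference pattern agrees with the new user's observed comparisons;
# names and scoring are assumptions, not the actual implementation.
import torch

def adapt_unseen_user(seen_user_emb: torch.Tensor,   # (num_seen_users, hidden_dim)
                      scores: torch.Tensor,          # (num_seen_users,) agreement scores
                      temperature: float = 1.0) -> torch.Tensor:
    weights = torch.softmax(scores / temperature, dim=0)  # (num_seen_users,)
    return weights @ seen_user_emb                        # (hidden_dim,)

# Example: aggregate 10,000 seen-user embeddings of dimension 512.
new_emb = adapt_unseen_user(torch.randn(10000, 512), torch.randn(10000))
print(new_emb.shape)  # torch.Size([512])
```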
CoPL/
├── models/
│ ├── CoPL_gcf.py # User representation learning model
│ ├── CoPL_rm.py # Reward model
│ └── baselines/ # TODO: add baseline models
├── scripts/
│ ├── CoPL_rm_train.sh # RM training script
│ └── ds_config.json # DeepSpeed configuration
├── train_CoPL_gcf.py # User representation learning
├── train_CoPL_rm.py # Reward model training
├── unseen_user_adaptation.py # New user adaptation
Our implementation is inspired by and builds upon the following works:
- MoCLE: Mixture of Cluster-conditional LoRA Experts for Vision-language Instruction Tuning. Our MoLE architecture implementation is based on their code.
- VPL-LLM: Understanding Hidden Context in Preference Learning. Our preference learning framework is based on their code.
If you find this work useful for your research, please cite our paper:
@inproceedings{choi2025copl,
title={CoPL: Collaborative Preference Learning for Personalizing LLMs},
author={Youngbin Choi and Seunghyuk Cho and Minjong Lee and MoonJeong Park and Yesong Ko and Jungseul Ok and Dongwoo Kim},
booktitle={The 2025 Conference on Empirical Methods in Natural Language Processing},
year={2025},
url={https://arxiv.org/abs/2503.01658}
}

If you have questions or suggestions about the project, please create an issue.