Reducing Hallucinations in LLMs via Factuality-Aware Preference Learning

A Modular Training Framework for Factuality-Aware Direct Preference Optimization (F-DPO)

🌐 Website: vectorinstitute.github.io/Factual-Preference-Alignment | 📄 Paper: arxiv.org/abs/2601.03027 | 📊 Dataset: Hugging Face


🧭 About

Factuality-aware Direct Preference Optimization is a research and engineering framework for studying and improving factual alignment in preference-optimized Large Language Models (LLMs).

The project introduces F-DPO, a factuality-aware extension of Direct Preference Optimization (DPO) that incorporates:

  • Explicit factuality supervision
  • Synthetic hallucination inversion
  • Margin-based factual penalties

The repository provides end-to-end infrastructure for:

  • Dataset construction
  • Multi-model preference fine-tuning
  • Automated factuality evaluation

All components are config-driven, reproducible, and aligned with the Vector Institute AI Engineering Template.


✨ Key Contributions

  • 🔍 Binary factuality supervision integrated into preference learning
  • 🧪 Synthetic hallucination inversion pairs
  • 📏 Δ-margin factual penalties for controllable hallucination suppression
  • ⚙️ Fully config-driven data, training, and evaluation pipelines
  • 📊 Multi-model × multi-Δ benchmarking at scale

📦 Repository Structure

aixpert/
│
├── src/aixpert/
│   ├── config/                  # Central config.yaml
│   ├── data_construction/       # 8-stage factual dataset pipeline
│   ├── training/                # Original-DPO & F-DPO training
│   ├── evaluation/              # GPT-4o-mini judge evaluation
│   └── utils/                   # Shared helpers
│
├── README.md
└── pyproject.toml

🧠 What Is F-DPO?

Standard DPO aligns models to human preferences, but does not explicitly discourage hallucinated yet preferred responses.

F-DPO introduces a factuality-aware margin:

  • Each preference tuple includes (h_w, h_l) factuality indicators
  • A penalty λ is applied when the preferred response is less factual
  • Optimization pressure shifts toward factually correct preferences
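
A minimal sketch of how such a margin could enter the DPO objective is given below. It assumes the penalty is applied only when the chosen response is less factual than the rejected one; the tensor names and the delta weighting are illustrative, not the paper's exact formulation.

import torch
import torch.nn.functional as F

def f_dpo_loss(policy_logp_chosen, policy_logp_rejected,
               ref_logp_chosen, ref_logp_rejected,
               h_w, h_l, beta=0.1, delta=10.0):
    """Illustrative factuality-aware DPO loss (sketch, not the exact paper form).

    h_w / h_l are 0/1 factuality tensors for the chosen / rejected responses.
    """
    # Standard DPO implicit-reward gap between the chosen and rejected responses.
    logits = beta * ((policy_logp_chosen - ref_logp_chosen)
                     - (policy_logp_rejected - ref_logp_rejected))
    # Assumed penalty: demand a larger gap whenever the preferred response
    # is less factual than the rejected one (h_w < h_l).
    margin = delta * torch.clamp(h_l - h_w, min=0.0)
    return -F.logsigmoid(logits - margin).mean()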

➑️ Result: Lower hallucination rates without sacrificing preference alignment


🔬 Skywork → F-DPO Data Construction Pipeline

This repository contains a complete eight-stage pipeline for converting the Skywork Reward-Preference-80K dataset into balanced, factuality-aware DPO datasets.

Pipeline Stages

  1. Skywork extraction & de-duplication
  2. Preference pair conversion
  3. Binary factuality scoring (GPT-4o-mini)
  4. Canonical DPO transformation
  5. Synthetic hallucination generation
  6. Dataset merging
  7. Balanced bucket construction
  8. Optional preference flipping (illustrated below)

All paths and parameters are defined in:

src/aixpert/config/config.yaml
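
To make stage 8 concrete, here is a hypothetical illustration of preference flipping; the record layout (chosen, rejected, h_w, h_l) and the example values are assumptions, not the pipeline's actual schema.

def maybe_flip(pair: dict) -> dict:
    """Hypothetical stage-8 helper: prefer the factual side of the pair."""
    if pair["h_w"] == 0 and pair["h_l"] == 1:
        # The rejected response is factual but the chosen one is not:
        # swap them together with their factuality labels.
        pair["chosen"], pair["rejected"] = pair["rejected"], pair["chosen"]
        pair["h_w"], pair["h_l"] = pair["h_l"], pair["h_w"]
    return pair

example = {
    "prompt": "Who wrote On the Origin of Species?",
    "chosen": "It was written by Gregor Mendel in 1859.",   # hallucinated
    "rejected": "Charles Darwin published it in 1859.",     # factual
    "h_w": 0,
    "h_l": 1,
}
print(maybe_flip(example)["chosen"])  # the factual answer is now preferred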

⚙️ Configuration-Driven Design

Every component (datasets, models, hyperparameters, outputs, and evaluation) is controlled via:

src/aixpert/config/config.yaml

Loaded using:

from utils.config_loader import load_config
cfg = load_config()

This enables:

  • Full reproducibility
  • Multi-model automation
  • Zero hard-coded paths
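
For example, a training script can read everything it needs from the loaded config. The key names below are hypothetical and only illustrate the pattern; the real layout is defined in config.yaml.

from utils.config_loader import load_config

cfg = load_config()

# Hypothetical keys; the actual config.yaml defines its own layout.
model_id = cfg["training"]["model_id"]
delta = cfg["training"]["delta"]
output_dir = cfg["paths"]["output_dir"]
print(f"Training {model_id} with delta={delta}, writing to {output_dir}")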

🏋️ Training Pipelines

1️⃣ Original-DPO (Baseline)

python -m aixpert.training.run_dpo_training \
  --model "google/gemma-2-9b-it"

Trains standard DPO using Skywork preferences.


2️⃣ F-DPO (Δ-Margin Training)

python -m aixpert.training.run_factual_training \
  --model_id "google/gemma-2-9b-it" \
  --short "gemma2-9b" \
  --delta 10

Each Δ value produces a separate fine-tuned model.
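
Since each run is a single entry point, a multi-Δ sweep is just repeated invocations of the same script. The sketch below assumes the CLI shown above; the Δ values other than 10 are placeholders.

import subprocess

# Illustrative Δ sweep for a single model; only delta=10 appears in the
# command above, the other values are placeholders.
for delta in (1, 5, 10):
    subprocess.run(
        [
            "python", "-m", "aixpert.training.run_factual_training",
            "--model_id", "google/gemma-2-9b-it",
            "--short", "gemma2-9b",
            "--delta", str(delta),
        ],
        check=True,
    )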


📊 Evaluation Pipeline

Evaluation is performed using GPT-4o-mini as an LLM-as-a-Judge.
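
Conceptually, the judge makes one scoring call per output. The sketch below uses the OpenAI Python client; the prompt wording and the 0-to-1 scale are placeholders, not the repository's actual judge template.

from openai import OpenAI

client = OpenAI()

def judge_factuality(question: str, answer: str) -> float:
    """Hypothetical judge call; instruction text and scale are placeholders."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Rate the factual accuracy of the answer from 0 to 1. "
                        "Reply with the number only."},
            {"role": "user", "content": f"Question: {question}\nAnswer: {answer}"},
        ],
    )
    return float(response.choices[0].message.content.strip())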

Metrics

  • factuality – mean factual score
  • halluc_rate – % of outputs below the factuality threshold
  • win_rate – Δ-model win rate vs. the baseline
  • count – number of prompts evaluated

Run evaluation:

python -m aixpert.evaluation.evaluations.run_all_evaluations

Outputs:

eval_results.json
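
The results file can then be summarized programmatically. The nesting assumed below (one entry per model holding the four metrics above) is an illustration, not a documented schema.

import json

# Assumed layout: {model_name: {"factuality": ..., "halluc_rate": ...,
# "win_rate": ..., "count": ...}}; adjust to the actual file structure.
with open("eval_results.json") as f:
    results = json.load(f)

for model_name, metrics in results.items():
    print(f"{model_name}: factuality={metrics['factuality']}, "
          f"halluc_rate={metrics['halluc_rate']}, "
          f"win_rate={metrics['win_rate']}, n={metrics['count']}")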

🧪 Supported Models

  • Gemma-2 (2B, 9B)
  • Qwen-2.5 / Qwen-3
  • LLaMA-3.x
  • Any TRL-compatible causal LLM

Models are registered centrally in config.yaml.


🧰 Frameworks & Tooling

  • Hugging Face TRL – DPO reference implementation
  • Unsloth – QLoRA optimization
  • BitsAndBytes – 4-bit quantization
  • Flash-Attention-2
  • Weights & Biases – experiment tracking
  • Accelerate – multi-GPU orchestration

📚 Dataset Attribution & Credits

This project builds upon and extends the Skywork Reward-Preference-80K dataset.

We do not claim ownership of the Skywork dataset. All credit belongs to the original authors.

If you use this repository, please cite Skywork:

@article{liu2024skywork,
  title={Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs},
  author={Liu, Chris Yuhao and Zeng, Liang and Liu, Jiacai and Yan, Rui and He, Jujie and Wang, Chaojie and Yan, Shuicheng and Liu, Yang and Zhou, Yahui},
  journal={arXiv preprint arXiv:2410.18451},
  year={2024}
}

For dataset-related concerns, please contact the Skywork authors via their paper or Hugging Face repository.


📖 Citation (Factuality-aware Direct Preference Optimization)

If you find this code or dataset useful for your research, please consider citing:

@article{FactualAlignment2026,
  title={Reducing Hallucinations in LLMs via Factuality-Aware Preference Learning},
  author={Sindhuja Chaduvula and Ahmed Radwan and Azib Farooq and Yani Ioannou and Shaina Raza},
  journal={arXiv preprint arXiv:2601.03027},
  year={2026}
}

📬 Contact

For questions, collaborations, or issues:

  • Open a GitHub Issue
  • Or contact the maintainers via the Vector Institute

⚡ Factuality-aware Direct Preference Optimization reduces hallucinations and increases factuality


🙏 Acknowledgments

Resources used in preparing this research were provided, in part, by the Province of Ontario, the Government of Canada through CIFAR, and companies sponsoring the Vector Institute. This research was funded by the European Union's Horizon Europe research and innovation programme under the AIXPERT project (Grant Agreement No. 101214389), which aims to develop an agentic, multi-layered, GenAI-powered framework for creating explainable, accountable, and transparent AI systems.


We invite researchers and practitioners to build upon this framework.
