
This is our repository for the training code on the DeepResearch-9K dataset.


Applied-Machine-Learning-Lab/DeepResearch-R1


DeepResearch-9K

This repository contains the dataset and codebase for the DeepResearch-9K project. All environment configuration files are stored in the env/ directory.


📊 Environment Overview

| Environment | Python | Key Features | Primary Use Case |
|---|---|---|---|
| react_infer_env | 3.10 | OpenAI SDK, data processing | Inference & data modification |
| deepresearch | 3.9 | vLLM, Verl, Flash-Attention 2 | Model training & RL tasks |
| retriever | 3.10 | Faiss-GPU, Pyserini, FastAPI | Vector search & knowledge retrieval |

🛠 1. Inference Environment (react_infer_env)

Purpose: Optimized for running basic inference and large-scale data processing/modification scripts.

```bash
# Create and activate environment
conda create -n react_infer_env python=3.10.0 -y
conda activate react_infer_env

# Install dependencies from the env folder
conda install --file env/react_infer_requirements.txt
```

🚀 2. Search & Training Environment (deepresearch)

Purpose: Designed for model training (Verl), Reinforcement Learning (RL) tasks, and high-performance inference via vLLM.


```bash
# Create and activate environment
conda create -n deepresearch python=3.9 -y
conda activate deepresearch

# Install PyTorch (CUDA 12.1 build) and vLLM
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
pip install vllm==0.6.3

# Install the Verl framework from the repository root (editable mode) and Flash Attention 2
pip install -e .
pip install flash-attn --no-build-isolation

# Install Weights & Biases for experiment tracking
pip install wandb
```
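A small check can confirm that the deepresearch environment ended up with the pinned versions. This is a minimal sketch. It assumes the pins from the commands above (Python 3.9, torch 2.4.0, vllm 0.6.3) are the ones that matter. `find_mismatches` is a hypothetical helper and is not part of the repository.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Pins taken from the deepresearch setup commands (assumed to be the critical ones).
EXPECTED_PYTHON = (3, 9)
EXPECTED_PACKAGES = {"torch": "2.4.0", "vllm": "0.6.3"}


def find_mismatches(expected_python=EXPECTED_PYTHON, expected_packages=EXPECTED_PACKAGES):
    """Return a list of problems; an empty list means every pin matches."""
    problems = []
    current = tuple(sys.version_info[:2])
    if current != tuple(expected_python):
        found = ".".join(map(str, current))
        want = ".".join(map(str, expected_python))
        problems.append(f"python: expected {want}, found {found}")
    for name, wanted in expected_packages.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems.append(f"{name}: not installed (expected {wanted})")
            continue
        # Wheels from the CUDA index report versions like "2.4.0+cu121";
        # compare only the part before the local version label.
        if installed.split("+")[0] != wanted:
            problems.append(f"{name}: expected {wanted}, found {installed}")
    return problems
```

After `conda activate deepresearch`, calling `print(find_mismatches())` should print an empty list if the installation matches.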

🔍 3. Retriever Environment (retriever)

Purpose: Specialized for knowledge retrieval, vector database management, and hosting API services.


```bash
# Create and activate environment
conda create -n retriever python=3.10 -y
conda activate retriever

# Install PyTorch and CUDA (Conda recommended for Faiss-GPU compatibility)
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.1 -c pytorch -c nvidia

# Install Faiss-GPU for fast vector search during RL rollouts
conda install -c pytorch -c nvidia faiss-gpu=1.8.0

# Install additional retrieval components
pip install transformers datasets pyserini uvicorn fastapi
```
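The retriever environment relies on Faiss-GPU for vector search. This sketch shows, in plain Python, the exact maximum-inner-product search that a flat inner-product index (Faiss's `IndexFlatIP`) performs. It assumes the retriever ranks passages by inner product, which this README does not state. The toy embeddings are made up.

```python
import heapq
from typing import List, Sequence, Tuple


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def search_top_k(
    query: Sequence[float], corpus: List[Sequence[float]], k: int
) -> List[Tuple[int, float]]:
    """Exact (brute-force) maximum-inner-product search.

    Scores every document and returns up to k (doc_id, score) pairs,
    highest score first.
    """
    scored = ((doc_id, inner_product(query, vec)) for doc_id, vec in enumerate(corpus))
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Hypothetical 3-dimensional passage embeddings, kept tiny for readability.
corpus = [
    [0.1, 0.9, 0.0],
    [0.8, 0.1, 0.1],
    [0.5, 0.5, 0.0],
]
print(search_top_k([1.0, 0.0, 0.0], corpus, k=2))  # [(1, 0.8), (2, 0.5)]
```

Faiss performs the same scoring in batches on the GPU, which is what makes retrieval fast enough to sit inside the RL rollout loop. If the embeddings are L2-normalized, inner product equals cosine similarity.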

📊 Dataset & Rollouts

You can access the complete dataset and model rollouts on Hugging Face. We provide two versions based on evaluation results:

  • Full Dataset
  • Hard Subset

Note: After downloading the dataset files, place them in the data/ directory of the project root.

Data Format

Each sample follows the same structure, so it can be used directly by the SFT scripts:

  • question: The initial user query.
  • difficulty: Difficulty level (1-3).
  • search trajectory: Full reasoning and tool-use rollouts.
  • final answer: The definitive response enclosed within <answer></answer> tags.
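Here is a minimal sketch of checking one sample and pulling out its answer. It assumes each sample is a dict whose keys match the field names above. Check the exact key spelling (for example, spaces versus underscores) and whether `difficulty` is stored as an integer against the downloaded files. `extract_answer` and `validate_sample` are hypothetical helpers, and the sample is invented.

```python
import re
from typing import List, Optional

# Field names as listed in this README; the on-disk key spelling is an assumption.
REQUIRED_FIELDS = ("question", "difficulty", "search trajectory", "final answer")

# Non-greedy match so each <answer>...</answer> pair is captured separately.
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def extract_answer(text: str) -> Optional[str]:
    """Return the stripped content of the last <answer>...</answer> span, or None."""
    matches = ANSWER_RE.findall(text)
    return matches[-1].strip() if matches else None


def validate_sample(sample: dict) -> List[str]:
    """Return a list of problems found in one sample (empty if it looks well-formed)."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in sample]
    # Assumes difficulty is stored as an integer 1, 2, or 3.
    if "difficulty" in sample and sample["difficulty"] not in (1, 2, 3):
        problems.append(f"difficulty out of range: {sample['difficulty']!r}")
    if "final answer" in sample and extract_answer(str(sample["final answer"])) is None:
        problems.append("final answer has no <answer></answer> tags")
    return problems


# An invented sample in the documented shape.
sample = {
    "question": "Which city hosted the 1992 Summer Olympics?",
    "difficulty": 1,
    "search trajectory": "(reasoning and tool calls omitted)",
    "final answer": "<answer> Barcelona </answer>",
}
print(validate_sample(sample))                 # []
print(extract_answer(sample["final answer"]))  # Barcelona
```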

🚀 Training and Evaluation

We provide scripts for supervised fine-tuning (SFT) and reinforcement learning (RL) on 3B-parameter base models.

1. Base Models

The scripts are compatible with the following models:

  • Llama 3.2 (3B)
  • Qwen 2.5 (3B)

2. SFT Training

The provided scripts are pre-configured to handle the dataset structure. Both the Full Dataset and Hard Subset can be used directly for training without additional preprocessing once placed in the data/ directory.

Launch the training process using the following scripts:

  • Llama 3.2: python sft_llama3b.py
  • Qwen 2.5: python sft_qwen3b.py
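This README does not show how the SFT scripts turn a sample into training text. One common pattern is sketched below under that assumption:

- the question becomes the prompt;
- the search trajectory plus the final answer become the target;
- prompt tokens are masked out of the loss.

`to_sft_pair`, `build_labels`, and the prompt template are hypothetical. `sft_llama3b.py` and `sft_qwen3b.py` may format samples differently.

```python
from typing import Dict, List, Tuple

# Default ignore_index of torch.nn.CrossEntropyLoss; label positions with this
# value contribute nothing to the loss.
IGNORE_INDEX = -100


def to_sft_pair(sample: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical mapping from one sample to a prompt/completion pair."""
    prompt = f"Question: {sample['question']}\n"
    completion = f"{sample['search trajectory']}\n{sample['final answer']}"
    return {"prompt": prompt, "completion": completion}


def build_labels(prompt_ids: List[int], completion_ids: List[int]) -> Tuple[List[int], List[int]]:
    """Concatenate token ids and mask the prompt so only the completion is learned."""
    input_ids = prompt_ids + completion_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + completion_ids
    return input_ids, labels
```

Masking the prompt keeps the model from being trained to reproduce the question. The gradient comes only from the trajectory and the tagged answer.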

3. RL Training

Our dataset is also designed to support reinforcement learning paradigms. You can initiate RL training either directly from the base models or as a second stage after Supervised Fine-Tuning (SFT).

The training configurations, including reward functions and hyperparameter settings, align with the methodologies described in our paper. To start the process, use the provided bash scripts in the DeepResearch-R1/ directory:

  • PPO Training: bash DeepResearch-R1/train_ppo.sh

  • GRPO Training: bash DeepResearch-R1/train_grpo.sh
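PPO and GRPO differ mainly in how they estimate advantages. PPO uses a learned value function (critic). GRPO samples a group of rollouts for each question and standardizes each rollout's reward against that group, so it needs no critic. Below is a minimal sketch of the group-relative advantage, assuming one scalar outcome reward per rollout. The reward values are hypothetical. The actual reward used by `train_grpo.sh` comes from the training configuration, not from this snippet.

```python
from statistics import mean, pstdev
from typing import List


def grpo_advantages(group_rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Group-relative advantages for G rollouts of the same question.

    A_i = (r_i - mean(r)) / (std(r) + eps), using the population standard
    deviation; some implementations use the sample standard deviation instead.
    """
    mu = mean(group_rewards)
    sigma = pstdev(group_rewards)
    return [(r - mu) / (sigma + eps) for r in group_rewards]


# Hypothetical outcome rewards for 4 rollouts of one question
# (1.0 = correct final answer, 0.0 = wrong).
rewards = [1.0, 0.0, 0.0, 1.0]
print([round(a, 3) for a in grpo_advantages(rewards)])  # [1.0, -1.0, -1.0, 1.0]
```

If every rollout in a group gets the same reward, all advantages are zero, and that question contributes no learning signal for that step.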

4. Evaluation

Run the following command to start the inference process:

  • Inference: python DeepResearch-R1/infer.py
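This README does not say which metrics `infer.py` reports. For short-answer QA, normalized exact match (EM) and token-level F1, in the style of the SQuAD evaluation script, are common choices. The sketch below assumes predictions have already been extracted from the <answer></answer> tags.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation and articles, collapse spaces."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("The Barcelona", "barcelona"))            # 1.0
print(round(token_f1("Barcelona, Spain", "Barcelona"), 3))  # 0.667
```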
