This repository contains the dataset and codebase for the DeepResearch-9K project. All environment configuration files are stored in the env/ directory.
| Environment | Python | Key Features | Primary Use Case |
|---|---|---|---|
| react_infer_env | 3.10 | OpenAI SDK, Data processing | Inference & Data Modification |
| search | 3.9 | vLLM, Verl, Flash-Attn 2 | Model Training & RL Tasks |
| retrieval | 3.10 | Faiss-GPU, Pyserini, FastAPI | Vector Search & Knowledge Retrieval |
Purpose (react_infer_env): Optimized for running basic inference and large-scale data processing/modification scripts.

```bash
# Create and activate environment
conda create -n react_infer_env python=3.10 -y
conda activate react_infer_env

# Install dependencies from the env/ folder
conda install --file env/react_infer_requirements.txt
```
Purpose (search): Designed for model training (Verl), reinforcement learning (RL) tasks, and high-performance inference via vLLM.

```bash
# Create and activate environment
conda create -n search python=3.9 -y
conda activate search

# Install PyTorch and vLLM
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
pip install vllm==0.6.3

# Install the Verl framework, Flash Attention 2, and experiment tracking
pip install -e .
pip install flash-attn --no-build-isolation
pip install wandb
```
Purpose (retrieval): Specialized for knowledge retrieval, vector database management, and hosting API services.

```bash
# Create and activate environment
conda create -n retrieval python=3.10 -y
conda activate retrieval

# Install PyTorch with CUDA (Conda recommended for Faiss-GPU compatibility)
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.1 -c pytorch -c nvidia

# Install Faiss-GPU for efficient vector search during RL rollouts
conda install -c pytorch -c nvidia faiss-gpu=1.8.0

# Install additional retrieval components
pip install transformers datasets pyserini uvicorn fastapi
```
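As a rough illustration of what the Faiss index provides to the retrieval service, here is a minimal brute-force inner-product search sketch in plain NumPy. The corpus and embedding dimension are made up for the example; a real deployment would build a `faiss.IndexFlatIP` (or similar) over the actual corpus embeddings instead.

```python
import numpy as np

def top_k(query: np.ndarray, corpus: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the k corpus vectors most similar to the query
    (inner-product similarity, the same metric as faiss.IndexFlatIP)."""
    scores = corpus @ query          # one inner product per corpus vector
    return np.argsort(-scores)[:k]   # highest scores first

# Toy corpus: 5 "documents" embedded in a 4-dimensional space
corpus = np.eye(5, 4, dtype=np.float32)
query = np.array([1.0, 0.1, 0.0, 0.0], dtype=np.float32)
print(top_k(query, corpus, k=2))  # vector 0 matches best, then vector 1
```

Faiss replaces this O(corpus size) scan with optimized (and optionally GPU-accelerated) index structures, which is what makes retrieval fast enough for RL rollouts.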
You can access the complete dataset and model rollouts on Hugging Face. We provide two versions based on evaluation results:
- Full Dataset: artillerywu/DeepResearch-9K
  - Contains 9,000 high-quality samples covering three difficulty levels.
- Hard Subset: artillerywu/DeepResearch-Hard
  - A curated subset of 3,974 challenging samples (filtered by INCORRECT verdicts).

Note: After downloading the dataset files, please place them in the data/ directory of the project root.
Each sample follows a standardized structure for seamless integration with the SFT scripts:

- `question`: The initial user query.
- `difficulty`: Difficulty level (1-3).
- `search trajectory`: Full reasoning and tool-use rollouts.
- `final answer`: The definitive response, enclosed within `<answer></answer>` tags.
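As a small sketch of how a downstream script might consume this structure, the definitive response can be pulled out of the `final answer` field with a simple regex. The sample dict below is a made-up toy record, not real dataset content:

```python
import re

def extract_answer(final_answer_field: str) -> str:
    """Pull the text between <answer></answer> tags; return '' if absent."""
    match = re.search(r"<answer>(.*?)</answer>", final_answer_field, re.DOTALL)
    return match.group(1).strip() if match else ""

# Toy sample mimicking the documented structure (not a real record)
sample = {
    "question": "What year was the Eiffel Tower completed?",
    "difficulty": 1,
    "search trajectory": "...reasoning and tool calls...",
    "final answer": "Based on the search results, <answer>1889</answer>",
}
print(extract_answer(sample["final answer"]))  # 1889
```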
We provide optimized scripts for SFT and RL on 3B-parameter base models.
The scripts are compatible with the following models:
- Qwen2.5-3B: Qwen/Qwen2.5-3B
- Llama-3.2-3B: meta-llama/Llama-3.2-3B
The provided scripts are pre-configured to handle the dataset structure. Both the Full Dataset and Hard Subset can be used directly for training without additional preprocessing once placed in the data/ directory.
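Once the files are in data/, a quick sanity check is to load them and count samples per difficulty level. A minimal sketch, assuming the data is stored as JSON lines with the fields described above (the toy lines below stand in for a real file):

```python
import json
from collections import Counter

def difficulty_counts(jsonl_lines):
    """Count samples per difficulty level in a JSONL dataset dump."""
    counts = Counter()
    for line in jsonl_lines:
        counts[json.loads(line)["difficulty"]] += 1
    return dict(counts)

# Toy stand-in for the downloaded dataset file (contents are illustrative)
lines = [
    '{"question": "q1", "difficulty": 1}',
    '{"question": "q2", "difficulty": 3}',
    '{"question": "q3", "difficulty": 3}',
]
print(difficulty_counts(lines))  # {1: 1, 3: 2}
```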
Launch the training process using the following scripts:

- Llama 3.2: `python sft_llama3b.py`
- Qwen 2.5: `python sft_qwen3b.py`
Our dataset is also designed to support reinforcement learning paradigms. You can initiate RL training either directly from the base models or as a second stage after Supervised Fine-Tuning (SFT).
The training configurations, including reward functions and hyperparameter settings, align with the methodologies described in our paper. To start the process, use the provided bash scripts in the DeepResearch-R1/ directory:
- PPO Training: `bash DeepResearch-R1/train_ppo.sh`
- GRPO Training: `bash DeepResearch-R1/train_grpo.sh`
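The actual reward functions are defined in the paper and the training scripts. Purely as an illustrative sketch, a common choice in this kind of answer-tagged setup is an exact-match reward on the extracted `<answer>` span; the function below is an assumption for illustration, not the project's real reward:

```python
import re

def exact_match_reward(rollout: str, gold_answer: str) -> float:
    """Toy sketch: 1.0 if the rollout's <answer> span matches the gold
    answer (case/whitespace-insensitive), else 0.0. Illustrative only;
    the project's actual reward lives in the training scripts."""
    match = re.search(r"<answer>(.*?)</answer>", rollout, re.DOTALL)
    if not match:
        return 0.0
    predicted = match.group(1).strip().lower()
    return 1.0 if predicted == gold_answer.strip().lower() else 0.0

print(exact_match_reward("... <answer>1889</answer>", "1889"))  # 1.0
print(exact_match_reward("no tags here", "1889"))               # 0.0
```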
Run the following command to start the inference process:
- Inference: `python DeepResearch-R1/infer.py`