This repository contains the dataset and codebase for the DeepResearch-9K project. All environment configuration files are stored in the env/ directory.
| Environment | Python | Key Features | Primary Use Case |
|---|---|---|---|
| react_infer_env | 3.10 | OpenAI SDK, Data processing | Inference & Data Modification |
| search | 3.9 | vLLM, Verl, Flash-Attn 2 | Model Training & RL Tasks |
| retrieval | 3.10 | Faiss-GPU, Pyserini, FastAPI | Vector Search & Knowledge Retrieval |
Purpose (react_infer_env): Optimized for running basic inference and large-scale data processing/modification scripts.

```bash
# Create and activate environment
conda create -n react_infer_env python=3.10 -y
conda activate react_infer_env

# Install dependencies from the env/ folder
conda install --file env/react_infer_requirements.txt
```
Purpose (search): Designed for model training (Verl), reinforcement learning (RL) tasks, and high-performance inference via vLLM.

```bash
# Create and activate environment
conda create -n search python=3.9 -y
conda activate search

# Install PyTorch and vLLM
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
pip install vllm==0.6.3

# Install the Verl framework, Flash Attention 2, and experiment tracking
pip install -e .
pip install flash-attn --no-build-isolation
pip install wandb
```
Purpose (retrieval): Specialized for knowledge retrieval, vector database management, and hosting API services.

```bash
# Create and activate environment
conda create -n retrieval python=3.10 -y
conda activate retrieval

# Install PyTorch with CUDA (Conda recommended for Faiss-GPU compatibility)
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.1 -c pytorch -c nvidia

# Install Faiss-GPU for efficient vector search during RL rollouts
conda install -c pytorch -c nvidia faiss-gpu=1.8.0

# Install additional retrieval components
pip install transformers datasets pyserini uvicorn fastapi
```
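As a rough illustration of what the Faiss index provides to the retrieval service, here is a minimal brute-force inner-product search sketch in plain NumPy. The corpus and embedding dimension are made up for the example; a real deployment would build a `faiss.IndexFlatIP` (or similar) over the actual corpus embeddings instead.

```python
import numpy as np

def top_k(query: np.ndarray, corpus: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the k corpus vectors most similar to the query
    (inner-product similarity, the same metric as faiss.IndexFlatIP)."""
    scores = corpus @ query          # one inner product per corpus vector
    return np.argsort(-scores)[:k]   # highest scores first

# Toy corpus: 5 "documents" embedded in a 4-dimensional space
corpus = np.eye(5, 4, dtype=np.float32)
query = np.array([1.0, 0.1, 0.0, 0.0], dtype=np.float32)
print(top_k(query, corpus, k=2))  # vector 0 matches best, then vector 1
```

Faiss replaces this O(corpus size) scan with optimized (and optionally GPU-accelerated) index structures, which is what makes retrieval fast enough for RL rollouts.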
You can access the complete dataset and model rollouts on Hugging Face. We provide two versions based on evaluation results:
- Full Dataset: artillerywu/DeepResearch-9K
  - Contains 9,000 high-quality samples covering three difficulty levels.
- Hard Subset: artillerywu/DeepResearch-Hard
  - A curated subset of 3,974 challenging samples (filtered by INCORRECT verdicts).

Note: After downloading the dataset files, please place them in the data/ directory of the project root.
Each sample follows a standardized structure for seamless integration with the SFT scripts:

- `question`: The initial user query.
- `difficulty`: Difficulty level (1-3).
- `search trajectory`: Full reasoning and tool-use rollouts.
- `final answer`: The definitive response, enclosed within `<answer></answer>` tags.
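As a small sketch of how a downstream script might consume this structure, the definitive response can be pulled out of the `final answer` field with a simple regex. The sample dict below is a made-up toy record, not real dataset content:

```python
import re

def extract_answer(final_answer_field: str) -> str:
    """Pull the text between <answer></answer> tags; return '' if absent."""
    match = re.search(r"<answer>(.*?)</answer>", final_answer_field, re.DOTALL)
    return match.group(1).strip() if match else ""

# Toy sample mimicking the documented structure (not a real record)
sample = {
    "question": "What year was the Eiffel Tower completed?",
    "difficulty": 1,
    "search trajectory": "...reasoning and tool calls...",
    "final answer": "Based on the search results, <answer>1889</answer>",
}
print(extract_answer(sample["final answer"]))  # 1889
```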
We provide optimized scripts for SFT and RL on 3B-parameter base models.
The scripts are compatible with the following models:
- Qwen2.5-3B: Qwen/Qwen2.5-3B
- Llama-3.2-3B: meta-llama/Llama-3.2-3B
The provided scripts are pre-configured to handle the dataset structure. Both the Full Dataset and Hard Subset can be used directly for training without additional preprocessing once placed in the data/ directory.
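Once the files are in data/, a quick sanity check is to load them and count samples per difficulty level. A minimal sketch, assuming the data is stored as JSON lines with the fields described above (the toy lines below stand in for a real file):

```python
import json
from collections import Counter

def difficulty_counts(jsonl_lines):
    """Count samples per difficulty level in a JSONL dataset dump."""
    counts = Counter()
    for line in jsonl_lines:
        counts[json.loads(line)["difficulty"]] += 1
    return dict(counts)

# Toy stand-in for the downloaded dataset file (contents are illustrative)
lines = [
    '{"question": "q1", "difficulty": 1}',
    '{"question": "q2", "difficulty": 3}',
    '{"question": "q3", "difficulty": 3}',
]
print(difficulty_counts(lines))  # {1: 1, 3: 2}
```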
Launch the training process using the following scripts:

- Llama 3.2: `python sft_llama3b.py`
- Qwen 2.5: `python sft_qwen3b.py`
Our dataset is also designed to support reinforcement learning paradigms. You can initiate RL training either directly from the base models or as a second stage after Supervised Fine-Tuning (SFT).
The training configurations, including reward functions and hyperparameter settings, align with the methodologies described in our paper. To start the process, use the provided bash scripts in the DeepResearch-R1/ directory:
- PPO Training: `bash DeepResearch-R1/train_ppo.sh`
- GRPO Training: `bash DeepResearch-R1/train_grpo.sh`
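The actual reward functions are defined in the paper and the training scripts. Purely as an illustrative sketch, a common choice in this kind of answer-tagged setup is an exact-match reward on the extracted `<answer>` span; the function below is an assumption for illustration, not the project's real reward:

```python
import re

def exact_match_reward(rollout: str, gold_answer: str) -> float:
    """Toy sketch: 1.0 if the rollout's <answer> span matches the gold
    answer (case/whitespace-insensitive), else 0.0. Illustrative only;
    the project's actual reward lives in the training scripts."""
    match = re.search(r"<answer>(.*?)</answer>", rollout, re.DOTALL)
    if not match:
        return 0.0
    predicted = match.group(1).strip().lower()
    return 1.0 if predicted == gold_answer.strip().lower() else 0.0

print(exact_match_reward("... <answer>1889</answer>", "1889"))  # 1.0
print(exact_match_reward("no tags here", "1889"))               # 0.0
```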
Run the following command to start the inference process:
- Inference: `python DeepResearch-R1/infer.py`