Scaling Prompt Synthesis for LLM Reasoning
📄 Paper • 🤗 Hugging Face
PromptCoT 2.0 is a principled and scalable framework for prompt synthesis that substantially advances LLM reasoning in both mathematics and programming.
It introduces an EM-style rationale-driven synthesis loop (concept → rationale → problem), enabling the automatic generation of diverse and challenging problems at scale. These synthetic prompts support two complementary training regimes:
Self-Play: the model improves autonomously by learning from verifiable signals (e.g., unit tests for code, boxed answers for math). With this approach, a 30B-A3B self-play model achieves 92.1 on AIME24, 89.8 on AIME25, and 76.7 on HMMT Feb25, as well as 74.2 on LiveCodeBench v5, 71.0 on v6, and 2079 Elo on Codeforces. These results surpass strong open-source baselines (Qwen3-30B-A3B-Thinking) and achieve competitive performance with closed-source leaders such as Gemini 2.5 Pro and OpenAI o3 across math and code.
SFT: a 7B model trained 100% on synthetic data—using prompts synthesized by PromptCoT 2.0 and complete reasoning trajectories distilled from GPT-OSS-120B (medium)—reaches 73.1 on AIME24, 65.6 on AIME25, and 1815 Elo on Codeforces, outperforming counterparts trained on human-written prompts.
Unleash the PromptCoT tide of reasoning!
Self-Play @ Qwen3-30B-A3B-2507-Thinking:
PromptCoT 2.0 demonstrates that large-scale self-play with verifiable signals is effective for advancing LLM reasoning. At 30B scale, self-play achieves performance competitive with closed-source leaders (Gemini 2.5 Pro, OpenAI o3) and surpasses strong open-source baselines.
SFT @ Qwen2.5-7B-Instruct:
PromptCoT 2.0 (7B, SFT) is the first model trained entirely on synthetic prompts with trajectories distilled from GPT-OSS-120B. Unlike OpenCodeReasoning and OpenMathReasoning — both built on human-written prompts — PromptCoT 2.0 achieves stronger performance, highlighting the potential of fully synthetic prompt synthesis as a foundation for reasoning models.
[2025/10/26] We release the problem generation recipe (problem_generation.sh), enabling full reproduction of PromptCoT 2.0's scalable synthesis pipeline from concept files.
[2025/09/24] We release PromptCoT 2.0:
the first framework to scale prompt synthesis across both math and programming, enabling 30B self-play competitive with Gemini 2.5 Pro / OpenAI o3, and 7B SFT (100% synthetic prompts) surpassing human-written baselines.
📂 Resources
- SFT Data (4.8M fully synthetic prompts + trajectories): PromptCoT-2.0-SFT-4.8M.
- SFT Model (7B): PromptCoT-2.0-SFT-7B.
- Self-Play Data: PromptCoT-2.0-SelfPlay-30B-11K and PromptCoT-2.0-SelfPlay-4B-48K.
- Self-Play Models: PromptCoT-2.0-SelfPlay-30B-A3B and PromptCoT-2.0-SelfPlay-4B.
- Problem Generation Model: PromptCoT-2.0-Prompt-Generation-Model.
[2025/05/30] We release PromptCoT-Mamba (🤗 PromptCoT-Mamba-7B):
the first attention-free reasoning model, combining PromptCoT with Mamba-2 to achieve strong math & code performance with constant-memory inference.
[2025/04/11] We release PromptCoT-QwQ-32B and PromptCoT-QwQ-Dataset:
self-play of QwQ-32B using PromptCoT synthetic problems, with dedicated datasets for reproducible training.
[2025/03/07] We release PromptCoT 1.0 (🤗 HF Collection):
the first rationale-driven synthesis pipeline for Olympiad-level math problems, releasing problem generation models, distilled models, and datasets.
git clone https://github.com/inclusionAI/PromptCoT
cd PromptCoT
pip install -r requirements.txt
Top-level scripts support loading default configuration values from a local .env file.
- Copy .env.example to .env
- Edit values (for example MODEL_PATH, N_GPUS, DATA_PATH, OUTPUT_PATH); a minimal example is shown after this list
- Validate your setup: python validate_config.py
Notes:
- Precedence is CLI args > .env > code defaults. MODEL_PATH / TOKENIZER_PATH can be a local path or a Hugging Face model id; the validator only checks filesystem paths.
- Empty strings in .env are treated as "unset" (e.g. DATA_PATH= behaves like not set).
- Prefer namespaced environment variables (e.g. SPLIT_MERGE_OUTPUT_PATH, SELF_PLAY_OUTPUT_PATH) to avoid collisions when you run multiple scripts from the same .env.
- Some scripts historically used different env var names (e.g. infer_split_merge.py uses N_SPLITS, while infer_self_play.py uses NUM_SPLITS); .env.example documents the mapping and the code includes small fallbacks for these.
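For reference, a minimal .env might look like the following; the values are illustrative placeholders rather than defaults taken from .env.example:

# Illustrative .env; values are placeholders, see .env.example for the full list
# MODEL_PATH may be a local path or a Hugging Face model id
MODEL_PATH=Qwen/Qwen2.5-7B-Instruct
N_GPUS=4
DATA_PATH=data/prompts.jsonl
OUTPUT_PATH=outputs/results.jsonl
# Namespaced variants avoid collisions when several scripts share one .env
SPLIT_MERGE_OUTPUT_PATH=outputs/split_merge.jsonl
SELF_PLAY_OUTPUT_PATH=outputs/self_play.jsonl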
To run the lightweight unit tests in this repo:
python -m unittest discover -s tests -v
We provide a script to synthesize problems from concept files using the PromptCoT 2.0 pipeline.
- Concept files: available at xl-zhao/PromptCoT-2.0-Concepts (e.g., PromptCoT-2.0-Concepts/code.jsonl); an illustrative download command is shown after this list.
- Model: set --model_path in the script to your PromptCoT-2.0-Prompt-Generation-Model (see Releases for links).
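To fetch the concept file locally, something like the following should work, assuming the concepts are published as a Hugging Face dataset repository (the file name comes from the example above; the local directory is arbitrary):

# Assumes huggingface-cli is installed and the concepts repo is a dataset repo
huggingface-cli download xl-zhao/PromptCoT-2.0-Concepts code.jsonl \
  --repo-type dataset --local-dir PromptCoT-2.0-Concepts
# Preview a couple of concept entries
head -n 2 PromptCoT-2.0-Concepts/code.jsonl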
Make the script executable and run:
chmod +x problem_generation.sh
./problem_generation.sh
We illustrate the self-play workflow in the code domain, where unit tests provide verifiable reward signals.
Step 1 — Verifiable Reward Generation (test case construction)
The input .jsonl file must include a "problem" field for each instance, specifying the coding task to be solved.
In each run, a new test case is generated and appended to the "completions" field, progressively enriching the specification.
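For example, a first-round input file could be created as follows; the problem text is purely illustrative, and pre-initializing "completions" as an empty list is an assumption (the script may also create the field itself):

# Illustrative seed file for round 0; only the "problem" field is required per the description above.
# Assumption: an empty "completions" list is an acceptable starting point.
cat > code/prompts_test_cases_0.jsonl << 'EOF'
{"problem": "Given an array of integers, return the length of the longest strictly increasing subsequence.", "completions": []}
EOF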
# Generate 4 rounds of test cases with different seeds
for seed in {0..3}; do
python test_cases_generation.py \
--seed $seed \
--data_path code/prompts_test_cases_${seed}.jsonl \
--output_path code/prompts_test_cases_$((seed+1)).jsonl \
--model_path Qwen/Qwen3-32B \
--n_gpus 4 \
--temperature 0.6 \
--max_len 16384 \
--use_chat_template True
done
Post-process the generated test cases into a structured format:
python test_cases_postprocess.py \
--input_file code/prompts_test_cases_4.jsonl \
--output_path code/prompts_test_cases_processed.jsonl
Step 2 — Self-Play Trajectory Collection
Using the processed test cases, generate diverse trajectories by sampling across multiple seeds:
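The loop below reads code/selfplay_0.jsonl in its first iteration. A reasonable way to create it, by analogy with the SFT step that feeds the processed prompts directly to infer_self_play.py, is to copy the Step 1 output (this is an assumption, not a documented requirement):

# Assumption: seed the first self-play round with the processed prompts/test cases from Step 1
cp code/prompts_test_cases_processed.jsonl code/selfplay_0.jsonl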
for seed in {0..7}; do
python infer_self_play.py \
--data_path code/selfplay_${seed}.jsonl \
--output_path code/selfplay_$((seed+1)).jsonl \
--model_path Qwen/Qwen3-30B-A3B-Thinking-2507 \
--trust_remote_code True \
--n_gpus 8 \
--num_splits 4 \
--num_completions 8 \
--seed $seed \
--temperature 1.2 \
--max_len 81920 \
--use_chat_template True
done
Step 3 — Reward Assignment
Evaluate each trajectory against the constructed test cases and assign reward signals automatically:
python self_play_eval.py \
--data_path code/selfplay_8.jsonl \
--output_path code/selfplay_verified.jsonl \
--eval_type code \
--num_workers 16
Step 4 — Pair Construction
Aggregate verified trajectories into chosen vs. rejected pairs for offline self-play training:
python prepare_self_play_data.py \
--data_path code/selfplay_verified.jsonl \
--output_path code/selfplay_training.jsonl
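Optionally, sanity-check the resulting pair file before training. The commands below are illustrative and use only standard tools; the exact record schema is whatever prepare_self_play_data.py emits:

# Count training pairs and pretty-print the first record
wc -l code/selfplay_training.jsonl
head -n 1 code/selfplay_training.jsonl | python -m json.tool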
We illustrate the SFT workflow in the code domain, using teacher trajectories from GPT-OSS-120B.
Step 1 — Teacher Trajectory Collection
Sample teacher responses for each prompt, with one trajectory per problem:
python infer_self_play.py \
--data_path code/prompts_test_cases_processed.jsonl \
--output_path code/prompts_trajectories.jsonl \
--model_path openai/gpt-oss-120b \
--trust_remote_code True \
--n_gpus 8 \
--num_splits 4 \
--num_completions 1 \
--seed 0 \
--temperature 1.0 \
--max_len 16384 \
--use_chat_template True
Step 2 — Data Post-Processing
Filter incomplete or invalid trajectories, and format them into clean prompt–completion pairs for supervised fine-tuning:
python prepare_sft_data_code.py \
--data_path code/prompts_trajectories.jsonl \
--output_path code/sft_training.jsonl \
--tokenizer_path Qwen/Qwen2.5-7B-Instruct
We provide scripts to reproduce results for both self-play and SFT models.
For math evaluations, we recommend setting VLLM_USE_V1=0 to ensure reproducibility.
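For example, export the flag once per shell before launching the math evaluation commands below:

export VLLM_USE_V1=0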
Self-Play Models
30B-A3B (Math)
for dataset in aime24 aime25 hmmt25; do
python infer_split_merge.py \
--data_path data/promptcot2_${dataset}_test.jsonl \
--output_path qwen_evals/30b_a3b/${dataset}.jsonl \
--model_path /path/to/PromptCoT-2.0-SelfPlay-30B-A3B \
--n_splits 4 \
--expected_runs 16 \
--temperature 0.6 \
--top_p 0.95 \
--max_len 81920 \
--factor 1.75 \
--original_max_position_embeddings 262144
done
30B-A3B (Code)
# Codeforces
python infer_split_merge.py \
--data_path data/promptcot2_codeforces_test.jsonl \
--output_path qwen_evals/30b_a3b/codeforces.jsonl \
--model_path /path/to/PromptCoT-2.0-SelfPlay-30B-A3B \
--n_splits 1 \
--expected_runs 8 \
--temperature 0.6 \
--top_p 0.95 \
--max_len 81920 \
--factor 1.75 \
--original_max_position_embeddings 262144
# LiveCodeBench v5 / v6
for dataset in lcb_v5 lcb_v6; do
python infer_split_merge.py \
--data_path data/promptcot2_${dataset}_test.jsonl \
--output_path qwen_evals/30b_a3b/${dataset}.jsonl \
--model_path /path/to/PromptCoT-2.0-SelfPlay-30B-A3B \
--n_splits 1 \
--expected_runs 1 \
--temperature 0.6 \
--top_p 0.95 \
--max_len 81920 \
--factor 1.75 \
--original_max_position_embeddings 262144
done
4B (Math)
for dataset in aime24 aime25 hmmt25; do
python infer_split_merge.py \
--data_path data/promptcot2_${dataset}_test.jsonl \
--output_path qwen_evals/4b/${dataset}.jsonl \
--model_path /path/to/PromptCoT-2.0-SelfPlay-4B \
--n_splits 8 \
--expected_runs 16 \
--temperature 0.6 \
--top_p 0.95 \
--max_len 81920 \
--factor 1.75 \
--original_max_position_embeddings 262144
done
4B (Code)
# Codeforces
python infer_split_merge.py \
--data_path data/promptcot2_codeforces_test.jsonl \
--output_path qwen_evals/4b/codeforces.jsonl \
--model_path /path/to/PromptCoT-2.0-SelfPlay-4B \
--n_splits 4 \
--expected_runs 8 \
--temperature 0.6 \
--top_p 0.95 \
--max_len 81920 \
--factor 1.75 \
--original_max_position_embeddings 262144
# LiveCodeBench v5 / v6
for dataset in lcb_v5 lcb_v6; do
python infer_split_merge.py \
--data_path data/promptcot2_${dataset}_test.jsonl \
--output_path qwen_evals/4b/${dataset}.jsonl \
--model_path /path/to/PromptCoT-2.0-SelfPlay-4B \
--n_splits 8 \
--expected_runs 1 \
--temperature 0.6 \
--top_p 0.95 \
--max_len 81920 \
--factor 1.75 \
--original_max_position_embeddings 262144
done
SFT Models (7B)
Math
for dataset in aime24 aime25 hmmt25; do
python infer_split_merge.py \
--data_path data/promptcot2_${dataset}_test.jsonl \
--output_path qwen_evals/sft/${dataset}.jsonl \
--model_path /path/to/PromptCoT-2.0-SFT-7B \
--n_splits 8 \
--expected_runs 16 \
--temperature 0.6 \
--top_p 0.95 \
--max_len 81920
done
Code
# Codeforces
python infer_split_merge.py \
--data_path data/promptcot2_codeforces_test.jsonl \
--output_path qwen_evals/sft/codeforces.jsonl \
--model_path /path/to/PromptCoT-2.0-SFT-7B \
--n_splits 8 \
--expected_runs 8 \
--temperature 0.6 \
--top_p 0.95 \
--max_len 81920
# LiveCodeBench v5 / v6
for dataset in lcb_v5 lcb_v6; do
python infer_split_merge.py \
--data_path data/promptcot2_${dataset}_test.jsonl \
--output_path qwen_evals/sft/${dataset}.jsonl \
--model_path /path/to/PromptCoT-2.0-SFT-7B \
--n_splits 8 \
--expected_runs 1 \
--temperature 0.6 \
--top_p 0.95 \
--max_len 81920
done
If you find the PromptCoT series useful, please consider citing our work:
@article{zhao2025promptcot2,
title = {PromptCoT 2.0: Scaling Prompt Synthesis for Large Language Model Reasoning},
author = {Zhao, Xueliang and Wu, Wei and Guan, Jian and Gong, Zhuocheng and Kong, Lingpeng},
journal = {arXiv preprint arXiv:2509.19894},
year = {2025},
url = {https://arxiv.org/abs/2509.19894}
}
@article{zhao2025scaling,
title = {Scaling Reasoning without Attention},
author = {Zhao, Xueliang and Wu, Wei and Kong, Lingpeng},
journal = {arXiv preprint arXiv:2505.22425},
year = {2025},
url = {https://arxiv.org/abs/2505.22425}
}
@article{zhao2025promptcot,
title = {PromptCoT: Synthesizing Olympiad-Level Problems for Mathematical Reasoning in Large Language Models},
author = {Zhao, Xueliang and Wu, Wei and Guan, Jian and Kong, Lingpeng},
journal = {arXiv preprint arXiv:2503.02324},
year = {2025},
url = {https://arxiv.org/abs/2503.02324}
}

