
Conversation

@JYMiracle305 (Contributor)

No description provided.

@JYMiracle305 force-pushed the add_1F1B branch 3 times, most recently from 496bbfd to 7108a12 on December 16, 2025 at 14:54
@JYMiracle305 force-pushed the add_1F1B branch 2 times, most recently from 3726518 to 9af4751 on December 22, 2025 at 09:04
@JYMiracle305 (Contributor, Author) commented on Dec 22, 2025

Added the hyperparameter virtual_pipeline_parallel (vpp_size), the number of virtual chunks each pipeline stage is split into under PP; with PP the model is partitioned into pp_size * vpp_size chunks, each assigned to its corresponding device. After the refactor, the different scheduling strategies expose a unified interface to the upper layer: when the PipelineParallelScheduler is constructed, it fills a Task table according to the chosen strategy, where each entry records a sub-task (its associated chunk, microbatch, and whether it is a forward or backward pass). During training the upper layer calls StepMicroBatches, which internally iterates over the task table.

When virtual_pipeline_parallel = 1, the schedule looks as follows:
[image: pipeline schedule with virtual_pipeline_parallel = 1]

When virtual_pipeline_parallel > 1, the schedule looks as follows:
[image: pipeline schedule with virtual_pipeline_parallel > 1]
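
To make the task-table idea concrete, here is a minimal, self-contained C++ sketch. Only the names Task, PipelineParallelScheduler, and StepMicroBatches come from this PR; the struct fields, the fill order (a plain all-forward-then-all-backward order rather than the actual 1F1B/interleaved schedules), and RunSubTask are illustrative assumptions, not the implementation in this repository.

```cpp
// Illustrative sketch only; not the PipelineParallelScheduler from this PR.
#include <cstdio>
#include <vector>

namespace sketch {

// One entry of the task table: which model chunk, which microbatch,
// and whether this sub-task is a forward or a backward pass.
struct Task {
    int chunk_id;       // local chunk index in [0, vpp_size)
    int micro_batch_id; // microbatch index in [0, num_micro_batches)
    bool is_forward;    // true = forward pass, false = backward pass
};

// Hypothetical scheduler: fills the task table at construction time and
// replays it each time StepMicroBatches is called.
class PipelineParallelScheduler {
public:
    PipelineParallelScheduler(int vpp_size, int num_micro_batches) {
        // Simplest possible fill order (all forwards, then all backwards);
        // a real 1F1B / interleaved strategy would interleave these entries.
        for (int chunk = 0; chunk < vpp_size; ++chunk) {
            for (int mb = 0; mb < num_micro_batches; ++mb) {
                tasks_.push_back({chunk, mb, /*is_forward=*/true});
            }
        }
        for (int chunk = vpp_size - 1; chunk >= 0; --chunk) {
            for (int mb = 0; mb < num_micro_batches; ++mb) {
                tasks_.push_back({chunk, mb, /*is_forward=*/false});
            }
        }
    }

    // The upper layer calls this once per training step; internally it just
    // walks the precomputed task table.
    void StepMicroBatches() const {
        for (const Task &task : tasks_) {
            RunSubTask(task);
        }
    }

private:
    static void RunSubTask(const Task &task) {
        std::printf("%s chunk=%d microbatch=%d\n",
                    task.is_forward ? "F" : "B", task.chunk_id, task.micro_batch_id);
    }

    std::vector<Task> tasks_;
};

} // namespace sketch

int main() {
    // e.g. PP=2, VPP=2 gives 2 local chunks on this rank; 4 microbatches.
    sketch::PipelineParallelScheduler scheduler(/*vpp_size=*/2, /*num_micro_batches=*/4);
    scheduler.StepMicroBatches();
    return 0;
}
```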

Multi-node training of gpt2 with DDP=2, TP=2 (SP=ON), PP=2 (VPP=2):
[image: gpt2 multi-node training result]

Multi-node training of LLaMA3 with DDP=2, TP=2 (SP=ON), PP=2 (VPP=2):
[image: LLaMA3 multi-node training result]

Diff context for the review comment below:

std::vector<std::shared_ptr<infini_train::Tensor>>
GPT2::Forward(const std::vector<std::shared_ptr<infini_train::Tensor>> &x) {
    int pp_rank = nn::parallel::pp_rank;
    // ...
void GPT2::BuildChunks() {
    // ...
Review comment (Contributor):
For Transformer models, BuildChunks could also be merged: gpt2/llama differ only in a pos_emb, so a single if check would cover it (see the sketch below).
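
A minimal sketch of that suggestion, assuming hypothetical names (TransformerModel, has_pos_emb_, token_emb_, layers_, chunks_) that are not taken from this repository; the actual BuildChunks signature and chunk layout may differ.

```cpp
// Sketch of the reviewer's suggestion; all type and member names here are
// placeholders, not the actual classes in this repository.
#include <memory>
#include <vector>

struct Module {};                        // stand-in for the framework's module type
using ModulePtr = std::shared_ptr<Module>;

class TransformerModel {
public:
    // A single BuildChunks shared by gpt2 and llama: the only structural
    // difference is whether a positional-embedding module is prepended to
    // the first chunk, which a simple if covers.
    void BuildChunks(int num_chunks) {
        chunks_.assign(num_chunks, {});
        chunks_.front().push_back(token_emb_);
        if (has_pos_emb_) {              // gpt2: true; llama: false (RoPE lives inside the layers)
            chunks_.front().push_back(pos_emb_);
        }
        // Spread the transformer layers evenly across the chunks.
        for (size_t i = 0; i < layers_.size(); ++i) {
            chunks_[i * num_chunks / layers_.size()].push_back(layers_[i]);
        }
        chunks_.back().push_back(lm_head_);
    }

private:
    bool has_pos_emb_ = true;
    ModulePtr token_emb_, pos_emb_, lm_head_;
    std::vector<ModulePtr> layers_;
    std::vector<std::vector<ModulePtr>> chunks_;
};

int main() {
    TransformerModel model;
    model.BuildChunks(/*num_chunks=*/2);
    return 0;
}
```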

@JYMiracle305 force-pushed the add_1F1B branch 3 times, most recently from 0f5628b to aeb8ee0 on December 25, 2025 at 04:59
@JYMiracle305 force-pushed the add_1F1B branch 2 times, most recently from f8b086c to c22da40 on December 26, 2025 at 03:21
@kilinchange merged commit 83d11cc into master on Jan 5, 2026
2 checks passed
@kilinchange deleted the add_1F1B branch on January 5, 2026 at 02:47