
Conversation


QPHutu commented Jan 22, 2024

The change is a quick implementation that replaces 1F1B with the ZB-H1 schedule proposed in Zero Bubble Pipeline Parallelism, which reduces the bubbles in pipeline parallelism.

QPHutu (Author) commented Jan 22, 2024

The paper has been accepted by ICLR 2024.

The key idea is to split the backward computation into two parts: one that computes the gradient for the input and another that computes the gradient for the parameters. By rescheduling the parameter gradient computation, we can get better efficiency without sacrificing anything.

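To make the split concrete, here is a minimal PyTorch-style sketch (not the PR's actual code; names and shapes are illustrative) of how a linear layer's backward pass can be separated into an input-gradient part, which sits on the critical path, and a parameter-gradient part, which can be deferred to fill pipeline bubbles:

```python
import torch

def backward_input(grad_output: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    # B: gradient w.r.t. the layer input; needed immediately so the
    # preceding pipeline stage can start its own backward pass.
    return grad_output @ weight

def backward_weight(grad_output: torch.Tensor, saved_input: torch.Tensor) -> torch.Tensor:
    # W: gradient w.r.t. the parameters; it has no downstream dependency,
    # so it can be rescheduled to run when the stage would otherwise be idle.
    return grad_output.t() @ saved_input

# Toy usage for y = x @ W^T on a single micro-batch.
x = torch.randn(8, 16)          # saved activation (input to the layer)
weight = torch.randn(32, 16)    # layer parameters
grad_y = torch.randn(8, 32)     # gradient arriving from the next stage

grad_x = backward_input(grad_y, weight)   # run right away (critical path)
deferred = [(grad_y, x)]                  # queue the parameter-gradient work
# ... later, in an otherwise idle slot of the schedule:
grad_w = backward_weight(*deferred.pop())
```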

@Dylancer1998

May I ask what led you to contribute this to this repository rather than the original one? Just curious about your thoughts! @QPHutu

QPHutu (Author) commented Jan 30, 2024

Thanks for the question. There are two main reasons.

  1. We have an internal team using this repo to train LLMs, so to better support their training we decided to merge this commit upstream.
  2. We also plan to merge our new scheduling methods into the original Megatron-LM, not only ZB-H1 but all the other schedulers as well. However, the full set of code changes is quite complicated, so both we and Nvidia want to be careful about it. To keep things simple, we want to push ZB-H1 to the community first.

@martinjaggi (Contributor)

thanks for the PR!

for merging we'd like to understand the impact a bit better. did you verify how model-parallel training of the current models supported here (such as llama2) is impacted by your change? (in terms of speed and stability, and also verifying that model behavior is unchanged?)

indeed, it would be nice to also hear feedback from the Nvidia/Megatron-LM team if you get a chance
