Conversation

@owmohamm commented Dec 5, 2020

In the pre-layernorm version of BERT, applying layernorm to the embeddings is redundant, since the first transformer layer already applies layernorm to its input.

For reference: https://github.com/NVIDIA/Megatron-LM/blob/19301985dd31c8b612095cbad15bd903e8ddd497/megatron/model/language_model.py#L165
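For illustration, here is a minimal, hypothetical PyTorch sketch of a pre-layernorm transformer layer (not the actual Megatron-LM implementation; the class and parameter names below are made up for the example). Because the layer normalizes its own input before attention, a separate layernorm applied to the embeddings just before the first layer is redundant:

```python
import torch
import torch.nn as nn

class PreLNTransformerLayer(nn.Module):
    """Simplified pre-layernorm transformer layer (illustrative only)."""

    def __init__(self, hidden_size=768, num_heads=12):
        super().__init__()
        # Pre-LN: each sub-block normalizes its *input*, not its output.
        self.input_layernorm = nn.LayerNorm(hidden_size)
        self.attention = nn.MultiheadAttention(hidden_size, num_heads)
        self.post_attention_layernorm = nn.LayerNorm(hidden_size)
        self.mlp = nn.Sequential(
            nn.Linear(hidden_size, 4 * hidden_size),
            nn.GELU(),
            nn.Linear(4 * hidden_size, hidden_size),
        )

    def forward(self, x):
        # x: [seq_len, batch, hidden_size] embeddings (or previous layer output).
        # The layer normalizes its input right here, which is why an extra
        # layernorm applied to the embeddings before the first layer is redundant.
        h = self.input_layernorm(x)
        attn_out, _ = self.attention(h, h, h)
        x = x + attn_out
        return x + self.mlp(self.post_attention_layernorm(x))
```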

@owmohamm (Author) commented Dec 8, 2020

I don't have permission to merge the pull request, so could someone else from the DeepSpeed team do that?

@RezaYazdaniAminabadi (Contributor) commented

Hi Owais,

Thanks again for pointing out this possible bug in the DeepSpeed example. We are discussing it within the team and will merge the fix soon if there is no accuracy impact!

Thanks,
Reza
