Grouped GEMM: A lightweight library exposing grouped GEMM kernels in PyTorch

A naive Grouped GEMM implementation
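The "naive" version is just a Python loop over experts: one `torch.matmul` per group, where the grouped kernel instead fuses all of them into a single launch. A minimal sketch (the tensor layout here follows the `grouped_gemm` library's `gmm` convention of concatenated tokens plus per-expert batch sizes; treat it as an illustration, not the library's actual code):

```python
import torch

def grouped_gemm_naive(a: torch.Tensor, b: torch.Tensor, batch_sizes: torch.Tensor) -> torch.Tensor:
    """Naive grouped GEMM: loop over experts, one matmul per group.

    a:           (sum(batch_sizes), k)  tokens for all experts, concatenated
    b:           (num_experts, k, n)    one weight matrix per expert
    batch_sizes: (num_experts,)         number of tokens routed to each expert
    """
    out = torch.empty(a.shape[0], b.shape[2], device=a.device, dtype=a.dtype)
    start = 0
    for i, size in enumerate(batch_sizes.tolist()):
        end = start + size
        # One independent GEMM per expert; a grouped GEMM kernel runs all of
        # these in a single launch instead of a Python loop.
        out[start:end] = a[start:end] @ b[i]
        start = end
    return out
```

The point of the real grouped kernel is exactly to remove this per-expert loop and its kernel-launch overhead, which matters when many experts receive only a few tokens each.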

MegaBlocks ships several MoE implementations; to get dMoE (dropless MoE) you need the grouped GEMM path, which means setting mlp_impl == 'grouped'.
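For reference, a hedged sketch of what selecting the grouped path might look like; the `Arguments` field names below are assumptions from my reading of `megablocks.layers.arguments` and may differ across versions, so verify against the installed release:

```python
# Sketch only: field names are assumptions, check megablocks.layers.arguments.
from megablocks.layers.arguments import Arguments
from megablocks.layers.dmoe import dMoE

args = Arguments(
    hidden_size=1024,
    ffn_hidden_size=4096,
    moe_num_experts=8,
    moe_top_k=2,
    mlp_impl='grouped',  # 'sparse' uses block-sparse kernels; 'grouped' uses grouped GEMM
)
layer = dMoE(args)
```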

How does moe-expert-model-parallelism actually work? Does it simply place one expert on each GPU?