Description
Grouped GEMM kernels (https://github.com/fanshiqing/grouped_gemm) are used in many MoE models.
I was wondering whether torchao supports FP8 kernels for Grouped GEMM, such as these three commonly used ops:
grouped_gemm.backend.gmm
grouped_gemm.ops.unpermute
grouped_gemm.ops.permute
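
To make the request concrete, here is a rough plain-PyTorch sketch of what I understand these three ops to compute in an MoE layer. This is not the grouped_gemm implementation or its API: the function names (`permute_ref`, `gmm_ref`, `unpermute_ref`) and their signatures are hypothetical, top-1 routing is assumed, and everything runs in the default dtype rather than FP8.

```python
import torch


def permute_ref(tokens, expert_ids):
    # Hypothetical reference: stable-sort tokens by expert id so that all
    # tokens routed to the same expert are contiguous in memory.
    order = torch.argsort(expert_ids, stable=True)
    return tokens[order], order


def gmm_ref(a, b, batch_sizes):
    # Hypothetical reference for a grouped GEMM.
    # a: (total_tokens, k), already grouped by expert
    # b: (num_experts, k, n), one weight matrix per expert
    # batch_sizes: (num_experts,), number of rows of `a` owned by each expert
    outputs = []
    start = 0
    for expert, size in enumerate(batch_sizes.tolist()):
        outputs.append(a[start:start + size] @ b[expert])
        start += size
    return torch.cat(outputs, dim=0)


def unpermute_ref(permuted, order):
    # Hypothetical reference: scatter rows back to their original token order.
    restored = torch.empty_like(permuted)
    restored[order] = permuted
    return restored


# Small example: 6 tokens, 3 experts, hidden size 4 -> 5.
tokens = torch.randn(6, 4)
expert_ids = torch.tensor([2, 0, 1, 0, 2, 1])
weights = torch.randn(3, 4, 5)

grouped, order = permute_ref(tokens, expert_ids)
batch_sizes = torch.bincount(expert_ids, minlength=weights.shape[0])
result = unpermute_ref(gmm_ref(grouped, weights, batch_sizes), order)
```

The question is whether torchao has (or plans) FP8 versions of kernels that fuse this loop over experts into a single launch.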