debug_api scale_inv_min is always zero when using padding or divisibility requirements #2628

@jomitchellnv

Description

The debug_api provides the ability to acquire FP8-specific log information through FP8TensorStats. However, we noticed that when we pad the end of a sequence with zeros to satisfy the divisibility requirement (a multiple of 32 for MXFP8 or 16 for NVFP4), the reported scale_inv_min is always zero.

This is because the padding creates a consecutive block of 32 zeros, which means one of the scale_inv values will be 0, and therefore scale_inv_min will always be zero.

See this slide

Thus, we are always going to have some zeros in that batch, because the padding is a requirement, which makes scale_inv_min a metric that is not useful for us.
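The mechanism can be illustrated with a minimal NumPy sketch. This is not Transformer Engine's actual implementation; the helper name, the per-block scale formula (block amax divided by the FP8 E4M3 maximum), and the block size are assumptions made for illustration only:

```python
import numpy as np

FP8_E4M3_MAX = 448.0   # max representable magnitude in FP8 E4M3
BLOCK = 32             # assumed MXFP8 block size (16 for NVFP4)

def block_scale_inv(x):
    """Hypothetical per-block dequantization scale: amax / fp8_max.
    An all-zero block has amax 0, so its scale_inv is exactly 0."""
    blocks = x.reshape(-1, BLOCK)
    amax = np.abs(blocks).max(axis=1)
    return amax / FP8_E4M3_MAX

# 64 real values plus one 32-element block of zero padding,
# added to meet the divisibility requirement.
data = np.random.uniform(0.5, 2.0, size=64)
padded = np.concatenate([data, np.zeros(BLOCK)])

scale_inv = block_scale_inv(padded)
print(scale_inv.min())        # 0.0 -- the padding block pins the minimum
print(scale_inv[:-1].min())   # min over the real-data blocks is > 0
```

Under these assumptions, the minimum over all blocks is always 0 whenever a fully padded block exists, regardless of the real data, which is exactly why scale_inv_min carries no information here.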

If an NVIDIA docker image is used, you don't need to specify these.
Otherwise, please provide:

  • OS version
  • PyTorch version
  • Python version
  • Transformer Engine version
  • CUDA version
  • CUDNN version

Device details

  • GPU model

Additional context

Add any other context about the problem here.

Metadata

Assignees

Labels

bug (Something isn't working)
