NVFP4Quantizer::convert_and_update_tensor columnwise_data shape does not match 'expected_shape' from rowwise_data #2607

@skydoorkai

Description

Describe the bug

NVFP4 column-wise data is always stored with an enforced 2D shape. As a result, the check below fails whenever rowwise_data has a 3D shape:

shape = convert_shape_back_from_fp4(getTensorShape(*columnwise_data), true);
if (rowwise_data) {
  auto expected_shape = convert_shape_back_from_fp4(getTensorShape(*rowwise_data), false);
  NVTE_CHECK(shape == expected_shape, "NVFP4 row-wise data (shape=", expected_shape,
             ") and column-wise data (shape=", shape, ") do not match");
}

RuntimeError: /workspace/bin/TransformerEngine/transformer_engine/pytorch/csrc/quantizer.cpp:1326 in function convert_and_update_tensor: Assertion failed: shape == expected_shape. NVFP4 row-wise data (shape=(256,4,1024)) and column-wise data (shape=(1024,1024)) do not match

To fix it: (1) expected_shape should also be forced into a 2D shape before the comparison; (2) the shape of rowwise_data should be used.
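As a rough illustration of point (1), the sketch below flattens both shapes to 2D before comparing them. The helpers `flatten_to_2d` and `shapes_match_2d` are hypothetical names introduced here for illustration; they are not part of Transformer Engine, and the sketch assumes that "enforced 2D" means collapsing all leading dimensions into one.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not a Transformer Engine function): collapse all
// leading dimensions into one, so an N-D shape becomes {prod(leading), last}.
std::vector<size_t> flatten_to_2d(const std::vector<size_t>& shape) {
  if (shape.empty()) {
    return {1, 1};  // assumption: treat a scalar as a 1x1 matrix
  }
  size_t rows = 1;
  for (size_t i = 0; i + 1 < shape.size(); ++i) {
    rows *= shape[i];
  }
  return {rows, shape.back()};
}

// Hypothetical check: compare row-wise and column-wise shapes only after
// both have been flattened to 2D.
bool shapes_match_2d(const std::vector<size_t>& rowwise_shape,
                     const std::vector<size_t>& columnwise_shape) {
  return flatten_to_2d(rowwise_shape) == flatten_to_2d(columnwise_shape);
}
```

With the shapes from the error message, the row-wise shape (256,4,1024) flattens to (1024,1024), which equals the column-wise shape, so a comparison done this way would no longer trip the assertion.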


Metadata

Labels: bug