Conversation

@nvashutoshd (Contributor)

This PR explicitly lists the requirements for the calibration and quantization documentation:

  1. Lists the details that must be provided.
  2. Adds clarifications for preview submissions.
  3. Adds details for open submissions as well.

@nvashutoshd requested a review from a team as a code owner on July 22, 2025 at 15:41

@github-actions (MLCommons CLA bot)

All contributors have signed the MLCommons CLA ✍️ ✅

@mrmhodak (Contributor)

WG Meeting: @nvashutoshd to split Closed from Open.

@psyhtest (Contributor)

A couple of questions.

  1. What's the timeline for policy adoption? Is it reasonable for this info to be required for the upcoming v5.1 round? We are only a week away from the v5.1 deadline. I imagine a submitter could have already prepared and submitted their package, assuming that a generic description would suffice. Now they may be faced with a choice: disclose more or withdraw.

  2. Not all submitters can be expected to have detailed knowledge of the quantization method they use. For example, vLLM users specifying quantization="fp8" could refer to the documentation, which currently says:

In this mode, all Linear modules (except for the final lm_head) have their weights quantized down to FP8_E4M3 precision with a per-tensor scale. Activations have their minimum and maximum values calculated during each forward pass to provide a dynamic per-tensor scale for high accuracy. As a result, latency improvements are limited in this mode.

However, this probably doesn't cover all details requested in this PR.
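
For context, here is a minimal sketch of the vLLM usage mentioned above (the model name and prompt are illustrative placeholders, not taken from this thread):

```python
# Illustrative sketch, not part of this PR: online dynamic FP8 quantization in vLLM.
# quantization="fp8" triggers the behaviour quoted above: Linear weights (except
# the final lm_head) are cast to FP8_E4M3 with a per-tensor scale, and per-tensor
# activation scales are recomputed dynamically on each forward pass.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
          quantization="fp8")
params = SamplingParams(temperature=0.0, max_tokens=64)
outputs = llm.generate(["Summarize FP8 dynamic quantization in one sentence."], params)
print(outputs[0].outputs[0].text)
```

The call itself carries none of the details this PR asks submitters to document; everything beyond quantization="fp8" is implicit in the vLLM implementation.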

@psyhtest (Contributor)

This bit (untouched by this PR) looks weird.

OPEN: Weights and biases must be initialized to the same values for each run,
any quantization scheme is allowed that achieves the desired quality.

@nvashutoshd (Contributor, Author) commented Jul 23, 2025

@psyhtest - yes, we agreed it's too late for this PR.
We are discussing how to advance this for v6.0.

Not all submitters can be expected to have detailed knowledge of the quantization method they use.

I think this is a valid case, and we will need to think about how to accommodate such submitters while still achieving the transparency this PR is aiming for.

The concern also extends to software-only vendors whose business relies on novel quantization tools/algorithms.

@mrmhodak (Contributor) commented Oct 7, 2025

WG: @nvashutoshd to take a look and move this forward.
