
Conversation


@byte-deve byte-deve commented Jan 11, 2026

What does this PR do?

Add multi-batch calibration data support for autocast precision conversion. Users can now provide multiple batches of calibration data (via a directory of NPZ files or a Polygraphy JSON file with multiple batches); tensor statistics are aggregated across batches, resulting in more robust precision conversion decisions.

Usage

Single NPZ file (existing behavior)

python -m modelopt.onnx.autocast --onnx_path model.onnx --calibration_data calibration_data.npz --output_path model_fp16.onnx

Directory containing multiple NPZ files for multi-batch calibration (new)

python -m modelopt.onnx.autocast --onnx_path model.onnx --calibration_data calibration_data_dir/ --output_path model_fp16.onnx
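
For reference, such a directory could be produced like this (a minimal sketch, not part of this PR; file names are arbitrary and are loaded in sorted order, and the array keys must match the model's input names, so "input" below is a placeholder):

import os

import numpy as np

os.makedirs("calibration_data_dir", exist_ok=True)
for i in range(8):
    # One calibration batch per NPZ file; the key must match the model input name.
    batch = np.random.rand(1, 3, 256, 256).astype(np.float32)
    np.savez(f"calibration_data_dir/batch_{i:03d}.npz", input=batch)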

Testing

  • Tested with single NPZ file to ensure backward compatibility
  • Tested with directory containing multiple NPZ files for multi-batch calibration
  • Verified that aggregated statistics (absmax, min, max) are correctly computed across batches

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: No
  • Did you add or update any necessary documentation?: Yes
  • Did you update Changelog?: No

Additional Information

Key changes:

  • Added TensorStats dataclass to store aggregated tensor statistics (absmax, min_val, max_val, shape); see the sketch after this list
  • Updated ReferenceRunner to:
    • Load multiple NPZ files from a directory (_load_inputs_from_npz)
    • Aggregate statistics across batches (_aggregate_tensor_stats)
    • Process multi-batch inference in run() method
  • Updated IORangeRule and DepthOfReductionRule to handle both raw numpy arrays and TensorStats objects
  • Enhanced --calibration_data CLI help text to document multi-batch support
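
A rough sketch of the new pieces, based on the descriptions above and the review comments below (the names TensorStats, absmax, min_val, max_val, shape, size, and _aggregate_tensor_stats come from the PR; the bodies are approximations, not the exact implementation):

from dataclasses import dataclass

import numpy as np


@dataclass
class TensorStats:
    """Aggregated per-tensor statistics across calibration batches (approximate sketch)."""

    absmax: float
    min_val: float
    max_val: float
    shape: tuple

    @property
    def size(self):
        """Return total number of elements."""
        result = 1
        for dim in self.shape:
            result *= dim
        return result


def _aggregate_tensor_stats(all_batch_data):
    """Aggregate absmax/min/max per tensor name across a list of {name: array} batches."""
    names = {name for batch in all_batch_data for name in batch}
    stats = {}
    for name in names:
        # Tensors missing from some batches are skipped for those batches.
        arrays = [batch[name] for batch in all_batch_data if name in batch]
        stats[name] = TensorStats(
            absmax=max(float(np.abs(a).max()) for a in arrays),
            min_val=min(float(a.min()) for a in arrays),
            max_val=max(float(a.max()) for a in arrays),
            shape=arrays[0].shape,
        )
    return stats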

Summary by CodeRabbit

  • New Features

    • Added multi-batch calibration support via directories of NPZ files or Polygraphy JSON files.
    • Implemented cross-batch statistics aggregation for more robust precision conversion decisions.
  • Documentation

    • Expanded the calibration_data CLI option guidance with details on the supported input formats and the benefits of multi-batch processing.


@byte-deve byte-deve requested a review from a team as a code owner January 11, 2026 10:38
@byte-deve byte-deve requested a review from ajrasane January 11, 2026 10:38

copy-pr-bot bot commented Jan 11, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


coderabbitai bot commented Jan 11, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

The changes introduce multi-batch calibration support to the autocast module. A new TensorStats data structure aggregates tensor statistics (absmax, min, max) across multiple calibration batches. The reference runner now supports directory-based multi-batch inputs and computes aggregated statistics. Node classification rules are enhanced to use these statistics for precision conversion decisions.

Changes

• CLI Documentation (modelopt/onnx/autocast/__main__.py): Expanded the help text for the --calibration_data option to document support for single NPZ files, directories of NPZ files for multi-batch calibration, and Polygraphy JSON files with multiple batches; notes the aggregation behavior across batches.
• Statistics Aggregation Infrastructure (modelopt/onnx/autocast/referencerunner.py): Added the TensorStats dataclass to hold aggregated tensor statistics (absmax, min_val, max_val, shape). Implemented _aggregate_tensor_stats() to compute per-tensor aggregations across batches. Extended NPZ loading to support directory input for multi-batch calibration. Refactored run() to collect and aggregate statistics when multiple batches are present.
• Rule Enhancement for Multi-Batch Support (modelopt/onnx/autocast/nodeclassifier.py): Enhanced IORangeRule with a _get_tensor_stats() helper to compute and cache statistics from both numpy arrays and TensorStats; updated is_io_out_of_range() and _log_skipped() to use aggregated statistics for range-violation detection. Extended DepthOfReductionRule._get_tensor_shape() to support TensorStats in addition to raw numpy arrays. Updated logging to display min/max/absmax values.
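
For intuition, the dual handling in the rules could look roughly like this (a sketch inferred from the summaries above; only TensorStats and its fields are confirmed by the PR, the helper body is an assumption):

import numpy as np

from modelopt.onnx.autocast.referencerunner import TensorStats


def _get_tensor_stats(ref_data):
    """Return (absmax, min, max) for either a TensorStats object or a raw numpy array."""
    if isinstance(ref_data, TensorStats):
        # Multi-batch path: statistics were already aggregated across batches.
        return ref_data.absmax, ref_data.min_val, ref_data.max_val
    if ref_data.size == 0:
        # Empty arrays carry no range information.
        return None
    # Single-batch path: compute the statistics directly from the array.
    return float(np.abs(ref_data).max()), float(ref_data.min()), float(ref_data.max())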

Sequence Diagram(s)

sequenceDiagram
    participant CLI
    participant ReferenceRunner
    participant TensorStats
    participant NodeClassifier
    
    CLI->>ReferenceRunner: run() with multiple calibration batches
    activate ReferenceRunner
    loop for each batch
        ReferenceRunner->>ReferenceRunner: load batch data (NPZ directory)
        ReferenceRunner->>ReferenceRunner: execute model, collect outputs
    end
    
    ReferenceRunner->>ReferenceRunner: _aggregate_tensor_stats(all_batches)
    ReferenceRunner->>TensorStats: create aggregated statistics<br/>(absmax, min, max per tensor)
    activate TensorStats
    TensorStats-->>ReferenceRunner: TensorStats objects
    deactivate TensorStats
    
    ReferenceRunner->>NodeClassifier: pass aggregated TensorStats
    deactivate ReferenceRunner
    
    activate NodeClassifier
    NodeClassifier->>NodeClassifier: IORangeRule._get_tensor_stats()
    NodeClassifier->>NodeClassifier: DepthOfReductionRule._get_tensor_shape()
    NodeClassifier->>NodeClassifier: evaluate precision conversion rules<br/>using aggregated statistics
    NodeClassifier-->>CLI: precision decisions
    deactivate NodeClassifier

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check: Passed. Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check: Passed. The title 'Support multiple-batch input for autocast calibration' clearly and concisely summarizes the main feature addition across all modified files.
  • Docstring Coverage: Passed. Docstring coverage is 87.50%, which is sufficient; the required threshold is 80.00%.





@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
modelopt/onnx/autocast/referencerunner.py (1)

153-170: Directory paths are not handled, breaking multi-batch support.

The _load_inputs method only checks for .json or .npz file extensions. A directory path (e.g., calibration_data_dir/) will fall through to the raise ValueError branch, making the multi-batch directory feature non-functional despite being documented in the CLI help text.

🐛 Proposed fix
+        import os
+
         if inputs is not None:
             if isinstance(inputs, str):
                 if inputs.endswith(".json"):
                     data_loader = self._load_inputs_from_json(inputs)
-                elif inputs.endswith(".npz"):
+                elif inputs.endswith(".npz") or os.path.isdir(inputs):
                     data_loader = self._load_inputs_from_npz(inputs)
                 else:
                     raise ValueError(
-                        f"Invalid input file: {inputs}. Supported input file types: .json (Polygraphy JSON format), "
-                        ".npz (Numpy)"
+                        f"Invalid input file: {inputs}. Supported input types: .json (Polygraphy JSON format), "
+                        ".npz (Numpy), or a directory containing .npz files"
                     )
🧹 Nitpick comments (3)
modelopt/onnx/autocast/referencerunner.py (2)

62-68: Consider using math.prod for the size property.

The manual loop works correctly, but math.prod (Python 3.8+) would be more concise and idiomatic.

♻️ Suggested refactor
+import math
+
 @property
 def size(self):
     """Return total number of elements."""
-    result = 1
-    for dim in self.shape:
-        result *= dim
-    return result
+    return math.prod(self.shape)

199-201: Silently skipping missing tensors may mask data inconsistencies.

If a tensor present in the first batch is missing from subsequent batches, the aggregated statistics will only reflect partial data without any warning. Consider logging a debug message when tensors are skipped.

♻️ Suggested enhancement
             for batch_data in all_batch_data:
                 if name not in batch_data:
+                    logger.debug(f"Tensor '{name}' not found in batch, skipping for aggregation")
                     continue
modelopt/onnx/autocast/nodeclassifier.py (1)

282-288: Redundant isinstance check.

Both branches return ref_data.shape, and both numpy arrays and TensorStats objects have a .shape attribute. The conditional is unnecessary.

♻️ Suggested simplification
         if tensor_name in self.reference_data:
             ref_data = self.reference_data[tensor_name]
-            # Import here to avoid circular imports
-            from modelopt.onnx.autocast.referencerunner import TensorStats
-
-            if isinstance(ref_data, TensorStats):
-                return ref_data.shape
+            # Both numpy arrays and TensorStats have .shape attribute
             return ref_data.shape
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5104513 and b90f689.

📒 Files selected for processing (3)
  • modelopt/onnx/autocast/__main__.py
  • modelopt/onnx/autocast/nodeclassifier.py
  • modelopt/onnx/autocast/referencerunner.py
🧰 Additional context used
🧬 Code graph analysis (1)
modelopt/onnx/autocast/nodeclassifier.py (1)
modelopt/onnx/autocast/referencerunner.py (2)
  • TensorStats (43-68)
  • size (63-68)
🔇 Additional comments (7)
modelopt/onnx/autocast/__main__.py (1)

69-74: LGTM!

The updated help text clearly documents the three supported calibration data formats and explains the multi-batch aggregation behavior.

modelopt/onnx/autocast/referencerunner.py (3)

22-25: LGTM!

Module docstring appropriately updated to reflect multi-batch aggregation behavior.


104-127: LGTM!

The directory loading implementation is well-structured with proper error handling for empty directories and deterministic file ordering via sorting.


276-302: LGTM!

The multi-batch processing logic correctly combines inputs and outputs per batch and delegates to _aggregate_tensor_stats for aggregation. The fallback for exhausted data loaders handles the random input generation case appropriately.

modelopt/onnx/autocast/nodeclassifier.py (3)

152-172: LGTM!

The updated docstring clearly documents support for both single-batch and multi-batch reference data formats, and the new output_stats attribute enables proper logging for TensorStats.


174-197: LGTM!

Clean abstraction that properly handles both TensorStats and numpy arrays. The local import correctly avoids circular dependencies, and edge cases for empty arrays are handled appropriately.


210-228: LGTM!

The refactored is_io_out_of_range function properly uses the abstracted _get_tensor_stats method, providing consistent handling for both single-batch and multi-batch data while maintaining clear debug logging.

@byte-deve byte-deve force-pushed the dev/toyin/5761612_autocast_calibration_with_multi_batch_input branch from 583c9f5 to 6d665cf on January 11, 2026 10:52
@galagam galagam requested a review from gcunhase January 11, 2026 11:08

galagam commented Jan 11, 2026

@gcunhase can you please review with the context of https://nvbugspro.nvidia.com/bug/5676209 ?


@galagam galagam left a comment


I want to make sure I understand the need here:

If, for example, the model accepts (N, 3, 256, 256) where N is the batch size, we can pass calibration_data of size (N, 3, 256, 256) with N > 1, and the reference data statistics will take all N examples into account.

However, if the model accepts (1, 3, 256, 256), i.e. the batch dim is static, and we want to pass N examples for calibration, the current code doesn't handle it well: if the input is provided in Polygraphy JSON, it only uses the first example (index 0) and ignores the rest; if the input is provided in NPZ format, it will fail due to a shape mismatch.

@byte-deve Please confirm or correct.


byte-deve commented Jan 12, 2026

> I want to make sure I understand the need here:
>
> If, for example, the model accepts (N, 3, 256, 256) where N is the batch size, we can pass calibration_data of size (N, 3, 256, 256) with N > 1, and the reference data statistics will take all N examples into account.
>
> However, if the model accepts (1, 3, 256, 256), i.e. the batch dim is static, and we want to pass N examples for calibration, the current code doesn't handle it well: if the input is provided in Polygraphy JSON, it only uses the first example (index 0) and ignores the rest; if the input is provided in NPZ format, it will fail due to a shape mismatch.
>
> @byte-deve Please confirm or correct.

@galagam I think you are right. Assuming the model takes a static-shape input (N, 3, 256, 256), we can pass 2 or more inputs of shape (N, 3, 256, 256). The original naming "frame" is possibly better, to avoid confusion with the model batch. Regarding the difference between the Polygraphy JSON and NPZ formats, shall I add a test to clarify? Thanks!


galagam commented Jan 12, 2026

> > I want to make sure I understand the need here:
> >
> > If, for example, the model accepts (N, 3, 256, 256) where N is the batch size, we can pass calibration_data of size (N, 3, 256, 256) with N > 1, and the reference data statistics will take all N examples into account.
> >
> > However, if the model accepts (1, 3, 256, 256), i.e. the batch dim is static, and we want to pass N examples for calibration, the current code doesn't handle it well: if the input is provided in Polygraphy JSON, it only uses the first example (index 0) and ignores the rest; if the input is provided in NPZ format, it will fail due to a shape mismatch.
> >
> > @byte-deve Please confirm or correct.
>
> @galagam I think you are right. Assuming the model takes a static-shape input (N, 3, 256, 256), we can pass 2 or more inputs of shape (N, 3, 256, 256). The original naming "frame" is possibly better, to avoid confusion with the model batch. Regarding the difference between the Polygraphy JSON and NPZ formats, shall I add a test to clarify? Thanks!

If N is a dynamic dimension, you don't need this, right? Because you can pass (N*K, 3, 256, 256). This is only needed when N is a static dimension; it doesn't have to be 1, but it has to be static. See the illustration below.
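
To illustrate the dynamic-batch case (a hypothetical sketch; "input" is a placeholder for the model's actual input name), K examples can simply be concatenated along the batch axis into a single array, so the multi-batch path is only needed for static shapes:

import numpy as np

# K=4 sample batches of N=2 examples each (hypothetical data).
batches = [np.random.rand(2, 3, 256, 256).astype(np.float32) for _ in range(4)]
combined = np.concatenate(batches, axis=0)  # shape (N*K, 3, 256, 256) = (8, 3, 256, 256)
np.savez("calibration_data.npz", input=combined)  # one NPZ; the dynamic batch dim absorbs all examples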


galagam commented Jan 12, 2026

> @galagam I think you are right. Assuming the model takes a static-shape input (N, 3, 256, 256), we can pass 2 or more inputs of shape (N, 3, 256, 256). The original naming "frame" is possibly better, to avoid confusion with the model batch. Regarding the difference between the Polygraphy JSON and NPZ formats, shall I add a test to clarify? Thanks!

@byte-deve Yes, please add a test

