
@willg-nv

@willg-nv willg-nv commented Dec 17, 2025

What does this PR do?

Type of change: new feature

Overview: This PR integrates an automatic QDQ placement tool into ModelOpt.

This PR is part 1 of 4 of the change. It contains the following:

  1. Defines common types: Region, RegionType, Error types
  2. Defines InsertionPoint types (the logical location at which to place a QDQ pair) and InsertionScheme (a set of insertion points)
  3. Unit tests for new types

Part 1: #701
Part 2: #702
Part 3: #703
Part 4: #704

Usage

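        from modelopt.onnx.quantization.autotune.common import Region, RegionType
        from modelopt.onnx.quantization.autotune.insertion_points import NodeInputInsertionPoint

        # Region type usage:
        region = Region(region_id=1, level=0, region_type=RegionType.LEAF)
        assert region.get_id() == 1
        assert region.get_level() == 0
        region.add_node(1)  # 1 is the index of a node in the ONNX graph
        ...

        # Insertion point usage:
        point = NodeInputInsertionPoint(node_index=0, input_index=2)
        assert point.node_index == 0  # node index relative to the region
        assert point.input_index == 2  # input tensor index within that node
        resolved = point.resolve(region, graph)  # graph: the ONNX graph containing the region
        ...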

Testing

Implemented unit tests; all tests pass.

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: Yes
  • Did you add or update any necessary documentation?: No, documentation changes will be included in part 4.
  • Did you update Changelog?: No, this will be done once all parts of the change are merged.

Additional Information

Summary by CodeRabbit

  • New Features

    • Added foundational autotuner infrastructure for quantization optimization, including region hierarchies and insertion scheme management.
    • Introduced insertion point system for managing quantize/dequantize operation placement across ONNX graph regions.
    • Added utility functions for tensor consumer mapping and boolean operation identification.
  • Tests

    • Added comprehensive test coverage for autotuner components, insertion points, and region management.

✏️ Tip: You can customize this high-level summary in your review settings.

@willg-nv willg-nv requested a review from a team as a code owner December 17, 2025 06:18
@willg-nv willg-nv requested a review from gcunhase December 17, 2025 06:18
@copy-pr-bot

copy-pr-bot bot commented Dec 17, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@willg-nv willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part1 branch from 9c53783 to f872e70 Compare December 19, 2025 05:32
@willg-nv
Author

Hi @gcunhase, could you help me review this PR? thanks!

@willg-nv willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part1 branch 5 times, most recently from bbbc98b to 80792fa Compare January 7, 2026 07:12
@ajrasane
Contributor

ajrasane commented Jan 7, 2026

LGTM from my side. Will wait for @gcunhase review.

@gcunhase
Contributor

gcunhase commented Jan 8, 2026

LGTM, added a few comments, thanks.

@willg-nv willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part1 branch from 80792fa to 66ef3ad Compare January 9, 2026 02:30
@willg-nv willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part1 branch from 66ef3ad to 01b383a Compare January 12, 2026 01:30
@willg-nv willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part1 branch 2 times, most recently from 4545a57 to be965aa Compare January 15, 2026 02:40
@coderabbitai
Contributor

coderabbitai bot commented Jan 15, 2026

📝 Walkthrough


This pull request introduces foundational infrastructure for ONNX quantization autotuning. It adds a boolean operations utility function, establishes core data structures for hierarchical region modeling (Region, RegionType, InsertionScheme), defines insertion point abstractions for pattern-based Q/DQ placement with resolution logic, extends graph utilities, and provides comprehensive test coverage for the new components.

Changes

  • Boolean operations utility (modelopt/onnx/op_types.py): Added get_bool_ops() function returning a set of boolean/comparison operation type identifiers for external use.
  • Autotuner core infrastructure (modelopt/onnx/quantization/autotune/common.py): Introduced Region class with hierarchical parent-child relationships, node/tensor management, metadata storage, and recursive accessors; RegionType enum (LEAF, COMPOSITE, ROOT); InsertionScheme dataclass for Q/DQ insertion configurations with latency/error metadata, hashing, serialization (to_dict/from_dict), and distance computation; exception hierarchy (RegionError, AutotunerError, AutotunerNotInitializedError, InvalidSchemeError).
  • Insertion point pattern management (modelopt/onnx/quantization/autotune/insertion_points.py): Added abstract InsertionPoint base class with three concrete types (NodeInputInsertionPoint, ChildRegionInputInsertionPoint, RegionOutputInsertionPoint); immutable ResolvedInsertionPoint dataclass for concrete insertion targets; resolution logic to map pattern-relative indices to actual tensor names; collection methods for extracting insertion points from regions; utility functions (skip_invalid_insertion_points, has_quantizable_operations, resolve_region_io_insertion_points, merge_resolved_insertion_points); quantizable and skip operation sets.
  • Graph utilities enhancement (modelopt/onnx/quantization/graph_utils.py): Added get_tensor_consumer_node_indices() function mapping tensor names to indices of consuming nodes, complementing existing tensor-consumer utilities.
  • Region functionality tests (tests/unit/onnx/quantization/autotune/test_region.py): Added unit tests validating Region class construction, parent-child relationships, node/tensor management, recursive size computation, metadata storage, hierarchical structures, and child removal.
  • Insertion points comprehensive tests (tests/unit/onnx/quantization/autotune/test_insertion_points.py): Added extensive test suite covering InsertionPoint types (creation, immutability, equality, hashing, serialization), InsertionScheme behavior (empty/populated schemes, hashing, deserialization), pattern utilities (skip_invalid_insertion_points, has_quantizable_operations), and resolution/collection logic across simple and complex graph topologies including residual blocks and composite regions.

Sequence Diagram

sequenceDiagram
    participant User as Autotuner<br/>Workflow
    participant Region
    participant InsertionPoint as InsertionPoint<br/>Collector
    participant Graph as ONNX Graph
    participant Resolver as InsertionPoint<br/>Resolver
    participant ResolvedIP as ResolvedInsertionPoint

    User->>Region: collect_from_region(region)
    Region->>InsertionPoint: Identify candidate<br/>insertion points
    InsertionPoint->>Graph: Query node inputs,<br/>child regions, outputs
    Graph-->>InsertionPoint: Tensor info returned
    InsertionPoint-->>Region: Insertion points
    
    User->>Resolver: resolve(pattern_relative_point,<br/>region, graph)
    Resolver->>Region: Get region structure<br/>and nodes
    Resolver->>Graph: Map indices to<br/>actual tensor names
    Graph-->>Resolver: Concrete tensors
    Resolver-->>ResolvedIP: Create resolved<br/>insertion points
    ResolvedIP-->>User: Set of concrete<br/>insertion targets

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description check: ✅ Passed. Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title 'Integrate Automated QDQ placement tool - Part 1' directly and clearly describes the main objective of the PR, which is to introduce the foundational types and abstractions for an automated QDQ placement system.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 98.76%, which is sufficient. The required threshold is 80.00%.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@modelopt/onnx/op_types.py`:
- Around line 308-339: The set returned by get_bool_ops incorrectly includes the
non-standard ONNX operator "Median" and several non-boolean ops; remove "Median"
from the returned set, and either rename get_bool_ops or update its docstring to
accurately state it returns ops that should be excluded from Q/DQ insertion (not
strictly boolean/comparison ops); update references to the function name or
docstring accordingly and ensure the returned set still contains the intended
exclusion symbols like "Abs", "Max", "Min", "ArgMax", "ArgMin", etc., but no
vendor-specific "Median".

In `@modelopt/onnx/quantization/autotune/insertion_points.py`:
- Around line 777-818: The function merge_resolved_insertion_points can raise a
KeyError when accessing tensor_users_map[tensor_name] for tensors with no
consumers; change that access to use tensor_users_map.get(tensor_name, []) (or
an empty set) and treat empty user lists as non-mergeable so you skip creating a
tensor-level insertion point; ensure all_users is computed from the safe default
(e.g., set(tensor_users_map.get(tensor_name, []))) and leave existing logic for
updating results unchanged.
- Around line 728-775: The function resolve_region_io_insertion_points() can
raise attribute errors if node.inputs contain entries without a .name and may
produce insertion points for consumers that will later be rejected; guard
accesses by checking hasattr(inp, "name") before comparing to tensor_name, and
filter out invalid consumer nodes/insertion points using
skip_invalid_insertion_points() (either by filtering node_indices derived from
tensor_users_map or by applying skip_invalid_insertion_points() to the
resolved_insertion_points set) so only valid ResolvedInsertionPoint objects are
returned; keep the rest of the logic (region-derived nodes and graph-derived
users) intact and ensure you reference resolve_region_io_insertion_points,
tensor_users_map, get_tensor_consumer_node_indices, and
skip_invalid_insertion_points when making the changes.
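A minimal sketch of the KeyError-safe lookup suggested for merge_resolved_insertion_points(). The real function and ResolvedInsertionPoint are not shown on this page, so `Point` and `merge_points` below are hypothetical stand-ins, and the merge rule (collapse node-level points into one tensor-level point only when every consumer of the tensor is covered) is an assumption inferred from the review's wording.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """Hypothetical stand-in for ResolvedInsertionPoint.

    node_index=None means "quantize the tensor once, for all consumers".
    """

    tensor_name: str
    node_index: int | None = None
    input_index: int | None = None


def merge_points(points: set[Point], tensor_users_map: dict[str, list[int]]) -> set[Point]:
    """Hypothetical merge: node-level points -> one tensor-level point when all users are covered."""
    result: set[Point] = set()
    by_tensor: dict[str, set[Point]] = defaultdict(set)
    for p in points:
        if p.node_index is None:
            result.add(p)  # already tensor-level, keep as-is
        else:
            by_tensor[p.tensor_name].add(p)

    for tensor_name, node_points in by_tensor.items():
        # .get() avoids a KeyError for tensors with no recorded consumers.
        all_users = set(tensor_users_map.get(tensor_name, []))
        covered = {p.node_index for p in node_points}
        if all_users and covered == all_users:
            result.add(Point(tensor_name))  # every consumer covered: merge
        else:
            result.update(node_points)  # empty or partial coverage: not mergeable
    return result
```

Treating an empty user list as non-mergeable keeps the original node-level points instead of silently creating a tensor-level point for a tensor nobody reads.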

In `@modelopt/onnx/quantization/graph_utils.py`:
- Around line 305-320: The function get_tensor_consumer_node_indices currently
assumes node.input elements have a .name attribute and will crash for
onnx.GraphProto where node.input entries are plain strings; update the loop that
iterates inputs (the inputs variable derived from nodes via nodes = graph.nodes
if isinstance(graph, gs.Graph) else graph.node and inputs = node.inputs if
isinstance(node, gs.Node) else node.input) to normalize each tensor: if the
input item is a str use it directly as the tensor name, otherwise use
tensor.name for GraphSurgeon tensors; append that resolved name to
tensor_consumer_map instead of unconditionally accessing tensor.name.
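A sketch of the normalization suggested for get_tensor_consumer_node_indices(). To stay runnable without onnx_graphsurgeon installed, it swaps the review's isinstance(..., gs.Graph) checks for hasattr() duck typing (GraphSurgeon graphs expose `.nodes`/`.inputs`, onnx.GraphProto exposes `.node`/`.input`). Skipping empty names (omitted optional inputs) is an assumption, not something the review asked for.

```python
from collections import defaultdict
from types import SimpleNamespace


def get_tensor_consumer_node_indices(graph) -> dict[str, list[int]]:
    """Map each tensor name to the indices of the nodes that consume it.

    Handles onnx.GraphProto (node.input holds plain strings) and GraphSurgeon
    graphs (node.inputs holds tensor objects with a .name attribute).
    """
    # GraphSurgeon graphs expose .nodes; onnx.GraphProto exposes .node.
    nodes = graph.nodes if hasattr(graph, "nodes") else graph.node
    tensor_consumer_map: dict[str, list[int]] = defaultdict(list)
    for node_idx, node in enumerate(nodes):
        inputs = node.inputs if hasattr(node, "inputs") else node.input
        for tensor in inputs:
            # Normalize: protobuf inputs are str, GraphSurgeon inputs are tensors.
            name = tensor if isinstance(tensor, str) else tensor.name
            if not name:  # assumption: skip omitted optional inputs ("")
                continue
            tensor_consumer_map[name].append(node_idx)
    return dict(tensor_consumer_map)


# Duck-typed stand-ins for the two graph flavors (illustration only).
proto_like = SimpleNamespace(
    node=[SimpleNamespace(input=["x", "w"]), SimpleNamespace(input=["x", ""])]
)
gs_like = SimpleNamespace(
    nodes=[SimpleNamespace(inputs=[SimpleNamespace(name="x"), SimpleNamespace(name="y")])]
)
```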
🧹 Nitpick comments (8)
tests/unit/onnx/quantization/autotune/test_region.py (1)

74-119: Prefer exercising public APIs vs direct attribute mutation in tests

For IO + metadata, consider using Region.add_input(), Region.add_output(), and Region.set_metadata() / Region.get_metadata() instead of assigning region.inputs, region.outputs, region.metadata directly. This keeps tests stable if internals change.

tests/unit/onnx/quantization/autotune/test_insertion_points.py (2)

364-528: Mock graph builders are OK, but consider a thin real gs.Graph for higher fidelity

MagicMocks work here, but a minimal real GraphSurgeon graph (where feasible) would catch more structural issues (e.g., type/attribute expectations) and reduce reliance on spec behavior.


1207-1313: Strengthen a few vacuous assertions in collect-from tests

Assertions like assert len(result) >= 0 always pass; consider asserting expected non-emptiness (or exact counts) for at least one representative case per collector so these tests actually catch regressions.

modelopt/onnx/quantization/autotune/insertion_points.py (3)

202-247: Consider replacing assert-based bounds checks with explicit exceptions

These are public-ish resolution helpers; assert can be stripped with -O, and failures become less diagnosable. Raising IndexError/ValueError with context would be more robust.
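A minimal sketch of an explicit bounds check that could replace the assert statements in the resolve() helpers. The helper name `check_index` is hypothetical; the point is that, unlike assert, it still runs under `python -O` and reports which index failed.

```python
def check_index(index: int, size: int, what: str) -> None:
    """Raise an informative IndexError if index is outside [0, size).

    Hypothetical helper; unlike `assert`, this check is not stripped by -O.
    """
    if not 0 <= index < size:
        raise IndexError(f"{what} index {index} out of range; expected 0 <= index < {size}")
```

Inside a resolver it would be called as, for example, `check_index(self.node_index, len(region_nodes), "node")`, where `region_nodes` is a placeholder for the region's ordered node list.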


551-620: skip_invalid_insertion_points() doesn’t actually filter output tensors in collect_from_region

RegionOutputInsertionPoint.collect_from_region() passes output tensor names into skip_invalid_insertion_points(), but that helper only matches on node.inputs, so it’s effectively a no-op for outputs (producer node inputs rarely contain its output name). Either remove the call for clarity, or extend filtering to evaluate outputs (e.g., via tensor dtype/shape and/or consumer ops).

Also applies to: 623-707


821-863: Op-set helpers are fine, but consider centralizing in modelopt/onnx/op_types.py

If these sets are expected to grow/standardize across quantization features, centralizing them (or at least typing them as set[str]) would reduce duplication and make semantics clearer.

modelopt/onnx/quantization/autotune/common.py (2)

389-405: Consider clearing the source region after merge.

After merging, the other region retains references to nodes and children that are now also in self. This could lead to unexpected behavior if other is used after the merge.

♻️ Optional: Clear merged region to prevent accidental reuse
     def merge(self, other: "Region") -> None:
         """Merge another region into this one.

         Combines the nodes and children from the other region into this region.
         The other region's children become children of this region, updating
         their parent references accordingly.

         Args:
             other: Region to merge into this one
         """
         if not other:
             return
         # Merge direct nodes
         self.nodes.update(other.nodes)
         # Merge children (updates their parent references)
         for child in other.children:
             self.add_child(child)
+        # Clear merged region to prevent accidental reuse
+        other.nodes.clear()
+        other.children.clear()

606-616: Inconsistent naming between attribute and serialization key.

The attribute is named node_inputs but the serialization key is nodes_insertion_points. This inconsistency could cause maintenance confusion.

♻️ Optional: Align serialization key with attribute name
     def to_dict(self) -> dict[str, Any]:
         """Convert to dictionary for serialization."""
         return {
             "latency_ms": self.latency_ms,
             "error": self.error,
             "profile_timestamp": self.profile_timestamp,
-            "nodes_insertion_points": [pt.to_dict() for pt in self.node_inputs],
+            "node_inputs": [pt.to_dict() for pt in self.node_inputs],
             "child_region_inputs": [pt.to_dict() for pt in self.child_region_inputs],
             "region_outputs": [pt.to_dict() for pt in self.region_outputs],
             "hash": self.hash,
         }

And in from_dict:

         scheme.node_inputs = [
-            NodeInputInsertionPoint.from_dict(pt) for pt in data.get("nodes_insertion_points", [])
+            NodeInputInsertionPoint.from_dict(pt) for pt in data.get("node_inputs", data.get("nodes_insertion_points", []))
         ]

The backward-compatible approach in from_dict supports both old and new key names.

Also applies to: 638-640

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0f05d67 and bd87f1a.

📒 Files selected for processing (6)
  • modelopt/onnx/op_types.py
  • modelopt/onnx/quantization/autotune/common.py
  • modelopt/onnx/quantization/autotune/insertion_points.py
  • modelopt/onnx/quantization/graph_utils.py
  • tests/unit/onnx/quantization/autotune/test_insertion_points.py
  • tests/unit/onnx/quantization/autotune/test_region.py
🧰 Additional context used
🧬 Code graph analysis (4)
tests/unit/onnx/quantization/autotune/test_insertion_points.py (2)
modelopt/onnx/quantization/autotune/insertion_points.py (25)
  • ChildRegionInputInsertionPoint (291-431)
  • NodeInputInsertionPoint (166-287)
  • RegionOutputInsertionPoint (435-620)
  • ResolvedInsertionPoint (121-162)
  • merge_resolved_insertion_points (777-818)
  • resolve_region_io_insertion_points (728-774)
  • skip_invalid_insertion_points (623-706)
  • to_dict (82-84)
  • to_dict (147-153)
  • to_dict (193-195)
  • to_dict (332-334)
  • to_dict (478-484)
  • from_dict (88-90)
  • from_dict (156-162)
  • from_dict (198-200)
  • from_dict (337-350)
  • from_dict (487-500)
  • resolve (93-103)
  • resolve (202-247)
  • resolve (352-388)
  • resolve (502-549)
  • collect_from_region (107-117)
  • collect_from_region (250-287)
  • collect_from_region (391-431)
  • collect_from_region (552-620)
modelopt/onnx/quantization/autotune/common.py (10)
  • InsertionScheme (465-717)
  • Region (88-456)
  • RegionType (75-85)
  • to_dict (606-616)
  • from_dict (619-652)
  • is_empty (540-550)
  • num_node_insertions (575-581)
  • num_region_insertions (584-593)
  • num_region_output_insertions (596-604)
  • hash (504-537)
modelopt/onnx/quantization/autotune/insertion_points.py (3)
modelopt/onnx/quantization/autotune/common.py (10)
  • Region (88-456)
  • to_dict (606-616)
  • from_dict (619-652)
  • get_nodes (262-276)
  • RegionType (75-85)
  • get_type (152-154)
  • get_children (172-186)
  • get_inputs (339-341)
  • get_outputs (343-345)
  • get_region_nodes_and_descendants (278-300)
modelopt/onnx/op_types.py (2)
  • get_bool_ops (308-339)
  • get_copy_ops (99-118)
modelopt/onnx/quantization/graph_utils.py (1)
  • get_tensor_consumer_node_indices (305-320)
tests/unit/onnx/quantization/autotune/test_region.py (1)
modelopt/onnx/quantization/autotune/common.py (14)
  • Region (88-456)
  • RegionType (75-85)
  • get_id (136-138)
  • get_level (144-146)
  • get_type (152-154)
  • add_child (205-234)
  • get_children (172-186)
  • get_parent (164-166)
  • add_node (254-256)
  • get_size (351-357)
  • get_nodes (262-276)
  • get_inputs (339-341)
  • get_outputs (343-345)
  • get_region_nodes_and_descendants (278-300)
modelopt/onnx/quantization/autotune/common.py (1)
modelopt/onnx/quantization/autotune/insertion_points.py (13)
  • ChildRegionInputInsertionPoint (291-431)
  • NodeInputInsertionPoint (166-287)
  • RegionOutputInsertionPoint (435-620)
  • to_dict (82-84)
  • to_dict (147-153)
  • to_dict (193-195)
  • to_dict (332-334)
  • to_dict (478-484)
  • from_dict (88-90)
  • from_dict (156-162)
  • from_dict (198-200)
  • from_dict (337-350)
  • from_dict (487-500)
🔇 Additional comments (20)
tests/unit/onnx/quantization/autotune/test_region.py (2)

31-73: Core Region API coverage looks solid (creation, parenting, node membership)

These tests hit the main Region primitives and relationships in a clear way.


120-167: Hierarchy + removal tests are useful and deterministic

get_region_nodes_and_descendants() and remove_child() coverage here is valuable.

tests/unit/onnx/quantization/autotune/test_insertion_points.py (4)

54-234: InsertionPoint value-object tests (immutability/hash/serde) are thorough

Good coverage for frozen dataclasses and dict round-trips.


236-357: InsertionScheme hashing + serialization coverage looks good

Nice to see order-independence and error/latency fields exercised.


535-935: Utility + resolve/merge tests cover key behaviors

The tests for skip_invalid_insertion_points(), resolve_region_io_insertion_points(), and merge_resolved_insertion_points() provide good confidence in core mechanics.


1019-1092: Confirm intended semantics: child-region input resolution currently captures external consumers too

ChildRegionInputInsertionPoint.resolve() (via resolve_region_io_insertion_points) will include all consumers from tensor_users_map, not just nodes inside the child region (your test_resolve_multiple_children implicitly allows this). Worth double-checking this is the desired boundary semantics.

modelopt/onnx/quantization/autotune/insertion_points.py (3)

65-118: Good abstraction: InsertionPoint ABC makes the API consistent

Clean interface for (de)serialization + resolve/collect across point types.


120-163: ResolvedInsertionPoint as a frozen value-object is a good fit

Helps correctness (hashability) and simplifies set-based merges.


352-388: Verify region-boundary semantics: child input resolution may include nodes outside the child region

Because resolve_region_io_insertion_points() unions region.get_region_nodes_and_descendants() with the global tensor_users_map[tensor_name], ChildRegionInputInsertionPoint.resolve() can return insertion points for external consumers of a child’s input tensor. If the intent is “quantize only inside the child boundary”, consider restricting to region nodes for child-input resolution (or adding a flag controlling whether to include external users).

Also applies to: 728-775
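A sketch of the optional flag suggested above, written on plain index sets. The function name and the `include_external_consumers` parameter are hypothetical; with the flag on it approximates the current behavior (all graph consumers of the tensor), with it off it keeps only consumers inside the child region.

```python
def consumers_for_child_input(
    tensor_name: str,
    region_node_indices: set[int],
    tensor_users_map: dict[str, list[int]],
    include_external_consumers: bool = True,
) -> set[int]:
    """Return node indices that would receive a Q/DQ for tensor_name (hypothetical)."""
    users = set(tensor_users_map.get(tensor_name, []))
    if include_external_consumers:
        return users  # every consumer in the graph
    return users & region_node_indices  # only consumers inside the child boundary
```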

modelopt/onnx/quantization/autotune/common.py (11)

47-51: Circular import dependency is handled correctly.

The top-level import from insertion_points is paired with deferred imports of Region and RegionType inside methods in insertion_points.py, avoiding circular import issues at runtime.


57-73: LGTM!

Clean exception hierarchy with appropriate base classes for categorizing region vs. autotuner errors.


75-86: LGTM!

Clean enum definition with clear documentation of each region type's purpose in the hierarchy.


114-130: LGTM!

Clean initialization with appropriate types for each attribute.


205-234: Robust cycle prevention and re-parenting logic.

The add_child method correctly:

  1. Prevents self-reference
  2. Detects cycles via ancestor traversal
  3. Handles re-parenting by removing from old parent
  4. Prevents duplicate children
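The four steps above can be sketched on a hypothetical minimal `RegionSketch` class that models only parent/child links. Raising ValueError for self-reference and cycles is an assumption; the actual add_child may signal these cases differently.

```python
from __future__ import annotations


class RegionSketch:
    """Hypothetical stand-in for Region; only parent/children are modeled."""

    def __init__(self, region_id: int):
        self.region_id = region_id
        self.parent: RegionSketch | None = None
        self.children: list[RegionSketch] = []

    def add_child(self, child: RegionSketch) -> None:
        # 1. Prevent self-reference.
        if child is self:
            raise ValueError("a region cannot be its own child")
        # 2. Detect cycles: the child must not be an ancestor of self.
        ancestor = self.parent
        while ancestor is not None:
            if ancestor is child:
                raise ValueError("adding this child would create a cycle")
            ancestor = ancestor.parent
        # 3. Re-parent: detach the child from its previous parent.
        if child.parent is not None and child.parent is not self:
            child.parent.children.remove(child)
        # 4. Prevent duplicate children.
        if child not in self.children:
            self.children.append(child)
        child.parent = self
```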

329-337: O(n) membership check is acceptable given ordering requirements.

The not in check on lists is O(n), but as discussed in past reviews, inputs/outputs must be indexable for pattern matching. For typical region sizes, this is acceptable.


438-456: Placeholder method should indicate future implementation.

The compute_structural_signature raises NotImplementedError, which is appropriate for a placeholder. The docstring clearly describes the intended behavior for when it's implemented.


503-537: LGTM!

The hash computation is deterministic with sorted insertion points and properly truncated SHA-256. The implementation correctly excludes performance metrics (latency, error) from the identity hash.
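A minimal sketch of an order-independent, truncated SHA-256 identity hash that ignores profiling fields. `SchemeSketch`, the tuple encoding of insertion points, the string format, and the 16-character truncation are all assumptions for illustration, not the InsertionScheme implementation.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class SchemeSketch:
    """Hypothetical stand-in for InsertionScheme; points are (index, index) tuples."""

    node_inputs: list[tuple[int, int]] = field(default_factory=list)
    child_region_inputs: list[tuple[int, int]] = field(default_factory=list)
    region_outputs: list[tuple[int, int]] = field(default_factory=list)
    latency_ms: float = float("inf")  # performance metric: excluded from the hash
    error: bool = False  # performance metric: excluded from the hash

    @property
    def hash(self) -> str:
        # Sort each category so insertion order does not change the identity;
        # the category prefixes keep equal points in different categories distinct.
        parts = [
            "node:" + ";".join(f"{a},{b}" for a, b in sorted(self.node_inputs)),
            "child:" + ";".join(f"{a},{b}" for a, b in sorted(self.child_region_inputs)),
            "out:" + ";".join(f"{a},{b}" for a, b in sorted(self.region_outputs)),
        ]
        digest = hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()
        return digest[:16]  # truncation length is an assumption
```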


561-572: LGTM!

The is_profiled logic correctly identifies schemes that have been measured (either successfully with a finite latency or with an error).


654-707: LGTM!

The symmetric difference approach correctly computes edit distance. This relies on the insertion point classes being frozen dataclasses (and thus hashable), which they are per the relevant code snippets.
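A sketch of the symmetric-difference distance, assuming each scheme is represented as a dict from category name to a set of frozen (hence hashable) insertion points. `InsertionPointSketch` and `scheme_distance` are hypothetical names, not the PR's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InsertionPointSketch:
    """Hypothetical frozen insertion point; frozen makes it hashable for set ops."""

    index: int
    sub_index: int


def scheme_distance(a: dict[str, set], b: dict[str, set]) -> int:
    """Count insertion points present in exactly one of the two schemes."""
    categories = a.keys() | b.keys()
    return sum(len(a.get(c, set()) ^ b.get(c, set())) for c in categories)
```

The distance is symmetric and is zero only when both schemes contain exactly the same points in every category.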


709-717: LGTM!

Clean string representation with all relevant debugging information.


@willg-nv willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part1 branch from bd87f1a to 4826707 Compare January 15, 2026 03:13
"All",
"Any",
"Unique",
"NonZero",
Contributor


Suggest renaming this function to something more appropriate (and updating the docstring), since the set includes several kinds of ops: boolean, bitwise, numeric unary, comparison, and conditional and indexing-like ops.

It seems that what they have in common is that they are all logical or mask-style tensor ops, so maybe get_logical_and_mask_ops() could be appropriate.
