-
Notifications
You must be signed in to change notification settings - Fork 237
Integrate Automated QDQ placement tool - Part 1 #701
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Integrate Automated QDQ placement tool - Part 1 #701
Conversation
9c53783 to
f872e70
Compare
|
Hi @gcunhase, could you help me review this PR? thanks! |
bbbc98b to
80792fa
Compare
|
LGTM from my side. Will wait for @gcunhase review. |
|
LGTM, added a few comments, thanks. |
80792fa to
66ef3ad
Compare
66ef3ad to
01b383a
Compare
4545a57 to
be965aa
Compare
📝 WalkthroughWalkthroughThis pull request introduces foundational infrastructure for ONNX quantization autotuning. It adds a boolean operations utility function, establishes core data structures for hierarchical region modeling (Region, RegionType, InsertionScheme), defines insertion point abstractions for pattern-based Q/DQ placement with resolution logic, extends graph utilities, and provides comprehensive test coverage for the new components. Changes
Sequence DiagramsequenceDiagram
participant User as Autotuner<br/>Workflow
participant Region
participant InsertionPoint as InsertionPoint<br/>Collector
participant Graph as ONNX Graph
participant Resolver as InsertionPoint<br/>Resolver
participant ResolvedIP as ResolvedInsertionPoint
User->>Region: collect_from_region(region)
Region->>InsertionPoint: Identify candidate<br/>insertion points
InsertionPoint->>Graph: Query node inputs,<br/>child regions, outputs
Graph-->>InsertionPoint: Tensor info returned
InsertionPoint-->>Region: Insertion points
User->>Resolver: resolve(pattern_relative_point,<br/>region, graph)
Resolver->>Region: Get region structure<br/>and nodes
Resolver->>Graph: Map indices to<br/>actual tensor names
Graph-->>Resolver: Concrete tensors
Resolver-->>ResolvedIP: Create resolved<br/>insertion points
ResolvedIP-->>User: Set of concrete<br/>insertion targets
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@modelopt/onnx/op_types.py`:
- Around line 308-339: The set returned by get_bool_ops incorrectly includes the
non-standard ONNX operator "Median" and several non-boolean ops; remove "Median"
from the returned set, and either rename get_bool_ops or update its docstring to
accurately state it returns ops that should be excluded from Q/DQ insertion (not
strictly boolean/comparison ops); update references to the function name or
docstring accordingly and ensure the returned set still contains the intended
exclusion symbols like "Abs", "Max", "Min", "ArgMax", "ArgMin", etc., but no
vendor-specific "Median".
In `@modelopt/onnx/quantization/autotune/insertion_points.py`:
- Around line 777-818: The function merge_resolved_insertion_points can raise a
KeyError when accessing tensor_users_map[tensor_name] for tensors with no
consumers; change that access to use tensor_users_map.get(tensor_name, []) (or
an empty set) and treat empty user lists as non-mergeable so you skip creating a
tensor-level insertion point; ensure all_users is computed from the safe default
(e.g., set(tensor_users_map.get(tensor_name, []))) and leave existing logic for
updating results unchanged.
- Around line 728-775: The function resolve_region_io_insertion_points() can
raise attribute errors if node.inputs contain entries without a .name and may
produce insertion points for consumers that will later be rejected; guard
accesses by checking hasattr(inp, "name") before comparing to tensor_name, and
filter out invalid consumer nodes/insertion points using
skip_invalid_insertion_points() (either by filtering node_indices derived from
tensor_users_map or by applying skip_invalid_insertion_points() to the
resolved_insertion_points set) so only valid ResolvedInsertionPoint objects are
returned; keep the rest of the logic (region-derived nodes and graph-derived
users) intact and ensure you reference resolve_region_io_insertion_points,
tensor_users_map, get_tensor_consumer_node_indices, and
skip_invalid_insertion_points when making the changes.
In `@modelopt/onnx/quantization/graph_utils.py`:
- Around line 305-320: The function get_tensor_consumer_node_indices currently
assumes node.input elements have a .name attribute and will crash for
onnx.GraphProto where node.input entries are plain strings; update the loop that
iterates inputs (the inputs variable derived from nodes via nodes = graph.nodes
if isinstance(graph, gs.Graph) else graph.node and inputs = node.inputs if
isinstance(node, gs.Node) else node.input) to normalize each tensor: if the
input item is a str use it directly as the tensor name, otherwise use
tensor.name for GraphSurgeon tensors; append that resolved name to
tensor_consumer_map instead of unconditionally accessing tensor.name.
🧹 Nitpick comments (8)
tests/unit/onnx/quantization/autotune/test_region.py (1)
74-119: Prefer exercising public APIs vs direct attribute mutation in testsFor IO + metadata, consider using
Region.add_input(),Region.add_output(), andRegion.set_metadata()/Region.get_metadata()instead of assigningregion.inputs,region.outputs,region.metadatadirectly. This keeps tests stable if internals change.tests/unit/onnx/quantization/autotune/test_insertion_points.py (2)
364-528: Mock graph builders are OK, but consider a thin realgs.Graphfor higher fidelityMagicMocks work here, but a minimal real GraphSurgeon graph (where feasible) would catch more structural issues (e.g., type/attribute expectations) and reduce reliance on
specbehavior.
1207-1313: Strengthen a few vacuous assertions in collect-from testsAssertions like
assert len(result) >= 0always pass; consider asserting expected non-emptiness (or exact counts) for at least one representative case per collector so these tests actually catch regressions.modelopt/onnx/quantization/autotune/insertion_points.py (3)
202-247: Consider replacingassert-based bounds checks with explicit exceptionsThese are public-ish resolution helpers;
assertcan be stripped with-O, and failures become less diagnosable. RaisingIndexError/ValueErrorwith context would be more robust.
551-620:skip_invalid_insertion_points()doesn’t actually filter output tensors incollect_from_region
RegionOutputInsertionPoint.collect_from_region()passes output tensor names intoskip_invalid_insertion_points(), but that helper only matches onnode.inputs, so it’s effectively a no-op for outputs (producer node inputs rarely contain its output name). Either remove the call for clarity, or extend filtering to evaluate outputs (e.g., via tensor dtype/shape and/or consumer ops).Also applies to: 623-707
821-863: Op-set helpers are fine, but consider centralizing inmodelopt/onnx/op_types.pyIf these sets are expected to grow/standardize across quantization features, centralizing them (or at least typing them as
set[str]) would reduce duplication and make semantics clearer.modelopt/onnx/quantization/autotune/common.py (2)
389-405: Consider clearing the source region after merge.After merging, the
otherregion retains references to nodes and children that are now also inself. This could lead to unexpected behavior ifotheris used after the merge.♻️ Optional: Clear merged region to prevent accidental reuse
def merge(self, other: "Region") -> None: """Merge another region into this one. Combines the nodes and children from the other region into this region. The other region's children become children of this region, updating their parent references accordingly. Args: other: Region to merge into this one """ if not other: return # Merge direct nodes self.nodes.update(other.nodes) # Merge children (updates their parent references) for child in other.children: self.add_child(child) + # Clear merged region to prevent accidental reuse + other.nodes.clear() + other.children.clear()
606-616: Inconsistent naming between attribute and serialization key.The attribute is named
node_inputsbut the serialization key isnodes_insertion_points. This inconsistency could cause maintenance confusion.♻️ Optional: Align serialization key with attribute name
def to_dict(self) -> dict[str, Any]: """Convert to dictionary for serialization.""" return { "latency_ms": self.latency_ms, "error": self.error, "profile_timestamp": self.profile_timestamp, - "nodes_insertion_points": [pt.to_dict() for pt in self.node_inputs], + "node_inputs": [pt.to_dict() for pt in self.node_inputs], "child_region_inputs": [pt.to_dict() for pt in self.child_region_inputs], "region_outputs": [pt.to_dict() for pt in self.region_outputs], "hash": self.hash, }And in
from_dict:scheme.node_inputs = [ - NodeInputInsertionPoint.from_dict(pt) for pt in data.get("nodes_insertion_points", []) + NodeInputInsertionPoint.from_dict(pt) for pt in data.get("node_inputs", data.get("nodes_insertion_points", [])) ]The backward-compatible approach in
from_dictsupports both old and new key names.Also applies to: 638-640
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
modelopt/onnx/op_types.pymodelopt/onnx/quantization/autotune/common.pymodelopt/onnx/quantization/autotune/insertion_points.pymodelopt/onnx/quantization/graph_utils.pytests/unit/onnx/quantization/autotune/test_insertion_points.pytests/unit/onnx/quantization/autotune/test_region.py
🧰 Additional context used
🧬 Code graph analysis (4)
tests/unit/onnx/quantization/autotune/test_insertion_points.py (2)
modelopt/onnx/quantization/autotune/insertion_points.py (25)
ChildRegionInputInsertionPoint(291-431)NodeInputInsertionPoint(166-287)RegionOutputInsertionPoint(435-620)ResolvedInsertionPoint(121-162)merge_resolved_insertion_points(777-818)resolve_region_io_insertion_points(728-774)skip_invalid_insertion_points(623-706)to_dict(82-84)to_dict(147-153)to_dict(193-195)to_dict(332-334)to_dict(478-484)from_dict(88-90)from_dict(156-162)from_dict(198-200)from_dict(337-350)from_dict(487-500)resolve(93-103)resolve(202-247)resolve(352-388)resolve(502-549)collect_from_region(107-117)collect_from_region(250-287)collect_from_region(391-431)collect_from_region(552-620)modelopt/onnx/quantization/autotune/common.py (10)
InsertionScheme(465-717)Region(88-456)RegionType(75-85)to_dict(606-616)from_dict(619-652)is_empty(540-550)num_node_insertions(575-581)num_region_insertions(584-593)num_region_output_insertions(596-604)hash(504-537)
modelopt/onnx/quantization/autotune/insertion_points.py (3)
modelopt/onnx/quantization/autotune/common.py (10)
Region(88-456)to_dict(606-616)from_dict(619-652)get_nodes(262-276)RegionType(75-85)get_type(152-154)get_children(172-186)get_inputs(339-341)get_outputs(343-345)get_region_nodes_and_descendants(278-300)modelopt/onnx/op_types.py (2)
get_bool_ops(308-339)get_copy_ops(99-118)modelopt/onnx/quantization/graph_utils.py (1)
get_tensor_consumer_node_indices(305-320)
tests/unit/onnx/quantization/autotune/test_region.py (1)
modelopt/onnx/quantization/autotune/common.py (14)
Region(88-456)RegionType(75-85)get_id(136-138)get_level(144-146)get_type(152-154)add_child(205-234)get_children(172-186)get_parent(164-166)add_node(254-256)get_size(351-357)get_nodes(262-276)get_inputs(339-341)get_outputs(343-345)get_region_nodes_and_descendants(278-300)
modelopt/onnx/quantization/autotune/common.py (1)
modelopt/onnx/quantization/autotune/insertion_points.py (13)
ChildRegionInputInsertionPoint(291-431)NodeInputInsertionPoint(166-287)RegionOutputInsertionPoint(435-620)to_dict(82-84)to_dict(147-153)to_dict(193-195)to_dict(332-334)to_dict(478-484)from_dict(88-90)from_dict(156-162)from_dict(198-200)from_dict(337-350)from_dict(487-500)
🔇 Additional comments (20)
tests/unit/onnx/quantization/autotune/test_region.py (2)
31-73: Core Region API coverage looks solid (creation, parenting, node membership)These tests hit the main Region primitives and relationships in a clear way.
120-167: Hierarchy + removal tests are useful and deterministic
get_region_nodes_and_descendants()andremove_child()coverage here is valuable.tests/unit/onnx/quantization/autotune/test_insertion_points.py (4)
54-234: InsertionPoint value-object tests (immutability/hash/serde) are thoroughGood coverage for frozen dataclasses and dict round-trips.
236-357: InsertionScheme hashing + serialization coverage looks goodNice to see order-independence and error/latency fields exercised.
535-935: Utility + resolve/merge tests cover key behaviorsThe tests for
skip_invalid_insertion_points(),resolve_region_io_insertion_points(), andmerge_resolved_insertion_points()provide good confidence in core mechanics.
1019-1092: Confirm intended semantics: child-region input resolution currently captures external consumers too
ChildRegionInputInsertionPoint.resolve()(viaresolve_region_io_insertion_points) will include all consumers fromtensor_users_map, not just nodes inside the child region (yourtest_resolve_multiple_childrenimplicitly allows this). Worth double-checking this is the desired boundary semantics.modelopt/onnx/quantization/autotune/insertion_points.py (3)
65-118: Good abstraction:InsertionPointABC makes the API consistentClean interface for (de)serialization + resolve/collect across point types.
120-163:ResolvedInsertionPointas a frozen value-object is a good fitHelps correctness (hashability) and simplifies set-based merges.
352-388: Verify region-boundary semantics: child input resolution may include nodes outside the child regionBecause
resolve_region_io_insertion_points()unionsregion.get_region_nodes_and_descendants()with the globaltensor_users_map[tensor_name],ChildRegionInputInsertionPoint.resolve()can return insertion points for external consumers of a child’s input tensor. If the intent is “quantize only inside the child boundary”, consider restricting to region nodes for child-input resolution (or adding a flag controlling whether to include external users).Also applies to: 728-775
modelopt/onnx/quantization/autotune/common.py (11)
47-51: Circular import dependency is handled correctly.The top-level import from
insertion_pointsis paired with deferred imports ofRegionandRegionTypeinside methods ininsertion_points.py, avoiding circular import issues at runtime.
57-73: LGTM!Clean exception hierarchy with appropriate base classes for categorizing region vs. autotuner errors.
75-86: LGTM!Clean enum definition with clear documentation of each region type's purpose in the hierarchy.
114-130: LGTM!Clean initialization with appropriate types for each attribute.
205-234: Robust cycle prevention and re-parenting logic.The
add_childmethod correctly:
- Prevents self-reference
- Detects cycles via ancestor traversal
- Handles re-parenting by removing from old parent
- Prevents duplicate children
329-337: O(n) membership check is acceptable given ordering requirements.The
not incheck on lists is O(n), but as discussed in past reviews, inputs/outputs must be indexable for pattern matching. For typical region sizes, this is acceptable.
438-456: Placeholder method should indicate future implementation.The
compute_structural_signatureraisesNotImplementedError, which is appropriate for a placeholder. The docstring clearly describes the intended behavior for when it's implemented.
503-537: LGTM!The hash computation is deterministic with sorted insertion points and properly truncated SHA-256. The implementation correctly excludes performance metrics (latency, error) from the identity hash.
561-572: LGTM!The
is_profiledlogic correctly identifies schemes that have been measured (either successfully with a finite latency or with an error).
654-707: LGTM!The symmetric difference approach correctly computes edit distance. This relies on the insertion point classes being frozen dataclasses (and thus hashable), which they are per the relevant code snippets.
709-717: LGTM!Clean string representation with all relevant debugging information.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
Signed-off-by: Will Guo <[email protected]>
bd87f1a to
4826707
Compare
| "All", | ||
| "Any", | ||
| "Unique", | ||
| "NonZero", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest renaming this to something more appropriate (as well as updating the docstring), as they include various ops: boolean, bitwise, numeric unary, comparison, and conditional and indexing-like ops.
It seems that what they have in common is that they are all logical or mask-style tensor ops, so maybe get_logical_and_mask_ops() could be appropriate.
What does this PR do?
Type of change: new feature
Overview: This PR integrates an automatical QDQ placment tool into ModelOpt.
This PR is the 1/4 parts of the change, it contains the following changes:
Part 1: #701
Part 2: #702
Part 3: #703
Part 4: #704
Usage
Testing
Implement unit tests, all tests could get passed.
Before your PR is "Ready for review"
Additional Information
Summary by CodeRabbit
New Features
Tests
✏️ Tip: You can customize this high-level summary in your review settings.