-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable adaptive stripping and eliminate dependency of weight sharing feature on OVEP qdq stripping #629
base: ovep-develop
Are you sure you want to change the base?
Conversation
@@ -359,14 +359,35 @@ BackendManager::GetModelProtoFromFusedNode(const onnxruntime::Node& fused_node, | |||
} | |||
}; | |||
|
|||
#if (((OPENVINO_VERSION_MAJOR == 2025) && (OPENVINO_VERSION_MINOR > 0)) || (OPENVINO_VERSION_MAJOR > 2025)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@saurabhkale17 .. Is this OV version check required? I think it is redundant and can be optimized out.
Below line, would only enable the property if OV 2025.1 and driver supports QDQ stripping:
if (std::find(supported_properties.begin(), supported_properties.end(), "NPU_QDQ_OPTIMIZATION") != supported_properties.end()) {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This OpenVINO version check is necessary to avoid compilation errors in older versions.
For older OV versions I get this error: C2039: 'qdq_optimization': is not a member of 'ov::intel_npu'
The reason for this behavior:
This check happens at runtime, because it queries the device's supported properties dynamically.
if (std::find(supported_properties.begin(), supported_properties.end(), "NPU_QDQ_OPTIMIZATION") != supported_properties.end())
However, the line
OVCore::Get()->core.set_property("NPU", {ov::intel_npu::qdq_optimization(true)});
happens at compile-time, since the compiler needs to resolve qdq_optimization(true) during compilation.
If ov::intel_npu::qdq_optimization(true) does not exist in an older OpenVINO version, the compiler treats it as an undefined symbol and throws an error before execution—long before the std::find(...) check can even run.
#if preprocessor checks ensures that qdq_optimization(true) is only compiled when the OpenVINO version supports it, preventing compilation failures in older versions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. I was looking at the OV C/C++ API documentation to find out what the behavior is when an unregonized property is used but it's not described or undefined. I was trying to see if instead of doing a get_supported_properties ->set_property we could directly do something like if (!set_property) then fallback
.
@@ -529,7 +541,7 @@ static void AddQDQNodeUnit(onnxruntime::Graph& dst_graph, | |||
SkipReason reason = SkipReason::Other; | |||
bool keep_dq = CheckDQRuleSet(node_unit, dq_node, src_graph, reason); | |||
|
|||
if (keep_dq) { | |||
if (keep_dq || (enable_ovep_weight_sharing && !enable_ovep_qdq_optimizer)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Saurabh Can we have a flag for this in qdq_stripping.h instead of checking this again and again.
I think this will be a good code design
enable_ovep_weight_sharing && !enable_ovep_qdq_optimizer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Saurabh Can we have a flag for this in qdq_stripping.h instead of checking this again and again. I think this will be a good code design enable_ovep_weight_sharing && !enable_ovep_qdq_optimizer
I'd prefer we separate the passes. Sure, there's replication but it simplifies the design. Alternatively the routine can be redesigned to be more modular.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’ve noted your comment and completely agree with the improvement. However, given the limited time we have to enable compiler stripping, I’ll defer this change for now and revisit it later once the immediate priorities are addressed.
auto supported_properties = OVCore::Get()->core.get_property(session_context_.device_type, ov::supported_properties); | ||
|
||
// query ov properties for deciding on which stripping to use | ||
if (std::find(supported_properties.begin(), supported_properties.end(), "NPU_QDQ_OPTIMIZATION") != supported_properties.end()) { // 25.1 exist or not |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd put this in the OV interface, cache the results and expose a function to query aa specific value and return an std::optional
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in the latest commit
IsQDQGraph(subgraph)) { | ||
LOGS_DEFAULT(INFO) << "[OpenVINO-EP] QDQ optimization pass status: 1"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The message about QDQ being disabled is still there. Let's keep both in their respective implementation blocks.
} else { | ||
LOGS_DEFAULT(INFO) << "[OpenVINO-EP]: OVEP QDQ optimization pass is enabled"; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move back to implementation block
const auto& onnx_model_path_name = subgraph.ModelPath(); | ||
// QDQ stripping enabled only for the NPU | ||
if (session_context_.device_type.find("NPU") != std::string::npos && | ||
session_context_.enable_qdq_optimizer && | ||
(enable_ovep_qdq_optimizer || session_context_.so_share_ep_contexts) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check will fail to do WAI if model is not QDQ. That looks like a bug. Let's move that check up to the declaration of
bool enable_ovep_qdq_optimizer = session_context_.enable_qdq_optimizer && IsQDQGraph(subgraph);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in the latest commit
@@ -529,7 +541,7 @@ static void AddQDQNodeUnit(onnxruntime::Graph& dst_graph, | |||
SkipReason reason = SkipReason::Other; | |||
bool keep_dq = CheckDQRuleSet(node_unit, dq_node, src_graph, reason); | |||
|
|||
if (keep_dq) { | |||
if (keep_dq || (enable_ovep_weight_sharing && !enable_ovep_qdq_optimizer)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Saurabh Can we have a flag for this in qdq_stripping.h instead of checking this again and again. I think this will be a good code design enable_ovep_weight_sharing && !enable_ovep_qdq_optimizer
I'd prefer we separate the passes. Sure, there's replication but it simplifies the design. Alternatively the routine can be redesigned to be more modular.
@saurabhkale17 . Can you please rebase and fix the issue reported by CI |
Description
The weight sharing feature in OVEP currently depends on QDQ stripping, requiring users to enable OVEP QDQ stripping for weight sharing to function correctly.
This PR removes that dependency, allowing weight sharing to work independently.
The broader goal is to enable compiler stripping through OVEP using a query, and eliminating this dependency is a crucial first step toward that objective.
Enable Default Compiler Stripping & Rename QDQ Optimizer Flag
Query OV for NPU_QDQ_OPTIMIZATION and set it to true if available, ensuring compiler-based stripping is the default.
Disable OVEP QDQ stripping by setting ovep_qdq_optimizer = false.
Rename enable_qdq_optimizer to enable_ovep_qdq_optimizer for clarity.
Motivation and Context
Prevents redundant optimizations by disabling OVEP QDQ stripping (ovep_qdq_optimizer = false).
Improves code clarity by renaming enable_qdq_optimizer to enable_ovep_qdq_optimizer, reducing confusion for future developers.