
Enable adaptive stripping and eliminate dependency of weight sharing feature on OVEP qdq stripping #629


Merged: 10 commits merged on Apr 7, 2025


@saurabhkale17 commented Mar 27, 2025

Description

The weight sharing feature in OVEP currently depends on QDQ stripping, requiring users to enable OVEP QDQ stripping for weight sharing to function correctly.

This PR removes that dependency, allowing weight sharing to work independently.

The broader goal is to enable compiler-based stripping through OVEP by querying OpenVINO for support, and eliminating this dependency is a crucial first step toward that goal.

Enable Default Compiler Stripping & Rename QDQ Optimizer Flag

  • Query OV for NPU_QDQ_OPTIMIZATION and set it to true if available, ensuring compiler-based stripping is the default.

  • Disable OVEP QDQ stripping by setting ovep_qdq_optimizer = false.

  • Rename enable_qdq_optimizer to enable_ovep_qdq_optimizer for clarity.
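A minimal sketch of the defaults listed above, assuming a plain list of strings stands in for the properties the NPU reports as supported, and assuming the renamed enable_ovep_qdq_optimizer option can still be switched on explicitly. StrippingConfig and ChooseStripping are hypothetical names for illustration, not the OVEP implementation, and the sketch does not model any interaction between the two mechanisms.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical summary of the two stripping switches described in the PR.
struct StrippingConfig {
  bool npu_qdq_optimization = false;       // compiler-based stripping (NPU_QDQ_OPTIMIZATION)
  bool enable_ovep_qdq_optimizer = false;  // OVEP's own QDQ stripping pass, off by default
};

// `supported` stands in for the properties the NPU plugin reports;
// `user_enable_ovep_qdq_optimizer` stands in for the renamed provider option.
StrippingConfig ChooseStripping(const std::vector<std::string>& supported,
                                bool user_enable_ovep_qdq_optimizer = false) {
  StrippingConfig cfg;
  // Turn on compiler-based stripping only if the device reports the property.
  cfg.npu_qdq_optimization =
      std::find(supported.begin(), supported.end(), "NPU_QDQ_OPTIMIZATION") !=
      supported.end();
  // OVEP stripping stays off unless explicitly requested.
  cfg.enable_ovep_qdq_optimizer = user_enable_ovep_qdq_optimizer;
  return cfg;
}
```

For example, ChooseStripping({"NPU_QDQ_OPTIMIZATION"}) yields compiler stripping on and OVEP stripping off, which is the default the PR aims for.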

Motivation and Context

  • Ensures compiler-based stripping is the default mechanism for NPU by explicitly setting NPU_QDQ_OPTIMIZATION = true when available.

  • Prevents redundant optimizations by disabling OVEP QDQ stripping (ovep_qdq_optimizer = false).

  • Improves code clarity by renaming enable_qdq_optimizer to enable_ovep_qdq_optimizer, reducing confusion for future developers.

@@ -359,14 +359,35 @@ BackendManager::GetModelProtoFromFusedNode(const onnxruntime::Node& fused_node,
}
};

#if (((OPENVINO_VERSION_MAJOR == 2025) && (OPENVINO_VERSION_MINOR > 0)) || (OPENVINO_VERSION_MAJOR > 2025))


@saurabhkale17, is this OV version check required? I think it is redundant and can be removed.

The line below would only enable the property if OV 2025.1 and the driver support QDQ stripping:
if (std::find(supported_properties.begin(), supported_properties.end(), "NPU_QDQ_OPTIMIZATION") != supported_properties.end()) {

Author

This OpenVINO version check is necessary to avoid compilation errors with older versions.
With older OV versions, I get this error: C2039: 'qdq_optimization': is not a member of 'ov::intel_npu'

The reason for this behavior:
This check happens at runtime, because it queries the device's supported properties dynamically.
if (std::find(supported_properties.begin(), supported_properties.end(), "NPU_QDQ_OPTIMIZATION") != supported_properties.end())

However, the line
OVCore::Get()->core.set_property("NPU", {ov::intel_npu::qdq_optimization(true)});
must be resolved at compile time: the compiler needs to find the declaration of ov::intel_npu::qdq_optimization while compiling this file, even though the call itself only executes at runtime.

If ov::intel_npu::qdq_optimization does not exist in an older OpenVINO version, the compiler reports it as an unknown member of ov::intel_npu and the build fails, long before the std::find(...) check could ever run.

The #if preprocessor check ensures that qdq_optimization(true) is only compiled when the OpenVINO version supports it, preventing compilation failures with older versions.
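To make the compile-time versus runtime distinction concrete, here is a self-contained sketch that mimics the pattern without OpenVINO. The version macros get placeholder values, and OV_HAS_NPU_QDQ_OPTIMIZATION, mock_intel_npu::qdq_optimization, MockNpuDevice, and EnableCompilerQdqStripping are hypothetical stand-ins. The helper is declared only when the version gate passes, just as ov::intel_npu::qdq_optimization exists only in newer headers, and the std::find check then runs at runtime inside the guarded region.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Placeholder version values; a real build takes these from the OpenVINO headers.
#ifndef OPENVINO_VERSION_MAJOR
#define OPENVINO_VERSION_MAJOR 2025
#define OPENVINO_VERSION_MINOR 1
#endif

// Hypothetical macro wrapping the same version test used in the PR.
#define OV_HAS_NPU_QDQ_OPTIMIZATION                                      \
  (((OPENVINO_VERSION_MAJOR == 2025) && (OPENVINO_VERSION_MINOR > 0)) || \
   (OPENVINO_VERSION_MAJOR > 2025))

// Mock device: what it reports as supported, and what has been set on it.
struct MockNpuDevice {
  std::vector<std::string> supported_properties;
  std::map<std::string, bool> properties;
};

#if OV_HAS_NPU_QDQ_OPTIMIZATION
// Stand-in for a property helper that only newer headers declare. Without the
// #if guard, older headers would reject any reference to it at compile time.
namespace mock_intel_npu {
inline std::pair<std::string, bool> qdq_optimization(bool value) {
  return {"NPU_QDQ_OPTIMIZATION", value};
}
}  // namespace mock_intel_npu
#endif

// Returns true if compiler-based QDQ stripping was enabled on the device.
bool EnableCompilerQdqStripping(MockNpuDevice& device) {
#if OV_HAS_NPU_QDQ_OPTIMIZATION
  // Runtime gate: the installed driver must actually report the property.
  const auto& props = device.supported_properties;
  if (std::find(props.begin(), props.end(), "NPU_QDQ_OPTIMIZATION") != props.end()) {
    device.properties.insert(mock_intel_npu::qdq_optimization(true));
    return true;
  }
#else
  (void)device;  // Headers too old: the helper is never referenced, so this still compiles.
#endif
  return false;
}
```

Building with, for example, -DOPENVINO_VERSION_MAJOR=2024 -DOPENVINO_VERSION_MINOR=6 compiles the #else branch instead: mock_intel_npu is never declared, nothing references it, and the function always returns false.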


Interesting. I was looking at the OV C/C++ API documentation to find out what happens when an unrecognized property is used, but the behavior is not described there. I was trying to see whether, instead of doing get_supported_properties -> set_property, we could directly do something like if (!set_property) and then fall back.
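The alternative suggested here can be sketched with a mock, assuming (not verified against OpenVINO) that setting an unrecognized property throws. MockDevice and TrySetProperty are hypothetical names. Note that the sketch only replaces the runtime query; it does not remove the need for the compile-time guard when the property is set through a helper such as ov::intel_npu::qdq_optimization that older headers do not declare.

```cpp
#include <cassert>
#include <exception>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical device whose set_property throws on names it does not know.
// Whether OpenVINO's set_property behaves like this is the undocumented part.
class MockDevice {
 public:
  explicit MockDevice(std::set<std::string> known) : known_(std::move(known)) {}

  void set_property(const std::string& name, bool value) {
    if (known_.count(name) == 0) {
      throw std::invalid_argument("unsupported property: " + name);
    }
    values_[name] = value;
  }

  bool has_value(const std::string& name) const { return values_.count(name) != 0; }

 private:
  std::set<std::string> known_;
  std::map<std::string, bool> values_;
};

// "if (!set_property) then fallback": attempt the property directly and report
// failure instead of querying the supported-properties list first.
bool TrySetProperty(MockDevice& device, const std::string& name, bool value) {
  try {
    device.set_property(name, value);
    return true;
  } catch (const std::exception&) {
    return false;  // caller falls back, e.g. to OVEP's own QDQ stripping
  }
}
```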

@MayureshV1 MayureshV1 requested a review from javier-intel March 28, 2025 04:45
@@ -529,7 +541,7 @@ static void AddQDQNodeUnit(onnxruntime::Graph& dst_graph,
SkipReason reason = SkipReason::Other;
bool keep_dq = CheckDQRuleSet(node_unit, dq_node, src_graph, reason);

-if (keep_dq) {
+if (keep_dq || (enable_ovep_weight_sharing && !enable_ovep_qdq_optimizer)) {

Saurabh, can we have a flag in qdq_stripping.h for
enable_ovep_weight_sharing && !enable_ovep_qdq_optimizer
instead of checking this condition again and again? I think that would be a better code design.
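A minimal sketch of this suggestion, computing enable_ovep_weight_sharing && !enable_ovep_qdq_optimizer once and reusing the result. QdqStrippingOptions, MakeQdqStrippingOptions, and ShouldKeepDQ are hypothetical names, and keep_dq_by_rules stands in for the result of CheckDQRuleSet.

```cpp
#include <cassert>

// Hypothetical options struct, as might live in qdq_stripping.h.
struct QdqStrippingOptions {
  bool enable_ovep_weight_sharing = false;
  bool enable_ovep_qdq_optimizer = false;
  // Derived once: weight sharing without OVEP stripping keeps every DQ node.
  bool keep_all_dq = false;
};

QdqStrippingOptions MakeQdqStrippingOptions(bool weight_sharing, bool ovep_qdq_optimizer) {
  QdqStrippingOptions opts;
  opts.enable_ovep_weight_sharing = weight_sharing;
  opts.enable_ovep_qdq_optimizer = ovep_qdq_optimizer;
  opts.keep_all_dq = weight_sharing && !ovep_qdq_optimizer;
  return opts;
}

// Same truth table as the new condition in AddQDQNodeUnit:
// keep_dq || (enable_ovep_weight_sharing && !enable_ovep_qdq_optimizer)
bool ShouldKeepDQ(bool keep_dq_by_rules, const QdqStrippingOptions& opts) {
  return keep_dq_by_rules || opts.keep_all_dq;
}
```

The options object would be built once per stripping run and passed down, so each node unit reads a single precomputed field instead of re-evaluating the combined condition.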


I'd prefer we separate the passes. Sure, there's replication but it simplifies the design. Alternatively the routine can be redesigned to be more modular.

Author

I’ve noted your comment and completely agree with the improvement. However, given the limited time we have to enable compiler stripping, I’ll defer this change for now and revisit it later once the immediate priorities are addressed.


Sounds good. I'll approve but let's work on that after the merge.

IsQDQGraph(subgraph)) {
LOGS_DEFAULT(INFO) << "[OpenVINO-EP] QDQ optimization pass status: 1";


The message about QDQ being disabled is still there. Let's keep both in their respective implementation blocks.


We will get this addressed in the next PR we will initiate for refactoring.

Comment on lines +380 to +376
} else {
LOGS_DEFAULT(INFO) << "[OpenVINO-EP]: OVEP QDQ optimization pass is enabled";
}


Move back to implementation block
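One way to follow this request is for each branch to emit its own message, so the enabled and disabled logs sit next to the code path they describe. In this sketch std::ostream stands in for LOGS_DEFAULT(INFO), ApplyOvepQdqStripping is a hypothetical name, and the wording of the disabled message is a placeholder.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Returns true if the OVEP QDQ stripping branch was taken.
bool ApplyOvepQdqStripping(bool enable_ovep_qdq_optimizer, std::ostream& log) {
  if (enable_ovep_qdq_optimizer) {
    log << "[OpenVINO-EP]: OVEP QDQ optimization pass is enabled\n";
    // ... OVEP QDQ stripping would run here ...
    return true;
  }
  log << "[OpenVINO-EP]: OVEP QDQ optimization pass is disabled\n";  // placeholder wording
  // ... the model would be passed on without OVEP stripping ...
  return false;
}
```

In a unit test, a std::ostringstream can be passed as the log to check which message each branch produced.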

const auto& onnx_model_path_name = subgraph.ModelPath();
// QDQ stripping enabled only for the NPU
if (session_context_.device_type.find("NPU") != std::string::npos &&
session_context_.enable_qdq_optimizer &&
(enable_ovep_qdq_optimizer || session_context_.so_share_ep_contexts) &&


This check will not work as intended if the model is not QDQ. That looks like a bug. Let's move that check up to the declaration of
bool enable_ovep_qdq_optimizer = session_context_.enable_qdq_optimizer && IsQDQGraph(subgraph);
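The suggested fix can be sketched as follows, with IsQDQGraph folded into the flag so that a non-QDQ model can still reach the weight-sharing path. SessionContextLite and ShouldEnterQdqPass are hypothetical, is_qdq_graph stands in for IsQDQGraph(subgraph), and the field names mirror those in the snippet above.

```cpp
#include <cassert>
#include <string>

// Hypothetical subset of the session_context_ fields referenced in the snippet.
struct SessionContextLite {
  std::string device_type;
  bool enable_qdq_optimizer = false;
  bool so_share_ep_contexts = false;
};

bool ShouldEnterQdqPass(const SessionContextLite& ctx, bool is_qdq_graph) {
  // The QDQ-graph check is part of the flag, not the outer condition.
  const bool enable_ovep_qdq_optimizer = ctx.enable_qdq_optimizer && is_qdq_graph;
  // QDQ stripping is enabled only for the NPU.
  return ctx.device_type.find("NPU") != std::string::npos &&
         (enable_ovep_qdq_optimizer || ctx.so_share_ep_contexts);
}
```

If IsQDQGraph were instead ANDed into the outer condition, an NPU session with weight sharing on and a non-QDQ model would skip the pass entirely, which is the case the review flags.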

Author

Addressed in the latest commit


@MayureshV1

@saurabhkale17, can you please rebase and fix the issue reported by CI?

@vthaniel force-pushed the saurabh/enable_adaptive_stripping branch from 2d3846b to 2969048 on April 7, 2025 11:53

@MayureshV1 left a comment

I have verified that the OV property is set correctly when passed to the compiler and that the model dump is as expected.

Looks good to merge!

@MayureshV1 merged commit 2e4d541 into ovep-develop on Apr 7, 2025
6 of 8 checks passed
ankitm3k pushed a commit that referenced this pull request Jul 2, 2025
…feature on OVEP qdq stripping (#629)

* eliminate dependency of weight sharing on ovep qdq stripping pass

* fix qdqnodeunit issue

* enable compiler stripping

* enable adaptive stripping: cleanup code

* fix backward compatibility issue

* add logs to identify which stripping is enabled

* address PR review comments

* fix unused variable error

* resolve unused var issue

* fix CI issues