Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable adaptive stripping and eliminate dependency of weight sharing feature on OVEP qdq stripping #629

Open
wants to merge 7 commits into
base: ovep-develop
Choose a base branch
from

Conversation

saurabhkale17
Copy link

@saurabhkale17 saurabhkale17 commented Mar 27, 2025

Description

The weight sharing feature in OVEP currently depends on QDQ stripping, requiring users to enable OVEP QDQ stripping for weight sharing to function correctly.

This PR removes that dependency, allowing weight sharing to work independently.

The broader goal is to enable compiler stripping through OVEP using a query, and eliminating this dependency is a crucial first step toward that objective.

Enable Default Compiler Stripping & Rename QDQ Optimizer Flag

  • Query OV for NPU_QDQ_OPTIMIZATION and set it to true if available, ensuring compiler-based stripping is the default.

  • Disable OVEP QDQ stripping by setting ovep_qdq_optimizer = false.

  • Rename enable_qdq_optimizer to enable_ovep_qdq_optimizer for clarity.

Motivation and Context

  • Ensures compiler-based stripping is the default mechanism for NPU by explicitly setting NPU_DQD_OPTIMIZATION = true if available.

Prevents redundant optimizations by disabling OVEP QDQ stripping (ovep_qdq_optimizer = false).

Improves code clarity by renaming enable_qdq_optimizer to enable_ovep_qdq_optimizer, reducing confusion for future developers.

@@ -359,14 +359,35 @@ BackendManager::GetModelProtoFromFusedNode(const onnxruntime::Node& fused_node,
}
};

#if (((OPENVINO_VERSION_MAJOR == 2025) && (OPENVINO_VERSION_MINOR > 0)) || (OPENVINO_VERSION_MAJOR > 2025))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@saurabhkale17 .. Is this OV version check required? I think it is redundant and can be optimized out.

Below line, would only enable the property if OV 2025.1 and driver supports QDQ stripping:
if (std::find(supported_properties.begin(), supported_properties.end(), "NPU_QDQ_OPTIMIZATION") != supported_properties.end()) {

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This OpenVINO version check is necessary to avoid compilation errors in older versions.
For older OV versions I get this error: C2039: 'qdq_optimization': is not a member of 'ov::intel_npu'

The reason for this behavior:
This check happens at runtime, because it queries the device's supported properties dynamically.
if (std::find(supported_properties.begin(), supported_properties.end(), "NPU_QDQ_OPTIMIZATION") != supported_properties.end())

However, the line
OVCore::Get()->core.set_property("NPU", {ov::intel_npu::qdq_optimization(true)});
happens at compile-time, since the compiler needs to resolve qdq_optimization(true) during compilation.

If ov::intel_npu::qdq_optimization(true) does not exist in an older OpenVINO version, the compiler treats it as an undefined symbol and throws an error before execution—long before the std::find(...) check can even run.

#if preprocessor checks ensures that qdq_optimization(true) is only compiled when the OpenVINO version supports it, preventing compilation failures in older versions.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. I was looking at the OV C/C++ API documentation to find out what the behavior is when an unregonized property is used but it's not described or undefined. I was trying to see if instead of doing a get_supported_properties ->set_property we could directly do something like if (!set_property) then fallback.

@MayureshV1 MayureshV1 requested a review from javier-intel March 28, 2025 04:45
@@ -529,7 +541,7 @@ static void AddQDQNodeUnit(onnxruntime::Graph& dst_graph,
SkipReason reason = SkipReason::Other;
bool keep_dq = CheckDQRuleSet(node_unit, dq_node, src_graph, reason);

if (keep_dq) {
if (keep_dq || (enable_ovep_weight_sharing && !enable_ovep_qdq_optimizer)) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Saurabh Can we have a flag for this in qdq_stripping.h instead of checking this again and again.
I think this will be a good code design
enable_ovep_weight_sharing && !enable_ovep_qdq_optimizer

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Saurabh Can we have a flag for this in qdq_stripping.h instead of checking this again and again. I think this will be a good code design enable_ovep_weight_sharing && !enable_ovep_qdq_optimizer

I'd prefer we separate the passes. Sure, there's replication but it simplifies the design. Alternatively the routine can be redesigned to be more modular.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ve noted your comment and completely agree with the improvement. However, given the limited time we have to enable compiler stripping, I’ll defer this change for now and revisit it later once the immediate priorities are addressed.

Comment on lines 367 to 370
auto supported_properties = OVCore::Get()->core.get_property(session_context_.device_type, ov::supported_properties);

// query ov properties for deciding on which stripping to use
if (std::find(supported_properties.begin(), supported_properties.end(), "NPU_QDQ_OPTIMIZATION") != supported_properties.end()) { // 25.1 exist or not

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd put this in the OV interface, cache the results and expose a function to query aa specific value and return an std::optional.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in the latest commit

IsQDQGraph(subgraph)) {
LOGS_DEFAULT(INFO) << "[OpenVINO-EP] QDQ optimization pass status: 1";

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The message about QDQ being disabled is still there. Let's keep both in their respective implementation blocks.

Comment on lines +380 to +382
} else {
LOGS_DEFAULT(INFO) << "[OpenVINO-EP]: OVEP QDQ optimization pass is enabled";
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move back to implementation block

const auto& onnx_model_path_name = subgraph.ModelPath();
// QDQ stripping enabled only for the NPU
if (session_context_.device_type.find("NPU") != std::string::npos &&
session_context_.enable_qdq_optimizer &&
(enable_ovep_qdq_optimizer || session_context_.so_share_ep_contexts) &&

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check will fail to do WAI if model is not QDQ. That looks like a bug. Let's move that check up to the declaration of
bool enable_ovep_qdq_optimizer = session_context_.enable_qdq_optimizer && IsQDQGraph(subgraph);

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in the latest commit

@@ -529,7 +541,7 @@ static void AddQDQNodeUnit(onnxruntime::Graph& dst_graph,
SkipReason reason = SkipReason::Other;
bool keep_dq = CheckDQRuleSet(node_unit, dq_node, src_graph, reason);

if (keep_dq) {
if (keep_dq || (enable_ovep_weight_sharing && !enable_ovep_qdq_optimizer)) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Saurabh Can we have a flag for this in qdq_stripping.h instead of checking this again and again. I think this will be a good code design enable_ovep_weight_sharing && !enable_ovep_qdq_optimizer

I'd prefer we separate the passes. Sure, there's replication but it simplifies the design. Alternatively the routine can be redesigned to be more modular.

@MayureshV1
Copy link

@saurabhkale17 . Can you please rebase and fix the issue reported by CI

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants