Enable adaptive stripping and eliminate dependency of weight sharing feature on OVEP qdq stripping #629

Merged
merged 10 commits into from
Apr 7, 2025
26 changes: 21 additions & 5 deletions onnxruntime/core/providers/openvino/backend_manager.cc
@@ -19,6 +19,7 @@
#include "core/providers/openvino/ibackend.h"
#include "core/providers/openvino/backend_utils.h"
#include "core/providers/openvino/qdq_transformations/qdq_stripping.h"
+ #include "core/providers/openvino/ov_interface.h"

namespace onnxruntime {
namespace openvino_ep {
@@ -359,22 +360,37 @@ BackendManager::GetModelProtoFromFusedNode(const onnxruntime::Node& fused_node,
}
};

+ [[maybe_unused]] bool enable_ovep_qdq_optimizer = session_context_.enable_qdq_optimizer && IsQDQGraph(subgraph);
+ [[maybe_unused]] std::optional<bool> enable_compiler_qdq_optimization = queryOVProperty("NPU_QDQ_OPTIMIZATION", session_context_.device_type);
+ #if (((OPENVINO_VERSION_MAJOR == 2025) && (OPENVINO_VERSION_MINOR > 0)) || (OPENVINO_VERSION_MAJOR > 2025))

@saurabhkale17 Is this OV version check required? I think it is redundant and can be removed.

The line below only enables the property if OV is 2025.1 and the driver supports QDQ stripping:
if (std::find(supported_properties.begin(), supported_properties.end(), "NPU_QDQ_OPTIMIZATION") != supported_properties.end()) {

Author

This OpenVINO version check is necessary to avoid compilation errors with older versions.
With older OV versions I get this error: C2039: 'qdq_optimization': is not a member of 'ov::intel_npu'

The reason for this behavior:
The following check happens at runtime, because it queries the device's supported properties dynamically:
if (std::find(supported_properties.begin(), supported_properties.end(), "NPU_QDQ_OPTIMIZATION") != supported_properties.end())

However, the line
OVCore::Get()->core.set_property("NPU", {ov::intel_npu::qdq_optimization(true)});
is handled at compile time, since the compiler needs to resolve qdq_optimization(true) during compilation.

If ov::intel_npu::qdq_optimization does not exist in an older OpenVINO version, the compiler rejects it as an unknown member and the build fails, long before the std::find(...) check can ever run.

The #if preprocessor check ensures that qdq_optimization(true) is only compiled when the OpenVINO version supports it, preventing compilation failures with older versions.
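
A minimal, self-contained sketch of the two layers this comment describes. Everything in it is a stand-in, not the real OpenVINO API: `DEMO_OV_VERSION_MAJOR` and `DEMO_OV_VERSION_MINOR` mimic the OpenVINO version macros, while `demo_npu::qdq_optimization`, `FakeCore`, and `TryEnableCompilerQdq` are hypothetical names used only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for OPENVINO_VERSION_MAJOR / OPENVINO_VERSION_MINOR.
#ifndef DEMO_OV_VERSION_MAJOR
#define DEMO_OV_VERSION_MAJOR 2025
#endif
#ifndef DEMO_OV_VERSION_MINOR
#define DEMO_OV_VERSION_MINOR 1
#endif

// Same shape as the version test in the diff.
#define DEMO_HAS_QDQ_PROPERTY                                      \
  (((DEMO_OV_VERSION_MAJOR == 2025) && (DEMO_OV_VERSION_MINOR > 0)) || \
   (DEMO_OV_VERSION_MAJOR > 2025))

#if DEMO_HAS_QDQ_PROPERTY
namespace demo_npu {
// Declared only by "new" headers; "old" headers do not have it at all.
struct qdq_optimization {
  explicit qdq_optimization(bool v) : value(v) {}
  bool value;
};
}  // namespace demo_npu
#endif

// Hypothetical device wrapper that reports its properties at runtime.
struct FakeCore {
  std::vector<std::string> supported_properties;
  bool compiler_qdq_enabled = false;
};

// Returns true when compiler-side QDQ stripping was switched on.
bool TryEnableCompilerQdq(FakeCore& core) {
#if DEMO_HAS_QDQ_PROPERTY
  // Runtime layer: the driver must actually advertise the property.
  const auto& props = core.supported_properties;
  if (std::find(props.begin(), props.end(), "NPU_QDQ_OPTIMIZATION") != props.end()) {
    // Compile-time layer: this symbol is only visible inside the #if.
    core.compiler_qdq_enabled = demo_npu::qdq_optimization(true).value;
    return true;
  }
#else
  (void)core;  // Old headers: the symbol is never referenced, so the build succeeds.
#endif
  return false;
}
```

Building the sketch with `-DDEMO_OV_VERSION_MINOR=0` removes both the declaration and its only use, which is the situation the `#if` in the diff is guarding against.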

Interesting. I was looking at the OV C/C++ API documentation to find out what happens when an unrecognized property is used, but the behavior is either not described or left undefined. I was trying to see whether, instead of doing get_supported_properties -> set_property, we could call set_property directly and fall back if it fails, something like if (!set_property).
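
The "set it and fall back on failure" idea could look roughly like the sketch below. It assumes that setting an unknown property throws, which is exactly the behavior the comment says is not documented, so `FakeDevice`, `QdqPath`, and `SelectQdqPath` are hypothetical and nothing here is the OpenVINO API.

```cpp
#include <cassert>
#include <exception>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical device: rejects keys it does not know by throwing.
// Whether the real OpenVINO API behaves this way is an assumption.
class FakeDevice {
 public:
  explicit FakeDevice(std::set<std::string> known) : known_(std::move(known)) {}

  void SetProperty(const std::string& key, bool value) {
    if (known_.count(key) == 0) {
      throw std::invalid_argument("unsupported property: " + key);
    }
    values_[key] = value;
  }

  bool IsSet(const std::string& key) const {
    auto it = values_.find(key);
    return it != values_.end() && it->second;
  }

 private:
  std::set<std::string> known_;
  std::map<std::string, bool> values_;
};

enum class QdqPath { kCompiler, kOvep };

// Try the compiler-side pass first; fall back to OVEP stripping on failure.
QdqPath SelectQdqPath(FakeDevice& device) {
  try {
    device.SetProperty("NPU_QDQ_OPTIMIZATION", true);
    return QdqPath::kCompiler;
  } catch (const std::exception&) {
    return QdqPath::kOvep;
  }
}
```

In this sketch the property is addressed by its string name, so no version-specific symbol is referenced; whether the same holds for the real property is not established by this thread.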

+ if (session_context_.device_type.find("NPU") != std::string::npos && session_context_.enable_qdq_optimizer) {
+ if (enable_compiler_qdq_optimization.has_value() && enable_compiler_qdq_optimization.value()) {
+ LOGS_DEFAULT(INFO) << "[OpenVINO-EP]: Compiler QDQ optimization pass is enabled";
+ OVCore::Get()->core.set_property("NPU", {ov::intel_npu::qdq_optimization(true)});
+ // disabling OVEP qdq stripping
+ // at this stage provider option "enable_qdq_optimizer" is still true but OVEP stripping is (disabled) false
+ // as compiler stripping is enabled
+ enable_ovep_qdq_optimizer = false;
+ } else {
+ LOGS_DEFAULT(INFO) << "[OpenVINO-EP]: OVEP QDQ optimization pass is enabled";
+ }
Comment on lines +374 to +376

Move back to implementation block

+ }
+ #endif

const auto& onnx_model_path_name = subgraph.ModelPath();
// QDQ stripping enabled only for the NPU
if (session_context_.device_type.find("NPU") != std::string::npos &&
- session_context_.enable_qdq_optimizer &&
- IsQDQGraph(subgraph)) {
- LOGS_DEFAULT(INFO) << "[OpenVINO-EP] QDQ optimization pass status: 1";

The message about QDQ being disabled is still there. Let's keep both in their respective implementation blocks.

We will address this in the next PR, which we will open for refactoring.

+ (enable_ovep_qdq_optimizer || session_context_.so_share_ep_contexts)) {
std::unique_ptr<onnxruntime::Model> model;
- Status status = CreateModelWithStrippedQDQNodes(subgraph, logger, session_context_.so_share_ep_contexts, model, shared_context_.shared_weights);
+ Status status = CreateModelWithStrippedQDQNodes(subgraph, logger, session_context_.so_share_ep_contexts, model, shared_context_.shared_weights, enable_ovep_qdq_optimizer);
auto model_proto = model->ToProto();
model_proto->set_ir_version(ONNX_NAMESPACE::Version::IR_VERSION);
print_model_proto_duration();
DumpOpenVINOEPModel(onnx_model_path_name, model_proto.get(), fused_node);
ORT_ENFORCE(status.IsOK(), status.ErrorMessage());
return model_proto;
} else {
- LOGS_DEFAULT(INFO) << "[OpenVINO-EP] QDQ optimization pass status: 0";
+ LOGS_DEFAULT(INFO) << "[OpenVINO-EP] OVEP QDQ optimization pass is disabled";
auto model = subgraph.CreateModel(logger);
auto model_proto = model->ToProto();
model_proto->set_ir_version(ONNX_NAMESPACE::Version::IR_VERSION);
11 changes: 11 additions & 0 deletions onnxruntime/core/providers/openvino/ov_interface.cc
@@ -46,6 +46,17 @@ void printDebugInfo(const ov::CompiledModel& obj) {
}
#endif

+ // Function to check if a given OV property is enabled
+ std::optional<bool> queryOVProperty(const std::string& property, const std::string& device_type) {
+ try {
+ // Get the property value
+ auto supported_properties = OVCore::Get()->core.get_property(device_type, ov::supported_properties);
+ return std::find(supported_properties.begin(), supported_properties.end(), property) != supported_properties.end();
+ } catch (const std::exception&) {
+ return std::nullopt; // Property not found or invalid
+ }
+ }

std::shared_ptr<OVNetwork> OVCore::ReadModel(std::string&& model, const std::string& model_path) {
try {
std::istringstream modelStringStream(std::move(model));
3 changes: 3 additions & 0 deletions onnxruntime/core/providers/openvino/ov_interface.h
@@ -8,6 +8,7 @@
#include <fstream>
#include <sstream>
#include <utility>
+ #include <optional>

#include "openvino/openvino.hpp"
#include "openvino/runtime/intel_npu/properties.hpp"
@@ -37,6 +38,8 @@ typedef ov::intel_gpu::ocl::ClContext* OVRemoteContextPtr;
typedef ov::RemoteContext OVRemoteContext;
#endif

+ std::optional<bool> queryOVProperty(const std::string& property, const std::string& device_type);

template <typename T>
class WeakSingleton {
public:
Original file line number Diff line number Diff line change
@@ -341,6 +341,7 @@ static bool CheckDQRuleSet(const NodeUnit& node_unit,
}
}

+ // this check is if QLinear node feed into the output of src graph which expects quantized output
static bool CheckQFeedsIntoQuantizedOutput(const NodeUnit& node_unit,
const std::unordered_map<std::string, std::string> graph_op_data_type) {
auto op_of_quantized_layer = node_unit.Outputs();
@@ -447,9 +448,17 @@ static bool HandleDoubleQDQ(onnxruntime::Graph& dst_graph, const onnxruntime::Gr
static void AddStandaloneNodeUnit(onnxruntime::Graph& dst_graph, const onnxruntime::GraphViewer& src_graph,
const NodeUnit& node_unit,
std::set<std::string>& initializers_to_keep,
- const logging::Logger& /* logger */) {
+ const logging::Logger& /* logger */,
+ bool IsWeightSharingWithoutOVEPQDQStripping) {
assert(node_unit.UnitType() == NodeUnit::Type::SingleNode);

+ // this is the scenario where WAI is enabled and ovep stripping is disabled
+ // do not strip off any Q or DQ node
+ if (IsWeightSharingWithoutOVEPQDQStripping) {
+ AddNode(initializers_to_keep, src_graph, dst_graph, node_unit.GetNode());
+ return;
+ }

if (HandleDoubleQDQ(dst_graph, src_graph, node_unit, initializers_to_keep)) return;

auto add_identity_op = [&](bool duplicate_dq) {
@@ -511,7 +520,8 @@ static void AddQDQNodeUnit(onnxruntime::Graph& dst_graph,
const onnxruntime::GraphViewer& src_graph,
const NodeUnit& node_unit,
std::set<std::string>& initializers_to_keep,
- const logging::Logger& /* logger */) {
+ const logging::Logger& /* logger */,
+ bool IsWeightSharingWithoutOVEPQDQStripping) {
assert(node_unit.UnitType() == NodeUnit::Type::QDQGroup);

// Collect inputs coming into the node unit.
@@ -529,7 +539,7 @@ static void AddQDQNodeUnit(onnxruntime::Graph& dst_graph,
SkipReason reason = SkipReason::Other;
bool keep_dq = CheckDQRuleSet(node_unit, dq_node, src_graph, reason);

- if (keep_dq) {
+ if (IsWeightSharingWithoutOVEPQDQStripping || keep_dq) {
AddNode(initializers_to_keep, src_graph, dst_graph, *dq_node);
dq_node_args_to_keep.insert({input_defs.at(0)->Name(),
&dst_graph.GetOrCreateNodeArg(dq_node->OutputDefs().at(0)->Name(),
@@ -597,7 +607,7 @@ static void AddQDQNodeUnit(onnxruntime::Graph& dst_graph,

bool keep_q = CheckQRuleSet(node_unit, q_node, src_graph, reason);

- if (keep_q) {
+ if (IsWeightSharingWithoutOVEPQDQStripping || keep_q) {
AddNode(initializers_to_keep, src_graph, dst_graph, *q_node);
// if keep_q, then output defs of the target node doesn't change
output_args.push_back(&dst_graph.GetOrCreateNodeArg(target_node.OutputDefs().at(i)->Name(),
@@ -675,7 +685,8 @@ Status CreateModelWithStrippedQDQNodes(const GraphViewer& src_graph,
const logging::Logger& logger,
bool enable_ovep_weight_sharing,
/*out*/ std::unique_ptr<onnxruntime::Model>& model,
- /*out*/ sw& shared_weights) {
+ /*out*/ sw& shared_weights,
+ bool enable_ovep_qdq_optimizer) {
// NOTE: This function is a re-implementation of GraphViewerToProto() in core/graph/graph_proto_serializer.cc
// with the following differences:
// - Uses onnxruntime::Graph APIs instead of onnx::GraphProto APIs.
@@ -766,10 +777,12 @@ Status CreateModelWithStrippedQDQNodes(const GraphViewer& src_graph,
continue; // Already handled this node unit
}

+ bool IsWeightSharingWithoutOVEPQDQStripping = enable_ovep_weight_sharing && !enable_ovep_qdq_optimizer;

if (node_unit->UnitType() == NodeUnit::Type::SingleNode) {
- AddStandaloneNodeUnit(dst_graph, src_graph, *node_unit, initializers_to_keep, logger);
+ AddStandaloneNodeUnit(dst_graph, src_graph, *node_unit, initializers_to_keep, logger, IsWeightSharingWithoutOVEPQDQStripping);
} else {
- AddQDQNodeUnit(dst_graph, src_graph, *node_unit, initializers_to_keep, logger);
+ AddQDQNodeUnit(dst_graph, src_graph, *node_unit, initializers_to_keep, logger, IsWeightSharingWithoutOVEPQDQStripping);
}

seen_node_units.insert(node_unit);
Original file line number Diff line number Diff line change
@@ -17,7 +17,8 @@ Status CreateModelWithStrippedQDQNodes(const GraphViewer& src_graph,
const logging::Logger& logger,
bool enable_ovep_weight_sharing,
/*out*/ std::unique_ptr<onnxruntime::Model>& model,
- /*out*/ sw& shared_weights);
+ /*out*/ sw& shared_weights,
+ bool enable_ovep_qdq_optimizer);

bool dumpMetaDataMapToBinary(const sw::Metadata::Map& shared_weights, const std::string& filename);
} // namespace openvino_ep
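
Taken together, the changes in backend_manager.cc and qdq_stripping.cc amount to a small decision table. The sketch below restates it as a pure function, assuming an OpenVINO version newer than 2025.0 so that the `#if` branch is compiled in. `QdqInputs`, `QdqPlan`, and `PlanQdq` are hypothetical names that do not exist in the PR; the fields mirror the variables used in the diff.

```cpp
#include <cassert>
#include <optional>

// Inputs, mirroring the variables used in the diff.
struct QdqInputs {
  bool is_npu;                                // device_type contains "NPU"
  bool enable_qdq_optimizer;                  // provider option
  bool is_qdq_graph;                          // IsQDQGraph(subgraph)
  bool share_ep_contexts;                     // so_share_ep_contexts (weight sharing)
  std::optional<bool> compiler_supports_qdq;  // queryOVProperty("NPU_QDQ_OPTIMIZATION", ...)
};

// Outcome of the gating logic.
struct QdqPlan {
  bool compiler_stripping;  // NPU compiler pass switched on
  bool ovep_stripping;      // enable_ovep_qdq_optimizer after the override
  bool use_stripping_path;  // CreateModelWithStrippedQDQNodes is called
  bool keep_all_qdq_nodes;  // IsWeightSharingWithoutOVEPQDQStripping
};

QdqPlan PlanQdq(const QdqInputs& in) {
  QdqPlan plan{};  // all fields start as false
  plan.ovep_stripping = in.enable_qdq_optimizer && in.is_qdq_graph;

  // Compiler stripping takes over when the driver advertises the property.
  if (in.is_npu && in.enable_qdq_optimizer && in.compiler_supports_qdq.value_or(false)) {
    plan.compiler_stripping = true;
    plan.ovep_stripping = false;
  }

  // Weight sharing alone is now enough to enter the stripping path.
  plan.use_stripping_path = in.is_npu && (plan.ovep_stripping || in.share_ep_contexts);

  // Inside that path, Q/DQ nodes are all kept when OVEP stripping is off.
  plan.keep_all_qdq_nodes =
      plan.use_stripping_path && in.share_ep_contexts && !plan.ovep_stripping;
  return plan;
}
```

The case the PR title refers to is the one where `compiler_stripping` and `share_ep_contexts` are both true: the model still goes through `CreateModelWithStrippedQDQNodes` for weight sharing, but every Q and DQ node is copied through unchanged.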