Commit 632f238

Authored by preetha-intel, jatinwadhwa921, vthaniel, and TejalKhade28
ORT-OVEP Doc update (microsoft#24395)
### Description

This PR updates the ONNX Runtime documentation to reflect the changes and aligns with ORT 1.21.0.

Co-authored-by: sfatimar <[email protected]>
Co-authored-by: vthaniel <[email protected]>
Co-authored-by: TejalKhade28 <[email protected]>
1 parent 3262c43 commit 632f238

File tree: 2 files changed (+99, -14 lines)

docs/build/eps.md

Lines changed: 2 additions & 2 deletions
@@ -277,11 +277,11 @@ See more information on the OpenVINO™ Execution Provider [here](../execution-p
 1. Install the OpenVINO™ offline/online installer from Intel<sup>®</sup> Distribution of OpenVINO™ Toolkit **Release 2024.3** for the appropriate OS and target hardware:
    * [Windows - CPU, GPU, NPU](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/download.html?PACKAGE=OPENVINO_BASE&VERSION=v_2024_3_0&OP_SYSTEM=WINDOWS&DISTRIBUTION=ARCHIVE).
-   * [Linux - CPU, GPU](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/download.html?PACKAGE=OPENVINO_BASE&VERSION=v_2024_3_0&OP_SYSTEM=LINUX&DISTRIBUTION=ARCHIVE)
+   * [Linux - CPU, GPU, NPU](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/download.html?PACKAGE=OPENVINO_BASE&VERSION=v_2024_3_0&OP_SYSTEM=LINUX&DISTRIBUTION=ARCHIVE)

    Follow [documentation](https://docs.openvino.ai/2024/home.html) for detailed instructions.

-   *2024.3 is the current recommended OpenVINO™ version. [OpenVINO™ 2023.3](https://docs.openvino.ai/2023.3/home.html) is minimal OpenVINO™ version requirement.*
+   *2024.5 is the current recommended OpenVINO™ version. [OpenVINO™ 2024.5](https://docs.openvino.ai/2024/index.html) is the minimal OpenVINO™ version requirement.*

 2. Configure the target hardware with specific follow-on instructions:
    * To configure Intel<sup>®</sup> Processor Graphics (GPU) please follow these instructions: [Windows](https://docs.openvino.ai/2024/get-started/configurations/configurations-intel-gpu.html#windows), [Linux](https://docs.openvino.ai/2024/get-started/configurations/configurations-intel-gpu.html#linux)

docs/execution-providers/OpenVINO-ExecutionProvider.md

Lines changed: 97 additions & 12 deletions
@@ -20,7 +20,7 @@ Accelerate ONNX models on Intel CPUs, GPUs, NPU with Intel OpenVINO™ Execution
 ## Install

 Pre-built packages and Docker images are published for OpenVINO™ Execution Provider for ONNX Runtime by Intel for each release.
-* OpenVINO™ Execution Provider for ONNX Runtime Release page: [Latest v5.4 Release](https://github.com/intel/onnxruntime/releases)
+* OpenVINO™ Execution Provider for ONNX Runtime Release page: [Latest v5.6 Release](https://github.com/intel/onnxruntime/releases)
 * Python wheels Ubuntu/Windows: [onnxruntime-openvino](https://pypi.org/project/onnxruntime-openvino/)
 * Docker image: [openvino/onnxruntime_ep_ubuntu20](https://hub.docker.com/r/openvino/onnxruntime_ep_ubuntu20)
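For a quick smoke test of the wheel, a minimal Python sketch (the model path and device choice here are illustrative, assuming `pip install onnxruntime-openvino`):

```python
# Minimal sketch: run a model on the OpenVINO Execution Provider.
# Assumes the onnxruntime-openvino wheel is installed; model.onnx is illustrative.
import onnxruntime as ort

sess = ort.InferenceSession(
    "model.onnx",
    providers=["OpenVINOExecutionProvider"],
    provider_options=[{"device_type": "CPU"}],
)
print(sess.get_providers())  # OpenVINOExecutionProvider should be listed first
```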

@@ -29,7 +29,9 @@ Pre-built packages and Docker images are published for OpenVINO™ Execution Pro
 ONNX Runtime OpenVINO™ Execution Provider is compatible with the three latest releases of OpenVINO™.

 |ONNX Runtime|OpenVINO™|Notes|
 |---|---|---|
+|1.21.0|2025.0|[Details](https://github.com/intel/onnxruntime/releases/tag/v5.6)|
+|1.20.0|2024.4|[Details](https://github.com/intel/onnxruntime/releases/tag/v5.5)|
 |1.19.0|2024.3|[Details](https://github.com/intel/onnxruntime/releases/tag/v5.4)|
 |1.18.0|2024.1|[Details](https://github.com/intel/onnxruntime/releases/tag/v5.3)|
 |1.17.1|2023.3|[Details](https://github.com/intel/onnxruntime/releases/tag/v5.2)|
@@ -227,7 +229,61 @@ Refer to [Session Options](https://github.com/microsoft/onnxruntime/blob/main/in
 ### Enable QDQ Optimizations Passes
 Optimizes ORT quantized models for the NPU device to keep only QDQs for supported ops and optimize for performance and accuracy. Generally this feature will give better performance/accuracy with ORT optimizations disabled.
 Refer to [Configuration Options](#configuration-options) for more information about using these runtime options.
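For illustration, a minimal Python sketch that enables this pass while disabling ORT's own graph optimizations (model path illustrative, assuming the onnxruntime-openvino wheel):

```python
# Minimal sketch: enable OVEP's QDQ optimizer for NPU and disable
# ORT graph optimizations, per the recommendation above.
import onnxruntime as ort

so = ort.SessionOptions()
so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL

sess = ort.InferenceSession(
    "model.onnx",  # illustrative model path
    so,
    providers=["OpenVINOExecutionProvider"],
    provider_options=[{"device_type": "NPU", "enable_qdq_optimizer": "True"}],
)
```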
+
+### Loading Custom JSON OV Config During Runtime
+This feature facilitates loading OVEP parameters from a single JSON configuration file.
+The JSON input must follow this schema:
+```
+{
+   "DEVICE_KEY": {"PROPERTY": "PROPERTY_VALUE"}
+}
+```
+where "DEVICE_KEY" can be CPU, GPU, or NPU, "PROPERTY" must be a valid property that OV defines in its properties.hpp sections, and "PROPERTY_VALUE" must be passed in as a string. Passing any other type, such as int/bool, raises an error from ORT like the one below:
+
+Exception during initialization: [json.exception.type_error.302] type must be string, but is a number.
+
+Int/bool values can instead be set as strings, e.g. "NPU_TILES": "2", which is valid (refer to the example given below).
+If incorrect keys are passed, they are skipped with a warning, while incorrect values assigned to a valid key result in an exception from the OV framework.

+The valid properties are of 2 types, viz. MUTABLE (R/W) and IMMUTABLE (R ONLY), and this is enforced while setting them. If an IMMUTABLE property is being set, it is skipped with a similar warning.
+
+Example:
+
+The usage of this functionality with the onnxruntime_perf_test application is as below:
+
+```
+onnxruntime_perf_test.exe -e openvino -m times -r 1 -i "device_type|NPU load_config|npu_config.json" model.onnx
+```
+where the npu_config.json file is defined as below:
+
+```json
+{
+   "NPU": {
+       "PERFORMANCE_HINT": "THROUGHPUT",
+       "WORKLOAD_TYPE": "Efficient",
+       "NPU_TILES": "2",
+       "LOG_LEVEL": "LOG_DEBUG",
+       "NPU_COMPILATION_MODE_PARAMS": "enable-weights-swizzling=false enable-activation-swizzling=false enable-grouped-matmul=false"
+   }
+}
+```
+To explicitly enable logs, use "LOG_LEVEL": "LOG_DEBUG" in the JSON device configuration property. The log verifies that the correct device parameters and properties are set/populated during runtime with OVEP.
+
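The same file can also be supplied through the Python API's load_config provider option; a minimal sketch (file names are illustrative, assuming the onnxruntime-openvino wheel):

```python
# Minimal sketch: load a custom OV JSON config at session creation.
# npu_config.json follows the schema above; names are illustrative.
import onnxruntime as ort

sess = ort.InferenceSession(
    "model.onnx",
    providers=["OpenVINOExecutionProvider"],
    provider_options=[{"device_type": "NPU", "load_config": "npu_config.json"}],
)
```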
+### OpenVINO Execution Provider Supports EP-Weight Sharing across sessions
+The OpenVINO Execution Provider (OVEP) in ONNX Runtime supports EP-Weight Sharing, enabling models to efficiently share weights across multiple inference sessions. This feature enhances the execution of Large Language Models (LLMs) with prefill and KV cache, reducing memory consumption and improving performance when running multiple inferences.
+
+With EP-Weight Sharing, prefill and KV cache models can now reuse the same set of weights, minimizing redundancy and optimizing inference. Additionally, this ensures that EP Context nodes are still created even when the model undergoes subgraph partitioning.
+
+These changes enable weight sharing between two models using the session context option ep.share_ep_contexts; see the sketch below.
+Refer to [Session Options](https://github.com/microsoft/onnxruntime/blob/5068ab9b190c549b546241aa7ffbe5007868f595/include/onnxruntime/core/session/onnxruntime_session_options_config_keys.h#L319) for more details on configuring this runtime option.
+
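A minimal Python sketch of this option, assuming two EP-context-compatible models (file names are illustrative; the ep.share_ep_contexts key comes from the session options header linked above):

```python
# Minimal sketch: share EP weights between a prefill and a KV-cache session.
import onnxruntime as ort

so = ort.SessionOptions()
so.add_session_config_entry("ep.share_ep_contexts", "1")

providers = ["OpenVINOExecutionProvider"]
opts = [{"device_type": "NPU"}]

# Both sessions are created with the same flag so weights can be reused.
prefill = ort.InferenceSession("prefill_model.onnx", so,
                               providers=providers, provider_options=opts)
decode = ort.InferenceSession("kv_cache_model.onnx", so,
                              providers=providers, provider_options=opts)
```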
+### OVEP supports CreateSessionFromArray API
+The OpenVINO Execution Provider (OVEP) in ONNX Runtime supports creating sessions from memory using the CreateSessionFromArray API. This allows loading models directly from memory buffers instead of file paths: CreateSessionFromArray loads the model into memory, then creates a session from the in-memory byte array.
+
+Note:
+Use the -l argument when running inference with perf_test through the CreateSessionFromArray API.
+
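In Python, passing the serialized model as a byte buffer to InferenceSession is the rough analogue of CreateSessionFromArray; a minimal sketch (file name illustrative):

```python
# Minimal sketch: create an OVEP session from an in-memory model buffer
# rather than a file path.
import onnxruntime as ort

with open("model.onnx", "rb") as f:
    model_bytes = f.read()

sess = ort.InferenceSession(
    model_bytes,
    providers=["OpenVINOExecutionProvider"],
    provider_options=[{"device_type": "CPU"}],
)
```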
 ## Configuration Options

 OpenVINO™ Execution Provider can be configured with certain options at runtime that control the behavior of the EP. These options can be set as key-value pairs as shown below:
@@ -245,17 +301,20 @@ The session configuration options are passed to SessionOptionsAppendExecutionPro

 ```
 std::unordered_map<std::string, std::string> options;
 options["device_type"] = "GPU";
 options["precision"] = "FP32";
 options["num_of_threads"] = "8";
 options["num_streams"] = "8";
 options["cache_dir"] = "";
 options["context"] = "0x123456ff";
-options["enable_opencl_throttling"] = "false";
-session_options.AppendExecutionProvider("OpenVINO", options);
+options["enable_qdq_optimizer"] = "True";
+options["load_config"] = "config_path.json";
+session_options.AppendExecutionProvider_OpenVINO_V2(options);
 ```

 ### C/C++ Legacy API
+Note: This API is no longer officially supported. Users are requested to move to the V2 API.

 The session configuration options are passed to the SessionOptionsAppendExecutionProvider_OpenVINO() API as shown in the example below for the GPU device type:

 ```
@@ -269,7 +328,7 @@ SessionOptions.AppendExecutionProvider_OpenVINO(session_options, &options);
 ```

 ### Onnxruntime Graph level Optimization
-OpenVINO™ backend performs hardware, dependent as well as independent optimizations on the graph to infer it on the target hardware with best possible performance. In most cases it has been observed that passing the ONNX input graph as is without explicit optimizations would lead to best possible optimizations at kernel level by OpenVINO™. For this reason, it is advised to turn off high level optimizations performed by ONNX Runtime for OpenVINO™ Execution Provider. This can be done using SessionOptions() as shown below:-
+The OpenVINO™ backend performs both hardware-dependent and hardware-independent optimizations on the graph to infer it on the target hardware with the best possible performance. In most cases it has been observed that passing the ONNX input graph as it is, without explicit optimizations, leads to the best possible optimizations at the kernel level by OpenVINO™. For this reason, it is advised to turn off the high-level optimizations performed by ONNX Runtime for the OpenVINO™ Execution Provider. This can be done using SessionOptions() as shown below:

 * #### Python API
 ```
@@ -289,27 +348,28 @@ The following table lists all the available configuration options for API 2.0 an

 | **Key** | **Key type** | **Allowable Values** | **Value type** | **Description** |
 | --- | --- | --- | --- | --- |
-| device_type | string | CPU, NPU, GPU, GPU.0, GPU.1 based on the avaialable GPUs, NPU, Any valid Hetero combination, Any valid Multi or Auto devices combination | string | Overrides the accelerator hardware type with these values at runtime. If this option is not explicitly set, default hardware specified during build is used. |
+| device_type | string | CPU, NPU, GPU, GPU.0, GPU.1 based on the available GPUs, NPU, Any valid Hetero combination, Any valid Multi or Auto devices combination | string | Overrides the accelerator hardware type with these values at runtime. If this option is not explicitly set, the default hardware specified during build is used. |
 | precision | string | FP32, FP16, ACCURACY based on the device_type chosen | string | Supported precisions for HW {CPU:FP32, GPU:[FP32, FP16, ACCURACY], NPU:FP16}. Default precision for HW for optimized performance {CPU:FP32, GPU:FP16, NPU:FP16}. To execute the model with the default input precision, select the ACCURACY precision type. |
 | num_of_threads | string | Any unsigned positive number other than 0 | size_t | Overrides the accelerator default number of threads with this value at runtime. If this option is not explicitly set, the default value of 8 set during build time will be used for inference. |
 | num_streams | string | Any unsigned positive number other than 0 | size_t | Overrides the accelerator default streams with this value at runtime. If this option is not explicitly set, the default value of 1 (optimized for latency) set during build time will be used for inference. |
 | cache_dir | string | Any valid string path on the hardware target | string | Explicitly specify the path to save and load the blobs, enabling the model caching feature. |
 | context | string | OpenCL Context | void* | This option is only available when OpenVINO EP is built with OpenCL flags enabled. It takes in the remote context, i.e. the cl_context address, as a void pointer. |
 | enable_opencl_throttling | string | True/False | boolean | This option enables OpenCL queue throttling for GPU devices (reduces CPU utilization when using GPU). |
 | enable_qdq_optimizer | string | True/False | boolean | This option enables QDQ optimization to improve model performance and accuracy on NPU. |
+| load_config | string | Any custom JSON path | string | This option enables loading a custom JSON OV config during runtime, which sets OV parameters. |

 Valid Hetero or Multi or Auto Device combinations:
 HETERO:<DEVICE_TYPE_1>,<DEVICE_TYPE_2>,<DEVICE_TYPE_3>...
 The <DEVICE_TYPE> can be any of these devices from this list ['CPU','GPU', 'NPU']

-A minimum of two DEVICE_TYPE'S should be specified for a valid HETERO or Multi-Device Build.
+A minimum of two DEVICE_TYPEs should be specified for a valid HETERO, MULTI, or AUTO device build.

 Example:
 HETERO:GPU,CPU AUTO:GPU,CPU MULTI:GPU,CPU

 Deprecated device_type options:
-CPU_FP32, GPU_FP32, GPU_FP16 as still supported. It will be deprectaed in the future release. Kindly upgrade to latest device_type and precision option.
+CPU_FP32, GPU_FP32, GPU_FP16, NPU_FP16 are no longer supported and will be removed in a future release. Kindly upgrade to the latest device_type and precision options.

 ## Support Coverage
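As an illustration of the combined device strings, a minimal Python sketch selecting a HETERO combination at runtime (model path illustrative, assuming a machine that exposes both GPU and CPU to OpenVINO):

```python
# Minimal sketch: HETERO tries GPU first and falls back to CPU per subgraph.
import onnxruntime as ort

sess = ort.InferenceSession(
    "model.onnx",
    providers=["OpenVINOExecutionProvider"],
    provider_options=[{"device_type": "HETERO:GPU,CPU"}],
)
```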
@@ -460,7 +520,7 @@ Atom, Core, and Xeon processors. GPU refers to the Intel Integrated Graphics. In
 ### Topology Support

 Below topologies from ONNX open model zoo are fully supported on OpenVINO™ Execution Provider and many more are supported through sub-graph partitioning.
-For NPU is model is not supported we fallback to CPU.
+For NPU, if a model is not supported, we fall back to CPU.

 ### Image Classification Networks
@@ -540,6 +600,31 @@ For NPU is model is not supported we fallback to CPU.
 | twitter-roberta-base-sentiment | Yes | Yes |
 | xlm-roberta-base | Yes | Yes |

+### Models Supported on NPU
+
+| **MODEL NAME** | **NPU** |
+| --- | --- |
+| yolov3 | Yes |
+| microsoft_resnet-50 | Yes |
+| realesrgan-x4 | Yes |
+| timm_inception_v4.tf_in1k | Yes |
+| squeezenet1.0-qdq | Yes |
+| vgg16 | Yes |
+| caffenet-qdq | Yes |
+| zfnet512 | Yes |
+| shufflenet-v2 | Yes |
+| zfnet512-qdq | Yes |
+| googlenet | Yes |
+| googlenet-qdq | Yes |
+| caffenet | Yes |
+| bvlcalexnet-qdq | Yes |
+| vgg16-qdq | Yes |
+| mnist | Yes |
+| ResNet101-DUC | Yes |
+| shufflenet-v2-qdq | Yes |
+| bvlcalexnet | Yes |
+| squeezenet1.0 | Yes |
+
 **Note:** We have added support for INT8 models, quantized with Neural Network Compression Framework (NNCF). To know more about NNCF refer [here](https://github.com/openvinotoolkit/nncf).

 ## OpenVINO™ Execution Provider Samples Tutorials
