### Description
This PR updates the ONNX Runtime documentation to reflect recent changes and align it with ORT 1.21.0.
---------
Co-authored-by: sfatimar <[email protected]>
Co-authored-by: vthaniel <[email protected]>
Co-authored-by: TejalKhade28 <[email protected]>
**docs/build/eps.md** (2 additions, 2 deletions)
@@ -277,11 +277,11 @@ See more information on the OpenVINO™ Execution Provider [here](../execution-p
1. Install the OpenVINO™ offline/online installer from the Intel<sup>®</sup> Distribution of OpenVINO™ Toolkit **Release 2024.3** for the appropriate OS and target hardware:
Follow the [documentation](https://docs.openvino.ai/2024/home.html) for detailed instructions.
*2024.5 is the current recommended OpenVINO™ version. [OpenVINO™ 2024.5](https://docs.openvino.ai/2024/index.html) is the minimal OpenVINO™ version requirement.*
2. Configure the target hardware with the specific follow-on instructions:
* To configure Intel<sup>®</sup> Processor Graphics (GPU), please follow these instructions: [Windows](https://docs.openvino.ai/2024/get-started/configurations/configurations-intel-gpu.html#windows), [Linux](https://docs.openvino.ai/2024/get-started/configurations/configurations-intel-gpu.html#linux)
@@ -227,7 +229,61 @@ Refer to [Session Options](https://github.com/microsoft/onnxruntime/blob/main/in
### Enable QDQ Optimization Passes
Optimizes ORT quantized models for the NPU device by keeping QDQ operations only for supported ops, optimizing for performance and accuracy. Generally this feature gives better performance/accuracy with ORT optimizations disabled.
Refer to [Configuration Options](#configuration-options) for more information about using these runtime options.
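As a minimal sketch, the option can be enabled through the OpenVINO™ provider options in the Python API; the model path is a placeholder, and the `enable_qdq_optimizer` key is the one listed in the configuration table below:

```
import onnxruntime as ort

# Enable OVEP's QDQ optimization passes for the NPU device
session = ort.InferenceSession(
    "quantized_model.onnx",  # placeholder path to an ORT-quantized model
    providers=[("OpenVINOExecutionProvider",
                {"device_type": "NPU", "enable_qdq_optimizer": "True"})],
)
```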
### Loading Custom JSON OV Config During Runtime
This feature facilitates loading OVEP parameters from a single JSON configuration file.
The JSON input must follow this schema:
```
{
  "DEVICE_KEY": {"PROPERTY": "PROPERTY_VALUE"}
}
```
where "DEVICE_KEY" can be CPU, NPU or GPU , "PROPERTY" must be a valid entity defined in OV from its properties.hpp sections and "PROPERTY_VALUE" must be passed in as a string. If we pass any other type like int/bool we encounter errors from ORT like below -
Exception during initialization: [json.exception.type_error.302] type must be string, but is a number.
Int/bool values can instead be set as strings, e.g. "NPU_TILES": "2", which is valid (refer to the example given below).
Incorrect keys are skipped with a warning, while incorrect values assigned to a valid key result in an exception from the OpenVINO framework.
Valid properties are of two types: MUTABLE (read/write) and IMMUTABLE (read-only). This is enforced when properties are set; if an IMMUTABLE property is being set, it is skipped with a similar warning.
Example:
The usage of this functionality with the onnxruntime_perf_test application is shown below.
To explicitly enable logs, use "LOG_LEVEL": "LOG_DEBUG" in the JSON device configuration property. The log verifies that the correct device parameters and properties are set and populated during runtime with OVEP.
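A minimal sketch, assuming an NPU target; the config file name ov_config.json, the model path, and the perf_test timing flags are placeholders, while NPU_TILES and LOG_LEVEL are the properties discussed above:

```
{
  "NPU": {"NPU_TILES": "2", "LOG_LEVEL": "LOG_DEBUG"}
}
```

```
onnxruntime_perf_test -e openvino -m times -r 1 -i "device_type|NPU load_config|ov_config.json" model.onnx
```

Here load_config points the OpenVINO provider options at the JSON file, and the debug log can be inspected to confirm the properties were applied.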
### OpenVINO Execution Provider Supports EP-Weight Sharing across sessions
The OpenVINO Execution Provider (OVEP) in ONNX Runtime supports EP-Weight Sharing, enabling models to efficiently share weights across multiple inference sessions. This feature enhances the execution of Large Language Models (LLMs) with prefill and KV cache, reducing memory consumption and improving performance when running multiple inferences.
With EP-Weight Sharing, prefill and KV cache models can now reuse the same set of weights, minimizing redundancy and optimizing inference. Additionally, this ensures that EP Context nodes are still created even when the model undergoes subgraph partitioning.
These changes enable weight sharing between two models using the session configuration option `ep.share_ep_contexts`.
Refer to [Session Options](https://github.com/microsoft/onnxruntime/blob/5068ab9b190c549b546241aa7ffbe5007868f595/include/onnxruntime/core/session/onnxruntime_session_options_config_keys.h#L319) for more details on configuring this runtime option.
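A minimal sketch of sharing weights across two sessions, assuming a prefill/KV-cache model pair; the model file names are placeholders:

```
import onnxruntime as ort

# Both sessions use the same SessionOptions with ep.share_ep_contexts enabled,
# so the EP contexts (and therefore the weights) are shared between them.
so = ort.SessionOptions()
so.add_session_config_entry("ep.share_ep_contexts", "1")

providers = [("OpenVINOExecutionProvider", {"device_type": "NPU"})]

prefill_session = ort.InferenceSession("model_prefill.onnx", so, providers=providers)
kv_cache_session = ort.InferenceSession("model_kvcache.onnx", so, providers=providers)
```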
### OVEP supports CreateSessionFromArray API
The OpenVINO Execution Provider (OVEP) in ONNX Runtime supports creating sessions from memory using the CreateSessionFromArray API. This allows loading models directly from memory buffers instead of file paths. CreateSessionFromArray loads the model into memory and then creates a session from the in-memory byte array.
Note:
Use the -l argument when running inference with perf_test through the CreateSessionFromArray API.
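In the Python API, the equivalent of CreateSessionFromArray is passing the model's bytes to InferenceSession; a minimal sketch, with the model path as a placeholder:

```
import onnxruntime as ort

# Read the model into an in-memory byte array instead of handing ORT a path
with open("model.onnx", "rb") as f:
    model_bytes = f.read()

session = ort.InferenceSession(
    model_bytes,
    ort.SessionOptions(),
    providers=[("OpenVINOExecutionProvider", {"device_type": "CPU"})],
)
```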
## Configuration Options
OpenVINO™ Execution Provider can be configured with certain options at runtime that control its behavior. These options can be set as key-value pairs as shown below:
@@ -245,17 +301,20 @@ The session configuration options are passed to SessionOptionsAppendExecutionPro
OpenVINO™ backend performs hardware-dependent as well as hardware-independent optimizations on the graph to infer it on the target hardware with the best possible performance. In most cases it has been observed that passing the ONNX input graph as is, without explicit optimizations, leads to the best possible optimizations at the kernel level by OpenVINO™. For this reason, it is advised to turn off the high-level optimizations performed by ONNX Runtime for the OpenVINO™ Execution Provider. This can be done using SessionOptions() as shown below:
* #### Python API
```
# Sketch of disabling ORT's high-level optimizations (model path is a placeholder)
import onnxruntime

options = onnxruntime.SessionOptions()
options.graph_optimization_level = onnxruntime.GraphOptimizationLevel.ORT_DISABLE_ALL
sess = onnxruntime.InferenceSession("model.onnx", options, providers=["OpenVINOExecutionProvider"])
```
@@ -289,27 +348,28 @@ The following table lists all the available configuration options for API 2.0 an
| **Key** | **Type** | **Allowable Values** | **Value Type** | **Description** |
| --- | --- | --- | --- | --- |
| device_type | string | CPU, NPU, GPU, GPU.0, GPU.1 based on the available GPUs, NPU, Any valid Hetero combination, Any valid Multi or Auto devices combination | string | Overrides the accelerator hardware type with these values at runtime. If this option is not explicitly set, the default hardware specified during build is used. |
| precision | string | FP32, FP16, ACCURACY based on the device_type chosen | string | Supported precisions for HW {CPU:FP32, GPU:[FP32, FP16, ACCURACY], NPU:FP16}. Default precision for HW for optimized performance {CPU:FP32, GPU:FP16, NPU:FP16}. To execute the model with the default input precision, select the ACCURACY precision type. |
| num_of_threads | string | Any unsigned positive number other than 0 | size_t | Overrides the accelerator default number of threads with this value at runtime. If this option is not explicitly set, the build-time default of 8 is used for inference. |
| num_streams | string | Any unsigned positive number other than 0 | size_t | Overrides the accelerator default number of streams with this value at runtime. If this option is not explicitly set, the build-time default of 1 (optimized for latency) is used for inference. |
| cache_dir | string | Any valid string path on the hardware target | string | Explicitly specify the path to save and load the blobs, enabling the model caching feature. |
| context | string | OpenCL Context | void* | This option is only available when OpenVINO EP is built with OpenCL flags enabled. It takes in the remote context, i.e. the cl_context address, as a void pointer. |
| enable_opencl_throttling | string | True/False | boolean | This option enables OpenCL queue throttling for GPU devices (reduces CPU utilization when using GPU). |
| enable_qdq_optimizer | string | True/False | boolean | This option enables QDQ optimization to improve model performance and accuracy on NPU. |
| load_config | string | Any custom JSON path | string | This option enables loading a custom JSON OV config during runtime, which sets OV parameters. |
Valid Hetero, Multi, or Auto device combinations:
The <DEVICE_TYPE> can be any of the devices from this list: ['CPU', 'GPU', 'NPU']
A minimum of two DEVICE_TYPEs should be specified for a valid HETERO, MULTI, or AUTO device build.
Example:
HETERO:GPU,CPU AUTO:GPU,CPU MULTI:GPU,CPU
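For instance, a minimal sketch of selecting a HETERO combination from the Python API (model path is a placeholder):

```
import onnxruntime as ort

# HETERO:GPU,CPU runs supported ops on GPU and falls back to CPU for the rest
session = ort.InferenceSession(
    "model.onnx",
    providers=[("OpenVINOExecutionProvider", {"device_type": "HETERO:GPU,CPU"})],
)
```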
Deprecated device_type options:
CPU_FP32, GPU_FP32, GPU_FP16, and NPU_FP16 are no longer supported and will be removed in a future release. Kindly upgrade to the latest device_type and precision options.
## Support Coverage
@@ -460,7 +520,7 @@ Atom, Core, and Xeon processors. GPU refers to the Intel Integrated Graphics. In
### Topology Support
The topologies below from the ONNX open model zoo are fully supported on the OpenVINO™ Execution Provider, and many more are supported through sub-graph partitioning.
For NPU, if a model is not supported, we fall back to CPU.
### Image Classification Networks
@@ -540,6 +600,31 @@ For NPU is model is not supported we fallback to CPU.
| twitter-roberta-base-sentiment | Yes | Yes |
| xlm-roberta-base | Yes | Yes |
### Models Supported on NPU
|**MODEL NAME**|**NPU**|
| --- | --- |
| yolov3 | Yes |
| microsoft_resnet-50 | Yes |
| realesrgan-x4 | Yes |
| timm_inception_v4.tf_in1k | Yes |
| squeezenet1.0-qdq | Yes |
| vgg16 | Yes |
| caffenet-qdq | Yes |
| zfnet512 | Yes |
| shufflenet-v2 | Yes |
| zfnet512-qdq | Yes |
| googlenet | Yes |
| googlenet-qdq | Yes |
| caffenet | Yes |
| bvlcalexnet-qdq | Yes |
| vgg16-qdq | Yes |
| mnist | Yes |
| ResNet101-DUC | Yes |
| shufflenet-v2-qdq | Yes |
| bvlcalexnet | Yes |
| squeezenet1.0 | Yes |
**Note:** We have added support for INT8 models quantized with the Neural Network Compression Framework (NNCF). To learn more about NNCF, refer [here](https://github.com/openvinotoolkit/nncf).