docs/guide/INT8_quantization.md (2 additions, 4 deletions)

@@ -5,9 +5,7 @@ Quantization is a very popular deep learning model optimization technique invent
Intel® Extension for TensorFlow\* co-works with [Intel® Neural Compressor](https://github.com/intel/neural-compressor) v2.0 or newer to provide INT8 quantization solutions for both GPU and CPU.
-On GPU, Intel® Extension for Tensorflow\* uses legacy compatible TensorFlow INT8 quantization solution. Refer to [accelerating AlexNet quantization example](../../examples/accelerate_alexnet_by_quantization/README.md).
-
-On CPU, Intel® Extension for Tensorflow\* integrates and adopts new [oneDNN Graph API](https://spec.oneapi.io/onednn-graph/latest/introduction.html) in INT8 quantization for better out of box performance as shown in the following example.
+Intel® Extension for TensorFlow\* integrates and adopts the new [oneDNN Graph API](https://spec.oneapi.io/onednn-graph/latest/introduction.html) in INT8 quantization for better out-of-the-box performance, as shown in the following example.
## Workflow
@@ -19,7 +17,7 @@ The workflow contains 2 parts: graph optimization and graph executor.
## Usage
-oneDNN Graph optimization pass is enabled on INT8 quantization by default on CPU.
+The oneDNN Graph optimization pass is enabled by default for INT8 quantization on both GPU and CPU.
For better performance when executing INT8 models, the TF grappler constant folding optimization must be disabled by setting the following environment variable.
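The exact environment variable is given in the full guide rather than in this excerpt. As a rough, non-authoritative sketch of the flow this change describes (post-training INT8 quantization through Intel® Neural Compressor 2.x, with the quantized graph later executed via the oneDNN Graph pass), something like the snippet below could be used. The variable name `ITEX_TF_CONSTANT_FOLDING`, the frozen-model path, and the dummy calibration data are illustrative assumptions, not values taken from this diff.

```python
import os

# Assumed variable name for disabling TF grappler constant folding;
# confirm the exact setting in docs/guide/INT8_quantization.md.
os.environ.setdefault("ITEX_TF_CONSTANT_FOLDING", "0")

from neural_compressor.config import PostTrainingQuantConfig
from neural_compressor.data import DataLoader, Datasets
from neural_compressor.quantization import fit

# Dummy calibration data matching the input shape of the placeholder model below.
dataset = Datasets("tensorflow")["dummy"](shape=(32, 299, 299, 3))
calib_dataloader = DataLoader(framework="tensorflow", dataset=dataset)

# Post-training static INT8 quantization with Intel Neural Compressor 2.x.
q_model = fit(
    model="./inception_v3_fp32_frozen.pb",  # placeholder path to an FP32 frozen graph
    conf=PostTrainingQuantConfig(),
    calib_dataloader=calib_dataloader,
)
q_model.save("./inception_v3_int8")
```

Running the saved INT8 model under Intel® Extension for TensorFlow\* should then pick up the oneDNN Graph INT8 path on CPU and GPU, per the Usage note above.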
examples/README.md (1 addition, 1 deletion)

@@ -10,7 +10,7 @@ A wide variety of examples are provided to demonstrate the usage of Intel® Exte
|[Speed up Inference of Inception v4 by Advanced Automatic Mixed Precision via Docker Container or Bare Metal](./infer_inception_v4_amp)|Test and compare the performance of inference with FP32 and Advanced Automatic Mixed Precision (AMP) (mix BF16/FP16 and FP32).<br>Shows the acceleration of inference by Advanced AMP on Intel CPU and GPU via Docker Container or Bare Metal.|CPU & GPU|
|[Accelerate AlexNet by Quantization with Intel® Extension for TensorFlow*](./accelerate_alexnet_by_quantization)| An end-to-end example to show a pipeline to build up a CNN model to <br>recognize handwriting number and speed up AI model with quantization <br>by Intel® Neural Compressor and Intel® Extension for TensorFlow* on Intel GPU.|GPU|
|[Accelerate Deep Learning Inference for Model Zoo Workloads on Intel CPU and GPU](./model_zoo_example)|Examples on running Model Zoo workloads on Intel CPU and GPU with the optimizations from Intel® Extension for TensorFlow*, without any code changes.|CPU & GPU|
-|[Quantize Inception V3 by Intel® Extension for TensorFlow* on Intel® Xeon®](./quantize_inception_v3)|An end-to-end example to show how Intel® Extension for TensorFlow* provides quantization feature by cooperating with Intel® Neural Compressor and oneDNN Graph. It will provide better quantization: better performance and accuracy loss is in controlled.|CPU|
+|[Quantize Inception V3 by Intel® Extension for TensorFlow* on Intel® Xeon®](./quantize_inception_v3)|An end-to-end example showing how Intel® Extension for TensorFlow* provides the quantization feature by cooperating with Intel® Neural Compressor and oneDNN Graph, delivering better quantization: better performance with controlled accuracy loss.|CPU & GPU|
|[ResNet50 and Mnist training with Horovod](./train_horovod)|ResNet50 and Mnist distributed training examples on Intel GPU.|GPU|
|[Stable Diffusion Inference for Text2Image on Intel GPU](./stable_diffussion_inference)|Example for running Stable Diffusion Text2Image inference on Intel GPU with the optimizations from Intel® Extension for TensorFlow*.|GPU|
|[Accelerate ResNet50 Training by XPUAutoShard on Intel GPU](./train_resnet50_with_autoshard)|Example on running ResNet50 training on Intel GPU with the XPUAutoShard feature.|GPU|
examples/examples.md (1 addition, 1 deletion)

@@ -10,7 +10,7 @@ A wide variety of examples are provided to demonstrate the usage of Intel® Exte
|[Speed up Inference of Inception v4 by Advanced Automatic Mixed Precision via Docker Container or Bare Metal](./infer_inception_v4_amp/README.html)|Test and compare the performance of inference with FP32 and Advanced Automatic Mixed Precision (AMP) (mix BF16/FP16 and FP32).<br>Shows the acceleration of inference by Advanced AMP on Intel® CPU and GPU via Docker Container or Bare Metal.|CPU & GPU|
|[Accelerate AlexNet by Quantization with Intel® Extension for TensorFlow*](./accelerate_alexnet_by_quantization/README.html)| An end-to-end example to show a pipeline to build up a CNN model to <br>recognize handwriting number and speed up AI model with quantization <br>by Intel® Neural Compressor and Intel® Extension for TensorFlow* on Intel GPU.|GPU|
|[Accelerate Deep Learning Inference for Model Zoo Workloads on Intel CPU and GPU](./model_zoo_example/README.html)|Examples on running Model Zoo workloads on Intel CPU and GPU with the optimizations from Intel® Extension for TensorFlow*, without any code changes.|CPU & GPU|
-|[Quantize Inception V3 by Intel® Extension for TensorFlow* on Intel® Xeon®](./quantize_inception_v3/README.html)|An end-to-end example to show how Intel® Extension for TensorFlow* provides quantization feature by cooperating with Intel® Neural Compressor and oneDNN Graph. It will provide better quantization: better performance and accuracy loss is in controlled.|CPU|
+|[Quantize Inception V3 by Intel® Extension for TensorFlow*](./quantize_inception_v3/README.html)|An end-to-end example showing how Intel® Extension for TensorFlow* provides the quantization feature by cooperating with Intel® Neural Compressor and oneDNN Graph, delivering better quantization: better performance with controlled accuracy loss.|CPU & GPU|
|[Mnist training with Intel® Optimization for Horovod*](./train_horovod/mnist/README.html)|Mnist distributed training example on Intel GPU. |GPU|
|[ResNet50 training with Intel® Optimization for Horovod*](./train_horovod/resnet50/README.html)|ResNet50 distributed training example on Intel GPU. |GPU|
|[Stable Diffusion Inference for Text2Image on Intel GPU](./stable_diffussion_inference/README.html)|Example for running Stable Diffusion Text2Image inference on Intel GPU with the optimizations from Intel® Extension for TensorFlow*. |GPU|
Support: Intel® Xeon® CPU & Intel® Data Center Flex Series GPU.
### CPU
@@ -91,9 +92,17 @@ lscpu | grep amx
```
You are expected to see `amx_bf16` and `amx_int8`, otherwise your processors do not support Intel® Advanced Matrix Extensions.
+
+### GPU
+
+Support: Intel® Data Center Flex Series GPU.
+
+#### Local Server
+
+Install the GPU driver and oneAPI packages by referring to [Intel GPU Software Installation](/docs/install/install_for_gpu.md).
### Intel® DevCloud
-If you have no CPU support Intel® Deep Learning Boost or Intel® Advanced Matrix Extensions, you could register on Intel® DevCloud and try this example on an second generation Intel® Xeon based processors or newer. To learn more about working with Intel® DevCloud, refer to [Intel® DevCloud](https://www.intel.com/content/www/us/en/developer/tools/devcloud/overview.html)
+If your CPU does not support Intel® Deep Learning Boost or Intel® Advanced Matrix Extensions, or your Intel GPU does not support INT8, you can register on Intel® DevCloud and try this example on a second-generation Intel® Xeon® processor or newer. To learn more about working with Intel® DevCloud, refer to [Intel® DevCloud](https://www.intel.com/content/www/us/en/developer/tools/devcloud/overview.html).