Commit bd89231: Modify doc to describe oneDNN Graph INT8 for GPU (#2302)

1 parent: 4196ea8

4 files changed: +17 lines, -10 lines


docs/guide/INT8_quantization.md

Lines changed: 2 additions & 4 deletions

@@ -5,9 +5,7 @@ Quantization is a very popular deep learning model optimization technique invent

 Intel® Extension for TensorFlow\* co-works with [Intel® Neural Compressor](https://github.com/intel/neural-compressor) v2.0 or newer to provide INT8 quantization solutions for both GPU and CPU.

-On GPU, Intel® Extension for Tensorflow\* uses the legacy-compatible TensorFlow INT8 quantization solution. Refer to the [accelerating AlexNet quantization example](../../examples/accelerate_alexnet_by_quantization/README.md).
-
-On CPU, Intel® Extension for Tensorflow\* integrates and adopts the new [oneDNN Graph API](https://spec.oneapi.io/onednn-graph/latest/introduction.html) in INT8 quantization for better out-of-box performance, as shown in the following example.
+Intel® Extension for Tensorflow\* integrates and adopts the new [oneDNN Graph API](https://spec.oneapi.io/onednn-graph/latest/introduction.html) in INT8 quantization for better out-of-box performance, as shown in the following example.

 ## Workflow
 ![](images/INT8_flow.png)
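
For context on the flow this file documents: below is a minimal, hedged sketch of the Intel® Neural Compressor 2.x post-training quantization step that produces the INT8 graph ITEX then executes via oneDNN Graph. The SavedModel path, input shape, and dummy calibration data are hypothetical placeholders, not part of this commit.

```python
# Hedged sketch of INC 2.x post-training static INT8 quantization for a
# TensorFlow model; the SavedModel path and dummy dataset are placeholders.
from neural_compressor import quantization
from neural_compressor.config import PostTrainingQuantConfig
from neural_compressor.data import Datasets, DataLoader

# Dummy calibration data shaped like the model input (assumed 224x224x3 here).
dataset = Datasets("tensorflow")["dummy"](shape=(100, 224, 224, 3))
calib_loader = DataLoader(framework="tensorflow", dataset=dataset)

q_model = quantization.fit(
    model="./fp32_saved_model",      # hypothetical FP32 SavedModel directory
    conf=PostTrainingQuantConfig(),  # defaults to static INT8 quantization
    calib_dataloader=calib_loader,
)
q_model.save("./int8_saved_model")   # INT8 model then run by ITEX + oneDNN Graph
```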
@@ -19,7 +17,7 @@ The workflow contains 2 parts: graph optimization and graph executor.

 ## Usage
-oneDNN Graph optimization pass is enabled on INT8 quantization by default on CPU.
+oneDNN Graph optimization pass is enabled on INT8 quantization by default on GPU and CPU.

 For better performance in INT8 model execution, TF grappler constant folding optimization must be disabled by the following environment variable setting.
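
The diff cuts off before naming the variable itself, so it is not reproduced here. For reference, the quantize_inception_v3 README (the last file in this diff) reaches the same goal at the session level; a minimal sketch of that approach, built only from the TF1-compat API visible in its code tail:

```python
# Minimal sketch: disable TF grappler constant folding so the INT8 graph is
# left intact for oneDNN Graph. Mirrors the session-config tail shown in
# examples/quantize_inception_v3/README.md later in this commit.
import tensorflow as tf
from tensorflow.core.protobuf import rewriter_config_pb2

infer_config = tf.compat.v1.ConfigProto()
infer_config.graph_options.rewrite_options.constant_folding = (
    rewriter_config_pb2.RewriterConfig.OFF
)
session = tf.compat.v1.Session(config=infer_config)
tf.compat.v1.keras.backend.set_session(session)
```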

examples/README.md

Lines changed: 1 addition & 1 deletion

@@ -10,7 +10,7 @@ A wide variety of examples are provided to demonstrate the usage of Intel® Exte
 |[Speed up Inference of Inception v4 by Advanced Automatic Mixed Precision via Docker Container or Bare Metal](./infer_inception_v4_amp)|Test and compare the performance of inference with FP32 and Advanced Automatic Mixed Precision (AMP) (mix BF16/FP16 and FP32).<br>Shows the acceleration of inference by Advanced AMP on Intel CPU and GPU via Docker Container or Bare Metal.|CPU & GPU|
 |[Accelerate AlexNet by Quantization with Intel® Extension for TensorFlow*](./accelerate_alexnet_by_quantization)| An end-to-end example to show a pipeline to build up a CNN model to <br>recognize handwritten digits and speed up the AI model with quantization <br>by Intel® Neural Compressor and Intel® Extension for TensorFlow* on Intel GPU.|GPU|
 |[Accelerate Deep Learning Inference for Model Zoo Workloads on Intel CPU and GPU](./model_zoo_example)|Examples on running Model Zoo workloads on Intel CPU and GPU with the optimizations from Intel® Extension for TensorFlow*, without any code changes.|CPU & GPU|
-|[Quantize Inception V3 by Intel® Extension for TensorFlow* on Intel® Xeon®](./quantize_inception_v3)|An end-to-end example to show how Intel® Extension for TensorFlow* provides the quantization feature by cooperating with Intel® Neural Compressor and oneDNN Graph, yielding better quantization: better performance with controlled accuracy loss.|CPU|
+|[Quantize Inception V3 by Intel® Extension for TensorFlow* on Intel® Xeon®](./quantize_inception_v3)|An end-to-end example to show how Intel® Extension for TensorFlow* provides the quantization feature by cooperating with Intel® Neural Compressor and oneDNN Graph, yielding better quantization: better performance with controlled accuracy loss.|CPU & GPU|
 |[ResNet50 and Mnist training with Horovod](./train_horovod)|ResNet50 and Mnist distributed training examples on Intel GPU.|GPU|
 |[Stable Diffusion Inference for Text2Image on Intel GPU](./stable_diffussion_inference)|Example for running Stable Diffusion Text2Image inference on Intel GPU with the optimizations from Intel® Extension for TensorFlow*.|GPU|
 |[Accelerate ResNet50 Training by XPUAutoShard on Intel GPU](./train_resnet50_with_autoshard)|Example on running ResNet50 training on Intel GPU with the XPUAutoShard feature.|GPU|

examples/examples.md

Lines changed: 1 addition & 1 deletion

@@ -10,7 +10,7 @@ A wide variety of examples are provided to demonstrate the usage of Intel® Exte
 |[Speed up Inference of Inception v4 by Advanced Automatic Mixed Precision via Docker Container or Bare Metal](./infer_inception_v4_amp/README.html)|Test and compare the performance of inference with FP32 and Advanced Automatic Mixed Precision (AMP) (mix BF16/FP16 and FP32).<br>Shows the acceleration of inference by Advanced AMP on Intel® CPU and GPU via Docker Container or Bare Metal.|CPU & GPU|
 |[Accelerate AlexNet by Quantization with Intel® Extension for TensorFlow*](./accelerate_alexnet_by_quantization/README.html)| An end-to-end example to show a pipeline to build up a CNN model to <br>recognize handwritten digits and speed up the AI model with quantization <br>by Intel® Neural Compressor and Intel® Extension for TensorFlow* on Intel GPU.|GPU|
 |[Accelerate Deep Learning Inference for Model Zoo Workloads on Intel CPU and GPU](./model_zoo_example/README.html)|Examples on running Model Zoo workloads on Intel CPU and GPU with the optimizations from Intel® Extension for TensorFlow*, without any code changes.|CPU & GPU|
-|[Quantize Inception V3 by Intel® Extension for TensorFlow* on Intel® Xeon®](./quantize_inception_v3/README.html)|An end-to-end example to show how Intel® Extension for TensorFlow* provides the quantization feature by cooperating with Intel® Neural Compressor and oneDNN Graph, yielding better quantization: better performance with controlled accuracy loss.|CPU|
+|[Quantize Inception V3 by Intel® Extension for TensorFlow*](./quantize_inception_v3/README.html)|An end-to-end example to show how Intel® Extension for TensorFlow* provides the quantization feature by cooperating with Intel® Neural Compressor and oneDNN Graph, yielding better quantization: better performance with controlled accuracy loss.|CPU & GPU|
 |[Mnist training with Intel® Optimization for Horovod*](./train_horovod/mnist/README.html)|Mnist distributed training example on Intel GPU.|GPU|
 |[ResNet50 training with Intel® Optimization for Horovod*](./train_horovod/resnet50/README.html)|ResNet50 distributed training example on Intel GPU.|GPU|
 |[Stable Diffusion Inference for Text2Image on Intel GPU](./stable_diffussion_inference/README.html)|Example for running Stable Diffusion Text2Image inference on Intel GPU with the optimizations from Intel® Extension for TensorFlow*.|GPU|

examples/quantize_inception_v3/README.md

Lines changed: 13 additions & 4 deletions

@@ -17,18 +17,18 @@ The example shows an end-to-end pipeline:

 2. Execute the calibration by Intel® Neural Compressor.

-3. Quantize and accelerate the inference by Intel® Extension for Tensorflow* for CPU.
+3. Quantize and accelerate the inference by Intel® Extension for Tensorflow* for GPU and CPU.

 ## Configuration

 ### Intel® Extension for Tensorflow* Version

-Install Intel® Extension for Tensorflow* > 1.1.0 for this feature.
+Install Intel® Extension for Tensorflow* > 2.13.0 for this feature.

 ### Enable oneDNN Graph

-By default, oneDNN Graph is enabled in Intel® Extension for Tensorflow* on CPU for INT8 models.
+By default, oneDNN Graph is enabled in Intel® Extension for Tensorflow* for INT8 models.

 Enable it explicitly by:
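
The hunk ends before the README's actual enabling command, so it is not reproduced here. As an illustration only, a Python sketch of opting in via the environment and checking the version requirement above; the `ITEX_ONEDNN_GRAPH` variable name and the `__version__` attribute follow ITEX documentation and are assumptions relative to this diff:

```python
# Illustrative sketch, not the README's elided command: opt in to oneDNN Graph
# and confirm the installed ITEX version. ITEX_ONEDNN_GRAPH must be set before
# TensorFlow/ITEX initialize to take effect (assumption, per ITEX docs).
import os
os.environ["ITEX_ONEDNN_GRAPH"] = "1"

import intel_extension_for_tensorflow as itex
print(itex.__version__)  # expect a version newer than 2.13.0 (see above)
```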

@@ -65,6 +65,7 @@ tf.compat.v1.keras.backend.set_session(session)
 ```

 ## Hardware Environment
+Support: Intel® Xeon® CPU & Intel® Data Center Flex Series GPU.

 ### CPU
@@ -91,9 +92,17 @@ lscpu | grep amx
 ```
 You are expected to see `amx_bf16` and `amx_int8`; otherwise, your processor does not support Intel® Advanced Matrix Extensions.

+### GPU
+
+Support: Intel® Data Center Flex Series GPU.
+
+#### Local Server
+
+Install the GPU driver and oneAPI packages by referring to [Intel GPU Software Installation](/docs/install/install_for_gpu.md).
+
 ### Intel® DevCloud

-If your CPU supports neither Intel® Deep Learning Boost nor Intel® Advanced Matrix Extensions, you can register on Intel® DevCloud and try this example on second-generation Intel® Xeon® processors or newer. To learn more about working with Intel® DevCloud, refer to [Intel® DevCloud](https://www.intel.com/content/www/us/en/developer/tools/devcloud/overview.html).
+If your CPU supports neither Intel® Deep Learning Boost nor Intel® Advanced Matrix Extensions, or you have no Intel GPU with INT8 support, you can register on Intel® DevCloud and try this example on second-generation Intel® Xeon® processors or newer. To learn more about working with Intel® DevCloud, refer to [Intel® DevCloud](https://www.intel.com/content/www/us/en/developer/tools/devcloud/overview.html).

 ## Running Environment
