Commit bd89231: Modify doc to describe oneDNN Graph INT8 for GPU (#2302)

1 parent: 4196ea8

4 files changed: +17 lines, -10 lines


docs/guide/INT8_quantization.md

Lines changed: 2 additions & 4 deletions

@@ -5,9 +5,7 @@ Quantization is a very popular deep learning model optimization technique invent

 Intel® Extension for TensorFlow\* co-works with [Intel® Neural Compressor](https://github.com/intel/neural-compressor) v2.0 or newer to provide INT8 quantization solutions for both GPU and CPU.

-On GPU, Intel® Extension for Tensorflow\* uses the legacy-compatible TensorFlow INT8 quantization solution. Refer to the [accelerating AlexNet quantization example](../../examples/accelerate_alexnet_by_quantization/README.md).
-
-On CPU, Intel® Extension for Tensorflow\* integrates and adopts the new [oneDNN Graph API](https://spec.oneapi.io/onednn-graph/latest/introduction.html) in INT8 quantization for better out-of-box performance, as shown in the following example.
+Intel® Extension for Tensorflow\* integrates and adopts the new [oneDNN Graph API](https://spec.oneapi.io/onednn-graph/latest/introduction.html) in INT8 quantization for better out-of-box performance, as shown in the following example.

 ## Workflow
 ![](images/INT8_flow.png)
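
For context on the flow this file documents: below is a minimal, hedged sketch of the Intel® Neural Compressor 2.x post-training quantization step that produces the INT8 graph ITEX then executes via oneDNN Graph. The SavedModel path, input shape, and dummy calibration data are hypothetical placeholders, not part of this commit.

```python
# Hedged sketch of INC 2.x post-training static INT8 quantization for a
# TensorFlow model; the SavedModel path and dummy dataset are placeholders.
from neural_compressor import quantization
from neural_compressor.config import PostTrainingQuantConfig
from neural_compressor.data import Datasets, DataLoader

# Dummy calibration data shaped like the model input (assumed 224x224x3 here).
dataset = Datasets("tensorflow")["dummy"](shape=(100, 224, 224, 3))
calib_loader = DataLoader(framework="tensorflow", dataset=dataset)

q_model = quantization.fit(
    model="./fp32_saved_model",      # hypothetical FP32 SavedModel directory
    conf=PostTrainingQuantConfig(),  # defaults to static INT8 quantization
    calib_dataloader=calib_loader,
)
q_model.save("./int8_saved_model")   # INT8 model then run by ITEX + oneDNN Graph
```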
@@ -19,7 +17,7 @@ The workflow contains 2 parts: graph optimization and graph executor.

 ## Usage
-oneDNN Graph optimization pass is enabled on INT8 quantization by default on CPU.
+oneDNN Graph optimization pass is enabled on INT8 quantization by default on GPU and CPU.

 For better performance in INT8 model execution, TF grappler constant folding optimization must be disabled by the following environment variable setting.
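
The diff cuts off before naming the variable itself, so it is not reproduced here. For reference, the quantize_inception_v3 README (the last file in this diff) reaches the same goal at the session level; a minimal sketch of that approach, built only from the TF1-compat API visible in its code tail:

```python
# Minimal sketch: disable TF grappler constant folding so the INT8 graph is
# left intact for oneDNN Graph. Mirrors the session-config tail shown in
# examples/quantize_inception_v3/README.md later in this commit.
import tensorflow as tf
from tensorflow.core.protobuf import rewriter_config_pb2

infer_config = tf.compat.v1.ConfigProto()
infer_config.graph_options.rewrite_options.constant_folding = (
    rewriter_config_pb2.RewriterConfig.OFF
)
session = tf.compat.v1.Session(config=infer_config)
tf.compat.v1.keras.backend.set_session(session)
```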

examples/README.md

Lines changed: 1 addition & 1 deletion

@@ -10,7 +10,7 @@ A wide variety of examples are provided to demonstrate the usage of Intel® Exte
 |[Speed up Inference of Inception v4 by Advanced Automatic Mixed Precision via Docker Container or Bare Metal](./infer_inception_v4_amp)|Test and compare the performance of inference with FP32 and Advanced Automatic Mixed Precision (AMP) (mix BF16/FP16 and FP32).<br>Shows the acceleration of inference by Advanced AMP on Intel CPU and GPU via Docker Container or Bare Metal.|CPU & GPU|
 |[Accelerate AlexNet by Quantization with Intel® Extension for TensorFlow*](./accelerate_alexnet_by_quantization)| An end-to-end example to show a pipeline to build up a CNN model to <br>recognize handwritten digits and speed up the AI model with quantization <br>by Intel® Neural Compressor and Intel® Extension for TensorFlow* on Intel GPU.|GPU|
 |[Accelerate Deep Learning Inference for Model Zoo Workloads on Intel CPU and GPU](./model_zoo_example)|Examples on running Model Zoo workloads on Intel CPU and GPU with the optimizations from Intel® Extension for TensorFlow*, without any code changes.|CPU & GPU|
-|[Quantize Inception V3 by Intel® Extension for TensorFlow* on Intel® Xeon®](./quantize_inception_v3)|An end-to-end example to show how Intel® Extension for TensorFlow* provides the quantization feature by cooperating with Intel® Neural Compressor and oneDNN Graph, yielding better quantization: better performance with controlled accuracy loss.|CPU|
+|[Quantize Inception V3 by Intel® Extension for TensorFlow* on Intel® Xeon®](./quantize_inception_v3)|An end-to-end example to show how Intel® Extension for TensorFlow* provides the quantization feature by cooperating with Intel® Neural Compressor and oneDNN Graph, yielding better quantization: better performance with controlled accuracy loss.|CPU & GPU|
 |[ResNet50 and Mnist training with Horovod](./train_horovod)|ResNet50 and Mnist distributed training examples on Intel GPU.|GPU|
 |[Stable Diffusion Inference for Text2Image on Intel GPU](./stable_diffussion_inference)|Example for running Stable Diffusion Text2Image inference on Intel GPU with the optimizations from Intel® Extension for TensorFlow*.|GPU|
 |[Accelerate ResNet50 Training by XPUAutoShard on Intel GPU](./train_resnet50_with_autoshard)|Example on running ResNet50 training on Intel GPU with the XPUAutoShard feature.|GPU|

examples/examples.md

Lines changed: 1 addition & 1 deletion

@@ -10,7 +10,7 @@ A wide variety of examples are provided to demonstrate the usage of Intel® Exte
 |[Speed up Inference of Inception v4 by Advanced Automatic Mixed Precision via Docker Container or Bare Metal](./infer_inception_v4_amp/README.html)|Test and compare the performance of inference with FP32 and Advanced Automatic Mixed Precision (AMP) (mix BF16/FP16 and FP32).<br>Shows the acceleration of inference by Advanced AMP on Intel® CPU and GPU via Docker Container or Bare Metal.|CPU & GPU|
 |[Accelerate AlexNet by Quantization with Intel® Extension for TensorFlow*](./accelerate_alexnet_by_quantization/README.html)| An end-to-end example to show a pipeline to build up a CNN model to <br>recognize handwritten digits and speed up the AI model with quantization <br>by Intel® Neural Compressor and Intel® Extension for TensorFlow* on Intel GPU.|GPU|
 |[Accelerate Deep Learning Inference for Model Zoo Workloads on Intel CPU and GPU](./model_zoo_example/README.html)|Examples on running Model Zoo workloads on Intel CPU and GPU with the optimizations from Intel® Extension for TensorFlow*, without any code changes.|CPU & GPU|
-|[Quantize Inception V3 by Intel® Extension for TensorFlow* on Intel® Xeon®](./quantize_inception_v3/README.html)|An end-to-end example to show how Intel® Extension for TensorFlow* provides the quantization feature by cooperating with Intel® Neural Compressor and oneDNN Graph, yielding better quantization: better performance with controlled accuracy loss.|CPU|
+|[Quantize Inception V3 by Intel® Extension for TensorFlow*](./quantize_inception_v3/README.html)|An end-to-end example to show how Intel® Extension for TensorFlow* provides the quantization feature by cooperating with Intel® Neural Compressor and oneDNN Graph, yielding better quantization: better performance with controlled accuracy loss.|CPU & GPU|
 |[Mnist training with Intel® Optimization for Horovod*](./train_horovod/mnist/README.html)|Mnist distributed training example on Intel GPU.|GPU|
 |[ResNet50 training with Intel® Optimization for Horovod*](./train_horovod/resnet50/README.html)|ResNet50 distributed training example on Intel GPU.|GPU|
 |[Stable Diffusion Inference for Text2Image on Intel GPU](./stable_diffussion_inference/README.html)|Example for running Stable Diffusion Text2Image inference on Intel GPU with the optimizations from Intel® Extension for TensorFlow*.|GPU|

examples/quantize_inception_v3/README.md

Lines changed: 13 additions & 4 deletions

@@ -17,18 +17,18 @@ The example shows an end-to-end pipeline:

 2. Execute the calibration by Intel® Neural Compressor.

-3. Quantize and accelerate the inference by Intel® Extension for Tensorflow* for CPU.
+3. Quantize and accelerate the inference by Intel® Extension for Tensorflow* for GPU and CPU.

 ## Configuration

 ### Intel® Extension for Tensorflow* Version

-Install Intel® Extension for Tensorflow* > 1.1.0 for this feature.
+Install Intel® Extension for Tensorflow* > 2.13.0 for this feature.

 ### Enable oneDNN Graph

-By default, oneDNN Graph is enabled in Intel® Extension for Tensorflow* on CPU for INT8 models.
+By default, oneDNN Graph is enabled in Intel® Extension for Tensorflow* for INT8 models.

 Enable it explicitly by:
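
The hunk ends before the README's actual enabling command, so it is not reproduced here. As an illustration only, a Python sketch of opting in via the environment and checking the version requirement above; the `ITEX_ONEDNN_GRAPH` variable name and the `__version__` attribute follow ITEX documentation and are assumptions relative to this diff:

```python
# Illustrative sketch, not the README's elided command: opt in to oneDNN Graph
# and confirm the installed ITEX version. ITEX_ONEDNN_GRAPH must be set before
# TensorFlow/ITEX initialize to take effect (assumption, per ITEX docs).
import os
os.environ["ITEX_ONEDNN_GRAPH"] = "1"

import intel_extension_for_tensorflow as itex
print(itex.__version__)  # expect a version newer than 2.13.0 (see above)
```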

@@ -65,6 +65,7 @@ tf.compat.v1.keras.backend.set_session(session)
 ```

 ## Hardware Environment
+Support: Intel® Xeon® CPU & Intel® Data Center Flex Series GPU.

 ### CPU
@@ -91,9 +92,17 @@ lscpu | grep amx
 ```
 You are expected to see `amx_bf16` and `amx_int8`; otherwise, your processor does not support Intel® Advanced Matrix Extensions.

+### GPU
+
+Support: Intel® Data Center Flex Series GPU.
+
+#### Local Server
+
+Install the GPU driver and oneAPI packages by referring to [Intel GPU Software Installation](/docs/install/install_for_gpu.md).
+
 ### Intel® DevCloud

-If your CPU supports neither Intel® Deep Learning Boost nor Intel® Advanced Matrix Extensions, you can register on Intel® DevCloud and try this example on second-generation Intel® Xeon® processors or newer. To learn more about working with Intel® DevCloud, refer to [Intel® DevCloud](https://www.intel.com/content/www/us/en/developer/tools/devcloud/overview.html).
+If your CPU supports neither Intel® Deep Learning Boost nor Intel® Advanced Matrix Extensions, or you have no Intel GPU with INT8 support, you can register on Intel® DevCloud and try this example on second-generation Intel® Xeon® processors or newer. To learn more about working with Intel® DevCloud, refer to [Intel® DevCloud](https://www.intel.com/content/www/us/en/developer/tools/devcloud/overview.html).

 ## Running Environment
