## examples/README.md

A wide variety of examples are provided to demonstrate the usage of Intel® Extension for TensorFlow*.

|Example|Description|Hardware|
|-|-|-|
|[Quick Example](quick_example.md)|Quick example to verify Intel® Extension for TensorFlow* and running environment.|CPU & GPU|
|[ResNet50 Inference](./infer_resnet50)|ResNet50 inference on Intel CPU or GPU without code changes.|CPU & GPU|
|[BERT Training for Classifying Text](./train_bert)|BERT training with Intel® Extension for TensorFlow* on Intel CPU or GPU.<br>Use the TensorFlow official example without code change.|CPU & GPU|
|[Speed up Inference of Inception v4 by Advanced Automatic Mixed Precision via Docker Container or Bare Metal](./infer_inception_v4_amp)|Test and compare the performance of inference with FP32 and Advanced Automatic Mixed Precision (AMP) (mix BF16/FP16 and FP32).<br>Shows the acceleration of inference by Advanced AMP on Intel CPU and GPU via Docker Container or Bare Metal.|CPU & GPU|
|[Accelerate AlexNet by Quantization with Intel® Extension for TensorFlow*](./accelerate_alexnet_by_quantization)| An end-to-end example to show a pipeline to build up a CNN model to <br>recognize handwriting number and speed up AI model with quantization <br>by Intel® Neural Compressor and Intel® Extension for TensorFlow* on Intel GPU.|GPU|
|[Accelerate Deep Learning Inference for Model Zoo Workloads on Intel CPU and GPU](./model_zoo_example)|Examples on running Model Zoo workloads on Intel CPU and GPU with the optimizations from Intel® Extension for TensorFlow*, without any code changes.|CPU & GPU|
|[Quantize Inception V3 by Intel® Extension for TensorFlow* on Intel® Xeon®](./quantize_inception_v3)|An end-to-end example showing how Intel® Extension for TensorFlow* provides quantization by cooperating with Intel® Neural Compressor and oneDNN Graph, delivering better performance while keeping accuracy loss under control.|CPU|
|[ResNet50 and MNIST Training with Horovod](./train_horovod)|ResNet50 and MNIST distributed training examples on Intel GPU.|GPU|
|[Stable Diffusion Inference for Text2Image on Intel GPU](./stable_diffussion_inference)|Example for running Stable Diffusion Text2Image inference on Intel GPU with the optimizations from Intel® Extension for TensorFlow*.|GPU|
|[Accelerate ResNet50 Training by XPUAutoShard on Intel GPU](./train_resnet50_with_autoshard)|Example on running ResNet50 training on Intel GPU with the XPUAutoShard feature.|GPU|

## examples/examples.md

A wide variety of examples are provided to demonstrate the usage of Intel® Extension for TensorFlow*.

|Example|Description|Hardware|
|-|-|-|
|[Quick Example](quick_example.html)|Quick example to verify Intel® Extension for TensorFlow* and running environment.|CPU & GPU|
|[ResNet50 Inference](./infer_resnet50/README.html)|ResNet50 inference on Intel CPU or GPU without code changes.|CPU & GPU|
|[BERT Training for Classifying Text](./train_bert/README.html)|BERT training with Intel® Extension for TensorFlow* on Intel CPU or GPU.<br>Use the TensorFlow official example without code change.|CPU & GPU|
|[Speed up Inference of Inception v4 by Advanced Automatic Mixed Precision via Docker Container or Bare Metal](./infer_inception_v4_amp/README.html)|Test and compare the performance of inference with FP32 and Advanced Automatic Mixed Precision (AMP) (mix BF16/FP16 and FP32).<br>Shows the acceleration of inference by Advanced AMP on Intel® CPU and GPU via Docker Container or Bare Metal.|CPU & GPU|
|[Accelerate AlexNet by Quantization with Intel® Extension for TensorFlow*](./accelerate_alexnet_by_quantization/README.html)| An end-to-end example to show a pipeline to build up a CNN model to <br>recognize handwriting number and speed up AI model with quantization <br>by Intel® Neural Compressor and Intel® Extension for TensorFlow* on Intel GPU.|GPU|
|[Accelerate Deep Learning Inference for Model Zoo Workloads on Intel CPU and GPU](./model_zoo_example/README.html)|Examples on running Model Zoo workloads on Intel CPU and GPU with the optimizations from Intel® Extension for TensorFlow*, without any code changes.|CPU & GPU|
|[Quantize Inception V3 by Intel® Extension for TensorFlow* on Intel® Xeon®](./quantize_inception_v3/README.html)|An end-to-end example showing how Intel® Extension for TensorFlow* provides quantization by cooperating with Intel® Neural Compressor and oneDNN Graph, delivering better performance while keeping accuracy loss under control.|CPU|
|[MNIST Training with Intel® Optimization for Horovod*](./train_horovod/mnist/README.html)|MNIST distributed training example on Intel GPU.|GPU|
|[ResNet50 training with Intel® Optimization for Horovod*](./train_horovod/resnet50/README.html)|ResNet50 distributed training example on Intel GPU. |GPU|
|[Stable Diffusion Inference for Text2Image on Intel GPU](./stable_diffussion_inference/README.html)|Example for running Stable Diffusion Text2Image inference on Intel GPU with the optimizations from Intel® Extension for TensorFlow*. |GPU|
|[Accelerate ResNet50 Training by XPUAutoShard on Intel GPU](./train_resnet50_with_autoshard/README.html)|Example on running ResNet50 training on Intel GPU with the XPUAutoShard feature. |GPU|

# Speed up Inference of Inception v4 by Advanced Automatic Mixed Precision on Intel CPU and GPU via Docker Container or Bare Metal

## Introduction
Advanced Automatic Mixed Precision (Advanced AMP) uses lower-precision data types (such as FP16 or BF16) so that a model runs with mixed 16-bit and 32-bit floating-point types during training and inference. This makes it run faster with less memory consumption on CPU and GPU.
For details, refer to [Advanced Automatic Mixed Precision](../../docs/guide/advanced_auto_mixed_precision.md).
This example shows the acceleration of inference by Advanced AMP on Intel CPU or GPU via Docker container or bare metal.
In this example, we will test and compare the performance of FP32 and Advanced AMP (mix BF16/FP16 and FP32) on Intel CPU or GPU.

## Steps
1. Download the Inception v4 model from the internet.
2. Test the performance of the original model (FP32) on Intel CPU or GPU.
3. Test the performance of the model with Advanced AMP (BF16 or FP16) on Intel CPU or GPU.
4. Compare the latency and throughput of the two cases above and print the result.
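The latency/throughput comparison above can be sketched with a plain timing loop. This is an illustrative sketch only: the `run_once` callables below are sleep-based stand-ins for the real FP32 and AMP model invocations, not part of the example's scripts.

```python
import time
from statistics import median

def benchmark(run_once, warmup=3, iters=20, batch_size=1):
    """Time one inference callable; return median latency (ms) and throughput."""
    for _ in range(warmup):          # discard warm-up runs
        run_once()
    times = []
    for _ in range(iters):
        start = time.perf_counter()
        run_once()
        times.append(time.perf_counter() - start)
    latency = median(times)
    return {"latency_ms": latency * 1e3, "throughput": batch_size / latency}

# Stand-ins: sleeps simulate an FP32 model and a faster AMP model.
fp32 = benchmark(lambda: time.sleep(0.002))
amp = benchmark(lambda: time.sleep(0.001))
print(f"speedup: {fp32['latency_ms'] / amp['latency_ms']:.2f}x")
```

In the real example, `run_once` would invoke the Inception v4 model on a prepared input batch, once with AMP disabled and once with it enabled.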

Advanced AMP supports two 16-bit floating-point types: BF16 and FP16.
|Data Type|GPU|CPU|
|-|-|-|
|BF16|Intel® Data Center GPU Max Series<br>Intel® Data Center GPU Flex Series 170<br>Intel® Arc™ A-Series<br>Needs to be checked for your Intel GPU|Intel® 4th Generation Intel® Xeon® Scalable Processor (Sapphire Rapids)|
|FP16|Intel® Data Center GPU Max Series<br>Intel® Data Center GPU Flex Series 170<br>Intel® Arc™ A-Series<br>Supported by most Intel GPUs||
This example supports both types. Set the parameter according to your requirements and hardware support.
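As a minimal sketch of that choice, the helper below (a hypothetical function, not part of the example's scripts) encodes the support table above:

```python
def pick_amp_dtype(device: str) -> str:
    """Return the 16-bit type to mix with FP32 on a given device,
    following the hardware support table above."""
    if device == "gpu":
        # FP16 is supported by most Intel GPUs; BF16 support should be
        # checked for your specific GPU model.
        return "FLOAT16"
    if device == "cpu":
        # BF16 is supported on 4th Generation Intel® Xeon® Scalable
        # processors (Sapphire Rapids).
        return "BFLOAT16"
    raise ValueError(f"unknown device: {device!r}")

print(pick_amp_dtype("cpu"))  # BFLOAT16
```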
### Prepare for GPU (Skip this Step for CPU)

* If Running via Docker Container,

  Refer to [Install GPU Drivers](../../docs/install/install_for_gpu.md#install-gpu-drivers).

* If Running on Bare Metal,

  Refer to [Prepare](../common_guide_running.md#prepare) to install both the Intel GPU driver and the Intel® oneAPI Base Toolkit.

**Note: if the data type (BF16 or FP16) is not supported by the hardware, the workload will be executed by converting to FP32, which makes performance worse than the plain FP32 case.**
## Advanced: Enable Advanced AMP Method
There are two methods to enable Advanced AMP with Intel® Extension for TensorFlow*: the Python API and environment variable configuration.
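As a sketch of the environment-variable method: the variable names below are taken from the Advanced AMP guide linked above, but treat them as assumptions to verify against your release. They must be set before TensorFlow is imported so the extension can pick them up.

```python
import os

# Environment-variable configuration (assumed names; verify against the
# Advanced AMP guide for your release). Set before importing TensorFlow.
os.environ["ITEX_AUTO_MIXED_PRECISION"] = "1"
os.environ["ITEX_AUTO_MIXED_PRECISION_DATA_TYPE"] = "BFLOAT16"  # or "FLOAT16" on GPU

# The Python API method would look roughly like this (sketch only; it
# requires intel_extension_for_tensorflow to be installed):
#   import intel_extension_for_tensorflow as itex
#   graph_opts = itex.GraphOptions(auto_mixed_precision=itex.ON)
#   itex.set_config(itex.ConfigProto(graph_options=graph_opts))
```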