
Commit 47dd596

wangkl2 and xiguiw authored
Modify example add running with docker example (#2301)
Co-authored-by: Wang, Xigui <[email protected]>
1 parent 8be9bf4 commit 47dd596

File tree: 7 files changed, +125/-77 lines changed


examples/README.md

Lines changed: 2 additions & 2 deletions
@@ -7,10 +7,10 @@ A wide variety of examples are provided to demonstrate the usage of Intel® Exte
 |[Quick Example](quick_example.md)|Quick example to verify Intel® Extension for TensorFlow* and running environment.|CPU & GPU|
 |[ResNet50 Inference](./infer_resnet50)|ResNet50 inference on Intel CPU or GPU without code changes.|CPU & GPU|
 |[BERT Training for Classifying Text](./train_bert)|BERT training with Intel® Extension for TensorFlow* on Intel CPU or GPU.<br>Use the TensorFlow official example without code change.|CPU & GPU|
-|[Speed up Inference of Inception v4 by Advanced Automatic Mixed Precision](./infer_inception_v4_amp)|Test and compare the performance of inference with FP32 and Advanced Automatic Mixed Precision (AMP) (mix BF16/FP16 and FP32).<br>Shows the acceleration of inference by Advanced AMP on Intel CPU and GPU.|CPU & GPU|
+|[Speed up Inference of Inception v4 by Advanced Automatic Mixed Precision via Docker Container or Bare Metal](./infer_inception_v4_amp)|Test and compare the performance of inference with FP32 and Advanced Automatic Mixed Precision (AMP) (mix BF16/FP16 and FP32).<br>Shows the acceleration of inference by Advanced AMP on Intel CPU and GPU via Docker Container or Bare Metal.|CPU & GPU|
 |[Accelerate AlexNet by Quantization with Intel® Extension for TensorFlow*](./accelerate_alexnet_by_quantization)| An end-to-end example to show a pipeline to build up a CNN model to <br>recognize handwriting number and speed up AI model with quantization <br>by Intel® Neural Compressor and Intel® Extension for TensorFlow* on Intel GPU.|GPU|
 |[Accelerate Deep Learning Inference for Model Zoo Workloads on Intel CPU and GPU](./model_zoo_example)|Examples on running Model Zoo workloads on Intel CPU and GPU with the optimizations from Intel® Extension for TensorFlow*, without any code changes.|CPU & GPU|
 |[Quantize Inception V3 by Intel® Extension for TensorFlow* on Intel® Xeon®](./quantize_inception_v3)|An end-to-end example to show how Intel® Extension for TensorFlow* provides quantization feature by cooperating with Intel® Neural Compressor and oneDNN Graph. It will provide better quantization: better performance and accuracy loss is in controlled.|CPU|
 |[ResNet50 and Mnist training with Horovod](./train_horovod)|ResNet50 and Mnist distributed training examples on Intel GPU.|GPU|
 |[Stable Diffusion Inference for Text2Image on Intel GPU](./stable_diffussion_inference)|Example for running Stable Diffusion Text2Image inference on Intel GPU with the optimizations from Intel® Extension for TensorFlow*.|GPU|
-|[Accelerate ResNet50 Training by XPUAutoShard on Intel GPU](./train_resnet50_with_autoshard)|Example on running ResNet50 training on Intel GPU with the XPUAutoShard feature.|GPU|
+|[Accelerate ResNet50 Training by XPUAutoShard on Intel GPU](./train_resnet50_with_autoshard)|Example on running ResNet50 training on Intel GPU with the XPUAutoShard feature.|GPU|

examples/examples.md

Lines changed: 2 additions & 2 deletions
@@ -7,11 +7,11 @@ A wide variety of examples are provided to demonstrate the usage of Intel® Exte
 |[Quick Example](quick_example.html)|Quick example to verify Intel® Extension for TensorFlow* and running environment.|CPU & GPU|
 |[ResNet50 Inference](./infer_resnet50/README.html)|ResNet50 inference on Intel CPU or GPU without code changes.|CPU & GPU|
 |[BERT Training for Classifying Text](./train_bert/README.html)|BERT training with Intel® Extension for TensorFlow* on Intel CPU or GPU.<br>Use the TensorFlow official example without code change.|CPU & GPU|
-|[Speed up Inference of Inception v4 by Advanced Automatic Mixed Precision](./infer_inception_v4_amp/README.html)|Test and compare the performance of inference with FP32 and Advanced Automatic Mixed Precision (AMP) (mix BF16/FP16 and FP32).<br>Shows the acceleration of inference by Advanced AMP on Intel® CPU and GPU.|CPU & GPU|
+|[Speed up Inference of Inception v4 by Advanced Automatic Mixed Precision via Docker Container or Bare Metal](./infer_inception_v4_amp/README.html)|Test and compare the performance of inference with FP32 and Advanced Automatic Mixed Precision (AMP) (mix BF16/FP16 and FP32).<br>Shows the acceleration of inference by Advanced AMP on Intel® CPU and GPU via Docker Container or Bare Metal.|CPU & GPU|
 |[Accelerate AlexNet by Quantization with Intel® Extension for TensorFlow*](./accelerate_alexnet_by_quantization/README.html)| An end-to-end example to show a pipeline to build up a CNN model to <br>recognize handwriting number and speed up AI model with quantization <br>by Intel® Neural Compressor and Intel® Extension for TensorFlow* on Intel GPU.|GPU|
 |[Accelerate Deep Learning Inference for Model Zoo Workloads on Intel CPU and GPU](./model_zoo_example/README.html)|Examples on running Model Zoo workloads on Intel CPU and GPU with the optimizations from Intel® Extension for TensorFlow*, without any code changes.|CPU & GPU|
 |[Quantize Inception V3 by Intel® Extension for TensorFlow* on Intel® Xeon®](./quantize_inception_v3/README.html)|An end-to-end example to show how Intel® Extension for TensorFlow* provides quantization feature by cooperating with Intel® Neural Compressor and oneDNN Graph. It will provide better quantization: better performance and accuracy loss is in controlled.|CPU|
 |[Mnist training with Intel® Optimization for Horovod*](./train_horovod/mnist/README.html)|Mnist distributed training example on Intel GPU. |GPU|
 |[ResNet50 training with Intel® Optimization for Horovod*](./train_horovod/resnet50/README.html)|ResNet50 distributed training example on Intel GPU. |GPU|
 |[Stable Diffusion Inference for Text2Image on Intel GPU](./stable_diffussion_inference/README.html)|Example for running Stable Diffusion Text2Image inference on Intel GPU with the optimizations from Intel® Extension for TensorFlow*. |GPU|
-|[Accelerate ResNet50 Training by XPUAutoShard on Intel GPU](./train_resnet50_with_autoshard/README.html)|Example on running ResNet50 training on Intel GPU with the XPUAutoShard feature. |GPU|
+|[Accelerate ResNet50 Training by XPUAutoShard on Intel GPU](./train_resnet50_with_autoshard/README.html)|Example on running ResNet50 training on Intel GPU with the XPUAutoShard feature. |GPU|

examples/infer_inception_v4_amp/README.md

Lines changed: 117 additions & 65 deletions
@@ -1,18 +1,18 @@
-# Speed up Inference of Inception v4 by Advanced Automatic Mixed Precision on Intel CPU and GPU
+# Speed up Inference of Inception v4 by Advanced Automatic Mixed Precision on Intel CPU and GPU via Docker Container or Bare Metal
 
 ## Introduction
 Advanced Automatic Mixed Precision (Advanced AMP) uses lower-precision data types (such as float16 or bfloat16) to make model run with 16-bit and 32-bit mixed floating-point types during training and inference to make it run faster with less memory consumption in CPU and GPU.
 
 For detailed info, please refer to [Advanced Automatic Mixed Precision](../../docs/guide/advanced_auto_mixed_precision.md)
 
-This example shows the acceleration of inference by Advanced AMP on Intel CPU or GPU.
+This example shows the acceleration of inference by Advanced AMP on Intel CPU or GPU via Docker container or bare metal.
 
 In this example, we will test and compare the performance of FP32 and Advanced AMP (mix BF16/FP16 and FP32) on Intel CPU or GPU.
 
 
 ## Step
 
-1. Download the Inception v4 model from internet.
+1. Download the Inception v4 model from the internet.
 2. Test the performance of original model (FP32) on Intel CPU or GPU.
 2. Test the performance of original model by Advanced AMP (BF16 or FP16) on Intel CPU or GPU.
 3. Compare the latency and throughputs of above two cases; print the result.
@@ -26,91 +26,88 @@ Advanced AMP supports two 16 bit floating-point types: BF16 and FP16.
 
 |Data Type|GPU|CPU|
 |-|-|-|
-|BF16|Intel® Data Center GPU Flex Series 170<br>Needs to be checked for your Intel GPU|Intel® 4th Generation Intel® Xeon® Scalable Processor (Sapphire Rapids)|
-|FP16|Intel® Data Center GPU Flex Series 170<br>Supported by most of Intel GPU||
+|BF16|Intel® Data Center GPU Max Series<br>Intel® Data Center GPU Flex Series 170<br>Intel® Arc™ A-Series<br>Needs to be checked for your Intel GPU|Intel® 4th Generation Intel® Xeon® Scalable Processor (Sapphire Rapids)|
+|FP16|Intel® Data Center GPU Max Series<br>Intel® Data Center GPU Flex Series 170<br>Intel® Arc™ A-Series<br>Supported by most of Intel GPU||
 
 
 This example supports both types. Set the parameter according to the requirement and hardware support.
 
-### Prepare for GPU
+### Prepare for GPU (Skip this Step for CPU)
 
-Refer to [Prepare](../common_guide_running.md##Prepare)
+* If Running via Docker Container,
 
-### Setup Running Environment
-
-```
-./set_env_cpu.sh
-
-or
+Refer to [Install GPU Drivers](../../docs/install/install_for_gpu.md#install-gpu-drivers).
 
-./set_env_gpu.sh
-```
-
-### Enable Running Environment
+* If Running on Bare Metal,
 
-* For GPU, refer to [Running](../common_guide_running.md##Running)
+Refer to [Prepare](../common_guide_running.md#prepare) to install both Intel GPU driver and Intel® oneAPI Base Toolkit.
 
-* For CPU,
+### Clone the Repository
 ```
-source env_itex/bin/activate
+git clone https://github.com/intel/intel-extension-for-tensorflow
+cd intel-extension-for-tensorflow
+export ITEX_REPO=${PWD}
 ```
 
-### Enable Advanced AMP Method
-
-There are two methods to enable Advanced AMP based on Intel® Extension for TensorFlow*: Python API & Environment Variable Configuration.
-
-1. Python API
-
-Add code in the beginning of Python code:
-
-For BF16:
+### Download the Pretrained-model
 ```
-import intel_extension_for_tensorflow as itex
-
-
-auto_mixed_precision_options = itex.AutoMixedPrecisionOptions()
-auto_mixed_precision_options.data_type = itex.BFLOAT16
-
-
-graph_options = itex.GraphOptions(auto_mixed_precision_options=auto_mixed_precision_options)
-graph_options.auto_mixed_precision = itex.ON
-
-config = itex.ConfigProto(graph_options=graph_options)
-itex.set_config(config)
+cd examples/infer_inception_v4_amp
+wget https://storage.googleapis.com/intel-optimized-tensorflow/models/v1_8/inceptionv4_fp32_pretrained_model.pb
 ```
 
-For FP16, modify one line above:
-```
-auto_mixed_precision_options.data_type = itex.BFLOAT16
-->
-auto_mixed_precision_options.data_type = itex.FLOAT16
-```
+### Setup Running Environment
 
+* If Running via Docker Container,
+
+* For GPU,
+```
+docker pull intel/intel-extension-for-tensorflow:gpu
+```
+
+* For CPU,
+```
+docker pull intel/intel-extension-for-tensorflow:cpu
+```
+
+* If Running on Bare Metal,
+
+* For GPU,
+```
+./set_env_gpu.sh
+```
+
+* For CPU,
+```
+./set_env_cpu.sh
+```
 
-2. Environment Variable Configuration
+### Enable Running Environment
 
-Execute commands in bash:
+* If Running via Docker Container,
 
-```
-export ITEX_AUTO_MIXED_PRECISION=1
+* For GPU,
+```
+docker run -it --rm -p 8888:8888 --device /dev/dri -v /dev/dri/by-path:/dev/dri/by-path -v $ITEX_REPO:/ws1 --ipc host --privileged intel/intel-extension-for-tensorflow:gpu
+cd /ws1/examples/infer_inception_v4_amp
+```
 
-export ITEX_AUTO_MIXED_PRECISION_DATA_TYPE=BFLOAT16
-#export ITEX_AUTO_MIXED_PRECISION_DATA_TYPE=FLOAT16
-```
-For FP16, modify one line above:
-```
-export ITEX_AUTO_MIXED_PRECISION_DATA_TYPE=BFLOAT16
-->
-export ITEX_AUTO_MIXED_PRECISION_DATA_TYPE=FLOAT16
-```
+* For CPU,
+```
+docker run -it --rm -p 8888:8888 -v $ITEX_REPO:/ws1 --ipc host --privileged intel/intel-extension-for-tensorflow:cpu
+cd /ws1/examples/infer_inception_v4_amp
+```
+
+* If Running on Bare Metal,
 
-## Download Model
+* For GPU, refer to [Running](../common_guide_running.md#running)
 
-```
-wget https://storage.googleapis.com/intel-optimized-tensorflow/models/v1_8/inceptionv4_fp32_pretrained_model.pb
-```
+* For CPU,
+```
+source env_itex/bin/activate
+```
 
-## Execute Testing and Comparing the Performance of FP32 and Advanced AMP on CPU and GPU
+
+## Execute Testing and Comparing the Performance of FP32 and Advanced AMP on CPU and GPU in Docker Container or Bare Metal
 
 The example supports both by two scripts:
 - Use Python API : **infer_fp32_vs_amp.py**
@@ -167,6 +164,61 @@ Throughputs Normalized 1 X.867908472383153
 
 **Note, if the data type (BF16, FP16) is not supported by the hardware, the training will be executed by converting to FP32. That will make the performance worse than FP32 case.**
 
+
+## Advanced: Enable Advanced AMP Method
+
+There are two methods to enable Advanced AMP based on Intel® Extension for TensorFlow*: Python API & Environment Variable Configuration.
+
+1. Python API
+
+Add code in the beginning of Python code:
+
+For BF16:
+
+```
+import intel_extension_for_tensorflow as itex
+
+
+auto_mixed_precision_options = itex.AutoMixedPrecisionOptions()
+auto_mixed_precision_options.data_type = itex.BFLOAT16
+
+
+graph_options = itex.GraphOptions(auto_mixed_precision_options=auto_mixed_precision_options)
+graph_options.auto_mixed_precision = itex.ON
+
+config = itex.ConfigProto(graph_options=graph_options)
+itex.set_config(config)
+```
+
+For FP16, modify one line above:
+
+```
+auto_mixed_precision_options.data_type = itex.BFLOAT16
+->
+auto_mixed_precision_options.data_type = itex.FLOAT16
+```
+
+
+2. Environment Variable Configuration
+
+Execute commands in bash:
+
+```
+export ITEX_AUTO_MIXED_PRECISION=1
+
+export ITEX_AUTO_MIXED_PRECISION_DATA_TYPE=BFLOAT16
+#export ITEX_AUTO_MIXED_PRECISION_DATA_TYPE=FLOAT16
+```
+
+For FP16, modify one line above:
+
+```
+export ITEX_AUTO_MIXED_PRECISION_DATA_TYPE=BFLOAT16
+->
+export ITEX_AUTO_MIXED_PRECISION_DATA_TYPE=FLOAT16
+```
+
+
 ## FAQ
 
 1. If you get the following error log, refer to [Enable Running Environment](#Enable-Running-Environment) to Enable oneAPI running environment.
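
The README changes above move Advanced AMP enablement into an "Advanced" section. For reference, here is a minimal, self-contained sketch of the Python API path end to end: the itex configuration calls are taken verbatim from the README text, while the Keras model and timing loop are illustrative stand-ins, not the example's actual Inception v4 benchmark.

```
# Minimal sketch: enable Advanced AMP via the itex Python API, then time
# a toy inference workload. The itex calls mirror the README above; the
# model below is an illustrative stand-in for the Inception v4 graph.
import time

import tensorflow as tf
import intel_extension_for_tensorflow as itex

# Configure Advanced AMP (BF16 here) before building or running any graph.
amp_options = itex.AutoMixedPrecisionOptions()
amp_options.data_type = itex.BFLOAT16  # use itex.FLOAT16 on FP16-capable GPUs

graph_options = itex.GraphOptions(auto_mixed_precision_options=amp_options)
graph_options.auto_mixed_precision = itex.ON

itex.set_config(itex.ConfigProto(graph_options=graph_options))

# Illustrative stand-in workload.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(299, 299, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1000),
])
x = tf.random.uniform([8, 299, 299, 3])

model(x)  # warm-up run triggers the AMP graph rewrite
start = time.time()
for _ in range(10):
    model(x)
print("avg latency: {:.4f} s".format((time.time() - start) / 10))
```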

examples/infer_inception_v4_amp/infer_fp32_vs_amp.py

Lines changed: 1 addition & 5 deletions
@@ -30,9 +30,7 @@
 print("intel_extension_for_tensorflow {}".format(itex.__version__))
 
 def set_itex_fp32(device):
-    backend = device
-    itex.set_backend(backend)
-    print("Set itex for FP32 with backend {}".format(backend))
+    print("Set itex for FP32 with backend {}".format(device))
 
 def set_itex_amp(amp_target):
     # set configure for auto mixed precision.
@@ -47,8 +45,6 @@ def set_itex_amp(amp_target):
     graph_options.auto_mixed_precision = itex.ON
 
     config = itex.ConfigProto(graph_options=graph_options)
-    # set GPU backend.
-
     itex.set_config(config)
 
     print("Set itex for AMP (auto_mixed_precision, {}_FP32) with backend {}".format(amp_target, device))

examples/infer_inception_v4_amp/infer_fp32_vs_amp.sh

Lines changed: 1 addition & 1 deletion
@@ -43,7 +43,7 @@ if [ ! -d $ENV_NAME ]; then
     echo "Create env $ENV_NAME ..."
     bash set_env_${device_type}.sh
 else
-    echo "Already created env $ENV_NAME, skip craete env"
+    echo "Already created env $ENV_NAME, skip creating env"
 fi
 
 source $ENV_NAME/bin/activate

examples/infer_inception_v4_amp/set_env_cpu.sh

Lines changed: 1 addition & 1 deletion
@@ -23,5 +23,5 @@ rm -rf $ENV_NAME
 ${PYTHON} -m venv $ENV_NAME
 source $ENV_NAME/bin/activate
 pip install --upgrade pip
-pip install tensorflow tensorflow_hub
+pip install tensorflow
 pip install --upgrade intel-extension-for-tensorflow[cpu]

examples/infer_inception_v4_amp/set_env_gpu.sh

Lines changed: 1 addition & 1 deletion
@@ -22,5 +22,5 @@ rm -rf $ENV_NAME
 python -m venv $ENV_NAME
 source $ENV_NAME/bin/activate
 pip install --upgrade pip
-pip install tensorflow tensorflow_hub
+pip install tensorflow
 pip install --upgrade intel-extension-for-tensorflow[gpu]
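
Both setup scripts now install plain tensorflow rather than tensorflow plus tensorflow_hub, presumably because the example no longer needs the hub package. A quick sanity check after activating either environment (a sketch; the itex version print mirrors the one already in infer_fp32_vs_amp.py):

```
# Verify the freshly created environment resolves both packages.
import tensorflow as tf
import intel_extension_for_tensorflow as itex

print("tensorflow {}".format(tf.__version__))
print("intel_extension_for_tensorflow {}".format(itex.__version__))
```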
