## examples/README.md

A wide variety of examples are provided to demonstrate the usage of Intel® Extension for TensorFlow*.

|Example|Description|Hardware|
|-|-|-|
|[Quick Example](quick_example.md)|Quick example to verify Intel® Extension for TensorFlow* and running environment.|CPU & GPU|
|[ResNet50 Inference](./infer_resnet50)|ResNet50 inference on Intel CPU or GPU without code changes.|CPU & GPU|
|[BERT Training for Classifying Text](./train_bert)|BERT training with Intel® Extension for TensorFlow* on Intel CPU or GPU.<br>Use the TensorFlow official example without code change.|CPU & GPU|
|[Speed up Inference of Inception v4 by Advanced Automatic Mixed Precision via Docker Container or Bare Metal](./infer_inception_v4_amp)|Test and compare the performance of inference with FP32 and Advanced Automatic Mixed Precision (AMP) (mix BF16/FP16 and FP32).<br>Shows the acceleration of inference by Advanced AMP on Intel CPU and GPU via Docker Container or Bare Metal.|CPU & GPU|
|[Accelerate AlexNet by Quantization with Intel® Extension for TensorFlow*](./accelerate_alexnet_by_quantization)| An end-to-end example to show a pipeline to build up a CNN model to <br>recognize handwriting number and speed up AI model with quantization <br>by Intel® Neural Compressor and Intel® Extension for TensorFlow* on Intel GPU.|GPU|
|[Accelerate Deep Learning Inference for Model Zoo Workloads on Intel CPU and GPU](./model_zoo_example)|Examples on running Model Zoo workloads on Intel CPU and GPU with the optimizations from Intel® Extension for TensorFlow*, without any code changes.|CPU & GPU|
|[Quantize Inception V3 by Intel® Extension for TensorFlow* on Intel® Xeon®](./quantize_inception_v3)|An end-to-end example showing how Intel® Extension for TensorFlow* provides quantization by cooperating with Intel® Neural Compressor and oneDNN Graph, delivering better performance while keeping accuracy loss under control.|CPU|
|[ResNet50 and MNIST Training with Horovod](./train_horovod)|ResNet50 and MNIST distributed training examples on Intel GPU.|GPU|
|[Stable Diffusion Inference for Text2Image on Intel GPU](./stable_diffussion_inference)|Example for running Stable Diffusion Text2Image inference on Intel GPU with the optimizations from Intel® Extension for TensorFlow*.|GPU|
|[Accelerate ResNet50 Training by XPUAutoShard on Intel GPU](./train_resnet50_with_autoshard)|Example on running ResNet50 training on Intel GPU with the XPUAutoShard feature.|GPU|

## examples/examples.md

A wide variety of examples are provided to demonstrate the usage of Intel® Extension for TensorFlow*.

|Example|Description|Hardware|
|-|-|-|
|[Quick Example](quick_example.html)|Quick example to verify Intel® Extension for TensorFlow* and running environment.|CPU & GPU|
|[ResNet50 Inference](./infer_resnet50/README.html)|ResNet50 inference on Intel CPU or GPU without code changes.|CPU & GPU|
|[BERT Training for Classifying Text](./train_bert/README.html)|BERT training with Intel® Extension for TensorFlow* on Intel CPU or GPU.<br>Use the TensorFlow official example without code change.|CPU & GPU|
|[Speed up Inference of Inception v4 by Advanced Automatic Mixed Precision via Docker Container or Bare Metal](./infer_inception_v4_amp/README.html)|Test and compare the performance of inference with FP32 and Advanced Automatic Mixed Precision (AMP) (mix BF16/FP16 and FP32).<br>Shows the acceleration of inference by Advanced AMP on Intel® CPU and GPU via Docker Container or Bare Metal.|CPU & GPU|
|[Accelerate AlexNet by Quantization with Intel® Extension for TensorFlow*](./accelerate_alexnet_by_quantization/README.html)| An end-to-end example to show a pipeline to build up a CNN model to <br>recognize handwriting number and speed up AI model with quantization <br>by Intel® Neural Compressor and Intel® Extension for TensorFlow* on Intel GPU.|GPU|
|[Accelerate Deep Learning Inference for Model Zoo Workloads on Intel CPU and GPU](./model_zoo_example/README.html)|Examples on running Model Zoo workloads on Intel CPU and GPU with the optimizations from Intel® Extension for TensorFlow*, without any code changes.|CPU & GPU|
|[Quantize Inception V3 by Intel® Extension for TensorFlow* on Intel® Xeon®](./quantize_inception_v3/README.html)|An end-to-end example showing how Intel® Extension for TensorFlow* provides quantization by cooperating with Intel® Neural Compressor and oneDNN Graph, delivering better performance while keeping accuracy loss under control.|CPU|
|[MNIST Training with Intel® Optimization for Horovod*](./train_horovod/mnist/README.html)|MNIST distributed training example on Intel GPU.|GPU|
|[ResNet50 training with Intel® Optimization for Horovod*](./train_horovod/resnet50/README.html)|ResNet50 distributed training example on Intel GPU. |GPU|
|[Stable Diffusion Inference for Text2Image on Intel GPU](./stable_diffussion_inference/README.html)|Example for running Stable Diffusion Text2Image inference on Intel GPU with the optimizations from Intel® Extension for TensorFlow*. |GPU|
|[Accelerate ResNet50 Training by XPUAutoShard on Intel GPU](./train_resnet50_with_autoshard/README.html)|Example on running ResNet50 training on Intel GPU with the XPUAutoShard feature. |GPU|

# Speed up Inference of Inception v4 by Advanced Automatic Mixed Precision on Intel CPU and GPU via Docker Container or Bare Metal

## Introduction
Advanced Automatic Mixed Precision (Advanced AMP) uses lower-precision data types (such as FP16 or BF16) so that a model runs with mixed 16-bit and 32-bit floating-point types during training and inference. This makes it run faster with less memory consumption on CPU and GPU.
For details, refer to [Advanced Automatic Mixed Precision](../../docs/guide/advanced_auto_mixed_precision.md).
This example shows the acceleration of inference by Advanced AMP on Intel CPU or GPU via Docker container or bare metal.
In this example, we will test and compare the performance of FP32 and Advanced AMP (mix BF16/FP16 and FP32) on Intel CPU or GPU.

## Steps
1. Download the Inception v4 model from the internet.
2. Test the performance of the original model (FP32) on Intel CPU or GPU.
3. Test the performance of the model with Advanced AMP (BF16 or FP16) on Intel CPU or GPU.
4. Compare the latency and throughput of the two cases above and print the result.
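The latency/throughput comparison above can be sketched with a plain timing loop. This is an illustrative sketch only: the `run_once` callables below are sleep-based stand-ins for the real FP32 and AMP model invocations, not part of the example's scripts.

```python
import time
from statistics import median

def benchmark(run_once, warmup=3, iters=20, batch_size=1):
    """Time one inference callable; return median latency (ms) and throughput."""
    for _ in range(warmup):          # discard warm-up runs
        run_once()
    times = []
    for _ in range(iters):
        start = time.perf_counter()
        run_once()
        times.append(time.perf_counter() - start)
    latency = median(times)
    return {"latency_ms": latency * 1e3, "throughput": batch_size / latency}

# Stand-ins: sleeps simulate an FP32 model and a faster AMP model.
fp32 = benchmark(lambda: time.sleep(0.002))
amp = benchmark(lambda: time.sleep(0.001))
print(f"speedup: {fp32['latency_ms'] / amp['latency_ms']:.2f}x")
```

In the real example, `run_once` would invoke the Inception v4 model on a prepared input batch, once with AMP disabled and once with it enabled.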

Advanced AMP supports two 16-bit floating-point types: BF16 and FP16.
|Data Type|GPU|CPU|
|-|-|-|
|BF16|Intel® Data Center GPU Max Series<br>Intel® Data Center GPU Flex Series 170<br>Intel® Arc™ A-Series<br>Needs to be checked for your Intel GPU|Intel® 4th Generation Intel® Xeon® Scalable Processor (Sapphire Rapids)|
|FP16|Intel® Data Center GPU Max Series<br>Intel® Data Center GPU Flex Series 170<br>Intel® Arc™ A-Series<br>Supported by most Intel GPUs||
This example supports both types. Set the parameter according to your requirements and hardware support.
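As a minimal sketch of that choice, the helper below (a hypothetical function, not part of the example's scripts) encodes the support table above:

```python
def pick_amp_dtype(device: str) -> str:
    """Return the 16-bit type to mix with FP32 on a given device,
    following the hardware support table above."""
    if device == "gpu":
        # FP16 is supported by most Intel GPUs; BF16 support should be
        # checked for your specific GPU model.
        return "FLOAT16"
    if device == "cpu":
        # BF16 is supported on 4th Generation Intel® Xeon® Scalable
        # processors (Sapphire Rapids).
        return "BFLOAT16"
    raise ValueError(f"unknown device: {device!r}")

print(pick_amp_dtype("cpu"))  # BFLOAT16
```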
### Prepare for GPU (Skip this Step for CPU)

* If Running via Docker Container,

  Refer to [Install GPU Drivers](../../docs/install/install_for_gpu.md#install-gpu-drivers).

* If Running on Bare Metal,

  Refer to [Prepare](../common_guide_running.md#prepare) to install both the Intel GPU driver and the Intel® oneAPI Base Toolkit.

**Note: if the data type (BF16 or FP16) is not supported by the hardware, the workload will be executed by converting to FP32, which makes performance worse than the plain FP32 case.**
## Advanced: Enable Advanced AMP Method
There are two methods to enable Advanced AMP with Intel® Extension for TensorFlow*: the Python API and environment variable configuration.
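As a sketch of the environment-variable method: the variable names below are taken from the Advanced AMP guide linked above, but treat them as assumptions to verify against your release. They must be set before TensorFlow is imported so the extension can pick them up.

```python
import os

# Environment-variable configuration (assumed names; verify against the
# Advanced AMP guide for your release). Set before importing TensorFlow.
os.environ["ITEX_AUTO_MIXED_PRECISION"] = "1"
os.environ["ITEX_AUTO_MIXED_PRECISION_DATA_TYPE"] = "BFLOAT16"  # or "FLOAT16" on GPU

# The Python API method would look roughly like this (sketch only; it
# requires intel_extension_for_tensorflow to be installed):
#   import intel_extension_for_tensorflow as itex
#   graph_opts = itex.GraphOptions(auto_mixed_precision=itex.ON)
#   itex.set_config(itex.ConfigProto(graph_options=graph_opts))
```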