Commit ffd00d9

Deploy SqueezeNet 1.0 INT8 model with ONNX Runtime on Azure Cobalt 100
Signed-off-by: odidev <[email protected]>
1 parent d0308de commit ffd00d9

12 files changed, +406 -0 lines changed
Lines changed: 59 additions & 0 deletions
---
title: Deploy SqueezeNet 1.0 INT8 model with ONNX Runtime on Azure Cobalt 100

minutes_to_complete: 60

who_is_this_for: This Learning Path introduces ONNX deployment on Microsoft Azure Cobalt 100 (Arm-based) virtual machines. It is designed for developers migrating ONNX-based applications from x86_64 to Arm with minimal or no changes.

learning_objectives:
- Provision an Azure Arm64 virtual machine using the Azure console, with Ubuntu Pro 24.04 LTS as the base image.
- Deploy ONNX on the Ubuntu Pro virtual machine.
- Perform ONNX baseline testing and benchmarking on both x86_64 and Arm64 virtual machines.

prerequisites:
- A [Microsoft Azure](https://azure.microsoft.com/) account with access to Cobalt 100 based instances (Dpsv6).
- Basic understanding of Python and machine learning concepts.
- Familiarity with [ONNX Runtime](https://onnxruntime.ai/docs/) and Azure cloud services.

author: Jason Andrews

### Tags
skilllevels: Advanced
subjects: ML
cloud_service_providers: Microsoft Azure

armips:
- Neoverse

tools_software_languages:
- Python
- ONNX Runtime

operatingsystems:
- Linux

further_reading:
- resource:
    title: Azure Virtual Machines documentation
    link: https://learn.microsoft.com/en-us/azure/virtual-machines/
    type: documentation
- resource:
    title: ONNX Runtime documentation
    link: https://onnxruntime.ai/docs/
    type: documentation
- resource:
    title: ONNX (Open Neural Network Exchange) documentation
    link: https://onnx.ai/
    type: documentation
- resource:
    title: onnxruntime_perf_test tool - ONNX Runtime performance benchmarking
    link: https://onnxruntime.ai/docs/performance/tune-performance/profiling-tools.html#in-code-performance-profiling
    type: documentation

### FIXED, DO NOT MODIFY
# ================================================================================
weight: 1 # _index.md always has weight of 1 to order correctly
layout: "learningpathall" # All files under learning paths have this same wrapper
learning_path_main_page: "yes" # This should be surfaced when looking for related content. Only set for _index.md of learning path content.
---
Lines changed: 8 additions & 0 deletions
---
# ================================================================================
# FIXED, DO NOT MODIFY THIS FILE
# ================================================================================
weight: 21 # Set to always be larger than the content in this path to be at the end of the navigation.
title: "Next Steps" # Always the same, html page title.
layout: "learningpathall" # All files under learning paths have this same wrapper for Hugo processing.
---
Lines changed: 21 additions & 0 deletions
---
title: "Overview"

weight: 2

layout: "learningpathall"
---

## Cobalt 100 Arm-based processor

Azure’s Cobalt 100 virtual machines are built on Microsoft’s first-generation, in-house Arm-based processor: the Cobalt 100. Designed entirely by Microsoft and based on Arm’s Neoverse N2 architecture, this 64-bit CPU delivers improved performance and energy efficiency across a broad spectrum of cloud-native, scale-out Linux workloads. These include web and application servers, data analytics, open-source databases, caching systems, and more. Running at 3.4 GHz, the Cobalt 100 processor allocates a dedicated physical core to each vCPU, ensuring consistent and predictable performance.

To learn more about Cobalt 100, see the blog post [Announcing the preview of new Azure VMs based on the Azure Cobalt 100 processor](https://techcommunity.microsoft.com/blog/azurecompute/announcing-the-preview-of-new-azure-vms-based-on-the-azure-cobalt-100-processor/4146353).

## ONNX
ONNX (Open Neural Network Exchange) is an open-source format for representing machine learning models.
It provides interoperability between different deep learning frameworks, enabling models trained in one framework (such as PyTorch or TensorFlow) to be deployed and run in another.

ONNX models are serialized into a standardized format that can be executed by the **ONNX Runtime**, a high-performance inference engine optimized for CPU, GPU, and specialized hardware accelerators. This separation of model training and inference allows developers to build flexible, portable, and production-ready AI workflows.

ONNX is widely used in cloud, edge, and mobile environments to deliver efficient and scalable inference for deep learning models. Learn more from the [ONNX official website](https://onnx.ai/) and the [ONNX Runtime documentation](https://onnxruntime.ai/docs/).
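To make this interoperability concrete, the short sketch below exports a trivial PyTorch model to ONNX and then runs it with ONNX Runtime. It is illustrative only: the two-layer model and the `tiny.onnx` file name are hypothetical, and it assumes `torch`, `onnxruntime`, and `numpy` are installed.

```python
import numpy as np
import onnxruntime as ort
import torch

# A tiny stand-in model; any torch.nn.Module can be exported the same way
model = torch.nn.Sequential(torch.nn.Linear(8, 4), torch.nn.ReLU())
model.eval()

# Export to the ONNX format, tracing the graph with a dummy input
dummy = torch.randn(1, 8)
torch.onnx.export(model, dummy, "tiny.onnx",
                  input_names=["input"], output_names=["output"])

# Run the exported model with ONNX Runtime; PyTorch is no longer needed
session = ort.InferenceSession("tiny.onnx")
outputs = session.run(None, {"input": np.random.rand(1, 8).astype(np.float32)})
print(outputs[0].shape)  # (1, 4)
```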
Lines changed: 52 additions & 0 deletions
---
title: Baseline Testing
weight: 5

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Baseline testing using ONNX Runtime

This test measures the inference latency of ONNX Runtime by timing how long it takes to process a single input with the `squeezenet-int8.onnx` model. It helps evaluate how efficiently the model runs on the target hardware.

Create a file named **baseline.py** with the following code to run a baseline ONNX test:
```python
import onnxruntime as ort
import numpy as np
import time

# Load the quantized SqueezeNet model into an ONNX Runtime session
session = ort.InferenceSession("squeezenet-int8.onnx")
input_name = session.get_inputs()[0].name

# Generate a random input tensor matching the model's expected shape
data = np.random.rand(1, 3, 224, 224).astype(np.float32)

# Time a single inference
start = time.time()
outputs = session.run(None, {input_name: data})
end = time.time()

print("Inference time:", end - start)
```
Run the baseline test:

```console
python3 baseline.py
```
You should see output similar to:
```output
Inference time: 0.0026061534881591797
```
{{% notice Note %}}Inference time is the amount of time it takes for a trained machine learning model to make a prediction (that is, produce output) after receiving input data.

The input tensor has shape (1, 3, 224, 224):
- 1: batch size
- 3: color channels (RGB)
- 224 x 224: image resolution (common for models like SqueezeNet)
{{% /notice %}}

#### Output summary

- Single inference latency: ~2.60 milliseconds (0.00260 sec)
- This shows the initial (cold-start) inference performance of ONNX Runtime on CPU using an optimized INT8 quantized model.
- This demonstrates that the setup is fully working and that ONNX Runtime executes quantized models efficiently on Arm64.
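Because a single timed run includes one-time (cold-start) costs, you can optionally extend **baseline.py** to average latency over many runs after a short warm-up. The sketch below makes the same assumptions as above (model file in the current directory); the warm-up and run counts are arbitrary choices.

```python
import onnxruntime as ort
import numpy as np
import time

session = ort.InferenceSession("squeezenet-int8.onnx")
input_name = session.get_inputs()[0].name
data = np.random.rand(1, 3, 224, 224).astype(np.float32)

# Warm up so one-time initialization costs don't skew the measurement
for _ in range(10):
    session.run(None, {input_name: data})

# Average latency over repeated runs
runs = 100
start = time.time()
for _ in range(runs):
    session.run(None, {input_name: data})
elapsed = time.time() - start

print(f"Average inference time over {runs} runs: {elapsed / runs * 1000:.3f} ms")
```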
Lines changed: 138 additions & 0 deletions
---
title: Benchmarking via onnxruntime_perf_test
weight: 6

### FIXED, DO NOT MODIFY
layout: learningpathall
---

Now that you’ve set up and run the ONNX model (for example, SqueezeNet), you can benchmark its inference performance using Python-based timing or tools like **onnxruntime_perf_test**. This helps evaluate ONNX Runtime efficiency on Azure Arm64-based Cobalt 100 instances.

You can also compare the inference time between Cobalt 100 (Arm64) and a similar D-series x86_64-based virtual machine on Azure.

## Run the performance tests using onnxruntime_perf_test
**onnxruntime_perf_test** is a performance benchmarking tool included in the ONNX Runtime source code. It measures the inference performance of ONNX models under various runtime conditions (such as CPU, GPU, or other execution providers).

### Install Required Build Tools
```console
sudo apt update
sudo apt install -y build-essential cmake git unzip pkg-config
sudo apt install -y protobuf-compiler libprotobuf-dev libprotoc-dev git
```
Then verify the protobuf compiler installation:
```console
protoc --version
```
You should see output similar to:

```output
libprotoc 3.21.12
```
### Build ONNX Runtime from Source

The benchmarking tool, **onnxruntime_perf_test**, isn’t available as a pre-built binary for any platform, so you have to build it from source. Expect the build to take around 40-50 minutes.

Clone onnxruntime:
```console
git clone --recursive https://github.com/microsoft/onnxruntime
cd onnxruntime
```
Now, build the benchmark tool:

```console
./build.sh --config Release --build_dir build/Linux --build_shared_lib --parallel --build --update --skip_tests
```
This builds the benchmark tool at ./build/Linux/Release/onnxruntime_perf_test.
### Run the benchmark
Now that the benchmarking tool has been built, you can benchmark the **squeezenet-int8.onnx** model as follows:

```console
./build/Linux/Release/onnxruntime_perf_test -e cpu -r 100 -m times -s -Z -I <path-to-squeezenet-int8.onnx>
```
- **-e cpu**: Use the CPU execution provider (not GPU or any other backend).
- **-r 100**: Run 100 inferences.
- **-m times**: Use "repeat N times" mode.
- **-s**: Show detailed statistics.
- **-Z**: Disable intra-op thread spinning (reduces CPU usage when idle between runs).
- **-I**: Use the ONNX model path directly, without input/output test data.
You should see output similar to:

```output
Disabling intra-op thread spinning between runs
Session creation time cost: 0.0102016 s
First inference time cost: 2 ms
Total inference time cost: 0.185739 s
Total inference requests: 100
Average inference time cost: 1.85739 ms
Total inference run time: 0.18581 s
Number of inferences per second: 538.184
Avg CPU usage: 96 %
Peak working set size: 36696064 bytes
Avg CPU usage:96
Peak working set size:36696064
Runs:100
Min Latency: 0.00183404 s
Max Latency: 0.00190312 s
P50 Latency: 0.00185674 s
P90 Latency: 0.00187215 s
P95 Latency: 0.00187393 s
P99 Latency: 0.00190312 s
P999 Latency: 0.00190312 s
```
### Benchmark Metrics Explained

- **Average Inference Time**: The mean time taken to process a single inference request across all runs. Lower values indicate faster model execution.
- **Throughput**: The number of inference requests processed per second. Higher throughput reflects the model’s ability to handle larger workloads efficiently.
- **CPU Utilization**: The percentage of CPU resources used during inference. A value close to 100% indicates full CPU usage, which is expected during performance benchmarking.
- **Peak Memory Usage**: The maximum amount of system memory (RAM) consumed during inference. Lower memory usage is beneficial for resource-constrained environments.
- **P50 Latency (Median Latency)**: The time below which 50% of inference requests complete. Represents typical latency under normal load.
- **Latency Consistency**: Describes the stability of latency values across all runs. "Consistent" indicates predictable inference performance with minimal jitter.
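If you collect raw per-run latencies yourself (for example, with a timing loop like the one in **baseline.py**), you can reproduce these summary statistics directly. A minimal sketch, assuming `latencies` holds hypothetical per-run times in seconds:

```python
import numpy as np

# Hypothetical per-run latencies in seconds, e.g. collected from a timing loop
latencies = np.array([0.00184, 0.00186, 0.00185, 0.00187, 0.00190])

avg_ms = latencies.mean() * 1000
throughput = 1.0 / latencies.mean()

print(f"Average inference time: {avg_ms:.3f} ms")
print(f"Throughput: {throughput:.2f} inferences/sec")
# Percentile latencies, analogous to those reported by onnxruntime_perf_test
for p in (50, 90, 95, 99):
    print(f"P{p} latency: {np.percentile(latencies, p) * 1000:.3f} ms")
```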
### Benchmark summary on Arm64
Here is a summary of benchmark results collected on an Arm64 **D4ps_v6 Ubuntu Pro 24.04 LTS virtual machine**:

| **Metric**                 | **Value**                     |
|----------------------------|-------------------------------|
| **Average Inference Time** | 1.857 ms                      |
| **Throughput**             | 538.18 inferences/sec         |
| **CPU Utilization**        | 96%                           |
| **Peak Memory Usage**      | 36.70 MB                      |
| **P50 Latency**            | 1.857 ms                      |
| **P90 Latency**            | 1.872 ms                      |
| **P95 Latency**            | 1.874 ms                      |
| **P99 Latency**            | 1.903 ms                      |
| **P999 Latency**           | 1.903 ms                      |
| **Max Latency**            | 1.903 ms                      |
| **Latency Consistency**    | Consistent                    |
### Benchmark summary on x86_64
Here is a summary of benchmark results collected on an x86_64 **D4s_v6 Ubuntu Pro 24.04 LTS virtual machine**:

| **Metric**                 | **Value on Virtual Machine**  |
|----------------------------|-------------------------------|
| **Average Inference Time** | 1.413 ms                      |
| **Throughput**             | 707.48 inferences/sec         |
| **CPU Utilization**        | 100%                          |
| **Peak Memory Usage**      | 38.80 MB                      |
| **P50 Latency**            | 1.396 ms                      |
| **P90 Latency**            | 1.501 ms                      |
| **P95 Latency**            | 1.520 ms                      |
| **P99 Latency**            | 1.794 ms                      |
| **P999 Latency**           | 1.794 ms                      |
| **Max Latency**            | 1.794 ms                      |
| **Latency Consistency**    | Consistent                    |
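One way to read the two tables together is to look at latency spread rather than averages alone: the ratio of P99 to P50 gives a rough jitter measure. A quick sketch using the table values above:

```python
# Latency spread (P99 / P50) as a rough jitter measure, from the tables above
arm64_p50, arm64_p99 = 1.857, 1.903   # ms, D4ps_v6 (Arm64)
x86_p50, x86_p99 = 1.396, 1.794       # ms, D4s_v6 (x86_64)

print(f"Arm64 P99/P50:  {arm64_p99 / arm64_p50:.3f}")   # ~1.02: very tight spread
print(f"x86_64 P99/P50: {x86_p99 / x86_p50:.3f}")       # ~1.29: wider spread
```

The tighter spread on Arm64 is what the highlights below describe as consistent performance.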
### Highlights from Ubuntu Pro 24.04 Arm64 Benchmarking

When comparing the results on Arm64 vs x86_64 virtual machines:
- **Low-Latency Inference:** Achieved consistent average inference times of ~1.86 ms on Arm64.
- **Strong and Stable Throughput:** Sustained throughput of over 538 inferences/sec using the `squeezenet-int8.onnx` model on D4ps_v6 instances.
- **Lightweight Resource Footprint:** Peak memory usage stayed below 37 MB, with CPU utilization around 96%, ideal for efficient edge or cloud inference.
- **Consistent Performance:** P50, P95, and Max latency remained tightly bound, showcasing reliable performance on Azure Cobalt 100 Arm-based infrastructure.

You have now benchmarked ONNX on an Azure Cobalt 100 Arm64 virtual machine and compared results with x86_64.
Lines changed: 50 additions & 0 deletions
---
title: Create an Arm-based cloud virtual machine using the Microsoft Cobalt 100 CPU
weight: 3

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Introduction

There are several ways to create an Arm-based Cobalt 100 virtual machine: the Microsoft Azure console, the Azure CLI tool, or your choice of IaC (Infrastructure as Code) tooling. This guide uses the Azure console to create a virtual machine with the Arm-based Cobalt 100 processor.

This Learning Path focuses on the general-purpose virtual machines of the D series. For details, read the guide on the [Dpsv6 size series](https://learn.microsoft.com/en-us/azure/virtual-machines/sizes/general-purpose/dpsv6-series) offered by Microsoft Azure.

If you have never used the Microsoft Cloud Platform before, review the Microsoft [guide to create a Linux virtual machine in the Azure portal](https://learn.microsoft.com/en-us/azure/virtual-machines/linux/quick-create-portal?tabs=ubuntu).
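If you prefer the Azure CLI over the portal, a roughly equivalent command is sketched below. It is an illustration only: the resource group, VM name, and image URN are assumptions to adjust for your subscription. The URN shown targets standard Ubuntu 24.04 LTS on Arm64; substitute the Ubuntu Pro offer if you want to match the portal steps exactly (you can list available Arm64 images with `az vm image list --architecture Arm64 --all`).

```console
az vm create \
  --resource-group my-resource-group \
  --name my-cobalt-vm \
  --image Canonical:ubuntu-24_04-lts:server-arm64:latest \
  --size Standard_D4ps_v6 \
  --admin-username azureuser \
  --generate-ssh-keys
```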
#### Create an Arm-based Azure Virtual Machine

Creating a virtual machine based on Azure Cobalt 100 is no different from creating any other virtual machine in Azure. To create an Azure virtual machine, launch the Azure portal and navigate to "Virtual Machines".
1. Select "Create", and click on "Virtual Machine" from the drop-down list.
2. Inside the "Basics" tab, fill in the Instance details, such as "Virtual machine name" and "Region".
3. Choose the image for your virtual machine (for example, Ubuntu Pro 24.04 LTS) and select "Arm64" as the VM architecture.
4. In the "Size" field, click on "See all sizes" and select the D-Series v6 family of virtual machines. Select "D4ps_v6" from the list.

![Azure portal VM creation — Azure Cobalt 100 Arm64 virtual machine (D4ps_v6) alt-text#center](images/instance.png "Figure 1: Select the D-Series v6 family of virtual machines")

5. Select "SSH public key" as the Authentication type. Azure will automatically generate an SSH key pair for you and allow you to store it for future use. It is a fast, simple, and secure way to connect to your virtual machine.
6. Fill in the Administrator username for your VM.
7. Select "Generate new key pair", and select "RSA SSH Format" as the SSH Key Type. RSA can offer better security with key lengths of 3072 bits or longer. Give your SSH key a Key pair name.
8. In the "Inbound port rules", select HTTP (80) and SSH (22) as the inbound ports.
![Azure portal VM creation — Azure Cobalt 100 Arm64 virtual machine (D4ps_v6) alt-text#center](images/instance1.png "Figure 2: Allow inbound port rules")

9. Click on the "Review + Create" tab and review the configuration for your virtual machine. It should look like the following:

![Azure portal VM creation — Azure Cobalt 100 Arm64 virtual machine (D4ps_v6) alt-text#center](images/ubuntu-pro.png "Figure 3: Review and Create an Azure Cobalt 100 Arm64 VM")

10. Finally, when you are confident about your selections, click on the "Create" button, and then click on the "Download Private key and Create Resources" button.

![Azure portal VM creation — Azure Cobalt 100 Arm64 virtual machine (D4ps_v6) alt-text#center](images/instance4.png "Figure 4: Download Private key and Create Resources")

11. Your virtual machine should be ready and running within a few minutes. You can SSH into the virtual machine using the downloaded private key and the VM's public IP address.

![Azure portal VM creation — Azure Cobalt 100 Arm64 virtual machine (D4ps_v6) alt-text#center](images/final-vm.png "Figure 5: VM deployment confirmation in Azure portal")

{{% notice Note %}}

To learn more about Arm-based virtual machines in Azure, refer to “Getting Started with Microsoft Azure” in [Get started with Arm-based cloud instances](/learning-paths/servers-and-cloud-computing/csp/azure).

{{% /notice %}}
