Deploy SqueezeNet 1.0 INT8 (opset 12) model with ONNX Runtime on Azure Cobalt 100 #2139

@@ -0,0 +1,66 @@
---
title: Deploy SqueezeNet 1.0 INT8 (opset 12) model with ONNX Runtime on Azure Cobalt 100

minutes_to_complete: 60

who_is_this_for: This is an introductory topic for software developers who want to migrate their ONNX-based applications from x86_64 platforms to Arm-based platforms, specifically Microsoft Azure Cobalt 100 CPU-based VMs.

learning_objectives:
- Provision an Azure Arm64 VM using the Azure console, with Ubuntu as the base image.
- Create an Azure Linux 3.0 Docker container.
- Deploy an ONNX-based application inside an Azure Linux 3.0 Arm-based Docker container, as well as in an Azure VM created from a custom Azure Linux 3.0 image.
- Perform ONNX benchmarking inside both the container and the custom VM.

prerequisites:
- A [Microsoft Azure](https://azure.microsoft.com/) account with access to Cobalt 100 based instances (Dpsv6).
- A machine with [Docker](/install-guides/docker/) installed.
- Basic understanding of Python and machine learning concepts.
- Familiarity with ONNX Runtime and Azure cloud services.

author: Zach Lasiuk

### Tags
skilllevels: Advanced
subjects: ML
cloud_service_providers: Microsoft Azure

armips:
- Neoverse-N2

tools_software_languages:
- Python
- Docker
- ONNX Runtime

operatingsystems:
- Linux

further_reading:
- resource:
title: Azure Virtual Machines documentation
link: https://learn.microsoft.com/en-us/azure/virtual-machines/
type: documentation
- resource:
title: Azure Container Instances documentation
link: https://learn.microsoft.com/en-us/azure/container-instances/
type: documentation
- resource:
title: ONNX Runtime Docs
link: https://onnxruntime.ai/docs/
type: documentation
- resource:
title: ONNX (Open Neural Network Exchange) documentation
link: https://onnx.ai/
type: documentation
- resource:
title: onnxruntime_perf_test tool - ONNX Runtime performance benchmarking
link: https://onnxruntime.ai/docs/performance/tune-performance/profiling-tools.html#in-code-performance-profiling
type: documentation


### FIXED, DO NOT MODIFY
# ================================================================================
weight: 1 # _index.md always has weight of 1 to order correctly
layout: "learningpathall" # All files under learning paths have this same wrapper
learning_path_main_page: "yes" # This should be surfaced when looking for related content. Only set for _index.md of learning path content.
---
@@ -0,0 +1,8 @@
---
# ================================================================================
# FIXED, DO NOT MODIFY THIS FILE
# ================================================================================
weight: 21 # Set to always be larger than the content in this path to be at the end of the navigation.
title: "Next Steps" # Always the same, html page title.
layout: "learningpathall" # All files under learning paths have this same wrapper for Hugo processing.
---
@@ -0,0 +1,29 @@
---
title: "Background"

weight: 2

layout: "learningpathall"
---

## What is the Cobalt 100 Arm-based processor?

Cobalt 100 is Microsoft's first-generation, in-house Arm-based processor. Designed entirely by Microsoft and built on Arm’s Neoverse N2 architecture, this 64-bit CPU delivers improved performance and energy efficiency across a broad spectrum of cloud-native, scale-out Linux workloads, including web and application servers, data analytics, open-source databases, and caching systems. Running at 3.4 GHz, the Cobalt 100 processor allocates a dedicated physical core for each vCPU, ensuring consistent and predictable performance.

To learn more about Cobalt 100, refer to the blog [Announcing the preview of new Azure VMs based on the Azure Cobalt 100 processor](https://techcommunity.microsoft.com/blog/azurecompute/announcing-the-preview-of-new-azure-vms-based-on-the-azure-cobalt-100-processor/4146353).

## Introduction to Azure Linux 3.0

Azure Linux 3.0 is Microsoft's in-house, lightweight Linux distribution optimized for running cloud-native workloads on Azure. Designed with performance, security, and reliability in mind, it is fully supported by Microsoft and tailored for containers, microservices, and Kubernetes. With native support for the Arm64 (Aarch64) architecture, Azure Linux 3.0 enables efficient execution of workloads on energy-efficient Arm-based infrastructure, making it a powerful choice for scalable and cost-effective cloud deployments.

As of now, the Azure Marketplace offers official Azure Linux 3.0 VM images, published by Ntegral Inc., only for x64-based architectures; native Arm64 (Aarch64) images are not yet officially available. For this Learning Path, we therefore create a custom Azure Linux 3.0 VM image for Aarch64, using the [Aarch64 ISO for Azure Linux 3.0](https://github.com/microsoft/azurelinux#iso).

Alternatively, use the [Azure Linux 3.0 Docker container](https://learn.microsoft.com/en-us/azure/azure-linux/intro-azure-linux) on any supported platform.

For this Learning Path, we perform the deployment and benchmarking in both Azure Linux 3.0 environments: the Docker container and the custom-image-based VM.

## Introduction to ONNX

ONNX (Open Neural Network Exchange) is an open standard for representing machine learning models, enabling interoperability between different AI frameworks. It allows you to train a model in one framework (like PyTorch or TensorFlow) and run it using ONNX Runtime for optimized inference.
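To make this concrete, the following minimal Python sketch loads an ONNX model with ONNX Runtime and runs a single inference on the CPU. The model filename and the 1x3x224x224 float32 input shape are assumptions based on the SqueezeNet model used later in this Learning Path.

```python
# Minimal sketch: load an ONNX model and run one inference with ONNX Runtime.
# Assumes squeezenet-int8.onnx is in the working directory and that the model
# takes a single 1x3x224x224 float32 input (typical for SqueezeNet).
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("squeezenet-int8.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name

# Random data stands in for a real preprocessed image.
data = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {input_name: data})
print("Output shape:", outputs[0].shape)
```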

In this Learning Path, we deploy ONNX on Azure Linux 3.0 (Arm64) and benchmark its performance using the [onnxruntime_perf_test tool](https://onnxruntime.ai/docs/performance/tune-performance/profiling-tools.html#in-code-performance-profiling) on both a custom VM image and a Docker container, showcasing its efficiency on Arm-based infrastructure.
@@ -0,0 +1,123 @@
---
title: Benchmarking via onnxruntime_perf_test
weight: 5

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Test the ONNX-based application for performance
Now that you’ve set up and run the ONNX model (for example, SqueezeNet), you can benchmark its inference performance using Python-based timing or tools like onnxruntime_perf_test. This helps you evaluate ONNX Runtime efficiency on Azure Cobalt 100 Arm64-based instances.
You can also compare inference times between Cobalt 100 (Arm64) and comparable x86_64-based D-series VMs on Azure.
As noted before, the benchmarking steps are the same whether you run them in a Docker container or in the custom VM.
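If you want a quick measurement before building the dedicated tool, a minimal Python-based timing sketch such as the following can approximate average latency and throughput. The model filename and input shape are the same assumptions as in the earlier inference example.

```python
# Minimal Python-based timing sketch, as an alternative to onnxruntime_perf_test.
# Assumes squeezenet-int8.onnx and a 1x3x224x224 float32 input.
import time
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("squeezenet-int8.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name
data = np.random.rand(1, 3, 224, 224).astype(np.float32)

# Warm up so one-time initialization costs are not counted.
for _ in range(10):
    session.run(None, {input_name: data})

runs = 100
start = time.perf_counter()
for _ in range(runs):
    session.run(None, {input_name: data})
elapsed = time.perf_counter() - start

print(f"Average inference time: {elapsed / runs * 1000:.4f} ms")
print(f"Throughput: {runs / elapsed:.2f} inferences/sec")
```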

## Run the performance tests using onnxruntime_perf_test
The onnxruntime_perf_test tool is a performance benchmarking tool included in the ONNX Runtime source code. It measures the inference performance of ONNX models under various runtime conditions (such as CPU, GPU, or other execution providers).

### Install Required Build Tools

```console
$ tdnf install -y cmake make gcc-c++ git
```
#### Install Protobuf

```console
$ tdnf install -y protobuf protobuf-devel
```
Then verify:
```console
$ protoc --version
```
You should see something like:

```output
libprotoc x.x
```
If the installation fails, or the version is too old for ONNX Runtime, proceed with [installing Protobuf using the Aarch64 pre-built zip artifact](https://github.com/protocolbuffers/protobuf/releases), as described below.

#### Install Protobuf using the Pre-built Aarch64 ZIP Artifact

```console
$ wget https://github.com/protocolbuffers/protobuf/releases/download/vx.x/protoc-x.x-linux-aarch_64.zip -O protoc-x.x.zip
$ mkdir -p $HOME/tools/protoc-x.x
$ unzip protoc-x.x.zip -d $HOME/tools/protoc-x.x
$ echo 'export PATH="$HOME/tools/protoc-x.x/bin:$PATH"' >> ~/.bashrc
$ source ~/.bashrc
```

Then verify:
```console
$ protoc --version
```
Output:
```output
libprotoc x.x
```

### Clone and Build ONNX Runtime from Source

The benchmarking tool, onnxruntime_perf_test, isn’t available as a pre-built binary artifact for any platform, so you have to build it from source. The build is expected to take around 40-50 minutes.
Follow the steps below to build ONNX Runtime from source and then benchmark the model.

Install the required tools and clone onnxruntime:
```console
$ tdnf install -y protobuf protobuf-devel
$ git clone --recursive https://github.com/microsoft/onnxruntime
$ cd onnxruntime
```
Now build the benchmark tool:

```console
$ ./build.sh --config Release --build_dir build/Linux --build_shared_lib --parallel --build --update --skip_tests
```
This builds the benchmark tool at `./build/Linux/Release/onnxruntime_perf_test`.

### Run the benchmark
Now that the benchmarking tool has been built, you can benchmark the **squeezenet-int8.onnx** model as shown below:

```console
$ ./build/Linux/Release/onnxruntime_perf_test -e cpu -r 100 -m times -s -Z -I <path-to-squeezenet-int8.onnx>
```

- **-e cpu**: Use the CPU execution provider (not GPU or another backend).
- **-r 100**: Run 100 inferences.
- **-m times**: Use "repeat N times" mode.
- **-s**: Show detailed statistics.
- **-Z**: Disable intra-op thread spinning (reduces CPU usage when idle between runs).
- **-I**: Input the ONNX model path without using input/output test data.
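If you want to script repeated runs, a small Python wrapper around the built binary can capture its report for later comparison. The binary and model paths here are assumptions matching the build step above; adjust them to your environment.

```python
# Sketch: run onnxruntime_perf_test from Python and capture its report.
# Paths are assumptions; adjust them to your build and model locations.
import subprocess

result = subprocess.run(
    [
        "./build/Linux/Release/onnxruntime_perf_test",
        "-e", "cpu", "-r", "100", "-m", "times", "-s", "-Z",
        "-I", "squeezenet-int8.onnx",
    ],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout)
```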

### Benchmark summary on x86_64

The following benchmark results are collected on two different x86_64 environments: a **Docker container running Azure Linux 3.0 hosted on a D4s_v6 Ubuntu-based Azure VM**, and a **D4s_v4 Azure VM created from the Azure Linux 3.0 image published by Ntegral Inc**.

| **Metric** | **Value on Docker Container** | **Value on Virtual Machine** |
|--------------------------|----------------------------------------|-----------------------------------------|
| **Average Inference Time** | 1.4713 ms | 1.8961 ms |
| **Throughput** | 679.48 inferences/sec | 527.25 inferences/sec |
| **CPU Utilization** | 100% | 95% |
| **Peak Memory Usage** | 39.8 MB | 36.1 MB |
| **P50 Latency** | 1.4622 ms | 1.8709 ms |
| **Max Latency** | 2.3384 ms | 2.7826 ms |
| **Latency Consistency** | Consistent | Consistent |
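
As a quick consistency check on these numbers: in a single-stream benchmark, throughput is approximately the reciprocal of the average inference time, as this small calculation for the Docker container column shows.

```python
# Throughput is roughly 1000 / average latency (ms) for single-stream runs.
avg_latency_ms = 1.4713  # Docker container column above
print(f"{1000 / avg_latency_ms:.2f} inferences/sec")  # ~679.67, close to the reported 679.48
```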


### Benchmark summary on Arm64

The following benchmark results are collected on two different Arm64 environments: a **Docker container running Azure Linux 3.0 hosted on a D4ps_v6 Ubuntu-based Azure VM**, and a **D4ps_v6 Azure VM created from the Azure Linux 3.0 custom image using the Aarch64 ISO**.

| **Metric** | **Value on Docker Container** | **Value on Virtual Machine** |
|---------------------------|---------------------------------------|---------------------------------------------|
| **Average Inference Time**| 1.9183 ms | 1.9169 ms |
| **Throughput** | 521.09 inferences/sec | 521.41 inferences/sec |
| **CPU Utilization** | 98% | 100% |
| **Peak Memory Usage** | 35.36 MB | 33.57 MB |
| **P50 Latency** | 1.9165 ms | 1.9168 ms |
| **Max Latency** | 2.0142 ms | 1.9979 ms |
| **Latency Consistency** | Consistent | Consistent |


### Highlights from Azure Linux Arm64 Benchmarking (ONNX Runtime with SqueezeNet)
- **Low-Latency Inference:** Achieved consistent average inference times of ~1.92 ms across both Docker and VM environments on Arm64.
- **Strong and Stable Throughput:** Sustained throughput of over 521 inferences/sec using the squeezenet-int8.onnx model on D4ps_v6 instances.
- **Lightweight Resource Footprint:** Peak memory usage stayed below 36 MB, with CPU utilization reaching ~98–100%, ideal for efficient edge or cloud inference.
- **Consistent Performance:** P50 and Max latency remained tightly bound across both setups, showcasing reliable performance on Azure Cobalt 100 Arm-based infrastructure.
@@ -0,0 +1,33 @@
---
title: Create an Arm based cloud VM using Microsoft Cobalt 100 CPU
weight: 3

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Introduction

There are several ways to create an Arm-based Cobalt 100 VM: the Microsoft Azure console, the Azure CLI, or your choice of IaC (Infrastructure as Code) tooling. This guide uses the Azure console to create a VM with the Arm-based Cobalt 100 processor.

This Learning Path focuses on the general-purpose D-series VMs. Read the [Dpsv6 size series](https://learn.microsoft.com/en-us/azure/virtual-machines/sizes/general-purpose/dpsv6-series) guide from Microsoft Azure.

If you have never used Microsoft Azure before, review the Microsoft guide [Create a Linux virtual machine in the Azure portal](https://learn.microsoft.com/en-us/azure/virtual-machines/linux/quick-create-portal?tabs=ubuntu).

#### Create an Arm-based Azure Virtual Machine

Creating a virtual machine based on Azure Cobalt 100 is no different from creating any other VM in Azure. To create one, launch the Azure portal and navigate to Virtual Machines.

Select “Create” and fill in the details, such as Name and Region. Choose the image for your VM (for example, Ubuntu 24.04) and select “Arm64” as the VM architecture.

In the “Size” field, select “See all sizes”, choose the D-Series v6 family of VMs, then select “D4ps_v6” from the list and create the VM.

![Instance Screenshot](./instance.png)

Once the VM is ready and running, you can SSH into it using your PEM key and the VM’s public IP address.

{{% notice Note %}}

To learn more about Arm-based VMs in Azure, refer to “Getting Started with Microsoft Azure” in [Get started with Arm-based cloud instances](https://learn.arm.com/learning-paths/servers-and-cloud-computing/csp/azure).

{{% /notice %}}