
Commit d25c35a

Arm backend: Split Arm tutorial into ethosu and vgf (#14299)
Align with minimal examples with regards to content and code. Signed-off-by: Erik Lundell <[email protected]>
1 parent 6e63c47 commit d25c35a

4 files changed

+443
-468
lines changed

docs/source/index.md

Lines changed: 3 additions & 1 deletion
@@ -149,7 +149,8 @@ using-executorch-faqs
149149
150150
Building an ExecuTorch Android Demo App <https://github.com/pytorch-labs/executorch-examples/tree/main/dl3/android/DeepLabV3Demo#executorch-android-demo-app>
151151
Building an ExecuTorch iOS Demo App <https://github.com/meta-pytorch/executorch-examples/tree/main/mv3/apple/ExecuTorchDemo>
152-
tutorial-arm.md
152+
tutorial-arm-ethos-u
153+
tutorial-arm-vgf
153154
```
154155

155156
```{toctree}
@@ -164,6 +165,7 @@ backends-coreml
164165
backends-mps
165166
backends-vulkan
166167
backends-arm-ethos-u
168+
backends-arm-vgf
167169
backends-qualcomm
168170
backends-mediatek
169171
backends-cadence
Lines changed: 220 additions & 0 deletions
@@ -0,0 +1,220 @@
1+
# Arm Ethos-U NPU Backend Tutorial
2+
3+
<!----This will show a grid card on the page----->
4+
::::{grid} 2
5+
6+
:::{grid-item-card} Tutorials we recommend you complete before this:
7+
:class-card: card-prerequisites
8+
* [Introduction to ExecuTorch](intro-how-it-works.md)
9+
* [Getting Started](getting-started.md)
10+
* [Building ExecuTorch with CMake](using-executorch-building-from-source.md)
11+
:::
12+
13+
:::{grid-item-card} What you will learn in this tutorial:
14+
:class-card: card-prerequisites
15+
In this tutorial you will learn how to export a simple PyTorch model for the ExecuTorch Ethos-U backend.
16+
:::
17+
18+
::::
19+
20+
```{warning}
21+
This delegate is under active development; for best results, use a recent version.
22+
The TOSA and Ethos-U backend support is reasonably mature and used in production by some users.
23+
You may encounter rough edges, and some features may be documented or planned but not yet implemented. Refer to the in-tree documentation for the latest feature status.
24+
```
25+
26+
```{tip}
27+
If you are already familiar with this delegate, you may want to jump directly to the examples:
28+
* [Examples in the ExecuTorch repository](https://github.com/pytorch/executorch/tree/main/examples/arm)
29+
* [A commandline compiler for example models](https://github.com/pytorch/executorch/blob/main/examples/arm/aot_arm_compiler.py)
30+
```
31+
32+
This tutorial serves as an introduction to using ExecuTorch to deploy PyTorch models on Arm&reg; Ethos&trade;-U targets. It is based on `ethos_u_minimal_example.ipynb`, provided in Arm’s examples folder.
33+
34+
## Prerequisites
35+
36+
### Hardware
37+
38+
To successfully complete this tutorial, you will need a Linux machine with aarch64 or x86_64 processor architecture, or a macOS&trade; machine with Apple&reg; Silicon.
39+
40+
To enable development without a specific development board, we will use a [Fixed Virtual Platform (FVP)](https://www.arm.com/products/development-tools/simulation/fixed-virtual-platforms) that simulates the [Arm&reg; Corstone&trade;-300](https://developer.arm.com/Processors/Corstone-300) (cs300) and [Arm&reg; Corstone&trade;-320](https://developer.arm.com/Processors/Corstone-320) (cs320) systems. Think of it as virtual hardware.
41+
42+
### Software
43+
44+
First, you will need to install ExecuTorch. Please follow the recommended tutorials to set up a working ExecuTorch development environment.
45+
46+
In addition to this, you need to install a number of SDK dependencies for generating Ethos-U command streams. Scripts to automate this are available in the main [ExecuTorch repository](https://github.com/pytorch/executorch/tree/main/examples/arm/).
47+
To install Ethos-U dependencies, run
48+
```bash
49+
./examples/arm/setup.sh --i-agree-to-the-contained-eula
50+
```
51+
This will install:
52+
- [TOSA Serialization Library](https://www.mlplatform.org/tosa/software.html) for serializing the Exir IR graph into TOSA IR.
53+
- [Ethos-U Vela graph compiler](https://pypi.org/project/ethos-u-vela/) for compiling TOSA flatbuffers into an Ethos-U command stream.
54+
- [Arm GNU Toolchain](https://developer.arm.com/Tools%20and%20Software/GNU%20Toolchain) for cross compilation.
55+
- [Corstone SSE-300 FVP](https://developer.arm.com/documentation/100966/1128/Arm--Corstone-SSE-300-FVP) for testing on Ethos-U55 reference design.
56+
- [Corstone SSE-320 FVP](https://developer.arm.com/documentation/109760/0000/SSE-320-FVP) for testing on Ethos-U85 reference design.
57+
58+
## Set Up the Developer Environment
59+
60+
The `setup.sh` script generates a `setup_path.sh` script that you need to source whenever you restart your shell. Run:
61+
62+
```bash
63+
source examples/arm/ethos-u-scratch/setup_path.sh
64+
```
65+
66+
As a simple check that your environment is set up correctly, run `which FVP_Corstone_SSE-320` and make sure that the executable is located where you expect, in the `examples/arm` tree.
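The same check can be scripted. The following is a minimal sketch using only the Python standard library; `FVP_Corstone_SSE-320` is the executable named above, while `arm-none-eabi-gcc` is an assumed name for the cross compiler that `setup.sh` installs, and the helper functions are hypothetical, not part of ExecuTorch.

```python
import shutil

# FVP_Corstone_SSE-320 is the executable named in this tutorial.
# arm-none-eabi-gcc is an ASSUMED name for the installed cross compiler.
DEFAULT_TOOLS = ("FVP_Corstone_SSE-320", "arm-none-eabi-gcc")


def locate_tools(names=DEFAULT_TOOLS):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


def missing_tools(names=DEFAULT_TOOLS):
    """Return the tool names that could not be found on PATH."""
    return [name for name, path in locate_tools(names).items() if path is None]


# Print where each tool resolves, mirroring the `which` check above.
for tool, path in locate_tools().items():
    print(f"{tool}: {path or 'not found (was setup_path.sh sourced?)'}")
```

If a tool is reported as not found, sourcing `setup_path.sh` again in the current shell is the first thing to try.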
67+
68+
## Build
69+
70+
### Ahead-of-Time (AOT) components
71+
72+
The ExecuTorch Ahead-of-Time (AOT) pipeline takes a PyTorch model (a `torch.nn.Module`) and produces a `.pte` binary file, which is then consumed by the ExecuTorch runtime. This [document](getting-started-architecture.md) describes the ExecuTorch software stack in much more depth, for both the AOT pipeline and the runtime.
73+
74+
The example below shows how to quantize a model consisting of a single addition and export it through the AOT flow using the Ethos-U backend. For more details, see `examples/arm/ethos_u_minimal_example.ipynb`.
75+
76+
```python
77+
import torch
78+
79+
class Add(torch.nn.Module):
80+
    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
81+
        return x + y
82+
83+
example_inputs = (torch.ones(1,1,1,1),torch.ones(1,1,1,1))
84+
85+
model = Add()
86+
model = model.eval()
87+
exported_program = torch.export.export(model, example_inputs)
88+
graph_module = exported_program.module()
89+
90+
91+
from executorch.backends.arm.ethosu import EthosUCompileSpec
92+
from executorch.backends.arm.quantizer import (
93+
    EthosUQuantizer,
94+
    get_symmetric_quantization_config,
95+
)
96+
from torchao.quantization.pt2e.quantize_pt2e import convert_pt2e, prepare_pt2e
97+
98+
# Create a compilation spec describing the target for configuring the quantizer
99+
# Some args are used by the Arm Vela graph compiler later in the example. Refer to Arm Vela documentation for an
100+
# explanation of its flags: https://gitlab.arm.com/artificial-intelligence/ethos-u/ethos-u-vela/-/blob/main/OPTIONS.md
101+
compile_spec = EthosUCompileSpec(
102+
    target="ethos-u55-128",
103+
    system_config="Ethos_U55_High_End_Embedded",
104+
    memory_mode="Shared_Sram",
105+
    extra_flags=["--output-format=raw", "--debug-force-regor"]
106+
)
107+
108+
# Create and configure quantizer to use a symmetric quantization config globally on all nodes
109+
quantizer = EthosUQuantizer(compile_spec)
110+
operator_config = get_symmetric_quantization_config()
111+
quantizer.set_global(operator_config)
112+
113+
# Post training quantization
114+
quantized_graph_module = prepare_pt2e(graph_module, quantizer)
115+
quantized_graph_module(*example_inputs) # Calibrate the graph module with the example input
116+
quantized_graph_module = convert_pt2e(quantized_graph_module)
117+
118+
119+
# Create a new exported program using the quantized_graph_module
120+
quantized_exported_program = torch.export.export(quantized_graph_module, example_inputs)
121+
from executorch.backends.arm.ethosu import EthosUPartitioner
122+
from executorch.exir import (
123+
    EdgeCompileConfig,
123+
    ExecutorchBackendConfig,
124+
    to_edge_transform_and_lower,
126+
)
127+
from executorch.extension.export_util.utils import save_pte_program
128+
129+
# Create partitioner from compile spec
130+
partitioner = EthosUPartitioner(compile_spec)
131+
132+
# Lower the exported program to the Ethos-U backend
133+
edge_program_manager = to_edge_transform_and_lower(
134+
    quantized_exported_program,
134+
    partitioner=[partitioner],
135+
    compile_config=EdgeCompileConfig(
136+
        _check_ir_validity=False,
137+
    ),
139+
)
140+
141+
# Convert edge program to executorch
142+
executorch_program_manager = edge_program_manager.to_executorch(
143+
    config=ExecutorchBackendConfig(extract_delegate_segments=False)
144+
)
145+
146+
147+
# Save pte file
148+
save_pte_program(executorch_program_manager, "ethos_u_minimal_example.pte")
149+
```
150+
151+
152+
```{tip}
153+
For a quick start, you can use the script `examples/arm/aot_arm_compiler.py` to produce the pte.
154+
To produce a pte file equivalent to the one above, run
155+
`python -m examples.arm.aot_arm_compiler --model_name=add --delegate --quantize --output=ethos_u_minimal_example.pte`
156+
```
157+
158+
### Runtime
159+
160+
After the AOT compilation flow is done, the runtime can be cross-compiled and linked to the produced `.pte` file using the Arm cross-compilation toolchain. This is done in two steps:
161+
162+
First, build and install the ExecuTorch libraries and EthosUDelegate:
163+
```bash
164+
# In ExecuTorch top-level, with sourced setup_path.sh
165+
cmake -DCMAKE_BUILD_TYPE=Release --preset arm-baremetal -B cmake-out-arm .
166+
cmake --build cmake-out-arm --target install -j$(nproc)
167+
```
168+
Second, build and link the `arm_executor_runner` and generate kernel bindings for any non-delegated ops. This is the actual program that will run on the target.
169+
170+
```bash
171+
# In ExecuTorch top-level, with sourced setup_path.sh
172+
cmake -DCMAKE_TOOLCHAIN_FILE=`pwd`/examples/arm/ethos-u-setup/arm-none-eabi-gcc.cmake \
173+
-DCMAKE_BUILD_TYPE=Release \
174+
-DET_PTE_FILE_PATH=ethos_u_minimal_example.pte \
175+
-DTARGET_CPU=cortex-m55 \
176+
-DETHOSU_TARGET_NPU_CONFIG=ethos-u55-128 \
177+
-DMEMORY_MODE=Shared_Sram \
178+
-DSYSTEM_CONFIG=Ethos_U55_High_End_Embedded \
179+
-Bethos_u_minimal_example \
180+
examples/arm/executor_runner
181+
cmake --build ethos_u_minimal_example -j$(nproc) -- arm_executor_runner
182+
```
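In this example, the target, system configuration, and memory mode passed to CMake have the same values as those given to `EthosUCompileSpec` in the AOT step. One way to keep the two in step is to derive the configure command from a single description of the target. The sketch below does that in plain Python; `EthosUTarget` and `runner_cmake_args` are hypothetical helpers written for illustration, not part of ExecuTorch, and the flag names are copied from the command above.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class EthosUTarget:
    """Hypothetical container for the values shared by the AOT and runtime steps."""
    npu_config: str = "ethos-u55-128"
    system_config: str = "Ethos_U55_High_End_Embedded"
    memory_mode: str = "Shared_Sram"
    cpu: str = "cortex-m55"


def runner_cmake_args(target, pte_path, build_dir):
    """Assemble the configure command shown above as an argument list."""
    toolchain = Path.cwd() / "examples/arm/ethos-u-setup/arm-none-eabi-gcc.cmake"
    return [
        "cmake",
        f"-DCMAKE_TOOLCHAIN_FILE={toolchain}",
        "-DCMAKE_BUILD_TYPE=Release",
        f"-DET_PTE_FILE_PATH={pte_path}",
        f"-DTARGET_CPU={target.cpu}",
        f"-DETHOSU_TARGET_NPU_CONFIG={target.npu_config}",
        f"-DMEMORY_MODE={target.memory_mode}",
        f"-DSYSTEM_CONFIG={target.system_config}",
        f"-B{build_dir}",
        "examples/arm/executor_runner",
    ]


args = runner_cmake_args(
    EthosUTarget(), "ethos_u_minimal_example.pte", "ethos_u_minimal_example"
)
print(" ".join(args))
```

The resulting list can be passed to `subprocess.run(args, check=True)` from the ExecuTorch top-level directory, with `setup_path.sh` already sourced in the calling shell.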
183+
184+
```{tip}
185+
For a quick start, you can use the script `backends/arm/scripts/build_executor_runner.sh` to build the runner.
186+
To build a runner equivalent to the one above, run
187+
`./backends/arm/scripts/build_executor_runner.sh --pte=ethos_u_minimal_example.pte`
188+
```
189+
190+
The block diagram below shows, at a high level, how the various build artifacts are generated and linked together to produce the final bare-metal executable.
191+
192+
![](arm-delegate-runtime-build.svg)
193+
194+
195+
196+
## Running on Corstone FVP Platforms
197+
198+
Finally, use the `backends/arm/scripts/run_fvp.sh` utility script to run the `.elf` file on simulated Arm hardware.
199+
```bash
200+
backends/arm/scripts/run_fvp.sh --elf=$(find ethos_u_minimal_example -name arm_executor_runner) --target=ethos-u55-128
201+
```
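The `find` call in the command above locates the runner binary inside the build directory. When driving the flow from Python, the same lookup can be sketched with `pathlib`; `find_runner` and `run_fvp_command` are hypothetical helpers, and unlike `find` this version returns only the first match (in sorted order) if several files share the name.

```python
from pathlib import Path


def find_runner(build_dir, name="arm_executor_runner"):
    """Return the first file called `name` anywhere under `build_dir`."""
    matches = sorted(p for p in Path(build_dir).rglob(name) if p.is_file())
    if not matches:
        raise FileNotFoundError(f"no {name!r} found under {build_dir!r}")
    return matches[0]


def run_fvp_command(build_dir, target="ethos-u55-128"):
    """Assemble the run_fvp.sh invocation shown above as an argument list."""
    return [
        "backends/arm/scripts/run_fvp.sh",
        f"--elf={find_runner(build_dir)}",
        f"--target={target}",
    ]
```

Calling `run_fvp_command("ethos_u_minimal_example")` after a successful build yields a list suitable for `subprocess.run`, and raises `FileNotFoundError` early if the runner was never built.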
202+
The example application is built with an input of ones by default, so the result of the quantized addition should be close to 2.
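To see why the result is "close to" 2 rather than exactly 2, the following is a simplified sketch of int8 quantize and dequantize arithmetic in plain Python. It is not the backend's actual observers or kernels: the scales `IN_SCALE` and `OUT_SCALE` are made-up values chosen for illustration, and a zero point of 0 is assumed throughout.

```python
QMIN, QMAX = -128, 127  # int8 range


def quantize(x, scale):
    """Map a real value to an int8 code (zero point assumed to be 0)."""
    return max(QMIN, min(QMAX, round(x / scale)))


def dequantize(q, scale):
    """Map an int8 code back to a real value."""
    return q * scale


# Hypothetical scales; a real flow derives them from calibration data.
IN_SCALE = 0.0123
OUT_SCALE = 0.0201


def quantized_add(x, y):
    """Add two reals through a quantize -> add -> requantize round trip."""
    x_hat = dequantize(quantize(x, IN_SCALE), IN_SCALE)
    y_hat = dequantize(quantize(y, IN_SCALE), IN_SCALE)
    return dequantize(quantize(x_hat + y_hat, OUT_SCALE), OUT_SCALE)


result = quantized_add(1.0, 1.0)
print(f"quantized 1 + 1 = {result:.4f}")  # near 2, with a small rounding error
```

Each rounding step introduces an error of at most half a scale step, which is why the output lands near 2 without matching it exactly.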
203+
204+
205+
## Takeaways
206+
207+
In this tutorial you have learned how to use ExecuTorch to export a PyTorch model to an executable that can run on an embedded target, and then run that executable on simulated hardware.
208+
To learn more, check out these learning paths:
209+
210+
https://learn.arm.com/learning-paths/embedded-and-microcontrollers/rpi-llama3/
211+
https://learn.arm.com/learning-paths/embedded-and-microcontrollers/visualizing-ethos-u-performance/
212+
213+
## FAQs
214+
215+
If you encounter any bugs or issues while following this tutorial, please file an issue on [GitHub](https://github.com/pytorch/executorch/issues/new).
216+
217+
218+
```
219+
Arm is a registered trademark of Arm Limited (or its subsidiaries or affiliates).
220+
```
