# Arm Ethos-U NPU Backend Tutorial

<!----This will show a grid card on the page----->
::::{grid} 2

:::{grid-item-card} Tutorials we recommend you complete before this:
:class-card: card-prerequisites
* [Introduction to ExecuTorch](intro-how-it-works.md)
* [Getting Started](getting-started.md)
* [Building ExecuTorch with CMake](using-executorch-building-from-source.md)
:::

:::{grid-item-card} What you will learn in this tutorial:
:class-card: card-prerequisites
In this tutorial you will learn how to export a simple PyTorch model for the ExecuTorch Ethos-U backend.
:::

::::

```{warning}
This delegate is under active development; for best results, please use a recent version.
The TOSA and Ethos-U backend support is reasonably mature and used in production by some users.
You may still encounter rough edges, and some features may be documented or planned but not yet implemented. Please refer to the in-tree documentation for the latest status of features.
```

```{tip}
If you are already familiar with this delegate, you may want to jump directly to the examples:
* [Examples in the ExecuTorch repository](https://github.com/pytorch/executorch/tree/main/examples/arm)
* [A command-line compiler for example models](https://github.com/pytorch/executorch/blob/main/examples/arm/aot_arm_compiler.py)
```

This tutorial serves as an introduction to using ExecuTorch to deploy PyTorch models on Arm® Ethos™-U targets. It is based on `ethos_u_minimal_example.ipynb`, provided in Arm's examples folder.

## Prerequisites

### Hardware

To successfully complete this tutorial, you will need a Linux machine with an aarch64 or x86_64 processor architecture, or a macOS™ machine with Apple® Silicon.

To enable development without a specific development board, we will be using a [Fixed Virtual Platform (FVP)](https://www.arm.com/products/development-tools/simulation/fixed-virtual-platforms), simulating the [Arm® Corstone™-300](https://developer.arm.com/Processors/Corstone-300) (cs300) and [Arm® Corstone™-320](https://developer.arm.com/Processors/Corstone-320) (cs320) systems. Think of it as virtual hardware.

### Software

First, you will need to install ExecuTorch. Please follow the recommended tutorials to set up a working ExecuTorch development environment.

In addition to this, you need to install a number of SDK dependencies for generating Ethos-U command streams. Scripts to automate this are available in the main [ExecuTorch repository](https://github.com/pytorch/executorch/tree/main/examples/arm/).
To install the Ethos-U dependencies, run
```bash
./examples/arm/setup.sh --i-agree-to-the-contained-eula
```
This will install:
- [TOSA Serialization Library](https://www.mlplatform.org/tosa/software.html) for serializing the Exir IR graph into TOSA IR.
- [Ethos-U Vela graph compiler](https://pypi.org/project/ethos-u-vela/) for compiling TOSA flatbuffers into an Ethos-U command stream.
- [Arm GNU Toolchain](https://developer.arm.com/Tools%20and%20Software/GNU%20Toolchain) for cross-compilation.
- [Corstone SSE-300 FVP](https://developer.arm.com/documentation/100966/1128/Arm--Corstone-SSE-300-FVP) for testing on the Ethos-U55 reference design.
- [Corstone SSE-320 FVP](https://developer.arm.com/documentation/109760/0000/SSE-320-FVP) for testing on the Ethos-U85 reference design.

## Set Up the Developer Environment

The `setup.sh` script generates a `setup_path.sh` script that you need to source whenever you restart your shell. Run:

```bash
source examples/arm/ethos-u-scratch/setup_path.sh
```

As a simple check that your environment is set up correctly, run `which FVP_Corstone_SSE-320` and make sure that the executable is located where you expect, in the `examples/arm` tree.

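As a further sanity check, the short Python sketch below looks up the installed tools on your `PATH`. The exact tool names are assumptions based on the components listed above; adjust them to match your installation.

```python
# Hedged sanity check: confirm that the tools installed by setup.sh are on
# PATH after sourcing setup_path.sh. The tool names below are assumptions
# based on the components listed earlier; adjust them to your installation.
import shutil

tools = ("vela", "arm-none-eabi-gcc", "FVP_Corstone_SSE-300", "FVP_Corstone_SSE-320")
for tool in tools:
    path = shutil.which(tool)
    print(f"{tool}: {path or 'NOT FOUND - did you source setup_path.sh?'}")
```

If any tool prints `NOT FOUND`, re-run `setup.sh` and source `setup_path.sh` again before continuing.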
## Build

### Ahead-of-Time (AOT) components

The ExecuTorch Ahead-of-Time (AOT) pipeline takes a PyTorch model (a `torch.nn.Module`) and produces a `.pte` binary file, which is then consumed by the ExecuTorch Runtime. This [document](getting-started-architecture.md) goes into much more depth about the ExecuTorch software stack for both the AOT flow and the Runtime.

The example below shows how to quantize a model consisting of a single addition, and export it through the AOT flow using the Ethos-U backend. For more details, see `examples/arm/ethos_u_minimal_example.ipynb`.

```python
import torch

class Add(torch.nn.Module):
    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        return x + y

example_inputs = (torch.ones(1, 1, 1, 1), torch.ones(1, 1, 1, 1))

model = Add()
model = model.eval()
exported_program = torch.export.export(model, example_inputs)
graph_module = exported_program.module()


from executorch.backends.arm.ethosu import EthosUCompileSpec
from executorch.backends.arm.quantizer import (
    EthosUQuantizer,
    get_symmetric_quantization_config,
)
from torchao.quantization.pt2e.quantize_pt2e import convert_pt2e, prepare_pt2e

# Create a compilation spec describing the target, used to configure the quantizer.
# Some args are used by the Arm Vela graph compiler later in the example. Refer to the Arm Vela
# documentation for an explanation of its flags:
# https://gitlab.arm.com/artificial-intelligence/ethos-u/ethos-u-vela/-/blob/main/OPTIONS.md
compile_spec = EthosUCompileSpec(
    target="ethos-u55-128",
    system_config="Ethos_U55_High_End_Embedded",
    memory_mode="Shared_Sram",
    extra_flags=["--output-format=raw", "--debug-force-regor"],
)

# Create and configure the quantizer to use a symmetric quantization config globally on all nodes
quantizer = EthosUQuantizer(compile_spec)
operator_config = get_symmetric_quantization_config()
quantizer.set_global(operator_config)

# Post-training quantization
quantized_graph_module = prepare_pt2e(graph_module, quantizer)
quantized_graph_module(*example_inputs)  # Calibrate the graph module with the example inputs
quantized_graph_module = convert_pt2e(quantized_graph_module)


# Create a new exported program using the quantized_graph_module
quantized_exported_program = torch.export.export(quantized_graph_module, example_inputs)

from executorch.backends.arm.ethosu import EthosUPartitioner
from executorch.exir import (
    EdgeCompileConfig,
    ExecutorchBackendConfig,
    to_edge_transform_and_lower,
)
from executorch.extension.export_util.utils import save_pte_program

# Create the partitioner from the compile spec
partitioner = EthosUPartitioner(compile_spec)

# Lower the exported program to the Ethos-U backend
edge_program_manager = to_edge_transform_and_lower(
    quantized_exported_program,
    partitioner=[partitioner],
    compile_config=EdgeCompileConfig(
        _check_ir_validity=False,
    ),
)

# Convert the edge program to an ExecuTorch program
executorch_program_manager = edge_program_manager.to_executorch(
    config=ExecutorchBackendConfig(extract_delegate_segments=False)
)


# Save the pte file
save_pte_program(executorch_program_manager, "ethos_u_minimal_example.pte")
```

```{tip}
For a quick start, you can use the script `examples/arm/aot_arm_compiler.py` to produce the pte file.
To produce a pte file equivalent to the one above, run
`python -m examples.arm.aot_arm_compiler --model_name=add --delegate --quantize --output=ethos_u_minimal_example.pte`
```

### Runtime

After the AOT compilation flow is done, the runtime can be cross-compiled and linked to the produced `.pte` file using the Arm cross-compilation toolchain. This is done in two steps:

First, build and install the ExecuTorch libraries and the EthosUDelegate:
```bash
# In the ExecuTorch top-level directory, with setup_path.sh sourced
cmake -DCMAKE_BUILD_TYPE=Release --preset arm-baremetal -B cmake-out-arm .
cmake --build cmake-out-arm --target install -j$(nproc)
```
Second, build and link the `arm_executor_runner` and generate kernel bindings for any non-delegated ops. This is the actual program that will run on target.

```bash
# In the ExecuTorch top-level directory, with setup_path.sh sourced
cmake -DCMAKE_TOOLCHAIN_FILE=`pwd`/examples/arm/ethos-u-setup/arm-none-eabi-gcc.cmake \
    -DCMAKE_BUILD_TYPE=Release \
    -DET_PTE_FILE_PATH=ethos_u_minimal_example.pte \
    -DTARGET_CPU=cortex-m55 \
    -DETHOSU_TARGET_NPU_CONFIG=ethos-u55-128 \
    -DMEMORY_MODE=Shared_Sram \
    -DSYSTEM_CONFIG=Ethos_U55_High_End_Embedded \
    -Bethos_u_minimal_example \
    examples/arm/executor_runner
cmake --build ethos_u_minimal_example -j$(nproc) -- arm_executor_runner
```

```{tip}
For a quick start, you can use the script `backends/arm/scripts/build_executor_runner.sh` to build the runner.
To build a runner equivalent to the one above, run
`./backends/arm/scripts/build_executor_runner.sh --pte=ethos_u_minimal_example.pte`
```

The block diagram below shows, at a high level, how the various build artifacts are generated and linked together to produce the final bare-metal executable.

![](arm-delegate-runtime-build.svg)

## Running on Corstone FVP Platforms

Finally, use the `backends/arm/scripts/run_fvp.sh` utility script to run the `.elf` file on simulated Arm hardware.
```bash
backends/arm/scripts/run_fvp.sh --elf=$(find ethos_u_minimal_example -name arm_executor_runner) --target=ethos-u55-128
```
The example application is by default built with inputs of ones, so the expected result of the quantized addition should be close to 2.

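To see why the result is approximately rather than exactly 2, the sketch below mimics the integer arithmetic that a symmetric int8 quantization scheme performs for this addition. The scales used here are illustrative assumptions, not the values the quantizer actually calibrates.

```python
# Illustrative sketch of symmetric int8 quantization arithmetic. The scales
# are assumed for this example and are not taken from the real quantizer.
INT8_MIN, INT8_MAX = -128, 127

def quantize(x: float, scale: float) -> int:
    """Map a float to an int8 value using a symmetric scale."""
    return max(INT8_MIN, min(INT8_MAX, round(x / scale)))

def dequantize(q: int, scale: float) -> float:
    return q * scale

in_scale = 1.0 / 127.0   # assumed input scale after calibrating on ones
out_scale = 2.0 / 127.0  # assumed output scale after calibrating on the sum

x_q = quantize(1.0, in_scale)
y_q = quantize(1.0, in_scale)
# The NPU adds in a wide integer accumulator, then requantizes the result
# to the output scale and clamps it back into int8 range.
acc = x_q + y_q
out_q = max(INT8_MIN, min(INT8_MAX, round(acc * in_scale / out_scale)))
result = dequantize(out_q, out_scale)
print(result)  # within one quantization step of 2.0
```

Any error relative to the float result is bounded by the output quantization step, which is why the FVP run above is expected to print a value close to, but not necessarily exactly, 2.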
## Takeaways

In this tutorial you have learned how to use ExecuTorch to export a PyTorch model to an executable that can run on an embedded target, and then run that executable on simulated hardware.
To learn more, check out these learning paths:

* <https://learn.arm.com/learning-paths/embedded-and-microcontrollers/rpi-llama3/>
* <https://learn.arm.com/learning-paths/embedded-and-microcontrollers/visualizing-ethos-u-performance/>

## FAQs

If you encounter any bugs or issues following this tutorial, please file an issue on [GitHub](https://github.com/pytorch/executorch/issues/new).

```
Arm is a registered trademark of Arm Limited (or its subsidiaries or affiliates).
```