Disable torchao kernels by default #12794


Merged 1 commit on Jul 24, 2025
2 changes: 1 addition & 1 deletion .github/workflows/trunk.yml
@@ -325,7 +325,7 @@ jobs:
eval "$(conda shell.bash hook)"

# Install requirements
-${CONDA_RUN} python install_executorch.py
+${CONDA_RUN} EXECUTORCH_BUILD_TORCHAO=1 python install_executorch.py
${CONDA_RUN} sh examples/models/llama/install_requirements.sh

# Run test
7 changes: 6 additions & 1 deletion examples/models/llama/README.md
@@ -338,7 +338,12 @@ Please refer to [this tutorial](https://pytorch.org/executorch/main/llm/llama-de

## Running with low-bit kernels

-We now give instructions for quantizing and running your model with low-bit kernels. These are still experimental and require you to do development on an Arm-based Mac. Also note that low-bit quantization often requires QAT (quantization-aware training) to give good quality results.
+We now give instructions for quantizing and running your model with low-bit kernels. These are still experimental, require you to do development on an Arm-based Mac, and require installing ExecuTorch from source with the environment variable EXECUTORCH_BUILD_TORCHAO=1 defined:
+```
+EXECUTORCH_BUILD_TORCHAO=1 python install_executorch.py
+```
+
+Also note that low-bit quantization often requires QAT (quantization-aware training) to give good quality results.

First export your model for lowbit quantization (step 2 above):

9 changes: 8 additions & 1 deletion install_requirements.py
@@ -106,7 +106,14 @@ def install_requirements(use_pytorch_nightly):
# Install packages directly from local copy instead of pypi.
# This is usually not recommended.
new_env = os.environ.copy()
-    new_env["USE_CPP"] = "1"  # install torchao kernels
+    if ("EXECUTORCH_BUILD_TORCHAO" not in new_env) or (
+        new_env["EXECUTORCH_BUILD_TORCHAO"] == "0"
+    ):
+        new_env["USE_CPP"] = "0"
+    else:
+        assert new_env["EXECUTORCH_BUILD_TORCHAO"] == "1"
+        new_env["USE_CPP"] = "1"
new_env["CMAKE_POLICY_VERSION_MINIMUM"] = "3.5"
subprocess.run(
[
sys.executable,
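The gating added above can be summarized as a small standalone helper. This is an illustrative sketch, not code from the PR; the function name `resolve_use_cpp` is an assumption chosen for clarity:

```python
def resolve_use_cpp(env: dict) -> str:
    """Mirror the PR's gating: build torchao C++ kernels (USE_CPP=1)
    only when EXECUTORCH_BUILD_TORCHAO=1 is explicitly set; default off."""
    flag = env.get("EXECUTORCH_BUILD_TORCHAO", "0")
    if flag == "0":
        return "0"  # default: skip torchao kernels
    if flag == "1":
        return "1"  # opt-in: build torchao C++ kernels
    # The PR asserts the flag is exactly "1" when present and nonzero,
    # so any other value is treated as an error here as well.
    raise ValueError(f"EXECUTORCH_BUILD_TORCHAO must be '0' or '1', got {flag!r}")
```

The net effect is that torchao kernels become opt-in: an unset or `"0"` flag now disables the C++ build that was previously forced on.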
3 changes: 3 additions & 0 deletions tools/cmake/preset/llm.cmake
@@ -20,6 +20,9 @@ set_overridable_option(EXECUTORCH_BUILD_XNNPACK ON)
if(CMAKE_SYSTEM_NAME STREQUAL "Darwin")
set_overridable_option(EXECUTORCH_BUILD_COREML ON)
set_overridable_option(EXECUTORCH_BUILD_MPS ON)
+  if(CMAKE_SYSTEM_PROCESSOR STREQUAL "arm64")
+    set_overridable_option(EXECUTORCH_BUILD_TORCHAO ON)
+  endif()
elseif(CMAKE_SYSTEM_NAME STREQUAL "Linux")
# Linux-specific code here
elseif(CMAKE_SYSTEM_NAME STREQUAL "Windows" OR CMAKE_SYSTEM_NAME STREQUAL "WIN32")