🐛 Describe the bug
The selective build process allows for a reduction in binary size by pruning operators and dtypes unused by a model. In this flow, an exported model is passed in as a .pte, and a YAML file is generated that specifies which operators the model uses in its kernels and, more specifically, which dtypes each kernel uses. This step is handled by gen_oplist.py. However, the feature appears to be incomplete: when kernels depend on scalar types or on a mix of types, that information is not reflected in the generated YAML. As a result, building directly from an exported model produces a binary that cannot successfully run the model.
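To make the gap concrete, below is a minimal sketch for dumping whatever per-operator information the generated oplist contains, so the missing dtype entries are easy to spot. It assumes PyYAML is installed and that the generated file is named `selected_operators.yaml` (the filename and key names are assumptions; adjust them to whatever your build actually emits).

```python
# Minimal sketch: dump the oplist that gen_oplist.py produces so the
# per-operator dtype information (or its absence, which is the bug
# reported here) is visible. The filename is an assumption.
import yaml

with open("selected_operators.yaml") as f:
    oplist = yaml.safe_load(f)

for section, payload in oplist.items():
    print(f"== {section} ==")
    if isinstance(payload, dict):
        for op_name, meta in payload.items():
            print(f"  {op_name}: {meta}")
    else:
        print(f"  {payload}")
```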
Example: MV2
Reproducing the Error
- Pull the changes from PR #11760 (dtype selective build from model API in OSS).
- Change the model from `add_mul` to `mv2` in `examples/selective_build/test_selective_build.sh:test_cmake_select_ops_in_model` by updating the `model_name` variable. It is also helpful to comment out the other known-working examples (i.e. the calls to `test_cmake_select_all_ops`, `test_cmake_select_ops_in_list`, and `test_cmake_select_ops_in_yaml` at the bottom of the file).
- Run `CMAKE_BUILD_TYPE=Debug bash examples/selective_build/test_selective_build.sh cmake`. (A standalone export sketch follows this list, if it helps with inspection.)
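For reference, here is a rough sketch of producing an equivalent mv2.pte outside the test script, in case it is useful to inspect the exported program that feeds gen_oplist.py directly. It assumes torchvision is installed; the exact export sequence used by the example script may differ, so treat this as an approximation rather than the script's actual flow.

```python
# Rough sketch: export MobileNetV2 to mv2.pte for inspection. This is an
# approximation of what the example script does, not the script itself.
import torch
from executorch.exir import to_edge
from torchvision.models import mobilenet_v2

model = mobilenet_v2(weights=None).eval()
example_inputs = (torch.randn(1, 3, 224, 224),)

exported = torch.export.export(model, example_inputs)
et_program = to_edge(exported).to_executorch()

with open("mv2.pte", "wb") as f:
    f.write(et_program.buffer)
```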
Erroneous Output
Running selective build test
I 00:00:00.007215 executorch:executor_runner.cpp:166] Model file ./mv2.pte is loaded.
I 00:00:00.007265 executorch:executor_runner.cpp:175] Using method forward
I 00:00:00.007268 executorch:executor_runner.cpp:226] Setting up planned buffer 0, size 9936896.
I 00:00:00.013956 executorch:executor_runner.cpp:251] Method loaded.
E 00:00:00.119792 executorch:op_hardtanh.cpp:49] dtype '7' not selected for operator hardtanh.out
examples/selective_build/test_selective_build.sh: line 188: 97834 Aborted (core dumped) ${build_dir}/selective_build_test --model_path="./${model_export_name}"
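For what it's worth, dtype '7' is ScalarType::Double in the standard PyTorch/ExecuTorch scalar-type numbering, which is consistent with the description above that scalar-type dependencies are not reflected in the YAML (hardtanh's min/max Scalar arguments are doubles). The mapping below is just the standard ScalarType ordering for decoding such messages; the link to hardtanh's scalar arguments is my assumption, not something the log states.

```python
# Standard PyTorch/ExecuTorch ScalarType codes, for decoding messages like
# "dtype '7' not selected". The tie to hardtanh's Scalar min/max arguments
# is an assumption about why Double appears for mv2.
SCALAR_TYPES = {
    0: "Byte (uint8)",
    1: "Char (int8)",
    2: "Short (int16)",
    3: "Int (int32)",
    4: "Long (int64)",
    5: "Half (float16)",
    6: "Float (float32)",
    7: "Double (float64)",
    8: "ComplexHalf",
    9: "ComplexFloat",
    10: "ComplexDouble",
    11: "Bool",
}

print(SCALAR_TYPES[7])  # -> "Double (float64)"
```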
Expected Correct Output
Running selective build test
I 00:00:00.006460 executorch:executor_runner.cpp:166] Model file ./mv2.pte is loaded.
I 00:00:00.006514 executorch:executor_runner.cpp:175] Using method forward
I 00:00:00.006518 executorch:executor_runner.cpp:226] Setting up planned buffer 0, size 9936896.
I 00:00:00.012543 executorch:executor_runner.cpp:251] Method loaded.
I 00:00:02.040406 executorch:executor_runner.cpp:286] Model executed successfully 1 time(s) in 2027.465383 ms.
I 00:00:02.040460 executorch:executor_runner.cpp:295] 1 outputs:
Output 0: tensor(sizes=[1, 1000], [
-0.50986, 0.300638, 0.0953863, 0.147721, 0.231201, 0.338555, 0.20689, -0.0575741, -0.389267, -0.0606858,
-0.0213996, -0.121034, -0.288955, 0.134052, -0.171977, -0.060362, 0.0203591, -0.0585306, 0.337859, -0.0718654,
0.490758, 0.524143, 0.197859, 0.122067, -0.35913, 0.10946, 0.347745, 0.478512, 0.226557, 0.0363519,
0.0159222, 0.351968, 0.259108, -0.0542904, 0.285078, -0.221401, 0.237158, -0.37855, 0.395099, -0.0668773,
0.357144, 0.400389, 0.389972, -0.189018, 0.243556, -0.103936, 0.59233, 0.00743124, -0.183807, -0.446251,
-0.182806, -0.679565, 0.663799, 0.560698, 0.36292, -0.0855703, 0.142371, 0.172887, 0.593105, 0.305173,
0.447632, -0.138463, -0.149108, 0.0632436, -0.123253, 0.511503, 0.519203, 0.392346, 0.731631, 0.765339,
0.460779, 0.611433, -0.209274, 0.328234, -0.142376, 0.699485, 0.0476216, 0.562073, 1.51457, 0.82576,
0.126681, 0.0498374, -0.0896502, -0.142817, -0.0252687, 0.00359075, 0.081921, -0.214227, 0.0404567, 0.105458,
-0.26851, -0.0829341, 0.331348, -0.345984, -0.134045, -0.291839, -0.11803, -0.102925, 0.158997, -0.0496262,
...,
-0.188095, -0.694422, 0.220409, -0.0921088, 0.761138, 0.212514, 0.0171788, 0.461986, 0.68566, -0.12282,
0.352448, 2.10309, 0.211247, 0.0732217, -0.366486, -0.500694, -0.00568692, -0.186638, 0.256018, 0.101071,
-0.112591, 0.0633926, 0.519903, -0.54318, -0.223358, 0.155168, -0.230606, -0.1803, -0.402723, -0.102211,
0.331329, -0.0324419, 0.428074, -0.253914, -0.192847, -0.207004, 0.521813, 0.121381, 0.284393, -0.160643,
0.0179822, 0.290285, 0.32836, 0.154162, 0.193863, 0.287697, -0.0284052, -0.119623, 0.955583, 0.581977,
0.808394, 0.669403, 0.272966, 0.16154, 0.379886, 0.212432, -0.325236, 0.100538, 0.292686, -0.382238,
-0.389105, 0.447179, -0.124381, 0.214349, 0.592604, -0.367158, 0.191234, 0.423559, 0.349306, 0.0348439,
-0.227163, 0.567011, 0.202894, 0.710074, 0.421646, -0.00655031, 0.0114807, 0.398907, 0.0349879, -0.163214,
0.187845, -0.154384, -0.227154, 0.150878, 0.265108, 0.0874923, -0.188225, 0.0213076, -0.0293802, -0.279631,
0.421222, 0.100449, -0.506771, -0.115821, -0.693017, -0.18326, 0.154781, -0.410681, 0.0119343, 0.449715,
])
Removing mv2.pte
Versions
Collecting environment information...
PyTorch version: 2.8.0.dev20250601+cpu
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: Ubuntu 24.04.2 LTS (x86_64)
GCC version: (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0
Clang version: Could not collect
CMake version: version 3.31.6
Libc version: glibc-2.39
Python version: 3.10.0 (default, Mar 3 2022, 09:58:08) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-6.6.87.1-microsoft-standard-WSL2-x86_64-with-glibc2.39
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 10
On-line CPU(s) list: 0-9
Vendor ID: GenuineIntel
Model name: Intel(R) Core(TM) Ultra 7 165U
CPU family: 6
Model: 170
Thread(s) per core: 2
Core(s) per socket: 5
Socket(s): 1
Stepping: 4
BogoMIPS: 5376.00
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology tsc_reliable nonstop_tsc cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves avx_vnni vnmi umip waitpkg gfni vaes vpclmulqdq rdpid movdiri movdir64b fsrm md_clear serialize flush_l1d arch_capabilities
Virtualization: VT-x
Hypervisor vendor: Microsoft
Virtualization type: full
L1d cache: 240 KiB (5 instances)
L1i cache: 320 KiB (5 instances)
L2 cache: 10 MiB (5 instances)
L3 cache: 12 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0-9
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Mitigation; Enhanced IBRS
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced / Automatic IBRS; IBPB conditional; RSB filling; PBRSB-eIBRS SW sequence; BHI BHI_DIS_S
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Versions of relevant libraries:
[pip3] executorch==0.7.0a0+64e04ea
[pip3] flake8==6.1.0
[pip3] flake8-breakpoint==1.1.0
[pip3] flake8-bugbear==24.4.26
[pip3] flake8-comprehensions==3.14.0
[pip3] flake8-plugin-utils==1.3.3
[pip3] flake8-pyi==23.5.0
[pip3] mypy==1.14.1
[pip3] mypy_extensions==1.1.0
[pip3] numpy==2.2.6
[pip3] nvidia-cuda-cupti-cu12==12.6.80
[pip3] nvidia-cuda-nvrtc-cu12==12.6.77
[pip3] nvidia-cuda-runtime-cu12==12.6.77
[pip3] nvidia-curand-cu12==10.3.7.77
[pip3] nvidia-cusparselt-cu12==0.6.3
[pip3] nvidia-nccl-cu12==2.26.2
[pip3] nvidia-nvjitlink-cu12==12.6.85
[pip3] nvidia-nvtx-cu12==12.6.77
[pip3] pytorch_tokenizers==0.1.0
[pip3] torch==2.8.0.dev20250601+cpu
[pip3] torchao==0.12.0+gitbc68b11f
[pip3] torchaudio==2.8.0.dev20250601+cpu
[pip3] torchdata==0.11.0
[pip3] torchsr==1.0.4
[pip3] torchtune==0.6.1
[pip3] torchvision==0.23.0.dev20250601+cpu
[pip3] triton==3.3.0
[conda] executorch 0.7.0a0+64e04ea pypi_0 pypi
[conda] numpy 2.2.6 pypi_0 pypi
[conda] nvidia-cuda-cupti-cu12 12.6.80 pypi_0 pypi
[conda] nvidia-cuda-nvrtc-cu12 12.6.77 pypi_0 pypi
[conda] nvidia-cuda-runtime-cu12 12.6.77 pypi_0 pypi
[conda] nvidia-curand-cu12 10.3.7.77 pypi_0 pypi
[conda] nvidia-cusparselt-cu12 0.6.3 pypi_0 pypi
[conda] nvidia-nccl-cu12 2.26.2 pypi_0 pypi
[conda] nvidia-nvjitlink-cu12 12.6.85 pypi_0 pypi
[conda] nvidia-nvtx-cu12 12.6.77 pypi_0 pypi
[conda] pytorch-tokenizers 0.1.0 pypi_0 pypi
[conda] torch 2.8.0.dev20250601+cpu pypi_0 pypi
[conda] torchao 0.12.0+gitbc68b11f pypi_0 pypi
[conda] torchaudio 2.8.0.dev20250601+cpu pypi_0 pypi
[conda] torchdata 0.11.0 pypi_0 pypi
[conda] torchfix 0.6.0 pypi_0 pypi
[conda] torchsr 1.0.4 pypi_0 pypi
[conda] torchtune 0.6.1 pypi_0 pypi
[conda] torchvision 0.23.0.dev20250601+cpu pypi_0 pypi
[conda] triton 3.3.0 pypi_0 pypi