🐛 Describe the bug
The selective build process allows for a reduction in binary size by pruning operators and dtypes unused by a model. In this flow, an exported model is passed in as a .pte, and a YAML file is generated that specifies which operators the model uses in its kernels and, more specifically, which dtypes each kernel uses. This step is handled by gen_oplist.py. However, the feature appears to be incomplete: when kernels depend on scalar types or on a mix of types, that information is not reflected in the generated YAML. As a result, building directly from an exported model produces a binary that cannot successfully run the model.
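To make the gap concrete, below is a minimal sketch for dumping whatever per-operator information the generated oplist contains, so the missing dtype entries are easy to spot. It assumes PyYAML is installed and that the generated file is named `selected_operators.yaml` (the filename and key names are assumptions; adjust them to whatever your build actually emits).

```python
# Minimal sketch: dump the oplist that gen_oplist.py produces so the
# per-operator dtype information (or its absence, which is the bug
# reported here) is visible. The filename is an assumption.
import yaml

with open("selected_operators.yaml") as f:
    oplist = yaml.safe_load(f)

for section, payload in oplist.items():
    print(f"== {section} ==")
    if isinstance(payload, dict):
        for op_name, meta in payload.items():
            print(f"  {op_name}: {meta}")
    else:
        print(f"  {payload}")
```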
Example: MV2
Reproducing the Error
- Pull the changes from PR #11760 (dtype selective build from model API in OSS).
- Change the model from `add_mul` to `mv2` in `examples/selective_build/test_selective_build.sh:test_cmake_select_ops_in_model` by updating the `model_name` variable. It is also helpful to comment out the other known-working examples (i.e. the calls to `test_cmake_select_all_ops`, `test_cmake_select_ops_in_list`, and `test_cmake_select_ops_in_yaml` at the bottom of the file).
- Run `CMAKE_BUILD_TYPE=Debug bash examples/selective_build/test_selective_build.sh cmake`. (A standalone export sketch follows this list, if it helps with inspection.)
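For reference, here is a rough sketch of producing an equivalent mv2.pte outside the test script, in case it is useful to inspect the exported program that feeds gen_oplist.py directly. It assumes torchvision is installed; the exact export sequence used by the example script may differ, so treat this as an approximation rather than the script's actual flow.

```python
# Rough sketch: export MobileNetV2 to mv2.pte for inspection. This is an
# approximation of what the example script does, not the script itself.
import torch
from executorch.exir import to_edge
from torchvision.models import mobilenet_v2

model = mobilenet_v2(weights=None).eval()
example_inputs = (torch.randn(1, 3, 224, 224),)

exported = torch.export.export(model, example_inputs)
et_program = to_edge(exported).to_executorch()

with open("mv2.pte", "wb") as f:
    f.write(et_program.buffer)
```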
Erroneous Output
Running selective build test
I 00:00:00.007215 executorch:executor_runner.cpp:166] Model file ./mv2.pte is loaded.
I 00:00:00.007265 executorch:executor_runner.cpp:175] Using method forward
I 00:00:00.007268 executorch:executor_runner.cpp:226] Setting up planned buffer 0, size 9936896.
I 00:00:00.013956 executorch:executor_runner.cpp:251] Method loaded.
E 00:00:00.119792 executorch:op_hardtanh.cpp:49] dtype '7' not selected for operator hardtanh.out
examples/selective_build/test_selective_build.sh: line 188: 97834 Aborted (core dumped) ${build_dir}/selective_build_test --model_path="./${model_export_name}"
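For what it's worth, dtype '7' is ScalarType::Double in the standard PyTorch/ExecuTorch scalar-type numbering, which is consistent with the description above that scalar-type dependencies are not reflected in the YAML (hardtanh's min/max Scalar arguments are doubles). The mapping below is just the standard ScalarType ordering for decoding such messages; the link to hardtanh's scalar arguments is my assumption, not something the log states.

```python
# Standard PyTorch/ExecuTorch ScalarType codes, for decoding messages like
# "dtype '7' not selected". The tie to hardtanh's Scalar min/max arguments
# is an assumption about why Double appears for mv2.
SCALAR_TYPES = {
    0: "Byte (uint8)",
    1: "Char (int8)",
    2: "Short (int16)",
    3: "Int (int32)",
    4: "Long (int64)",
    5: "Half (float16)",
    6: "Float (float32)",
    7: "Double (float64)",
    8: "ComplexHalf",
    9: "ComplexFloat",
    10: "ComplexDouble",
    11: "Bool",
}

print(SCALAR_TYPES[7])  # -> "Double (float64)"
```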
Expected Correct Output
Running selective build test
I 00:00:00.006460 executorch:executor_runner.cpp:166] Model file ./mv2.pte is loaded.
I 00:00:00.006514 executorch:executor_runner.cpp:175] Using method forward
I 00:00:00.006518 executorch:executor_runner.cpp:226] Setting up planned buffer 0, size 9936896.
I 00:00:00.012543 executorch:executor_runner.cpp:251] Method loaded.
I 00:00:02.040406 executorch:executor_runner.cpp:286] Model executed successfully 1 time(s) in 2027.465383 ms.
I 00:00:02.040460 executorch:executor_runner.cpp:295] 1 outputs:
Output 0: tensor(sizes=[1, 1000], [
-0.50986, 0.300638, 0.0953863, 0.147721, 0.231201, 0.338555, 0.20689, -0.0575741, -0.389267, -0.0606858,
-0.0213996, -0.121034, -0.288955, 0.134052, -0.171977, -0.060362, 0.0203591, -0.0585306, 0.337859, -0.0718654,
0.490758, 0.524143, 0.197859, 0.122067, -0.35913, 0.10946, 0.347745, 0.478512, 0.226557, 0.0363519,
0.0159222, 0.351968, 0.259108, -0.0542904, 0.285078, -0.221401, 0.237158, -0.37855, 0.395099, -0.0668773,
0.357144, 0.400389, 0.389972, -0.189018, 0.243556, -0.103936, 0.59233, 0.00743124, -0.183807, -0.446251,
-0.182806, -0.679565, 0.663799, 0.560698, 0.36292, -0.0855703, 0.142371, 0.172887, 0.593105, 0.305173,
0.447632, -0.138463, -0.149108, 0.0632436, -0.123253, 0.511503, 0.519203, 0.392346, 0.731631, 0.765339,
0.460779, 0.611433, -0.209274, 0.328234, -0.142376, 0.699485, 0.0476216, 0.562073, 1.51457, 0.82576,
0.126681, 0.0498374, -0.0896502, -0.142817, -0.0252687, 0.00359075, 0.081921, -0.214227, 0.0404567, 0.105458,
-0.26851, -0.0829341, 0.331348, -0.345984, -0.134045, -0.291839, -0.11803, -0.102925, 0.158997, -0.0496262,
...,
-0.188095, -0.694422, 0.220409, -0.0921088, 0.761138, 0.212514, 0.0171788, 0.461986, 0.68566, -0.12282,
0.352448, 2.10309, 0.211247, 0.0732217, -0.366486, -0.500694, -0.00568692, -0.186638, 0.256018, 0.101071,
-0.112591, 0.0633926, 0.519903, -0.54318, -0.223358, 0.155168, -0.230606, -0.1803, -0.402723, -0.102211,
0.331329, -0.0324419, 0.428074, -0.253914, -0.192847, -0.207004, 0.521813, 0.121381, 0.284393, -0.160643,
0.0179822, 0.290285, 0.32836, 0.154162, 0.193863, 0.287697, -0.0284052, -0.119623, 0.955583, 0.581977,
0.808394, 0.669403, 0.272966, 0.16154, 0.379886, 0.212432, -0.325236, 0.100538, 0.292686, -0.382238,
-0.389105, 0.447179, -0.124381, 0.214349, 0.592604, -0.367158, 0.191234, 0.423559, 0.349306, 0.0348439,
-0.227163, 0.567011, 0.202894, 0.710074, 0.421646, -0.00655031, 0.0114807, 0.398907, 0.0349879, -0.163214,
0.187845, -0.154384, -0.227154, 0.150878, 0.265108, 0.0874923, -0.188225, 0.0213076, -0.0293802, -0.279631,
0.421222, 0.100449, -0.506771, -0.115821, -0.693017, -0.18326, 0.154781, -0.410681, 0.0119343, 0.449715,
])
Removing mv2.pte
Versions
Collecting environment information...
PyTorch version: 2.8.0.dev20250601+cpu
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: Ubuntu 24.04.2 LTS (x86_64)
GCC version: (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0
Clang version: Could not collect
CMake version: version 3.31.6
Libc version: glibc-2.39
Python version: 3.10.0 (default, Mar 3 2022, 09:58:08) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-6.6.87.1-microsoft-standard-WSL2-x86_64-with-glibc2.39
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 10
On-line CPU(s) list: 0-9
Vendor ID: GenuineIntel
Model name: Intel(R) Core(TM) Ultra 7 165U
CPU family: 6
Model: 170
Thread(s) per core: 2
Core(s) per socket: 5
Socket(s): 1
Stepping: 4
BogoMIPS: 5376.00
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology tsc_reliable nonstop_tsc cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves avx_vnni vnmi umip waitpkg gfni vaes vpclmulqdq rdpid movdiri movdir64b fsrm md_clear serialize flush_l1d arch_capabilities
Virtualization: VT-x
Hypervisor vendor: Microsoft
Virtualization type: full
L1d cache: 240 KiB (5 instances)
L1i cache: 320 KiB (5 instances)
L2 cache: 10 MiB (5 instances)
L3 cache: 12 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0-9
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Mitigation; Enhanced IBRS
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced / Automatic IBRS; IBPB conditional; RSB filling; PBRSB-eIBRS SW sequence; BHI BHI_DIS_S
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Versions of relevant libraries:
[pip3] executorch==0.7.0a0+64e04ea
[pip3] flake8==6.1.0
[pip3] flake8-breakpoint==1.1.0
[pip3] flake8-bugbear==24.4.26
[pip3] flake8-comprehensions==3.14.0
[pip3] flake8-plugin-utils==1.3.3
[pip3] flake8-pyi==23.5.0
[pip3] mypy==1.14.1
[pip3] mypy_extensions==1.1.0
[pip3] numpy==2.2.6
[pip3] nvidia-cuda-cupti-cu12==12.6.80
[pip3] nvidia-cuda-nvrtc-cu12==12.6.77
[pip3] nvidia-cuda-runtime-cu12==12.6.77
[pip3] nvidia-curand-cu12==10.3.7.77
[pip3] nvidia-cusparselt-cu12==0.6.3
[pip3] nvidia-nccl-cu12==2.26.2
[pip3] nvidia-nvjitlink-cu12==12.6.85
[pip3] nvidia-nvtx-cu12==12.6.77
[pip3] pytorch_tokenizers==0.1.0
[pip3] torch==2.8.0.dev20250601+cpu
[pip3] torchao==0.12.0+gitbc68b11f
[pip3] torchaudio==2.8.0.dev20250601+cpu
[pip3] torchdata==0.11.0
[pip3] torchsr==1.0.4
[pip3] torchtune==0.6.1
[pip3] torchvision==0.23.0.dev20250601+cpu
[pip3] triton==3.3.0
[conda] executorch 0.7.0a0+64e04ea pypi_0 pypi
[conda] numpy 2.2.6 pypi_0 pypi
[conda] nvidia-cuda-cupti-cu12 12.6.80 pypi_0 pypi
[conda] nvidia-cuda-nvrtc-cu12 12.6.77 pypi_0 pypi
[conda] nvidia-cuda-runtime-cu12 12.6.77 pypi_0 pypi
[conda] nvidia-curand-cu12 10.3.7.77 pypi_0 pypi
[conda] nvidia-cusparselt-cu12 0.6.3 pypi_0 pypi
[conda] nvidia-nccl-cu12 2.26.2 pypi_0 pypi
[conda] nvidia-nvjitlink-cu12 12.6.85 pypi_0 pypi
[conda] nvidia-nvtx-cu12 12.6.77 pypi_0 pypi
[conda] pytorch-tokenizers 0.1.0 pypi_0 pypi
[conda] torch 2.8.0.dev20250601+cpu pypi_0 pypi
[conda] torchao 0.12.0+gitbc68b11f pypi_0 pypi
[conda] torchaudio 2.8.0.dev20250601+cpu pypi_0 pypi
[conda] torchdata 0.11.0 pypi_0 pypi
[conda] torchfix 0.6.0 pypi_0 pypi
[conda] torchsr 1.0.4 pypi_0 pypi
[conda] torchtune 0.6.1 pypi_0 pypi
[conda] torchvision 0.23.0.dev20250601+cpu pypi_0 pypi
[conda] triton 3.3.0 pypi_0 pypi