[Android/XNNPACK] SIGSEGV in XNNWeightsCache::look_up_or_insert during memcmp on MediaTek Dimensity 6100+ (Galaxy M15) #17669

@lordlugo

🐛 Describe the Bug

1. System Information

| Field | Detail |
| --- | --- |
| ExecuTorch Version | 1.1.0 |
| OS Platform | Android 14 (One UI 6) |
| Target Architecture | arm64-v8a |
| Crashing Hardware | Samsung Galaxy M15 5G (MediaTek Dimensity 6100+ / Mali-G57 MC2) |
| Working Control Hardware | Samsung Galaxy A16 4G (MediaTek Helio G99 / Mali-G57 MC2) |
| RAM Variants Tested | Issue occurs on 4 GB / 6 GB / 8 GB variants of the M15 |

2. Expected Behavior

When module.forward() is called on a background HandlerThread, the XNNPACK delegate should parse the memory-mapped .pte file, repack the convolution weights into the XNNWeightsCache, and execute the inference graph. On the control device (Galaxy A16), this succeeds on more than 100 consecutive runs.


3. Actual Behavior & Crash Log

On the Galaxy M15, the application terminates immediately with a native segmentation fault (SIGSEGV) while the XNNPACK backend is initializing. The faulting frame is bionic's memory comparison routine (__memcmp_aarch64), called from XNNWeightsCache::look_up_or_insert while XNNPACK looks up packed weights in the weights cache.
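To illustrate how a cache lookup can end up inside memcmp, here is a minimal sketch of a look-up-or-insert pattern. It is not the ExecuTorch implementation: `ToyWeightsCache`, its string key, and the `kNotFound` sentinel are hypothetical names chosen for this example. The point is only that memcmp reads `size` bytes from both the cached copy and the caller's buffer, so a stale entry, a bad pointer, or a wrong size would fault at that call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a packed-weights cache (illustration only).
class ToyWeightsCache {
 public:
  static constexpr std::size_t kNotFound = static_cast<std::size_t>(-1);

  // Returns the offset of the cached copy of `packed`, inserting it if the
  // key is new or the cached bytes differ. Returns kNotFound for arguments
  // that cannot be compared safely.
  std::size_t look_up_or_insert(const std::string& key, const void* packed,
                                std::size_t size) {
    if (packed == nullptr || size == 0) return kNotFound;

    auto it = entries_.find(key);
    if (it != entries_.end()) {
      const Entry& e = it->second;
      // Invariant: a cached entry must lie entirely inside the storage.
      assert(e.offset + e.size <= storage_.size());
      // memcmp reads `size` bytes from BOTH pointers; this is the kind of
      // call that faults if either side is invalid for that length.
      if (e.size == size &&
          std::memcmp(storage_.data() + e.offset, packed, size) == 0) {
        return e.offset;  // cache hit: identical bytes already stored
      }
    }

    // Miss or mismatch: append a fresh copy and (re)point the key at it.
    const std::size_t offset = storage_.size();
    const auto* bytes = static_cast<const unsigned char*>(packed);
    storage_.insert(storage_.end(), bytes, bytes + size);
    entries_[key] = Entry{offset, size};
    return offset;
  }

  std::size_t bytes_used() const { return storage_.size(); }

 private:
  struct Entry {
    std::size_t offset;
    std::size_t size;
  };
  std::unordered_map<std::string, Entry> entries_;
  std::vector<unsigned char> storage_;
};
```

In this sketch the cached side is protected by the size check and the invariant, but the caller's `packed` pointer is trusted beyond the null check, which mirrors the situation the backtrace points at.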

Backtrace

pid: 0, tid: 13533 >>> com.borzai.vu <<<

backtrace:
#00 pc 0x00000000000a398c /apex/com.android.runtime/lib64/bionic/libc.so (__memcmp_aarch64+12)
#01 pc 0x0000000000362fa8 /data/app/.../split_config.arm64_v8a.apk!libexecutorch_jni.so
(executorch::backends::xnnpack::delegate::XNNWeightsCache::look_up_or_insert(
executorch::backends::xnnpack::delegate::XNNWeightsCache*,
xnn_weights_cache_look_up_key const*, void*, unsigned long)+92)
#02 pc 0x0000000000403dc8 /data/app/.../split_config.arm64_v8a.apk!libexecutorch_jni.so ...
#04 pc 0x00000000004019cc /data/app/.../split_config.arm64_v8a.apk!libexecutorch_jni.so
(create_convolution2d_nhwc_f32+700)
#05 pc 0x0000000000401b20 /data/app/.../split_config.arm64_v8a.apk!libexecutorch_jni.so
(xnn_create_convolution2d_nhwc_f32+264)
...
#08 pc 0x00000000003603b0 /data/app/.../split_config.arm64_v8a.apk!libexecutorch_jni.so
(executorch::backends::xnnpack::delegate::XNNCompiler::compileModel(
void const*, unsigned long,
executorch::backends::xnnpack::delegate::XNNExecutor*,
executorch::backends::xnnpack::delegate::XNNWeightsCache*,
xnn_workspace*, executorch::runtime::NamedDataMap const*)+1340)
#09 pc 0x0000000000362274 /data/app/.../split_config.arm64_v8a.apk!libexecutorch_jni.so
(executorch::backends::XnnpackBackend::init(
executorch::runtime::BackendInitContext&,
executorch::runtime::FreeableBuffer*,
executorch::runtime::ArrayRef<executorch::runtime::CompileSpec>) const+176)
...
#14 pc 0x0000000000393c50 /data/app/.../split_config.arm64_v8a.apk!libexecutorch_jni.so
(executorch::extension::module::Module::execute(...))
#20 pc 0x000000000044a0fa /data/app/.../base.apk
(org.pytorch.executorch.Module.execute+58)
#24 pc 0x0000000000d404da /data/app/.../base.apk
(com.borzai.vu.MainActivity.runInferenceOnThread+90)


4. Context and Architectural Analysis

The underlying cause may be hardware- or kernel-specific rather than a memory bug that affects all devices, as suggested by the successful runs on the structurally similar Helio G99 (A16). Both SoCs use the same ARM Cortex-A76/A55 big.LITTLE CPU configuration, and both devices share the 4 GB RAM baseline.

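Because the fault is raised inside memcmp, one of the buffers compared in look_up_or_insert is most likely invalid for the requested length: a null, dangling, or unmapped pointer, or a size that runs past the end of a mapping. A misaligned pointer is a less likely cause, since memcmp does not require aligned inputs. The backtrace above does not include the fault address, so it cannot distinguish these cases.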
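One way to test the invalid-pointer hypothesis in a debug build is to probe whether both ranges are mapped before comparing them. The sketch below is an assumption-laden diagnostic, not part of ExecuTorch or XNNPACK: `range_is_mapped` and `checked_memcmp` are hypothetical helper names. It relies on the Linux behavior that `msync` fails with `ENOMEM` when part of the range is unmapped. It does not detect mappings that exist but are not readable (for example `PROT_NONE`), and it is racy if another thread unmaps the range between the probe and the compare.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical helper: true if every page covering [ptr, ptr + size) is
// currently mapped in this process. Read permission is NOT checked.
inline bool range_is_mapped(const void* ptr, std::size_t size) {
  if (ptr == nullptr || size == 0) return false;
  const auto addr = reinterpret_cast<std::uintptr_t>(ptr);
  if (size > UINTPTR_MAX - addr) return false;  // range would wrap around

  const auto page = static_cast<std::uintptr_t>(sysconf(_SC_PAGESIZE));
  const std::uintptr_t start = addr & ~(page - 1);  // msync needs page alignment
  const std::size_t span = static_cast<std::size_t>(addr - start) + size;

  // On Linux, msync returns -1 with errno == ENOMEM if any part of the
  // range is not mapped; MS_ASYNC keeps the call cheap.
  return msync(reinterpret_cast<void*>(start), span, MS_ASYNC) == 0;
}

// Hypothetical helper: runs memcmp only when both ranges are mapped.
// Returns false (and leaves *out untouched) when either range is not.
inline bool checked_memcmp(const void* a, const void* b, std::size_t size,
                           int* out) {
  assert(out != nullptr);
  if (!range_is_mapped(a, size) || !range_is_mapped(b, size)) return false;
  *out = std::memcmp(a, b, size);
  return true;
}
```

Logging the pointer and size whenever `checked_memcmp` returns false would show which argument is bad and whether it is null, stale, or simply too long for its mapping.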

cc @GregoryComer @digantdesai @cbilgin @kirklandsign

Labels

module: android (Issues related to Android code, build, and execution)
module: xnnpack (Issues related to xnnpack delegation and the code under backends/xnnpack/)
