Commit df4e08c

how-to: Android Debugging Crash Course
1 parent 75b96e1 commit df4e08c

1 file changed

+347
-0
lines changed

@@ -0,0 +1,347 @@
1+
---
2+
layout: post
3+
title: Android Debugging Crash Course
4+
author: Nicholas Lim (niclimcy) & Nolen Johnson (npjohnson)
5+
---
6+
7+
## Understanding the different tools used during debugging
8+
9+
![hero](https://lineageos.org/images/engineering/hero_debugging.jpg)
10+
11+
## Glossary
12+
* ADB: Android Debug Bridge.
13+
* Buffer: A fixed size storage area in memory.
14+
* CLI: Command-line interface.
15+
* Commit: An atomic change to a codebase, tracked by version control.
16+
* Debugging: The process of finding and fixing errors, bugs, and unintended behavior.
17+
* Device block files: Special files in the /dev directory that allow for standardized interaction with kernel drivers.
18+
* DTS: Device Tree Source.
19+
* EDL: Qualcomm's Emergency Download mode.
20+
* gdb: GNU Debugger.
21+
* HAL: Hardware Abstraction Layer.
22+
* Kernel Space: Space where the kernel runs and interacts with device drivers.
23+
* Logging: Recording and storing events that occur when running software, such as error messages, warnings, and debugging information.
24+
* Memory Address: A unique identifier that specifies the location in memory where data or instructions are stored.
25+
* OEM: Original equipment manufacturer (e.g., Google, Fairphone, Samsung, etc.).
26+
* PID: Process ID.
27+
* pstore: Persistent Store.
28+
* Rebase: The process of reapplying commits from one branch on top of another branch.
29+
* Stack Trace: Shows the sequence of function calls that led to an error or exception in a program.
30+
* TID: Thread ID.
31+
* UART: Universal Asynchronous Receiver / Transmitter.
32+
* User Space: Space where normal user processes run, such as applications.
33+
34+
## What is Debugging?
35+
To understand Android debugging, it is important to understand the different parts of the Android system. At a high level, the Android system is made up of three main components: apps, the platform, and the kernel.
36+
37+
![Android Stack](https://lineageos.org/images/engineering/content_android_stack.png){: .blog_post_image_content }
38+
39+
## User Space Debugging
40+
User space debugging allows us to find and fix app and platform issues. This process can be fairly straightforward on Android if we use the right tools.
41+
42+
### ADB
43+
The Android Debug Bridge (ADB) gives us access to a device’s CLI (or shell), letting us use native debugging tools like `logcat`. See our wiki to learn how to start [Using ADB and fastboot](https://wiki.lineageos.org/adb_fastboot_guide) on your devices.
44+
45+
### logcat
46+
`logcat` is a CLI tool that outputs a log of system messages, including messages that you have written from your app with the Log class.
47+
48+
Log messages are kept in several circular buffers, which can be selected using the `-b` option. The following values are available:
49+
50+
* `radio`: Views the buffer that contains radio/telephony related messages.
51+
* `events`: Views the interpreted binary system event buffer messages.
52+
* `main`: Views the main log buffer (default), which doesn't contain system and crash log messages.
53+
* `system`: Views the system log buffer (default).
54+
* `crash`: Views the crash log buffer (default).
55+
* `all`: Views all buffers.
56+
* `default`: Reports main, system, and crash buffers.
57+
58+
You can find out more about how to use `logcat` on [Android Developers](https://developer.android.com/tools/logcat).
59+
60+
Interpreting a crash buffer from `logcat`:
61+
62+
```
63+
$ adb logcat -b crash
64+
+--------------------+-----+-----+-------+-----------------------------------------------------------------------+
65+
| Date Time | PID | TID | Level | ProcessName : Message |
66+
+--------------------+-----+-----+-------+-----------------------------------------------------------------------+
67+
| 04-14 11:22:34.256 | 5199| 5199| E | AndroidRuntime: FATAL EXCEPTION: main |
68+
| 04-14 11:22:34.256 | 5199| 5199| E | AndroidRuntime: Process: com.android.settings, PID: 5199 |
69+
| 04-14 11:22:34.256 | 5199| 5199| E | AndroidRuntime: java.lang.RuntimeException: Unable to resume activity |
70+
| | | | | {com.android.settings/com.android.settings.SubSettings}: |
71+
| | | | | java.lang.ArrayIndexOutOfBoundsException length=7; index=7 |
72+
+--------------------+-----+-----+-------+-----------------------------------------------------------------------+
73+
```
74+
75+
The crash buffer is particularly useful for debugging app crashes (e.g., "Settings has stopped") and identifying runtime errors.
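
When the crash buffer is long, it can help to first list which processes crashed. The sketch below assumes the default `threadtime` output of `adb logcat` (rather than the table layout used above for illustration); `crashed_processes` is a hypothetical helper name, not part of ADB.

```shell
# crashed_processes: read logcat text on stdin and print "<process> <pid>"
# for every "AndroidRuntime: Process: ..., PID: ..." line (hypothetical helper).
crashed_processes() {
    sed -n 's/.*AndroidRuntime: Process: \([^,]*\), PID: \([0-9][0-9]*\).*/\1 \2/p'
}

# Usage with a connected device (-d dumps the buffer and exits instead of following it):
# adb logcat -b crash -d | crashed_processes
```

The PID printed here is the same one shown in the PID column, which makes it easy to filter the rest of the log for that process.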
76+
77+
### Tombstones
78+
Sometimes the ADB service may not be running (for example, when a system process causes a reboot before adb has started), in which case we cannot use `logcat`. Fret not: when a native process crashes, a tombstone file containing the stack trace leading up to the crash is written to `/data/tombstones`.
79+
80+
Tombstones are also more detailed than `logcat` output, providing a longer stack trace when the log alone is insufficient. You can also request a tombstone-style dump of a running process using the following command:
81+
82+
```
83+
$ adb shell debuggerd {PID}
84+
```
85+
86+
Hint: Replace `{PID}` with the actual Process ID of the process you want.
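
If you only know the process name, the PID lookup and the dump can be combined. This is a minimal sketch, assuming `pidof` is available in the device shell and that the shell has sufficient privileges (you may need `adb root` first); `dump_process` and the `ADB` variable are hypothetical names introduced for illustration.

```shell
# ADB can be overridden to point at a different adb binary; defaults to "adb".
ADB="${ADB:-adb}"

# dump_process NAME: look up the PID of NAME on the device, then ask
# debuggerd to dump that process (hypothetical helper).
dump_process() {
    # adb output may carry CR line endings; keep only the first PID reported.
    pid=$($ADB shell pidof "$1" | tr -d '\r' | awk '{ print $1 }')
    case "$pid" in
        ''|*[!0-9]*)
            echo "no running process named $1" >&2
            return 1
            ;;
    esac
    $ADB shell debuggerd "$pid"
}

# Usage with a connected device:
# dump_process com.android.settings
```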
87+
88+
### Stack
89+
`stack` is a Python script that symbolizes native crash dumps, presenting them in a human-readable format. You can find `stack` in any local sync of the LineageOS repositories at [~/android/lineage/development/scripts/stack](https://android.googlesource.com/platform/development/+/refs/heads/main/scripts/stack). You can run it on an extracted tombstone using `stack < /path/to/tombstone_0`.
90+
91+
A native crash dump will usually look like this:
92+
93+
```
94+
*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
95+
Build fingerprint: 'Android/aosp_angler/angler:7.1.1/NYC/enh12211018:eng/test-keys'
96+
Revision: '0'
97+
ABI: 'arm'
98+
pid: 17946, tid: 17949, name: crasher >>> crasher <<<
99+
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0xc
100+
r0 0000000c r1 00000000 r2 00000000 r3 00000000
101+
r4 00000000 r5 0000000c r6 eccdd920 r7 00000078
102+
r8 0000461a r9 ffc78c19 sl ab209441 fp fffff924
103+
ip ed01b834 sp eccdd800 lr ecfa9a1f pc ecfd693e cpsr 600e0030
104+
105+
backtrace:
106+
#00 pc 0004793e /system/lib/libc.so (pthread_mutex_lock+1)
107+
#01 pc 0001aa1b /system/lib/libc.so (readdir+10)
108+
#02 pc 00001b91 /system/xbin/crasher (readdir_null+20)
109+
#03 pc 0000184b /system/xbin/crasher (do_action+978)
110+
#04 pc 00001459 /system/xbin/crasher (thread_callback+24)
111+
#05 pc 00047317 /system/lib/libc.so (_ZL15__pthread_startPv+22)
112+
#06 pc 0001a7e5 /system/lib/libc.so (__start_thread+34)
113+
Tombstone written to: /data/tombstones/tombstone_06
114+
```
115+
116+
Running `stack` on a copy of `tombstone_06` pulled from the device will show the following:
117+
118+
```
119+
Revision: '0'
120+
pid: 17946, tid: 17949, name: crasher >>> crasher <<<
121+
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0xc
122+
r0 0000000c r1 00000000 r2 00000000 r3 00000000
123+
r4 00000000 r5 0000000c r6 eccdd920 r7 00000078
124+
r8 0000461a r9 ffc78c19 sl ab209441 fp fffff924
125+
ip ed01b834 sp eccdd800 lr ecfa9a1f pc ecfd693e cpsr 600e0030
126+
Using arm toolchain from: ~/android/lineage/prebuilts/gcc/linux-x86/arm/arm-linux-androideabi-4.9/bin/
127+
128+
Stack Trace:
129+
RELADDR FUNCTION FILE:LINE
130+
0004793e pthread_mutex_lock+2 bionic/libc/bionic/pthread_mutex.cpp:515
131+
v------> ScopedPthreadMutexLocker bionic/libc/private/ScopedPthreadMutexLocker.h:27
132+
0001aa1b readdir+10 bionic/libc/bionic/dirent.cpp:120
133+
00001b91 readdir_null+20 system/core/debuggerd/crasher.cpp:131
134+
0000184b do_action+978 system/core/debuggerd/crasher.cpp:228
135+
00001459 thread_callback+24 system/core/debuggerd/crasher.cpp:90
136+
00047317 __pthread_start(void*)+22 bionic/libc/bionic/pthread_create.cpp:202 (discriminator 1)
137+
0001a7e5 __start_thread+34 bionic/libc/bionic/clone.cpp:46 (discriminator 1)
138+
```
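
If `stack` is not at hand, the raw frames can still be pulled out of a tombstone for manual symbolization. The sketch below assumes the backtrace layout shown in the native crash dump above (`#NN pc OFFSET BINARY (...)`); `backtrace_frames` is a hypothetical helper name.

```shell
# backtrace_frames: read a tombstone on stdin and print "<pc offset> <binary>"
# for each backtrace line such as "#00 pc 0004793e /system/lib/libc.so (...)".
backtrace_frames() {
    awk '$1 ~ /^#[0-9]+$/ && $2 == "pc" { print $3, $4 }'
}

# Usage on a tombstone that was copied to the host:
# backtrace_frames < tombstone_06
```

Each offset can then be passed to `addr2line` together with the unstripped copy of the corresponding binary from your build output.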
139+
140+
`stack` works much like the kernel space debugging tool `decode_stacktrace.sh`: both map each entry in a stack trace to the exact file and line of the original source code. Keep reading to find out more about how to use `decode_stacktrace.sh`.
141+
142+
### ramoops-pmsg
143+
ramoops-pmsg is a user-space-accessible version of ramoops. To read the logs from before the last reboot out of pstore, you can run:
144+
145+
```
146+
$ adb logcat -b all -L
147+
```
148+
149+
A more detailed explanation of the ramoops kernel feature can be found below.
150+
151+
## Kernel Space Debugging
152+
Kernel space debugging helps us identify issues within the kernel. Device manufacturers, on top of providing device kernel drivers, may customize other parts of the kernel when releasing a device kernel source. As such, when a device maintainer rebases their device kernel on newer kernel versions (to keep up with security patches), regressions may occur.
153+
154+
### dmesg
155+
`dmesg` is a CLI tool that displays kernel buffer messages. It provides a detailed view of kernel-level activities, allowing device maintainers to diagnose system crashes and driver issues, and to monitor system events. Do note that all LineageOS builds ship with SELinux set to Enforcing by default, so you need to run `adb root` before using `dmesg`.
156+
157+
Example of a truncated dmesg output showing a NULL pointer dereference:
158+
159+
```
160+
# adb shell dmesg
161+
| Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010
162+
| Internal error: Oops: 96000006 [#1] SMP
163+
| Call trace:
164+
| update_insn_emulation_mode+0xc0/0x148
165+
| emulation_proc_handler+0x64/0xb8
166+
| proc_sys_call_handler+0x9c/0xf8
167+
| proc_sys_write+0x18/0x20
168+
| __vfs_write+0x20/0x48
169+
| vfs_write+0xe4/0x1d0
170+
| ksys_write+0x70/0xf8
171+
| __arm64_sys_write+0x20/0x28
172+
| el0_svc_common.constprop.0+0x7c/0x1c0
173+
| el0_svc_handler+0x2c/0xa0
174+
| el0_svc+0x8/0x200
175+
```
176+
177+
### ramoops
178+
ramoops is a [Linux Kernel feature](https://www.kernel.org/doc/html/next/admin-guide/ramoops.html) that logs to a reserved region of memory so that the data survives a crash. It can be set up in a device’s kernel device tree source (DTS) by reserving a memory region, including a buffer for ramoops-pmsg. It works in conjunction with the kernel pstore driver, which exposes the saved records as files under `/sys/fs/pstore` after a reboot.
179+
180+
An example of a ramoops configuration with a pmsg buffer allocated:
181+
182+
```
183+
/{
184+
reserved-memory {
185+
ramoops: ramoops@b0000000 {
186+
compatible = "ramoops";
187+
reg = <0 0xb0000000 0 0x00400000>;
188+
record-size = <0x40000>; /*256x1024*/
189+
console-size = <0x40000>;
190+
ftrace-size = <0x40000>;
191+
pmsg-size = <0x200000>;
192+
ecc-size = <0x0>;
193+
};
194+
};
195+
};
196+
```
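
In the upstream ramoops driver, the console, ftrace, and pmsg buffers are carved out of the reserved region, and whatever remains is split into `record-size` chunks for oops/panic dumps. Assuming that layout (worth verifying against your kernel version, and ignoring any ECC overhead), a quick sanity check of the sizes could look like this; `ramoops_dump_records` is a hypothetical helper name.

```shell
# ramoops_dump_records REGION RECORD CONSOLE FTRACE PMSG
# Print how many oops/panic records of size RECORD fit in the space left
# after the console, ftrace, and pmsg buffers are taken out of REGION.
# Sizes may be given in hex (0x...) or decimal.
ramoops_dump_records() {
    dump_area=$(( $1 - $3 - $4 - $5 ))
    if [ "$dump_area" -lt 0 ]; then
        echo "console + ftrace + pmsg exceed the reserved region" >&2
        return 1
    fi
    echo $(( dump_area / $2 ))
}

# Values from the DTS example above: 0x400000 region, 0x40000 records.
ramoops_dump_records 0x400000 0x40000 0x40000 0x40000 0x200000   # prints 6
```

With the values above, 0x180000 bytes remain for dumps, which is room for six 256 KiB records.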
197+
198+
On newer kernels, ramoops can be enabled via the following config options:
199+
200+
```
201+
CONFIG_PSTORE=y
202+
CONFIG_PSTORE_CONSOLE=y
203+
CONFIG_PSTORE_RAM=y
204+
```
205+
206+
pstore is usually compressed by default, making it harder to use during debugging. You might want to disable compression via:
207+
208+
```
209+
# CONFIG_PSTORE_COMPRESS is not set
210+
```
211+
212+
Although ramoops and pstore are powerful tools, there are some caveats to using them. Because pstore buffers its writes by default, and we typically rely on it only when a system is about to crash, the data retrieved afterwards is often partially corrupted.
213+
214+
### addr2line
215+
`dmesg` and ramoops often produce cryptic memory addresses in stack traces, such as `ffffff9405cebf10` from:
216+
217+
```
218+
CFI failure (target: [<ffffff9405cebf10>] __typeid__ZTSFvP10net_deviceE_global_addr+0x170/0x17c):
219+
```
220+
221+
In this case, we can use Address To Line (addr2line) to find the specific file and line where the issue occurs:
222+
```
223+
$ addr2line -e /path/to/kernel-module.o ffffff9405cebf10
224+
```
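
Copying addresses out of a long log by hand is error-prone, so the lookup can be scripted. This sketch assumes CFI failure lines carry the address as `[<ADDR>]`, as in the example above; `cfi_target_addresses` is a hypothetical helper name.

```shell
# cfi_target_addresses: read kernel log text on stdin and print the address
# from each "CFI failure (target: [<ADDR>] ..." line (hypothetical helper).
cfi_target_addresses() {
    sed -n 's/.*CFI failure (target: \[<\([0-9a-fA-F]*\)>].*/\1/p'
}

# Usage with a connected device (-f also prints the enclosing function name):
# adb shell dmesg | cfi_target_addresses | xargs addr2line -f -e /path/to/kernel-module.o
```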
225+
226+
### decode_stacktrace.sh
227+
`decode_stacktrace.sh` is a script bundled with every Linux Kernel source tree at [linux/blob/master/scripts/decode_stacktrace.sh](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/scripts/decode_stacktrace.sh) that makes use of `addr2line`. To use it, you first have to set `CONFIG_DEBUG_INFO=y` in your kernel config and build the kernel.
228+
229+
Next, extract the call trace of the kernel panic you are trying to debug from `dmesg` and save it in a text file, like this `dmesg.txt`:
230+
231+
```
232+
| update_insn_emulation_mode+0xc0/0x148
233+
| emulation_proc_handler+0x64/0xb8
234+
| proc_sys_call_handler+0x9c/0xf8
235+
| proc_sys_write+0x18/0x20
236+
| __vfs_write+0x20/0x48
237+
| vfs_write+0xe4/0x1d0
238+
| ksys_write+0x70/0xf8
239+
| __arm64_sys_write+0x20/0x28
240+
| el0_svc_common.constprop.0+0x7c/0x1c0
241+
| el0_svc_handler+0x2c/0xa0
242+
| el0_svc+0x8/0x200
243+
```
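
One way to produce such a file from a full saved log is to keep only the lines between the `Call trace:` header and the end of the oops. This is a rough sketch, assuming an arm64-style oops that ends with a `Code:` or `end trace` line; `extract_call_trace` and `full_dmesg.txt` are hypothetical names.

```shell
# extract_call_trace: read a saved dmesg log on stdin and print only the
# lines that follow "Call trace:" up to the "Code:" or "end trace" line.
extract_call_trace() {
    awk '
        /Call trace:/     { in_trace = 1; next }
        /Code:|end trace/ { in_trace = 0 }
        in_trace          { print }
    '
}

# Usage on a log saved from the device:
# extract_call_trace < full_dmesg.txt > dmesg.txt
```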
244+
245+
Finally, point `decode_stacktrace.sh` to the dmesg.txt file you have created as well as the kernel you have compiled:
246+
247+
```
248+
249+
$ ./scripts/decode_stacktrace.sh /path/to/vmlinux /path/to/kernel-source-dir < dmesg.txt
250+
```
251+
252+
As you can see in the following example (of a different stack trace), each call's memory address in the stack trace has been replaced with the specific file and line of code, which you can then look up in your kernel source.
253+
254+
```
255+
| dump_stack (lib/dump_stack.c:52)
256+
| warn_slowpath_common (kernel/panic.c:418)
257+
| warn_slowpath_null (kernel/panic.c:453)
258+
| _oalloc_pages_slowpath+0x6a/0x7d0
259+
| ? zone_watermark_ok (mm/page_alloc.c:1728)
260+
| ? get_page_from_freelist (mm/page_alloc.c:1939)
261+
| __alloc_pages_nodemask (mm/page_alloc.c:2766)
262+
```
263+
264+
### Serial / gdb
265+
Devices with UART ports (for example, older Nexus and Google Pixel devices via the 3.5mm headphone port, and newer Google Pixels via USB-C debuggers) can be connected using a UART cable to view kernel console messages (kgdb). With serial, you can even debug issues that happen before the kernel has started.
266+
267+
Consider using serial when, for example, your device is stuck on the boot logo.
268+
269+
### Desperate Debugging
270+
If all else fails, you can use `panic()` in portions of the kernel you wish to debug. [SebaUbuntu's patch here](https://github.com/xiaomi-sm8150-devs/android_kernel_xiaomi_sm8150-legacy/commit/9d8822a6967ee623790270539a929942b71f191b) demonstrates the use of `panic()` to catch early init issues.
271+
272+
## Chipset Vendor / OEM Specific Debugging
273+
Here are some OEM-developed debugging tools that we have found over the years and that have proved helpful.
274+
275+
### EDL memorydump (qcom)
276+
![Qualcomm CrashDump](https://lineageos.org/images/engineering/content_qualcomm_crashdump.png){: .blog_post_image_content }
277+
278+
Some Qualcomm devices have CrashDump enabled, which allows you to use Qualcomm's firehose tool to get a memory dump. As the firehose tool is closed source, we recommend using an open source rewrite of the tool by Bjoern Kerler, available at [bkerler/edl](https://github.com/bkerler/edl). You can retrieve a memory dump using `edl memorydump`.
279+
280+
### /dev/block/by-name/debug (Samsung)
281+
`/dev/block/by-name/debug` is a special device block file on Samsung devices that contains a stream of XBL logs, kernel logs, and more. You can run `adb pull /dev/block/by-name/debug debug.bin` to dump the stream of logs.
282+
283+
Example of the truncated debug.bin file:
284+
285+
```
286+
{340532} ** XBL(1) **
287+
{340532}
288+
Format: Log Type - Time(microsec) - Message - Optional Info
289+
Log Type: B - Since Boot(Power On Reset), D - Delta, S - Statistic
290+
S - QC_IMAGE_VERSION_STRING=BOOT.XF.2.1-00133-SDM710LZB-3
291+
S - IMAGE_VARIANT_STRING=SDM670LA
292+
S - OEM_IMAGE_VERSION_STRING=21DJFC21
293+
S - Boot Interface: eMMC
294+
S - Secure Boot: On
295+
S - Boot Config @ 0x00786070 = 0x000000c9
296+
S - JTAG ID @ 0x00786130 = 0x100910e1
297+
S - OEM ID @ 0x00786138 = 0x00200000
298+
S - Feature Config Row 0 @ 0x007841a0 = 0x08d020000b588420
299+
S - Feature Config Row 1 @ 0x007841a8 = 0xe0140000000311a0
300+
S - Core 0 Frequency, 1516 MHz
301+
S - PBL Patch Ver: 0
302+
S - PBL freq: 600 MHZ
303+
S - I-cache: On
304+
S - D-cache: On
305+
```
306+
307+
Our previous Engineering Blog on [Qualcomm's Chain of Trust](https://lineageos.org/engineering/Qualcomm-Firmware) goes into more detail about what eXtensible Bootloader (XBL) is.
308+
309+
## Common errors (and how to solve them)
310+
Now that we have a basic understanding of some of the debugging tools used during Android development, let's learn how to identify and fix common errors faced during debugging.
311+
312+
### dlopen failed
313+
Many devices have prebuilt libraries that were compiled against older versions of libraries that have since dropped certain symbols. An error that might occur will look something like this:
314+
315+
```
316+
* java.lang.UnsatisfiedLinkError: dlopen failed: cannot locate symbol "_ZN7android21SurfaceComposerClient11Transaction5applyEb" referenced by "/product/lib64/libsecureuisvc_jni.so"...
317+
```
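
When triaging such errors, it helps to know exactly which symbol is missing and which prebuilt library wants it. The sketch below assumes the linker message format shown above; `missing_symbol` is a hypothetical helper name.

```shell
# missing_symbol: read log text on stdin and print "<symbol> <library>" for
# each 'cannot locate symbol "X" referenced by "Y"' message (hypothetical helper).
missing_symbol() {
    sed -n 's/.*cannot locate symbol "\([^"]*\)" referenced by "\([^"]*\)".*/\1 \2/p'
}

# Usage with a connected device:
# adb logcat -d | missing_symbol
```

The mangled name can be made readable with `c++filt`; for the example above it demangles to `android::SurfaceComposerClient::Transaction::apply(bool)`.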
318+
319+
To solve this error, we use interposing libraries that we colloquially refer to as "shims". By intercepting calls to the missing functions and providing alternative implementations, we can essentially emulate the behavior of the original libraries that your prebuilt libraries were built against.
320+
321+
Alternatively, some modern devices opt to copy an older VNDK version of the updated library to `$libname-v$vndkVersion.so`, then use `patchelf` to make the problematic library load this versioned library.
322+
323+
You can check out the pre-existing shims that we have commonized [here](https://github.com/LineageOS/android_hardware_lineage_compat) which were previously managed by individual device maintainers prior to LineageOS 20. For the example above, you can refer to [this](https://review.lineageos.org/c/LineageOS/android_device_google_bonito/+/343466) patch on how to use the shim packages for your prebuilt libraries that require them.
324+
325+
### Hidden dlopen failed
326+
As dlopen errors occur only at runtime, some failures do not show up immediately or are even hidden from logging. We have hence come up with a library hook, `dlopen.so`, that you can place in `LD_PRELOAD` to show all linker operations, helping us see which libraries are missing symbols or even entire dependencies.
327+
328+
Here is a log of a device using [the library](https://review.lineageos.org/c/LineageOS/android_hardware_lineage_compat/+/346648):
329+
330+
```
331+
instantnoodlep / # LD_PRELOAD=dlopen.so /vendor/bin/hw/android.hardware.gnss\@2.1-service-qti
332+
dlopen(libnetd_client.so) -> 0x0, errno: dlopen failed: library "libnetd_client.so" not found
333+
dlopen(libgnss.so) -> 0xdc9f08e905187e63, errno: (null)
334+
dlopen(liblbs_core.so) -> 0x618992fa4f6a1e6d, errno: (null)
335+
dlopen(liblocdiagiface.so) -> 0x0, errno: dlopen failed: library "liblocdiagiface.so" not found
336+
dlopen(libloc_net_iface.so) -> 0x0, errno: dlopen failed: library "libloc_net_iface.so" not found
337+
dlopen([email protected]) -> 0xe8b09305c7a1c55f, errno: (null)
338+
dlopen(libdataitems.so) -> 0xc4ba0f7c15946aef, errno: (null)
339+
dlopen([email protected]) -> 0xd950565f49bbcc01, errno: (null)
340+
dlopen(libgnss.so) -> 0xdc9f08e905187e63, errno: (null)
341+
dlopen(libxtadapter.so) -> 0x39eb3dbc835592b5, errno: (null)
342+
dlopen(libcdfw.so) -> 0x59b12f3c0b5e3e1b, errno: (null)
343+
dlopen(libloc_socket.so) -> 0x2229c8abec54dac9, errno: (null)
344+
```
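
On a busy service the hook output can get long, so it may be worth reducing it to just the libraries that failed to load. This sketch assumes the log format shown above; `missing_libraries` and `hook.log` are hypothetical names.

```shell
# missing_libraries: read the hook output on stdin and print each library
# that was reported as "not found", once, in sorted order.
missing_libraries() {
    sed -n 's/.*dlopen failed: library "\([^"]*\)" not found.*/\1/p' | sort -u
}

# Usage on output captured from the device:
# missing_libraries < hook.log
```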
345+
346+
## Bonus
347+
While working on non-private system apps like Aperture, you could use [Android Studio](https://developer.android.com/studio) for easier testing and debugging!
