
Add crashdump example and include snapshot/scratch in core dumps #1264

Open
jsturtevant wants to merge 8 commits into hyperlight-dev:main from jsturtevant:crashdump


@jsturtevant
Contributor

Core dumps generated by Hyperlight were missing the snapshot and scratch memory regions, which made post-mortem debugging with GDB incomplete: the register state was present, but the guest's code, stack, heap, and page tables were absent. This PR adds the snapshot and scratch regions to the ELF core dump alongside any dynamically mapped regions, so that GDB can show full backtraces, disassemble at the crash site, and inspect guest memory.

A new runnable crashdump example demonstrates automatic dumps (VM-level faults), on-demand dumps (guest-caught exceptions), and per-sandbox opt-out. GDB-based integration tests validate the register and memory contents of the generated ELF files. The debugging docs are also updated with practical GDB commands for inspecting crash dumps.
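To illustrate what adding memory regions to an ELF core dump involves, here is a minimal sketch; it is not Hyperlight's implementation. Each region (the snapshot, the scratch region, or a dynamically mapped region) becomes one PT_LOAD program header whose `p_vaddr` is the guest virtual address and whose `p_offset` points at that region's bytes in the file. The `DumpRegion` and `LoadSegment` types and the `layout_segments` function are hypothetical; only the `PF_R`/`PF_W`/`PF_X` flag values come from the ELF specification.

```rust
/// Hypothetical description of one guest memory region to write into a core dump
/// (e.g. the snapshot, the scratch region, or a dynamically mapped region).
#[derive(Debug, Clone)]
pub struct DumpRegion {
    pub name: &'static str,
    pub guest_va: u64,
    pub bytes: Vec<u8>,
    pub writable: bool,
    pub executable: bool,
}

/// The subset of ELF64 PT_LOAD program-header fields this sketch computes.
#[derive(Debug, PartialEq)]
pub struct LoadSegment {
    pub p_offset: u64,
    pub p_vaddr: u64,
    pub p_filesz: u64,
    pub p_memsz: u64,
    pub p_flags: u32,
}

// Segment permission flags as defined by the ELF specification.
pub const PF_X: u32 = 0x1;
pub const PF_W: u32 = 0x2;
pub const PF_R: u32 = 0x4;

/// Places region contents back to back in the file starting at `data_start`
/// and emits one PT_LOAD entry per region, mapping file bytes to guest VAs.
/// Alignment padding between segments is omitted for brevity.
pub fn layout_segments(regions: &[DumpRegion], data_start: u64) -> Vec<LoadSegment> {
    let mut offset = data_start;
    regions
        .iter()
        .map(|r| {
            let mut flags = PF_R;
            if r.writable {
                flags |= PF_W;
            }
            if r.executable {
                flags |= PF_X;
            }
            let len = r.bytes.len() as u64;
            let seg = LoadSegment {
                p_offset: offset,
                p_vaddr: r.guest_va,
                p_filesz: len,
                p_memsz: len,
                p_flags: flags,
            };
            // The next region's bytes start right after this one in the file.
            offset += len;
            seg
        })
        .collect()
}
```

When GDB loads such a file, it resolves an address like the faulting RIP by finding the PT_LOAD segment whose `[p_vaddr, p_vaddr + p_memsz)` range contains it, which is why a region missing from the program headers shows up as unreadable memory.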

@syntactically
Member

@jsturtevant I believe you were working on another version of this that explicitly walks through the whole guest virtual address space?

@jsturtevant
Contributor Author

> @jsturtevant I believe you were working on another version of this that explicitly walks through the whole guest virtual address space?

Yes, I've been working with @dblnz on getting it working. Should have something today.

@jsturtevant jsturtevant marked this pull request as ready for review March 3, 2026 23:14
@jsturtevant jsturtevant added the kind/bugfix For PRs that fix bugs label Mar 4, 2026
@jsturtevant jsturtevant force-pushed the crashdump branch 6 times, most recently from 21b7183 to 137b016 Compare March 4, 2026 05:52
dblnz previously approved these changes Mar 4, 2026
Contributor

@dblnz dblnz left a comment


LGTM! Having participated in these changes, I'd feel more comfortable if other people had a look/approved this 😄

ludfjig previously approved these changes Mar 5, 2026
Contributor

@ludfjig ludfjig left a comment


This is great, I like the added tests!

Member

@syntactically syntactically left a comment


The direction of this looks good! I have a few comments, most of which are minor nits or just signposting points I thought were interesting to discuss. The only thing I'm particularly concerned about in the present state of the code is that the mmap regions appear in the dump at VAs where they are not actually present in the sandbox.
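One way to address the concern about mmap regions appearing at VAs where they are not present is to clip each candidate region against the guest VA ranges that are actually mapped (for example, as discovered by walking the guest page tables, which the earlier reply mentions being worked on). The sketch below assumes that list of mapped ranges is already available, sorted and non-overlapping; `clip_to_mapped` is a hypothetical helper, not part of the PR.

```rust
use std::ops::Range;

/// Returns the parts of `region` (a guest VA range proposed for the dump) that
/// fall inside `mapped`, the VA ranges the guest really has mapped.
/// `mapped` is assumed sorted and non-overlapping, so the output is sorted too.
pub fn clip_to_mapped(region: &Range<u64>, mapped: &[Range<u64>]) -> Vec<Range<u64>> {
    mapped
        .iter()
        .filter_map(|m| {
            // Intersection of two half-open ranges.
            let start = region.start.max(m.start);
            let end = region.end.min(m.end);
            if start < end {
                Some(start..end)
            } else {
                None
            }
        })
        .collect()
}
```

Only the returned sub-ranges would then be emitted as PT_LOAD segments, so the dump never claims memory exists at an address the guest cannot actually reach.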

At a higher level, it occurs to me that an alternative for crashdumps might be to just take a snapshot, and then write a separate utility that prepares a crashdump from a snapshot. That's probably worse, since in the future we might not be able to take a snapshot at arbitrary times, but only when I/O to the guest is quiet? However, I wanted to mention the idea to see what anyone else thought.

jsturtevant and others added 8 commits March 5, 2026 13:53
Co-authored-by: Doru Blânzeanu <dblnz@pm.me>
Signed-off-by: James Sturtevant <jsturtevant@gmail.com>
Signed-off-by: James Sturtevant <jsturtevant@gmail.com>
The runtime config was passed by reference into set_up_hypervisor_partition then
immediately cloned, but no caller needs it afterward so it is now passed by
value. The entry point field uses Option<u64> instead of a bare zero default so
a missing value is detectable rather than silently producing a bogus AT_ENTRY.

Signed-off-by: James Sturtevant <jsturtevant@gmail.com>
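The `Option<u64>` change in this commit can be sketched as follows. `RuntimeConfig` and `build_auxv` are hypothetical stand-ins, and returning an error when the entry point is unknown is just one choice made for this sketch (omitting the AT_ENTRY entry would be another); the PR's actual handling may differ. The `AT_*` tag values are the standard System V auxiliary-vector constants.

```rust
// Auxiliary-vector tags from the System V ABI (as used in Linux NT_AUXV notes).
pub const AT_NULL: u64 = 0;
pub const AT_PAGESZ: u64 = 6;
pub const AT_ENTRY: u64 = 9;

/// Hypothetical stand-in for the runtime config; not Hyperlight's real type.
pub struct RuntimeConfig {
    /// `None` means the entry point was never recorded.
    pub entry_point: Option<u64>,
}

/// Builds (tag, value) pairs for an auxv note.
/// `config` is taken by value: the caller no longer needs it, so no clone is required.
pub fn build_auxv(config: RuntimeConfig, page_size: u64) -> Result<Vec<(u64, u64)>, String> {
    // With a bare `u64` defaulting to 0, a missing entry point would silently become
    // AT_ENTRY = 0. An `Option` forces the missing case to be handled explicitly.
    let entry = config
        .entry_point
        .ok_or_else(|| String::from("entry point unknown; not emitting a bogus AT_ENTRY"))?;
    Ok(vec![(AT_PAGESZ, page_size), (AT_ENTRY, entry), (AT_NULL, 0)])
}
```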
generate_crashdump_to_dir accepts the output directory
as a parameter instead of requiring callers to set the
HYPERLIGHT_CORE_DUMP_DIR environment variable. This
removes the need for unsafe std::env::set_var in tests
while preserving the existing env-var fallback path
for the automatic dump and the no-argument generate_crashdump method.

Signed-off-by: James Sturtevant <jsturtevant@gmail.com>
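A minimal sketch of the directory resolution described in this commit, assuming an explicit directory takes priority over HYPERLIGHT_CORE_DUMP_DIR. Splitting the pure resolution logic from the env-var read lets tests exercise every branch without calling `std::env::set_var`. `resolve_dump_dir` and `default_dump_dir` are hypothetical names, and falling back to the system temp directory when the variable is unset (or empty) is an assumption of this sketch, not necessarily what Hyperlight does.

```rust
use std::ffi::OsString;
use std::path::{Path, PathBuf};

/// Environment variable named in the PR for the env-var fallback path.
const CORE_DUMP_DIR_VAR: &str = "HYPERLIGHT_CORE_DUMP_DIR";

/// Pure resolution logic: an explicit directory wins, then the env-var value,
/// then (an assumption for this sketch) the system temp directory.
pub fn resolve_dump_dir(explicit: Option<&Path>, env_value: Option<OsString>) -> PathBuf {
    if let Some(dir) = explicit {
        return dir.to_path_buf();
    }
    match env_value {
        // Treating an empty value as unset is also an assumption of this sketch.
        Some(v) if !v.is_empty() => PathBuf::from(v),
        _ => std::env::temp_dir(),
    }
}

/// Hypothetical wrapper for the no-argument path: reads the env var at call time.
pub fn default_dump_dir() -> PathBuf {
    resolve_dump_dir(None, std::env::var_os(CORE_DUMP_DIR_VAR))
}
```

Because the environment is only read inside `default_dump_dir`, a test can pass any directory (or simulated env value) straight to `resolve_dump_dir` without mutating process-global state.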
Signed-off-by: James Sturtevant <jsturtevant@gmail.com>
Signed-off-by: James Sturtevant <jsturtevant@gmail.com>
Signed-off-by: James Sturtevant <jsturtevant@gmail.com>
Signed-off-by: James Sturtevant <jsturtevant@gmail.com>

5 participants