
blockdev: Fix loopback device resource leak on signal interruption #1402


Draft: wants to merge 1 commit into main


gursewak1997 (Contributor):

This commit implements issue #799 by creating a signal-safe cleanup helper for loopback devices to prevent resource leaks when bootc install --via-loopback is interrupted by signals like SIGINT (Ctrl-C).

The solution uses an 'out-of-process drop' helper that:

  • Forks a cleanup helper process when creating a loopback device
  • Uses PR_SET_PDEATHSIG to detect when the parent process dies
  • Masks most signals to avoid being killed accidentally
  • Automatically cleans up leaked loopback devices if the parent dies
  • Gracefully terminates when the parent performs normal cleanup

This prevents the common issue where interrupting bootc install --via-loopback with Ctrl-C would leave /dev/loopN devices allocated on the system.
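Once the parent is gone, the helper's only real job is to release the loop device. The sketch below is minimal and assumes the helper shells out to util-linux `losetup --detach`; the PR text does not show the exact command it runs. `detach_command` is a hypothetical name. The function only builds the `Command`, so it can be inspected without root and without touching a real device.

```rust
use std::process::Command;

/// Hypothetical helper: build (but do not run) the command that releases a
/// leaked loop device. Assumes util-linux `losetup` is available on PATH.
fn detach_command(device: &str) -> Command {
    let mut cmd = Command::new("losetup");
    // `--detach` frees the /dev/loopN binding so the device is no longer allocated.
    cmd.args(["--detach", device]);
    cmd
}
```

When the helper fires, it would call `detach_command(&device).status()` and check the exit status. Building the command separately keeps that step easy to test.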

Fixes: #799

gemini-code-assist (bot) left a comment:

Code Review

This pull request introduces a signal-safe cleanup mechanism for loopback devices to prevent resource leaks on signal interruption. It uses an out-of-process helper to clean up leaked loopback devices. I've added comments to enhance error logging for better debugging.

gursewak1997 force-pushed the bootc-799 branch 6 times, most recently from 70c2b77 to a4ab303, on July 11, 2025 05:59
cgwalters (Collaborator):

Thanks for working on this! While it will be a bit more awkward, can you try doing it the way described in #799 (comment)? That should completely avoid all the unsafe code.

Basically, instead of a raw fork() (the source of basically all the unsafe code in general), we fork+exec our own binary, /proc/self/exe. Look at, e.g., reexec_with_guardenv.
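Here is a minimal sketch of that re-exec approach. It assumes the hidden subcommand and `--device` flag from the CLI diff further down (`loopback-cleanup-helper --device <path>`), with `--parent-pid` left out as the review requests. `helper_command` is a hypothetical name. `std::process::Command` performs the fork+exec internally, so none of this needs `unsafe`.

```rust
use std::process::Command;

/// Subcommand name taken from the PR's CLI diff.
const HELPER_SUBCOMMAND: &str = "loopback-cleanup-helper";

/// Hypothetical builder: re-execute the running binary as the cleanup helper.
/// /proc/self/exe is Linux-specific. It names the current executable even if
/// the on-disk path has since been replaced.
fn helper_command(device: &str) -> Command {
    let mut cmd = Command::new("/proc/self/exe");
    // No parent pid is passed; PR_SET_PDEATHSIG in the helper binds it to the parent.
    cmd.args([HELPER_SUBCOMMAND, "--device", device]);
    cmd
}
```

The parent would then call `helper_command(dev).spawn()` and keep the resulting `Child`.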

cgwalters (Collaborator) left a comment:

Thanks! Looking closer

anyhow::bail!("This function should only be called as a cleanup helper");
}

// Close stdin, stdout, stderr and redirect to /dev/null
Collaborator:

This is better done in the parent process setup above.
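One way to do that redirection in the parent, using only std: configure `Stdio::null()` for all three streams on the helper's `Command` before spawning, so the helper never has to close or reopen its own descriptors. Not inheriting stderr this way also avoids the helper's output interleaving with the parent's, which comes up again later in the review. `with_null_stdio` and `run_silenced` are hypothetical names.

```rust
use std::io;
use std::process::{Command, Output, Stdio};

/// Hypothetical helper: point stdin, stdout and stderr at /dev/null before spawn.
fn with_null_stdio(cmd: &mut Command) -> &mut Command {
    cmd.stdin(Stdio::null())
        .stdout(Stdio::null())
        .stderr(Stdio::null())
}

/// Run a command with null stdio. Because the streams were set explicitly,
/// `output()` captures nothing, so nothing can leak to the parent's terminal.
fn run_silenced(mut cmd: Command) -> io::Result<Output> {
    with_null_stdio(&mut cmd).output()
}
```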

.context("Failed to read /proc/self/exe")?;

// Create the helper process using exec
let mut cmd = Command::new(self_exe);

// Set up death signal notification - we want to be notified when parent dies
unsafe {
if libc::prctl(libc::PR_SET_PDEATHSIG, libc::SIGUSR1) != 0 {
Collaborator:

It seems cleaner to me to use SIGTERM as the death signal and react to that.

}
}

// Mask most signals to avoid being killed accidentally
Collaborator:

The tokio::signal module (https://docs.rs/tokio/latest/tokio/signal/index.html) is one way to handle this safely (it will require making the function async).

I think the tokio API will replace about 50 lines of unsafe code with 5 lines of safe code.


match status {
Ok(exit_status) if exit_status.success() => {
// Write to stderr since we closed stdout
Collaborator:

Mmm, my vote here is probably to not inherit stderr at all; if we write to stderr, the child process's writes could be intermixed with the parent's.

One option is to explicitly log to the systemd journal.

I guess, speaking of systemd... a whole possibility I hadn't considered until just now is forking off via systemd-run. That would have some nice advantages, but the lifecycle binding will be trickier to get right, so let's leave that for the future.

(There's a whole giant topic of bootc overall defaulting to running via systemd in some cases, which would similarly have a lot of advantages but would be a big, nontrivial change.)

20961247
);
Ok(())
let data = fs::read_to_string("tests/fixtures/lsblk.json").unwrap();
Collaborator:

This change looks unrelated? I mean, it's probably fine to do, but let's break it into a separate commit.

Collaborator:

This still needs fixing; can you move this section of the change to a different PR?

cgwalters (Collaborator) left a comment:

Thanks, looking a lot closer!

lib/src/cli.rs (outdated)
device: String,
/// Parent process ID to monitor
#[clap(long)]
parent_pid: u32,
Collaborator:

This isn't necessary; with PDEATHSIG we automatically monitor the parent. It can be dropped.

"loopback-cleanup-helper",
"--device",
device_path,
"--parent-pid",
Collaborator:

This can be removed.

/// Handle to manage the cleanup helper process for loopback devices
struct LoopbackCleanupHandle {
/// Process ID of the cleanup helper
helper_pid: u32,
Collaborator:

This should actually hold the Child instance, and on Drop we should send it SIGTERM. Note: not Child::kill (which sends SIGKILL), but rustix's kill with SIGTERM.


// Try to spawn cleanup helper process - if it fails, continue without it
let cleanup_handle = Self::spawn_cleanup_helper(dev.as_str())
.map_err(|e| {
Collaborator:

I'm uncertain about ignoring errors here by default. It seems better to make it fatal.
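The sketch below makes spawn failure fatal instead of logging and continuing. It uses plain `std::io` errors. The PR itself uses `anyhow`, where `.context(...)` would play the same role. `spawn_helper_or_fail` and its message text are hypothetical.

```rust
use std::io;
use std::process::{Child, Command};

/// Hypothetical wrapper: spawn the helper, or return an error that names the
/// device, keeping the original error kind so callers can still match on it.
fn spawn_helper_or_fail(mut cmd: Command, device: &str) -> io::Result<Child> {
    cmd.spawn().map_err(|e| {
        io::Error::new(
            e.kind(),
            format!("failed to spawn loopback cleanup helper for {device}: {e}"),
        )
    })
}
```

The loopback constructor would call this with `?`. A missing or unspawnable helper then aborts device setup instead of silently leaving the device unprotected against interruption.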

// Kill the cleanup helper since we're cleaning up normally
if let Some(cleanup_handle) = self.cleanup_handle.take() {
// Kill the helper process since we're doing normal cleanup
let _ = std::process::Command::new("kill")
Collaborator:

See above: we can send a signal directly instead of invoking an external command.

Also, crucially, by holding a reference to the Child we avoid pid reuse problems if the child exits early.
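Here is a std-only sketch of the ownership pattern these two comments describe. The handle owns the helper's `Child` rather than a bare pid, and it signals and reaps the child from `Drop`. Until `wait()` reaps it, an exited child stays a zombie and keeps its pid, so the signal can never hit an unrelated process that reused the number. The struct name mirrors the diff, but its fields and methods here are hypothetical. std has no API for sending SIGTERM, so the marked line uses `Child::kill` (SIGKILL) as a stand-in. The review asks for rustix's kill with SIGTERM at that point.

```rust
use std::io;
use std::os::unix::process::ExitStatusExt;
use std::process::{Child, ExitStatus};

/// Hypothetical handle: owns the helper process instead of storing its pid.
struct LoopbackCleanupHandle {
    child: Option<Child>,
}

impl LoopbackCleanupHandle {
    fn new(child: Child) -> Self {
        Self { child: Some(child) }
    }

    /// Stop and reap the helper. Returns Ok(None) if it was already shut down.
    fn shutdown(&mut self) -> io::Result<Option<ExitStatus>> {
        let mut child = match self.child.take() {
            Some(child) => child,
            None => return Ok(None),
        };
        // Only signal a helper that has not already exited and been reaped.
        if child.try_wait()?.is_none() {
            // Stand-in: Child::kill sends SIGKILL. The review asks for
            // rustix's kill with SIGTERM here so the helper can react to it.
            child.kill()?;
        }
        // wait() reaps the child; only after this can its pid be reused.
        child.wait().map(Some)
    }
}

impl Drop for LoopbackCleanupHandle {
    fn drop(&mut self) {
        // Errors cannot propagate out of Drop, so they are ignored here.
        let _ = self.shutdown();
    }
}

/// Which signal (if any) terminated the helper.
fn terminating_signal(status: &ExitStatus) -> Option<i32> {
    status.signal()
}
```

Because `shutdown` takes the `Child` out of the `Option`, an explicit shutdown followed by `Drop` never signals or waits twice.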


Add fork+exec based cleanup helper to prevent loopback device leaks when
bootc install --via-loopback is interrupted by signals like SIGINT.

- Add loopback-cleanup-helper CLI subcommand
- Implement run_loopback_cleanup_helper() with PR_SET_PDEATHSIG
- Update LoopbackDevice to spawn cleanup helper process
- Add tests for spawn mechanism
Successfully merging this pull request may close this issue: drop loopback out-of-process