Checkpoint fails for containers with deleted-but-open files in --overlay2=none mode #11425
Comments
Yeah, this is a known issue. cc @fvoznika
Yeah, I think this is a documented gap. Curious though whether it's easy to fix or swallow in some way?
The error you are seeing is coming from gvisor/pkg/sentry/fsimpl/gofer/save_restore.go, lines 131 to 135 in f6b843d.
From the 5 applications you have listed above, it seems that all such crashes are coming from the rootfs. (Note that /tmp is also considered part of rootfs if the container image has a non-empty /tmp.) Could you confirm if you are using rootfs overlay? runsc enables it by default. You have to explicitly turn it off. Note that this will be harder to support with non-rootfs gofer mounts (like bind mounts).
We have --overlay2=none set, as it is helpful for us to expose and modify the file diffs in the container's filesystem.
I verified that this issue does not occur with rootfs overlay. If rootfs overlay is disabled, then changes to the rootfs are propagated to the host. So are you migrating the rootfs after checkpoint to the restore site? Supporting deleted file restore in gofer filesystem might be tricky. On restore, we may need to create the file on the host, fill it with the file contents, open an FD to it and then delete it again. cc @nixprime any better ideas?
When using rootfs overlay, the filesystem diff is stored in gVisor tmpfs. How do you want to modify this diff? Maybe it is possible to restore the container, modify the filesystem and then checkpoint again? With rootfs overlay, you don't have to worry about filesystem migration and it also prevents issues like this one (deleted file FD from host).
Here are some use cases enabled by --overlay2=none:
Why do you need to checkpoint the container in this case? Consider
Sorry, could you define what a "filesystem only checkpoint" is? If you do not need the gVisor checkpoint image, then there is no need to checkpoint the container. You can just pause it, take the "filesystem checkpoint" and use it.
Yeah, what you said makes sense. But at the same time, we still have the use case where we need to support the normal gVisor checkpoint image.
Description
We're seeing issues when checkpointing some workloads where applications in the container keep file descriptors open to deleted files. This seems to be a common pattern in many third-party applications, making it a significant operational issue. Would it be possible to support this, or at least provide a flag that downgrades this to a warning rather than aborting the checkpoint entirely?
Error Message:
encoding error: gofer.dentry(...).beforeSave: deleted and invalidated dentries can't be restored
Example Paths:
We've seen this with various applications, including:
/tmp/.org.chromium.Chromium.uG5Ddr
root/.xpaint/tmp/XPaint-oEMdmH
root/.npm/_logs/2025-01-30T17_48_32_236Z-debug-0.log
root/startup
/tmp/language-exchange
Here is a copy of the stacktrace
Steps to reproduce
No response
runsc version
docker version (if using docker)
uname
No response
kubectl (if using Kubernetes)
repo state (if built from source)
No response
runsc debug logs (if available)