Description
Expected behavior: open_tree(2) with OPEN_TREE_CLONE on a mount that has already been detached via umount2(MNT_DETACH) should fail with EINVAL, matching Linux.
Observed behavior: gVisor allows the clone to succeed. The detached mount can then be re-attached anywhere via move_mount(2), fully reversing the umount2(MNT_DETACH) and restoring access to a filesystem that was intentionally detached.
Root cause: CloneTreeToAnonNS() in pkg/sentry/vfs/namespace.go checks namespace membership (fromMnt.ns != taskMountNs) but never checks fromMnt.umounted. After MNT_DETACH, mount.umounted is set to true but mount.ns is not cleared until the last reference is dropped. When the detached mount's .ns still equals the current task's mount namespace (the common case for a detached-but-referenced mount), the validation condition evaluates to false and the function falls through to cloneMount with no umounted guard at all:
    fsName := fromMnt.Filesystem().FilesystemType().Name()
    if fromMnt.ns != taskMountNs &&
        (fromMnt.ns == nil || !fromMnt.ns.anonCanBeOperatedOn(taskMountNs)) &&
        fsName != nsfsName {
        return nil, linuxerr.EINVAL
    }
    // fromMnt.umounted is never checked here.
This is the same class of bug fixed in commit 6a112c60a257dadac59962e0bc9e9b5aee70b5b6 (2024) for BindAt, ConnectMountAt, MoveMountAt, and propagateMount — all of which now check .umounted via the validInMountNS() helper introduced by that fix. CloneTreeToAnonNS was added separately as part of the open_tree(2) / new VFS mount API and was not covered.
Proposed fix:
    - if fromMnt.ns != taskMountNs && (fromMnt.ns == nil || !fromMnt.ns.anonCanBeOperatedOn(taskMountNs)) && fsName != nsfsName {
    + if fromMnt.umounted || (fromMnt.ns != taskMountNs && (fromMnt.ns == nil || !fromMnt.ns.anonCanBeOperatedOn(taskMountNs)) && fsName != nsfsName) {
          return nil, linuxerr.EINVAL
      }
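To make the difference between the two conditions concrete, here is a minimal standalone sketch that evaluates both against a toy model. The types mountNamespace and mount, the helpers rejectedOriginal and rejectedFixed, the value "nsfs" for nsfsName, and the always-false anonCanBeOperatedOn stub are all assumptions made for illustration; they are not the sentry's real definitions.

```go
package main

import "fmt"

// mountNamespace is a hypothetical stand-in for the sentry type. It carries a
// field so that distinct namespaces are guaranteed to have distinct addresses.
type mountNamespace struct{ name string }

// anonCanBeOperatedOn is stubbed to always refuse (assumption); the real
// method contains additional logic.
func (ns *mountNamespace) anonCanBeOperatedOn(other *mountNamespace) bool {
	return false
}

// mount models only the fields that the validation reads.
type mount struct {
	ns       *mountNamespace
	umounted bool
	fsName   string
}

// nsfsName is assumed to be "nsfs" for this sketch.
const nsfsName = "nsfs"

// rejectedOriginal mirrors the current condition; true means EINVAL.
func rejectedOriginal(m *mount, taskNs *mountNamespace) bool {
	return m.ns != taskNs &&
		(m.ns == nil || !m.ns.anonCanBeOperatedOn(taskNs)) &&
		m.fsName != nsfsName
}

// rejectedFixed mirrors the proposed condition with the umounted guard.
func rejectedFixed(m *mount, taskNs *mountNamespace) bool {
	return m.umounted || rejectedOriginal(m, taskNs)
}

func main() {
	taskNs := &mountNamespace{name: "task"}
	otherNs := &mountNamespace{name: "other"}

	cases := []struct {
		label string
		m     *mount
	}{
		{"live, same namespace", &mount{ns: taskNs, fsName: "tmpfs"}},
		{"detached, same namespace", &mount{ns: taskNs, umounted: true, fsName: "tmpfs"}},
		{"live, foreign namespace", &mount{ns: otherNs, fsName: "tmpfs"}},
	}
	for _, c := range cases {
		fmt.Printf("%-26s original rejects=%v fixed rejects=%v\n",
			c.label, rejectedOriginal(c.m, taskNs), rejectedFixed(c.m, taskNs))
	}
}
```

In this model only the detached row differs: the original condition lets it through because its ns still equals the task's namespace, while the proposed condition rejects it. The live same-namespace mount is accepted by both, and the foreign-namespace mount is rejected by both.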
Will follow up with a PR containing this fix plus a regression test (DetachedMountOpenTreeCloneFails), analogous to DetachedMountBindFails added by the 2024 fix.
Prior disclosure: Reported to gvisor-security@googlegroups.com on 2026-06-14, confirmed as a real, gVisor-specific behavioral deviation, classified Integrity/SandboxUser (confined to the sandbox, not a sandbox escape — does not qualify for VRP reward, but is a real bug worth fixing per the maintainer's own reply).
Steps to reproduce
Requires CAP_SYS_ADMIN (standard for container root).
- Mount tmpfs:
    mount("tmpfs", "/tmp/A", "tmpfs", 0, "size=1m");
- Hold an O_PATH reference (keeps the mount alive after detach):
    int fd = open("/tmp/A", O_PATH | O_DIRECTORY);
- Detach the mount:
    umount2("/tmp/A", MNT_DETACH);
  /tmp/A is no longer accessible by path. mount.umounted is now true, but mount.ns is unchanged.
- Clone the detached mount via the held fd:
    long tree_fd = syscall(__NR_open_tree, fd, "",
                           OPEN_TREE_CLONE | AT_EMPTY_PATH);
  - Linux: fails with errno = EINVAL
  - gVisor: succeeds, returns a valid tree_fd
- Re-attach the cloned (detached) filesystem at a new path:
    syscall(__NR_move_mount, (int)tree_fd, "", AT_FDCWD, "/tmp/B",
            MOVE_MOUNT_F_EMPTY_PATH);
  On gVisor this succeeds, and files from the originally-unmounted tmpfs (e.g. a sentinel file written before step 3) are readable again at /tmp/B.
A complete, self-contained PoC (poc_gvisor_detached_clone.c) that performs all five steps and prints pass/fail at each stage is attached to the original email report and will be linked in the follow-up PR.
runsc version
runsc version: master / HEAD as of 2026-06-14
Reproduced against: pkg/sentry/vfs/namespace.go @ HEAD
sha256: cad58c41d16f7bb0e99eb64da64ffe1489e86682cd053b4d8f3dfd7b769ae640
Confirmed Linux baseline behavior (EINVAL) on standard Linux 6.x
docker version (if using docker)
uname
No response
kubectl (if using Kubernetes)
repo state (if built from source)
No response
runsc debug logs (if available)