feat(sandbox): seccomp-notify DNS-pinned allowlist for Platform mode#17

Closed
Ladas wants to merge 3 commits into mvp-v2 from feat/seccomp-notify

@Ladas Ladas commented Jun 12, 2026

Summary

Foundation for kernel-level connect() interception using seccomp-notify.
Adds DnsPinnedAllowlist module: resolves allowed domains to IPs at
sandbox creation, freezes them for the session (prevents DNS rebinding).

The notification event loop and on-behalf-of operations (pidfd_getfd)
will be wired once OPA policy integration is complete.

Depends on: #16 (Landlock TCP port restriction) → #15 (Platform mode base)

2 files, +135 lines. 820 tests pass, clippy clean.

What this PR adds

  • DnsPinnedAllowlist: resolve domains, pin IPs, check connect targets
  • Loopback always allowed (proxy address)
  • 4 unit tests
  • Full rustdoc (architecture, TOCTOU safety, requirements, references)
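
A minimal sketch of the pinning idea described above, assuming resolution goes through the standard library's `ToSocketAddrs`. The type name `PinnedAllowlist` and its methods are hypothetical illustrations, not the PR's actual `DnsPinnedAllowlist` API:

```rust
use std::collections::HashSet;
use std::io;
use std::net::{IpAddr, SocketAddr, ToSocketAddrs};

/// Hypothetical sketch of a DNS-pinned allowlist (not the PR's real type).
#[derive(Debug, Clone)]
pub struct PinnedAllowlist {
    /// IPs captured once at construction and never refreshed afterwards.
    pinned: HashSet<IpAddr>,
}

impl PinnedAllowlist {
    /// Resolve each domain once and pin the results for the session.
    pub fn resolve(domains: &[&str]) -> io::Result<Self> {
        let mut pinned = HashSet::new();
        for domain in domains {
            // Port 0 is a placeholder: only the resolved IP is kept.
            for addr in (*domain, 0u16).to_socket_addrs()? {
                pinned.insert(addr.ip());
            }
        }
        Ok(Self { pinned })
    }

    /// Build from already-resolved IPs (useful in tests, no network needed).
    pub fn from_ips(ips: Vec<IpAddr>) -> Self {
        Self { pinned: ips.into_iter().collect() }
    }

    /// Loopback is always allowed (the proxy listens there); any other
    /// target must match an IP pinned at construction time.
    pub fn is_allowed(&self, target: &SocketAddr) -> bool {
        let ip = target.ip();
        ip.is_loopback() || self.pinned.contains(&ip)
    }
}
```

Because the set is built once and never re-resolved, a later change to a domain's DNS record cannot widen what `is_allowed` accepts, which is the rebinding protection the summary refers to.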

What's NOT in this PR (follow-up)

  • seccomp filter installation (SECCOMP_FILTER_FLAG_NEW_LISTENER)
  • Notification event loop (async read from notification fd)
  • On-behalf-of connect via pidfd_getfd()
  • Fork-based supervisor architecture in lib.rs
  • Integration with OPA network policies

Ref: NVIDIA#899

Assisted-By: Claude Code

@Ladas Ladas force-pushed the feat/seccomp-notify branch 2 times, most recently from 408aa3b to 2446c42 Compare June 12, 2026 16:14
@Ladas Ladas force-pushed the feat/landlock-tcp-port branch from 9ec5718 to 179d108 Compare June 12, 2026 16:26
Ladas added a commit that referenced this pull request Jun 12, 2026
Add kernel-level network syscall interception using SECCOMP_RET_USER_NOTIF
for Platform mode. Provides mandatory, syscall-level enforcement without
any capabilities.

DnsPinnedAllowlist: resolve domains to IPs at sandbox creation, freeze
for session lifetime (DNS rebinding prevention).

BPF filter intercepts: connect, sendto, sendmsg, recvfrom, recvmsg,
bind. Validates AUDIT_ARCH to prevent x32/compat ABI bypass.
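
The filter shape described above can be sketched as a classic BPF program, assuming x86_64 syscall numbers. The struct, constant names, and the tiny `evaluate` interpreter are hypothetical aids for illustration; they are not taken from the PR. On x86_64, x32 syscalls report the same AUDIT_ARCH value with bit 30 set in the syscall number, so this sketch also rejects such numbers explicitly, which goes beyond what the commit message states:

```rust
/// Mirrors the layout of the kernel's `struct sock_filter`.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SockFilter { pub code: u16, pub jt: u8, pub jf: u8, pub k: u32 }

const LD_W_ABS: u16 = 0x20; // BPF_LD | BPF_W | BPF_ABS
const JEQ_K: u16 = 0x15;    // BPF_JMP | BPF_JEQ | BPF_K
const JGE_K: u16 = 0x35;    // BPF_JMP | BPF_JGE | BPF_K
const RET_K: u16 = 0x06;    // BPF_RET | BPF_K
const OFF_NR: u32 = 0;      // offsetof(struct seccomp_data, nr)
const OFF_ARCH: u32 = 4;    // offsetof(struct seccomp_data, arch)
pub const AUDIT_ARCH_X86_64: u32 = 0xC000_003E;
pub const X32_SYSCALL_BIT: u32 = 0x4000_0000;
pub const RET_ALLOW: u32 = 0x7fff_0000;
pub const RET_USER_NOTIF: u32 = 0x7fc0_0000;
pub const RET_KILL_PROCESS: u32 = 0x8000_0000;
/// x86_64 numbers: connect, sendto, sendmsg, recvfrom, recvmsg, bind.
pub const INTERCEPTED: [u32; 6] = [42, 44, 46, 45, 47, 49];

fn ins(code: u16, jt: u8, jf: u8, k: u32) -> SockFilter {
    SockFilter { code, jt, jf, k }
}

/// Build the filter: arch check, x32 check, then one JEQ per syscall.
pub fn build_filter() -> Vec<SockFilter> {
    let mut prog = vec![
        ins(LD_W_ABS, 0, 0, OFF_ARCH),
        ins(JEQ_K, 1, 0, AUDIT_ARCH_X86_64), // match: skip the kill
        ins(RET_K, 0, 0, RET_KILL_PROCESS),
        ins(LD_W_ABS, 0, 0, OFF_NR),
        ins(JGE_K, 0, 1, X32_SYSCALL_BIT),   // x32 number: fall into kill
        ins(RET_K, 0, 0, RET_KILL_PROCESS),
    ];
    let n = INTERCEPTED.len();
    for (i, nr) in INTERCEPTED.iter().enumerate() {
        // On match, jump over the remaining JEQs and the ALLOW return.
        prog.push(ins(JEQ_K, (n - i) as u8, 0, *nr));
    }
    prog.push(ins(RET_K, 0, 0, RET_ALLOW));
    prog.push(ins(RET_K, 0, 0, RET_USER_NOTIF));
    prog
}

/// Toy interpreter for the four opcodes above, for checking jump offsets.
/// It is not a substitute for loading the filter into the kernel.
pub fn evaluate(prog: &[SockFilter], arch: u32, nr: u32) -> u32 {
    let (mut acc, mut pc) = (0u32, 0usize);
    loop {
        let op = prog[pc];
        pc += 1;
        match op.code {
            LD_W_ABS => acc = if op.k == OFF_ARCH { arch } else { nr },
            JEQ_K => pc += usize::from(if acc == op.k { op.jt } else { op.jf }),
            JGE_K => pc += usize::from(if acc >= op.k { op.jt } else { op.jf }),
            RET_K => return op.k,
            _ => panic!("unsupported opcode in sketch"),
        }
    }
}
```

Checking the architecture before the syscall number matters because the same number means a different syscall under another ABI; without it, a 32-bit compat call could slip past a filter written for x86_64 numbers.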

Linux syscall wrappers: notification fd ioctls, pidfd_open/pidfd_getfd
for on-behalf-of operations (TOCTOU-safe), read_process_memory with
read_exact (no short reads), sockaddr parser (correct endianness for
sa_family, port, flowinfo), verify_socket_fd (mitigates fd-swap race),
deny/allow_connect response helpers.
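
As an illustration of the endianness points above, here is a minimal sockaddr parser sketch, assuming the Linux layouts of `sockaddr_in` and `sockaddr_in6` (`AF_INET` = 2, `AF_INET6` = 10). The function name matches the commit message, but the signature and the `array` helper are assumptions, not the PR's code:

```rust
use std::net::{Ipv4Addr, Ipv6Addr, SocketAddr, SocketAddrV4, SocketAddrV6};

const AF_INET: u16 = 2;   // Linux value
const AF_INET6: u16 = 10; // Linux value

/// Copy N bytes starting at `start`, or None if the buffer is too short.
fn array<const N: usize>(buf: &[u8], start: usize) -> Option<[u8; N]> {
    let mut out = [0u8; N];
    out.copy_from_slice(buf.get(start..start + N)?);
    Some(out)
}

/// Parse raw sockaddr bytes copied out of another process.
/// Returns None for truncated buffers and unsupported families.
pub fn parse_sockaddr(buf: &[u8]) -> Option<SocketAddr> {
    // sa_family is stored in host byte order.
    let family = u16::from_ne_bytes(array(buf, 0)?);
    // The port is stored in network (big-endian) byte order.
    let port = u16::from_be_bytes(array(buf, 2)?);
    match family {
        AF_INET => {
            let octets: [u8; 4] = array(buf, 4)?;
            Some(SocketAddr::V4(SocketAddrV4::new(Ipv4Addr::from(octets), port)))
        }
        AF_INET6 => {
            // flowinfo is network byte order; scope_id is host byte order.
            let flowinfo = u32::from_be_bytes(array(buf, 4)?);
            let octets: [u8; 16] = array(buf, 8)?;
            let scope_id = u32::from_ne_bytes(array(buf, 24)?);
            let ip = Ipv6Addr::from(octets);
            Some(SocketAddr::V6(SocketAddrV6::new(ip, port, flowinfo, scope_id)))
        }
        _ => None,
    }
}
```

Returning `None` on short input lets the caller deny the syscall instead of evaluating a partially read address.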

Code review fixes applied across all PRs:
- PR #15: gateway propagates network_enforcement to DriverSandboxSpec
- PR #15: driver uses typed enum comparison (not magic integer)
- PR #16: saturating_sub prevents underflow in Landlock skipped count
- PR #16: warn!() on TCP port restriction failure (was debug)
- PR #17: BPF arch check, recvfrom/recvmsg/bind interception,
  verify_socket_fd, read_exact, allow_connect rename, flowinfo
  endianness, safety comments on all unsafe blocks

8 tests. Compiles, 949 tests pass, clippy clean.

Ref: NVIDIA#899
@Ladas Ladas force-pushed the feat/seccomp-notify branch from 2446c42 to 6078a8e Compare June 12, 2026 16:28
@Ladas Ladas force-pushed the feat/landlock-tcp-port branch from 179d108 to 59d148a Compare June 16, 2026 14:06
@Ladas Ladas force-pushed the feat/seccomp-notify branch from 6078a8e to 254154b Compare June 16, 2026 14:53
@Ladas Ladas force-pushed the feat/landlock-tcp-port branch from 59d148a to 8354d68 Compare June 16, 2026 15:37
@Ladas Ladas force-pushed the feat/seccomp-notify branch from 254154b to e16b82b Compare June 17, 2026 06:31
@Ladas Ladas force-pushed the feat/landlock-tcp-port branch from 8354d68 to 0846c55 Compare June 17, 2026 14:04
@Ladas Ladas force-pushed the feat/seccomp-notify branch from e16b82b to ea5b495 Compare June 17, 2026 14:27
@Ladas Ladas force-pushed the feat/landlock-tcp-port branch from 0846c55 to 15e0993 Compare June 17, 2026 15:13
@Ladas Ladas force-pushed the feat/seccomp-notify branch from ea5b495 to df5c661 Compare June 17, 2026 15:14

Ladas commented Jun 19, 2026

/ok


Ladas commented Jun 19, 2026

/ok to test

@Ladas Ladas changed the base branch from feat/landlock-tcp-port to mvp-v2 June 23, 2026 13:12
@Ladas Ladas force-pushed the feat/seccomp-notify branch 6 times, most recently from 93452fe to d57e999 Compare June 24, 2026 14:56
Ladas added 3 commits June 24, 2026 17:12
Add NetworkMode::Platform for running the supervisor without elevated
capabilities on Kubernetes platforms enforcing the restricted Pod
Security Standard (including OpenShift restricted-v2 SCC).

Platform Mode keeps Landlock filesystem isolation, seccomp syscall
filtering, OPA policy evaluation, credential injection, and L7
inspection via a loopback CONNECT proxy. It replaces the network
namespace (which requires CAP_SYS_ADMIN + CAP_NET_ADMIN) with:

- Loopback proxy binding (127.0.0.1 instead of veth interface)
- K8s driver: zero capabilities, drop ALL, non-root UID
- seccomp: block SOCK_DGRAM (UDP) on AF_INET/AF_INET6 to match
  the nftables UDP reject in namespace mode -- the proxy resolves
  DNS on behalf of the agent, so UDP is not needed
- Landlock scope: restrict abstract Unix sockets and signals
  (ABI v5+, BestEffort degrades on older kernels)
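
The UDP rule above amounts to a check on the first two arguments of `socket(2)`. A sketch of that decision as a plain Rust predicate, using Linux constant values; the function name is hypothetical, and whether the PR's filter masks the type flags this way is an assumption:

```rust
const AF_INET: i32 = 2;        // Linux value
const AF_INET6: i32 = 10;      // Linux value
const SOCK_DGRAM: i32 = 2;
/// The low bits hold the socket type; SOCK_NONBLOCK and SOCK_CLOEXEC
/// are OR-ed into the higher bits of the same argument.
const SOCK_TYPE_MASK: i32 = 0xf;

/// True when a socket(domain, type, ..) call would create an IP datagram
/// socket and should therefore be rejected in Platform mode.
pub fn is_blocked_socket(domain: i32, sock_type: i32) -> bool {
    let base_type = sock_type & SOCK_TYPE_MASK;
    matches!(domain, AF_INET | AF_INET6) && base_type == SOCK_DGRAM
}
```

Masking the flag bits matters because a caller can pass `SOCK_DGRAM | SOCK_NONBLOCK`; an exact equality test on the raw argument would miss that call.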

Security parity with namespace mode:

| Attack                 | Namespace mode         | Platform mode            |
|------------------------|------------------------|--------------------------|
| TCP bypass proxy       | nftables REJECT        | Landlock port 3128 only  |
| UDP exfiltration       | nftables REJECT        | seccomp SOCK_DGRAM block |
| DNS tunneling          | no UDP accept rule     | no SOCK_DGRAM            |
| Abstract Unix sockets  | netns isolation        | Landlock scope           |
| Signals to supervisor  | N/A (same netns)       | Landlock scope           |
| Container escape       | Risk (CAP_SYS_ADMIN)   | Impossible (zero caps)   |

Remaining gap: Landlock NetPort allows port 3128 on any IP (not just
loopback). Mitigate with egress NetworkPolicy denying all sandbox pod
egress -- loopback traffic is unaffected by NetworkPolicy.

Proto: add NetworkEnforcementMode enum and field to SandboxPolicy
and DriverSandboxSpec. Default NAMESPACE (0) preserves existing
behavior; PLATFORM (1) activates the new mode.
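
A hand-written Rust sketch of the mode enum described above, assuming the wire values 0 and 1 stated in the commit message. Real generated protobuf code would look different; the `TryFrom` conversion and its error type are illustrative choices:

```rust
use std::convert::TryFrom;

/// Hypothetical Rust mirror of the proto enum NetworkEnforcementMode.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
#[repr(i32)]
pub enum NetworkEnforcementMode {
    /// Wire value 0: the default, so existing policies keep their behavior.
    #[default]
    Namespace = 0,
    /// Wire value 1: opt in to Platform mode.
    Platform = 1,
}

impl TryFrom<i32> for NetworkEnforcementMode {
    type Error = i32;

    /// Convert a raw wire value; unknown values are returned as the error
    /// so callers compare typed variants rather than magic integers.
    fn try_from(raw: i32) -> Result<Self, Self::Error> {
        match raw {
            0 => Ok(Self::Namespace),
            1 => Ok(Self::Platform),
            other => Err(other),
        }
    }
}
```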

Signed-off-by: Ladislav Smola <lsmola@redhat.com>

Add Landlock ABI v4 TCP connect restriction for Platform mode. When the
kernel supports ABI v4, only the proxy port (default 3128) is allowed
for outbound TCP connections. On older kernels, the BestEffort compat
level degrades silently: the rule has no effect, and the agent reaches
the proxy only cooperatively, with no kernel enforcement.

Both handle_access(ConnectTcp) and add_rule(NetPort) use the ? operator
since BestEffort guarantees they succeed on all kernel versions.

Signed-off-by: Ladislav Smola <lsmola@redhat.com>

Add kernel-level connect() interception using SECCOMP_RET_USER_NOTIF.
The supervisor intercepts network syscalls, reads the destination
sockaddr from the child's memory via /proc/pid/mem, evaluates it
against a DNS-pinned allowlist, and either performs the operation on
behalf of the child via pidfd_getfd() or denies it with EPERM.

Components:
- DnsPinnedAllowlist: resolve domains to IPs at sandbox creation,
  freeze for session lifetime to prevent DNS rebinding
- BPF filter with AUDIT_ARCH validation for connect/sendto/sendmsg/
  recvfrom/recvmsg/bind syscalls
- pidfd_open + pidfd_getfd for TOCTOU-safe on-behalf-of operations
- parse_sockaddr with correct endianness for IPv4/IPv6
- read_process_memory with read_exact for short-read safety
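
The short-read point in the last item can be sketched as follows. The function name `read_remote_bytes` and its generic signature are hypothetical; the sketch is written over any `Read + Seek` source so it can be exercised without a second process:

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Read exactly `len` bytes at `addr` from a seekable memory source.
/// `read_exact` retries partial reads and fails with UnexpectedEof if the
/// source ends early, so the caller never parses a half-filled buffer.
pub fn read_remote_bytes<M: Read + Seek>(
    mem: &mut M,
    addr: u64,
    len: usize,
) -> io::Result<Vec<u8>> {
    mem.seek(SeekFrom::Start(addr))?;
    let mut buf = vec![0u8; len];
    mem.read_exact(&mut buf)?;
    Ok(buf)
}
```

In a supervisor, `mem` would plausibly be a `std::fs::File` opened on the child's `/proc/<pid>/mem`, with `addr` taken from the intercepted syscall's sockaddr pointer argument; in a test, a `std::io::Cursor` over a byte vector stands in for it.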

Known limitation: DnsPinnedAllowlist cannot handle wildcard domains
(*.example.com) because getaddrinfo does not support wildcards. Callers
must skip wildcard endpoints and rely on the proxy OPA glob.match()
for wildcard domain enforcement.

Signed-off-by: Ladislav Smola <lsmola@redhat.com>
@Ladas Ladas force-pushed the feat/seccomp-notify branch from d57e999 to 8f124bd Compare June 24, 2026 15:13

Ladas commented Jun 24, 2026

Closing for now. seccomp-notify is optional defense-in-depth -- Platform Mode (PR #15) with UDP seccomp block and Landlock scope achieves security parity with namespace mode without it. Can revisit as a follow-up for standalone deployments without claw-proxy.

@Ladas Ladas closed this Jun 24, 2026