
config-linux: Add security considerations for linux.devices raw block I/O #1313

@APEvul-cyber

Description


Problem

The linux.devices and linux.resources.devices sections of config-linux.md describe how to configure device access for containers but include no security guidance about the implications of granting r (read) or w (write) access to block devices.

When a block device is configured in linux.devices and linux.resources.devices grants access: "rw" or "rwm", the container process can perform raw block-level I/O via standard read() and write() syscalls, regardless of the process's capability set.

Specifically:

  • read() on a block device fd does not require CAP_SYS_RAWIO or any other capability
  • write() on a block device fd does not require CAP_SYS_RAWIO or any other capability
  • mount() correctly requires CAP_SYS_ADMIN
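The difference comes down to where the kernel checks permissions: for a device node, Unix permissions, the device cgroup allowlist (or the cgroup v2 eBPF device program) and any LSM policy are evaluated when the node is opened, and the resulting descriptor is then read or written like any other file. A minimal Python sketch of raw access is shown below (the helper name read_raw is hypothetical). The code never requests a capability, and per the behavior reported in this issue, neither open() nor read() on an ordinary block device checks for one.

```python
import os

def read_raw(path: str, offset: int, length: int) -> bytes:
    """Read `length` bytes starting at `offset` from `path`.

    Uses only open() and pread(). All access checks (Unix permissions,
    device cgroup / eBPF device program, LSM policy) happen at open();
    no capability is requested anywhere in this function.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        # Returns fewer bytes than requested if the range crosses end of file.
        return os.pread(fd, length, offset)
    finally:
        os.close(fd)

# Hypothetical usage inside a container that was given /dev/hostdisk:
# first_sector = read_raw("/dev/hostdisk", 0, 512)
```

Swapping O_RDONLY for O_RDWR and os.pread for os.pwrite gives the write side; the only additional check is the w permission evaluated at open().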

This means that a container given a block device entry and only the runtime's default capability set can read the entire contents of the host device (all filesystem data, including any credentials and keys stored on it) and potentially write to it, modifying or corrupting the host filesystem at the block level.

The specification does not document this behavior. As a result, runtime implementors and container orchestrators may assume that Linux capabilities serve as a security boundary for device access. They do for mount(), but not for raw I/O.

Impact

The gap affects the entire container ecosystem that consumes this specification:

  • Container runtimes (runc, crun, youki) faithfully implement the spec and create device nodes with the specified access — no additional validation is performed on block devices
  • Container engines (containerd, CRI-O, Docker) populate linux.devices from higher-level configuration (--device, device plugins, hostPath BlockDevice) without security warnings
  • Kubernetes exposes block devices via hostPath type: BlockDevice, device plugins (GPU, FPGA, SR-IOV), and CSI raw block volumes — all of which result in linux.devices entries
  • Security tooling (admission controllers, policy engines) commonly audit capabilities and seccomp profiles but rarely inspect device cgroup rules for block device access

Verified behavior

Tested with runc 1.3.4 on cgroup v2 (eBPF device controller), default seccomp profile active:

```
# Container capabilities (default set, no SYS_ADMIN, no SYS_RAWIO):
CapPrm: 0x00000000a80425fb

# mount() is correctly blocked:
mount: permission denied (are you root?)

# Raw read via dd succeeds and recovers host /etc/passwd and /etc/shadow entries:
$ dd if=/dev/hostdisk bs=4096 count=38400 2>/dev/null | strings | grep '^root:'
root:x:0:0:root:/root:/bin/sh
root:*::0:::::

# Raw write via dd succeeds:
$ echo TEST | dd of=/dev/hostdisk bs=1 seek=153000000 count=5 conv=notrunc
5+0 records in
5+0 records out
```
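The CapPrm value in the log can be checked independently by mapping each set bit to its capability name. The sketch below is illustrative (decode_caps and caps_from_status are hypothetical helper names); the bit numbers come from the kernel's include/uapi/linux/capability.h, and only bits 0 to 31 are named.

```python
# Capability bit numbers 0-31 from include/uapi/linux/capability.h.
CAP_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID",
    "CAP_SETPCAP", "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
    "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",
    "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",
    "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
    "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",
    "CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
]

def decode_caps(mask: int) -> list[str]:
    """Return the names of the capabilities whose bits are set in `mask`."""
    names = []
    for bit in range(mask.bit_length()):
        if (mask >> bit) & 1:
            # Bits above 31 are not named in this sketch.
            names.append(CAP_NAMES[bit] if bit < len(CAP_NAMES) else f"cap_{bit}")
    return names

def caps_from_status(status_text: str, field: str = "CapPrm") -> list[str]:
    """Decode one Cap* line of /proc/<pid>/status (hex, with or without 0x)."""
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key == field:
            return decode_caps(int(value.strip(), 16))
    raise KeyError(field)

# Inside a container: caps_from_status(open("/proc/self/status").read())
```

For 0xa80425fb this yields the 14 capabilities of the default set (CAP_CHOWN through CAP_SETFCAP, including CAP_MKNOD) and neither CAP_SYS_RAWIO (bit 17) nor CAP_SYS_ADMIN (bit 21), which matches the comment in the log.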

Proposed Changes

1. Add security note to linux.devices section

After the existing description of linux.devices, add:

Security consideration: Creating a block device node (type "b") and granting r or w access in linux.resources.devices allows the container process to perform raw block-level I/O on the underlying host device using standard read() and write() syscalls. These syscalls are not gated by any Linux capability; apart from any LSM policy (for example SELinux or AppArmor), the device cgroup permission and Unix file permissions are the only controls. Removing CAP_SYS_ADMIN prevents mount() but does not prevent raw data access.

Runtimes and orchestrators SHOULD warn when block devices are configured with read or write access. Effective defenses include user namespaces (a remapped UID 0 cannot open device nodes owned by the host's root user) and running container processes as non-root users.
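One way a runtime or policy engine could act on this SHOULD is sketched below. It walks a parsed OCI config, evaluates linux.resources.devices for each type "b" entry in linux.devices, and reports block devices left with r or w access, noting whether the two defenses named in the note (a user namespace, a non-root process.user.uid) are present. The function names and warning text are hypothetical, and the rule evaluation is a simplified model (start from deny-all, apply entries in order, last match wins per permission letter, missing access treated as "rwm"); runc and other runtimes implement the cgroup v1 and v2 semantics in more detail.

```python
from typing import Any, Optional

def _matches(rule_value: Optional[int], value: int) -> bool:
    # An omitted major/minor in a device rule acts as a wildcard.
    return rule_value is None or rule_value == value

def effective_access(rules: list[dict[str, Any]], major: int, minor: int) -> set[str]:
    """Simplified model of linux.resources.devices for one block device."""
    granted: set[str] = set()
    for rule in rules:
        if rule.get("type", "a") not in ("a", "b"):
            continue  # rule applies to character devices only
        if not (_matches(rule.get("major"), major) and _matches(rule.get("minor"), minor)):
            continue
        letters = set(rule.get("access", "rwm"))  # assumption: missing access = "rwm"
        if rule["allow"]:
            granted |= letters
        else:
            granted -= letters
    return granted

def block_device_warnings(config: dict[str, Any]) -> list[str]:
    """Return one warning per linux.devices block device left with r or w access."""
    linux = config.get("linux", {})
    rules = linux.get("resources", {}).get("devices", [])
    uid = config.get("process", {}).get("user", {}).get("uid", 0)
    has_userns = any(ns.get("type") == "user" for ns in linux.get("namespaces", []))
    warnings = []
    for dev in linux.get("devices", []):
        if dev.get("type") != "b":
            continue
        raw = effective_access(rules, dev["major"], dev["minor"]) & {"r", "w"}
        if not raw:
            continue
        mitigations = []
        if has_userns:
            mitigations.append("user namespace")
        if uid != 0:
            mitigations.append(f"non-root uid {uid}")
        warnings.append(
            f"{dev['path']}: block device with raw {''.join(sorted(raw))} access; "
            f"mitigations: {', '.join(mitigations) or 'none'}"
        )
    return warnings
```

Because it only reads the config, a check like this could run either in the runtime before the container starts or in an admission controller that sees the generated spec.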

2. Add note to linux.resources.devices access field

After the access field description, add:

Note: The r and w permissions control access through the device cgroup controller (or the eBPF device program on cgroup v2). When applied to block devices, these permissions enable raw block-level I/O that is independent of Linux capabilities. CAP_SYS_RAWIO is not required for read() or write() on block device file descriptors.
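To make concrete what the kernel evaluates: on cgroup v1 an allow entry becomes a devices.allow line such as "b 8:0 rw", and on cgroup v2 the runtime's eBPF device program receives a struct bpf_cgroup_dev_ctx whose only fields are access_type, major and minor. Neither form carries any capability information. The sketch below renders both encodings; the helper names are illustrative, and the constants are taken from include/uapi/linux/bpf.h.

```python
# Constants from include/uapi/linux/bpf.h for BPF_PROG_TYPE_CGROUP_DEVICE.
BPF_DEVCG_ACC_MKNOD = 1 << 0
BPF_DEVCG_ACC_READ = 1 << 1
BPF_DEVCG_ACC_WRITE = 1 << 2
BPF_DEVCG_DEV_BLOCK = 1 << 0
BPF_DEVCG_DEV_CHAR = 1 << 1

_ACCESS = {"m": BPF_DEVCG_ACC_MKNOD, "r": BPF_DEVCG_ACC_READ, "w": BPF_DEVCG_ACC_WRITE}
_TYPE = {"b": BPF_DEVCG_DEV_BLOCK, "c": BPF_DEVCG_DEV_CHAR}

def v1_rule(dev_type: str, major, minor, access: str) -> str:
    """Render a device rule in cgroup v1 devices.allow syntax (None = '*')."""
    maj = "*" if major is None else str(major)
    mnr = "*" if minor is None else str(minor)
    return f"{dev_type} {maj}:{mnr} {access}"

def v2_access_type(dev_type: str, access: str) -> int:
    """Encode a request as bpf_cgroup_dev_ctx.access_type:
    (BPF_DEVCG_ACC_* << 16) | BPF_DEVCG_DEV_*."""
    acc = 0
    for letter in access:
        acc |= _ACCESS[letter]
    return (acc << 16) | _TYPE[dev_type]
```

An open(O_RDWR) of /dev/hostdisk (block device 8:0) would reach the eBPF program as access_type v2_access_type("b", "rw") == 0x60001 with major 8 and minor 0; if the program allows it, subsequent read() and write() calls on that descriptor are not checked again.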
