Merged
88 changes: 80 additions & 8 deletions .github/workflows/build-kbox.yml
@@ -1,15 +1,16 @@
# Build kbox and run full test suite.
# Zero root required -- everything runs as an unprivileged user.
#
# Parallelism (5 independent jobs, 1 sequential):
# commit-hygiene -- Change-Id + subject format (needs full history)
# lint -- clang-format, newline, security, cppcheck (one apt install)
# unit-tests -- no LKL dependency, ASAN/UBSAN
# build-kbox -- fetches LKL, compiles kbox + guest/stress bins, builds rootfs
# oci-image-import -- pulls nginx:alpine via mkrootfs.sh --image, validates the
# libext2fs-based ownership rewrite
# integration -- needs build-kbox artifacts, runs integration + stress tests
#
# All independent jobs run in parallel. integration-tests waits for build-kbox only.
name: Build and Test

on:
@@ -146,6 +147,77 @@ jobs:
tests/stress/*
!tests/stress/*.c

  # ---- OCI image import: pull nginx:alpine, validate ownership rewrite ----
  # Exercises scripts/oci-pull.py + tools/oci-chown end-to-end. nginx:alpine
  # has multi-layer pulls and a /etc/passwd entry for "nginx" (uid=101) even
  # though its tar headers are 0:0; the rewrite must restore the on-disk
  # owner to 0 (vs the invoking user's UID that mke2fs -d would inherit).
  oci-image-import:
    runs-on: ubuntu-24.04
    timeout-minutes: 5
    steps:
      - name: Checkout
        uses: actions/checkout@v6

      - name: Cache apt packages
        uses: actions/cache@v5
        with:
          path: ~/apt-cache
          key: apt-oci-${{ runner.os }}-${{ hashFiles('.github/workflows/build-kbox.yml') }}
      - name: Install dependencies
        run: |
          mkdir -p ~/apt-cache
          sudo apt-get update
          sudo apt-get install -y -o Dir::Cache::Archives=$HOME/apt-cache \
            e2fsprogs libext2fs-dev

      - name: Build oci-chown helper
        run: make -C tools/oci-chown

      - name: Pull nginx:alpine with --rewrite-uid
        run: |
          # pipefail: don't let `tee` mask a mkrootfs.sh failure.
          # nounset: catch any typo'd $VAR before it silently expands to "".
          set -euo pipefail
          ROOTFS=/tmp/nginx-oci.ext4 ./scripts/mkrootfs.sh \
            --image=docker://nginx:alpine \
            --rewrite-uid \
            256 2>&1 | tee /tmp/mkrootfs.log

          # Helper must report at least one inode rewrite; without it the
          # mke2fs-inherited invoking-user UID would silently leak through.
          if ! grep -qE "rewrote [1-9][0-9]* inode" /tmp/mkrootfs.log; then
            echo "::error::oci-chown reported no inode rewrites"
            exit 1
          fi

      - name: Verify ownership round-trip
        run: |
          # /etc/nginx/nginx.conf is a stable file in nginx:alpine. Its tar
          # header is uid=0/gid=0, so a successful rewrite ends at User=0.
          # Without --rewrite-uid the inode would carry the runner's UID.
          # debugfs format: "User: N Group: M Project: P ..."
          STAT=$(printf "stat /etc/nginx/nginx.conf\n" \
            | debugfs /tmp/nginx-oci.ext4 2>/dev/null \
            | awk '/^User:/ {print $2 " " $4; exit}')
          OWNER=$(echo "$STAT" | awk '{print $1}')
          GROUP=$(echo "$STAT" | awk '{print $2}')
          echo "/etc/nginx/nginx.conf User=$OWNER Group=$GROUP"
          if [ "$OWNER" != "0" ] || [ "$GROUP" != "0" ]; then
            echo "::error::expected User=0/Group=0, got User=$OWNER/Group=$GROUP"
            exit 1
          fi

          # Mode bits must survive the rewrite. /usr/sbin/nginx is +x (mode 0755).
          MODE=$(printf "stat /usr/sbin/nginx\n" \
            | debugfs /tmp/nginx-oci.ext4 2>/dev/null \
            | awk '/^Inode:/ {for (i=1;i<=NF;i++) if ($i=="Mode:") print $(i+1); exit}')
          echo "/usr/sbin/nginx Mode=$MODE"
          if [ "$MODE" != "0755" ]; then
            echo "::error::expected /usr/sbin/nginx mode 0755, got $MODE"
            exit 1
          fi

  # ---- Integration + stress tests: needs kbox binary + rootfs ----
  integration-tests:
    needs: build-kbox
19 changes: 19 additions & 0 deletions README.md
@@ -182,6 +182,24 @@ make rootfs # host arch
make ARCH=aarch64 CC=aarch64-linux-gnu-gcc rootfs # cross
```

To pull from a Docker [v2 registry](https://distribution.github.io/distribution/spec/api/)
instead, pass `--image=docker://...` to the rootfs script (no
[`docker`](https://www.docker.com/) daemon required, `python3` stdlib
only):

```bash
ROOTFS=alpine.ext4 ./scripts/mkrootfs.sh --image=docker://alpine:3.21
ROOTFS=node.ext4 ./scripts/mkrootfs.sh --image=docker://node:alpine \
--rewrite-uid --size=512
```

`--rewrite-uid` restores OCI tar-header ownership into the ext4 inodes
via [`tools/oci-chown`](tools/oci-chown/) (built on demand, links
against [`libext2fs`](https://e2fsprogs.sourceforge.net/)) and is
required for [`--root-id`](#selecting-an-interception-mode) guests.
See [docs/oci-image-import.md](docs/oci-image-import.md) for the full
pipeline, layer cache, and threat model.

Run a guest binary:

```bash
@@ -229,6 +247,7 @@ Run `./kbox --help` for the full option list.
| Three syscall interception tiers and auto selection | [docs/interception-tiers.md](docs/interception-tiers.md) |
| Internal design: dispatch routing, FD table, shadow FDs, ABI translation | [docs/architecture.md](docs/architecture.md) |
| Threat model and deployment tiers | [docs/security-model.md](docs/security-model.md) |
| Building rootfs images from OCI registries | [docs/oci-image-import.md](docs/oci-image-import.md) |
| Using kbox as an AI agent execution layer | [docs/ai-agents.md](docs/ai-agents.md) |
| Web dashboard and telemetry endpoints | [docs/web-observatory.md](docs/web-observatory.md) |
| GDB workflow and helper commands | [docs/gdb-workflow.md](docs/gdb-workflow.md) |
196 changes: 196 additions & 0 deletions docs/oci-image-import.md
@@ -0,0 +1,196 @@
# OCI image import

`kbox` can build a rootfs from any OCI image hosted on a Docker
[v2 registry](https://distribution.github.io/distribution/spec/api/),
not just the bundled Alpine minirootfs. Pass `--image=docker://...` to
[`scripts/mkrootfs.sh`](../scripts/mkrootfs.sh) and the script pulls the
image's manifest and layer blobs, applies them to a staging directory,
and feeds the result into `mke2fs -d` exactly like the Alpine path.

The implementation is rootless and depends only on `python3` (stdlib
only) and `e2fsprogs` (already required for `mke2fs`). The optional
`--rewrite-uid` flag adds an in-tree libext2fs helper
([`tools/oci-chown`](../tools/oci-chown/)) for restoring OCI tar-header
uid/gid/mode into ext4 inodes; it is built on demand and only required
when the rootfs will be used with `kbox --root-id`.

## Quick start

```bash
# Pull and build a rootfs from Docker Hub.
ROOTFS=alpine.ext4 ./scripts/mkrootfs.sh --image=docker://alpine:3.21

# Pin to a digest for reproducibility.
ROOTFS=alpine.ext4 ./scripts/mkrootfs.sh \
--image=docker://alpine@sha256:1832327faf04...

# Larger images may need a bigger --size.
ROOTFS=node.ext4 ./scripts/mkrootfs.sh \
--image=docker://node:alpine --size=512

# Restore OCI tar-header uid/gid/mode (required for --root-id).
ROOTFS=node.ext4 ./scripts/mkrootfs.sh \
--image=docker://node:alpine --rewrite-uid

# Run with kbox.
./kbox -S node.ext4 --root-id -- /usr/bin/node --version

# Manage the layer cache.
python3 ./scripts/oci-pull.py prune # wipe
python3 ./scripts/oci-pull.py prune --keep-bytes=2G # keep newest 2 GB
```

`--image` accepts:

- `docker://NAME[:TAG]` — `library/` prefix is implied for unscoped
Docker Hub names. Default tag is `latest`.
- `docker://REGISTRY/REPO[:TAG]` — non-Docker-Hub registries
(`quay.io/...`, `ghcr.io/...`, host:port).
- `docker://REPO@sha256:DIGEST` — pin to a content digest for
reproducibility. The digest may name either a single-arch image
manifest (in which case arch selection is a no-op) or an OCI index /
manifest list (in which case `oci-pull.py` still resolves it to the
matching `linux/<arch>` entry, just like a tag).
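A minimal sketch of how these reference forms could be normalized, assuming the usual Docker convention that the first path component names a registry only when it contains `.` or `:` or is `localhost`. `ImageRef`, `parse_image`, and the `registry-1.docker.io` default host are illustrative assumptions, not the actual code in `oci-pull.py`.

```python
from dataclasses import dataclass

# Assumed Docker Hub API host for unscoped names (illustrative).
DOCKER_HUB = "registry-1.docker.io"


@dataclass
class ImageRef:
    registry: str
    repository: str
    reference: str  # a tag, or a "sha256:..." digest


def parse_image(uri: str) -> ImageRef:
    prefix = "docker://"
    if not uri.startswith(prefix):
        raise ValueError(f"only docker:// URIs are supported: {uri!r}")
    name = uri[len(prefix):]

    # A digest ("@sha256:...") wins; otherwise a tag is a ':' after the last '/'
    # (so "host:port/repo" is not mistaken for a tag).
    if "@" in name:
        name, reference = name.split("@", 1)
    else:
        slash, colon = name.rfind("/"), name.rfind(":")
        if colon > slash:
            name, reference = name[:colon], name[colon + 1:]
        else:
            reference = "latest"

    # First component is a registry host only if it looks like one.
    first, _, rest = name.partition("/")
    if rest and ("." in first or ":" in first or first == "localhost"):
        return ImageRef(first, rest, reference)
    # Docker Hub: unscoped names get the implied "library/" prefix.
    repository = name if "/" in name else "library/" + name
    return ImageRef(DOCKER_HUB, repository, reference)
```

For example, `parse_image("docker://localhost:5000/myimg")` keeps `localhost:5000` as the registry and defaults the tag to `latest`.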

## Pipeline

```
   --image=docker://...
             │
             ▼
┌─────────────────────────┐
│ scripts/oci-pull.py     │  registry pull (urllib + bearer-token)
│ └ manifest list resolve │
│ └ layer fetch           │  → cache: $XDG_CACHE_HOME/kbox/oci-layers
│ └ apply (whiteouts,     │
│    hardlinks, symlinks) │
└────────────┬────────────┘
             │  staging/ (+ optional manifest)
             ▼
┌─────────────────────────┐
│ mke2fs -d staging       │  rootless ext4 build
└────────────┬────────────┘
             │  rootfs.ext4
             ▼  (with --rewrite-uid only)
┌─────────────────────────┐
│ tools/oci-chown         │  libext2fs inode rewrite
│ └ ext2fs_namei          │  → uid/gid/mode from OCI tar header
│ └ ext2fs_write_inode    │
└─────────────────────────┘
```

## Layer cache

Layer blobs are content-addressed (sha256). The cache lives at
`$XDG_CACHE_HOME/kbox/oci-layers/<sha256>` (default
`~/.cache/kbox/oci-layers/`). Writes are atomic
(`tempfile.mkstemp` + `os.replace`); reads re-hash the file and drop
the entry on mismatch, so a corrupted cache is self-healing on the next
pull. Pass `--no-cache` to bypass entirely; use `oci-pull.py prune` to
clear or trim the cache.
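The cache behaviour above can be modeled in a few lines of Python. This is a hypothetical sketch, not `oci-pull.py` itself: `store_blob`, `cached_path`, and `prune` are illustrative names, digests are bare hex strings, and `prune` assumes "newest" means most recently modified.

```python
import hashlib
import os
import tempfile
from typing import Iterable, Optional

MAX_BLOB_BYTES = 8 * 1024**3  # mirrors the documented 8 GB cap


def default_cache_dir() -> str:
    """$XDG_CACHE_HOME/kbox/oci-layers, falling back to ~/.cache."""
    base = os.environ.get("XDG_CACHE_HOME") or os.path.expanduser("~/.cache")
    return os.path.join(base, "kbox", "oci-layers")


def store_blob(root: str, digest: str, chunks: Iterable[bytes]) -> str:
    """Stream chunks into the cache, hashing in flight; publish atomically."""
    os.makedirs(root, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=root, prefix=".partial-")
    h, total = hashlib.sha256(), 0
    try:
        with os.fdopen(fd, "wb") as f:
            for chunk in chunks:
                total += len(chunk)
                if total > MAX_BLOB_BYTES:  # cap bytes actually received
                    raise ValueError("blob exceeds size cap")
                h.update(chunk)
                f.write(chunk)
        if h.hexdigest() != digest:
            raise ValueError(f"digest mismatch for {digest}")
        final = os.path.join(root, digest)
        os.replace(tmp, final)  # readers never observe a partial blob
        return final
    except BaseException:
        os.unlink(tmp)  # leave no partial file behind on any failure
        raise


def cached_path(root: str, digest: str) -> Optional[str]:
    """Return the blob path if it re-hashes correctly; drop it otherwise."""
    path = os.path.join(root, digest)
    h = hashlib.sha256()
    try:
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
    except FileNotFoundError:
        return None
    if h.hexdigest() != digest:
        os.unlink(path)  # self-heal: the next pull re-fetches this layer
        return None
    return path


def prune(root: str, keep_bytes: int = 0) -> None:
    """Keep the newest blobs (by mtime) that fit in keep_bytes; delete the rest."""
    if not os.path.isdir(root):
        return
    entries = []
    for name in os.listdir(root):
        path = os.path.join(root, name)
        st = os.stat(path)
        entries.append((st.st_mtime, st.st_size, path))
    entries.sort(reverse=True)  # newest first
    kept, full = 0, False
    for _, size, path in entries:
        full = full or kept + size > keep_bytes  # once over budget, drop all older
        if full:
            os.unlink(path)
        else:
            kept += size
```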

## Rootless ownership rewrite (`--rewrite-uid`)

`mke2fs -d` inherits the invoking user's UID into ext4 inodes. Without
intervention, the guest sees its own files owned by a non-root UID,
which breaks setuid binaries and `apk add` install scripts when the
guest is launched with `kbox --root-id` (forces guest uid=0).

The fix runs in three steps:

1. `oci-pull.py --manifest=PATH` records `(uid, gid, mode)` per file,
directory, and hardlink during layer apply. Symlinks are excluded
(`lchown` semantics aren't load-bearing for the kbox guest); device
nodes are excluded (rootless cannot `mknod`; `/dev` is mounted at
guest runtime). Records are NUL-separated:
`<uid>\t<gid>\t<mode_octal>\t<path>\0`.
2. `mke2fs -d` builds the ext4 image with invoking-user ownership.
3. `tools/oci-chown <image> <manifest>` opens the image read-write via
libext2fs, resolves each path through `ext2fs_namei`, and rewrites
`i_uid` (16-bit lo + hi), `i_gid` (16-bit lo + hi), and `i_mode`
permission bits (preserving type bits like `S_IFREG`/`S_IFDIR`).
`ext2fs_close` flushes.
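The manifest format and the per-inode rewrite can be modeled in Python as below. This is a sketch only: the real helper is C against libext2fs, the mode field is assumed to be plain octal without a leading `0`, and `encode_record`, `parse_manifest`, and `inode_fields` are hypothetical names. It shows the record layout from step 1, the validation rules listed under Hardening, and the lo/hi split with type-bit preservation from step 3.

```python
import stat
from typing import List, Tuple

UINT32_MAX = 0xFFFFFFFF
Record = Tuple[int, int, int, str]


def encode_record(uid: int, gid: int, mode: int, path: str) -> bytes:
    # Layout: <uid> TAB <gid> TAB <mode_octal> TAB <path> NUL
    return f"{uid}\t{gid}\t{mode & 0o7777:o}\t{path}\0".encode()


def _parse_id(field: str) -> int:
    # ASCII digits only: rejects "+1", "-1", " 1", and non-ASCII digits.
    if not (field.isascii() and field.isdigit()):
        raise ValueError(f"bad uid/gid {field!r}")
    value = int(field)
    if value > UINT32_MAX:
        raise ValueError(f"uid/gid out of range: {value}")
    return value


def _parse_mode(field: str) -> int:
    if not field or any(c not in "01234567" for c in field):
        raise ValueError(f"bad mode {field!r}")
    mode = int(field, 8)
    if mode > 0o7777:
        raise ValueError(f"mode has bits outside 0o7777: {field}")
    return mode


def parse_manifest(blob: bytes) -> List[Record]:
    records: List[Record] = []
    offset = 0
    while offset < len(blob):
        end = blob.find(b"\0", offset)
        if end < 0:  # tail without its terminating NUL
            raise ValueError(f"record {len(records)} at byte {offset}: missing NUL")
        fields = blob[offset:end].decode().split("\t", 3)  # path may contain tabs
        if len(fields) != 4:
            raise ValueError(f"record {len(records)} at byte {offset}: need 4 fields")
        uid, gid, mode, path = fields
        records.append((_parse_id(uid), _parse_id(gid), _parse_mode(mode), path))
        offset = end + 1
    return records


def inode_fields(old_mode: int, uid: int, gid: int, perm: int) -> dict:
    """New inode values: 32-bit ids split into 16-bit halves, file type kept."""
    return {
        "uid_lo": uid & 0xFFFF, "uid_hi": uid >> 16,
        "gid_lo": gid & 0xFFFF, "gid_hi": gid >> 16,
        "mode": stat.S_IFMT(old_mode) | perm,  # keep S_IFREG/S_IFDIR bits
    }
```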

The helper is build-time-only: `tools/oci-chown/Makefile` links against
`-lext2fs -lcom_err` from `e2fsprogs`. The kbox supervisor build is
unchanged. `mkrootfs.sh --rewrite-uid` builds the helper on demand if
the binary isn't present.

When you don't pass `--root-id`, you don't need `--rewrite-uid`: the
guest runs as the host user, host UID matches inode UID, and ownership
is consistent.

## Hardening

The layer-apply path is the main attack surface (a malicious image
could try to write outside the staging directory). Defenses:

- **Path traversal.** `safe_join` strips leading `./` and `/` from tar
member names and rejects any `..` component before joining onto the
staging root.
- **Symlink Zip Slip.** Each member's parent directory is realpath-checked
against the staging root before any write, unlink, or `chmod`. A
malicious layer that creates `staging/etc -> /etc` and then writes
`etc/passwd` is rejected because `realpath(staging/etc) = /etc`
doesn't sit under `realpath(staging)`. File writes additionally pass
`O_NOFOLLOW`, and pre-existing symlinks at the destination are
unlinked before `os.makedirs`/`os.chmod`.
- **Hardlink source confinement.** Hardlink targets resolve through
`safe_join` (or its parent-relative variant for ustar-format
tarballs, which also runs through `safe_join` to reject `..` after
normalization). Absolute linknames are rejected. The resolved source
is realpath-checked against the staging root, and `os.link` is called
with `follow_symlinks=False` so a staging-resident symlink cannot
redirect the link to a host file.
- **DoS caps.** `MAX_MANIFEST_BYTES=4 MB`, `MAX_BLOB_BYTES=8 GB`,
`MAX_TAR_MEMBERS=500_000`. Layer descriptors with declared size over
the blob cap are rejected before download; the streaming loop also
caps actual bytes received.
- **Auth handling.** Bearer tokens are stripped on cross-host redirects
(compared by `netloc`, not just hostname, so a port-change redirect
on the same host also drops the token).
- **Digest verification.** Every blob is sha256-verified in flight;
cache reads re-verify before use. Corrupted entries are removed and
re-fetched.
- **Manifest validation in `oci-chown`.** Every record's uid/gid is
range-checked (`0 <= v <= UINT32_MAX`); leading sign or whitespace is
rejected. Mode bits outside `0o7777` are rejected. A manifest tail
missing the trailing NUL fails loudly with byte offset and record
index.
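Several of the defenses above can be approximated with the standard library alone. These are hypothetical helpers written to match the described behaviour (`safe_join`, a realpath containment check, confined hardlinks, and netloc-based token stripping); the real `oci-pull.py` may structure them differently, and the ustar parent-relative hardlink variant is omitted here.

```python
import os
from urllib.parse import urlsplit


def safe_join(root: str, member: str) -> str:
    """Map a tar member name under root; refuse any '..' component."""
    name = member
    while name.startswith(("./", "/")):
        name = name[2:] if name.startswith("./") else name[1:]
    parts = [p for p in name.split("/") if p not in ("", ".")]
    if ".." in parts:
        raise ValueError(f"path traversal in tar member {member!r}")
    return os.path.join(root, *parts)


def is_within(root: str, path: str) -> bool:
    """True if path, with symlinks resolved, stays inside root."""
    real_root = os.path.realpath(root)
    return os.path.commonpath([real_root, os.path.realpath(path)]) == real_root


def check_parent(root: str, dest: str) -> None:
    """Zip Slip guard: a symlinked parent must not lead outside root."""
    if not is_within(root, os.path.dirname(dest)):
        raise ValueError(f"{dest!r} escapes the staging root")


def link_member(root: str, linkname: str, dest: str) -> None:
    """Create a hardlink whose source is confined to root."""
    if linkname.startswith("/"):
        raise ValueError(f"absolute hardlink target {linkname!r}")
    src = safe_join(root, linkname)
    if not is_within(root, src):
        raise ValueError(f"hardlink source {linkname!r} escapes the staging root")
    check_parent(root, dest)
    # follow_symlinks=False: link a staging symlink itself, never its target.
    os.link(src, dest, follow_symlinks=False)


def redirect_headers(old_url: str, new_url: str, headers: dict) -> dict:
    """Drop Authorization when a redirect changes host or port (netloc)."""
    if urlsplit(old_url).netloc != urlsplit(new_url).netloc:
        return {k: v for k, v in headers.items() if k.lower() != "authorization"}
    return dict(headers)
```

Comparing `netloc` rather than `hostname` is what makes a redirect from `r.io` to `r.io:8443` drop the token.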

The supervisor itself never reads the OCI image at runtime (`kbox`
treats the resulting `.ext4` as opaque LKL filesystem state), so a
mis-applied layer cannot escalate beyond the staging directory.

## Limitations

- Only `docker://` URIs are accepted. No `oci://` (local OCI layout
directories), no `containers-storage://`. Adding a local OCI layout
reader is a small follow-up if needed.
- Mutable tags (e.g. `alpine:3.21`) are resolved on every pull. Pin to
a digest for reproducibility.
- Cosign / notation signature verification is not implemented; treat
this as a development tool, not a supply-chain control.
- Synthetic parent directories (created when a tarball omits explicit
parent dir entries) are recorded as `0:0:0755`. Well-formed OCI
layers always include explicit parent entries, so this only fires on
malformed input.
- zstd-compressed layers can only be read when Python's `tarfile`
  supports zstd, which in practice means Python 3.14 or newer.
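A minimal sketch of sniffing a layer's compression from its magic bytes and checking whether the running `tarfile` can open it. OCI layer media types cover uncompressed, gzip, and zstd tarballs. The sketch relies on `tarfile.TarFile.OPEN_METH`, an undocumented CPython table that is expected to carry a `zst` entry on 3.14+ (even there, opening can still fail if the interpreter was built without zstd). `layer_compression`, `can_open`, and `open_layer` are hypothetical names.

```python
import tarfile

GZIP_MAGIC = b"\x1f\x8b"
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"  # zstd frame magic 0xFD2FB528, little-endian


def layer_compression(head: bytes) -> str:
    """Guess a layer's compression from its first bytes."""
    if head.startswith(GZIP_MAGIC):
        return "gz"
    if head.startswith(ZSTD_MAGIC):
        return "zst"
    return "tar"  # assume an uncompressed tarball otherwise


def can_open(compression: str) -> bool:
    # OPEN_METH maps compression names ("tar", "gz", ...) to opener methods.
    return compression in tarfile.TarFile.OPEN_METH


def open_layer(path: str) -> tarfile.TarFile:
    with open(path, "rb") as f:
        comp = layer_compression(f.read(4))
    if not can_open(comp):
        raise RuntimeError(f"this Python's tarfile cannot read {comp!r} layers")
    return tarfile.open(path, "r:" if comp == "tar" else f"r:{comp}")
```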

## Acceptance

Verified end-to-end on x86_64 (`node1`) and aarch64 (`arm`) hosts:

| Scenario | Result |
|---|---|
| `alpine:3.21` round-trip | 185 inodes; busybox `User=0` after rewrite |
| `node:alpine` round-trip | ~2880 inodes; `/home/node` `User=1000` (uid round-trip) |
| Integration suite vs OCI rootfs | parity with baseline tarball rootfs |

A CI job
([`.github/workflows/build-kbox.yml`](../.github/workflows/build-kbox.yml))
runs the full pipeline against `nginx:alpine` on every PR, verifying
the helper reports a non-zero rewrite count and that
`/etc/nginx/nginx.conf` ends up `User=0/Group=0` and `/usr/sbin/nginx`
mode is `0755`.