Skip to content

Cpu pressure#9

Open
daandemeyer wants to merge 56 commits intomainfrom
cpu-pressure
Open

Cpu pressure#9
daandemeyer wants to merge 56 commits intomainfrom
cpu-pressure

Conversation

@daandemeyer
Copy link
Owner

No description provided.

@daandemeyer
Copy link
Owner Author

@claude review

@github-actions
Copy link

github-actions bot commented Mar 18, 2026

Claude review of PR #9 (dcdf8a8)

<!-- claude-pr-review -->

This PR extends the existing memory pressure monitoring infrastructure in sd-event and PID 1 to also support CPU and IO pressure, using Linux's PSI (Pressure Stall Information) interface. The refactoring approach is clean — the memory-pressure-specific code is generalized to work with a PressureResource enum, reducing duplication across all three resource types.

The PR is well structured across 7 commits:

  1. Test macro migration (prep work)
  2. Unprivileged test support (prep work)
  3. sd-event CPU pressure API
  4. PID 1 / core CPU pressure support
  5. IO pressure support (sd-event + core)
  6. Documentation update (MEMORY_PRESSURE.md → PRESSURE.md)
  7. Minor docs fix

Overall the code quality is high and the generalization is done consistently.

Suggestions

  • CPU auto mode missing cpu_accounting check (dismissed) — src/core/cgroup.hcgroup_context_want_pressure(PRESSURE_CPU) doesn't check c->cpu_accounting, unlike the memory and IO variants. The docs say auto enables "if CPU accounting is anyway enabled", but CPUAccounting=true alone won't trigger it.
  • Missing sanitize_environment entries for CPU/IO pressuresrc/core/manager.csanitize_environment() strips MEMORY_PRESSURE_WATCH/MEMORY_PRESSURE_WRITE but does not strip the corresponding CPU_PRESSURE_WATCH/CPU_PRESSURE_WRITE/IO_PRESSURE_WATCH/IO_PRESSURE_WRITE env vars. If PID 1 inherits these from its environment, they could leak into services.

Nits

  • Parser function namingsrc/core/load-fragment-gperf.gperf.inconfig_parse_memory_pressure_watch is reused for CPU/IO but the name still says "memory"
  • Unused callback parameter in test wrapperssrc/test/test-pressure.c — The wrapper functions ignore the callback parameter passed to them
  • Stale comment in cgroup.csrc/core/cgroup.c:2123 — Comment still says "try to open the memory pressure interface anew" but the code now sets up all pressure resource types
  • IO test file not cleaned upsrc/test/test-pressure.c:496/var/tmp/.io-pressure-test is created during the IO pressure test but never cleaned up afterward

<!-- claude-pr-review -->

Workflow run

@daandemeyer
Copy link
Owner Author

@claude review

1 similar comment
@daandemeyer
Copy link
Owner Author

@claude review

@daandemeyer
Copy link
Owner Author

@claude review

@@ -2110,7 +2116,8 @@ static int unit_update_cgroup(
* that PID 1 runs before init.scope is even set up. Hence, whenever init.scope is realized, let's
* try to open the memory pressure interface anew. */

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: nit: Stale comment: "try to open the memory pressure interface anew" should be updated to reflect that this now sets up all pressure resource types (memory, CPU, and IO), e.g. "try to open the pressure interfaces anew".

for (;;) {
_cleanup_close_ int fd = -EBADF;

fd = open("/var/tmp/.io-pressure-test", O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0600);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: nit: /var/tmp/.io-pressure-test is created by the IO pressure child process but never cleaned up after the test (e.g., in outro() or via _cleanup_). Consider adding (void) unlink("/var/tmp/.io-pressure-test"); to the test cleanup.

@daandemeyer
Copy link
Owner Author

@claude review

{{type}}.MemoryPressureWatch, config_parse_memory_pressure_watch, 0, offsetof({{type}}, cgroup_context.pressure_watch[PRESSURE_MEMORY])
{{type}}.CPUPressureThresholdSec, config_parse_sec, 0, offsetof({{type}}, cgroup_context.pressure_threshold_usec[PRESSURE_CPU])
{{type}}.CPUPressureWatch, config_parse_memory_pressure_watch, 0, offsetof({{type}}, cgroup_context.pressure_watch[PRESSURE_CPU])
{{type}}.NFTSet, config_parse_cgroup_nft_set, NFT_SET_PARSE_CGROUP, offsetof({{type}}, cgroup_context)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: nit: config_parse_memory_pressure_watch is reused for CPUPressureWatch and IOPressureWatch, but the function name still says "memory". Consider renaming it to config_parse_pressure_watch for consistency, since it now handles all pressure resource types.

noxiouz and others added 12 commits March 18, 2026 22:55
Move the bit-rate computation out of property_get_bit_rates() in
networkd-link-bus.c into a standalone link_get_bit_rates() helper in
networkd-speed-meter.c, which already owns the speed meter state.

Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>
Expose the speed meter transmit/receive rates as a BitRates struct in
the per-link JSON output and the io.systemd.Network Varlink IDL.

Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>
Add an sd_varlink* parameter to both functions so Varlink callers can
receive a deferred reply once all async work completes, symmetrically
with the existing sd_bus_message* path.

Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>
The handler calls dhcp4_renew() and logs a warning on failure.

Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>
The handler calls sd_dhcp_server_forcerenew() if the server is running
and logs a warning on failure.

Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>
The handler calls link_reconfigure_full() with UNCONDITIONALLY|CLEANLY
flags and defers the Varlink reply until reconfiguration completes.

Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>
Bind networkd's reload handler to the generic io.systemd.service.Reload
method.

Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Yu Watanabe <watanabe.yu+github@gmail.com>

Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>
The handler returns the JSON produced by link_build_json().

Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>
This makes networkctl fetch bit-rate statistics and offered DHCP
leases via Link.Describe() method instead of D-Bus.

Co-authored-by: Yu Watanabe <watanabe.yu+github@gmail.com>

Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>
noxiouz and others added 3 commits March 19, 2026 11:50
…enew Varlink methods

The D-Bus counterparts (bus_link_method_renew, bus_link_method_force_renew)
reject calls on unmanaged interfaces with BUS_ERROR_UNMANAGED_INTERFACE,
but the Varlink methods silently succeed. Add the same guard to both
Varlink methods, returning io.systemd.Network.Link.InterfaceUnmanaged,
and declare the error in the IDL.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Right now, if a user tries to pass VMADDR_CID_ANY to systemd-ssh-proxy,
an assert is triggered:

 $ ssh vsock%4294967295
 Assertion 'cid != VMADDR_CID_ANY' failed at src/ssh-generator/ssh-proxy.c:21, function process_vsock_cid(). Aborting.
 mm_receive_fd: recvmsg: expected received 1 got 0
 proxy dialer did not pass back a connection

This is becauase the value returned from vsock_parse_cid is not checked
before being passed to process_vsock_string. Add a check to prevent
that.
It has been observed on some systems[1] that ssh-issue may print out:

 Try contacting this VM's SSH server via 'ssh vsock%4294967295' from host.

i.e. it suggests connecting with VMADDR_CID_ANY, which is not valid. It
seems that IOCTL_VM_SOCKETS_GET_LOCAL_CID may return VMADDR_CID_ANY in
some cases, e.g. when vsock is not full initialized or so.

Treat VMADDR_CID_ANY as special in vsock_get_local_cid(), the same as
VMADDR_CID_LOCAL and VMADDR_CID_HOST, and return an error.

[1] https://launchpad.net/bugs/2145027
yuwata added 4 commits March 22, 2026 03:34
FORCERENEW message support has been disabled so long time for security
concern. Most other implementations of DHCP server/client neither
support FORCERENEW. Let's completely drop relevant code.
Call client_stop() on error and return 0 only on callback.
Normal non-callback functions should propagate errors.

This also makes client_initialize_time_events() use
event_reset_time_relative().
Oblivionsage and others added 24 commits March 21, 2026 20:34
dns_packet_unref() unconditionally unrefs p->more on every call,
even when n_ref > 1. But dns_packet_ref() doesn't ref p->more.
This means if a packet with a ->more chain gets ref'd and unref'd
multiple times, the chain gets freed too early while the parent
still holds a dangling pointer.

Move the p->more unref into the n_ref == 1 block so the chain
only gets cleaned up when the packet is actually being freed.
This adds the low-level io.systemd.Manager shutdown support. This
is (much) simpler than the logind one. It mimics dbus but uses
a shared helper for the simple cases.

Note that this is more restrictive than the dbus version. The
dbus version uses SD_BUS_VTABLE_CAPABILITY(CAP_SYS_BOOT) but
the varlink version uses varlink_check_privileged_peer(link).
This is mostly because I'm not sure how to do the equivalent
in a race-free way.

Thanks to Daan for suggesting this.
Extract a common helper varlink_log_caller() and use in the varlink
code when logging the caller of a method. It also logs the method
now that was tried (but failed) to be logged with log_notice just
like manager_log_caller() would do.

I was looking into modifying `manager_log_caller` instead and
accept a NULL pidref but could not log more than the method without
pidref and would make the manager_log_caller slightly less nice.

Thanks to keszybz for suggesting this.
This commit allows unset pidref when calling manager_log_caller().
With that we can log manager calls even if we cannot resolve the
caller. Currently when we cannot resolve the caller we are just
not logging anything. With this commit we at least log the call
(even though we don't know what caller it was).

Thanks to keszybz for the suggestion.
The commit 6d7cb9a fixes the assignment
of the these options when specified through SendOption=. However, it
breaks when specified through UserClass= or SendVendorOption=.

When UserClass= or SendVendorOption= is specified, the option length is
calculated from the sd_dhcp_client.user_class or .vendor_options. Hence,
we can use 0 for the length in that case.

Follow-up for 6d7cb9a.
systemd now has a system call wrapper that does a long series of #ifdef's to
differentiate between architectures and ABIs. This wrapper has two problems.

1. On mips, it needs to differentiate between O32, N32, N64 ABI. It does that
via a code block in src/include/override/sys/generate-syscall.py (and derived
files):

     76 #  elif defined(_MIPS_SIM)
     77 #    if _MIPS_SIM == _MIPS_SIM_ABI32
     78 #      define systemd_NR_{syscall} {nr_mipso32}
     79 #    elif _MIPS_SIM == _MIPS_SIM_NABI32
     80 #      define systemd_NR_{syscall} {nr_mips64n32}
     81 #    elif _MIPS_SIM == _MIPS_SIM_ABI64
     82 #      define systemd_NR_{syscall} {nr_mips64}
     83 #    else
     84 #      error "Unknown MIPS ABI"
     85 #    endif
     86 #  elif defined(__hppa__)

Now the _MIPS_SIM* constants stem from a vendor-specific header file sgidefs.h,
which is included with glibc, but not with musl. It is however always present
in the Linux kernel headers as asm/sgidefs.h ...

2. To work around this, the syscall wrapper already has a block

     47 #ifdef ARCH_MIPS
     48 #include <asm/sgidefs.h>
     49 #endif

Turns out, ARCH_MIPS is defined nowhere in Gentoo, neither on glibc nor on musl.
As a result the code (by accident, probably sgidefs.h is included transitively
somehow) works on glibc, but not on musl.

The simplest fix is to replace line 47 in the generator and the derived file
with

     47 #ifdef __mips__

Two other source code files require a similar fix since they rely on the
constants.

Bug: systemd#41239
Bug: https://bugs.gentoo.org/971376
Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>
This is not actually used so it doesn't really matter in
practice and the fields are used anyway, but for cleanliness
fix it

Reported on yeswehack.com as YWH-PGM9780-165

Follow-up for 5699a16
mountfsd applies R/O when the varlink readOnly flag is set to false

Reported on yeswehack.com as YWH-PGM9780-164

Follow-up for 702a52f
networkctl previously called networkd over D-Bus for several operations.
This replaces all of those calls with Varlink, making it the sole IPC
mechanism between networkctl and networkd.

 New Varlink methods added to networkd:

- io.systemd.Network.Link (new sub-interface for link-specific
operations):

- Link.Describe — replaces Link.BitRates + DHCPServer.Leases D-Bus
properties
- Link.Up — replaces Link.SetUp D-Bus method (moved from
**io.systemd.Network.LinkUp**)
- Link.Down — replaces Link.SetDown D-Bus method (moved from
**io.systemd.Network.LinkDown**)
      - Link.Renew  — replaces Link.Renew D-Bus method
      - Link.ForceRenew   — replaces Link.ForceRenew D-Bus method
      - Link.Reconfigure  — replaces Link.Reconfigure D-Bus method

  - io.systemd.service 
     - Reload            — replaces Manager.Reload D-Bus method

  Supporting changes:
- link_get_bit_rates() extracted from networkd-link.c into
networkd-speed-meter.c
- BitRates added to link_build_json() so Link.Describe returns them
inline
alongside the existing interface description, replacing a separate D-Bus
read
- link_reconfigure_full() and manager_reload() extended to accept
sd_varlink*
for deferred async replies (consistent with existing sd_bus_message*
path)
  - DHCP lease display (networkctl status) uses Link.Describe instead of
DHCPServer.Leases; falls back to ClientId when hostname is not present
From yeswehack.com reports
Currently translated at 100.0% (266 of 266 strings)

Co-authored-by: Arif Budiman <arifpedia@gmail.com>
Translate-URL: https://translate.fedoraproject.org/projects/systemd/main/id/
Translation: systemd/main
libtss2-tcti-device0 is not installed by default in the openSUSE
image, but is now required when building the test image. Without it,
the build fails with

```
Shared library 'libtss2-tcti-device.so.0' is not available:
libtss2-tcti-device.so.0: cannot open shared object file: No such file or directory
```

Follow-up for 5f85409
Works the same way as memory pressure notifications. Code is refactored
to work on enum arrays to reduce duplication.
Make the doc more generic and mention all pressure types, not just
memory.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.