Split glibc and linux sigset_t ABIs and the accessor functions #23601

rootbeer · 2025-04-18T21:23:18Z

Zig's c.zig glibc/musl layer and the linux.zig kernel syscall wrapper layers both define sigset_t. Currently linux.zig uses a definition compatible with glibc/musl, and then c.zig imports that definition. The glibc ABI defines a 1024-bit sigset_t. The linux kernel ABI generally defines a 64-bit sigset_t (or 128-bit on MIPS). At run-time, glibc (and Zig) just presents the lowest bits of the larger sigset_t to the kernel. Glibc "needs" the extra space in its ABI so it can accommodate future kernels that support a larger number of signals. Zig does not need that compatibility unless linking the C library. The linux.zig ABI should be able to define a sigset_t that matches the underlying kernel ABI.

The array of words backing sigset_t array must match the kernel's layout. Two 32-bit words are not the same as one 64-bit word on a big-endian platform. The kernel headers use unsigned long as the element, so I used c_ulong for consistency. This means the array is sometimes of 32-bit words, and sometimes made from 64-bit words. I added some tests that make sure blocking specific signals works to check we're getting the bit indexes correct. (In practice most signal numbers are less than 32, so this wasn't much of a practical problem.)

The sigfillset() call needs to go through the dynamically linked C library, when one is present. The run-time library might decide to reserve some signals. Similarly, sigaddset() might reject some signal numbers reserved by the C library. Also, Zig's pre-filled filled_sigset can't work when linking a C library, because Zig can't invoke external symbols in a static initializer (at least, that's what I understood the error I got to mean). This PR provides POSIX-like sigemptyset() and sigfillset() functions, but in Zig they return the initialized set (instead of taking the set as an output parameter). This PR also removes linux.all_mask.

Sigaction contains a sigset_t. In Zig this structure is already mapped (in the sigaction wrapper) into a k_sigaction that has a "restorer" field and an ad-hoc, right-sized signal mask field. With a properly defined sigset_t, some differences in std.os.linx.Sigaction versions collapse (e.g., the MIPS variations were just encapsulating sigset_t differences.) The Zig-only Sigaction structure doesn't need a restorer, so remove that. Also doesn't need to be an extern struct.

The context_t structure also contains a glibc-compatible sigset_t. I made that explicit in this PR, and avoided fixing it for now, as I think there are deeper changes required around this structure.

Using a u6 for the signal number is too narrow. Signal numbers can go up to 128 on some platforms. Changed to a u8 parameter since the actual boundary isn't always a power of two. Means we need to @truncate when converting signal numbers into bit masks in some places.

I am unclear about the darwin.zig changes. I'm not sure if the sigset accessors (sigaddset(), sigfillset(), etc) are defined in the Darwin C library (but I assume they are?).

Added a std.os.posix layer to hide differences (mostly just error handling) between C-library sigset manipulators and the direct linux kernel accessor. I can drop these if folks would rather not see new code added to posix, but they were handy for making tests to make sure all the implementations are consistent.

PR inspired by conversation on https://github.com/ziglang/zig/pull/23502\#discussion\_r2034064507

lib/std/c.zig

lib/std/c/darwin.zig

lib/std/os/linux.zig

rootbeer · 2025-04-19T06:48:53Z

Thanks for quick the review! I've got some test failures to dig into, in addition to your feedback to address. I'll bring this back from a draft when its ready for another look.

rootbeer · 2025-04-24T02:43:48Z

Looks like I still have some work to do ...

error: the following command exited with error code 1:
/home/ci/actions-runner9/_work/zig/zig/zig-local-cache/o/81247a531862a8e65f403d02274c5c82/shared_lib_unwind 
slices differ. first difference occurs at index 0 (0x0)

============ expected this output: =============  len: 5 (0x5)

[0]: 17718044
[1]: 17715288
[2]: 281472964925348
[3]: 281472964925616
[4]: 17715004

============= instead found this: ==============  len: 5 (0x5)

[0]: 12297829382473034410
[1]: 12297829382473034410
[2]: 12297829382473034410
[3]: 12297829382473034410
[4]: 12297829382473034410

================================================

error: TestExpectedEqual
/home/ci/actions-runner9/_work/zig/zig/build-debug/../lib/std/testing.zig:435:5: 0x10e4c3f in expectEqualSlices__anon_23789 (shared_lib_unwind)
    return error.TestExpectedEqual;
    ^
/home/ci/actions-runner9/_work/zig/zig/build-debug/../lib/std/testing.zig:[128](https://github.com/ziglang/zig/actions/runs/14627905825/job/41043798928?pr=23601#step:3:129):27: 0x10e4ddf in expectEqualInner__anon_23774 (shared_lib_unwind)
        .array => |array| try expectEqualSlices(array.child, &expected, &actual),
                          ^
/home/ci/actions-runner9/_work/zig/zig/test/standalone/stack_iterator/shared_lib_unwind.zig:47:5: 0x10e4f97 in main (shared_lib_unwind)
    try testing.expectEqual(expected, unwound);

So far all I know is that 12297829382473034410 is 0xaaaaaaaaaaaaaaaa, and this is on aarch64-linux. I'm guessing I messed up ucontext_t somehow to have broken the backtrace ....

rootbeer · 2025-04-25T17:34:02Z

Dropped empty_sigset and filled_sigset (this is an API breakage for Zig?). With this PR posix.sigfillset() and posix.sigemptyset() return sets (they don't take a pointer to an existing set like the C API). Same for the linux/darwin/plan wrappers. The c.zig version of `sigfillset` and `sigemptyset` still have the C-style API, though. This API keeps the Sigaction initializers simple.

QEMU maps target signals onto host signals. When the target has more signals than the host, the extras signals are just not supported. See
https://gitlab.com/qemu-project/qemu/-/blob/master/linux-user/signal.c#L522 This manifests in one Zig test as a linux.kill() call returning EINVAL. Thus some hackery in the test case for ignoring that failure mode when it is encountered.

For future work, it might make sense to add an initializer to the Sigaction. That could hide the sigset initialization, and could also do the restorer field initialization. Then we wouldn't need a separate k_Sigaction type.

Splitting up the ucontext_t (glibc vs native) is another bit of follow-up work. I'll file a bug for that.

alexrp · 2025-04-25T17:44:46Z

For future work, it might make sense to add an initializer to the Sigaction. That could hide the sigset initialization, and could also do the restorer field initialization. Then we wouldn't need a separate k_Sigaction type.

Can you show a minimal diff of what this would look like? It can be follow-up work in any case; I'm just curious.

rootbeer · 2025-04-26T21:20:46Z

For future work, it might make sense to add an initializer to the Sigaction. That could hide the sigset initialization, and could also do the restorer field initialization. Then we wouldn't need a separate k_Sigaction type.

Can you show a minimal diff of what this would look like? It can be follow-up work in any case; I'm just curious.

I was thinking of something like this (these would be constructors on Sigaction):

        /// Zero'd sigaction.  Caller is responsible for the `restorer` field.
        pub fn init() Sigaction {
            return Sigaction{
                flags = 0,
                handler = null,
                restorer = null,
                mask = linux.sigemptyset(),
            };
        }

        /// Initialize Sigaction with siginfo_t style handler. The SA.SIGINFO flag will be added.
        pub fn init_siginfo(handler: k_sigaction_funcs.sigaction_fn, flags: c_uint) Sigaction {
            return Sigaction{
                flags = SA.SIGINFO | flags,
                handler = @ptrCast(handler),
                restorer = &restore_rt,
                mask = linux.sigemptyset(),
            };
        }

        /// Initialize Sigaction to restore a default handler.
        pub fn init_sigdefault() Sigaction {
            return Sigaction{
                flags = 0,
                handler = SIG.DFL,
                restorer = &restore,
                mask = linux.sigemptyset(),
            };
        }

The optimization is that these constructors can initialize the restorer field appropriately (so Zig doesn't need to copy a Sigaction into a k_sigaction inside the sigaction syscall wrapper.) The "API" of these structs is pretty funky, though, so you either need a lot of constructors to cover the variations, or a complicated constructor signature. Plus these would have to be common across all the systems backing the posix API so they could be used through const act = posix.sigaction.init...(). After expanding it out a bit like this, I'm not sure its worth the effort just to optimize sigaction invocations (and to deviate from the C API).

alexrp · 2025-04-26T21:24:17Z

Makes sense. And yeah, I think I agree with your conclusion.

lib/std/os/emscripten.zig

lib/std/os/plan9.zig

lib/std/posix/test.zig

alexrp

LGTM other than the above comments.

rootbeer · 2025-04-29T00:29:08Z

And it looks like I've broken something with the latest update .. (I did run the tests locally! But not getting the same MIPS coverage? Or release?) ... Anyway, I'll be back with more patches soon ...

alexrp · 2025-04-29T00:34:42Z

To run non-native architectures locally, you have to pass -fqemu, and optionally --glibc-runtimes <...> if you've built glibc.

rootbeer · 2025-04-29T07:16:55Z

Heh, I think the MIPS signal numbers were wrong. So the "safe" SIGURG signal was actually triggering a SIGSTOP (on MIPS).

We have the technology to verify all these constants match their C equivalents. (And that field offsets and sizes match too.) But, AFAICT, there are no tests doing the equivalent of expectEqual(c_headers.signal.SIGURG, std.os.linux.SIG.URG) ...

Export the sigset_t ops (sigaddset, etc) from the C library. Don't rely on the linux.zig defintions (which will be defined to use the kernel ABI). Move Darwin sigset and NSIG declarations into darwin.zig. Remove extraneous (?) sigaddset. The C library sigaddset can reject some signals being added, so need to defer to it.

The kernel ABI sigset_t is smaller than the glibc one. Define the right-sized sigset_t and fixup the sigaction() wrapper to leverage it. The Sigaction wrapper here is not an ABI, so relax it (drop the "extern" and the "restorer" fields), the existing `k_sigaction` is the ABI sigaction struct. Linux defines `sigset_t` with a c_ulong, so it can be 32-bit or 64-bit, depending on the platform. This can make a difference on big-endian systems. Patch up `ucontext_t` so that this change doesn't impact its layout. AFAICT, its currently the glibc layout.

Unify the C library sigset_t and Linux native sigset_t and the accessor operations. Add tests that the various sigset_t operations are working. And clean up existing tests a bit.

When linking a libc, Zig should defer to the C library for sigset operations. The pre-filled constants signal sets (empty_sigset, filled_sigset) are not compatible with C library initialization, so remove them and use the runtime `sigemptyset` and `sigfillset` methods to initialize any sigset.

…gset_t By returning an initialized sigset (instead of taking the set as an output parameter), these functions can be used to directly initialize the `mask` parameter of a `Sigaction` instance.

All the existing code that manipulates `ucontext_t` expects there to be a glibc-compatible sigmask (1024-bit). The `ucontext_t` struct need to be cleaned up so the glibc-dependent format is only used when linking glibc/musl library, but that is a more involved change. In practice, no Zig code looks at the sigset field contents, so it just needs to be the right size.

Dunno why the MIPS signal numbers are different, or why Zig had them already special cased, but wrong. We have the technology to test these constants. We should use it.

alexrp · 2025-05-01T19:25:50Z

We have the technology to verify all these constants match their C equivalents. (And that field offsets and sizes match too.) But, AFAICT, there are no tests doing the equivalent of expectEqual(c_headers.signal.SIGURG, std.os.linux.SIG.URG) ...

I like the idea of this (and in fact do that in one of my personal projects), but there's the issue of what we do when zig translate-c is officially split out from ziglang/zig into ziglang/translate-c. AFAIK Andrew doesn't like the idea of ziglang/zig having external dependencies (even through the Zig build system). OTOH, the same issue will pop up with zig cc and building the source code for our vendored libcs.

So we need to actually decide on a direction for the project in this regard with all the relevant facts taken into account. Could you file an issue for this idea? I'll bring it up for discussion in one of the upcoming team meetings.

alexrp · 2025-05-01T19:28:28Z

Splitting up the ucontext_t (glibc vs native) is another bit of follow-up work. I'll file a bug for that.

Just a reminder to file a bug for this as well.

alexrp · 2025-05-01T19:29:26Z

Great work, thanks!

rootbeer · 2025-05-01T19:41:10Z

Thanks for the reviews! I'll file the 2 bugs in the next couple days ...

alexrp self-assigned this Apr 18, 2025

alexrp self-requested a review April 18, 2025 21:29