Skip to content

Split glibc and linux sigset_t ABIs and the accessor functions #23601

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
May 1, 2025

Conversation

rootbeer
Copy link
Contributor

@rootbeer rootbeer commented Apr 18, 2025

Zig's c.zig glibc/musl layer and the linux.zig kernel syscall wrapper layers both define sigset_t. Currently linux.zig uses a definition compatible with glibc/musl, and then c.zig imports that definition. The glibc ABI defines a 1024-bit sigset_t. The linux kernel ABI generally defines a 64-bit sigset_t (or 128-bit on MIPS). At run-time, glibc (and Zig) just presents the lowest bits of the larger sigset_t to the kernel. Glibc "needs" the extra space in its ABI so it can accommodate future kernels that support a larger number of signals. Zig does not need that compatibility unless linking the C library. The linux.zig ABI should be able to define a sigset_t that matches the underlying kernel ABI.

The array of words backing sigset_t array must match the kernel's layout. Two 32-bit words are not the same as one 64-bit word on a big-endian platform. The kernel headers use unsigned long as the element, so I used c_ulong for consistency. This means the array is sometimes of 32-bit words, and sometimes made from 64-bit words. I added some tests that make sure blocking specific signals works to check we're getting the bit indexes correct. (In practice most signal numbers are less than 32, so this wasn't much of a practical problem.)

The sigfillset() call needs to go through the dynamically linked C library, when one is present. The run-time library might decide to reserve some signals. Similarly, sigaddset() might reject some signal numbers reserved by the C library. Also, Zig's pre-filled filled_sigset can't work when linking a C library, because Zig can't invoke external symbols in a static initializer (at least, that's what I understood the error I got to mean). This PR provides POSIX-like sigemptyset() and sigfillset() functions, but in Zig they return the initialized set (instead of taking the set as an output parameter). This PR also removes linux.all_mask.

Sigaction contains a sigset_t. In Zig this structure is already mapped (in the sigaction wrapper) into a k_sigaction that has a "restorer" field and an ad-hoc, right-sized signal mask field. With a properly defined sigset_t, some differences in std.os.linx.Sigaction versions collapse (e.g., the MIPS variations were just encapsulating sigset_t differences.) The Zig-only Sigaction structure doesn't need a restorer, so remove that. Also doesn't need to be an extern struct.

The context_t structure also contains a glibc-compatible sigset_t. I made that explicit in this PR, and avoided fixing it for now, as I think there are deeper changes required around this structure.

Using a u6 for the signal number is too narrow. Signal numbers can go up to 128 on some platforms. Changed to a u8 parameter since the actual boundary isn't always a power of two. Means we need to @truncate when converting signal numbers into bit masks in some places.

I am unclear about the darwin.zig changes. I'm not sure if the sigset accessors (sigaddset(), sigfillset(), etc) are defined in the Darwin C library (but I assume they are?).

Added a std.os.posix layer to hide differences (mostly just error handling) between C-library sigset manipulators and the direct linux kernel accessor. I can drop these if folks would rather not see new code added to posix, but they were handy for making tests to make sure all the implementations are consistent.

PR inspired by conversation on https://github.com/ziglang/zig/pull/23502\#discussion\_r2034064507

@alexrp alexrp self-assigned this Apr 18, 2025
@alexrp alexrp self-requested a review April 18, 2025 21:29
@rootbeer
Copy link
Contributor Author

Thanks for quick the review! I've got some test failures to dig into, in addition to your feedback to address. I'll bring this back from a draft when its ready for another look.

@rootbeer rootbeer marked this pull request as draft April 19, 2025 06:49
@rootbeer
Copy link
Contributor Author

Looks like I still have some work to do ...

error: the following command exited with error code 1:
/home/ci/actions-runner9/_work/zig/zig/zig-local-cache/o/81247a531862a8e65f403d02274c5c82/shared_lib_unwind 
slices differ. first difference occurs at index 0 (0x0)

============ expected this output: =============  len: 5 (0x5)

[0]: 17718044
[1]: 17715288
[2]: 281472964925348
[3]: 281472964925616
[4]: 17715004

============= instead found this: ==============  len: 5 (0x5)

[0]: 12297829382473034410
[1]: 12297829382473034410
[2]: 12297829382473034410
[3]: 12297829382473034410
[4]: 12297829382473034410

================================================

error: TestExpectedEqual
/home/ci/actions-runner9/_work/zig/zig/build-debug/../lib/std/testing.zig:435:5: 0x10e4c3f in expectEqualSlices__anon_23789 (shared_lib_unwind)
    return error.TestExpectedEqual;
    ^
/home/ci/actions-runner9/_work/zig/zig/build-debug/../lib/std/testing.zig:[128](https://github.com/ziglang/zig/actions/runs/14627905825/job/41043798928?pr=23601#step:3:129):27: 0x10e4ddf in expectEqualInner__anon_23774 (shared_lib_unwind)
        .array => |array| try expectEqualSlices(array.child, &expected, &actual),
                          ^
/home/ci/actions-runner9/_work/zig/zig/test/standalone/stack_iterator/shared_lib_unwind.zig:47:5: 0x10e4f97 in main (shared_lib_unwind)
    try testing.expectEqual(expected, unwound);

So far all I know is that 12297829382473034410 is 0xaaaaaaaaaaaaaaaa, and this is on aarch64-linux. I'm guessing I messed up ucontext_t somehow to have broken the backtrace ....

@rootbeer rootbeer marked this pull request as ready for review April 25, 2025 17:32
@rootbeer
Copy link
Contributor Author

Dropped empty_sigset and filled_sigset (this is an API breakage for Zig?). With this PR posix.sigfillset() and posix.sigemptyset() return sets (they don't take a pointer to an existing set like the C API). Same for the linux/darwin/plan wrappers. The c.zig version of `sigfillset` and `sigemptyset` still have the C-style API, though. This API keeps the Sigaction initializers simple.

QEMU maps target signals onto host signals. When the target has more signals than the host, the extras signals are just not supported. See
https://gitlab.com/qemu-project/qemu/-/blob/master/linux-user/signal.c#L522 This manifests in one Zig test as a linux.kill() call returning EINVAL. Thus some hackery in the test case for ignoring that failure mode when it is encountered.

For future work, it might make sense to add an initializer to the Sigaction. That could hide the sigset initialization, and could also do the restorer field initialization. Then we wouldn't need a separate k_Sigaction type.

Splitting up the ucontext_t (glibc vs native) is another bit of follow-up work. I'll file a bug for that.

@alexrp
Copy link
Member

alexrp commented Apr 25, 2025

For future work, it might make sense to add an initializer to the Sigaction. That could hide the sigset initialization, and could also do the restorer field initialization. Then we wouldn't need a separate k_Sigaction type.

Can you show a minimal diff of what this would look like? It can be follow-up work in any case; I'm just curious.

@rootbeer
Copy link
Contributor Author

For future work, it might make sense to add an initializer to the Sigaction. That could hide the sigset initialization, and could also do the restorer field initialization. Then we wouldn't need a separate k_Sigaction type.

Can you show a minimal diff of what this would look like? It can be follow-up work in any case; I'm just curious.

I was thinking of something like this (these would be constructors on Sigaction):

        /// Zero'd sigaction.  Caller is responsible for the `restorer` field.
        pub fn init() Sigaction {
            return Sigaction{
                flags = 0,
                handler = null,
                restorer = null,
                mask = linux.sigemptyset(),
            };
        }

        /// Initialize Sigaction with siginfo_t style handler. The SA.SIGINFO flag will be added.
        pub fn init_siginfo(handler: k_sigaction_funcs.sigaction_fn, flags: c_uint) Sigaction {
            return Sigaction{
                flags = SA.SIGINFO | flags,
                handler = @ptrCast(handler),
                restorer = &restore_rt,
                mask = linux.sigemptyset(),
            };
        }

        /// Initialize Sigaction to restore a default handler.
        pub fn init_sigdefault() Sigaction {
            return Sigaction{
                flags = 0,
                handler = SIG.DFL,
                restorer = &restore,
                mask = linux.sigemptyset(),
            };
        }

The optimization is that these constructors can initialize the restorer field appropriately (so Zig doesn't need to copy a Sigaction into a k_sigaction inside the sigaction syscall wrapper.) The "API" of these structs is pretty funky, though, so you either need a lot of constructors to cover the variations, or a complicated constructor signature. Plus these would have to be common across all the systems backing the posix API so they could be used through const act = posix.sigaction.init...(). After expanding it out a bit like this, I'm not sure its worth the effort just to optimize sigaction invocations (and to deviate from the C API).

@alexrp
Copy link
Member

alexrp commented Apr 26, 2025

Makes sense. And yeah, I think I agree with your conclusion.

Copy link
Member

@alexrp alexrp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM other than the above comments.

@rootbeer
Copy link
Contributor Author

And it looks like I've broken something with the latest update .. (I did run the tests locally! But not getting the same MIPS coverage? Or release?) ... Anyway, I'll be back with more patches soon ...

@alexrp
Copy link
Member

alexrp commented Apr 29, 2025

To run non-native architectures locally, you have to pass -fqemu, and optionally --glibc-runtimes <...> if you've built glibc.

@rootbeer
Copy link
Contributor Author

Heh, I think the MIPS signal numbers were wrong. So the "safe" SIGURG signal was actually triggering a SIGSTOP (on MIPS).

We have the technology to verify all these constants match their C equivalents. (And that field offsets and sizes match too.) But, AFAICT, there are no tests doing the equivalent of expectEqual(c_headers.signal.SIGURG, std.os.linux.SIG.URG) ...

Export the sigset_t ops (sigaddset, etc) from the C library.  Don't rely
on the linux.zig defintions (which will be defined to use the kernel ABI).

Move Darwin sigset and NSIG declarations into darwin.zig.  Remove
extraneous (?) sigaddset.  The C library sigaddset can reject some signals
being added, so need to defer to it.
The kernel ABI sigset_t is smaller than the glibc one.  Define the
right-sized sigset_t and fixup the sigaction() wrapper to leverage it.
The Sigaction wrapper here is not an ABI, so relax it (drop the "extern"
and the "restorer" fields), the existing `k_sigaction` is the ABI
sigaction struct.

Linux defines `sigset_t` with a c_ulong, so it can be 32-bit or 64-bit,
depending on the platform.  This can make a difference on big-endian
systems.

Patch up `ucontext_t` so that this change doesn't impact its layout.
AFAICT, its currently the glibc layout.
Unify the C library sigset_t and Linux native sigset_t and the accessor
operations.

Add tests that the various sigset_t operations are working.  And clean up
existing tests a bit.
When linking a libc, Zig should defer to the C library for sigset
operations.  The pre-filled constants signal sets (empty_sigset,
filled_sigset) are not compatible with C library initialization, so remove
them and use the runtime `sigemptyset` and `sigfillset` methods to
initialize any sigset.
…gset_t

By returning an initialized sigset (instead of taking the set as an output
parameter), these functions can be used to directly initialize the `mask`
parameter of a `Sigaction` instance.
All the existing code that manipulates `ucontext_t` expects there to be a
glibc-compatible sigmask (1024-bit).  The `ucontext_t` struct need to be
cleaned up so the glibc-dependent format is only used when linking
glibc/musl library, but that is a more involved change.

In practice, no Zig code looks at the sigset field contents, so it just
needs to be the right size.
Dunno why the MIPS signal numbers are different, or why Zig had them
already special cased, but wrong.

We have the technology to test these constants.  We should use it.
@alexrp
Copy link
Member

alexrp commented May 1, 2025

We have the technology to verify all these constants match their C equivalents. (And that field offsets and sizes match too.) But, AFAICT, there are no tests doing the equivalent of expectEqual(c_headers.signal.SIGURG, std.os.linux.SIG.URG) ...

I like the idea of this (and in fact do that in one of my personal projects), but there's the issue of what we do when zig translate-c is officially split out from ziglang/zig into ziglang/translate-c. AFAIK Andrew doesn't like the idea of ziglang/zig having external dependencies (even through the Zig build system). OTOH, the same issue will pop up with zig cc and building the source code for our vendored libcs.

So we need to actually decide on a direction for the project in this regard with all the relevant facts taken into account. Could you file an issue for this idea? I'll bring it up for discussion in one of the upcoming team meetings.

@alexrp
Copy link
Member

alexrp commented May 1, 2025

Splitting up the ucontext_t (glibc vs native) is another bit of follow-up work. I'll file a bug for that.

Just a reminder to file a bug for this as well.

@alexrp alexrp merged commit ad9cb40 into ziglang:master May 1, 2025
9 checks passed
@alexrp
Copy link
Member

alexrp commented May 1, 2025

Great work, thanks!

@alexrp alexrp removed their assignment May 1, 2025
@rootbeer rootbeer deleted the sig-split branch May 1, 2025 19:40
@rootbeer
Copy link
Contributor Author

rootbeer commented May 1, 2025

Thanks for the reviews! I'll file the 2 bugs in the next couple days ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants