
Commit dfbcd39

Add thread_spawn_hook rfc.
1 parent e4bff82 commit dfbcd39

1 file changed: +238 -0 lines changed

text/3641-thread-spawn-hook.md

- Feature Name: `thread_spawn_hook`
- Start Date: 2024-05-22
- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000)
- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)

# Summary

Add `std::thread::add_spawn_hook` to register a hook that runs every time a thread spawns.
This will effectively provide us with "inheriting thread locals", a much-requested feature.

```rust
use std::cell::Cell;

thread_local! {
    static MY_THREAD_LOCAL: Cell<u32> = Cell::new(0);
}

std::thread::add_spawn_hook(|_| {
    // Get the value of MY_THREAD_LOCAL in the spawning thread.
    let value = MY_THREAD_LOCAL.get();

    Ok(move || {
        // Set the value of MY_THREAD_LOCAL in the newly spawned thread.
        MY_THREAD_LOCAL.set(value);
    })
});
```

# Motivation

Thread local variables are often used for scoped "global" state.
For example, a testing framework might store the status or name of the current
unit test in a thread local variable, such that multiple tests can be run in
parallel in the same process.

However, this information is not preserved across threads when a unit test
spawns a new thread, which is problematic.
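
The problem can be reproduced on stable Rust today. The following is a minimal sketch; `CURRENT_TEST_ID` and `id_seen_by_child` are hypothetical names chosen for illustration and stand in for whatever state a testing framework might keep.

```rust
use std::cell::Cell;
use std::thread;

thread_local! {
    // Hypothetical per-test state, as a testing framework might keep it.
    static CURRENT_TEST_ID: Cell<u32> = Cell::new(0);
}

/// Sets the thread local in the calling thread, then reports the value
/// that a freshly spawned child thread observes.
fn id_seen_by_child(parent_value: u32) -> u32 {
    CURRENT_TEST_ID.set(parent_value);
    thread::spawn(|| CURRENT_TEST_ID.get()).join().unwrap()
}

fn main() {
    // The child lazily initializes its own copy, so it sees the initial
    // value rather than the parent's.
    assert_eq!(id_seen_by_child(42), 0);
    // The parent's value is unaffected.
    assert_eq!(CURRENT_TEST_ID.get(), 42);
}
```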

The solution seems to be "inheriting thread locals": thread locals that are
automatically inherited by new threads.

However, adding this property to thread local variables is not straightforward.
Thread locals are initialized lazily, and by the time they are initialized, the
parent thread might have already exited, such that there is no value left
to inherit from.
Additionally, even if the parent thread were still alive, there is no way to
access the value in the parent thread without causing race conditions.

Allowing hooks to be run as part of spawning a thread gives precise control
over how thread locals are "inherited".
One could simply `clone()` them, but one could also add additional information
to them, or even add relevant information to some (global) data structure.

For example, not only could a custom testing framework keep track of unit test
state even across spawned threads, but a logging/debugging/tracing library could
keep track of which thread spawned which thread to provide more useful
information to the user.

# Public Interface

```rust
// In std::thread:

/// Registers a function to run for every new thread spawned.
///
/// The hook is executed in the parent thread, and returns a function
/// that will be executed in the new thread.
///
/// The hook is called with the `Thread` handle for the new thread.
///
/// If the hook returns an `Err`, thread spawning is aborted. In that case, the
/// function used to spawn the thread (e.g. `std::thread::Builder::spawn`) will
/// return the error returned by the hook.
///
/// Hooks can only be added, not removed.
///
/// The hooks will run in order, starting with the most recently added.
///
/// # Usage
///
/// ```
/// std::thread::add_spawn_hook(|_| {
///     ..; // This will run in the parent (spawning) thread.
///     Ok(move || {
///         ..; // This will run in the child (spawned) thread.
///     })
/// });
/// ```
///
/// # Example
///
/// ```
/// use std::cell::Cell;
///
/// thread_local! {
///     static MY_THREAD_LOCAL: Cell<u32> = Cell::new(0);
/// }
///
/// std::thread::add_spawn_hook(|_| {
///     // Get the value of MY_THREAD_LOCAL in the spawning thread.
///     let value = MY_THREAD_LOCAL.get();
///
///     Ok(move || {
///         // Set the value of MY_THREAD_LOCAL in the newly spawned thread.
///         MY_THREAD_LOCAL.set(value);
///     })
/// });
/// ```
pub fn add_spawn_hook<F, G>(hook: F)
where
    F: 'static + Sync + Fn(&Thread) -> std::io::Result<G>,
    G: 'static + Send + FnOnce();
```

# Implementation

The implementation could simply be a static `RwLock` with a `Vec` of
(boxed/leaked) `dyn Fn`s, or a simple lock-free linked list of hooks.

Functions that spawn a thread, such as `std::thread::spawn`, will eventually call
`spawn_unchecked_`, which will call the hooks in the parent thread, after the
child `Thread` object has been created, but before the child thread has been
spawned. The resulting `FnOnce` objects are stored and passed on to the child
thread afterwards, which will execute them one by one before continuing with its
main function.
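
The `RwLock`-plus-`Vec` variant can be approximated in user code on stable Rust. The sketch below is not the proposed std implementation: `HOOKS`, `Hook`, `ChildFn`, and `spawn_with_hooks` are hypothetical names, the local `add_spawn_hook` only mirrors the shape of the proposed function, and two simplifications are assumed. The hook takes no `&Thread` argument, because user code cannot obtain the child's handle before spawning, and it requires `Send` in addition to `Sync` so that the boxed hooks can live in a `static`.

```rust
use std::cell::Cell;
use std::io;
use std::sync::RwLock;
use std::thread::{self, JoinHandle};

// Hypothetical types for this sketch, not std internals.
type ChildFn = Box<dyn FnOnce() + Send>;
type Hook = Box<dyn Fn() -> io::Result<ChildFn> + Send + Sync>;

// Global, append-only list of hooks.
static HOOKS: RwLock<Vec<Hook>> = RwLock::new(Vec::new());

/// Registers a hook. One allocation per hook, to box it into the list.
pub fn add_spawn_hook<F, G>(hook: F)
where
    F: 'static + Send + Sync + Fn() -> io::Result<G>,
    G: 'static + Send + FnOnce(),
{
    let boxed: Hook = Box::new(move || hook().map(|g| Box::new(g) as ChildFn));
    HOOKS.write().unwrap().push(boxed);
}

/// Stand-in for a std spawn function that honors the hooks.
pub fn spawn_with_hooks<F, T>(f: F) -> io::Result<JoinHandle<T>>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    // Run every hook in the parent, most recently added first.
    // The first `Err` aborts the spawn.
    let child_fns: Vec<ChildFn> = HOOKS
        .read()
        .unwrap()
        .iter()
        .rev()
        .map(|hook| hook())
        .collect::<io::Result<_>>()?;

    thread::Builder::new().spawn(move || {
        // Run the stored closures in the child before its main function.
        for g in child_fns {
            g();
        }
        f()
    })
}

thread_local! {
    static MY_THREAD_LOCAL: Cell<u32> = Cell::new(0);
}

fn main() -> io::Result<()> {
    add_spawn_hook(|| {
        let value = MY_THREAD_LOCAL.get();
        Ok(move || MY_THREAD_LOCAL.set(value))
    });

    MY_THREAD_LOCAL.set(7);
    let seen = spawn_with_hooks(|| MY_THREAD_LOCAL.get())?.join().unwrap();
    assert_eq!(seen, 7);
    Ok(())
}
```

In this sketch the read lock is held while the hooks run, so a hook that itself tries to register another hook would deadlock. A real implementation would need to decide how to handle that case.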

# Downsides

- The implementation requires an allocation for each hook (to store it in the
  global list of hooks), and an allocation for each hook each time a thread is
  spawned (to store the resulting closure).

- A library that wants to make use of inheriting thread locals will have to
  register a global hook, and will need to keep track of whether its hook has
  already been added (e.g. in a static `AtomicBool`).

- The hooks will not run if threads are spawned through e.g. pthread directly,
  bypassing the Rust standard library.
  (However, this is already the case for output capturing in libtest:
  that does not work across threads when not spawned by libstd.)
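
The bookkeeping mentioned in the second item could look roughly like this. It is a sketch under assumptions: `register_hook` is a hypothetical stand-in for the library's call to the proposed `std::thread::add_spawn_hook`, and `REGISTRATIONS` exists only to make the behavior observable.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::thread;

// Tracks whether this library has already added its hook.
static HOOK_ADDED: AtomicBool = AtomicBool::new(false);
// Only here so the example can observe how often registration ran.
static REGISTRATIONS: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for calling the proposed `add_spawn_hook`.
fn register_hook() {
    REGISTRATIONS.fetch_add(1, Ordering::SeqCst);
}

/// Called from every library entry point; registers the hook at most once.
pub fn ensure_hook_registered() {
    // `swap` returns the previous value, so only the first caller sees `false`.
    if !HOOK_ADDED.swap(true, Ordering::SeqCst) {
        register_hook();
    }
}

fn main() {
    let handles: Vec<_> = (0..4)
        .map(|_| thread::spawn(ensure_hook_registered))
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    assert_eq!(REGISTRATIONS.load(Ordering::SeqCst), 1);
}
```

With a plain `AtomicBool`, a second caller can return while the first is still registering. If callers must be able to rely on the hook being in place on return, `std::sync::Once` is an alternative, since `call_once` blocks until the initialization has finished.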

# Rationale and alternatives

## Use of `io::Result`

The hook returns an `io::Result` rather than the `FnOnce` directly.
This can be useful for e.g. resource limiting or possible errors while
registering new threads, but makes the signature more complicated.

An alternative could be to simplify the signature by removing the `io::Result`,
which is fine for most use cases.
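
As an illustration of the resource-limiting use case, the function below has the shape of a hook body: it runs in the parent and either rejects the spawn or returns the closure for the child. This is only a sketch. `MAX_SPAWNS`, `SPAWNS`, and `limit_hook` are hypothetical names, the `&Thread` argument is omitted, and the counter is a simple lifetime budget that is never released when a thread exits.

```rust
use std::io;
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical spawn budget for this sketch.
const MAX_SPAWNS: usize = 2;
static SPAWNS: AtomicUsize = AtomicUsize::new(0);

/// Hook-shaped function: reserves a slot in the parent, or fails the spawn.
fn limit_hook() -> io::Result<impl FnOnce() + Send + 'static> {
    // Reserve a slot first, then roll back if the budget was already used up.
    if SPAWNS.fetch_add(1, Ordering::SeqCst) >= MAX_SPAWNS {
        SPAWNS.fetch_sub(1, Ordering::SeqCst);
        return Err(io::Error::new(
            io::ErrorKind::Other,
            "thread limit reached",
        ));
    }
    Ok(|| {
        // Child-side setup would go here.
    })
}

fn main() {
    assert!(limit_hook().is_ok());
    assert!(limit_hook().is_ok());
    // The third attempt exceeds the budget and would abort the spawn.
    assert!(limit_hook().is_err());
}
```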

## Global vs thread local effect

`add_spawn_hook` has a global effect (similar to e.g. libc's `atexit()`),
to keep things simple.

An alternative could be to store the list of spawn hooks per thread,
which new threads inherit from their parent thread.
That way, a hook added by `add_spawn_hook` will only affect the current thread
and all (direct and indirect) future child threads of the current thread,
not other unrelated threads.

Both are relatively easy and efficient to implement (as long as removing hooks
is not an option).

However, the first (global) behavior is conceptually simpler and allows for more
flexibility. Using a global hook, one can still implement the thread local
behavior, but this is not possible the other way around.
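
One way to get the thread local behavior out of a global hook is to gate the hook on a thread local flag and have the hook pass that flag on to the child. The following is a sketch under assumptions: `ENABLED`, `INHERITED`, `scoped_hook`, and `spawn_hooked` are hypothetical names, `spawn_hooked` stands in for a std spawn function that runs the hook, and the `io::Result` and `&Thread` parts of the proposed signature are left out for brevity.

```rust
use std::cell::Cell;
use std::thread;

thread_local! {
    // Whether the current thread opted in, directly or via an ancestor.
    static ENABLED: Cell<bool> = Cell::new(false);
    // The value to be inherited by opted-in descendants.
    static INHERITED: Cell<u32> = Cell::new(0);
}

/// Body of a global hook: runs in the parent, returns the child's closure.
fn scoped_hook() -> Box<dyn FnOnce() + Send> {
    if !ENABLED.get() {
        // Unrelated thread: nothing to do in the child.
        return Box::new(|| {});
    }
    let value = INHERITED.get();
    Box::new(move || {
        // Pass the flag on so that indirect children are covered too.
        ENABLED.set(true);
        INHERITED.set(value);
    })
}

/// Stand-in for a std spawn function that runs the hook.
fn spawn_hooked<T: Send + 'static>(
    f: impl FnOnce() -> T + Send + 'static,
) -> thread::JoinHandle<T> {
    let child_fn = scoped_hook();
    thread::spawn(move || {
        child_fn();
        f()
    })
}

fn main() {
    INHERITED.set(5);

    // Not opted in: the child sees the initial value.
    assert_eq!(spawn_hooked(|| INHERITED.get()).join().unwrap(), 0);

    // Opted in: the value reaches an indirect child as well.
    ENABLED.set(true);
    let grandchild = spawn_hooked(|| spawn_hooked(|| INHERITED.get()).join().unwrap())
        .join()
        .unwrap();
    assert_eq!(grandchild, 5);
}
```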

## Add but no remove

Having only an `add_spawn_hook` but not a `remove_spawn_hook` keeps things
simple, by 1) not needing a global (thread safe) data structure that allows
removing items and 2) not needing a way to identify a specific hook (through a
handle or a name).

If a hook only needs to execute conditionally, one can make use of an
`if` statement.

## Requiring storage on spawning

Because the hooks run on the parent thread first, before the child thread is
spawned, the results of those hooks (the functions to be executed in the child)
need to be stored. This will require heap allocations (although it might be
possible for an optimization to save small objects on the stack up to a certain
size).

An alternative interface that wouldn't require any storage is possible, but has
downsides. Such an interface would spawn the child thread *before* running the
hooks, and allow the hooks to execute a closure on the child (before it moves on
to its main function). That looks roughly like this:

```rust
std::thread::add_spawn_hook(|child| {
    // Get the value on the parent thread.
    let value = MY_THREAD_LOCAL.get();
    // Set the value on the child thread.
    child.exec(|| MY_THREAD_LOCAL.set(value));
});
```

This could be implemented without allocations, as the function executed by the
child can now be borrowed from the parent thread.

However, this means that the parent thread will have to block until the child
thread has been spawned, and block again until each hook has finished on both
threads, significantly slowing down thread creation.

Considering that spawning a thread involves several allocations and syscalls,
it doesn't seem very useful to try to avoid one extra allocation when that
comes at a significant cost.

## `impl` vs `dyn` in the signature

An alternative interface could use `dyn` instead of generics, as follows:

```rust
pub fn add_spawn_hook(
    hook: Box<dyn Fn(&Thread) -> io::Result<Box<dyn FnOnce() + Send>> + Sync>
);
```

However, this mostly has downsides: it requires the user to write `Box::new` in
a few places, and it prevents us from ever implementing some optimization tricks
to, for example, use a single allocation for multiple hook results.

# Unresolved questions

- Should the return value of the hook be an `Option`, for when the hook does not
  require any code to be run in the child?

- Should the hook be able to access/configure more information about the child
  thread? E.g. set its stack size.
  (Note that settings that can be changed afterwards by the child thread, such as
  the thread name, can already be set by simply setting it as part of the code
  that runs on the child thread.)

# Future possibilities

- Using this in libtest for output capturing (instead of today's
  implementation that has special hardcoded support in libstd).
