Add nonunix platform support#274
Conversation
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please copy-paste ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. LLM Linter (✨ experimental)Possible typos and grammar issues:
2026-06-22 21:32:45 |
ryanofsky
left a comment
There was a problem hiding this comment.
Thanks for the review! Will update with suggestions.
Add ProcessId = int type alias and apply it to WaitProcess, SpawnProcess (pid output argument), and callers. ProcessId type will be different on windows so this provides more portability.
Add SocketId = int and SocketError = -1 type aliases and apply SocketId to SpawnProcess (return type and callback parameter) and callers. SocketId type will be different on Windows, so this provides more portability. Co-authored-by: Sjors Provoost <sjors@sprovoost.nl>
Add SpawnConnectInfo type alias to pass socket handle from parent process to child process in more platform independent way.
gen.cpp used fork() directly via <unistd.h> to invoke the capnp compiler as a subprocess, but fork() is not available on Windows, so shouldn't be used in application code. Add a StartProcess(const std::vector<std::string>& args) function to util.h/util.cpp that spawns a process and returns its ProcessId, leaving the caller responsible for WaitProcess. On POSIX it uses posix_spawn; on Windows it can use CreateProcess. Update gen.cpp to replace the inline fork/exec/wait with mp::WaitProcess(mp::StartProcess(args)). There a change in behavior if starting the process fails, because KJ_FAIL_SYSCALL is now used for error reporting and the fs::weakly_canonical call is dropped (it is probably more useful to see literal executable name passed). Also previously, the child process could deadlock if exec() failed because threads in the parent may have held allocator or stdio mutex locks that are now leaked in the child. posix_spawn() avoids this entirely by never running arbitrary code in the child process. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Extract socket pair creation from SpawnProcess into a standalone SocketPair() function, and use it to replace the inline socketpair() call. This is pretty much a refactoring but technically there are two small behavior changes: - FD_CLOEXEC is now used to ensure sockets are not leaked if processes are spawned. This has no impact on SpawnProcess, but is better hygiene and could affect future callers of SocketPair(). - KJ_SYSCALL is now used to simplify error-handling. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…objects Replace the m_wait_fd/m_post_fd raw int members with m_wait_stream/m_post_stream kj::Own<kj::AsyncIoStream> and m_post_writer kj::Own<kj::OutputStream>. The constructor uses provider->newTwoWayPipe() instead of calling socketpair() directly. The loop() and post() methods write through m_post_writer instead of calling write() with a raw fd, and EventLoopRef::reset does the same.
kj::AsyncIoStream::getFd() was added in capnproto 0.9 (commit d27bfb8a4175b32b783de68d93dd1dbafadddea5, first released in 0.9.0). The code now uses getFd() in proxy.cpp, so 0.7 is no longer a sufficient minimum. Set olddeps version to 0.9.2, which is the patched 0.9.x release for CVE-2022-46149. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…m objects Instead of accepting raw file descriptor integers and wrapping them internally, ConnectStream and ServeStream now accept kj::Own<kj::AsyncIoStream> directly. This removes the assumption that the transport is always a local unix fd, making the API easier to adapt to other I/O types (e.g. Windows handles). The Stream type alias (kj::Own<kj::AsyncIoStream>) is added as a convenience. Callers are updated to wrap their fd with wrapSocketFd() before calling.
Flush pending Cap'n Proto release messages before closing the stream. When one side of a socket pair closes, the other side does not receive an onDisconnect event, so it relies on receiving release messages from the closing side to free its ProxyServer objects and shut down cleanly. Without this, Server objects are not freed by Cap'n Proto on disconnection.
…ibraries
On macOS, when libcapnp is built as a dynamic library and Bitcoin Core
REDUCE_EXPORT option is used the RTTI typeinfo for kj::Exception has a
different address in libcapnp.dylib versus the calling binary. This
means catch (const kj::Exception& e) in the calling binary silently
fails to match exceptions thrown by capnp, so the DISCONNECTED exception
from shutdownWrite() propagates as a fatal uncaught exception instead of
being suppressed as intended.
This causes the Bitcoin Core macOS native CI job to fail with:
Fatal uncaught kj::Exception: kj/async-io-unix.c++:491: disconnected:
shutdown(fd, SHUT_WR): Socket is not connected
The fix is to use kj::runCatchingExceptions/kj::throwRecoverableException,
which use KJ's own thread-level exception interception mechanism rather
than C++ RTTI-based matching, and therefore work correctly across dynamic
library boundaries. This is the same approach used elsewhere in the
codebase (proxy.cpp EventLoop::post, type-context.h server request handler)
for the same reason.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
MSVC error when building multiprocess.vcxproj:
mp/util.h(146,46): error C2280:
'std::variant<T *,T>::variant(const std::variant<T *,T> &)':
attempting to reference a deleted function [with T=mp::Lock]
The PtrOrValue constructor used a ternary expression to initialize data:
data(ptr ? ptr : std::variant<T*, T>{std::in_place_type<T>, args...})
Both arms are prvalues of type std::variant<T*,T>, so under C++17's
mandatory copy elision no copy/move constructor should be invoked. GCC
and Clang apply this correctly. MSVC does not apply guaranteed copy
elision to ternary expressions in this context: it materializes the
temporary and then attempts to copy-construct data from it. Since
std::variant<Lock*,Lock> has a deleted copy constructor (Lock holds a
std::unique_lock which is move-only), MSVC fails.
Fix by initializing data to hold T*=ptr in the member initializer list,
then emplacing T in-place in the constructor body if ptr is null. This
avoids the ternary entirely and requires only the in-place constructor
of T, not any variant copy or move.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
MSVC warns (C4305, treated as error) about truncation from 'int' to 'const bool' when initializing static const bool members from integer bitwise-and expressions. Use constexpr bool with explicit != 0 to make the boolean conversion unambiguous. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
I opened stratum-mining/sv2-tp#110 to test the changes in the Template Provider. |
Sjors
left a comment
There was a problem hiding this comment.
Studied two first three commits...
| // exec, so the fd survives into the spawned process regardless of how | ||
| // the socket was created. | ||
| int flags = fcntl(fds[0], F_GETFD); | ||
| if (flags == -1) throw std::system_error(errno, std::system_category(), "fcntl F_GETFD"); |
There was a problem hiding this comment.
In 24c5e57 util: Clear FD_CLOEXEC on child socket before exec: to not make things more non-fork-safe:
diff --git a/src/mp/util.cpp b/src/mp/util.cpp
index 15400215e4..b2a545ef5f 100644
--- a/src/mp/util.cpp
+++ b/src/mp/util.cpp
@@ -62,4 +62,12 @@ size_t MaxFd()
}
+template <size_t N>
+[[noreturn]] void ChildProcessError(const char (&msg)[N])
+{
+ const ssize_t writeResult = ::write(STDERR_FILENO, msg, N - 1);
+ (void)writeResult;
+ _exit(126);
+}
+
} // namespace
@@ -144,8 +152,5 @@ std::tuple<ProcessId, SocketId> SpawnProcess(ConnectInfoToArgsFn&& connect_info_
throw std::system_error(errno, std::system_category(), "close");
}
- static constexpr char msg[] = "SpawnProcess(child): close(fds[1]) failed\n";
- const ssize_t writeResult = ::write(STDERR_FILENO, msg, sizeof(msg) - 1);
- (void)writeResult;
- _exit(126);
+ ChildProcessError("SpawnProcess(child): close(fds[1]) failed\n");
}
@@ -164,8 +169,12 @@ std::tuple<ProcessId, SocketId> SpawnProcess(ConnectInfoToArgsFn&& connect_info_
// the socket was created.
int flags = fcntl(fds[0], F_GETFD);
- if (flags == -1) throw std::system_error(errno, std::system_category(), "fcntl F_GETFD");
+ if (flags == -1) {
+ ChildProcessError("SpawnProcess(child): fcntl(fds[0], F_GETFD) failed\n");
+ }
if (flags & FD_CLOEXEC) {
flags &= ~FD_CLOEXEC;
- if (fcntl(fds[0], F_SETFD, flags) == -1) throw std::system_error(errno, std::system_category(), "fcntl F_SETFD");
+ if (fcntl(fds[0], F_SETFD, flags) == -1) {
+ ChildProcessError("SpawnProcess(child): fcntl(fds[0], F_SETFD) failed\n");
+ }
}There was a problem hiding this comment.
Was there something in particular that led you this (hardening) commit?
There was a problem hiding this comment.
re: #274 (comment)
In 24c5e57 util: Clear FD_CLOEXEC on child socket before exec: to not make things more non-fork-safe:
Good catch. An easier to way to make it safe was just to move the CLOEXEC code before the fork() call so now doing that
There was a problem hiding this comment.
re: #274 (comment)
Was there something in particular that led you this (hardening) commit?
I don't remember exactly. It was definitely a bug of some kind. I think an earlier version of the code was using newTwoWayPipe which wraps a socket pair and sets CLOEXEC. Also originally SpawnProcess required callers to construct the socket pair, taking the child socket as an argument instead of both sockets. But this made calling code more complicated, so now SpawnProcess returns a parent socket instead of accepting a child socket.
Now since SpawnProcess is creating the socket pair it knows whether CLOEXEC is set so this code is not necessary. But I think it's good to use CLOEXEC to prevent accidental leaks so I did keep some of this and integrated it better with the earlier commit introducing SocketPair.
| //! errors in python unit tests. | ||
| std::string LogEscape(const kj::StringTree& string, size_t max_size); | ||
|
|
||
| using Stream = kj::Own<kj::AsyncIoStream>; |
There was a problem hiding this comment.
re: #274 (comment)
In 091f5e1 proxy, refactor: Change ConnectStream and ServeStream to accept stream objects: would it make sense to introduce this earlier, in 3c81cf2?
It could make sense but I feel it's a little clearer if the EventLoop code is using kj::Own<kj::AsyncIoStream> and kj::Own<kj::OutputStream> types directly and not tied to the mp::Stream type for external callers and meant to be more opaque.
But I did extend this commit to use Stream stream type in ConnectStream and ServeStream functions since these are external functions.
| // resources and shutting down cleanly. | ||
| try { | ||
| m_stream->shutdownWrite(); | ||
| } catch (const kj::Exception& e) { |
There was a problem hiding this comment.
re: #274 (comment)
In bfc2db7 proxy: Call shutdownWrite() in Connection destructor: maybe squash 28e4c7f into this?
Didn't squash these because I think 28e4c7f is good documentation of the macos REDUCE_EXPORTS bug and will be probably be useful to reference in the future (the bug can happen very easily just by throwing and catching exceptions.)
MSVC does not correctly apply SFINAE when the substitution failure
occurs in a default function argument that uses decltype of another
function parameter (MSVC error C2039). Instead of silently excluding
the overload, MSVC instantiates the function body and reports a hard
error, e.g.:
type-interface.h(62,56): error C2039: 'Calls': is not a member of
'capnp::List<mp::test::messages::Pair<capnp::Text,capnp::Text>,
capnp::Kind::STRUCT>::Builder'
The root cause is that MSVC evaluates default argument expressions
outside the SFINAE immediate context when they reference function
parameters via decltype. GCC and Clang treat this as a substitution
failure and silently exclude the overload, as the standard intends.
Replace all uses of the `* enable = nullptr` default-argument SFINAE
pattern with C++20 requires clauses, which are well-supported on all
three compilers and give cleaner constraint-violation diagnostics.
Add two concepts to util.h to reduce repetition across the type-*.h
headers:
FieldTypeIs<T, U> - T's .get() returns exactly type U
InterfaceField<T> - T's .get() returns a capnp interface type
(a type that exposes a nested ::Calls member)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This repo has introduced API changes to add Windows support to libmultiprocess (HANDLE-based IPC alongside the existing fd-based IPC). These changes require corresponding updates to Bitcoin Core, which are pending in bitcoin/bitcoin#35084. Until that PR merges, the Bitcoin Core CI jobs fail against master because Bitcoin Core has not yet been updated to use the new API. Switch the Bitcoin Core checkout in both jobs to use refs/pull/35084/merge so CI tests against the compatible version. A BITCOIN_CORE_REF env var is introduced at the top of the file; once (and keep the var in place for any future API compatibility cycles). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
On MSVC, std::terminate() does not print the exception message before calling abort()/fastfail, so exceptions thrown during mpgen execution appear as a bare 0xC0000409 exit code with no diagnostic output. Wrap main() in a try-catch to explicitly print the error to stderr and return 1 instead of crashing. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Bitcoin Core linter rejects it: https://github.com/bitcoin/bitcoin/actions/runs/24568789956/job/71835997334?pr=32387
…bols Use target_compile_definitions on mpgen to expose CAPNP_EXECUTABLE, CAPNPC_CXX_EXECUTABLE (via $<TARGET_FILE:...> generator expressions on the CapnProto::capnp_tool and CapnProto::capnpc_cpp imported targets), and CAPNP_INCLUDE_DIRS (from the CAPNP_INCLUDE_DIRS variable set by find_package). gen.cpp uses these directly instead of constructing paths from capnp_PREFIX. Remove capnp_PREFIX from config.h.in as it is no longer needed there. Add compat fallbacks in compat_config.cmake to synthesize the tool imported targets and CAPNP_INCLUDE_DIRS from older variables when using an older CapnProto package. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Ubuntu Noble's libcapnp-dev 1.0.1 cmake config file is installed under /usr/lib/x86_64-linux-gnu/cmake/CapnProto/ but its _IMPORT_PREFIX calculation goes up only 3 directory levels to /usr/lib instead of 4 levels to /usr, so IMPORTED_LOCATION for CapnProto::capnp_tool is set to /usr/lib/bin/capnp (non-existent) rather than /usr/bin/capnp. The previous compat_config.cmake fallback only fired when the target didn't exist at all (NOT TARGET), so it didn't catch this case where the target exists but has a wrong path. Add a validation pass that iterates over both tool targets after they are created (either by the package or by our own fallback). For each target, check whether any IMPORTED_LOCATION (config-specific or generic) resolves to an existing file. If none do, use find_program (with capnp_PREFIX/bin as a hint) to locate the actual binary and override all stored locations on that target. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Thanks for the reviews!
Rebased 7cb83a5 -> 68ed129 (pr/wins.1 -> pr/wins.2, compare) implementing review suggestions and fixing conflict with #279
Updated 68ed129 -> a43e5a8 (pr/wins.2 -> pr/wins.3, compare) to fix environ undeclared on macOS/BSD: add explicit declaration before posix_spawn() https://github.com/bitcoin-core/libmultiprocess/actions/runs/27974097183/job/82787313836
Updated a43e5a8 -> 6cc729f (pr/wins.3 -> pr/wins.4, compare) to fix clang-tidy readability-redundant-declaration on environ declaration https://github.com/bitcoin-core/libmultiprocess/actions/runs/27978804911/job/82803323635
| // exec, so the fd survives into the spawned process regardless of how | ||
| // the socket was created. | ||
| int flags = fcntl(fds[0], F_GETFD); | ||
| if (flags == -1) throw std::system_error(errno, std::system_category(), "fcntl F_GETFD"); |
There was a problem hiding this comment.
re: #274 (comment)
Was there something in particular that led you this (hardening) commit?
I don't remember exactly. It was definitely a bug of some kind. I think an earlier version of the code was using newTwoWayPipe which wraps a socket pair and sets CLOEXEC. Also originally SpawnProcess required callers to construct the socket pair, taking the child socket as an argument instead of both sockets. But this made calling code more complicated, so now SpawnProcess returns a parent socket instead of accepting a child socket.
Now since SpawnProcess is creating the socket pair it knows whether CLOEXEC is set so this code is not necessary. But I think it's good to use CLOEXEC to prevent accidental leaks so I did keep some of this and integrated it better with the earlier commit introducing SocketPair.
| //! errors in python unit tests. | ||
| std::string LogEscape(const kj::StringTree& string, size_t max_size); | ||
|
|
||
| using Stream = kj::Own<kj::AsyncIoStream>; |
There was a problem hiding this comment.
re: #274 (comment)
In 091f5e1 proxy, refactor: Change ConnectStream and ServeStream to accept stream objects: would it make sense to introduce this earlier, in 3c81cf2?
It could make sense but I feel it's a little clearer if the EventLoop code is using kj::Own<kj::AsyncIoStream> and kj::Own<kj::OutputStream> types directly and not tied to the mp::Stream type for external callers and meant to be more opaque.
But I did extend this commit to use Stream stream type in ConnectStream and ServeStream functions since these are external functions.
| // resources and shutting down cleanly. | ||
| try { | ||
| m_stream->shutdownWrite(); | ||
| } catch (const kj::Exception& e) { |
There was a problem hiding this comment.
re: #274 (comment)
In bfc2db7 proxy: Call shutdownWrite() in Connection destructor: maybe squash 28e4c7f into this?
Didn't squash these because I think 28e4c7f is good documentation of the macos REDUCE_EXPORTS bug and will be probably be useful to reference in the future (the bug can happen very easily just by throwing and catching exceptions.)
a43e5a8 to
6cc729f
Compare
| } | ||
| _exit(1); | ||
| ProcessId pid; | ||
| if (int err = posix_spawn(&pid, argv[0], nullptr, nullptr, argv.data(), ::environ)) { |
There was a problem hiding this comment.
In ddd3945 util, refactor: Do not fork() and exec() separately: not posix_spawnp?
|
|
||
| //! Create a socket pair that can be used to communicate within a process or | ||
| //! between parent and child processes. | ||
| std::array<SocketId, 2> SocketPair(); |
There was a problem hiding this comment.
In d9a5a68 util, refactor: Add SocketPair() and use it in SpawnProcess: it seems a bit odd to set FD_CLOEXEC and then unset it again for the child. The unset code also assume it was the only flag, since it sets 0. Maybe it's better to give this function a boolean argument for whether to set this flag on the child?
This PR implements API changes and fixes needed to allow libmultiprocess to work on nonunix platforms.
These changes were originally part of #231, which adds windows support, but were split out to allow windows and nonwindows changes to be reviewed separately.