test(daemon): Improve cross-platform sockets using pid suffix #1620
base: master
Conversation
Unsure if I should rebase (👍) or update (👎) before landing from my end to avoid …
Force-pushed from 1630693 to 1d5748f.
Force-pushed from 1d5748f to 6c379f4.
LGTM!
```diff
-      ? raw`\\?\pipe\endo-test.sock`
-      : path.join(dirname, 'tmp/endo.sock'),
+      ? raw`\\?\pipe\endo-test-${process.pid}.sock`
+      : path.join(os.tmpdir(), `endo-test-${process.pid}.sock`),
```
Using an OS tmpdir makes it harder to inspect the logs when the tests fail. If we go this route, there at least needs to be a `t.log` so the developer can rendezvous with the dump, but I’m inclined to keep it local. There’s a corresponding entry in `.gitignore`.
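For illustration, a minimal sketch of that breadcrumb, assuming AVA (which `t.log` implies); the `locator` stand-in and test body here are hypothetical:

```js
import os from 'os';
import path from 'path';
import test from 'ava';

// Stand-in for the test locator from the hunk above; paths are illustrative.
const locator = {
  sockPath: path.join(os.tmpdir(), `endo-test-${process.pid}.sock`),
};

test('daemon socket rendezvous is discoverable', async t => {
  // The suggested breadcrumb: surface the temp location in the test report,
  // so a developer can find the dump when the test fails.
  t.log('socket path:', locator.sockPath);
  t.pass();
});
```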
Isn't this just the socket path, and nothing else? Or is there some bizarre derivation of the log path from the socket path I'm missing?
Oh, I see. You are correct. This only affects the location of the socket or named pipe, not the log dump.
```diff
@@ -29,8 +30,8 @@ const locator = {
   cachePath: path.join(dirname, 'tmp/cache'),
   sockPath:
     process.platform === 'win32'
-      ? raw`\\?\pipe\endo-test.sock`
-      : path.join(dirname, 'tmp/endo.sock'),
+      ? raw`\\?\pipe\endo-test-${process.pid}.sock`
+      : path.join(os.tmpdir(), `endo-test-${process.pid}.sock`),
```
I’m curious whether the problem can be better addressed by ensuring that Endo stomps over the existing named pipe, since restarting without a graceful shutdown will affect end users as well.
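A sketch of what stomping over a stale Unix socket could look like, not Endo's actual behavior; `listenStomping` is a hypothetical helper, and Windows named pipes vanish with their owning process, so the unlink branch is Unix-only:

```js
import fs from 'fs';
import net from 'net';

const listenStomping = sockPath =>
  new Promise((resolve, reject) => {
    const server = net.createServer();
    server.once('error', error => {
      if (error.code !== 'EADDRINUSE') {
        reject(error);
        return;
      }
      // Probe the existing socket: a live daemon answers, a stale file does not.
      const probe = net.connect(sockPath);
      probe.once('connect', () => {
        probe.destroy();
        reject(Error(`another daemon is already listening on ${sockPath}`));
      });
      probe.once('error', () => {
        fs.unlinkSync(sockPath); // stale leftover from an ungraceful shutdown
        server.once('error', reject);
        server.listen(sockPath, () => resolve(server));
      });
    });
    server.listen(sockPath, () => resolve(server));
  });
```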
Or, if handling the named pipe gracefully on Windows is not in the cards, we may need to adjust some other behavior around `@endo/where` for discovering the rendezvous location of the Endo named pipe, by looking first at the PID file to infer the PID in the named pipe.
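A hypothetical sketch of that discovery order, not the actual `@endo/where` API; the `endo.pid` file name and the pipe naming scheme below are assumptions:

```js
import { readFile } from 'fs/promises';
import path from 'path';

// Read the daemon's PID file first, then derive the per-PID socket or pipe
// name from it, rather than guessing the suffix from our own process.
const discoverSockPath = async statePath => {
  const pid = (await readFile(path.join(statePath, 'endo.pid'), 'utf8')).trim();
  return process.platform === 'win32'
    ? `\\\\?\\pipe\\endo-${pid}.sock`
    : path.join(statePath, `endo-${pid}.sock`);
};
```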
I’m interested in a precise reproduction so we can tell whether this is a problem with the design of the tests, or a broader problem that will affect normal usage.
If the problem affects normal usage, I would like to consider an alternate approach: a retry loop with back-off around the socket listener, since it’s likely that we’re seeing this problem only because of a delay between when the daemon stops and when its socket gets cleaned up.
The retry loop would be around the attempt to listen, with bounded back-off and full jitter. That is, the first attempt should be fast and it should settle into a reasonable mean polling frequency, but also randomize to avoid synchronizing with other loops.
Some care should be taken since concurrent attempts to start the daemon are likely to occur in practice. In the CLI, every command that uses a daemon client will attempt to start the daemon if it’s not already running. These should not thrash. One should win the race and both should connect to the same daemon instance.
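A sketch of that loop, assuming Node’s `net` module; `listenWithRetry` and its parameters are illustrative, not Endo code:

```js
import net from 'net';
import { setTimeout as delay } from 'timers/promises';

// Bounded exponential back-off with full jitter: each delay is drawn
// uniformly from [0, min(cap, base * 2**attempt)], so the first attempt is
// fast, the mean settles, and concurrent starters don't synchronize.
const listenWithRetry = async (
  sockPath,
  { tries = 10, baseMs = 10, capMs = 1000 } = {},
) => {
  for (let attempt = 0; ; attempt += 1) {
    try {
      return await new Promise((resolve, reject) => {
        const server = net.createServer();
        server.once('error', reject);
        server.listen(sockPath, () => resolve(server));
      });
    } catch (error) {
      if (error.code !== 'EADDRINUSE' || attempt >= tries - 1) throw error;
      await delay(Math.random() * Math.min(capMs, baseMs * 2 ** attempt));
    }
  }
};
```

Under this shape, the loser of a concurrent start race keeps seeing `EADDRINUSE` from the live winner and should eventually give up and connect as a client to the same daemon instance.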
From the perspective I am coming from, I saw one test for …

In that sense, I’d like to propose a direction where there are tests for the specific moving parts, to ensure there is enough clarity about what is going on when something breaks, so that the end user can attempt to figure out the problem, or at least know what to report in an issue.

I was working with limited insight and bandwidth, and did not explicitly determine the causes of the failures. I explored different workarounds at different times, and those suggest that the test is starting in an environment that did not meet the preconditions for creating the necessary socket. The most stable workaround, of course, was where …

It is reasonable to pretest any assumptions about the platform or the environment ahead of time. Otherwise, the runtime throwing an error when the only parameter is a …

With all the moving parts involved, assumptions will feel like insights, until there is insight that they were not 😅

Meanwhile, I feel strongly inclined that passing a …

Ultimately, I would rather we figure out a direction for a PR that would allow better diagnostics and concrete insights, to better separate implementation, platform, and environment concerns.
Rerunning `yarn test` can fail on some systems when a previous run terminates without properly cleaning up the `daemon/tmp/endo.sock`.

Within the scope of testing, and separate from the implementation, the issue can be viewed as a naming conflict, which can be addressed by PID-suffixing the `.sock` path.

On non-`win32` platforms, the `.sock` is currently being placed inside the worktree, a discrepancy which further complicates this issue; that can be addressed by moving the `.sock` outside the worktree, specifically to `os.tmpdir()` given its purpose.

Additional Details

This is particularly a problem on macOS, which I encounter with different versions of Node, including 14 and 19. The less-than-ideal workaround for now is to use `docker` with two separate `-v` mounts to ensure proper `.sock` handling:
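One hypothetical shape for such an invocation (the image tag and mount paths are assumptions, not the author's exact command):

```sh
# Mount the worktree, then shadow the daemon's tmp directory with an anonymous
# container volume, so the socket is created inside the container rather than
# on the host filesystem.
docker run --rm -it \
  -v "$PWD":/endo \
  -v /endo/packages/daemon/tmp \
  -w /endo \
  node:19 \
  yarn test
```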