Gunicorn worker processes killed by main process under gramine #1798
Comments
@jonathan-sha Thanks for a great description of the problem! This was already discussed in #1134, so I'm not sure if we should consider this issue a duplicate and move your descriptions as a new comment to #1134. I'll keep it like this for now, as a separate issue.
Sure, feel free to create a PR, and we'll take a look at it (depending on the complexity of this PR, this may take a while before we do the reviews). Make sure to properly synchronize -- you'll need to take
Yes, I agree. Btw, I don't like the approach of "periodic refresh" because this will introduce more complexity (who performs this periodic refresh? what should be the refresh period? how does this helper thread learn which inodes to refresh?). Also, currently we have a "use cached values" implementation of
We can add a call to We'll need to add atime/mtime/ctime to Line 465 in d82e885
We'll also need to update this: gramine/pal/src/host/linux-sgx/pal_files.c Line 423 in d82e885
In particular, we'll need to add more fields in this helper func: gramine/pal/src/host/linux-common/file_info.c Lines 25 to 30 in d82e885
NOTES:
Since this is a duplicate of #1134 (and this issue is clearly mentioned in that issue), I'll close this one.
Description of the problem
The gunicorn main process kills its worker processes under gramine once the timeout elapses, even though the workers are alive and handling requests.
gunicorn uses the following mechanism:
os.fork())
chmod(tmp_file_fd, 0) and chmod(tmp_file_fd, 1) in round-robin between every request and while waiting for new requests - this is supposed to update the ctime of the tmp file.
Note - the problem was discussed here:
#1134
gramineproject/examples#80
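To make the heartbeat mechanism described above concrete, here is a minimal Python sketch. It is not gunicorn's actual code: the `Heartbeat` class and the `is_timed_out` helper are hypothetical names used only for illustration, and the sketch assumes the worker toggles the file mode with `os.fchmod` on an open descriptor while the main process compares the `st_ctime` returned by `os.fstat` against the timeout.

```python
import os
import tempfile
import time


class Heartbeat:
    """Hypothetical stand-in for the per-worker tmp file (not gunicorn's class)."""

    def __init__(self):
        fd, path = tempfile.mkstemp(prefix="heartbeat-")
        os.unlink(path)  # only the open descriptor is used from here on
        self.fd = fd
        self.spinner = 0

    def notify(self):
        # Worker side: flip the mode between 0 and 1 so the kernel updates ctime.
        self.spinner = (self.spinner + 1) % 2
        os.fchmod(self.fd, self.spinner)

    def last_update(self):
        # Main-process side: read the ctime back through fstat.
        return os.fstat(self.fd).st_ctime

    def close(self):
        os.close(self.fd)


def is_timed_out(hb, timeout, now=None):
    """Return True if the last heartbeat is older than `timeout` seconds."""
    now = time.time() if now is None else now
    return now - hb.last_update() > timeout
```

If `last_update()` keeps returning a stale ctime, `is_timed_out` eventually becomes true no matter how often the worker calls `notify()`, which matches the behavior reported here.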
There are a few issues here -
Regarding (2), I have a merge request I can open in case we want to fix this.
Regarding (1), I think the best way to solve this is to use an eager "slow-path" stat for allowed files. Or as @pwmarcz suggested, we can periodically refresh the inode stat for allowed files. Otherwise, we can add a manifest option to explicitly "force stat refresh" on selected uris, if the app needs it. Though I'm not sure why we won't want this by default - if the app is calling stat, then it shouldn't get "stale" data.
Steps to reproduce
I used a gsc container running a gunicorn app:
The container is run by docker compose with the following command:
This issue should be reproducible whenever running gunicorn under gramine with a --timeout option.
Expected results
The master process should correctly detect that the workers are alive and well.
Actual results
The master process calls stat on the tmp_files, reads the returned ctime and checks if timeout has elapsed. It then kills the worker process, even though it is alive and well and handling requests.
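One way to check whether a given environment shows the stale ctime described above, without involving gunicorn at all, is a small fork-based test. This is only a sketch under assumptions: the function name `child_update_visible` is hypothetical, the 50 ms delay is an arbitrary value chosen to exceed the timestamp granularity, and I have not verified what it prints under gramine. On native Linux the parent is expected to see the ctime change made by the child.

```python
import os
import tempfile
import time


def child_update_visible(delay=0.05):
    """Fork a child that fchmods a shared tmp file; report whether the
    parent's fstat sees a newer ctime afterwards."""
    fd, path = tempfile.mkstemp(prefix="ctime-check-")
    try:
        before = os.fstat(fd).st_ctime_ns
        pid = os.fork()
        if pid == 0:
            # Child: wait a little so the new timestamp differs, then chmod.
            status = 1
            try:
                time.sleep(delay)
                os.fchmod(fd, 0)
                status = 0
            finally:
                os._exit(status)
        # Parent: wait for the child, then stat the same descriptor again.
        _, wait_status = os.waitpid(pid, 0)
        if not os.WIFEXITED(wait_status) or os.WEXITSTATUS(wait_status) != 0:
            raise RuntimeError("child failed to update the file mode")
        after = os.fstat(fd).st_ctime_ns
        return after > before
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    print("parent sees child's ctime update:", child_update_visible())
```

If this reports that the update is not visible, the parent is being served a cached stat result, which would be the same condition that makes the gunicorn master think its workers have stopped responding.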
Gramine commit hash
v1.6