fix: prevent http connection race condition after restoring from Lambda SnapStart #569
The issue:
After restoring from Lambda SnapStart, `CLOCK_MONOTONIC` might be inconsistent with the value captured in the snapshot. Unfortunately, Rust's `std::time::Instant` uses `CLOCK_MONOTONIC` to track time, and `hyper-util` uses `Instant` to check whether an established connection has expired, which leads to unstable behavior. Related: hyperium/hyper#3810

As a result, when the Lambda function is restored from SnapStart, Lambda Web Adapter sometimes reuses an expired connection because it believes the connection has not expired yet. This only happens on the first invoke after restoring from Lambda SnapStart.
If the app server has an HTTP idle timeout set, there may be a race condition after restoring from Lambda SnapStart: the app server wants to close the connection, but Lambda Web Adapter is sending a new request over that expired connection, causing an `IncompleteMessage` error.

Description of changes:
This PR checks the configured client-side idle timeout manually, using `std::time::SystemTime` instead of `std::time::Instant`. If the timeout is reached, Lambda Web Adapter creates new connections instead of reusing existing ones.
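A minimal sketch of such a manual check, assuming a hypothetical `IdleTracker` type that records each connection's last use as a wall-clock `SystemTime`; this is an illustration, not the adapter's actual implementation. `now` is passed in explicitly so the logic is easy to test; real code would pass `SystemTime::now()`. Treating a backwards-moving wall clock as "expired" is an assumption made here, not something this PR specifies.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical per-connection bookkeeping: remembers when the connection
/// was last used as a wall-clock `SystemTime` rather than an `Instant`.
struct IdleTracker {
    last_used: SystemTime,
}

impl IdleTracker {
    fn new(now: SystemTime) -> Self {
        Self { last_used: now }
    }

    /// Record a use of the connection (e.g. after a response completes).
    fn touch(&mut self, now: SystemTime) {
        self.last_used = now;
    }

    /// Returns true if the connection has been idle for at least `idle_timeout`
    /// and should be replaced by a new connection instead of being reused.
    fn is_expired(&self, now: SystemTime, idle_timeout: Duration) -> bool {
        match now.duration_since(self.last_used) {
            Ok(idle) => idle >= idle_timeout,
            // The wall clock went backwards; assumption: reconnect to be safe.
            Err(_) => true,
        }
    }
}

fn main() {
    let idle_timeout = Duration::from_millis(4_000); // placeholder value
    let start = SystemTime::UNIX_EPOCH + Duration::from_secs(1_700_000_000);

    let mut conn = IdleTracker::new(start);
    // One second later the connection is still fresh and can be reused.
    assert!(!conn.is_expired(start + Duration::from_secs(1), idle_timeout));

    conn.touch(start + Duration::from_secs(1));
    // Ten minutes of wall time later (e.g. across a snapshot restore) it has expired.
    let later = start + Duration::from_secs(601);
    if conn.is_expired(later, idle_timeout) {
        println!("idle timeout reached: open a new connection");
    }
}
```

Because `SystemTime` follows the wall clock, time spent between snapshot and restore is counted as idle time, which is exactly what the `Instant`-based check misses.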
In addition, this PR adds an environment variable, `AWS_LWA_CLIENT_IDLE_TIMEOUT_MS`, to make the client-side idle timeout configurable.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
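Reading the `AWS_LWA_CLIENT_IDLE_TIMEOUT_MS` variable might look roughly like the sketch below. The helper names and the 4-second fallback are hypothetical placeholders; the adapter's real default value and parsing rules are not stated in this PR.

```rust
use std::env;
use std::time::Duration;

/// Environment variable introduced by this PR (value in milliseconds).
const IDLE_TIMEOUT_ENV: &str = "AWS_LWA_CLIENT_IDLE_TIMEOUT_MS";

/// Placeholder default; the adapter's actual default is an assumption here.
const DEFAULT_IDLE_TIMEOUT: Duration = Duration::from_millis(4_000);

/// Parse a millisecond value, falling back to `default` when the value
/// is missing or is not a valid non-negative integer.
fn parse_idle_timeout(raw: Option<&str>, default: Duration) -> Duration {
    raw.and_then(|s| s.trim().parse::<u64>().ok())
        .map(Duration::from_millis)
        .unwrap_or(default)
}

/// Read the client-side idle timeout from the environment.
fn client_idle_timeout() -> Duration {
    let raw = env::var(IDLE_TIMEOUT_ENV).ok();
    parse_idle_timeout(raw.as_deref(), DEFAULT_IDLE_TIMEOUT)
}

fn main() {
    println!("client idle timeout: {:?}", client_idle_timeout());
}
```

Keeping the parsing in a pure function separate from the `env::var` lookup makes the fallback behavior testable without mutating the process environment.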