Investigate possible scaling issues when there are lots of outbound HTTP calls #158
Internal Tracking: CSEF 163249
I don't know if it's related, but using durable-functions 1.4.1 we get a lot of errors during traffic peaks (scaling out), i.e.:

It's somehow related to the orchestrator.
Hi @gunzip! Also, just to make sure I understand, is this behavior only there for 1.4.1?
Hi @davidmrdavid, I meant that it must be related to the orchestrator. I don't know if it's related only to 1.4.1, since that's the latest version we're currently using. This is the typical error pattern we get during traffic spikes:

And the related error trace:
@gunzip did you find any solution? I'm also facing exactly the same problem: the durable starter is unable to initiate the orchestrator function. Restarting the function app resolves the issue for a couple of days, but the root cause is unknown.
Hi @sangoya, could you please open a new issue describing your situation? Feel free to tag me on it and I'll investigate, thanks!
@davidmrdavid I've seen similar errors at high-traffic hours as well when I use the durable client. Bundle Id: Microsoft.Azure.Functions.ExtensionBundle. Full exception:
Hi @zkbule, please open a new issue with a description of these errors and, if possible, a minimal repro, and I can look into it, thanks! In it, please link this issue so we can keep related issues associated with one another. Thanks again!
@gunzip I've got exactly the same error. Did your issue get resolved?
Now that #152 is completed, we should rerun some scale tests to see if things have improved.
Pasting in some findings from tests I ran in late December:
Did a bit more testing on durable scaling using @brandonh-msft's sample app. Ran into a few issues (some old, some new) but I think I have a better idea of what's happening now. Sharing some initial findings:
TL;DR: I managed to get it working by making these changes:

- Limiting `maxSockets` on the HTTPS agent used by Axios
- Using Bottleneck to rate-limit outbound API calls

The code for this is here: https://github.com/anthonychu/durablefunctions-javascript-scaletest/tree/20191229-refactor/javascript-axios
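To make the first change concrete, here is a minimal sketch (not code from the linked repo; the `maxSockets` value of 50 is an illustrative assumption):

```js
const https = require("https");
const axios = require("axios");

// keepAlive reuses connections, and maxSockets caps concurrent sockets,
// so the app stops opening a fresh ephemeral port for every request.
const httpsAgent = new https.Agent({ keepAlive: true, maxSockets: 50 });

// Every request made through this client shares the capped agent.
const client = axios.create({ httpsAgent });

// e.g. await client.get("https://example.com/api");
```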
The main issue appears to be port exhaustion. There are 2 main types of errors: `EADDRINUSE` and `ETIMEDOUT`. Limiting `maxSockets` on the HTTPS agent seems to have eliminated the `EADDRINUSE` issue.

However, there are still occasional `ETIMEDOUT` errors. They appear with the IP address of the API that the activity function is calling. I don't think the remote API is down or timing out; I think it's a problem with the app itself. (Also see the `ETIMEDOUT` error below, when the client calls the host.)

Using Bottleneck to limit outbound API calls to 4 per second appears to eliminate these `ETIMEDOUT` errors. I haven't tried increasing that number.
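A minimal sketch of the second change, assuming Bottleneck v2 (`minTime: 250` spaces jobs at least 250 ms apart, i.e. at most 4 per second):

```js
const Bottleneck = require("bottleneck");
const axios = require("axios");

// Queue outbound calls so no more than ~4 start per second.
const limiter = new Bottleneck({ minTime: 250 });

// Wrap each API call; the limiter paces the queued jobs.
function callApi(url) {
  return limiter.schedule(() => axios.get(url));
}
```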
I've also tried a couple of other versions of the code that didn't work. Both encountered a different `ETIMEDOUT` error: the calls between the JS worker and the host via the frontend were failing when trying to invoke `DurableOrchestrationClient.startNew()`.
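For context, this is roughly what an HTTP-triggered starter looks like with the 1.x JS SDK (a generic sketch; the orchestrator name is hypothetical, not from the test app):

```js
const df = require("durable-functions");

module.exports = async function (context, req) {
  const client = df.getClient(context);

  // startNew() is the call that was timing out: it goes through the
  // host's HTTP frontend to create the orchestration instance.
  const instanceId = await client.startNew("MyOrchestrator", undefined, req.body);

  return client.createCheckStatusResponse(context.bindingData.req, instanceId);
};
```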
I'm not sure why this error didn't happen in the Axios implementation. I wonder if limiting `maxSockets` and the other settings in Axios was also affecting the instance of Axios used in the Durable Functions JS SDK.
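One way that cross-effect could happen (an assumption on my part, not something verified in this thread) is if the settings were applied to Node's shared global agent rather than to a per-client agent, since every client that falls back to the default agent would inherit them:

```js
const https = require("https");

// Mutating the global agent affects every HTTPS client in the process
// that doesn't supply its own agent, potentially including the Axios
// instance inside the Durable Functions JS SDK.
https.globalAgent.maxSockets = 50; // illustrative value
```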