
Better LLM retry behavior #6557

Status: Open. Wants to merge 9 commits into main.
Conversation

@rbren (Collaborator) commented Jan 30, 2025

End-user friendly description of the problem this fixes or functionality that this introduces

  • Include this change in the Release Notes. If checked, you must provide an end-user friendly description for your change below
    no changelog

Give a summary of what the PR does, explaining any non-trivial design decisions

The LLM is retrying a lot of unrecoverable exceptions, which makes it look like the app is just stuck.

The current retry configuration also waits a total of 11 minutes (!) for a good response, not counting the request time itself, which can add another ~5-8 minutes. So the app looks VERY stuck.

We could potentially make this configurable if these errors are common enough that eval needs to retry them. CC @xingyaoww
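For illustration, a minimal sketch of the direction described above, using tenacity-style retry settings; the exception set, attempt count, and wait bounds here are assumptions for the example, not the PR's actual values:

```python
from litellm import completion
from litellm.exceptions import RateLimitError
from tenacity import (
    retry,
    retry_if_exception_type,
    stop_after_attempt,
    wait_exponential,
)

# Only clearly transient provider errors are retried; everything else
# surfaces to the user immediately instead of silently stalling the app.
RETRYABLE = (RateLimitError,)

@retry(
    retry=retry_if_exception_type(RETRYABLE),
    stop=stop_after_attempt(4),                          # give up quickly
    wait=wait_exponential(multiplier=1, min=2, max=30),  # cap each backoff wait
    reraise=True,  # surface the real error, not a tenacity RetryError
)
def completion_with_retry(**kwargs):
    return completion(**kwargs)
```

With four attempts and each wait capped at 30 seconds, the worst case is at most three waits of 30 seconds, well under the 11 minutes described above.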


Link of any specific issues this addresses


To run this PR locally, use the following command:

docker run -it --rm \
  -p 3000:3000 \
  -v /var/run/docker.sock:/var/run/docker.sock \
  --add-host host.docker.internal:host-gateway \
  -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:185288b-nikolaik \
  --name openhands-app-185288b \
  docker.all-hands.dev/all-hands-ai/openhands:185288b

@rbren changed the title from "stop retrying on all exceptions" to "Better LLM retry behavior" on Jan 30, 2025
Review thread on these lines from the diff:

    RateLimitError,
    ServiceUnavailableError,
Collaborator:

503 is a transitory error; we could probably keep it?

Collaborator (author):

Hmm. It's transitory but also unexpected...

I'm open to it, but I lean toward telling the user their LLM is flaking out rather than having OpenHands look slow.
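Concretely, the question above is whether ServiceUnavailableError stays in the tuple of retryable exceptions. A hedged sketch of the two options (the constant name is illustrative, not the PR's actual identifier):

```python
from litellm.exceptions import RateLimitError, ServiceUnavailableError

# Leaning of this PR: only retry rate limits; surface a 503 to the user
# right away rather than silently waiting out provider flakiness.
RETRYABLE = (RateLimitError,)

# Alternative raised in review: also retry 503s, since the provider may
# recover on its own after a short wait.
# RETRYABLE = (RateLimitError, ServiceUnavailableError)
```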

Collaborator:

I kind of agree with you, actually. We've always had trouble understanding our retry settings, because it's hard to pick a sensible default for "unexpected stuff happened".

And now we do allow the user to continue normally after reporting the error.

Eval is the exception; I'd love to hear from Xingyao on that.

@enyst (Collaborator) commented Jan 30, 2025

There are some issues on litellm about this: the exceptions as defined mix permanent and transitory errors from the provider, and we have some weird code because of that. I agree that cleaning them up and starting again is reasonable. 😅
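To illustrate the mix described above: litellm raises permanent and transient provider failures as sibling exception types, so any retry policy has to partition them explicitly. A sketch under the assumption that these litellm exception classes are available; the grouping is illustrative, not OpenHands' actual lists:

```python
from litellm.exceptions import (
    APIConnectionError,
    AuthenticationError,
    BadRequestError,
    ContextWindowExceededError,
    RateLimitError,
)

# Transient: the identical request may succeed after a short wait.
TRANSIENT = (RateLimitError, APIConnectionError)

# Permanent: retrying can never help (bad key, malformed request,
# prompt too long); fail fast and report to the user instead.
PERMANENT = (AuthenticationError, BadRequestError, ContextWindowExceededError)

def is_retryable(exc: Exception) -> bool:
    """Illustrative helper for deciding whether to retry a provider error."""
    return isinstance(exc, TRANSIENT)
```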

@enyst (Collaborator) commented Jan 30, 2025

A small related detail: there's a try/except due to retries in llm.py that is unnecessary even on main, and more so now. We might as well clean that out:

(Review suggestion on openhands/llm/llm.py; the thread is marked outdated and resolved.)
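For illustration, the kind of redundancy being pointed at: when a tenacity-style retry decorator is configured to re-raise the last exception, an extra try/except around the decorated call adds nothing. A minimal sketch, not the actual llm.py code:

```python
from tenacity import retry, retry_if_exception_type, stop_after_attempt

class TransientError(Exception):
    """Stand-in for a retryable provider error."""

@retry(
    retry=retry_if_exception_type(TransientError),
    stop=stop_after_attempt(3),
    reraise=True,  # after the final attempt, the original exception propagates
)
def call_llm() -> str:
    return "response"

# Redundant wrapper: tenacity already re-raises the final failure,
# so catching it here only to re-raise adds noise.
#
#   try:
#       result = call_llm()
#   except TransientError:
#       raise
#
# Equivalent and simpler:
result = call_llm()
```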
@enyst (Collaborator) commented Feb 13, 2025

Please also see a small follow-up here:

@rbren (Collaborator, author) commented Feb 14, 2025

Thanks @enyst! Any lingering issues here?

@enyst (Collaborator) left a comment

I think it would be great if @xingyaoww could take a look, because it's possible that the removed exceptions do occur in practice.

Up to you.
