
Conversation

@derekmeegan (Contributor)

why

LiteLLM's synchronous completion() method was blocking the event loop in async handlers, preventing concurrent execution of multiple LLM calls. This caused performance degradation when multiple operations needed to run in parallel.
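The blocking behavior described above can be reproduced with a minimal, self-contained sketch. Here `time.sleep` stands in for the synchronous `litellm.completion()` call (an assumption for illustration, not the real client), and `asyncio.sleep` stands in for `litellm.acompletion()`: three blocking calls run back to back, while three awaited calls overlap.

```python
import asyncio
import time

async def blocking_handler():
    # A synchronous call like litellm.completion() holds the event loop;
    # time.sleep(0.1) simulates that blocking behavior.
    time.sleep(0.1)

async def nonblocking_handler():
    # An awaited call like litellm.acompletion() yields control back to
    # the event loop while it waits.
    await asyncio.sleep(0.1)

async def demo(handler):
    # Launch three "LLM calls" concurrently and time the batch.
    start = time.perf_counter()
    await asyncio.gather(handler(), handler(), handler())
    return time.perf_counter() - start

# Blocking calls serialize (~0.3 s total); awaited calls overlap (~0.1 s).
serial = asyncio.run(demo(blocking_handler))
concurrent = asyncio.run(demo(nonblocking_handler))
```

This is exactly the degradation the PR targets: with the sync client, `asyncio.gather` offers no speedup because each call monopolizes the loop.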

what changed

  • Converted LLMClient.create_response() from sync to async method using litellm.acompletion()
  • Updated inference.observe() and inference.extract() functions to be async
  • Modified all handlers (ObserveHandler, ExtractHandler) to await async inference calls
  • Updated mock LLM client's create_response() method to be async for test compatibility
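The new call chain in the bullets above can be sketched as follows. Class and function names (`MockLLMClient.create_response`, `observe`, `extract`) follow the PR description, but the bodies are illustrative stand-ins, not the actual Stagehand implementation:

```python
import asyncio

class MockLLMClient:
    """Test double mirroring the async LLMClient interface."""

    async def create_response(self, messages):
        # Stands in for `await litellm.acompletion(model=..., messages=...)`.
        await asyncio.sleep(0)
        return {"content": f"echo: {messages[-1]['content']}"}

# inference.observe() and inference.extract() are now coroutines,
# so handlers must await them.
async def observe(client):
    return await client.create_response([{"role": "user", "content": "observe"}])

async def extract(client):
    return await client.create_response([{"role": "user", "content": "extract"}])

async def main():
    client = MockLLMClient()
    # With every layer async, independent inference calls can run concurrently.
    return await asyncio.gather(observe(client), extract(client))

results = asyncio.run(main())
```

Because the mock's `create_response()` is itself a coroutine, existing tests can await it exactly as they would the real client.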

test plan

  • Run CI and verify that all checks pass and behavior is unchanged

@derekmeegan derekmeegan merged commit 3bcdd05 into main Sep 25, 2025
13 checks passed
@derekmeegan derekmeegan deleted the derek/make_litellm_async branch September 25, 2025 23:23
@github-actions github-actions bot mentioned this pull request Sep 25, 2025
