Currently this library uses the blocking `litellm.completion` method in `inference.py` even though the library's own methods are async. That means every LLM call blocks the event loop, stalling any other coroutines running alongside it. The LLM inference file should be updated to use litellm's async API as well, unless there are other blockers I'm not aware of.
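For reference, litellm already ships an async counterpart, `litellm.acompletion`, so the change could be as small as swapping the call and awaiting it. A minimal sketch of what that might look like (the `generate` function name, model string, and message shape are placeholders, since I don't know the exact structure of `inference.py`):

```python
import asyncio

import litellm


async def generate(prompt: str) -> str:
    # litellm.acompletion is the async counterpart of litellm.completion,
    # so awaiting it yields the event loop instead of blocking it.
    response = await litellm.acompletion(
        model="gpt-4o-mini",  # placeholder; use whatever model inference.py configures
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    print(asyncio.run(generate("Hello!")))
```

If there's some reason `acompletion` can't be used, an interim option would be wrapping the blocking call with `asyncio.to_thread(litellm.completion, ...)` so it at least runs off the event loop, though switching to the native async API seems like the cleaner fix.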