feat(streaming): support external async token generators #1286
base: develop
Conversation
Add ability to pass custom async token generators to `stream_async`, enabling integration with external LLMs or custom streaming sources. Update docs and add tests for output rails interaction and edge cases with external generators.
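For reference, end-to-end usage could look roughly like this. This is a minimal sketch, not the confirmed API: the `generator` keyword argument is inferred from the diff below, and the config path is illustrative.

```python
import asyncio
from typing import AsyncIterator

from nemoguardrails import LLMRails, RailsConfig

config = RailsConfig.from_path("./config")  # illustrative config path
app = LLMRails(config)


async def my_token_generator() -> AsyncIterator[str]:
    # Tokens could come from any external LLM API; mocked here for brevity.
    for token in ["Hello", ", ", "world", "!"]:
        yield token


async def main():
    history = [{"role": "user", "content": "Say hello."}]
    # `generator=` is assumed to be the new parameter added by this PR.
    async for chunk in app.stream_async(
        messages=history, generator=my_token_generator()
    ):
        print(chunk, end="")


asyncio.run(main())
```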
Codecov Report

All modified and coverable lines are covered by tests ✅

```
@@            Coverage Diff             @@
##           develop    #1286      +/-   ##
===========================================
+ Coverage    69.78%   69.82%    +0.03%
===========================================
  Files          161      161
  Lines        16057    16061        +4
===========================================
+ Hits         11206    11214        +8
+ Misses        4851     4847        -4
```
Looks good - a couple of changes and a question on supporting explain / generation options with the external generators.
```
- You want to use a different LLM provider that has its own streaming API
- You have pre-generated responses that you want to stream through guardrails
- You want to implement custom token generation logic
- You want to test your output rails or its config in streaming mode wihtout relying on an LLM which generates stochastic outputs.
```
Suggested change:

```diff
- - You want to test your output rails or its config in streaming mode wihtout relying on an LLM which generates stochastic outputs.
+ - You want to test your output rails or their config in streaming mode on predefined responses, without relying on an actual LLM generation.
```
Corrected a typo and also changed the wording. Is this the correct usage: on predefined / given assistant responses, only with output rails?
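Roughly what I had in mind, as a hypothetical test sketch. It reuses the `app` instance from the docs example, assumes the config defines output rails, and assumes the parameter is named `generator`:

```python
from typing import AsyncIterator


async def predefined_response() -> AsyncIterator[str]:
    # A canned assistant response streamed token by token; no LLM involved.
    for token in ["This ", "is ", "a ", "predefined ", "answer."]:
        yield token


async def test_output_rails_on_predefined_stream():
    chunks = []
    async for chunk in app.stream_async(
        messages=[{"role": "user", "content": "hi"}],
        generator=predefined_response(),
    ):
        chunks.append(chunk)
    # Deterministic assertion: check the guardrailed output, e.g. that the
    # canned answer passed through intact or was replaced by an output rail.
    assert "".join(chunks)
```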
```python
app = LLMRails(config)


async def my_token_generator() -> AsyncIterator[str]:
    # this could be from OpenAI, Anthropic, or any other source
```
Suggested change:

```diff
-    # this could be from OpenAI, Anthropic, or any other source
+    # This could be the OpenAI API, the Anthropic API, or any other LLM API
+    # that already provides a streaming token generator. The stream is mocked
+    # here as a simple example.
```
```python
# use the external generator with guardrails
async for chunk in app.stream_async(
    messages=history,
```
`history` is not defined anywhere in the snippet - can we put a simple example?
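Something minimal would do, e.g. (values are illustrative only):

```python
history = [{"role": "user", "content": "What can you do?"}]
```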
```
When using an external generator:

- The internal LLM generation is completely bypassed
```
Suggested change:

```diff
- - The internal LLM generation is completely bypassed
+ - The internal LLM generation in the Guardrails runtime is completely bypassed; the LLM responses are produced by the external generator
```
```
When using an external generator:

- The internal LLM generation is completely bypassed
- Output rails are still applied if configured
```
Suggested change:

```diff
- - Output rails are still applied if configured
+ - Output rails, if configured, are still applied to the LLM responses returned by the external generator
```
```python
    )
else:
    return generator
```
Can this still be used with explain / generation options?