feat(streaming): support external async token generators #1286

Open - wants to merge 1 commit into develop

Conversation

Pouyanpi (Collaborator)

Add the ability to pass custom async token generators to `stream_async`, enabling integration with external LLMs or custom streaming sources. Update the docs and add tests for output rails interaction and edge cases with external generators.

Note:

  • This is equivalent to the output-rails-only option in streaming mode.
  • Useful for testing streaming with the output rails feature; see the sketch below.

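For context, here is a minimal sketch of the intended usage. The `generator` keyword and the helper names are assumptions for illustration only; the exact signature is defined by this PR's diff.

import asyncio
from typing import AsyncIterator

from nemoguardrails import LLMRails, RailsConfig

config = RailsConfig.from_path("./config")
app = LLMRails(config)


async def external_tokens() -> AsyncIterator[str]:
    # Stand-in for any external streaming source: another LLM provider's
    # streaming API, a pre-generated response, or custom token logic.
    for token in ["The", " answer", " is", " 42", "."]:
        yield token


async def main() -> None:
    # The external generator replaces the internal LLM generation step;
    # configured output rails still run on the streamed tokens.
    async for chunk in app.stream_async(
        messages=[{"role": "user", "content": "Hi!"}],
        generator=external_tokens(),  # assumed keyword added by this PR
    ):
        print(chunk, end="")


asyncio.run(main())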
@Pouyanpi Pouyanpi requested review from trebedea and tgasser-nv July 11, 2025 13:03
@Pouyanpi Pouyanpi added this to the v0.15.0 milestone Jul 11, 2025
@Pouyanpi Pouyanpi added the enhancement New feature or request label Jul 11, 2025
@Pouyanpi Pouyanpi self-assigned this Jul 11, 2025

Documentation preview

https://nvidia.github.io/NeMo-Guardrails/review/pr-1286

@codecov-commenter

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 69.82%. Comparing base (ef97795) to head (8c24085).

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1286      +/-   ##
===========================================
+ Coverage    69.78%   69.82%   +0.03%     
===========================================
  Files          161      161              
  Lines        16057    16061       +4     
===========================================
+ Hits         11206    11214       +8     
+ Misses        4851     4847       -4     
Flag Coverage Δ
python 69.82% <100.00%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown.

Files with missing lines Coverage Δ
nemoguardrails/rails/llm/llmrails.py 89.65% <100.00%> (+0.72%) ⬆️


@trebedea trebedea left a comment


Looks good - a couple of changes and a question on supporting explain / generation options with the external generators.

- You want to use a different LLM provider that has its own streaming API
- You have pre-generated responses that you want to stream through guardrails
- You want to implement custom token generation logic
- You want to test your output rails or its config in streaming mode wihtout relying on an LLM which generates stochastic outputs.

Suggested change:
Before: - You want to test your output rails or its config in streaming mode wihtout relying on an LLM which generates stochastic outputs.
After:  - You want to test your output rails or its config in streaming mode on predefined responses without actually relying on an actual LLM generation.


Corrected a typo and also changed the wording. Is this the correct usage: only on predefined / given assistant responses, with output rails?

app = LLMRails(config)

async def my_token_generator() -> AsyncIterator[str]:
    # this could be from OpenAI, Anthropic, or any other source

Suggested change:
Before: # this could be from OpenAI, Anthropic, or any other source
After:  # This could be from OpenAI API, Anthropic API, or any other LLM API that already has a streaming token generator. Mocking the stream here, for a simple example.

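Following the suggested wording, the generator body in the docs could be filled in with a simple mock that streams a predefined assistant response (illustrative only, not taken from the PR diff):

from typing import AsyncIterator


async def my_token_generator() -> AsyncIterator[str]:
    # This could be the streaming output of the OpenAI API, the Anthropic API,
    # or any other LLM API that already exposes a token stream. Here the
    # stream is mocked with a predefined response, as the suggestion proposes.
    predefined_response = "The weather today is sunny with a high of 22 degrees."
    for word in predefined_response.split(" "):
        yield word + " "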

# use the external generator with guardrails
async for chunk in app.stream_async(
    messages=history,

Missing `history` - can we put a simple example?
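A simple example could define `history` as a one-turn messages list and reuse the mocked generator from above (the `generator` keyword is again an assumption based on this PR):

# A one-turn chat history in the messages format accepted by stream_async.
history = [
    {"role": "user", "content": "What is the weather like today?"},
]


async def run() -> None:
    # use the external generator with guardrails
    async for chunk in app.stream_async(
        messages=history,
        generator=my_token_generator(),  # assumed keyword added by this PR
    ):
        print(chunk, end="")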


When using an external generator:

- The internal LLM generation is completely bypassed

Suggested change:
Before: - The internal LLM generation is completely bypassed
After:  - The internal LLM generation in the Guardrails runtime is completely bypassed, the LLM responses are given by the external generator

When using an external generator:

- The internal LLM generation is completely bypassed
- Output rails are still applied if configured

Suggested change:
Before: - Output rails are still applied if configured
After:  - Output rails are still applied to the LLM responses returned by the external generator, if configured
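To make that point concrete, output rails are declared in the config exactly as usual; only the main generation step is replaced. A sketch, assuming the built-in `self check output` flow (its prompt definition is omitted for brevity, and the model name is a placeholder):

from nemoguardrails import LLMRails, RailsConfig

# Only the main generation step is replaced by the external generator;
# the output rails configured below still screen its tokens before they
# are streamed back to the caller.
YAML_CONFIG = """
models:
  - type: main
    engine: openai
    model: gpt-4o-mini
rails:
  output:
    flows:
      - self check output
"""

config = RailsConfig.from_content(yaml_content=YAML_CONFIG)
app = LLMRails(config)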

)
else:
return generator


Can this still be used with explain / generation options?

Labels
enhancement (New feature or request)
3 participants