feat(streaming): support external async token generators #1286

Open - wants to merge 1 commit into develop

Conversation

Pouyanpi (Collaborator)

Add the ability to pass custom async token generators to `stream_async`, enabling integration with external LLMs or custom streaming sources. Update the docs and add tests for output rails interaction and edge cases with external generators.

Note:

  • This is equivalent to the output-rails-only option in streaming mode.
  • Useful for testing streaming with the output rails feature; see the sketch below.

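For context, here is a minimal sketch of the intended usage. The `generator` keyword and the helper names are assumptions for illustration only; the exact signature is defined by this PR's diff.

import asyncio
from typing import AsyncIterator

from nemoguardrails import LLMRails, RailsConfig

config = RailsConfig.from_path("./config")
app = LLMRails(config)


async def external_tokens() -> AsyncIterator[str]:
    # Stand-in for any external streaming source: another LLM provider's
    # streaming API, a pre-generated response, or custom token logic.
    for token in ["The", " answer", " is", " 42", "."]:
        yield token


async def main() -> None:
    # The external generator replaces the internal LLM generation step;
    # configured output rails still run on the streamed tokens.
    async for chunk in app.stream_async(
        messages=[{"role": "user", "content": "Hi!"}],
        generator=external_tokens(),  # assumed keyword added by this PR
    ):
        print(chunk, end="")


asyncio.run(main())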
@Pouyanpi Pouyanpi requested review from trebedea and tgasser-nv July 11, 2025 13:03
@Pouyanpi Pouyanpi added this to the v0.15.0 milestone Jul 11, 2025
@Pouyanpi Pouyanpi added the enhancement New feature or request label Jul 11, 2025
@Pouyanpi Pouyanpi self-assigned this Jul 11, 2025

Documentation preview

https://nvidia.github.io/NeMo-Guardrails/review/pr-1286

@codecov-commenter

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 69.82%. Comparing base (ef97795) to head (8c24085).

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1286      +/-   ##
===========================================
+ Coverage    69.78%   69.82%   +0.03%     
===========================================
  Files          161      161              
  Lines        16057    16061       +4     
===========================================
+ Hits         11206    11214       +8     
+ Misses        4851     4847       -4     
Flag Coverage Δ
python 69.82% <100.00%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown.

Files with missing lines Coverage Δ
nemoguardrails/rails/llm/llmrails.py 89.65% <100.00%> (+0.72%) ⬆️


@trebedea trebedea left a comment


Looks good - a couple of changes and a question on supporting explain / generation options with the external generators.

- You want to use a different LLM provider that has its own streaming API
- You have pre-generated responses that you want to stream through guardrails
- You want to implement custom token generation logic
- You want to test your output rails or its config in streaming mode wihtout relying on an LLM which generates stochastic outputs.

Suggested change:
Before: - You want to test your output rails or its config in streaming mode wihtout relying on an LLM which generates stochastic outputs.
After:  - You want to test your output rails or its config in streaming mode on predefined responses without actually relying on an actual LLM generation.


Corrected a typo and also changed the wording. Is this the correct usage: only on predefined / given assistant responses, with output rails?

app = LLMRails(config)

async def my_token_generator() -> AsyncIterator[str]:
    # this could be from OpenAI, Anthropic, or any other source

Suggested change:
Before: # this could be from OpenAI, Anthropic, or any other source
After:  # This could be from OpenAI API, Anthropic API, or any other LLM API that already has a streaming token generator. Mocking the stream here, for a simple example.

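Following the suggested wording, the generator body in the docs could be filled in with a simple mock that streams a predefined assistant response (illustrative only, not taken from the PR diff):

from typing import AsyncIterator


async def my_token_generator() -> AsyncIterator[str]:
    # This could be the streaming output of the OpenAI API, the Anthropic API,
    # or any other LLM API that already exposes a token stream. Here the
    # stream is mocked with a predefined response, as the suggestion proposes.
    predefined_response = "The weather today is sunny with a high of 22 degrees."
    for word in predefined_response.split(" "):
        yield word + " "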

# use the external generator with guardrails
async for chunk in app.stream_async(
    messages=history,

Missing `history` - can we put a simple example?
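A simple example could define `history` as a one-turn messages list and reuse the mocked generator from above (the `generator` keyword is again an assumption based on this PR):

# A one-turn chat history in the messages format accepted by stream_async.
history = [
    {"role": "user", "content": "What is the weather like today?"},
]


async def run() -> None:
    # use the external generator with guardrails
    async for chunk in app.stream_async(
        messages=history,
        generator=my_token_generator(),  # assumed keyword added by this PR
    ):
        print(chunk, end="")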


When using an external generator:

- The internal LLM generation is completely bypassed

Suggested change:
Before: - The internal LLM generation is completely bypassed
After:  - The internal LLM generation in the Guardrails runtime is completely bypassed, the LLM responses are given by the external generator

When using an external generator:

- The internal LLM generation is completely bypassed
- Output rails are still applied if configured

Suggested change:
Before: - Output rails are still applied if configured
After:  - Output rails are still applied to the LLM responses returned by the external generator, if configured
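To make that point concrete, output rails are declared in the config exactly as usual; only the main generation step is replaced. A sketch, assuming the built-in `self check output` flow (its prompt definition is omitted for brevity, and the model name is a placeholder):

from nemoguardrails import LLMRails, RailsConfig

# Only the main generation step is replaced by the external generator;
# the output rails configured below still screen its tokens before they
# are streamed back to the caller.
YAML_CONFIG = """
models:
  - type: main
    engine: openai
    model: gpt-4o-mini
rails:
  output:
    flows:
      - self check output
"""

config = RailsConfig.from_content(yaml_content=YAML_CONFIG)
app = LLMRails(config)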

)
else:
return generator


Can this still be used with explain / generation options?

Labels
enhancement (New feature or request)
3 participants