
feat: implement parallel streaming output rails execution #1263


Merged
2 commits merged into develop from feat/parallel-output-rails-streaming on Jul 21, 2025

Conversation

Pouyanpi
Collaborator

@Pouyanpi Pouyanpi commented Jul 4, 2025

This PR implements streaming support for parallel output rails execution:

  • Add _run_output_rails_in_parallel_streaming method to run output rails concurrently
  • Use asyncio tasks to execute multiple rails simultaneously during streaming
  • Implement early termination when any rail blocks content to optimize performance
  • Register the new action in the runtime dispatcher
  • Add proper error handling and cancellation for robust parallel execution
  • Avoid full flow state management issues that can occur with hide_prev_turn logic during streaming
  • Add comprehensive tests for parallel streaming functionality

requires #1259
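The concurrency pattern the bullets above describe (run every output rail as an asyncio task, stop early as soon as one blocks, and clean up the rest) can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code; `run_rail` and `check_chunk_parallel` are hypothetical names, and a real rail would be an LLM-backed action rather than a substring check:

```python
import asyncio

# Hypothetical stand-in for one output rail checking a streamed chunk.
async def run_rail(name: str, chunk: str) -> bool:
    await asyncio.sleep(0.01)          # simulate an LLM-backed check
    return "unsafe" not in chunk       # True means the rail allows the chunk

async def check_chunk_parallel(rail_names: list[str], chunk: str) -> bool:
    """Run all rails concurrently; return False as soon as any rail blocks."""
    tasks = [asyncio.create_task(run_rail(name, chunk)) for name in rail_names]
    try:
        for finished in asyncio.as_completed(tasks):
            if not await finished:     # first blocking rail wins
                return False
        return True
    finally:
        # Early termination: cancel whatever is still running, then await
        # the tasks so no "Task was destroyed but it is pending!" warnings
        # are emitted. cancel() is a no-op on tasks that already finished.
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)

print(asyncio.run(check_chunk_parallel(["safety", "self_check"], "a safe chunk")))
```

Returning on the first blocking result is what gives the latency win reported below: the slowest rail no longer gates every chunk when another rail has already decided to block.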

@Pouyanpi Pouyanpi self-assigned this Jul 4, 2025
@Pouyanpi Pouyanpi added this to the v0.15.0 milestone Jul 4, 2025
@Pouyanpi Pouyanpi added the enhancement New feature or request label Jul 4, 2025
@Pouyanpi Pouyanpi marked this pull request as ready for review July 7, 2025 12:05
@Pouyanpi Pouyanpi requested a review from Copilot July 7, 2025 12:05
Copilot

This comment was marked as outdated.

@Pouyanpi Pouyanpi requested a review from tgasser-nv July 9, 2025 15:18
@Pouyanpi Pouyanpi force-pushed the feat/parallel-output-rails-streaming branch from eb420fa to c33a0b7 Compare July 9, 2025 15:34
@Pouyanpi Pouyanpi requested a review from Copilot July 9, 2025 15:35

@Copilot Copilot AI left a comment


Pull Request Overview

This PR adds parallel streaming capabilities to the output rails execution path, improving throughput and allowing early termination when any rail blocks content.

  • Introduce a parallel flag in the OutputRails config and wire it through the LLM pipeline.
  • Implement a new _run_output_rails_in_parallel_streaming action that runs rails concurrently with early-stop on block, and register it at runtime.
  • Extend the test suite with comprehensive scenarios covering allowed flows, blocking, performance, error handling, and default behavior.

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Summary per file:

  • nemoguardrails/rails/llm/config.py: Added the parallel flag to the OutputRails config.
  • nemoguardrails/rails/llm/llmrails.py: Integrated parallel streaming logic: context prep, dispatch call, and error handling.
  • nemoguardrails/colang/v1_0/runtime/runtime.py: Implemented the _run_output_rails_in_parallel_streaming action.
  • nemoguardrails/colang/runtime.py: Registered the new parallel streaming action in the dispatcher.
  • tests/test_parallel_streaming_output_rails.py: Added extensive tests covering parallel streaming, blocking, and performance.
Comments suppressed due to low confidence (1)

nemoguardrails/rails/llm/llmrails.py:334

  • After canceling pending tasks, await them (for example via await asyncio.gather(*tasks, return_exceptions=True)) to ensure proper cleanup and avoid Task was destroyed but it is pending! warnings.
            )
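The cleanup this suppressed comment recommends is a generic asyncio pattern and easy to show in isolation (this sketch is not the PR's actual code):

```python
import asyncio

async def cancel_and_drain() -> list:
    tasks = [asyncio.create_task(asyncio.sleep(10)) for _ in range(3)]
    for task in tasks:
        task.cancel()
    # Awaiting the cancelled tasks gives them a chance to actually finish
    # cancelling; return_exceptions=True turns each CancelledError into a
    # returned result instead of re-raising, so no "Task was destroyed but
    # it is pending!" warnings are logged at shutdown.
    return await asyncio.gather(*tasks, return_exceptions=True)

results = asyncio.run(cancel_and_drain())
assert all(isinstance(r, asyncio.CancelledError) for r in results)
```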

Comment on lines +1364 to +1370
context.update(context_message["content"])


Copilot AI Jul 9, 2025


Updating the context dict with a string (context_message["content"]) will not merge its characters into the dict; dict.update treats a non-mapping argument as an iterable of key/value pairs, so a plain string raises a ValueError. Instead, assign the content under a specific key or ensure the updated value is a mapping.

Suggested change
context.update(context_message["content"])
if isinstance(context_message.get("content"), dict):
    context.update(context_message["content"])
else:
    log.warning("context_message['content'] is not a dictionary and will be ignored.")
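As a quick standalone check of the behavior flagged here (plain Python, independent of the PR): update() with a string does not quietly merge characters into the dict — it raises, because each one-character element cannot unpack as a key/value pair.

```python
context: dict = {}

try:
    # A str is treated as an iterable of key/value pairs; elements like 'b'
    # have length 1, so CPython raises ValueError instead of merging them.
    context.update("blocked")
except ValueError:
    pass

assert context == {}

# Updating with an actual mapping works as intended.
context.update({"bot_message": "blocked"})
assert context == {"bot_message": "blocked"}
```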


@@ -1346,6 +1346,32 @@ def _get_latest_user_message(
return message
return {}

def _prepare_context_for_parallel_rails(

Copilot AI Jul 9, 2025


[nitpick] Defining helper functions (_prepare_context_for_parallel_rails, _create_events_for_chunk) inside a method can make the code harder to navigate; consider extracting them to module-level helpers.


Collaborator

@tgasser-nv tgasser-nv left a comment


This looks good to me. Can you run a local "integration test" where you enable streaming and run parallel input and output rails through the 3 Nemoguard NVCF models, to check that the responses make sense and everything works correctly? Please paste the outputs, latency measurements, etc. into the PR comments.


# cancel remaining tasks
for pending_task in tasks:
if not pending_task.done():
Collaborator


If we try and cancel a task that's already done, will that throw an exception? A task could finish in between the pending_task.done() and pending_task.cancel() calls. We need to be able to cancel a task that's already done and no Exceptions to be thrown

Collaborator Author


If we try and cancel a task that's already done, will that throw an exception?

Calling cancel() on a task that is already done or cancelled will not throw an exception; it simply returns False and has no effect, so there is no need to guard cancel() with a try/except for this reason.

But this is a good point that we might want to document.

A task could finish in between the pending_task.done() and pending_task.cancel() calls

It is indeed a race condition, but that is the expected behavior; we can log a warning, though:

For example

  1. rails A, B, C are running in parallel
  2. rail A completes and says "blocked"
  3. while we're processing A's result rail B also completes and says "blocked"
  4. We cancel rail C (which is still pending)
  5. We only report that A blocked, not B

based on our assumptions:

  • not a bug: multiple rails might detect violations, but we only need to know that at least one violation occurred
  • expected behavior: first violation detected stops the stream
  • correct outcome: content is blocked regardless of which rail detected it first

So for content-modifying rails one should use sequential execution; for read-only validation rails, parallel execution is fine.
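The claim above is easy to verify in isolation: cancel() on a completed task returns False and raises nothing. A minimal check:

```python
import asyncio

async def cancel_done_task() -> bool:
    task = asyncio.create_task(asyncio.sleep(0))
    await task  # the task is now done
    # cancel() on a finished task is a harmless no-op: it returns False
    # and never raises, so no try/except guard is needed around it.
    return task.cancel()

assert asyncio.run(cancel_done_task()) is False
```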

@Pouyanpi
Collaborator Author

Pouyanpi commented Jul 10, 2025

PERFORMANCE COMPARISON SUMMARY

Config 1

config:

  • Stream first: False
  • Chunk size: 200
  • Context size: 50

Output Rails:

  • content safety check output $model=content_safety
  • self check output

Latency Improvements (First Chunk):
Test 1: 2.488s → 1.535s (38.3% faster, 1.62x speedup)
Test 2: 2.255s → 0.956s (57.6% faster, 2.36x speedup)
Test 3: 2.097s → 0.567s (73.0% faster, 3.70x speedup)

Total Duration Comparison:
Test 1: 2.489s (sequential) vs 2.117s (parallel)
Test 2: 2.256s (sequential) vs 1.525s (parallel)
Test 3: 2.099s (sequential) vs 1.097s (parallel)

Rail Execution Details:

  • Total LLM calls: 30
  • Rails executed:
    • content_safety_check_output $model=content_safety: 12 call(s)
    • general: 6 call(s)
    • self_check_output: 12 call(s)

Config 2 (lower chunk size, more output rails invocations)

config:

  • Streaming enabled: True
  • Stream first: False
  • Chunk size: 30
  • Context size: 5

Output Rails:

  • content safety check output $model=content_safety
  • self check output

Latency Improvements (First Chunk):
Test 1: 2.015s → 0.690s (65.7% faster, 2.92x speedup)
Test 2: 1.605s → 0.586s (63.5% faster, 2.74x speedup)
Test 3: 1.547s → 0.366s (76.3% faster, 4.23x speedup)

Total Duration Comparison:
Test 1: 3.925s (sequential) vs 2.202s (parallel)
Test 2: 4.507s (sequential) vs 1.162s (parallel)
Test 3: 1.547s (sequential) vs 0.940s (parallel)

Rail Execution Details:

  • Total LLM calls: 44
  • Rails executed:
    • content_safety_check_output $model=content_safety: 19 call(s)
    • general: 6 call(s)
    • self_check_output: 19 call(s)

Base automatically changed from fix/streaming-outputrails to develop July 10, 2025 11:17
- Add _run_output_rails_in_parallel_streaming method to run output rails concurrently
- Use asyncio tasks to execute multiple rails simultaneously during streaming
- Implement early termination when any rail blocks content to optimize performance
- Register the new action in the runtime dispatcher
- Add proper error handling and cancellation for robust parallel execution
- Avoid full flow state management issues that can occur with hide_prev_turn logic during streaming
- Add comprehensive tests for parallel streaming functionality

parallel rails accept flow params dict
@Pouyanpi Pouyanpi force-pushed the feat/parallel-output-rails-streaming branch from c33a0b7 to 4cb2762 Compare July 10, 2025 12:27
@codecov-commenter

codecov-commenter commented Jul 10, 2025

Codecov Report

Attention: Patch coverage is 79.12088% with 19 lines in your changes missing coverage. Please review.

Project coverage is 69.83%. Comparing base (ef97795) to head (2c2469d).

Files with missing lines Patch % Lines
nemoguardrails/colang/v1_0/runtime/runtime.py 69.76% 13 Missing ⚠️
nemoguardrails/rails/llm/llmrails.py 86.66% 6 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1263      +/-   ##
===========================================
+ Coverage    69.78%   69.83%   +0.04%     
===========================================
  Files          161      161              
  Lines        16057    16135      +78     
===========================================
+ Hits         11206    11268      +62     
- Misses        4851     4867      +16     
Flag Coverage Δ
python 69.83% <79.12%> (+0.04%) ⬆️

Flags with carried forward coverage won't be shown.

Files with missing lines Coverage Δ
nemoguardrails/colang/runtime.py 88.57% <100.00%> (+0.69%) ⬆️
nemoguardrails/rails/llm/config.py 90.32% <100.00%> (+0.01%) ⬆️
nemoguardrails/rails/llm/llmrails.py 88.88% <86.66%> (-0.04%) ⬇️
nemoguardrails/colang/v1_0/runtime/runtime.py 82.47% <69.76%> (-2.50%) ⬇️

Collaborator

@tgasser-nv tgasser-nv left a comment


Thanks for adding the local performance test values, this looks great.

@Pouyanpi Pouyanpi merged commit bfe81f1 into develop Jul 21, 2025
17 checks passed
@Pouyanpi Pouyanpi deleted the feat/parallel-output-rails-streaming branch July 21, 2025 15:07
Labels
enhancement New feature or request status: in review
3 participants