feat(runnable_rails): complete rewrite of RunnableRails with full LangChain Runnable protocol support #1366
base: develop
Conversation
Add example configuration and documentation for using NVIDIA NeMoGuard NIMs, including content moderation, topic control, and jailbreak detection.
Update verbose logging to safely handle cases where log records may not have 'id' or 'task' attributes. Prevents potential AttributeError and improves robustness of LLM and prompt log output formatting.
Implements tool call extraction and passthrough functionality in LLMRails:
- Add tool_calls_var context variable for storing LLM tool calls
- Refactor llm_call utils to extract and store tool calls from responses
- Support tool calls in both GenerationResponse and dict message formats
- Add ToolMessage support for langchain message conversion
- Comprehensive test coverage for tool calling integration
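The context-variable passthrough described above can be sketched in isolation. This is a minimal illustration of the pattern (the variable name mirrors the PR's `tool_calls_var`, but the helper functions here are hypothetical, not the library code):

```python
import contextvars

# Context variable holding tool calls extracted from the latest LLM response.
# The name mirrors the PR's tool_calls_var; everything else is illustrative.
tool_calls_var = contextvars.ContextVar("tool_calls", default=None)

def extract_and_store_tool_calls(response: dict) -> str:
    """Pull tool calls out of an LLM response dict and stash them in context."""
    tool_calls = response.get("tool_calls") or []
    tool_calls_var.set(tool_calls)
    return response.get("content", "")

def build_generation_response(content: str) -> dict:
    """Attach any stored tool calls to the outgoing response dict."""
    result = {"content": content}
    tool_calls = tool_calls_var.get()
    if tool_calls:
        result["tool_calls"] = tool_calls
    return result
```

Because the variable lives in `contextvars`, the stored tool calls stay scoped to the current async task, which is what lets them survive the trip through the rails pipeline without leaking across concurrent requests.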
Codecov Report: ❌ Patch coverage is

Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##           develop    #1366      +/-  ##
===========================================
- Coverage    71.62%   71.52%    -0.10%
===========================================
  Files          171      171
  Lines        17020    17348     +328
===========================================
+ Hits         12191    12409     +218
- Misses        4829     4939     +110
```

Flags with carried forward coverage won't be shown.
Force-pushed from 34dba2b to 7ae2f3a:
feat(runnable_rails): complete rewrite of RunnableRails with full LangChain Runnable protocol support
- Implement comprehensive async/sync invoke, batch, and streaming support
- Add robust input/output transformation for all LangChain formats (ChatPromptValue, BaseMessage, dict, string)
- Enhance chaining behavior with intelligent __or__ method handling RunnableBinding and complex chains
- Add concurrency controls, error handling, and configurable blocking messages
- Implement proper tool calling support with tool call passthrough
- Add extensive test suite (14 test files, 2800+ lines) covering all major functionality including batching, streaming, composition, piping, and tool calling
- Reorganize and expand test structure for better maintainability
Force-pushed from 7ae2f3a to 68f438e.
Pull Request Overview
This PR completes a full rewrite of RunnableRails with comprehensive LangChain Runnable protocol support, providing async/sync operations, streaming capabilities, tool calling functionality, and enhanced input/output handling.
- Implemented complete LangChain Runnable protocol including invoke, batch, stream, and async variants
- Added tool calling support with proper context variable management across the pipeline
- Enhanced streaming functionality with proper chunk formatting and state management
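The protocol surface listed above can be sketched, purely for orientation, as a minimal class. All names here are hypothetical; this is not the actual RunnableRails implementation, just the shape of the invoke/ainvoke/batch/stream contract the rewrite fills in:

```python
import asyncio
from typing import Any, Iterator, List, Optional

class MiniRunnable:
    """Toy sketch of the Runnable protocol surface: sync and async invoke,
    batch, and stream. Illustrative only, not the RunnableRails code."""

    def invoke(self, input: Any, config: Optional[dict] = None) -> str:
        # Stand-in for "apply guardrails, then call the inner LLM".
        return f"guarded({input})"

    async def ainvoke(self, input: Any, config: Optional[dict] = None) -> str:
        # The async variant delegates to the sync path in this sketch.
        return self.invoke(input, config)

    def batch(self, inputs: List[Any], config: Optional[dict] = None) -> List[str]:
        return [self.invoke(i, config) for i in inputs]

    def stream(self, input: Any, config: Optional[dict] = None) -> Iterator[str]:
        # Yield the result chunk by chunk, as a streaming rail might.
        for i, word in enumerate(self.invoke(input, config).split()):
            yield word if i == 0 else " " + word
```

A class with this surface composes with LangChain's pipe operator, which is why the PR's `__or__` handling matters for chaining.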
Reviewed Changes
Copilot reviewed 26 out of 26 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| `tests/utils.py` | Added streaming support to FakeLLM with `_stream` and `_astream` methods |
| `tests/test_tool_calls_context.py` | New test file for tool calls context variable functionality |
| `tests/test_tool_calling_utils.py` | New comprehensive tests for tool calling utility functions |
| `tests/test_tool_calling_passthrough_integration.py` | Integration tests for tool calling in passthrough mode |
| `tests/runnable_rails/*.py` | 14 new test files covering streaming, batching, composition, and tool calling |
| `nemoguardrails/rails/llm/options.py` | Added `tool_calls` field to `GenerationResponse` model |
| `nemoguardrails/rails/llm/llmrails.py` | Enhanced to extract and include tool calls in responses |
| `nemoguardrails/logging/verbose.py` | Fixed potential AttributeError with missing record attributes |
| `nemoguardrails/integrations/langchain/runnable_rails.py` | Complete rewrite implementing full Runnable protocol |
| `nemoguardrails/context.py` | Added `tool_calls_var` context variable |
| `nemoguardrails/actions/llm/utils.py` | Refactored `llm_call` with tool calling support and improved message handling |
| `examples/configs/nemoguards/*` | New example configuration demonstrating NeMoGuard safety rails |
```python
def _stream(self, prompt, stop=None, run_manager=None, **kwargs):
    """Stream the response by breaking it into tokens."""
    if self.exception:
        raise self.exception

    current_i = self.i
    if current_i >= len(self.responses):
        raise RuntimeError(
            f"No responses available for query number {current_i + 1} in FakeLLM. "
            "Most likely, too many LLM calls are made or additional responses need to be provided."
        )

    response = self.responses[current_i]
    self.i = current_i + 1

    if not self.streaming:
        # If streaming is disabled, return a single response
        yield response
        return

    tokens = response.split()
    for i, token in enumerate(tokens):
        if i == 0:
            yield token
        else:
            yield " " + token
```
[nitpick] The streaming logic is duplicated between _stream and _astream methods. Consider extracting the token splitting logic into a helper method to reduce code duplication.
```python
id_str = getattr(record, "id", None)
id_display = f"({id_str[:5]}..)" if id_str else ""
console.print(f"[cyan]LLM {title} {id_display}[/]")
```
When id_str is None, the slice operation id_str[:5] will fail. The condition should check for None before slicing.
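Worth noting: Python evaluates the condition of a conditional expression before either branch, so the guarded form in the snippet above never actually slices `None`. A standalone check of the pattern (function name is illustrative):

```python
def format_id(id_str):
    # The condition is evaluated first, so the slice only runs when id_str is
    # truthy; a None or empty id never reaches id_str[:5].
    return f"({id_str[:5]}..)" if id_str else ""
```

This is why the `if id_str else ""` guard is sufficient as written.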
```diff
@@ -199,43 +668,281 @@ def invoke(
     # If more than one message is returned, we only take the first one.
     # This can happen for advanced use cases, e.g., when the LLM could predict
     # multiple function calls at the same time. We'll deal with these later.
-    if isinstance(result, list):
+    if isinstance(result, list) and len(result) > 0:
```
[nitpick] This silently discards multiple results when a list is returned. Consider logging when multiple results are discarded or adding configuration to control this behavior.
```diff
 if isinstance(result, list) and len(result) > 0:
+    if len(result) > 1:
+        logger.warning(
+            f"Multiple results returned ({len(result)}). Only the first result will be used. "
+            "Consider updating your configuration or code to handle multiple results if needed."
+        )
```
```python
semaphore = asyncio.Semaphore(self.concurrency_limit)

async def process_with_semaphore(input_item):
    async with semaphore:
        return await self.ainvoke(input_item, config, **kwargs)

return await gather_with_concurrency(
    self.concurrency_limit,
    *[process_with_semaphore(input_item) for input_item in inputs],
)
```
[nitpick] The concurrency_limit is used both for the semaphore and gather_with_concurrency, creating redundant concurrency control. Consider using only one mechanism or clarifying why both are needed.
```diff
-semaphore = asyncio.Semaphore(self.concurrency_limit)
-
-async def process_with_semaphore(input_item):
-    async with semaphore:
-        return await self.ainvoke(input_item, config, **kwargs)
-
-return await gather_with_concurrency(
-    self.concurrency_limit,
-    *[process_with_semaphore(input_item) for input_item in inputs],
-)
+return await gather_with_concurrency(
+    self.concurrency_limit,
+    *[self.ainvoke(input_item, config, **kwargs) for input_item in inputs],
+)
```
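The redundancy the comment points at is easy to see from a minimal semaphore-capped gather. This sketch assumes `gather_with_concurrency` works roughly like the following (the PR's actual helper may differ): one semaphore already bounds concurrency, so a second per-call semaphore adds no extra control.

```python
import asyncio

async def gather_with_concurrency(limit, *coros):
    """Sketch of a concurrency-capped gather: at most `limit` coroutines
    run at once, and results keep the input order."""
    sem = asyncio.Semaphore(limit)

    async def _run(coro):
        async with sem:
            return await coro

    return await asyncio.gather(*(_run(c) for c in coros))

async def main():
    async def work(i):
        await asyncio.sleep(0)
        return i * 2
    # Only the inner semaphore caps concurrency; no outer wrapper is needed.
    return await gather_with_concurrency(2, *(work(i) for i in range(5)))
```

If `gather_with_concurrency` does apply its own cap like this, the outer `process_with_semaphore` wrapper in the diff above can indeed be dropped.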
```diff
@@ -175,15 +226,15 @@ def get_colang_history(
             history += f'user "{event["text"]}"\n'
         elif event["type"] == "UserIntent":
             if include_texts:
-                history += f' {event["intent"]}\n'
+                history += f" {event['intent']}\n"
```
[nitpick] Using single quotes inside f-strings with double quotes is inconsistent with the rest of the codebase style. Consider using consistent quote style throughout.
```diff
         else:
-            history += f'user {event["intent"]}\n'
+            history += f"user {event['intent']}\n"
```
[nitpick] Using single quotes inside f-strings with double quotes is inconsistent with the rest of the codebase style. Consider using consistent quote style throughout.
```diff
         elif event["type"] == "BotIntent":
             # If we have instructions, we add them before the bot message.
             # But we only do that for the last bot message.
             if "instructions" in event and idx == last_bot_intent_idx:
                 history += f"# {event['instructions']}\n"
-            history += f'bot {event["intent"]}\n'
+            history += f"bot {event['intent']}\n"
```
[nitpick] Using single quotes inside f-strings with double quotes is inconsistent with the rest of the codebase style. Consider using consistent quote style throughout.
```diff
@@ -352,9 +403,9 @@ def flow_to_colang(flow: Union[dict, Flow]) -> str:
         if "_type" not in element:
             raise Exception("bla")
         if element["_type"] == "UserIntent":
-            colang_flow += f'user {element["intent_name"]}\n'
+            colang_flow += f"user {element['intent_name']}\n"
```
[nitpick] Using single quotes inside f-strings with double quotes is inconsistent with the rest of the codebase style. Consider using consistent quote style throughout.
```diff
         elif element["_type"] == "run_action" and element["action_name"] == "utter":
-            colang_flow += f'bot {element["action_params"]["value"]}\n'
+            colang_flow += f"bot {element['action_params']['value']}\n"
```
[nitpick] Using single quotes inside f-strings with double quotes is inconsistent with the rest of the codebase style. Consider using consistent quote style throughout.
```diff
-colang_flow += f"bot {element['action_params']['value']}\n"
+colang_flow += f"bot {element["action_params"]["value"]}\n"
```
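For context on the quote-style discussion in the comments above: before Python 3.12, reusing the outer quote character inside an f-string replacement field is a syntax error (PEP 701 lifted that restriction in 3.12), so the mixed-quote form the PR standardizes on is the portable one. A quick standalone check (variable and helper names are illustrative):

```python
element = {"intent": "greet"}

# Mixed quotes (single inside double) parse on every supported Python version.
line = f"bot {element['intent']}\n"

def parses(src: str) -> bool:
    """Return True if the source fragment compiles on this interpreter."""
    try:
        compile(src, "<fstring-check>", "eval")
        return True
    except SyntaxError:
        return False

# By contrast, f"bot {element["intent"]}\n" (double quotes inside double)
# only parses on Python 3.12+, so suggestions in that direction would break
# older interpreters.
```

This is why normalizing toward `f"... {d['key']} ..."` is the safer convention for a library that supports multiple Python versions.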
Description
Requires #1364, #1343, and #1289.