feat(runnable_rails): complete rewrite of RunnableRails with full LangChain Runnable protocol support #1366
base: develop
Conversation
Add example configuration and documentation for using NVIDIA NeMoGuard NIMs, including content moderation, topic control, and jailbreak detection.
Update verbose logging to safely handle cases where log records may not have 'id' or 'task' attributes. Prevents potential AttributeError and improves robustness of LLM and prompt log output formatting.
Implements tool call extraction and passthrough functionality in LLMRails:
- Add tool_calls_var context variable for storing LLM tool calls
- Refactor llm_call utils to extract and store tool calls from responses
- Support tool calls in both GenerationResponse and dict message formats
- Add ToolMessage support for langchain message conversion
- Comprehensive test coverage for tool calling integration
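The context-variable passthrough described above can be sketched in isolation. This is a minimal illustration of the pattern (the variable name mirrors the PR's `tool_calls_var`, but the helper functions here are hypothetical, not the library code):

```python
import contextvars

# Context variable holding tool calls extracted from the latest LLM response.
# The name mirrors the PR's tool_calls_var; everything else is illustrative.
tool_calls_var = contextvars.ContextVar("tool_calls", default=None)

def extract_and_store_tool_calls(response: dict) -> str:
    """Pull tool calls out of an LLM response dict and stash them in context."""
    tool_calls = response.get("tool_calls") or []
    tool_calls_var.set(tool_calls)
    return response.get("content", "")

def build_generation_response(content: str) -> dict:
    """Attach any stored tool calls to the outgoing response dict."""
    result = {"content": content}
    tool_calls = tool_calls_var.get()
    if tool_calls:
        result["tool_calls"] = tool_calls
    return result
```

Because the variable lives in `contextvars`, the stored tool calls stay scoped to the current async task, which is what lets them survive the trip through the rails pipeline without leaking across concurrent requests.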
Codecov Report: ❌ Patch coverage is

Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##           develop    #1366      +/-  ##
===========================================
- Coverage    71.62%   71.52%    -0.10%
===========================================
  Files          171      171
  Lines        17020    17348     +328
===========================================
+ Hits         12191    12409     +218
- Misses        4829     4939     +110
```

Flags with carried forward coverage won't be shown.
Force-pushed from 34dba2b to 7ae2f3a:
feat(runnable_rails): complete rewrite of RunnableRails with full LangChain Runnable protocol support
- Implement comprehensive async/sync invoke, batch, and streaming support
- Add robust input/output transformation for all LangChain formats (ChatPromptValue, BaseMessage, dict, string)
- Enhance chaining behavior with intelligent __or__ method handling RunnableBinding and complex chains
- Add concurrency controls, error handling, and configurable blocking messages
- Implement proper tool calling support with tool call passthrough
- Add extensive test suite (14 test files, 2800+ lines) covering all major functionality including batching, streaming, composition, piping, and tool calling
- Reorganize and expand test structure for better maintainability
Force-pushed from 7ae2f3a to 68f438e.
Pull Request Overview
This PR completes a full rewrite of RunnableRails with comprehensive LangChain Runnable protocol support, providing async/sync operations, streaming capabilities, tool calling functionality, and enhanced input/output handling.
- Implemented complete LangChain Runnable protocol including invoke, batch, stream, and async variants
- Added tool calling support with proper context variable management across the pipeline
- Enhanced streaming functionality with proper chunk formatting and state management
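The protocol surface listed above can be sketched, purely for orientation, as a minimal class. All names here are hypothetical; this is not the actual RunnableRails implementation, just the shape of the invoke/ainvoke/batch/stream contract the rewrite fills in:

```python
import asyncio
from typing import Any, Iterator, List, Optional

class MiniRunnable:
    """Toy sketch of the Runnable protocol surface: sync and async invoke,
    batch, and stream. Illustrative only, not the RunnableRails code."""

    def invoke(self, input: Any, config: Optional[dict] = None) -> str:
        # Stand-in for "apply guardrails, then call the inner LLM".
        return f"guarded({input})"

    async def ainvoke(self, input: Any, config: Optional[dict] = None) -> str:
        # The async variant delegates to the sync path in this sketch.
        return self.invoke(input, config)

    def batch(self, inputs: List[Any], config: Optional[dict] = None) -> List[str]:
        return [self.invoke(i, config) for i in inputs]

    def stream(self, input: Any, config: Optional[dict] = None) -> Iterator[str]:
        # Yield the result chunk by chunk, as a streaming rail might.
        for i, word in enumerate(self.invoke(input, config).split()):
            yield word if i == 0 else " " + word
```

A class with this surface composes with LangChain's pipe operator, which is why the PR's `__or__` handling matters for chaining.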
Reviewed Changes
Copilot reviewed 26 out of 26 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| `tests/utils.py` | Added streaming support to FakeLLM with `_stream` and `_astream` methods |
| `tests/test_tool_calls_context.py` | New test file for tool calls context variable functionality |
| `tests/test_tool_calling_utils.py` | New comprehensive tests for tool calling utility functions |
| `tests/test_tool_calling_passthrough_integration.py` | Integration tests for tool calling in passthrough mode |
| `tests/runnable_rails/*.py` | 14 new test files covering streaming, batching, composition, and tool calling |
| `nemoguardrails/rails/llm/options.py` | Added `tool_calls` field to `GenerationResponse` model |
| `nemoguardrails/rails/llm/llmrails.py` | Enhanced to extract and include tool calls in responses |
| `nemoguardrails/logging/verbose.py` | Fixed potential AttributeError with missing record attributes |
| `nemoguardrails/integrations/langchain/runnable_rails.py` | Complete rewrite implementing full Runnable protocol |
| `nemoguardrails/context.py` | Added `tool_calls_var` context variable |
| `nemoguardrails/actions/llm/utils.py` | Refactored `llm_call` with tool calling support and improved message handling |
| `examples/configs/nemoguards/*` | New example configuration demonstrating NeMoGuard safety rails |
```python
def _stream(self, prompt, stop=None, run_manager=None, **kwargs):
    """Stream the response by breaking it into tokens."""
    if self.exception:
        raise self.exception

    current_i = self.i
    if current_i >= len(self.responses):
        raise RuntimeError(
            f"No responses available for query number {current_i + 1} in FakeLLM. "
            "Most likely, too many LLM calls are made or additional responses need to be provided."
        )

    response = self.responses[current_i]
    self.i = current_i + 1

    if not self.streaming:
        # If streaming is disabled, return a single response
        yield response
        return

    tokens = response.split()
    for i, token in enumerate(tokens):
        if i == 0:
            yield token
        else:
            yield " " + token
```
[nitpick] The streaming logic is duplicated between _stream and _astream methods. Consider extracting the token splitting logic into a helper method to reduce code duplication.
```python
id_str = getattr(record, "id", None)
id_display = f"({id_str[:5]}..)" if id_str else ""
console.print(f"[cyan]LLM {title} {id_display}[/]")
```
When id_str is None, the slice operation id_str[:5] will fail. The condition should check for None before slicing.
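Worth noting: Python evaluates the condition of a conditional expression before either branch, so the guarded form in the snippet above never actually slices `None`. A standalone check of the pattern (function name is illustrative):

```python
def format_id(id_str):
    # The condition is evaluated first, so the slice only runs when id_str is
    # truthy; a None or empty id never reaches id_str[:5].
    return f"({id_str[:5]}..)" if id_str else ""
```

This is why the `if id_str else ""` guard is sufficient as written.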
```diff
@@ -199,43 +668,281 @@ def invoke(
     # If more than one message is returned, we only take the first one.
     # This can happen for advanced use cases, e.g., when the LLM could predict
     # multiple function calls at the same time. We'll deal with these later.
-    if isinstance(result, list):
+    if isinstance(result, list) and len(result) > 0:
```
[nitpick] This silently discards multiple results when a list is returned. Consider logging when multiple results are discarded or adding configuration to control this behavior.
```diff
 if isinstance(result, list) and len(result) > 0:
+    if len(result) > 1:
+        logger.warning(
+            f"Multiple results returned ({len(result)}). Only the first result will be used. "
+            "Consider updating your configuration or code to handle multiple results if needed."
+        )
```
```python
semaphore = asyncio.Semaphore(self.concurrency_limit)

async def process_with_semaphore(input_item):
    async with semaphore:
        return await self.ainvoke(input_item, config, **kwargs)

return await gather_with_concurrency(
    self.concurrency_limit,
    *[process_with_semaphore(input_item) for input_item in inputs],
)
```
[nitpick] The concurrency_limit is used both for the semaphore and gather_with_concurrency, creating redundant concurrency control. Consider using only one mechanism or clarifying why both are needed.
```diff
-semaphore = asyncio.Semaphore(self.concurrency_limit)
-
-async def process_with_semaphore(input_item):
-    async with semaphore:
-        return await self.ainvoke(input_item, config, **kwargs)
-
-return await gather_with_concurrency(
-    self.concurrency_limit,
-    *[process_with_semaphore(input_item) for input_item in inputs],
-)
+return await gather_with_concurrency(
+    self.concurrency_limit,
+    *[self.ainvoke(input_item, config, **kwargs) for input_item in inputs],
+)
```
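The redundancy the comment points at is easy to see from a minimal semaphore-capped gather. This sketch assumes `gather_with_concurrency` works roughly like the following (the PR's actual helper may differ): one semaphore already bounds concurrency, so a second per-call semaphore adds no extra control.

```python
import asyncio

async def gather_with_concurrency(limit, *coros):
    """Sketch of a concurrency-capped gather: at most `limit` coroutines
    run at once, and results keep the input order."""
    sem = asyncio.Semaphore(limit)

    async def _run(coro):
        async with sem:
            return await coro

    return await asyncio.gather(*(_run(c) for c in coros))

async def main():
    async def work(i):
        await asyncio.sleep(0)
        return i * 2
    # Only the inner semaphore caps concurrency; no outer wrapper is needed.
    return await gather_with_concurrency(2, *(work(i) for i in range(5)))
```

If `gather_with_concurrency` does apply its own cap like this, the outer `process_with_semaphore` wrapper in the diff above can indeed be dropped.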
```diff
@@ -175,15 +226,15 @@ def get_colang_history(
             history += f'user "{event["text"]}"\n'
         elif event["type"] == "UserIntent":
             if include_texts:
-                history += f' {event["intent"]}\n'
+                history += f" {event['intent']}\n"
```
[nitpick] Using single quotes inside f-strings with double quotes is inconsistent with the rest of the codebase style. Consider using consistent quote style throughout.
```diff
         else:
-            history += f'user {event["intent"]}\n'
+            history += f"user {event['intent']}\n"
```
[nitpick] Using single quotes inside f-strings with double quotes is inconsistent with the rest of the codebase style. Consider using consistent quote style throughout.
```diff
         elif event["type"] == "BotIntent":
             # If we have instructions, we add them before the bot message.
             # But we only do that for the last bot message.
             if "instructions" in event and idx == last_bot_intent_idx:
                 history += f"# {event['instructions']}\n"
-            history += f'bot {event["intent"]}\n'
+            history += f"bot {event['intent']}\n"
```
[nitpick] Using single quotes inside f-strings with double quotes is inconsistent with the rest of the codebase style. Consider using consistent quote style throughout.
```diff
@@ -352,9 +403,9 @@ def flow_to_colang(flow: Union[dict, Flow]) -> str:
         if "_type" not in element:
             raise Exception("bla")
         if element["_type"] == "UserIntent":
-            colang_flow += f'user {element["intent_name"]}\n'
+            colang_flow += f"user {element['intent_name']}\n"
```
[nitpick] Using single quotes inside f-strings with double quotes is inconsistent with the rest of the codebase style. Consider using consistent quote style throughout.
```diff
         elif element["_type"] == "run_action" and element["action_name"] == "utter":
-            colang_flow += f'bot {element["action_params"]["value"]}\n'
+            colang_flow += f"bot {element['action_params']['value']}\n"
```
[nitpick] Using single quotes inside f-strings with double quotes is inconsistent with the rest of the codebase style. Consider using consistent quote style throughout.
```diff
-colang_flow += f"bot {element['action_params']['value']}\n"
+colang_flow += f"bot {element["action_params"]["value"]}\n"
```
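For context on the quote-style discussion in the comments above: before Python 3.12, reusing the outer quote character inside an f-string replacement field is a syntax error (PEP 701 lifted that restriction in 3.12), so the mixed-quote form the PR standardizes on is the portable one. A quick standalone check (variable and helper names are illustrative):

```python
element = {"intent": "greet"}

# Mixed quotes (single inside double) parse on every supported Python version.
line = f"bot {element['intent']}\n"

def parses(src: str) -> bool:
    """Return True if the source fragment compiles on this interpreter."""
    try:
        compile(src, "<fstring-check>", "eval")
        return True
    except SyntaxError:
        return False

# By contrast, f"bot {element["intent"]}\n" (double quotes inside double)
# only parses on Python 3.12+, so suggestions in that direction would break
# older interpreters.
```

This is why normalizing toward `f"... {d['key']} ..."` is the safer convention for a library that supports multiple Python versions.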
Description
Requires #1364, #1343, and #1289.