Feature/moderation hallucination eval multilingual translation #1265
base: develop
Conversation
Pull Request Overview
This PR adds multilingual translation support to the NeMo-Guardrails evaluation pipeline, introducing translation providers, caching, and integration into moderation and hallucination workflows.
- Core translation utilities and caching mechanism added (`utils_translate.py`)
- Integration of translation into dataset loading and evaluation modules (`utils.py`, `evaluate_moderation.py`, `evaluate_hallucination.py`)
- New translation provider implementations (DeepL, Riva, local HF) and extensive test coverage
Reviewed Changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/eval/translate/ | Added unit and integration tests for translation |
| nemoguardrails/evaluate/utils_translate.py | Core translation loading, caching, and dataset I/O |
| nemoguardrails/evaluate/utils.py | Extended dataset loading with translation support |
| nemoguardrails/evaluate/langproviders/ | Implemented DeepL, Riva, and local HF translators |
| nemoguardrails/evaluate/evaluate_moderation.py | Added translation initialization and loading |
| nemoguardrails/evaluate/evaluate_hallucination.py | Added translation initialization and loading |
| nemoguardrails/evaluate/cli/evaluate.py | Exposed translation flags in CLI |
| pyproject.toml | Added translation-related dependencies |
Comments suppressed due to low confidence (1)

pyproject.toml:103 — [nitpick] The `pyproject-toml` dependency and translation libraries are now always installed; consider moving them into optional extras to avoid pulling heavy packages for users not using translation.

```toml
pyproject-toml = "^0.1.0"
```
```python
# Generate cache file name based on service name
safe_service_name = service_name.replace("/", "_").replace("\\", "_").replace(":", "_")
self.cache_file = self.cache_dir / f"translations_{safe_service_name}.json"
print("cache_file: ", self.cache_file)
```
Replace the debugging print with a logging call or remove it to avoid unwanted console output in production.
Suggested change:

```diff
-print("cache_file: ", self.cache_file)
+logging.debug(f"cache_file: {self.cache_file}")
```
```python
def get_translation_cache(service_name: str = "default") -> TranslationCache:
    """Get or create translation cache instance for the specified service."""
    _translation_caches = {}
```
The `_translation_caches` dictionary is created inside the function, so caching never persists across calls. Move `_translation_caches` to module scope to reuse cache instances.
Suggested change:

```diff
-def get_translation_cache(service_name: str = "default") -> TranslationCache:
-    """Get or create translation cache instance for the specified service."""
-    _translation_caches = {}
+# Global dictionary to store translation cache instances
+_translation_caches = {}
+
+
+def get_translation_cache(service_name: str = "default") -> TranslationCache:
+    """Get or create translation cache instance for the specified service."""
```
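The module-scope pattern the reviewer suggests can be sketched as a runnable snippet; the `TranslationCache` class here is a minimal stand-in for the one in the PR, not its actual implementation:

```python
# Minimal stand-in for the PR's TranslationCache class.
class TranslationCache:
    def __init__(self, service_name: str):
        self.service_name = service_name
        self._store = {}


# Module-level registry so cache instances persist across calls.
_translation_caches = {}


def get_translation_cache(service_name: str = "default") -> TranslationCache:
    """Get or create a translation cache instance for the specified service."""
    if service_name not in _translation_caches:
        _translation_caches[service_name] = TranslationCache(service_name)
    return _translation_caches[service_name]
```

Because the registry now lives at module scope, repeated calls with the same service name return the same instance instead of rebuilding the cache each time.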
```python
langprovider_config = {
    "langproviders": {language_service["model_type"]: language_service}
}
logging.debug(f"langauge provision service: {language_service['language']}")
```
There's a typo in the debug message: 'langauge' should be 'language'.
Suggested change:

```diff
-logging.debug(f"langauge provision service: {language_service['language']}")
+logging.debug(f"language provision service: {language_service['language']}")
```
```python
    """Generate cache key from text and target language."""
    # Create a hash of the text and target language
    content = f"{text}:{target_lang}"
    return content
```
[nitpick] Docstring suggests a hash is created but implementation concatenates text and language directly. Consider actually hashing long texts or updating documentation to match behavior.
Suggested change:

```diff
-return content
+return hashlib.sha256(content.encode('utf-8')).hexdigest()
```
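A self-contained version of the hashed cache key, written as a free function for illustration (the PR defines this as a method, so the exact signature is an assumption):

```python
import hashlib


def get_cache_key(text: str, target_lang: str) -> str:
    """Generate a cache key by hashing the text and target language."""
    content = f"{text}:{target_lang}"
    # sha256 keeps keys fixed-length even for very long input texts
    return hashlib.sha256(content.encode("utf-8")).hexdigest()
```

Hashing keeps the on-disk JSON cache keys bounded in size regardless of how long the source text is, which matters once whole dataset entries are used as keys.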
Agreed on this one.
OK, I'll change it.
Provided several comments - most are just nice to have.
@SnowMasaya try to fix the ones you feel are most important - e.g. some duplicated code and documentation related.
@Pouyanpi can you check if you have any feedback related to tests and using poetry?
```markdown
A local translation provider using Hugging Face models.

**Supported Models:**
- **M2M100**: Multilingual translation model (supports 100 languages)
```
Suggested change:

```diff
-- **M2M100**: Multilingual translation model (supports 100 languages)
+- **M2M100**: Multilingual Many-to-Many translation models (supports 100 languages)
```
```markdown
### Remote Providers

#### DeeplTranslator
High-quality translation service using the DeepL API.
```
Suggested change:

```diff
-High-quality translation service using the DeepL API.
+High-quality translation service using the DeepL API. Requires DeepL API key for using it.
```
```markdown
**Features:**
- High-quality translations
- Supports 29 languages
```
Suggested change:

```diff
-- Supports 29 languages
+- Supports 29 languages (check official website for exact number)
```
```markdown
- Commercial use available

#### RivaTranslator
Translation service using NVIDIA Riva.
```
Suggested change:

```diff
-Translation service using NVIDIA Riva.
+Translation service using NVIDIA Riva. Requires an API key for using it.
```
```markdown
## Configuration Parameters

### Common Parameters
```
Can we highlight the required parameters?
OK
```python
    """Generate cache key from text and target language."""
    # Create a hash of the text and target language
    content = f"{text}:{target_lang}"
    return content
```
Agreed on this one.
```python
if isinstance(item, dict):
    # For JSON format, translate specific fields
    translated_item = item.copy()
    for field in ["answer", "question", "evidence"]:
```
Let's mention in the documentation that when translating JSON datasets, only these fields are processed.
OK
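The field-wise translation of JSON items discussed above can be sketched as a small standalone function; `translate_fn` stands in for the translator call, and the default field list mirrors the one in the snippet:

```python
def translate_item(item, translate_fn, fields=("answer", "question", "evidence")):
    """Translate only the whitelisted string fields of a JSON item.

    Non-dict items are translated directly; other fields pass through unchanged.
    """
    if not isinstance(item, dict):
        return translate_fn(item)
    translated_item = item.copy()
    for field in fields:
        if field in translated_item and isinstance(translated_item[field], str):
            translated_item[field] = translate_fn(translated_item[field])
    return translated_item
```

Keeping the field list explicit (and documented, per the review comment) makes it obvious that any other keys in the dataset entries are left untranslated.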
```python
cached_translation = cache.get(original_text, translator.target_lang)
if cached_translation:
    translated_dataset.append(cached_translation)
else:
    # Translate and cache
    translated_text = translator._translate(original_text)
    translated_dataset.append(translated_text)
    cache.set(original_text, translator.target_lang, translated_text)
```
These lines are copy-pasted from above - shouldn't we wrap this in a helper method in the translation cache?
OK
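The extraction the reviewer asks for (which the PR's later commits implement as `_check_cache_and_translate()` in `utils_translate.py`) can be sketched as follows; the cache and translator classes here are minimal stand-ins, not the PR's real implementations:

```python
# Minimal stand-ins for the PR's cache and translator objects.
class FakeCache:
    def __init__(self):
        self._store = {}

    def get(self, text, lang):
        return self._store.get((text, lang))

    def set(self, text, lang, translation):
        self._store[(text, lang)] = translation


class FakeTranslator:
    target_lang = "ja"

    def _translate(self, text):
        return f"[ja] {text}"


def check_cache_and_translate(text, translator, cache):
    """Return a cached translation if present, otherwise translate and cache it."""
    cached = cache.get(text, translator.target_lang)
    if cached is not None:
        return cached
    translated = translator._translate(text)
    cache.set(text, translator.target_lang, translated)
    return translated
```

With the helper in place, each file-format branch of `load_dataset()` reduces to a single call instead of repeating the get/translate/set sequence.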
```python
    self.dataset = load_dataset(
        self.dataset_path, translation_config=self.translation_config
    )[: self.num_samples]
else:
```
We should print a warning if translation is enabled but the translator is None.
OK
```python
try:
    from nemoguardrails.evaluate.utils_translate import _load_langprovider

    self.translator = _load_langprovider(self.translation_config)
```
This is done again in `load_dataset`. Can't we do it only once there?
OK, I'll try it.
…r translation code

- Add YAML configurable endpoints to RivaTranslator (remote.py):
  * Support uri parameters from YAML config
  * Local mode: only uri can be overridden, others use defaults
- Refactor translation utilities (utils_translate.py):
  * Extract _check_cache_and_translate() helper function
  * Eliminate duplicate cache checking and translation logic
  * Simplify load_dataset() function while preserving functionality
  * Reduce code duplication across different file formats
- Update translation provider tests (base.py, local.py):
  * Fix test configurations to use list format for langproviders
  * Remove assertions on non-existent attributes
  * Update error handling for new validation logic
  * Ensure compatibility with configurable endpoint feature
- Fix test configurations to use list format for langproviders
- Remove obsolete assertions on non-existent attributes
- Add configurable endpoint tests to test_remote_translators.py
- Update cache tests to work with new translation logic
- Consolidate RivaTranslator tests in single file
- Add YAML examples for RivaTranslator endpoint configuration
- Document local mode parameter behavior
- Update existing examples for consistency

Helps users configure RivaTranslator endpoints via YAML.
- README: remove hf_args
- pyproject.toml: update dependency for translation
Description
feat: Add multilingual translation support for evaluation pipeline (moderation and hallucination)
📋 Summary
This PR introduces comprehensive multilingual translation capabilities to the NeMo-Guardrails evaluation system, enabling users to evaluate AI models across different languages and cultures. The implementation includes a flexible translation provider system, caching mechanisms, and seamless integration with existing evaluation workflows.
🚀 Key Features
🌍 Multilingual Translation System
Dependencies Added
- `deepl` (^1.22.0) - DeepL translation service integration
- `nvidia-riva-client` (^2.21.0) - NVIDIA Riva translation service
- `torch` (^2.7.1) - PyTorch for local translation models
- `transformers` (^4.53.0) - HuggingFace transformers for local models
- `sentencepiece` (^0.2.0) - Tokenization support

Architecture
Translation Provider System
Core Components
- `utils_translate.py`: Core translation utilities and caching
- `utils.py`: Integration with dataset loading

🔄 Usage Examples
Basic Translation Configuration
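The original example block appears to have been lost in extraction. Based on the `langproviders` structure visible in the review snippets (a list of entries with `model_type` and `language` keys), a configuration might look roughly like the sketch below; the specific values are illustrative assumptions, not the PR's actual example:

```yaml
# Hypothetical sketch - only the `langproviders`, `model_type`, and
# `language` keys are taken from the PR; the values are assumptions.
langproviders:
  - model_type: local
    language: ja
```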
CLI Usage
🧪 Testing
The implementation includes comprehensive test coverage:
Run tests with:
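The command block was stripped during extraction; assuming pytest as the test runner and the test directory listed in the file summary above, it would be something like:

```shell
# Assumes pytest; the path is taken from the reviewed-files summary above.
pytest tests/eval/translate/
```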
Configuration
Translation Service Configuration
Breaking Changes
None. This is a purely additive feature that maintains full backward compatibility.
📝 Documentation
🎯 Impact
This enhancement significantly expands NeMo-Guardrails' evaluation capabilities, making it a truly global tool for AI safety and compliance evaluation across different languages and cultures.
Checklist