Feature/moderation hallucination eval multilingual translation #1265

SnowMasaya · 2025-07-07T05:41:19Z

Description

feat: Add multilingual translation support for the evaluation pipeline (moderation and hallucination)

📋 Summary

This PR introduces comprehensive multilingual translation capabilities to the NeMo-Guardrails evaluation system, enabling users to evaluate AI models across different languages and cultures. The implementation includes a flexible translation provider system, caching mechanisms, and seamless integration with existing evaluation workflows.

🚀 Key Features

🌍 Multilingual Translation System

Flexible Translation Providers: Support for both local (HuggingFace) and remote (DeepL, NVIDIA Riva) translation services
Translation Caching: Intelligent caching system to avoid redundant translations and improve performance
Configurable Backends: Easy configuration for different translation services via YAML configs
Progress Tracking: Real-time progress bars for translation operations

Dependencies Added

deepl (^1.22.0) - DeepL translation service integration
nvidia-riva-client (^2.21.0) - NVIDIA Riva translation service
torch (^2.7.1) - PyTorch for local translation models
transformers (^4.53.0) - HuggingFace transformers for local models
sentencepiece (^0.2.0) - Tokenization support

Architecture

Translation Provider System

nemoguardrails/evaluate/langproviders/
├── base.py # Base provider interface
├── local.py # HuggingFace-based local translator
├── remote.py # Remote service providers (DeepL, Riva)
├── configs/ # Translation service configurations
└── README.md # Provider documentation
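
A minimal sketch of what a base provider interface like the one in base.py might look like. Only `_translate` and `target_lang` appear in the code quoted later in this thread; the class names and `translate_many` are hypothetical and used purely for illustration.

```python
from abc import ABC, abstractmethod
from typing import List


class TranslatorSketch(ABC):
    """Hypothetical base interface; the real one lives in base.py."""

    def __init__(self, target_lang: str):
        self.target_lang = target_lang

    @abstractmethod
    def _translate(self, text: str) -> str:
        """Translate a single string into self.target_lang."""

    def translate_many(self, texts: List[str]) -> List[str]:
        # Default batch behaviour: translate one item at a time.
        return [self._translate(t) for t in texts]


class UppercaseTranslator(TranslatorSketch):
    """Stand-in provider used only to exercise the interface."""

    def _translate(self, text: str) -> str:
        return text.upper()


translator = UppercaseTranslator("ja")
batch = translator.translate_many(["hello", "world"])
```

A local or remote provider would then only need to implement `_translate`; the batch helper and any caching layer can stay provider-agnostic.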

Core Components

utils_translate.py: Core translation utilities and caching
Enhanced utils.py: Integration with dataset loading
Updated evaluation modules: Multilingual support in hallucination and moderation evaluation
CLI enhancements: Translation configuration support

🔄 Usage Examples

Basic Translation Configuration

# translation.yaml
provider: "deepl"
api_key: "${DEEPL_API_KEY}"
target_language: "ja"
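
The `${DEEPL_API_KEY}` placeholder implies that the key is read from the environment. The thread does not show how the PR resolves it, so the following is only a sketch of one way to do it with the standard library, applied to an already-parsed config dict. `expand_env` is a hypothetical helper name.

```python
import os


def expand_env(config: dict) -> dict:
    """Return a copy of config with ${VAR} placeholders in string values expanded."""
    expanded = {}
    for key, value in config.items():
        # os.path.expandvars leaves unknown variables untouched.
        expanded[key] = os.path.expandvars(value) if isinstance(value, str) else value
    return expanded


# Dummy value set here only so the example is self-contained.
os.environ["DEEPL_API_KEY"] = "dummy-key-for-illustration"

config = {"provider": "deepl", "api_key": "${DEEPL_API_KEY}", "target_language": "ja"}
resolved = expand_env(config)
```

Because unset variables are left as the literal placeholder, a caller may want to check for a remaining `${` in `api_key` and fail early.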

CLI Usage

# Evaluate with translation
nemoguardrails evaluate hallucination \
  --dataset data/hallucination/sample.txt \
  --translation-config configs/translation.yaml

# Evaluate moderation with Japanese translation
nemoguardrails evaluate moderation \
  --dataset data/moderation/harmful.txt \
  --translation-config configs/japanese_translation.yaml

🧪 Testing

The implementation includes comprehensive test coverage:

Provider Tests: Unit tests for all translation providers
Integration Tests: End-to-end translation workflow testing
Cache Tests: Translation caching mechanism validation
CLI Tests: Command-line interface testing with translation support

Run tests with:

pytest tests/eval/translate/ -v

Configuration

Translation Service Configuration

# DeepL Configuration
provider: "deepl"
api_key: "${DEEPL_API_KEY}"
target_language: "ja"

# HuggingFace Local Configuration
provider: "huggingface"
model_name: "Helsinki-NLP/opus-mt-en-ja"
target_language: "ja"
device: "cpu"

# NVIDIA Riva Configuration
provider: "riva"
url: "https://riva-server:8000"
target_language: "ja"
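
As an illustration of how a `provider` value could select a backend, here is a small registry sketch. The registry, the decorator, and the factory functions are all hypothetical, and the factories return descriptive strings instead of real translator objects so that the example runs without DeepL, Riva, or HuggingFace installed.

```python
from typing import Callable, Dict

# Hypothetical registry mapping the `provider` value to a factory.
_FACTORIES: Dict[str, Callable[[dict], str]] = {}


def register(name: str):
    """Decorator that records a factory under the given provider name."""
    def decorator(factory):
        _FACTORIES[name] = factory
        return factory
    return decorator


@register("deepl")
def _make_deepl(config: dict) -> str:
    return f"deepl -> {config['target_language']}"


@register("huggingface")
def _make_huggingface(config: dict) -> str:
    return f"huggingface:{config['model_name']} -> {config['target_language']}"


@register("riva")
def _make_riva(config: dict) -> str:
    return f"riva@{config['url']} -> {config['target_language']}"


def build_translator(config: dict) -> str:
    """Look up the factory for config['provider'] and call it."""
    provider = config.get("provider")
    if provider not in _FACTORIES:
        raise ValueError(f"Unknown translation provider: {provider!r}")
    return _FACTORIES[provider](config)


description = build_translator(
    {"provider": "huggingface", "model_name": "Helsinki-NLP/opus-mt-en-ja", "target_language": "ja"}
)
```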

Breaking Changes

None. This is a purely additive feature that maintains full backward compatibility.

📝 Documentation

Added comprehensive README for translation providers
Updated evaluation documentation with multilingual examples
Added configuration examples for all supported translation services

🎯 Impact

This enhancement significantly expands NeMo-Guardrails' evaluation capabilities, making it a truly global tool for AI safety and compliance evaluation across different languages and cultures.

Checklist

I've read the CONTRIBUTING guidelines.
I've updated the documentation if applicable.
I've added tests if applicable.
@mentions of the person or team responsible for reviewing proposed changes.

Copilot

Pull Request Overview

This PR adds multilingual translation support to the NeMo-Guardrails evaluation pipeline, introducing translation providers, caching, and integration into moderation and hallucination workflows.

Core translation utilities and caching mechanism added (utils_translate.py)
Integration of translation into dataset loading and evaluation modules (utils.py, evaluate_moderation.py, evaluate_hallucination.py)
New translation provider implementations (DeepL, Riva, local HF) and extensive test coverage

Reviewed Changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 4 comments.

File	Description
tests/eval/translate/	Added unit and integration tests for translation
nemoguardrails/evaluate/utils_translate.py	Core translation loading, caching, and dataset I/O
nemoguardrails/evaluate/utils.py	Extended dataset loading with translation support
nemoguardrails/evaluate/langproviders/	Implemented DeepL, Riva, and local HF translators
nemoguardrails/evaluate/evaluate_moderation.py	Added translation initialization and loading
nemoguardrails/evaluate/evaluate_hallucination.py	Added translation initialization and loading
nemoguardrails/evaluate/cli/evaluate.py	Exposed translation flags in CLI
pyproject.toml	Added translation-related dependencies

Comments suppressed due to low confidence (1)

pyproject.toml:103

[nitpick] The pyproject-toml dependency and translation libraries are now always installed; consider moving them into optional extras to avoid pulling heavy packages for users not using translation.

pyproject-toml = "^0.1.0"
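
If the translation libraries do move into optional extras, provider modules would need to fail with a clear message when a package is missing. A minimal sketch of such a guard follows; `require_optional` is a hypothetical helper, and the extra name `translation` is an assumption, not something defined in this PR.

```python
import importlib.util


def require_optional(module_name: str, extra: str) -> None:
    """Raise a helpful ImportError if an optional top-level module is not installed."""
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"'{module_name}' is required for this translation provider. "
            f"Install the optional '{extra}' extra to use it."
        )


# A standard-library module is always present, so this call passes silently.
require_optional("json", "translation")
```

A provider would call the guard at construction time, for example `require_optional("deepl", "translation")`, so that users who never enable translation do not need the heavy packages.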

Copilot · 2025-07-07T10:27:59Z

nemoguardrails/evaluate/utils_translate.py

+        # Generate cache file name based on service name
+        safe_service_name = service_name.replace("/", "_").replace("\\", "_").replace(":", "_")
+        self.cache_file = self.cache_dir / f"translations_{safe_service_name}.json"
+        print("cache_file: ", self.cache_file)


Replace the debugging print with a logging call or remove it to avoid unwanted console output in production.

Suggested change

-        print("cache_file: ", self.cache_file)
+        logging.debug(f"cache_file: {self.cache_file}")
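
For reference, a small sketch of the same logic using a module-level logger with lazy `%s` formatting, so the message is only built when debug logging is enabled. `cache_file_for` is a hypothetical free function; in the PR this code sits inside the cache class.

```python
import logging
from pathlib import Path

log = logging.getLogger(__name__)


def cache_file_for(cache_dir: Path, service_name: str) -> Path:
    """Build the cache file path for a service, replacing path-unsafe characters."""
    safe_service_name = (
        service_name.replace("/", "_").replace("\\", "_").replace(":", "_")
    )
    cache_file = cache_dir / f"translations_{safe_service_name}.json"
    # Emitted only when the logger is configured at DEBUG level.
    log.debug("cache_file: %s", cache_file)
    return cache_file


path = cache_file_for(Path("cache"), "Helsinki-NLP/opus-mt-en-ja")
```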

Copilot · 2025-07-07T10:27:59Z

nemoguardrails/evaluate/utils_translate.py

+def get_translation_cache(service_name: str = "default") -> TranslationCache:
+    """Get or create translation cache instance for the specified service."""
+    _translation_caches = {}


The _translation_caches dictionary is created inside the function, so caching never persists across calls. Move _translation_caches to module scope to reuse cache instances.
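
A runnable sketch of the module-scope registry this comment describes. The `TranslationCache` class here is a minimal stand-in, not the PR's implementation, and the function body is an assumption about how the lookup would be completed.

```python
from typing import Dict


class TranslationCache:
    """Stand-in for the PR's TranslationCache; holds only what the sketch needs."""

    def __init__(self, service_name: str):
        self.service_name = service_name
        self.entries: Dict[str, str] = {}


# Module-level registry, so instances persist across calls.
_translation_caches: Dict[str, TranslationCache] = {}


def get_translation_cache(service_name: str = "default") -> TranslationCache:
    """Get or create the translation cache instance for the specified service."""
    if service_name not in _translation_caches:
        _translation_caches[service_name] = TranslationCache(service_name)
    return _translation_caches[service_name]


first = get_translation_cache("deepl")
second = get_translation_cache("deepl")
```

With the dictionary at module scope, `first` and `second` refer to the same object, which is the behaviour the original function could not provide.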

Suggested change

-def get_translation_cache(service_name: str = "default") -> TranslationCache:
-    """Get or create translation cache instance for the specified service."""
-    _translation_caches = {}
+# Global dictionary to store translation cache instances
+_translation_caches = {}
+
+def get_translation_cache(service_name: str = "default") -> TranslationCache:
+    """Get or create translation cache instance for the specified service."""

Copilot · 2025-07-07T10:28:00Z

nemoguardrails/evaluate/utils_translate.py

+    langprovider_config = {
+        "langproviders": {language_service["model_type"]: language_service}
+    }
+    logging.debug(f"langauge provision service: {language_service['language']}")


There's a typo in the debug message: 'langauge' should be 'language'.

Suggested change

-    logging.debug(f"langauge provision service: {language_service['language']}")
+    logging.debug(f"language provision service: {language_service['language']}")

Copilot · 2025-07-07T10:28:00Z

nemoguardrails/evaluate/utils_translate.py

+        """Generate cache key from text and target language."""
+        # Create a hash of the text and target language
+        content = f"{text}:{target_lang}"
+        return content


[nitpick] Docstring suggests a hash is created but implementation concatenates text and language directly. Consider actually hashing long texts or updating documentation to match behavior.

Suggested change

-        return content
+        return hashlib.sha256(content.encode('utf-8')).hexdigest()

Agreed on this one.

OK, I'll change it.
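
A self-contained sketch of the hashed key, following the suggested change. The function name `cache_key` is hypothetical; the PR's method name is not shown in this thread.

```python
import hashlib


def cache_key(text: str, target_lang: str) -> str:
    """Generate a fixed-length cache key from the text and target language."""
    content = f"{text}:{target_lang}"
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


key_ja = cache_key("How are you?", "ja")
key_de = cache_key("How are you?", "de")
```

The key is always 64 hexadecimal characters, regardless of how long the source text is, which keeps the JSON cache file's keys compact.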

github-actions · 2025-07-07T14:18:46Z

Documentation preview

https://nvidia.github.io/NeMo-Guardrails/review/pr-1265

trebedea

Provided several comments - most are just nice to have.
@SnowMasaya try to fix the ones you feel are most important, e.g. the duplicated code and the documentation-related ones.

@Pouyanpi can you check if you have any feedback related to tests and using poetry?

trebedea · 2025-07-09T11:00:34Z

nemoguardrails/evaluate/langproviders/README.md

+A local translation provider using Hugging Face models.
+
+**Supported Models:**
+- **M2M100**: Multilingual translation model (supports 100 languages)


Suggested change

-- **M2M100**: Multilingual translation model (supports 100 languages)
+- **M2M100**: Multilingual Many-to-Many translation models (supports 100 languages)

trebedea · 2025-07-09T11:03:16Z

nemoguardrails/evaluate/langproviders/README.md

+### Remote Providers
+
+#### DeeplTranslator
+High-quality translation service using the DeepL API.


Suggested change

-High-quality translation service using the DeepL API.
+High-quality translation service using the DeepL API. Requires DeepL API key for using it.

trebedea · 2025-07-09T11:04:34Z

nemoguardrails/evaluate/langproviders/README.md

+
+**Features:**
+- High-quality translations
+- Supports 29 languages


Suggested change

-- Supports 29 languages
+- Supports 29 languages (check official website for exact number)

trebedea · 2025-07-09T11:21:23Z

nemoguardrails/evaluate/langproviders/README.md

+- Commercial use available
+
+#### RivaTranslator
+Translation service using NVIDIA Riva.


Suggested change

-Translation service using NVIDIA Riva.
+Translation service using NVIDIA Riva. Requires an API key for using it.

trebedea · 2025-07-09T11:23:07Z

nemoguardrails/evaluate/langproviders/README.md

+
+## Configuration Parameters
+
+### Common Parameters


Can we highlight the required parameters?

trebedea · 2025-07-11T09:39:56Z

nemoguardrails/evaluate/utils_translate.py

+        """Generate cache key from text and target language."""
+        # Create a hash of the text and target language
+        content = f"{text}:{target_lang}"
+        return content


Agreed on this one.

trebedea · 2025-07-11T09:48:20Z

nemoguardrails/evaluate/utils_translate.py

+            if isinstance(item, dict):
+                # For JSON format, translate specific fields
+                translated_item = item.copy()
+                for field in ["answer", "question", "evidence"]:


Let's mention in the documentation that, when translating JSON files, only these fields are processed.
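
To make the behaviour concrete for the documentation, here is a sketch of field-restricted translation. The field list comes from the quoted code; `translate_item` and the `translate` callable are hypothetical, and skipping non-string values is an assumption.

```python
from typing import Callable

TRANSLATABLE_FIELDS = ("answer", "question", "evidence")


def translate_item(item: dict, translate: Callable[[str], str]) -> dict:
    """Return a shallow copy of item with only the known text fields translated."""
    translated_item = item.copy()
    for field in TRANSLATABLE_FIELDS:
        value = translated_item.get(field)
        # Other keys, and non-string values, are passed through unchanged.
        if isinstance(value, str):
            translated_item[field] = translate(value)
    return translated_item


item = {"question": "what is nemo?", "answer": None, "id": 7}
result = translate_item(item, str.upper)
```

Any other key in the JSON record, such as `id` here, is copied as-is and never sent to the translator.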

trebedea · 2025-07-11T09:50:49Z

nemoguardrails/evaluate/utils_translate.py

+                cached_translation = cache.get(original_text, translator.target_lang)
+                if cached_translation:
+                    translated_dataset.append(cached_translation)
+                else:
+                    # Translate and cache
+                    translated_text = translator._translate(original_text)
+                    translated_dataset.append(translated_text)
+                    cache.set(original_text, translator.target_lang, translated_text)


These lines are copy-pasted from above. Shouldn't we wrap this in a helper method in the translator cache?
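
One possible shape for such a helper, sketched with stand-in cache and translator classes. `DictCache`, `CountingTranslator`, and `check_cache_and_translate` are hypothetical names; only `cache.get`, `cache.set`, `translator._translate`, and `translator.target_lang` come from the quoted code. Note that the sketch tests `is not None`, so an empty cached string counts as a hit, unlike the truthiness check in the quoted lines.

```python
class DictCache:
    """In-memory stand-in exposing the same get/set calls as the quoted code."""

    def __init__(self):
        self._store = {}

    def get(self, text, target_lang):
        return self._store.get((text, target_lang))

    def set(self, text, target_lang, translated):
        self._store[(text, target_lang)] = translated


class CountingTranslator:
    """Fake translator that records how often it is actually called."""

    target_lang = "ja"

    def __init__(self):
        self.calls = 0

    def _translate(self, text):
        self.calls += 1
        return f"[ja] {text}"


def check_cache_and_translate(text, translator, cache):
    """Return the cached translation if present; otherwise translate and cache it."""
    cached = cache.get(text, translator.target_lang)
    if cached is not None:
        return cached
    translated = translator._translate(text)
    cache.set(text, translator.target_lang, translated)
    return translated


translator = CountingTranslator()
cache = DictCache()
first = check_cache_and_translate("hello", translator, cache)
second = check_cache_and_translate("hello", translator, cache)
```

Each dataset format branch can then reduce to a single call per text, and the translator is invoked only once for repeated inputs.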

trebedea · 2025-07-11T09:57:19Z

nemoguardrails/evaluate/evaluate_hallucination.py

+            self.dataset = load_dataset(
+                self.dataset_path, translation_config=self.translation_config
+            )[: self.num_samples]
+        else:


We should print a warning if translation is enabled but the translator is None.
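
A minimal sketch of such a warning. `resolve_translator` and the `enable_translation` flag are hypothetical names used for illustration; the evaluation classes in the PR may structure this differently.

```python
import logging

log = logging.getLogger(__name__)


def resolve_translator(enable_translation: bool, translator):
    """Return the translator to use, warning when translation was requested but unavailable."""
    if not enable_translation:
        return None
    if translator is None:
        log.warning(
            "Translation is enabled but no translator was loaded; "
            "falling back to the untranslated dataset."
        )
        return None
    return translator


active = resolve_translator(True, None)
```

The caller can then branch on the returned value alone, loading the dataset with `translation_config` only when it is not `None`.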

trebedea · 2025-07-11T09:58:26Z

nemoguardrails/evaluate/evaluate_moderation.py

+            try:
+                from nemoguardrails.evaluate.utils_translate import _load_langprovider
+
+                self.translator = _load_langprovider(self.translation_config)


This is done again in load_dataset. Can't we do it only once there?

OK, I'll try it.

…r translation code

- Add YAML configurable endpoints to RivaTranslator (remote.py):
  * Support uri parameters from YAML config
  * Local mode: only uri can be overridden, others use defaults
- Refactor translation utilities (utils_translate.py):
  * Extract _check_cache_and_translate() helper function
  * Eliminate duplicate cache checking and translation logic
  * Simplify load_dataset() function while preserving functionality
  * Reduce code duplication across different file formats
- Update translation provider tests (base.py, local.py):
  * Fix test configurations to use list format for langproviders
  * Remove assertions on non-existent attributes
  * Update error handling for new validation logic
  * Ensure compatibility with configurable endpoint feature

- Fix test configurations to use list format for langproviders
- Remove obsolete assertions on non-existent attributes
- Add configurable endpoint tests to test_remote_translators.py
- Update cache tests to work with new translation logic
- Consolidate RivaTranslator tests in single file

masayaOgushi added 4 commits July 7, 2025 14:22

feat: add multilingual translation dependencies

e0a54ea

feat: implement multilingual translation system

a7f05c4

feat: integrate multilingual support in evaluation pipeline

d4f53f2

test: add comprehensive translation system tests

2f6e7d8

Pouyanpi requested review from Pouyanpi, Copilot and trebedea July 7, 2025 10:25

Copilot AI reviewed Jul 7, 2025

masayaOgushi added 4 commits July 7, 2025 22:16

fix: add langchain-nvidia-ai-endpoint, remove pyproject-toml

42d6eaf

fix: copilot advice base

250acca

fix: remove extra test

6c09697

fix: multilingual translation dependencies

890f2a9

fix: make pre_commit related issues

2ee28d1

trebedea requested changes Jul 11, 2025

masayaOgushi added 8 commits July 15, 2025 11:13

fix: Remove redundant test files

03102ef

fix: add wanring, common process

0c9f1ef

docs: add configurable endpoint examples to langproviders README

d788905

- Add YAML examples for RivaTranslator endpoint configuration
- Document local mode parameter behavior
- Update existing examples for consistency

Helps users configure RivaTranslator endpoints via YAML.

fix: add white space

0d82ce3

fix: None value case support

4650acb

Fix: README, pyptoject.toml

6b45738

- README: remove hf_args
- pyproject.toml: update dependency for translation
