Bug Report: Language Code Incompatibility with WhisperLiveKit
Summary
NLLW does not recognize zh as a valid language code for Chinese, causing integration issues with WhisperLiveKit.
Environment
- NLLW version: 0.1.4
- WhisperLiveKit: latest
- Python: 3.10+
Steps to Reproduce
1. Install WhisperLiveKit and NLLW:
   pip install whisperlivekit nllw
2. Run the translation server:
   python -m whisperlivekit.basic_server --lan zh --target-language eng_Latn
3. Observe the error.
Expected Behavior
The server should start successfully and translate Chinese speech to English.
Actual Behavior
ValueError: Unknown language identifier: zh
Root Cause
WhisperLiveKit passes the --lan parameter to both Whisper and NLLW:
- Whisper expects zh for Chinese
- NLLW only accepts zh-CN for Chinese
This creates a conflict where no single value works for both systems:
| Parameter | Whisper | NLLW |
|---|---|---|
| --lan zh | ✅ Works | ❌ Error |
| --lan zh-CN | ❌ Error | ✅ Works |
Proposed Solution
Support multiple language codes per entry by allowing language_code to be a list:
# Before
{"name": "Chinese (Simplified)", "nllb": "zho_Hans", "language_code": "zh-CN"}
# After
{"name": "Chinese (Simplified)", "nllb": "zho_Hans", "language_code": ["zh-CN", "zh"]}
This approach:
- Maintains backward compatibility (zh-CN still works)
- Adds support for Whisper's zh code
- Avoids duplicate entries in the language list
- Is extensible for other languages with similar issues
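For illustration, here is a minimal sketch of how a lookup could handle entries whose language_code is either a single string or a list. It is not NLLW's actual implementation: the names LANGUAGES, build_code_index, and to_nllb are hypothetical, and the English entry's "en" code is an assumption added only to show that string-valued entries keep working.

```python
from typing import Dict, List, Union

# Hypothetical excerpt of a language table. Only the Chinese entry's fields
# come from this issue; the English entry's "en" code is an assumption.
LANGUAGES: List[Dict[str, Union[str, List[str]]]] = [
    {"name": "Chinese (Simplified)", "nllb": "zho_Hans", "language_code": ["zh-CN", "zh"]},
    {"name": "English", "nllb": "eng_Latn", "language_code": "en"},
]


def build_code_index(languages: List[Dict[str, Union[str, List[str]]]]) -> Dict[str, str]:
    """Map every accepted language code to its NLLB identifier."""
    index: Dict[str, str] = {}
    for entry in languages:
        codes = entry["language_code"]
        # Accept both the old form (one string) and the proposed form (a list).
        if isinstance(codes, str):
            codes = [codes]
        for code in codes:
            index[code] = str(entry["nllb"])
    return index


CODE_TO_NLLB = build_code_index(LANGUAGES)


def to_nllb(identifier: str) -> str:
    """Resolve a language code to an NLLB identifier, or raise ValueError."""
    try:
        return CODE_TO_NLLB[identifier]
    except KeyError:
        raise ValueError(f"Unknown language identifier: {identifier}") from None
```

Under these assumptions, to_nllb("zh") and to_nllb("zh-CN") both resolve to zho_Hans, while an unlisted code still raises the same ValueError shown above. Normalizing to a list at index-build time keeps the per-request lookup a single dictionary access.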
Workaround
Currently, there is no workaround when using NLLW translation mode with WhisperLiveKit for Chinese.
Users can only use Whisper's direct English translation mode:
python -m whisperlivekit.basic_server --lan zh --direct-english-translation
Related
- WhisperLiveKit: https://github.com/QuentinFuxa/WhisperLiveKit
- Whisper language codes: https://github.com/openai/whisper/blob/main/whisper/tokenizer.py