Commit 92aa9ad

Merge pull request #216 from Kondasamy/feat/gemini-tts-language-support
Modify Gemini TTS to support multi-language and update docs
2 parents 343458e + 4c88fe4 commit 92aa9ad

2 files changed: +26 -6 lines changed

podcastfy/tts/providers/gemini.py

Lines changed: 5 additions & 4 deletions
@@ -34,7 +34,7 @@ def generate_audio(self, text: str, voice: str = "en-US-Journey-F",
 
         Args:
             text (str): Text to convert to speech
-            voice (str): Voice ID/name to use
+            voice (str): Voice ID/name to use (format: "{language-code}-{name}-{gender}")
             model (str): Optional model override
 
         Returns:
@@ -52,11 +52,12 @@ def generate_audio(self, text: str, voice: str = "en-US-Journey-F",
                 text=text
             )
 
-            # Set voice parameters
+            # Parse language code from voice ID (e.g., "en-IN" from "en-IN-Journey-D")
+            language_code = "-".join(voice.split("-")[:2])
+
             voice_params = texttospeech_v1beta1.VoiceSelectionParams(
-                language_code="en-US",
+                language_code=language_code,
                 name=voice,
-                ssml_gender=texttospeech_v1beta1.SsmlVoiceGender.FEMALE
             )
 
             # Set audio config
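
The parsing step added in this diff can be tried in isolation. The sketch below copies the expression from the diff into a standalone helper; the name `parse_language_code` is hypothetical and does not appear in the commit. It assumes the voice ID starts with a two-segment language code, as in the examples used on this page.

```python
def parse_language_code(voice: str) -> str:
    """Return the first two dash-separated segments of a voice ID.

    Hypothetical helper: it wraps the expression from the diff,
    "-".join(voice.split("-")[:2]), so it can be checked on its own.
    """
    return "-".join(voice.split("-")[:2])


# Voice IDs taken from the diff and the updated docs.
for voice in ("en-US-Journey-F", "en-IN-Journey-D", "ta-IN-Standard-A"):
    print(voice, "->", parse_language_code(voice))

# A name with no dashes is returned unchanged, so short names such as
# "R" would not produce a usable language code with this approach.
print("R", "->", parse_language_code("R"))
```

The expression always keeps exactly two segments, so a voice ID whose language code has more or fewer than two segments would be parsed incorrectly.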

usage/conversation_custom.md

Lines changed: 21 additions & 2 deletions
@@ -58,6 +58,24 @@ Podcastfy uses the default TTS configuration stored in [podcastfy/conversation_c
 - `model`: "tts-1-hd"
   - The OpenAI TTS model to use.
 
+### Gemini Multi-Speaker TTS
+- `default_voices`:
+  - `question`: "R"
+    - Default voice for questions using Gemini Multi-Speaker TTS.
+  - `answer`: "S"
+    - Default voice for answers using Gemini Multi-Speaker TTS.
+- `model`: "en-US-Studio-MultiSpeaker"
+  - Model to use for Gemini Multi-Speaker TTS.
+- `language`: "en-US"
+  - Language of the voices.
+
+### Gemini TTS
+- `default_voices`:
+  - `question`: "en-US-Journey-D"
+    - Default voice for questions using Gemini TTS.
+  - `answer`: "en-US-Journey-O"
+    - Default voice for answers using Gemini TTS.
+
 ### Edge TTS
 
 - `default_voices`:
@@ -189,7 +207,8 @@ creativity: 0.7
 - The `output_language` defines both the language of the transcript and the language of the audio. Here's some relevant information:
   - Bottom-line: non-English transcripts are good enough but non-English audio is work-in-progress.
   - Transcripts are generated using Google's Gemini 1.5 Pro by default, which supports 100+ languages. Other user-defined models may or may not support non-English languages.
-  - Audio is generated using `openai` (default), `elevenlabs`, `gemini`,or `edge` TTS models.
-    - The `gemini`(Google) TTS model is English only.
+  - Audio is generated using `openai` (default), `elevenlabs`, `gemini`, `geminimulti` or `edge` TTS models.
+    - The `gemini`(Google) TTS model supports multiple languages and can be controlled by the `output_language` parameter and respective voice choices. Eg. `output_language="Tamil"`, `question="ta-IN-Standard-A"`, `answer="ta-IN-Standard-B"`. Refer to [Google Cloud Text-to-Speech documentation](https://cloud.google.com/text-to-speech/docs/voices) for more details.
+    - The `geminimulti`(Google) TTS model supports only English voices. Also, not every Google Cloud project might have access to multi-speaker voices (Eg. `en-US-Studio-MultiSpeaker`). In case if you get - `"Multi-speaker voices are only available to allowlisted projects."`, you can fallback to `gemini` TTS model.
     - The `openai` TTS model supports multiple languages automatically, however non-English voices still present sub-par quality in my experience.
     - The `elevenlabs` TTS model has English voices by default, in order to use a non-English voice you would need to download a custom voice for the target language in your `elevenlabs` account settings and then set the `text_to_speech.elevenlabs.default_voices` parameters to the voice you want to use in the [config.yaml file](https://github.com/pedroslopez/podcastfy/blob/main/podcastfy/config.yaml) (this config file is only available in the source code of the project, not in the pip package, hence if you are using the pip package you will not be able to change the ElevenLabs voice). For more information on ElevenLabs voices, visit [ElevenLabs Voice Library](https://elevenlabs.io/voice-library)
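
As a rough illustration of the two documentation notes on `gemini` and `geminimulti`, the sketch below builds a voice configuration for a non-English language and picks a fallback model when the allowlist error appears. Everything here is an assumption made for illustration: the helper names are hypothetical, and the nested key layout `text_to_speech.gemini.default_voices` is inferred by analogy with the `text_to_speech.elevenlabs.default_voices` path mentioned in the docs, not confirmed by this commit. The sketch does not call Podcastfy or Google Cloud.

```python
from typing import Dict

# Error text quoted in the updated docs.
ALLOWLIST_ERROR = "Multi-speaker voices are only available to allowlisted projects."


def language_code(voice: str) -> str:
    """Same parsing expression as in the gemini.py diff."""
    return "-".join(voice.split("-")[:2])


def gemini_voice_config(question: str, answer: str) -> Dict:
    """Build a config fragment for the `gemini` TTS model.

    Hypothetical helper; the key layout is an assumption. It rejects
    voice pairs whose parsed language codes differ.
    """
    if language_code(question) != language_code(answer):
        raise ValueError("question and answer voices use different languages")
    return {
        "text_to_speech": {
            "gemini": {"default_voices": {"question": question, "answer": answer}}
        }
    }


def fallback_model(current_model: str, error_message: str) -> str:
    """Return "gemini" when `geminimulti` fails with the allowlist error."""
    if current_model == "geminimulti" and ALLOWLIST_ERROR in error_message:
        return "gemini"
    return current_model


# Tamil voices taken from the example in the docs.
config = gemini_voice_config("ta-IN-Standard-A", "ta-IN-Standard-B")
print(config)
print(fallback_model("geminimulti", ALLOWLIST_ERROR))
```

Checking that both voices share a language code is a design choice of this sketch, not something the commit enforces.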
