Add blog post on Asturian TTS voice cloning methodology #25
Draft: JarbasAl wants to merge 3 commits into master from ast
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,76 @@ | ||
| --- | ||
| title: "Cloning Voices for Endangered Languages: Building a Text-to-Speech Model for Asturian" | |
| excerpt: "Have you ever wanted to hear a computer speak in a voice you love, or in a language that's not commonly supported by big tech? That's exactly the challenge we tackled for Asturian (ast), a beautiful Romance language spoken in Asturias, Spain." | ||
| coverImage: "/assets/blog/ast/thumb.png" | ||
| date: "2025-11-30T00:00:00.000Z" | ||
| author: | ||
| name: JarbasAl | ||
| picture: "https://avatars.githubusercontent.com/u/33701864" | ||
| ogImage: | ||
| url: "/assets/blog/ast/thumb.png" | ||
| --- | ||
|
|
||
| ## **Cloning Voices for Endangered Languages: Building a Text-to-Speech Model for Asturian** | ||
|
|
||
| Have you ever wanted to hear a computer speak in a voice you love, or in a language that's not commonly supported by big tech? | ||
|
|
||
| That's exactly the challenge we tackled for **Asturian (ast)**, a beautiful Romance language spoken in Asturias, Spain. We're excited to announce the release of `phoonnx_ast_miro_unicode` on Hugging Face, a new Text-to-Speech (TTS) model that can speak Asturian! | ||
|
|
||
| This wasn't an easy feat. Building a high-quality TTS model usually requires a massive dataset of a single speaker's voice, carefully recorded and transcribed. For languages like Asturian, such datasets are incredibly rare, if not non-existent. | ||
|
|
||
| So, how did we do it? We used a clever, hybrid approach that combines existing resources with cutting-edge voice cloning technology. | ||
|
|
||
| ### **The "Low-Resource" Challenge** | ||
|
|
||
| Imagine you want a computer to speak with a very specific voice – perhaps your own, or that of a beloved family member. Now imagine you only have a few minutes of that person speaking. That's our "low-resource donor voice." | ||
|
|
||
| At the same time, we have access to large **Automatic Speech Recognition (ASR)** datasets, like **Mozilla Common Voice**, which contain recordings of *many different people* speaking Asturian. The problem is that these recordings do not share a single, consistent voice. | |
|
|
||
| Our goal was to "transfer" the specific sound of our donor voice onto the vast amount of text available in these multi-speaker ASR datasets. | ||
|
|
||
| ### **Our Hybrid Solution: A Step-by-Step Journey** | ||
|
|
||
| Here's a simplified look at the process we followed (for a more detailed, technical explanation, check out our **[Whitepaper on Hybrid TTS Dataset Synthesis]()**): | ||
|
|
||
| 1. **Gathering Our Raw Materials:** | ||
|
|
||
| - We started with text and audio from two great Asturian datasets: a subset of [Common Voice Scripted Speech 23.0 - Asturian](https://datacollective.mozillafoundation.org/datasets/cmflnuzw4hnmeuo2e6ea7ojbd) and the [Fleurs Asturian subset](https://huggingface.co/datasets/google/fleurs). These provided us with many text transcripts and their corresponding multi-speaker audio. | ||
| - We also had a short recording of our "donor voice" – the target voice we wanted the TTS model to learn. | ||
|
|
||
| 2. **Audio Quality Filtering and Preparation:** | ||
|
|
||
| - We converted all audio to a standard format and ensured the volume was consistent across all recordings (normalization). | ||
| - We trimmed silence from the beginning and end of each recording. | ||
| - We filtered out recordings where people spoke too quickly or too slowly (outliers based on **Words-Per-Minute**), keeping only the most natural and consistent segments. This focused our dataset on the best-quality recordings and their transcripts. | |
|
|
||
| 3. **The Magic of Voice Cloning (Zero-Shot Revoicing):** | ||
|
|
||
| - This is where modern AI comes in! Instead of training a complex model from scratch, we used an **off-the-shelf zero-shot voice cloning solution**. | ||
| - This system was given a short reference clip of the **donor voice**. It uses this clip to learn the unique qualities of the voice. | ||
| - We then fed our filtered ASR dataset into this cloning system. The original multi-speaker audio was discarded; the cloning tool simply generates new audio in our target donor voice. The result? A new dataset of Asturian audio, all spoken in a single, consistent voice! | ||
|
|
||
| 4. **Training the Final TTS Model:** | ||
|
|
||
| - With our brand-new, high-quality, single-speaker Asturian dataset, we could finally train our TTS model. | |
|
|
||
| ### **The Result:** | ||
|
|
||
| This entire process culminated in the [`phoonnx_ast_miro_unicode`](https://huggingface.co/OpenVoiceOS/phoonnx_ast_miro_unicode) and [`phoonnx_ast_dii_unicode`](https://huggingface.co/OpenVoiceOS/phoonnx_ast_dii_unicode) models, now available on Hugging Face. | ||
|
|
||
| The synthesized datasets, a critical output of this methodology, are also publicly available: **[TigreGotico/tts_vc_mcv-scripted-v23.0_ast_miro](https://huggingface.co/datasets/TigreGotico/tts_vc_mcv-scripted-v23.0_ast_miro)** and **[TigreGotico/tts_vc_mcv-scripted-v23.0_ast_dii](https://huggingface.co/datasets/TigreGotico/tts_vc_mcv-scripted-v23.0_ast_dii)** | |
|
|
||
| These models are a significant step forward for Asturian language technology. They demonstrate how modern AI, combined with careful data preparation, can empower underserved languages and bring them into the digital age. We're excited to see what developers and enthusiasts will build with them! | |
|
|
||
| --- | ||
|
|
||
| ## Help Us Build Voice for Everyone | ||
|
|
||
| OpenVoiceOS is more than software; it’s a mission. If you believe voice assistants should be open, inclusive, and user-controlled, here’s how you can help: | |
|
|
||
| - **💸 Donate**: Help us fund development, infrastructure, and legal protection. | ||
| - **📣 Contribute Open Data**: Share voice samples and transcriptions under open licenses. | ||
| - **🌍 Translate**: Help make OVOS accessible in every language. | ||
|
|
||
| We're not building this for profit. We're building it for people. With your support, we can keep voice tech transparent, private, and community-owned. | ||
|
|
||
| 👉 [Support the project here](https://www.openvoiceos.org/contribution) | ||
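
The audio preparation described in step 2 of the post (volume normalization, silence trimming, and Words-Per-Minute outlier filtering) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the pipeline used to build the released datasets: the `Clip` container, every function name, the 0.9 peak target, the 0.01 silence threshold, and the two-standard-deviation cutoff are all hypothetical, since the post does not name the tools or thresholds that were actually used.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class Clip:
    """Hypothetical container: transcript, mono samples in [-1.0, 1.0], sample rate."""
    text: str
    samples: list
    sample_rate: int


def peak_normalize(samples, target_peak=0.9):
    """Scale samples so the loudest one reaches target_peak (assumed value)."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # pure silence: nothing to scale
    return [s * target_peak / peak for s in samples]


def trim_silence(samples, threshold=0.01):
    """Drop leading and trailing samples quieter than threshold (assumed value)."""
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not loud:
        return []
    return list(samples[loud[0]:loud[-1] + 1])


def words_per_minute(clip):
    """Speaking rate from the whitespace-separated word count and clip duration."""
    minutes = len(clip.samples) / clip.sample_rate / 60.0
    return len(clip.text.split()) / minutes if minutes > 0 else 0.0


def filter_wpm_outliers(clips, max_z=2.0):
    """Keep clips whose WPM lies within max_z standard deviations of the mean."""
    rates = [words_per_minute(c) for c in clips]
    if len(rates) < 2:
        return list(clips)
    mu, sigma = mean(rates), stdev(rates)
    if sigma == 0.0:
        return list(clips)  # every clip has the same rate: no outliers
    return [c for c, r in zip(clips, rates) if abs(r - mu) <= max_z * sigma]


def prepare(clips, max_z=2.0):
    """Normalize and trim each clip, then drop speaking-rate outliers."""
    cleaned = []
    for c in clips:
        samples = trim_silence(peak_normalize(c.samples))
        if samples:  # skip clips that were entirely silent
            cleaned.append(Clip(c.text, samples, c.sample_rate))
    return filter_wpm_outliers(cleaned, max_z)
```

Computing the speaking rate after trimming matters here: leading and trailing silence would otherwise inflate the clip duration and make normal speakers look slow. A real pipeline would more likely rely on an audio library for loudness normalization and voice-activity-based trimming; plain lists are used only to keep the sketch dependency-free.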