76 changes: 76 additions & 0 deletions _posts/2025-11-30-ast.md
@@ -0,0 +1,76 @@
---
title: "Cloning Voices for Endangered Languages: Building a Text-to-Speech Model for Asturian"
excerpt: "Have you ever wanted to hear a computer speak in a voice you love, or in a language that's not commonly supported by big tech? That's exactly the challenge we tackled for Asturian (ast), a beautiful Romance language spoken in Asturias, Spain."
coverImage: "/assets/blog/ast/thumb.png"
date: "2025-11-30T00:00:00.000Z"
author:
  name: JarbasAl
  picture: "https://avatars.githubusercontent.com/u/33701864"
ogImage:
  url: "/assets/blog/ast/thumb.png"
---

## **Cloning Voices for Endangered Languages: Building a Text-to-Speech Model for Asturian**

Have you ever wanted to hear a computer speak in a voice you love, or in a language that's not commonly supported by big tech?

That's exactly the challenge we tackled for **Asturian (ast)**, a beautiful Romance language spoken in Asturias, Spain. We're excited to announce the release of `phoonnx_ast_miro_unicode` on Hugging Face, a new Text-to-Speech (TTS) model that can speak Asturian!

This wasn't an easy feat. Building a high-quality TTS model usually requires a massive dataset of a single speaker's voice, carefully recorded and transcribed. For languages like Asturian, such datasets are incredibly rare, if not non-existent.

So, how did we do it? We used a clever, hybrid approach that combines existing resources with cutting-edge voice cloning technology.

### **The "Low-Resource" Challenge**

Imagine you want a computer to speak with a very specific voice – perhaps your own, or that of a beloved family member. Now imagine you only have a few minutes of that person speaking. That's our "low-resource donor voice."

At the same time, we have access to large **Automatic Speech Recognition (ASR)** datasets, like **Mozilla Common Voice**, which contain recordings of *many different people* speaking Asturian. The problem is, it's not a single, consistent voice.

Our goal was to "transfer" the specific sound of our donor voice onto the vast amount of text available in these multi-speaker ASR datasets.

### **Our Hybrid Solution: A Step-by-Step Journey**

Here's a simplified look at the process we followed (for a more detailed, technical explanation, check out our **[Whitepaper on Hybrid TTS Dataset Synthesis]()**):

1. **Gathering Our Raw Materials:**

- We started with text and audio from two great Asturian datasets: a subset of [Common Voice Scripted Speech 23.0 - Asturian](https://datacollective.mozillafoundation.org/datasets/cmflnuzw4hnmeuo2e6ea7ojbd) and the [Fleurs Asturian subset](https://huggingface.co/datasets/google/fleurs). These provided us with many text transcripts and their corresponding multi-speaker audio.
- We also had a short recording of our "donor voice" – the target voice we wanted the TTS model to learn.

2. **Audio Quality Filtering and Preparation:**

- We converted all audio to a standard format and ensured the volume was consistent across all recordings (normalization).
- We trimmed silence from the beginning and end of each recording.
- We filtered out recordings where people spoke too fast or too slow (outliers based on **Words-Per-Minute**), keeping only the most natural and consistent segments. This narrowed our dataset to the best-quality recordings and their transcripts.

3. **The Magic of Voice Cloning (Zero-Shot Revoicing):**

- This is where modern AI comes in! Instead of training a complex model from scratch, we used an **off-the-shelf zero-shot voice cloning solution**.
- This system was given a short reference clip of the **donor voice**. It uses this clip to learn the unique qualities of the voice.
- We then fed our filtered ASR dataset into this cloning system. The original multi-speaker audio was discarded; the cloning tool simply generates new audio in our target donor voice. The result? A new dataset of Asturian audio, all spoken in a single, consistent voice!

4. **Training the Final TTS Model:**

- With our brand-new, high-quality, single-speaker Asturian dataset, we could finally train our TTS model.
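
To make steps 2 and 3 a little more concrete, here is a minimal Python sketch of the Words-Per-Minute filter and the revoicing loop. It is an illustration under stated assumptions, not the code we actually ran: the `Clip` type, the z-score rule with its `2.0` cutoff, and the `clone_fn` callable are all hypothetical stand-ins. In particular, `clone_fn` represents whatever zero-shot cloning tool you plug in; its signature is invented for this example.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import Callable


@dataclass
class Clip:
    """One ASR sample (hypothetical type for this sketch)."""
    text: str          # transcript
    duration_s: float  # audio length after silence trimming, in seconds
    audio_path: str    # location of the original multi-speaker recording


def words_per_minute(clip: Clip) -> float:
    """Speaking rate, counting whitespace-separated words."""
    return len(clip.text.split()) / (clip.duration_s / 60.0)


def filter_by_wpm(clips: list[Clip], max_z: float = 2.0) -> list[Clip]:
    """Drop clips whose speaking rate is an outlier.

    Assumption: an "outlier" is a clip whose WPM is more than `max_z`
    sample standard deviations from the mean. The post does not state
    the exact rule that was used.
    """
    usable = [c for c in clips if c.duration_s > 0 and c.text.strip()]
    if len(usable) < 2:
        return usable  # not enough data to estimate a spread
    rates = [words_per_minute(c) for c in usable]
    mu, sigma = mean(rates), stdev(rates)
    if sigma == 0:
        return usable  # every clip has the same rate
    return [c for c, r in zip(usable, rates) if abs(r - mu) <= max_z * sigma]


def revoice(
    clips: list[Clip],
    reference_wav: str,
    clone_fn: Callable[[Clip, str], bytes],
) -> list[tuple[str, bytes]]:
    """Build a single-speaker dataset of (transcript, new_audio) pairs.

    `clone_fn(clip, reference_wav)` is a placeholder for the cloning tool.
    Whether it reads the clip's text or its source audio depends on the
    tool; either way, only its output is kept.
    """
    return [(clip.text, clone_fn(clip, reference_wav)) for clip in clips]
```

A run would look like `revoice(filter_by_wpm(clips), "donor.wav", my_cloning_tool)`, where `my_cloning_tool` wraps the cloning system of your choice. Keeping the cloning step behind a plain callable means the filtering logic can be tested without any model installed.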

### **The Result:**

This entire process culminated in the [`phoonnx_ast_miro_unicode`](https://huggingface.co/OpenVoiceOS/phoonnx_ast_miro_unicode) and [`phoonnx_ast_dii_unicode`](https://huggingface.co/OpenVoiceOS/phoonnx_ast_dii_unicode) models, now available on Hugging Face.

The synthesized datasets, a critical output of this methodology, are also publicly available: **[TigreGotico/tts_vc_mcv-scripted-v23.0_ast_miro](https://huggingface.co/datasets/TigreGotico/tts_vc_mcv-scripted-v23.0_ast_miro)** and **[TigreGotico/tts_vc_mcv-scripted-v23.0_ast_dii](https://huggingface.co/datasets/TigreGotico/tts_vc_mcv-scripted-v23.0_ast_dii)**.

These models are a significant step forward for Asturian language technology. They demonstrate how modern AI, combined with careful data preparation, can empower underserved languages and bring them into the digital age. We're excited to see what developers and enthusiasts will build with them!

---

## Help Us Build Voice for Everyone

OpenVoiceOS is more than software; it’s a mission. If you believe voice assistants should be open, inclusive, and user-controlled, here’s how you can help:

- **💸 Donate**: Help us fund development, infrastructure, and legal protection.
- **📣 Contribute Open Data**: Share voice samples and transcriptions under open licenses.
- **🌍 Translate**: Help make OVOS accessible in every language.

We're not building this for profit. We're building it for people. With your support, we can keep voice tech transparent, private, and community-owned.

👉 [Support the project here](https://www.openvoiceos.org/contribution)
Binary file added public/assets/blog/ast/thumb.png