Add blog post on Asturian TTS voice cloning methodology #25

JarbasAl · 2025-11-24T23:05:48Z

TODO - do not merge until female voice + audio samples added

Summary by CodeRabbit

Documentation
- Published a blog post announcing an Asturian text‑to‑speech release: describes a hybrid low‑resource workflow (data collection, audio preprocessing and filtering, zero‑shot donor voice cloning), methodology, and links to the final TTS models and datasets with guidance for contributors.


coderabbitai · 2025-11-24T23:06:00Z

Walkthrough

New blog post added documenting a hybrid low-resource Asturian TTS build using donor voice cloning, covering data sources, preprocessing, WPM filtering, zero-shot cloning, and released artifacts; no code or exported/public declarations changed.

Changes

Cohort / File(s)	Summary
Blog Post Addition `_posts/2025-11-30-ast.md`	New markdown article with YAML front matter (title, excerpt, coverImage, date, author, ogImage). Content details the hybrid low-resource TTS methodology: data gathering (Common Voice Scripted Speech, Fleurs Asturian), audio normalization and silence trimming, WPM-based filtering, zero-shot donor voice cloning, and the final TTS models/datasets with Hugging Face links.
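The post describes WPM-based filtering only at a high level (dropping recordings where people spoke "too fast or too slow"); it does not state the outlier rule. A minimal Python sketch, assuming Tukey's IQR fences with k = 1.5 and a hypothetical `Clip` record holding a transcript and a duration in seconds, might look like this:

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass
class Clip:
    """Hypothetical record for one recording: file path, transcript, length in seconds."""
    path: str
    transcript: str
    duration_s: float


def words_per_minute(clip: Clip) -> float:
    """Speaking rate: whitespace-separated words divided by duration in minutes."""
    if clip.duration_s <= 0:
        raise ValueError(f"non-positive duration for {clip.path}")
    return len(clip.transcript.split()) / (clip.duration_s / 60.0)


def filter_wpm_outliers(clips: list[Clip], k: float = 1.5) -> list[Clip]:
    """Keep clips whose WPM lies inside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].

    The IQR rule and k=1.5 are assumptions; the post does not specify its threshold.
    """
    if len(clips) < 4:
        # Too few samples to estimate quartiles meaningfully; keep everything.
        return list(clips)
    rates = [words_per_minute(c) for c in clips]
    q1, _median, q3 = quantiles(rates, n=4)  # three cut points: Q1, median, Q3
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [c for c, r in zip(clips, rates) if low <= r <= high]
```

Whitespace splitting is only a rough word count; a real pipeline would likely normalize punctuation in the transcripts first.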

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Verify YAML front matter syntax and consistency with other posts
Check markdown rendering, headings, images and external link validity
Confirm clarity and accuracy of methodological steps and artifact links

Poem

🐰 I hopped to a mic beneath moonlit code,
Collected voices on a winding road,
I trimmed and tuned each gentle tone,
Cloned a donor so Asturian sings home,
Bytes and carrots — a voice now grown. 🎤🥕

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Title check	✅ Passed	The title accurately describes the main change: adding a new blog post about Asturian TTS voice cloning methodology, which matches the file addition in the changeset.


coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

_posts/2025-11-30-ast.md (2)

17-17: Consider rephrasing to avoid overused word.

Line 17 uses "exactly," which is flagged as overused. Minor style improvement.

- That's exactly the challenge we tackled for **Asturian (ast)**, a beautiful Romance language spoken in Asturias, Spain.
+ That's precisely the challenge we tackled for **Asturian (ast)**, a beautiful Romance language spoken in Asturias, Spain.

42-44: Reduce repetitive sentence starters.

Three consecutive sentences begin with "We." Consider combining or rewording for better flow.

- We converted all audio to a standard format and ensured the volume was consistent across all recordings (normalization).
- We trimmed silence from the beginning and end of each recording.
- We filtered out recordings where people spoke too fast or too slow (outliers based on **Words-Per-Minute**), keeping only the most natural and consistent segments. This focused our dataset on the best quality transcripts.
+ We converted all audio to a standard format, ensured consistent volume (normalization), and trimmed silence from the beginning and end of each recording. Finally, we filtered out recordings where people spoke too fast or too slow (outliers based on **Words-Per-Minute**), keeping only the most natural and consistent segments to focus on the highest quality transcripts.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f93132e and 0ce8889.

⛔ Files ignored due to path filters (1)

public/assets/blog/ast/thumb.png is excluded by !**/*.png

📒 Files selected for processing (1)

_posts/2025-11-30-ast.md (1 hunks)

🧰 Additional context used

🪛 LanguageTool

_posts/2025-11-30-ast.md

[style] ~17-~17: Consider an alternative for the overused word “exactly”.
Context: ...ommonly supported by big tech? That's exactly the challenge we tackled for **Asturian...

(EXACTLY_PRECISELY)

[style] ~44-~44: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... beginning and end of each recording. - We filtered out recordings where people sp...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

🪛 markdownlint-cli2 (0.18.1)

_posts/2025-11-30-ast.md

33-33: No empty links

(MD042, no-empty-links)

🔇 Additional comments (1)

_posts/2025-11-30-ast.md (1)

58-60: External links verified as accessible.

Both Hugging Face URLs (model and dataset) return HTTP 200 responses and are ready for publication.

_posts/2025-11-30-ast.md

coderabbitai

Actionable comments posted: 0

♻️ Duplicate comments (1)

_posts/2025-11-30-ast.md (1)
33-33: Fix broken empty link reference (duplicate of past review).

This issue was flagged in a previous review and remains unresolved. The link to the whitepaper has no URL: [Whitepaper on Hybrid TTS Dataset Synthesis]().

Either add the URL or remove the link formatting:
- Here's a simplified look at the process we followed (for a more detailed, technical explanation, check out our **[Whitepaper on Hybrid TTS Dataset Synthesis]()**):
+ Here's a simplified look at the process we followed (for a more detailed, technical explanation, check out our Whitepaper on Hybrid TTS Dataset Synthesis):
Or if the whitepaper URL is available:
- Here's a simplified look at the process we followed (for a more detailed, technical explanation, check out our **[Whitepaper on Hybrid TTS Dataset Synthesis]()**):
+ Here's a simplified look at the process we followed (for a more detailed, technical explanation, check out our **[Whitepaper on Hybrid TTS Dataset Synthesis](https://example.com/whitepaper)**):

🧹 Nitpick comments (1)

_posts/2025-11-30-ast.md (1)

40-44: Reduce consecutive sentence repetition.

Multiple consecutive sentences begin with "We" (lines 42, 43, 44). Consider rewording for better flow:

  - We converted all audio to a standard format and ensured the volume was consistent across all recordings (normalization).
- - We trimmed silence from the beginning and end of each recording.
- - We filtered out recordings where people spoke too fast or too slow (outliers based on **Words-Per-Minute**), keeping only the most natural and consistent segments. This focused our dataset on the best quality transcripts.
+ - Silence was trimmed from the beginning and end of each recording.
+ - Recordings where people spoke too fast or too slow (outliers based on **Words-Per-Minute**) were filtered out, keeping only the most natural and consistent segments to focus the dataset on the best quality transcripts.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9403c30 and bd25641.

📒 Files selected for processing (1)

_posts/2025-11-30-ast.md (1 hunks)


🔇 Additional comments (1)

_posts/2025-11-30-ast.md (1)

1-76: Empty link unresolved and audio sample files missing; manual verification required.

The empty Whitepaper link on line 33 remains unfixed: [Whitepaper on Hybrid TTS Dataset Synthesis](). Additionally, public/assets/blog/ast/ contains no audio sample files, only the thumbnail image, whereas other blog posts include audio samples (.wav files).

Given the PR description states "TODO - do not merge until female voice + audio samples added," please manually verify:

Whether the merge-blocking condition has been satisfied

The current PR status and description

If the whitepaper link should be populated with a URL or removed entirely

asturian

0ce8889

coderabbitai bot reviewed Nov 24, 2025

_posts/2025-11-30-ast.md (conversation resolved)

JarbasAl added 2 commits November 24, 2025 23:19

urls

9403c30

urls

bd25641

coderabbitai bot reviewed Nov 24, 2025


JarbasAl marked this pull request as draft November 24, 2025 23:23

JarbasAl changed the title from "asturian" to "Add blog post on Asturian TTS voice cloning methodology" Nov 24, 2025
