feat(whisperx): add whisperx backend for transcription with speaker diarization #8299

eureka928 · 2026-01-30T12:54:35Z

Description

This PR adds a new WhisperX Python backend that provides transcription with word-level timestamps (via forced alignment) and speaker diarization (identifying who is speaking) via pyannote-audio.

Closes #3375

Key changes:

- Extends the gRPC TranscriptSegment message with a speaker field (backward-compatible: existing backends leave it empty)
- Maps the new Speaker field through the Go schema (core/schema/transcription.go) and backend mapper (core/backend/transcript.go)
- Adds the full backend/python/whisperx/ backend with a gRPC server, requirements files for CPU/CUDA 12/CUDA 13/ROCm, and unit tests
- Registers the backend in the Makefile, backend/index.yaml, and the CI workflow
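
To illustrate how the new speaker field can be filled in while staying backward-compatible, here is a minimal Python sketch. It assumes diarization output arrives as (start, end, speaker) turns in seconds and labels each segment with the speaker whose turns overlap it the most. The Segment type, the assign_speakers helper, the SPEAKER_xx labels and the maximum-overlap rule are all hypothetical illustrations, not the PR's code and not necessarily how WhisperX itself assigns speakers.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Segment:
    """Hypothetical stand-in for a TranscriptSegment.

    speaker defaults to "" so segments from backends without
    diarization keep an empty label, as the PR describes.
    """
    start: float  # seconds
    end: float    # seconds
    text: str
    speaker: str = ""


# A diarization turn: (start_seconds, end_seconds, speaker_label).
Turn = Tuple[float, float, str]


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the intersection of two time intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments: List[Segment], turns: List[Turn]) -> List[Segment]:
    """Label each segment with the speaker whose turns overlap it the most.

    Segments that overlap no turn keep an empty speaker field.
    """
    labelled: List[Segment] = []
    for seg in segments:
        totals: Dict[str, float] = {}
        for t_start, t_end, label in turns:
            ov = overlap(seg.start, seg.end, t_start, t_end)
            if ov > 0:
                totals[label] = totals.get(label, 0.0) + ov
        best = max(totals, key=totals.__getitem__) if totals else ""
        labelled.append(replace(seg, speaker=best))
    return labelled


segments = [
    Segment(0.0, 2.0, "hello"),
    Segment(2.0, 5.0, "hi there"),
    Segment(9.0, 10.0, "bye"),
]
turns = [(0.0, 2.5, "SPEAKER_00"), (2.5, 6.0, "SPEAKER_01")]
print([s.speaker for s in assign_speakers(segments, turns)])
# ['SPEAKER_00', 'SPEAKER_01', '']
```

The last segment falls outside every turn, so it keeps the empty default, which is the same value a backend without diarization would send.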

Speaker diarization requires a Hugging Face token (HF_TOKEN environment variable) with access to the pyannote models, and is enabled by setting diarize=true in the transcription request.
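
The gating condition above can be sketched as a small helper. This is a hypothetical illustration: the diarization_token name and the policy of silently falling back to plain transcription when no token is set are assumptions; only the HF_TOKEN variable and the diarize flag come from the PR description.

```python
import os
from typing import Mapping, Optional


def diarization_token(diarize: bool,
                      env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the Hugging Face token to use for diarization, or None to skip it.

    Hypothetical policy: diarization runs only when the request sets
    diarize=true AND the HF_TOKEN environment variable is non-empty.
    """
    if not diarize:
        return None
    source = os.environ if env is None else env
    token = source.get("HF_TOKEN", "").strip()
    return token or None


# Diarization requested and a token is available: the token is returned.
print(diarization_token(True, {"HF_TOKEN": "hf_example"}))
# Diarization requested but no token: fall back to plain transcription.
print(diarization_token(True, {}))
```

Whether a missing token should silently skip diarization (as in this sketch) or fail the request with a clear error is a design choice the PR description does not specify.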

Notes for Reviewers

- The alignment model is cached per language to avoid reloading it on every transcription call.
- The diarization pipeline is lazily initialized and reused across calls.
- Timestamp handling matches the existing faster-whisper backend's convention.
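
The two caching strategies in these notes can be sketched as follows. This is a minimal sketch, not the PR's implementation: the ModelCache class is hypothetical, and the loader callables are injected so the example does not depend on any particular WhisperX or pyannote API. The lock assumes requests may arrive concurrently, as they do when a gRPC server is created with grpc.server and a ThreadPoolExecutor.

```python
import threading
from typing import Any, Callable, Dict, Optional


class ModelCache:
    """Hypothetical holder for the caching described in the review notes.

    - Alignment models are loaded once per language code and reused.
    - The diarization pipeline is created on first use, then reused.
    """

    def __init__(self,
                 load_align_model: Callable[[str], Any],
                 load_diarization_pipeline: Callable[[], Any]) -> None:
        self._load_align_model = load_align_model
        self._load_diarization_pipeline = load_diarization_pipeline
        self._align_models: Dict[str, Any] = {}
        self._diarization_pipeline: Optional[Any] = None
        # One lock guards both caches so concurrent requests never load twice.
        self._lock = threading.Lock()

    def align_model(self, language: str) -> Any:
        with self._lock:
            if language not in self._align_models:
                self._align_models[language] = self._load_align_model(language)
            return self._align_models[language]

    def diarization_pipeline(self) -> Any:
        with self._lock:
            if self._diarization_pipeline is None:
                self._diarization_pipeline = self._load_diarization_pipeline()
            return self._diarization_pipeline


# Fake loaders record each call so the reuse is visible.
calls = []
cache = ModelCache(
    load_align_model=lambda lang: calls.append(lang) or f"align-{lang}",
    load_diarization_pipeline=lambda: calls.append("diarize") or "pipeline",
)
cache.align_model("en")
cache.align_model("en")          # served from the cache, no reload
cache.align_model("fr")
cache.diarization_pipeline()
cache.diarization_pipeline()     # reused
print(calls)  # ['en', 'fr', 'diarize']
```

Holding the lock while a model loads serializes the rare cold loads, which keeps the sketch simple at the cost of briefly blocking other requests during the first call for a new language.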

Signed commits

Yes, I signed my commits.

netlify · 2026-01-30T12:54:40Z

✅ Deploy Preview for localai ready!

Name	Link
🔨 Latest commit	`0d10ffb`
🔍 Latest deploy log	https://app.netlify.com/projects/localai/deploys/697d1dd8fe8eca000813625c
😎 Deploy Preview	https://deploy-preview-8299--localai.netlify.app

eureka928 · 2026-01-30T13:01:49Z

@mudler @neurocis Nice to meet you, and I'm glad to open my first PR here.

Could you please review it?

Thank you for your time.

eureka928 · 2026-01-30T20:39:42Z

Hi @mudler, I have updated the code based on your feedback.
Please let me know if you have any further feedback after your review.

Add Python gRPC backend using WhisperX for speech-to-text with word-level timestamps, forced alignment, and speaker diarization via pyannote-audio when HF_TOKEN is provided. Signed-off-by: eureka928 <[email protected]>

eureka928 · 2026-01-31T08:18:19Z

Hi @mudler, I hope you're having a good weekend.
Could you share any further feedback once you've reviewed the changes?
Thank you, and have a nice weekend!

eureka928 force-pushed the feat/whisperx-backend branch from 4dcf358 to 7bf3852 on January 30, 2026 12:57

github-actions bot added the dependencies label Jan 30, 2026

mudler reviewed Jan 30, 2026

backend/python/whisperx/requirements-cpu.txt (outdated, resolved)

mudler reviewed Jan 30, 2026

backend/python/whisperx/requirements-cublas12.txt (outdated, resolved)

mudler reviewed Jan 30, 2026

backend/python/whisperx/requirements-cublas13.txt (outdated, resolved)

eureka928 added 6 commits January 30, 2026 21:52

feat(proto): add speaker field to TranscriptSegment for diarization

0cda7d5

Add speaker field to the gRPC TranscriptSegment message and map it through the Go schema, enabling backends to return speaker labels. Signed-off-by: eureka928 <[email protected]>

feat(whisperx): register whisperx backend in Makefile

9963563

Signed-off-by: eureka928 <[email protected]>

feat(whisperx): add whisperx meta and image entries to index.yaml

0d0446d

Signed-off-by: eureka928 <[email protected]>

ci(whisperx): add build matrix entries for CPU, CUDA 12/13, and ROCm

9f3cb4c

Signed-off-by: eureka928 <[email protected]>

fix(whisperx): unpin torch versions and use CPU index for cpu requirements

0d10ffb

Address review feedback:
- Use --extra-index-url for CPU torch wheels to reduce size
- Remove torch version pins, let uv resolve compatible versions

Signed-off-by: eureka928 <[email protected]>

eureka928 force-pushed the feat/whisperx-backend branch from 3e5133d to 0d10ffb on January 30, 2026 21:08
