This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
livekit-wakeword — wake word detection library using frozen ONNX feature extraction with trainable PyTorch classifiers. Hybrid architecture: ONNX mel spectrogram + speech embedding → PyTorch DNN/RNN classifier head.
Always use uv for package management. Always use git for version separation (branches, commits).
# Install
uv sync                     # All deps including optional groups
uv sync --group dev         # Dev only

# Test
uv run pytest tests/                             # All tests
uv run pytest tests/test_config.py               # Single file
uv run pytest -k "test_name"                     # Single test
uv run pytest --cov=src/livekit/wakeword tests/  # With coverage

# Lint & format
uv run ruff check src/ tests/       # Lint (rules: E, F, I, UP)
uv run ruff format src/ tests/      # Auto-format
uv run mypy src/livekit/wakeword/   # Type check (strict mode)

# CLI (entry point: livekit-wakeword = livekit.wakeword.cli:app)
uv run livekit-wakeword setup [--config YAML]  # Fetch data deps: Piper checkpoint when tts_backend is piper_vits (and whenever no --config is given), VoxCPM snapshot when voxcpm
uv run livekit-wakeword generate <config>      # VITS TTS + SLERP speaker blending + adversarial negatives
uv run livekit-wakeword augment <config>       # Augment + extract features → .npy
uv run livekit-wakeword train <config>         # 3-phase adaptive training
uv run livekit-wakeword export <config>        # Export classifier to ONNX
uv run livekit-wakeword run <config>           # Full pipeline (generate→augment→extract→train→export)

# Architecture
Raw audio (16 kHz) → MelSpectrogramFrontend (ONNX) → SpeechEmbedding (ONNX) → Classifier (PyTorch) → score in [0, 1]
Mel params: n_fft=512, hop=160, n_mels=32. Shapes: 76×32×1 mel window → 96-dim embedding per step → classifier maps 16×96 → 1 score.
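The frontend's frame timing can be sanity-checked with quick arithmetic — a sketch assuming standard STFT framing with the constants above; the bundled ONNX model's exact padding may differ slightly:

```python
# Back-of-envelope timing for the mel frontend (constants from the pipeline above).
SAMPLE_RATE = 16_000
N_FFT = 512
HOP = 160

hop_ms = HOP / SAMPLE_RATE * 1000        # time advance per mel frame: 10.0 ms
span_samples = (76 - 1) * HOP + N_FFT    # audio covered by one 76-frame window
span_ms = span_samples / SAMPLE_RATE * 1000

print(hop_ms)    # 10.0
print(span_ms)   # 782.0 — each 76×32 mel window spans ~0.78 s of audio
```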
- `config.py` — Pydantic models + YAML loading (`load_config(path)`); `TtsBackend`, `PiperTtsConfig`, `VoxCpmTtsConfig`, `piper_checkpoint_path`, `voxcpm_local_model_path`
- `cli.py` — Typer CLI with all commands
- `models/feature_extractor.py` — `MelSpectrogramFrontend` (ONNX primary, torchaudio fallback) and `SpeechEmbedding` (ONNX only)
- `classifier.py` — `DNNClassifier` (FC+LayerNorm), `RNNClassifier` (Bi-LSTM), `build_classifier()` factory
- `pipeline.py` — `WakeWordClassifier` (training wrapper for the classifier head)
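The typed-config pattern can be sketched with stdlib dataclasses — the real code uses Pydantic models and `load_config(path)` over YAML, and the field names below are illustrative assumptions, not the actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class TrainConfig:
    model_type: str = "dnn"      # hypothetical fields, not the real schema
    model_size: str = "small"

@dataclass
class WakeWordConfig:
    wake_word: str
    tts_backend: str = "piper_vits"
    train: TrainConfig = field(default_factory=TrainConfig)

def load_config_dict(raw: dict) -> WakeWordConfig:
    # Mirrors the idea of load_config(path): raw mapping -> typed object
    # (YAML parsing and Pydantic validation omitted in this sketch).
    train = TrainConfig(**raw.pop("train", {}))
    return WakeWordConfig(train=train, **raw)
```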
- `data/generate.py` — Synthetic clip orchestration (`run_generate`); default TTS via `tts/` backends (`tts_backend` in config)
- `tts/` — `SpeechSynthesizer` protocol, `get_tts_backend()`, `PiperVitsBackend`, `VoxCpmBackend`
- `piper/` — Piper-style VITS: `generate_samples` (904-speaker SLERP), `vits/model`, `vits_utils.py`, `defaults.py` (checkpoint paths/URLs), `text.py` (CMUDict phrase prep)
- `augment.py` — `AudioAugmentor` (EQ, distortion, RIR, background mixing) for all 6 splits; positives aligned to the END of the window, negatives/backgrounds center-padded
- `dataset.py` — `WakeWordDataset` (memory-mapped .npy, mixed-class batch generator)
- `features.py` — Extract features through the ONNX pipeline → .npy files
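The end-vs-center alignment rule can be sketched in pure Python — `place_in_window` is a hypothetical helper for illustration, not the library's actual `AudioAugmentor` API:

```python
def place_in_window(clip: list[float], window_len: int, positive: bool) -> list[float]:
    """Fit a clip into a fixed-length window per the alignment rule above."""
    clip = clip[:window_len]               # truncate overly long clips
    pad = window_len - len(clip)
    if positive:
        # Positives: wake word ends exactly at the end of the window.
        return [0.0] * pad + clip
    # Negatives/backgrounds: centered, zero-padded on both sides.
    left = pad // 2
    return [0.0] * left + clip + [0.0] * (pad - left)
```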
- `training/trainer.py` — `WakeWordTrainer` with 3-phase training (full → refinement → fine-tuning), hard example mining, adaptive negative weighting, checkpoint averaging
- `metrics.py` — FPPH (false positives per hour), recall, balanced accuracy
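The loss described above can be sketched as follows — an illustrative version only; the mining threshold and weight schedule here are assumptions, not the trainer's actual values:

```python
import math

def mined_bce(probs, labels, epoch, max_epochs, easy_margin=0.05):
    """BCE over non-trivial examples, with an epoch-scaled negative-class weight."""
    neg_weight = 1.0 + epoch / max_epochs  # linearly increasing negative weight
    total, n = 0.0, 0
    for p, y in zip(probs, labels):
        # Hard example mining: skip predictions that are already confidently correct.
        if (y == 1 and p > 1 - easy_margin) or (y == 0 and p < easy_margin):
            continue
        w = 1.0 if y == 1 else neg_weight
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
        n += 1
    return total / max(n, 1)
```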
- `export/onnx.py` — Export classifier to ONNX with optional INT8 quantization
- `inference/model.py` — `WakeWordModel` class for a simple prediction API
- `listener.py` — `WakeWordListener` class for async microphone detection
- `swift/` — `LiveKitWakeWord` Swift package: ONNX Runtime-based pipeline for iOS 16+ / macOS 14+. `WakeWordModel` (stateless predict) + `WakeWordListener` (actor around `AVAudioEngine`). The mel + embedding `.onnx` files from `src/livekit/wakeword/resources/` are bundled as package resources; classifier `.onnx` files are loaded from disk. ORT's CoreML Execution Provider (ANE/GPU/CPU) is used by default via `ExecutionProvider.coreML`. Depends on the official ORT SPM package.
- `examples/ios_wakeword/` — SwiftUI demo app (iOS + macOS) that consumes the `swift/` package via a local SPM dependency (path: `../../swift`). `WakewordEngine` wraps `AVAudioEngine` + a 2 s Int16 ring buffer + background `WakeWordModel.predict()`; `ContentView` renders score/volume graphs and an execution-provider picker. Generated from `project.yml` via `xcodegen`.
- Feature extraction is numpy-based (ONNX Runtime), not torch tensors. Both frozen models (`melspectrogram.onnx`, `embedding_model.onnx`) are bundled with the package via `importlib.resources` (see `resources/__init__.py`).
- Embedding shape: always `(batch, 16, 96)` — 16 timesteps of 96-dim vectors. The last 16 steps are taken, or the sequence is left-padded.
- Model sizes (tiny/small/medium/large) map to `layer_dim` and `n_blocks` in config. Factory: `build_classifier(model_type, model_size)`.
- Training loss: BCE with hard example mining (only non-trivial predictions contribute) and a linearly increasing negative class weight.
- Checkpoint averaging: the final model averages the top checkpoints, ranked by 90th-percentile accuracy and 10th-percentile FPPH.
- Config format: YAML loaded via `load_config(path)`. See `configs/hey_livekit.yaml` for a reference.
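Checkpoint averaging can be sketched like so — an illustrative version that ranks by a single score, whereas the real trainer combines 90th-percentile accuracy with 10th-percentile FPPH:

```python
def average_top_checkpoints(checkpoints, k=3):
    """checkpoints: list of (score, weights) pairs; weights are flat lists of floats.

    Keep the k best-scoring checkpoints and average their weights elementwise,
    mirroring the final-model averaging step described above.
    """
    best = sorted(checkpoints, key=lambda c: c[0], reverse=True)[:k]
    dim = len(best[0][1])
    return [sum(w[i] for _, w in best) / len(best) for i in range(dim)]
```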
For detailed documentation on each pipeline stage, see docs/:
- docs/overview.md — Architecture and data flow
- docs/data-generation.md — TTS synthesis and adversarial negatives
- docs/augmentation.md — Audio transforms and alignment
- docs/feature-extraction.md — Mel spectrograms and embeddings
- docs/training.md — 3-phase training and checkpoint averaging
- docs/export-and-inference.md — ONNX export and Python API
- Python 3.11+, line length 100
- Ruff for linting/formatting, mypy strict mode
- Build system: hatchling, src layout (`src/livekit/wakeword/`)