Whisper Interface v2

A GUI application for transcribing audio/video files using the Swiss German Whisper Large V3 Turbo model.

Features

User-friendly GUI: Simple and intuitive interface
GPU acceleration: Faster performance with GPU, fallback to CPU if unavailable
Wide format support: Works with multiple audio and video formats
Automatic file splitting: Handles large files seamlessly
Multiple output formats: Export transcriptions in XLSX, SRT, TXT, VTT, TSV, or JSON
Timestamps toggle: Option to include or exclude timestamps in exports (enabled by default)
Dark/Light mode: Choose your preferred theme
Progress tracking: Real-time progress updates with sound notifications
100% offline processing: No internet required after model download
Color-coded progress: Easy-to-read progress updates in the terminal
Detailed logging: Logs saved locally in the /logs directory
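
The automatic file splitting listed above can be pictured as cutting a long recording into consecutive windows that are transcribed one after another. The following is a minimal sketch of that idea, not the application's actual code: the function name `plan_chunks` and the 30-second window are assumptions chosen for illustration.

```python
def plan_chunks(total_seconds, chunk_seconds=30.0):
    """Return (start, end) pairs, in seconds, that cover the whole recording.

    Hypothetical helper: the 30-second default is an assumption, not a
    value taken from this project.
    """
    if total_seconds < 0 or chunk_seconds <= 0:
        raise ValueError("total_seconds must be >= 0 and chunk_seconds > 0")

    chunks = []
    start = 0.0
    while start < total_seconds:
        # The last chunk is shortened so it never runs past the end of the file.
        end = min(start + chunk_seconds, total_seconds)
        chunks.append((start, end))
        start = end
    return chunks
```

Each pair could then be handed to ffmpeg or an audio library to extract that slice before transcription.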

Installation & Usage

Windows

Install Python 3.8 or newer from python.org.
Install ffmpeg by running the following commands in Command Prompt (as Administrator):
```
winget install ffmpeg
setx PATH "%PATH%;C:\Program Files\ffmpeg\bin"
```
Double-click install\setup.bat to:
- Create a virtual environment
- Install all required dependencies
Double-click install\setup_download_model.bat to:
- Download the Swiss German Whisper model (approximately 3GB)
Double-click run.bat to start the application.

macOS/Linux

Install Python 3.8 or newer.
Install ffmpeg:
- macOS: brew install ffmpeg
- Linux: sudo apt install ffmpeg (or equivalent for your distribution)
Make the scripts executable:
```
chmod +x install/setup.sh run.sh
```
Run the setup script:
```
./install/setup.sh
```
This will:
- Create a virtual environment
- Install all required dependencies
Download the model:
```
./install/setup_download_model.sh
```
This downloads the Swiss German Whisper model (approximately 3 GB).
Start the application:
```
./run.sh
```
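
If the application fails to start, it can help to confirm that the two prerequisites from the steps above are in place: Python 3.8 or newer, and ffmpeg reachable on the PATH. The sketch below is one possible check using only the standard library; `check_prerequisites` is a hypothetical helper and is not part of this repository.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 8)):
    """Report whether the Python version and ffmpeg requirements are met."""
    return {
        # Compare only (major, minor) against the minimum version.
        "python": sys.version_info[:2] >= min_python,
        # shutil.which returns None when the executable is not on the PATH.
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

On Windows, run it in a newly opened Command Prompt, since PATH changes are not picked up by windows that were already open.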

Changing the Model

The application uses the Swiss German Whisper Large V3 Turbo model by default. If you want to use a different Whisper model, you can modify the hardcoded model ID in the setup_download_model.bat (Windows) or setup_download_model.sh (Linux/macOS) file.

Windows: Open install\setup_download_model.bat and change the MODEL_ID value:
```
set MODEL_ID="your-new-model-id"
```
Linux/macOS: Open install/setup_download_model.sh and change the MODEL_ID value:
```
MODEL_ID="your-new-model-id"
```

You can find other Whisper models on Hugging Face.
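
For orientation, downloading a model from Hugging Face in Python typically looks like the sketch below. It assumes the `huggingface_hub` package is installed; whether the setup scripts in this repository use it is not stated here. The `models` target directory and the function name `download_model` are placeholders.

```python
def download_model(model_id, target_dir="models"):
    """Download all files of a Hugging Face model repository.

    Assumption: huggingface_hub is installed. The target_dir default is a
    placeholder, not the directory this application reads from.
    """
    # Imported lazily so the module can be loaded without the package present.
    from huggingface_hub import snapshot_download

    # Returns the local path of the downloaded snapshot.
    return snapshot_download(repo_id=model_id, local_dir=target_dir)
```

A call such as `download_model("openai/whisper-large-v3-turbo")` would fetch OpenAI's original Turbo checkpoint. It needs an internet connection and several gigabytes of disk space.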

Usage

The application works completely offline after the initial setup. No internet connection is required for transcription.

Launch the application by running run.bat (Windows) or ./run.sh (Linux/macOS).
Click "Choose audio files" to select one or more audio/video files.
Use the "Include Timestamps" checkbox to toggle timestamps in the exported files (enabled by default).
Select the desired output format(s).
Click "Start" to begin transcription.

The transcribed files will be saved in the same directory as the input files, with the same name but different extensions based on the selected output formats.
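
The naming rule described above (same directory, same base name, new extension) can be expressed with `pathlib`. This is an illustrative sketch; `output_paths` is a hypothetical name and not a function from this project.

```python
from pathlib import Path


def output_paths(input_file, formats):
    """Map one input file to one output path per selected format."""
    source = Path(input_file)
    # with_suffix swaps only the final extension and keeps the directory.
    return [source.with_suffix("." + fmt.lower().lstrip(".")) for fmt in formats]
```

For example, `output_paths("recordings/talk.mp4", ["SRT", "txt"])` yields `recordings/talk.srt` and `recordings/talk.txt`.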

Configuring Audio Settings

You can configure several audio-related settings in the file src/audio_config.py. These settings are explained in detail within that file. Please refer to it for further customization options.

Supported File Formats

Audio

MP3, M4A, M4B, M4P, FLAC
OGG, OGA, MOGG, WAV, WMA
MMF, AA, AAX

Video

MP4, M4V, MKV, WEBM
AVI, MOV, WMV, FLV
And many more video formats
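
When preparing a batch of files, the extensions listed above can be checked up front. The sketch below is an assumption about how such a check might look and is not taken from the application. Because the list is not exhaustive, an unknown extension returns `None` rather than raising an error.

```python
from pathlib import Path

# Extensions copied from the lists above (lowercase, without the dot).
AUDIO_EXTENSIONS = {
    "mp3", "m4a", "m4b", "m4p", "flac",
    "ogg", "oga", "mogg", "wav", "wma",
    "mmf", "aa", "aax",
}
VIDEO_EXTENSIONS = {"mp4", "m4v", "mkv", "webm", "avi", "mov", "wmv", "flv"}


def classify_media(filename):
    """Return "audio", "video", or None for an extension not listed above."""
    extension = Path(filename).suffix.lower().lstrip(".")
    if extension in AUDIO_EXTENSIONS:
        return "audio"
    if extension in VIDEO_EXTENSIONS:
        return "video"
    return None
```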

Output Formats

XLSX: Excel spreadsheet with timestamps and text
SRT: SubRip subtitle format
TXT: Plain text
VTT: WebVTT subtitle format
TSV: Tab-separated values
JSON: Structured data format
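
The two subtitle formats differ mainly in their timestamp notation: SRT writes `HH:MM:SS,mmm` with a comma, while WebVTT writes `HH:MM:SS.mmm` with a period. The following sketch shows how transcript segments could be rendered as SRT. The `(start, end, text)` tuple layout and the function names are assumptions for illustration, not this application's internal data model.

```python
def format_timestamp(seconds, decimal_mark=","):
    """Format seconds as HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (VTT with ".")."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_mark}{millis:03d}"


def to_srt(segments):
    """Render (start, end, text) tuples as numbered SRT cues."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{index}\n{timing}\n{text.strip()}\n")
    # Joining with a newline leaves one blank line between cues.
    return "\n".join(blocks)
```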

Changes from Original Version

Runs the model on GPU, with fallback to CPU
Uses the specialized Swiss German Whisper Large V3 Turbo model
Improved code organization with separate modules
Enhanced progress tracking
Better error handling and resource management
Added a GUI checkbox to toggle timestamps in exports (enabled by default)
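
The GPU-with-CPU-fallback behaviour in the list above is commonly implemented as a device check at startup. The sketch below assumes PyTorch as the backend, which this README does not state explicitly, and `pick_device` is a hypothetical name.

```python
def pick_device():
    """Return "cuda" when a CUDA GPU is usable, otherwise "cpu"."""
    try:
        # Assumption: the model runs on PyTorch.
        import torch
    except ImportError:
        # Without PyTorch there is nothing to accelerate, so report CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"
```

The returned string can be passed to the model-loading code, so the same script runs on machines with and without a supported GPU.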

Credits

Model: Whisper Large V3 Turbo Swiss German
Original Whisper model: OpenAI
Whisper Interface v2: Dimitri Gerster (gerdix)

Known Bugs

The "Abort" button doesn't work
The app loses its connection after the first transcription finishes
The app has no clean Exit button
