Converse with large language models using speech.
- Open: Powered by state-of-the-art open-source speech processing models.
- Efficient: Light enough to run on consumer hardware, with low latency.
- Self-hosted: Entire pipeline runs offline, limited only by compute power.
- Modular: Switching LLM providers is as simple as changing an environment variable.

- Run `setup-unix.sh` or `setup-win.bat`, depending on your platform. This will download the required model weights and compile the binaries needed for Sage.
- For text generation, you can either self-host an LLM using Ollama, or opt for a third-party provider.
- If you're using Ollama, add the `OLLAMA_MODEL` variable to the `.env` file to specify the model you'd like to use. (Example: `OLLAMA_MODEL=deepseek-r1:7b`)
- Among third-party providers, Sage supports the following out of the box:
  - Deepseek
  - OpenAI
  - Anthropic
  - Together.ai
- To use a provider, add a `<PROVIDER>_API_KEY` variable to the `.env` file. (Example: `OPENAI_API_KEY=xxxxxxxxxxxxxxxxxxxxxxx`)
- To choose which model should be used for a given provider, set the `<PROVIDER>_MODEL` variable. (Example: `DEEPSEEK_MODEL=deepseek-chat`)
- Start the project with `bun start`. The first run on macOS is slow (~20 minutes on an M1 Pro), since the ANE service compiles the Whisper CoreML model to a device-specific format. Subsequent runs are faster.
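Putting the steps above together, a `.env` file might look like the sketch below. The values are placeholders following the `<PROVIDER>_API_KEY` / `<PROVIDER>_MODEL` convention described above; set only the variables for the backend you actually use.

```env
# Option A: self-hosted via Ollama
OLLAMA_MODEL=deepseek-r1:7b

# Option B: a third-party provider (API key, plus the model to use)
DEEPSEEK_API_KEY=xxxxxxxxxxxxxxxxxxxxxxx
DEEPSEEK_MODEL=deepseek-chat
```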
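For illustration, the env-driven provider switching described above could be resolved along these lines. This is a hypothetical TypeScript sketch, not Sage's actual code; the function name `resolveProvider` and the precedence order (Ollama first, then the first third-party key found) are assumptions.

```typescript
// Hypothetical sketch of env-driven provider selection (not Sage's actual code).
// Follows the OLLAMA_MODEL and <PROVIDER>_API_KEY / <PROVIDER>_MODEL convention
// described in the setup steps above.
type Resolved = { provider: string; model?: string };

const THIRD_PARTY = ["DEEPSEEK", "OPENAI", "ANTHROPIC", "TOGETHER"];

function resolveProvider(env: Record<string, string | undefined>): Resolved | null {
  // Prefer a self-hosted Ollama model when one is configured.
  if (env.OLLAMA_MODEL) {
    return { provider: "ollama", model: env.OLLAMA_MODEL };
  }
  // Otherwise pick the first third-party provider whose API key is set.
  for (const p of THIRD_PARTY) {
    if (env[`${p}_API_KEY`]) {
      return { provider: p.toLowerCase(), model: env[`${p}_MODEL`] };
    }
  }
  return null;
}
```

Under this scheme, switching providers really is just a matter of changing which `_API_KEY` variable is present in the `.env` file.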
- Optimize the pipeline.
- Make it easier to run. (Dockerize?)
- Release as a library?