forked from farshed/sage

Self-hosted voice chat with LLMs

GoudaCouda/sage

Sage

Converse with large language models using speech.

  • Open: Powered by state-of-the-art open-source speech processing models.
  • Efficient: Light enough to run on consumer hardware, with low latency.
  • Self-hosted: The entire pipeline can run offline, limited only by compute power.
  • Modular: Switching LLM providers is as simple as changing an environment variable.

How it works


[Figure: Sage architecture]

Requirements

  • Bun
  • Rust
  • Ollama (Alternatively, you can use a third-party provider)

Run

  1. Run setup-unix.sh or setup-win.bat depending on your platform. This will download the required model weights and compile the binaries needed for Sage.

  2. For text generation, you can either self-host an LLM using Ollama, or opt for a third-party provider.

     • If you're using Ollama, add the OLLAMA_MODEL variable to the .env file to specify the model you'd like to use. (Example: OLLAMA_MODEL=deepseek-r1:7b)

     • Among the third-party providers, Sage supports the following out of the box:

        1. Deepseek
        2. OpenAI
        3. Anthropic
        4. Together.ai

     • To use a provider, add a <PROVIDER>_API_KEY variable to the .env file. (Example: OPENAI_API_KEY=xxxxxxxxxxxxxxxxxxxxxxx)

     • To choose which model should be used for a given provider, use the <PROVIDER>_MODEL variable. (Example: DEEPSEEK_MODEL=deepseek-chat)

  3. Start the project with bun start. The first run on macOS is slow (~20 minutes on an M1 Pro) because the ANE service compiles the Whisper CoreML model to a device-specific format. Subsequent runs are faster.
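
To make the environment variables from step 2 concrete, here is a minimal TypeScript sketch of how a provider could be picked from them. It is not taken from Sage's source: the names resolveLlm, LlmChoice and PROVIDER_PREFIXES are hypothetical, the ANTHROPIC and TOGETHER prefixes are guesses (only OPENAI and DEEPSEEK appear in the examples above), and the precedence (third-party keys first, Ollama as fallback) is an assumption.

```typescript
// Hypothetical sketch: none of these names come from Sage's source code.
type Env = Record<string, string | undefined>;

interface LlmChoice {
  provider: string;          // lowercase provider name, or "ollama"
  model: string | undefined; // undefined means "no model specified"
  apiKey?: string;           // absent for a local Ollama model
}

// Assumed variable prefixes; ANTHROPIC and TOGETHER are guesses.
const PROVIDER_PREFIXES = ["DEEPSEEK", "OPENAI", "ANTHROPIC", "TOGETHER"] as const;

function resolveLlm(env: Env): LlmChoice {
  // Assumed precedence: the first provider with an API key wins.
  for (const prefix of PROVIDER_PREFIXES) {
    const apiKey = env[`${prefix}_API_KEY`];
    if (apiKey) {
      return {
        provider: prefix.toLowerCase(),
        model: env[`${prefix}_MODEL`],
        apiKey,
      };
    }
  }
  // Otherwise fall back to a self-hosted model served by Ollama.
  const ollamaModel = env["OLLAMA_MODEL"];
  if (ollamaModel) {
    return { provider: "ollama", model: ollamaModel };
  }
  throw new Error("Set OLLAMA_MODEL or a <PROVIDER>_API_KEY variable in .env");
}

// Example: an environment containing only OLLAMA_MODEL selects the local model.
const choice = resolveLlm({ OLLAMA_MODEL: "deepseek-r1:7b" });
console.log(choice.provider, choice.model);
```

Under Bun, the same function could be called as resolveLlm(process.env), since Bun loads .env files into process.env automatically.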

Future work

  1. Optimize the pipeline.
  2. Make it easier to run. (Dockerize?)
  3. Release as a library?

Languages

  • Rust 71.8%
  • TypeScript 10.0%
  • HTML 8.5%
  • JavaScript 7.0%
  • Shell 1.5%
  • Batchfile 1.2%