Replies: 2 comments 1 reply
Great idea! Hoping to contribute where I can.
1 reply:
The GPT4All desktop app provides a local OpenAI-compatible server with LocalDocs support, and there is also a Python binding for our backend although it hasn't been updated in a while. We are unfortunately stuck on an old version of llama.cpp because we rely on many patches for the Kompute backend, which we have not had time to rebase on the new backend API. |
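For anyone who wants to try that local OpenAI-compatible server: a minimal sketch of building a chat-completion request against it with only the standard library. The port `4891` and the model name are assumptions about a typical GPT4All setup; adjust both to match yours.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /chat/completions endpoint."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed local server; sending is one extra line:
req = build_chat_request("http://localhost:4891/v1", "gpt4all-model", "Hello!")
# urllib.request.urlopen(req) would send it; the answer text is at
# choices[0].message.content in the JSON reply.
```

The same request shape works against any of the OpenAI-compatible servers discussed here, which is exactly why the `base_url` override pattern comes up so often below.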
Proposing a living doc about all the frameworks that work with (or should work with) `llama.cpp`, at any level. The list is long, so let's keep it roughly sorted by decreasing community contributions or stars or something ✌️ (direct edits from contributors / suggestions of edits in comments are highly welcome; I've probably made a gazillion mistakes and omissions already!)

Part of the goal is to identify which projects would benefit from a documentation update or small patches for direct support. For instance, a few Python projects only document the (amazing) `llama-cpp-python` bindings and could use instructions on how to also use `llama-server` (our canonical OpenAI-compatible server).

Projects with some integration (non-exhaustive list!)
- `llama.cpp` contributors: @ggerganov, @slaren, @JohannesGaessler, @ngxson & too many to count / large overlap 🤗🤗🤗🤗🤗🤗🤗
- `libllama`? `llama.cpp` contributors: @cebtenzzre (PRs) 🤗
- `llama-server` (see "Hugging Face Inference Endpoints now supports GGUF out of the box!" #9669, revshare goes to ggml.ai); `llama.cpp` contributors: @ngxson (PRs) 🤗
- `llama.cpp` / modified server; `llama.cpp` contributors: @jart (PRs) 🤗
- `libllama` + includes llama.cpp's JSON schema conversion; `llama.cpp` contributors: @abetlen (PRs) 🤗
- `libllama`? `llama.cpp` contributors: @mudler (PRs) 🤗
- `llama-server` can only partially use Ollama models (custom incompatible chat template format)
- `ggml`
- `libllama` (?)
- `llama-server` (OpenAI + `base_url` override: example)
- `llama-server` (OpenAI-like integration w/ `api_base`): needs docs
- (`LlamaCPP`)
- `llama-server` (using OpenAI + `base_uri` override): needs docs
- `llama-server` (OpenAI-compatible endpoint doc): needs docs
- `llama-server`: not yet / needs help ❌
- `llama-server`: not yet / needs help ❌
- `base_url` override?
- `llama-server`: not yet / needs help ❌
- `llama-cpp-python` (labelled `llama.cpp`)
- `llama-server` (OpenAI Chat Model Node + override `Base URL` / API key): needs docs
- `llama-server`: not yet / needs help ❌
- `base_url` param
- `llama-server` or Ollama (use OpenAI + override `base_url` & `api_key`): needs docs
- `llama-server`: not yet / needs help ❌
- `base_uri` override
- `llama-server` (use OpenAI + override `base_uri`): needs docs
- `llama-server`: not yet / needs help ❌
- `base_url` override
- `llama-server` ✅
- `llama-cpp-python`
- `llama-server`: not yet / needs help ❌
- (`llama-server` powered)
- `llama-cpp` (using `LLMEndpointConfig` + `llm_base_url`)

Projects w/o integration
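For maintainers of the "needs docs" entries above: the `base_url` / `api_base` override usually amounts to normalizing one user-supplied URL. A sketch (not taken from any particular project; `8080` is `llama-server`'s default port) of the kind of helper a framework can add to accept `llama-server` wherever it already speaks OpenAI:

```python
def resolve_chat_endpoint(base_url: str) -> str:
    """Normalize a user-supplied base URL into the chat-completions endpoint.

    Accepts values like 'http://localhost:8080', 'http://localhost:8080/',
    or 'http://localhost:8080/v1' and returns the full endpoint URL.
    """
    url = base_url.rstrip("/")
    if not url.endswith("/v1"):
        url += "/v1"
    return url + "/chat/completions"
```

With a helper like this, pointing an OpenAI integration at `llama-server` is just a config change (`base_url="http://localhost:8080/v1"` plus any dummy API key), which is the one-paragraph docs update most of the projects above would need.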