Awesome-LLMOps

🎉 An awesome, curated list of the best LLMOps tools, plus related tooling around LLMOps.

LLMOps

Name	Stats	About
BentoML		Build Production-Grade AI Applications
Dify		One API for plugins and datasets, one interface for prompt engineering and visual operation, all for creating powerful AI applications
FastChat		An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Flowise		Drag & drop UI to build your customized LLM flow
Haystack		🔍 LLM orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
LangChain		⚡ Building applications with LLMs through composability ⚡
LiteLLM		lightweight package to simplify LLM API calls - Azure, OpenAI, Cohere, Anthropic, Replicate. Manages input/output translation
LLaMA-Factory		Easy-to-use LLM fine-tuning framework (LLaMA, BLOOM, Mistral, Baichuan, Qwen, ChatGLM)
LlamaIndex		LlamaIndex is a data framework for your LLM applications
Mem0		The memory layer for Personalized AI
Open WebUI		User-friendly WebUI for LLMs (Formerly Ollama WebUI)
PrivateGPT		Interact with your documents using the power of GPT, 100% privately, no data leaks
Swift		SWIFT supports training (pre-training, fine-tuning, RLHF), inference, evaluation and deployment of 350+ LLMs and 90+ MLLMs (multimodal large models).

MLOps

Name	Stats	About
Flyte		Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
Kubeflow		Machine Learning Toolkit for Kubernetes
KServe		Standardized Serverless ML Inference Platform on Kubernetes
llmaz		☸️ Easy, advanced inference platform for large language models on Kubernetes.
Metaflow		🚀 Build and manage real-life data science projects with ease!
MLflow		Open source platform for the machine learning lifecycle
Seldon-Core		An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models.
ZenML		ZenML 🙏: Build portable, production-ready MLOps pipelines. https://zenml.io.

Inference

Name	Stats	About
DeepSpeed-MII		MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
Inference		A fast, easy-to-use, production-ready inference server for computer vision supporting deployment of many popular model architectures and fine-tuned models.
ipex-llm		Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.
LMDeploy		LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
MaxText		A simple, performant and scalable Jax LLM!
llama.cpp		LLM inference in C/C++
MInference		Speeds up long-context LLM inference by computing attention with approximate, dynamic sparsity, which reduces pre-filling latency by up to 10x on an A100 while maintaining accuracy.
MLC LLM		Universal LLM Deployment Engine with ML Compilation
MLServer		MLServer aims to provide an easy way to start serving your machine learning models through a REST and gRPC interface, fully compliant with KFServing's V2 Dataplane spec.
Nanoflow		A throughput-oriented high-performance serving framework for LLMs
Ollama		Get up and running with Llama 3, Mistral, Gemma 2, and other large language models.
OpenLLM		Operating LLMs in production
OpenVINO		OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
Ratchet		A cross-platform browser ML framework.
RayServe		Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
RouteLLM		A framework for serving and evaluating LLM routers - save LLM costs without compromising quality.
SGLang		SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with models faster and more controllable.
transformers.js		State-of-the-art Machine Learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server!
Triton Inference Server		The Triton Inference Server provides an optimized cloud and edge inferencing solution.
Text Generation Inference		Large Language Model Text Generation Inference
vLLM		A high-throughput and memory-efficient inference and serving engine for LLMs
web-llm		High-performance in-browser LLM inference engine
zml		High performance AI inference stack. Built for production.

Training

Name	Stats	About
ColossalAI		Making large AI models cheaper, faster and more accessible
Ludwig		Low-code framework for building custom LLMs, neural networks, and other AI models
MLX		MLX: An array framework for Apple silicon

FineTune

Name	Stats	About
Axolotl		Go ahead and axolotl questions
torchtune		A Native-PyTorch Library for LLM Fine-tuning
unsloth		Finetune Llama 3, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory

Agent

Name	Stats	About
AutoGPT		An experimental open-source attempt to make GPT-4 fully autonomous.
MetaGPT		🌟 The Multi-Agent Framework: Given one line Requirement, return PRD, Design, Tasks, Repo
PydanticAI		Agent Framework / shim to use Pydantic with LLMs
Swarm		Framework for building, orchestrating and deploying multi-agent systems. Managed by OpenAI Solutions team. Experimental framework.
XAgent		An Autonomous LLM Agent for Complex Task Solving

Evaluation

Name	Stats	About
AgentBench		A Comprehensive Benchmark to Evaluate LLMs as Agents
lm-evaluation-harness		A framework for few-shot evaluation of language models.
LongBench		LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding

DB Store

Name	Stats	About
chroma		the AI-native open-source embedding database
deeplake		Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai
Faiss		A library for efficient similarity search and clustering of dense vectors.
milvus		A cloud-native vector database, storage for next generation AI applications
weaviate		Weaviate is an open-source vector database that stores both objects and vectors, allowing for the combination of vector search with structured filtering with the fault tolerance and scalability of a cloud-native database.

Observation

Name	Stats	About
OpenLLMetry		Open-source observability for your LLM application, based on OpenTelemetry
Helicone AI		🧊 The open-source LangSmith alternative for logging, monitoring, and debugging AI applications.
phoenix		ML Observability in a Notebook - Uncover Insights, Surface Problems, Monitor, and Fine Tune your Generative LLM, CV and Tabular Models
wandb		🔥 A tool for visualizing and tracking your machine learning experiments. This repo contains the CLI and Python API.

Alignment

Name	Stats	About
OpenRLHF		An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & Mixtral)
Safe-RLHF		Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback