Build better LLM apps — faster, smarter, production-ready.
A curated list of 100+ libraries and frameworks for AI engineers building with Large Language Models. This toolkit includes battle-tested tools, frameworks, templates, and reference implementations for developing, deploying, and optimizing LLM-powered systems.
| Tool | Description | Language | License |
|---|---|---|---|
| Pinecone | Managed vector database for production AI applications | API/SDK | Commercial |
| Weaviate | Open-source vector database with GraphQL API | Go | BSD-3 |
| Qdrant | Vector similarity search engine with extended filtering | Rust | Apache-2.0 |
| Chroma | Open-source embedding database for LLM apps | Python | Apache-2.0 |
| Milvus | Cloud-native vector database for scalable similarity search | Go/C++ | Apache-2.0 |
| FAISS | Library for efficient similarity search and clustering | C++/Python | MIT |
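
The tools in the table above all answer the same core question: given a query vector, which stored vectors are most similar? As a rough illustration only, the sketch below does this with an exhaustive cosine-similarity scan in plain Python. The function names, document IDs, and three-dimensional vectors are hypothetical, and the listed tools typically rely on approximate indexes rather than a full scan like this one.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, index, k=2):
    """Return the k (doc_id, score) pairs most similar to the query via an exhaustive scan."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy "embeddings"; real embeddings usually have hundreds of dimensions.
index = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.9, 0.1, 0.0],
    "doc-c": [0.0, 1.0, 0.0],
}

results = top_k([1.0, 0.0, 0.0], index, k=2)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

A full scan costs time proportional to the number of stored vectors per query, which is the main reason to reach for a dedicated index or database once a collection grows.
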
| Tool | Description | Language | License |
|---|---|---|---|
| LangChain | Framework for developing LLM applications | Python/JS | MIT |
| LlamaIndex | Data framework for LLM applications | Python | MIT |
| Haystack | End-to-end NLP framework for production | Python | Apache-2.0 |
| DSPy | Framework for algorithmically optimizing LM prompts | Python | MIT |
| Semantic Kernel | SDK for integrating AI into conventional programming languages | C#/Python/Java | MIT |
| Langflow | Visual no-code platform for building and deploying LLM workflows | Python/TypeScript | MIT |
| Flowise | Drag-and-drop UI for creating LLM chains and agents | TypeScript | MIT |
| Promptflow | Workflow orchestration for LLM pipelines, evaluation, and deployment | Python | MIT |
| Tool | Description | Language | License |
|---|---|---|---|
| Docling | AI-powered toolkit converting PDF, DOCX, PPTX, HTML, images into structured JSON/Markdown with layout, OCR, table, and code recognition | Python | MIT |
| pdfplumber | Drill through PDFs at a character level, extract text & tables, and visually debug extraction | Python | MIT |
| PyMuPDF (fitz) | Lightweight, high-performance PDF parser for text/image extraction and manipulation | Python / C | AGPL-3.0 |
| PDF.js | Browser-based PDF renderer with text extraction capabilities | JavaScript | Apache-2.0 |
| Camelot | Extracts structured tabular data from PDFs into DataFrames and CSVs | Python | MIT |
| Llama Parse | Structured parsing of PDFs and documents optimized for LLMs | Python | Apache-2.0 |
| MegaParse | Universal parser for PDFs, HTML, and semi-structured documents | Python | Apache-2.0 |
| ExtractThinker | Intelligent document extraction framework with schema mapping | Python | MIT |
| PyMuPDF4LLM | Wrapper around PyMuPDF for LLM-ready text, tables, and image extraction | Python | Apache-2.0 |
| Tool | Description | Language | License |
|---|---|---|---|
| RAGFlow | Open-source RAG engine based on deep document understanding | Python | Apache-2.0 |
| Verba | Retrieval Augmented Generation (RAG) chatbot | Python | BSD-3 |
| PrivateGPT | Interact with documents using local LLMs | Python | Apache-2.0 |
| AnythingLLM | All-in-one AI application for any LLM | JavaScript | MIT |
| Quivr | Your GenAI second brain | Python/TypeScript | Apache-2.0 |
| Jina | Cloud-native neural search framework for multimodal RAG | Python | Apache-2.0 |
| txtai | All-in-one embeddings database for semantic search and workflows | Python | Apache-2.0 |
| FastGraph RAG | Graph-based RAG framework for structured retrieval | Python | MIT |
| Chonkie | Chunking utility for efficient document processing in RAG | Python | - |
| SQLite-Vec | Vector search extension for SQLite, useful in lightweight RAG setups | C/Python | MIT |
| FlashRAG | Low-latency RAG research toolkit with modular design and benchmarks | Python | - |
| Llmware | Lightweight framework for building RAG-based apps | Python | Apache-2.0 |
| Vectara | Managed RAG platform with APIs for retrieval and generation | Python/Go | Commercial |
| GPTCache | Semantic cache for LLM responses to accelerate RAG pipelines | Python | Apache-2.0 |
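
Most RAG pipelines split documents into chunks before embedding them. The sketch below shows one simple approach, fixed-size character windows with overlap, in plain Python. It is not the API of Chonkie or any other tool listed above; `chunk_text` and its parameters are hypothetical names chosen for illustration.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into character windows of chunk_size, each sharing `overlap` chars with the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text, to avoid a redundant tail chunk.
        if start + chunk_size >= len(text):
            break
    return chunks


for chunk in chunk_text("abcdefghij", chunk_size=4, overlap=2):
    print(chunk)
```

The overlap means a sentence cut at one chunk boundary still appears whole in a neighboring chunk, at the cost of storing some text twice. Production chunkers usually split on tokens, sentences, or document structure rather than raw characters.
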
| Tool | Description | Language | License |
|---|---|---|---|
| Ragas | Evaluation framework for RAG pipelines | Python | Apache-2.0 |
| LangSmith | Platform for debugging, testing, and monitoring LLM applications | API/SDK | Commercial |
| Phoenix | ML observability for LLM, vision, language, and tabular models | Python | Apache-2.0 |
| DeepEval | LLM evaluation framework for unit testing LLM outputs | Python | Apache-2.0 |
| TruLens | Evaluation and tracking for LLM experiments | Python | MIT |
| Inspect | Framework for large language model evaluations | Python | Apache-2.0 |
| UpTrain | Open-source tool to evaluate and improve LLM applications | Python | Apache-2.0 |
| Weave | Experiment tracking, debugging, and logging for LLM workflows | Python | Apache-2.0 |
| Giskard | Open-source testing framework for ML/LLM applications | Python | Apache-2.0 |
| Lighteval | Lightweight and fast evaluation framework from Hugging Face | Python | Apache-2.0 |
| LangTest | NLP/LLM test suite for robustness, bias, and quality | Python | Apache-2.0 |
| PromptBench | Benchmarking framework for evaluating prompts | Python | MIT |
| EvalPlus | Advanced evaluation framework for code generation models | Python | Apache-2.0 |
| FastChat | Framework for chat-based LLM benchmarking and evaluation | Python | Apache-2.0 |
| judges | Human + AI judging framework for LLM evaluation | Python | Apache-2.0 |
| Evals | OpenAI's framework for creating and running LLM evaluations | Python | MIT |
| AgentEvals | Evaluation framework for autonomous AI agents | Python | Apache-2.0 |
| UQLM | Unified framework for evaluating quality of LLMs | Python | Apache-2.0 |
| LLMBox | Toolkit for evaluation + training of LLMs | Python | Apache-2.0 |
| Opik | DevOps platform for evaluation, monitoring, and observability | Python | Apache-2.0 |
| PydanticAI Evals | Built-in evaluation utilities for PydanticAI agents | Python | MIT |
| LLM Transparency Tool | Framework for probing and evaluating LLM transparency | Python | Apache-2.0 |
| AnnotateAI | Annotation and evaluation framework for LLM datasets | Python | Apache-2.0 |
| Tool | Description | Language | License |
|---|---|---|---|
| Hugging Face Hub | Client library for Hugging Face Hub | Python | Apache-2.0 |
| MLflow | Platform for ML lifecycle management | Python | Apache-2.0 |
| Weights & Biases | Developer tools for ML | Python | MIT |
| DVC | Data version control for ML projects | Python | Apache-2.0 |
| Comet ML | Experiment tracking and visualization for ML/LLM workflows | Python | MIT |
| ClearML | End-to-end MLOps platform with LLM support | Python | Apache-2.0 |
| Tool | Description | Language | License |
|---|---|---|---|
| Firecrawl | AI-powered web crawler that extracts and structures content for LLM pipelines | TypeScript | MIT |
| Scrapy | Fast, high-level web crawling & scraping framework | Python | BSD-3 |
| Playwright | Web automation & scraping with headless browsers | TypeScript/Python/Java/.NET | Apache-2.0 |
| BeautifulSoup | Easy HTML/XML parsing for quick scraping tasks | Python | MIT |
| Selenium | Browser automation framework (supports scraping) | Multiple | Apache-2.0 |
| Apify SDK | Web scraping & automation platform SDK | Python/JavaScript | Apache-2.0 |
| Newspaper3k | News & article extraction library | Python | MIT |
| Data Prep Kit | Toolkit for cleaning, transforming, and preparing datasets for LLMs | Python | Apache-2.0 |
| ScrapeGraphAI | Use LLMs to extract structured data from websites and documents | Python | MIT |
| Crawlee | Web scraping and crawling framework for large-scale data collection | TypeScript | Apache-2.0 |
| Tool | Description | Language | License |
|---|---|---|---|
| Promptify | Prompt engineering toolkit for NLP/LLM tasks | Python | Apache-2.0 |
| PromptSource | Toolkit for creating, sharing, and managing prompts | Python | Apache-2.0 |
| Promptimizer | Microsoft toolkit for optimizing prompts via evaluation | Python | MIT |
| Py-Priompt | Library for prioritizing and optimizing LLM prompts | Python | MIT |
| Selective Context | Context selection and compression for efficient prompting | Python | MIT |
| LLMLingua | Prompt compression via token selection and ranking | Python | MIT |
| betterprompt | Prompt experimentation & optimization framework | Python | Apache-2.0 |
| PCToolkit | Toolkit for prompt compression and efficiency | Python | Apache-2.0 |
| Tool | Description | Language | License |
|---|---|---|---|
| Instructor | Structured LLM outputs with Pydantic schema validation | Python | MIT |
| XGrammar | Grammar-based constrained generation for LLMs | Python | Apache-2.0 |
| Outlines | Controlled generation with regex, CFGs, and schemas | Python | MIT |
| Guidance | Programmatic control of LLM outputs with constraints | Python | MIT |
| LMQL | Query language for structured interaction with LLMs | Python | Apache-2.0 |
| Jsonformer | Efficient constrained decoding for valid JSON outputs | Python | MIT |
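
The tools above either constrain decoding so that output is valid by construction, or validate output after generation. As a minimal sketch of the second idea, the example below checks a raw response against a small schema using only the standard library. The `Invoice` type, the `parse_invoice` helper, and the sample response string are all hypothetical, and none of this reflects the API of the listed libraries.

```python
import json
from dataclasses import dataclass


@dataclass
class Invoice:
    vendor: str
    total: float


def parse_invoice(raw_output: str) -> Invoice:
    """Parse a model response into an Invoice, raising ValueError if it does not fit the schema."""
    data = json.loads(raw_output)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    vendor = data.get("vendor")
    total = data.get("total")
    if not isinstance(vendor, str):
        raise ValueError("'vendor' must be a string")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(total, bool) or not isinstance(total, (int, float)):
        raise ValueError("'total' must be a number")
    return Invoice(vendor=vendor, total=float(total))


# Hypothetical model response used purely for illustration.
invoice = parse_invoice('{"vendor": "Acme", "total": 41.5}')
print(invoice)
```

In practice a caller would catch the `ValueError` and retry the request or surface the error, which is the loop that validation-oriented libraries automate.
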
| Framework | Description | Language | License |
|---|---|---|---|
| AutoGen | Multi-agent conversation framework | Python | CC-BY-4.0 |
| CrewAI | Framework for orchestrating role-playing autonomous AI agents | Python | MIT |
| LangGraph | Build resilient language agents as graphs | Python | MIT |
| AgentOps | Python SDK for AI agent monitoring, LLM cost tracking, benchmarking | Python | MIT |
| Swarm | Educational framework for exploring ergonomic, lightweight multi-agent orchestration | Python | MIT |
| Agency Swarm | An open-source agent framework designed to automate your workflows | Python | MIT |
| Multi-Agent Systems | Research into multi-agent systems and applications | Python | MIT |
| Auto-GPT | Autonomous AI agent for task execution using GPT models | Python | MIT |
| BabyAGI | Task-driven autonomous agent inspired by AGI | Python | MIT |
| SuperAGI | Infrastructure for building and managing autonomous agents | Python | MIT |
| Phidata | Build AI agents with memory, tools, and knowledge | Python | MIT |
| MemGPT | Self-improving agents with infinite context via memory management | Python | MIT |
| Griptape | Framework for building AI agents with structured pipelines and memory | Python | Apache-2.0 |
| mem0 | AI memory framework for storing & retrieving agent context across sessions | Python | MIT |
| Memoripy | Lightweight persistent memory library for LLMs and agents | Python | MIT |
| Memobase | Database-like persistent memory for conversational agents | Python | MIT |
| Letta (MemGPT) | Long-term memory management for LLM agents | Python | MIT |
| Agno | Framework for building AI agents with RAG, workflows, and memory | Python | Apache-2.0 |
| Agents SDK | SDK from Vercel for building agentic workflows and applications | TypeScript | Apache-2.0 |
| Smolagents | Lightweight agent framework from Hugging Face | Python | Apache-2.0 |
| Pydantic AI | Agent framework built on Pydantic for structured reasoning | Python | MIT |
| CAMEL | Multi-agent framework enabling role-play and collaboration | Python | Apache-2.0 |
| BeeAI | LLM agent framework for AI-driven workflows and automation | Python | Apache-2.0 |
| gradio-tools | Integrate external tools into agents via Gradio apps | Python | Apache-2.0 |
| Composio | Tool orchestration framework to connect 100+ APIs for agents | Python | Apache-2.0 |
| Atomic Agents | Modular agent framework with tool usage and reasoning | Python | Apache-2.0 |
| Memary | Memory-augmented agent framework for persistent context | Python | MIT |
| Browser Use | Framework for browser automation with AI agents | Python | Apache-2.0 |
| OpenWebAgent | Agents for interacting with and extracting from the web | Python | Apache-2.0 |
| Lagent | Lightweight agent framework from InternLM | Python | Apache-2.0 |
| LazyLLM | Agent framework for lazy evaluation and efficient execution | Python | Apache-2.0 |
| Swarms | Enterprise agent orchestration framework | Python | MIT |
| ChatArena | Multi-agent simulation platform for research and evaluation | Python | Apache-2.0 |
| AgentStack | Agent orchestration framework (different from Agency Swarm) | Python | Apache-2.0 |
| Archgw | Agent runtime for structured workflows and graph execution | Python | Apache-2.0 |
| Flow | Low-code agent workflow framework for LLMs | Python | Apache-2.0 |
| Langroid | Framework for building multi-agent conversational systems | Python | Apache-2.0 |
| Agentarium | Platform for creating multi-agent environments | Python | Apache-2.0 |
| Upsonic | Agent framework focused on context management and tool use | Python | Apache-2.0 |
| Tool | Description | Language | License |
|---|---|---|---|
| PyTorch Lightning | High-level PyTorch interface for LLMs | Python | Apache-2.0 |
| unsloth | Fine-tune LLMs faster with less memory | Python | Apache-2.0 |
| Axolotl | Post-training pipeline for AI models | Python | Apache-2.0 |
| LLaMA-Factory | Easy & efficient LLM fine-tuning | Python | Apache-2.0 |
| PEFT | Parameter-Efficient Fine-Tuning library | Python | Apache-2.0 |
| DeepSpeed | Distributed training & inference optimization | Python | MIT |
| TRL | Train transformer LMs with reinforcement learning | Python | Apache-2.0 |
| Transformers | Pretrained models for text, vision, and audio tasks | Python | Apache-2.0 |
| LitGPT | Train and fine-tune LLMs lightning fast | Python | Apache-2.0 |
| Mergoo | Merge multiple LLM experts efficiently | Python | Apache-2.0 |
| Ludwig | Low-code framework for custom LLMs | Python | Apache-2.0 |
| txtinstruct | Framework for training instruction-tuned models | Python | Apache-2.0 |
| xTuring | Fast fine-tuning of open-source LLMs | Python | Apache-2.0 |
| RL4LMs | RL library to fine-tune LMs to human preferences | Python | Apache-2.0 |
| torchtune | PyTorch-native library for fine-tuning LLMs | Python | BSD-3 |
| Accelerate | Library to easily train on multiple GPUs/TPUs with mixed precision | Python | Apache-2.0 |
| BitsandBytes | 8-bit optimizers and quantization for efficient LLM training | Python | MIT |
| Lamini | Python SDK for building and fine-tuning LLMs with Lamini API | Python | Apache-2.0 |
| Tool | Description | Language | License |
|---|---|---|---|
| LLM Compressor | Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment | Python | Apache-2.0 |
| LightLLM | Lightweight Python-based LLM inference and serving framework with easy scalability and high performance | Python | Apache-2.0 |
| vLLM | High-throughput and memory-efficient inference and serving engine for LLMs | Python | Apache-2.0 |
| torchchat | Run PyTorch LLMs locally on servers, desktop, and mobile | Python | MIT |
| TensorRT-LLM | NVIDIA library for optimizing LLM inference with TensorRT | C++/Python | Apache-2.0 |
| WebLLM | High-performance in-browser LLM inference engine | TypeScript/Python | Apache-2.0 |
| Tool | Description | Language | License |
|---|---|---|---|
| JailbreakEval | Automated evaluators for assessing jailbreak attempts | Python | MIT |
| EasyJailbreak | Easy-to-use Python framework to generate adversarial jailbreak prompts | Python | Apache-2.0 |
| Guardrails | Add guardrails to large language models | Python | MIT |
| LLM Guard | Security toolkit for LLM interactions | Python | Apache-2.0 |
| AuditNLG | Reduce risks in generative AI systems for language | Python | MIT |
| NeMo Guardrails | Toolkit for adding programmable guardrails to LLM conversational systems | Python | Apache-2.0 |
| Garak | LLM vulnerability scanner | Python | MIT |
| DeepTeam | LLM red teaming framework | Python | Apache-2.0 |
| MarkLLM | Watermarking toolkit for LLM outputs | Python | Apache-2.0 |
| LLMSanitize | Security toolkit for sanitizing LLM inputs/outputs | Python | MIT |
| Tool | Description | Language | License |
|---|---|---|---|
| Reflex | Build full-stack web apps powered by LLMs with Python-only workflows and reactive UIs | Python | Apache-2.0 |
| Gradio | Create quick, interactive UIs for LLM demos and prototypes | Python | Apache-2.0 |
| Streamlit | Build and share AI/ML apps fast with Python scripts and interactive widgets | Python | Apache-2.0 |
| Taipy | End-to-end Python framework for building production-ready AI apps with dashboards and pipelines | Python | Apache-2.0 |
| AI SDK UI | Vercel’s AI SDK for building chat & generative UIs | TypeScript | Apache-2.0 |
| Simpleaichat | Minimal Python interface for prototyping conversational LLMs | Python | MIT |
| Chainlit | Framework for building and debugging LLM apps with a rich UI | Python | Apache-2.0 |
| Tool | Description | Language | License |
|---|---|---|---|
| Ollama | Get up and running with large language models locally | Go | MIT |
| LM Studio | Desktop app for running local LLMs | - | Commercial |
| GPT4All | Open-source chatbot ecosystem | C++ | MIT |
| LocalAI | Self-hosted OpenAI-compatible API | Go | MIT |
| LiteLLM | Lightweight OpenAI-compatible gateway for multiple LLM providers | Python | MIT |
| AI Gateway | Gateway for managing LLM requests, caching, and routing | Python | Apache-2.0 |
| Langcorn | Serve LangChain applications via FastAPI with production-ready endpoints | Python | MIT |
| LitServe | High-speed GPU inference server with autoscaling and batch support | Python | Apache-2.0 |
| Tool | Description | Language | License |
|---|---|---|---|
| DataDreamer | Framework for creating synthetic datasets to train & evaluate LLMs | Python | Apache-2.0 |
| fabricator | Data generation toolkit for crafting synthetic training data | Python | MIT |
| Promptwright | Toolkit for prompt engineering, evaluation, and dataset curation | Python | Apache-2.0 |
| EasyInstruct | Instruction data generation framework for large-scale LLM training | Python | Apache-2.0 |
| Text Machina | Dataset generation framework for robust AI training | Python | Apache-2.0 |
| Platform | Description | Pricing | Features |
|---|---|---|---|
| Clarifai | Lightning-fast compute for AI models & agents | Free tier + Pay-as-you-go | Pre-trained models, Custom model deployment on dedicated compute, Model training, Workflow automation |
| Modal | Serverless platform for AI/ML workloads | Pay-per-use | Serverless GPU, Auto-scaling |
| Replicate | Run open-source models with a cloud API | Pay-per-use | Pre-built models, Custom training |
| Together AI | Cloud platform for open-source models | Various | Open models, Fine-tuning |
| Anyscale | Ray-based platform for AI applications | Enterprise | Distributed training, Serving |
| RouteLLM | Dynamic router for selecting best LLMs based on cost & performance | Open-source | Cost optimization, Multi-LLM routing |
We welcome contributions! This toolkit grows stronger with community input.
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-tool`)
- Add your contribution (new tool, template, or tutorial)
- Submit a pull request
- Quality over quantity - Focus on tools and resources that provide real value
- Production-ready - Include tools that work in real-world scenarios
- Well-documented - Provide clear descriptions and usage examples
- Up-to-date - Ensure tools are actively maintained
Built with ❤️ for the AI Engineering community
Star ⭐ this repo if you find it helpful!
