A local-first LLM agent with RAG-based knowledge base, tool execution, and intelligent fallback to external providers.
- 📁 Project-Aware CLI: Defaults file tools and local data to the current working directory (override with `--project-root` or `PROJECT_ROOT`)
- 🧾 Code Mode File Writes: Code mode outputs `FILE:` blocks that the CLI applies directly to disk
- 📄 Cross-Mode File Writes: File-create requests in other modes use the code-mode prompt and apply `FILE:` blocks automatically
- ⚙️ Agentic Auto Mode: Agentic runs always execute in auto mode without approval prompts (no Shift+Tab interruption)
- 📥 /ingest Command: Ingest a file, directory, or glob of files into the KB from chat
- 🧩 Ingest Dedup Fix: Avoids duplicate IDs when ingesting identical chunks in a single batch (see the sketch after this list)
- 🤖 Agentic Mode: Compact, persistent control loop with one-tool-per-step execution
- 📁 Agent State on Disk: `.agent/` state, logs, and scratch outputs for long-running tasks
- 🧰 Minimal Tool Set: Added `search` and `write_file` tools for grounded workflows
- ✅ Dual Execution Modes: Per-step approvals or auto-run until completion
- ⚡ Agentic Health Summary: Fast, deterministic summary for common system health checks
- 📦 Codebase Summary: Deterministic overview when reading `README.md`/`AGENTS.md`
- 🔎 Smarter Search: `search` now supports file globs like `*.py` and ignores `.agent/` and `.git/`
- 🧾 JSON Retry Guard: Agentic loop retries once if model output isn't valid JSON
- 🧯 No-Match Guard: Stops repeated empty searches and suggests a real review path
- 📚 Codebase Bootstrap: Forces README/AGENTS/CLAUDE read before LLM for repo questions
- ☕ Multi-Goal Runs: Chain tasks with “then/and then/after that” and auto-advance goals
- ✂️ Comma Chaining: Split multi-goal prompts on commas for fire-and-forget tasks
- 🧰 Tool Arg Validation: Rejects malformed tool calls and retries
- 🧪 Review Bootstrap: Starts code-review goals with a TODO/FIXME/BUG scan
- 🧪 Pytest Bootstrap: Runs pytest once for test-related goals with a summary
- 🔊 TTS Tool + Service: Adds a Qwen3-TTS service and `tts` tool (service-first with local fallback)
- 🔈 TTS Bootstrap: Detects TTS requests, generates audio, and tracks last output for playback
- 🔉 TTS Playback Fallbacks: Remembers last audio path across runs and plays via available system tools (non-blocking)
- ⏱️ Agentic Timing: Shows per-step tool/runtime durations in the CLI
- 🎯 10 Agent Modes: Expanded mode set including agentic workflows
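The ingest dedup fix mentioned above can be pictured as a content-hash ID scheme. This is a hypothetical sketch, not the project's actual implementation; `chunk_id` and `dedup_chunks` are illustrative names:

```python
import hashlib

def chunk_id(source: str, text: str) -> str:
    # Derive a stable ID from source + content so identical chunks in one
    # batch map to the same ID instead of colliding in the vector store.
    digest = hashlib.sha256(f"{source}\n{text}".encode()).hexdigest()[:16]
    return f"{source}:{digest}"

def dedup_chunks(source: str, chunks: list[str]) -> tuple[list[str], list[str]]:
    seen: set[str] = set()
    ids, texts = [], []
    for text in chunks:
        cid = chunk_id(source, text)
        if cid in seen:  # identical chunk already queued in this batch
            continue
        seen.add(cid)
        ids.append(cid)
        texts.append(text)
    return ids, texts
```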
- Local-First: Prioritizes local Ollama models, only falling back to external providers when needed
- RAG Knowledge Base: ChromaDB-powered vector store with document ingestion and retrieval
- Intelligent Routing: LangGraph state machine routes queries to retrieval, tools, web search, or direct generation
- 10 Agent Modes: Specialized modes (chat, plan, agentic, ask, execute, code, image, research, debug, creative)
- Agentic Loop: Persistent, capped control loop with compact state and strict JSON actions (see the sketch after this list)
- On-Disk Agent State: `.agent/` logs and summaries keep prompts tiny
- Web Research: DuckDuckGo search with page content crawling and source synthesis
- Document Grading: LLM-based relevance grading with automatic query rewriting
- Multi-Provider Fallback: Automatic fallback chain (Ollama → Claude → GPT-4o → Gemini → Grok)
- Knowledge Base Updates: Automatically extracts and stores facts from external provider responses
- Sandboxed Tool Execution: Safe bash command execution with validation and approval
- TTS Integration: Qwen3-TTS service with agent tool wrapper (service-first, local fallback)
- Context Window Management: Tracks token usage and dynamically adjusts the context window size
- Runtime Model Switching: Switch between Ollama models at runtime with interactive selector (applies to planner/agentic helpers)
- Persistent State: Remembers last used model and mode between sessions
- Tab Completion: Auto-complete commands, modes, and models
- Conversation Memory: Agent remembers previous exchanges within a session
- Intelligent Suggestions: Provides contextual follow-up suggestions after each response
- Natural Language Tools: LLM interprets queries like "check disk space" and generates appropriate commands
- Multiple Interfaces: Interactive CLI and REST API with real-time status display
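The agentic loop's contract (one tool per step, strict JSON actions, one retry on invalid JSON) can be summarized in a few lines. This is a hedged sketch under assumed names (`llm`, `tools`, and `run_agentic_step` are hypothetical); the real loop lives in `src/agent/`:

```python
import json

def run_agentic_step(llm, tools, state: dict) -> dict:
    prompt = f"State: {json.dumps(state)}\nReply with exactly one JSON action."
    for _ in range(2):  # JSON retry guard: one retry on invalid output
        raw = llm(prompt)
        try:
            action = json.loads(raw)
            break
        except json.JSONDecodeError:
            prompt += "\nYour last reply was not valid JSON. Reply with JSON only."
    else:
        raise RuntimeError("model failed to produce valid JSON after a retry")
    tool = tools[action["tool"]]  # one tool per step, e.g. "search" or "write_file"
    state["last_result"] = tool(**action.get("args", {}))
    return state
```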
User Query
│
▼
┌─────────┐ ┌───────────┐ ┌─────────┐
│ Router │────▶│ Retriever │────▶│ Grader │
└─────────┘ └───────────┘ └─────────┘
│ │
│ [tool needed] ┌────────┴────────┐
▼ ▼ ▼
┌──────────┐ [docs relevant] [docs irrelevant]
│Tool Exec │ │ │
└──────────┘ ▼ ▼
│ ┌────────────┐ ┌───────────┐
│ │ Local Gen │ │ Rewrite │──┐
│ └────────────┘ └───────────┘ │
│ │ ▲ │
│ │ └────────┘
│ │ [max retries exceeded]
│ │ │
│ │ ▼
│ │ ┌────────────────┐
│ │ │External Fallback│
│ │ └────────────────┘
│ │ │
│ │ ▼
│ │ ┌────────────────┐
│ │ │ KB Updater │
│ │ └────────────────┘
│ │ │
└────────────────────┴────────────────────┘
│
▼
Response
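In code, the diagram corresponds roughly to a LangGraph state machine. The wiring below is a hedged sketch simplified to the branches shown above (the node functions and the `route_decision`/`grade_decision` callables are assumptions); the project's real graph is in `src/agent/graph.py`:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, END

class AgentState(TypedDict, total=False):
    query: str
    documents: list
    answer: str

def build_graph(nodes, route_decision, grade_decision):
    g = StateGraph(AgentState)
    for name in ("router", "retriever", "grader", "tool_exec", "local_gen",
                 "rewrite", "external_fallback", "kb_updater"):
        g.add_node(name, nodes[name])
    g.set_entry_point("router")
    g.add_conditional_edges("router", route_decision,
                            {"tool": "tool_exec", "retrieve": "retriever"})
    g.add_edge("retriever", "grader")
    g.add_conditional_edges("grader", grade_decision,
                            {"relevant": "local_gen", "rewrite": "rewrite",
                             "fallback": "external_fallback"})
    g.add_edge("rewrite", "retriever")           # rewritten query retries retrieval
    g.add_edge("external_fallback", "kb_updater")
    for terminal in ("tool_exec", "local_gen", "kb_updater"):
        g.add_edge(terminal, END)
    return g.compile()
```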
The agent supports 10 specialized modes optimized for different tasks:
| Mode | Purpose | Routing Bias | Temperature | Use Case |
|---|---|---|---|---|
| chat | General conversation | Balanced | 0.7 | Default mode for mixed tasks |
| plan | Multi-step planning | generate | 0.3 | Breaking down complex tasks |
| agentic | Agentic loop control | Balanced | 0.2 | Multi-step tool orchestration |
| ask | Knowledge retrieval | retrieve | 0.5 | Querying the knowledge base |
| execute | Tool/bash execution | tool | 0.3 | Running system commands |
| code | Programming assistance | generate | 0.3 | Code generation and review |
| image | Image generation | image | 0.7 | Stable Diffusion prompts |
| research | Web search | web | 0.5 | Research with web crawling |
| debug | Verbose tracing | any | 0.5 | Debugging routing decisions |
| creative | Uncensored generation | generate | 0.9 | Creative and unrestricted output |
Switch modes with `Shift+Tab` or `/mode <name>`.
- Python 3.11+
- Ollama (for local LLM)
- Optional: `ddgs` package for web research (`pip install ddgs`)
```bash
cd /home/tyrel/projects/llm
pip install -e .

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull required models
ollama pull mistral:7b
ollama pull nomic-embed-text

# Start Ollama (usually starts automatically)
ollama serve
```

Check that everything is wired up, then start chatting:

```bash
python scripts/run.py check
python scripts/run.py chat
```

By default, the CLI treats the current working directory as the project root for file tools and local data storage. Override with `--project-root` or `PROJECT_ROOT`.
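For example, to point file tools and local data at another checkout (the path here is illustrative):

```bash
python scripts/run.py chat --project-root ~/projects/other-repo
```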
Commands in chat:

- `/model` - Show current model
- `/model <name>` - Switch to a different model (e.g., `/model deepseek-r1:7b`)
- `/models` - Interactive model selector with numbered menu
- `/mode` - Show current mode
- `/mode <name>` - Switch to a specific mode (chat, plan, agentic, ask, execute, code, image, research, debug, creative)
- `/modes` - List all available modes
- `/context` - Show current context window size
- `/context <size>` - Set context window (e.g., `/context 16384`)
- `/plan [task]` - Enter planning mode for multi-step tasks
- `/ingest <path>` - Ingest a file or directory into the knowledge base
- `/clear` - Clear conversation history
- `/stats` - Show knowledge base, model, and context stats
- `/help` - Show help
- `/quit` - Exit
- `Shift+Tab` - Cycle between modes
- `!<command>` - Execute a shell command directly (e.g., `!ls -la`)
Code mode supports direct file writes: responses formatted with `FILE: <path>` blocks are applied to disk automatically.
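For illustration, applying such blocks could look like the sketch below. This is an assumption about the mechanics, not the CLI's actual code; the helper name `apply_file_blocks` and the exact delimiter handling are hypothetical:

```python
import re
from pathlib import Path

def apply_file_blocks(response: str, root: Path) -> list[Path]:
    # Split the response on lines beginning with "FILE:" and write each
    # block's body to the named path under the project root.
    written = []
    for block in re.split(r"^FILE:\s*", response, flags=re.MULTILINE)[1:]:
        header, _, body = block.partition("\n")
        target = root / header.strip()
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(body)
        written.append(target)
    return written
```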
The CLI shows:
- Current model, mode, and context window on welcome screen
- Persistent state (restores last model/mode on startup)
- Real-time elapsed time while processing
- Processing steps (routing, retrieving, grading, etc.)
- Token usage per query with input/output breakdown
- Context window utilization percentage
- Grounding status: `Grounded (KB)`, `Grounded (Tools)`, `Local`, `External`, or `Shell`
- Follow-up suggestions after each response
```bash
python scripts/run.py query "What is machine learning?"

# Use a specific model
python scripts/run.py query "Explain quantum computing" --model deepseek-r1:7b
```

Start the REST API server:

```bash
python scripts/run.py serve
```

API endpoints are available at http://localhost:8000/docs:
| Endpoint | Method | Description |
|---|---|---|
| `/api/v1/query` | POST | Query the agent (supports optional `model` field) |
| `/api/v1/model` | GET | Get current model |
| `/api/v1/model` | POST | Switch model |
| `/api/v1/models` | GET | List available Ollama models |
| `/api/v1/ingest/text` | POST | Ingest raw text |
| `/api/v1/ingest/file` | POST | Ingest a file |
| `/api/v1/ingest/directory` | POST | Ingest a directory |
| `/api/v1/kb/stats` | GET | Get KB statistics |
| `/api/v1/kb/search` | POST | Search the KB |
| `/api/v1/kb/clear` | DELETE | Clear the KB |
| `/api/v1/history/clear` | POST | Clear chat history |
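As a quick smoke test, you can call the query endpoint from Python. The payload shape below is inferred from the table above (`model` is optional) and should be confirmed against the interactive docs at `/docs`:

```python
import requests

resp = requests.post(
    "http://localhost:8000/api/v1/query",
    json={"query": "What is machine learning?", "model": "mistral:7b"},
    timeout=120,
)
resp.raise_for_status()
print(resp.json())
```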
```bash
# Ingest a directory (recursive by default)
python scripts/ingest.py directory ./docs

# Ingest a single file
python scripts/ingest.py file ./document.pdf

# Ingest raw text
python scripts/ingest.py text "Some content to remember" --source "manual-input"

# View KB statistics
python scripts/ingest.py stats

# Clear the knowledge base
python scripts/ingest.py clear --yes
```

Supported file types:
- Documents: `.md`, `.txt`, `.pdf`
- Code: `.py`, `.js`, `.ts`, `.java`, `.go`, `.rs`, `.c`, `.cpp`, `.rb`, `.php`, `.swift`, `.kt`, `.scala`, `.sh`
- Config: `.yaml`, `.yml`, `.json`, `.toml`, `.ini`, `.xml`
- Web: `.html`, `.css`
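The same ingestion operations are exposed over the REST API when the server is running. The JSON field names here are assumptions to verify against `/docs`:

```python
import requests

resp = requests.post(
    "http://localhost:8000/api/v1/ingest/text",
    json={"text": "Some content to remember", "source": "manual-input"},
    timeout=60,
)
resp.raise_for_status()
print(resp.json())
```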
Copy `.env.example` to `.env` and configure:

```bash
cp .env.example .env
```

Key settings:
| Variable | Default | Description |
|---|---|---|
| `PROJECT_ROOT` | repo root (CLI defaults to cwd) | Project root for file tools and local data |
| `OLLAMA_BASE_URL` | `http://localhost:11434` | Ollama server URL |
| `OLLAMA_MODEL` | `mistral:7b` | Local LLM model |
| `OLLAMA_EMBEDDING_MODEL` | `nomic-embed-text` | Embedding model |
| `RETRIEVER_K` | `4` | Number of documents to retrieve |
| `CHUNK_SIZE` | `1000` | Document chunk size |
| `RELEVANCE_THRESHOLD` | `0.7` | Minimum relevance score |
| `FALLBACK_ENABLED` | `true` | Enable external fallback |
| `BASH_REQUIRE_APPROVAL` | `true` | Require approval for dangerous commands |
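For example, a `.env` that switches the local model, retrieves more documents, and disables external fallback might look like this (values are illustrative):

```bash
OLLAMA_MODEL=deepseek-r1:7b
RETRIEVER_K=6
FALLBACK_ENABLED=false
```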
For fallback to external providers, add API keys to `.env`:

```bash
ANTHROPIC_API_KEY=your-anthropic-key
OPENAI_API_KEY=your-openai-key
GOOGLE_API_KEY=your-google-key
XAI_API_KEY=your-xai-key
```

Fallback priority: Local Ollama → Claude → GPT-4o → Gemini → Grok
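Conceptually, the selector walks this chain in order and stops at the first provider that answers. Below is a minimal sketch using LiteLLM; the model identifiers are illustrative, and the real chain comes from `config/providers.yaml` and `src/llm/selector.py`:

```python
import litellm

# Illustrative chain; actual models are configured in providers.yaml.
CHAIN = [
    "ollama/mistral:7b",  # local first
    "anthropic/claude-3-5-sonnet-20241022",
    "openai/gpt-4o",
    "gemini/gemini-1.5-pro",
    "xai/grok-2",
]

def complete(messages: list[dict]) -> str:
    for model in CHAIN:
        try:
            resp = litellm.completion(model=model, messages=messages)
            return resp.choices[0].message.content
        except Exception:
            continue  # provider unavailable: fall through to the next one
    raise RuntimeError("all providers in the fallback chain failed")
```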
```
llm/
├── config/
│ ├── settings.py # Pydantic settings
│ └── providers.yaml # LiteLLM provider config
├── src/
│ ├── agent/
│ │ ├── graph.py # LangGraph state machine
│ │ ├── nodes.py # Workflow node implementations
│ │ └── state.py # Agent state schema
│ ├── llm/
│ │ ├── local.py # Ollama integration
│ │ └── selector.py # Local-first provider selection
│ ├── knowledge/
│ │ ├── vectorstore.py # ChromaDB operations
│ │ ├── retriever.py # RAG retrieval logic
│ │ ├── grader.py # Document relevance grading
│ │ └── updater.py # KB update from external sources
│ ├── ingestion/
│ │ ├── pipeline.py # Document ingestion orchestrator
│ │ └── loaders.py # File type loaders
│ ├── tools/
│ │ ├── registry.py # Tool registration
│ │ └── bash.py # Sandboxed bash execution
│ └── api/
│ ├── main.py # FastAPI application
│ └── routes.py # API endpoints
├── scripts/
│ ├── ingest.py # CLI for document ingestion
│ └── run.py # Main entry point
└── data/
├── documents/ # Source documents
    └── chroma_db/           # Vector store
```
| Component | Technology |
|---|---|
| Agent Framework | LangGraph |
| Local LLM | Ollama (mistral:7b) |
| Vector Database | ChromaDB |
| Multi-Provider | LiteLLM |
| Embeddings | nomic-embed-text |
| API Framework | FastAPI |
| CLI Framework | Typer + Rich |
```bash
# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Lint and format
ruff check .
ruff format .

# Type check
mypy src
```

License: MIT