fleet-rlm is a web workspace for running recursive language-model tasks on top of DSPy and Daytona sandboxes. You chat with a ReAct agent in the browser; when a task is larger than a single context window, the agent delegates pieces to isolated sub-sandboxes, each running a bounded dspy.RLM per arXiv 2512.24601v2.
Who it's for. DSPy users who want a UI-driven workspace for long-context tasks, recursive decomposition, and sandboxed code execution — without hand-rolling the transport, persistence, and sandbox plumbing.
What it removes. Writing your own WebSocket transport, session persistence, Daytona sandbox lifecycle, execution-trace UI, and recursive-delegation policy around a DSPy program. fleet-rlm ships all of that behind a single uv run fleet web.
Try it in 30 seconds. See Quick Start below.
Docs · Contributing · Changelog · arXiv paper
Solo-maintained by @Zochory. External contributions welcome — see CONTRIBUTING.md. No SLA; issues are reviewed as capacity allows.
Two layers, both dspy.*, both real:
- **Chat surface** — `dspy.ReAct` for interactive turn-taking. Lives at `src/fleet_rlm/runtime/agent/agent.py` as `FleetAgent`.
- **Recursive engine** — `dspy.RLM` running inside a child Daytona sandbox. Built in `src/fleet_rlm/runtime/models/builders.py`; the recursive sub-query variant is `build_recursive_subquery_rlm()`. Implements Algorithm 1 from arXiv 2512.24601v2: inputs stored as REPL variables, sub-queries bounded by `max_iterations` and `max_llm_calls`.
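The bounded-recursion idea behind Algorithm 1 can be illustrated with a toy, dependency-free sketch — no real DSPy, sandbox, or LLM here; the splitting rule and the "leaf answer" are placeholders, and only the budget-bounded recursion shape mirrors the description above:

```python
# Toy sketch of bounded recursive decomposition: a task is split until it
# "fits one context window", and every simulated LLM call draws from one
# shared budget for the whole tree. All names and rules are illustrative.

class Budget:
    def __init__(self, max_llm_calls: int):
        self.remaining = max_llm_calls

    def spend(self) -> bool:
        if self.remaining <= 0:
            return False
        self.remaining -= 1
        return True

def solve(task: str, budget: Budget, fits_in_context: int = 8) -> str:
    if not budget.spend():
        return "<budget exhausted>"
    if len(task) <= fits_in_context:   # stand-in for "fits one context window"
        return task.upper()            # stand-in for a leaf LLM answer
    mid = len(task) // 2
    left = solve(task[:mid], budget)   # recursive sub-query
    right = solve(task[mid:], budget)
    return left + right

budget = Budget(max_llm_calls=16)
print(solve("decompose this long task", budget))  # → DECOMPOSE THIS LONG TASK
```

With an exhausted budget the recursion degrades gracefully, returning the sentinel instead of recursing further — the same failure mode the real tool reports as an error.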
The chat agent does not directly hand a task to a child RLM. Delegation is mediated by a specific ReAct tool, delegate_to_rlm, registered the same way as any other tool in the agent's tool registry:
```text
User prompt
↓
FleetAgent (dspy.ReAct, host LLM)
│ decides the task exceeds one context and picks the tool:
↓
delegate_to_rlm(query, context="", document_url="")
│ — src/fleet_rlm/runtime/tools/rlm_delegate.py
│ — reads the active Daytona interpreter from a ContextVar
│ — checks remaining LLM-call budget; returns error if exhausted
│ — interpreter.build_delegate_child() ← isolated child Daytona sandbox
│ — optionally fetches document_url into the child's context
↓
build_recursive_subquery_rlm(
    interpreter=child,
    max_iterations=min(child.rlm_max_iterations, remaining_budget),
    max_llm_calls=remaining_budget,
)
│ constructs the dspy.RLM bound to the child sandbox
↓
rlm(prompt=query, context=...)
│ child RLM runs REPL-variable-mode: may call llm_query(),
│ sub_rlm(), sub_rlm_batched() to recurse further inside its sandbox
↓
{"status": "ok", "answer": "..."} ← bubbles back into the ReAct trace
```
Two entry points exist, and they share one budget:
- `delegate_to_rlm()` — from the host ReAct agent's tool registry (above).
- `sub_rlm()` / `sub_rlm_batched()` — from Python code already running inside a `dspy.RLM` sandbox, reaching back out through the Daytona bridge to spawn a further child.
Both go through `DaytonaInterpreter.build_delegate_child()` so child creation follows one backend-owned policy (default: `RLM_CHILD_ISOLATION_MODE=auto` — fork the parent sandbox if no durable volume is mounted, otherwise create a clean child with a child-specific `volume_subpath`). `rlm_max_llm_calls` is a single shared semantic-call budget across the entire recursive tree; `sub_rlm_batched()` caps sibling parallelism at 4.
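The shared-budget-plus-capped-parallelism policy can be sketched with stdlib primitives — a thread-safe counter for the tree-wide budget and a worker cap of 4 for siblings. Nothing here touches DSPy or Daytona; names mirror the text but are placeholders:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Illustrative sketch: one semantic-call budget shared across a recursive
# tree, with sibling parallelism capped at 4 (as sub_rlm_batched does).

class SharedBudget:
    def __init__(self, max_llm_calls: int):
        self._lock = threading.Lock()
        self.remaining = max_llm_calls

    def try_spend(self) -> bool:
        with self._lock:               # siblings race for the same budget
            if self.remaining <= 0:
                return False
            self.remaining -= 1
            return True

def sub_rlm(query: str, budget: SharedBudget) -> str:
    if not budget.try_spend():
        return "<budget exhausted>"
    return f"ok:{query}"               # stand-in for a child RLM call

def sub_rlm_batched(queries: list[str], budget: SharedBudget) -> list[str]:
    # Cap sibling parallelism at 4 regardless of batch size.
    with ThreadPoolExecutor(max_workers=min(4, len(queries))) as pool:
        return list(pool.map(lambda q: sub_rlm(q, budget), queries))

budget = SharedBudget(max_llm_calls=3)
print(sub_rlm_batched(["a", "b", "c", "d", "e"], budget))
```

Because the budget is shared, exactly three of the five siblings succeed here (which three depends on scheduling), and further recursion anywhere in the tree would find the budget empty.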
Full details, including the local-workspace-snapshot fallback when a parent turn has no repo_url to recreate in the child, live in docs/architecture.md.
Add fleet-rlm to a uv-managed project and launch the Web UI:
```shell
# Create a project if you do not already have one
uv init

# Add fleet-rlm to the environment
uv add fleet-rlm

# Start the Web UI + API server
uv run fleet web
```

Open http://127.0.0.1:8000.

If you already have a uv project, skip `uv init` and just run `uv add fleet-rlm`.
Published installs already include built frontend assets, so end users do not need pnpm, Vite, or a separate frontend build step.
```shell
uv run fleet web
```

This starts the main product surface with:

- **Workbench** for adaptive chat and runtime execution
- **Volumes** for runtime-backed file browsing
- **Optimization** for DSPy evaluation and optimization workflows
- **Settings** for runtime configuration and diagnostics
```shell
uv run fleet-rlm chat --trace-mode compact
uv run fleet-rlm serve-api --host 127.0.0.1 --port 8000
```

fleet-rlm exposes a Daytona-only runtime contract:

- `execution_mode` remains a per-turn execution hint.
- Requests may include `repo_url`, `repo_ref`, `context_paths`, and `batch_concurrency`.
- Durable mounted roots remain `memory/`, `artifacts/`, `buffers/`, and `meta/`.
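Under this contract a runtime request body might look like the following. The field names come from the list above; the values and the surrounding envelope are illustrative, not a documented schema:

```python
# Illustrative runtime request using the contract fields above.
# The exact envelope is defined by openapi.yaml; treat this as a sketch.
request = {
    "execution_mode": "code",          # per-turn execution hint (value illustrative)
    "repo_url": "https://github.com/qredence/fleet-rlm.git",
    "repo_ref": "main",
    "context_paths": ["docs/architecture.md"],
    "batch_concurrency": 4,
}
print(sorted(request))
```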
The product is goal-first rather than repo-first. Repositories are one possible source of context, alongside local files, staged documents, pasted content, and URLs.
This package exposes two command entrypoints:
- `fleet`: lightweight launcher for terminal chat and `fleet web`
- `fleet-rlm`: fuller Typer CLI for API and Daytona flows
Common commands:
```shell
# Web UI
uv run fleet web

# Terminal chat
uv run fleet
uv run fleet-rlm chat --trace-mode verbose

# FastAPI server
uv run fleet-rlm serve-api --port 8000

# Experimental Daytona validation
uv run fleet-rlm daytona-smoke --repo https://github.com/qredence/fleet-rlm.git --ref main
```

The current frontend/backend contract centers on:

- `/health/ready`
- `GET /api/v1/auth/me`
- `GET /api/v1/sessions/state`
- `/api/v1/runtime/*`
- `POST /api/v1/traces/feedback`
- `/api/v1/ws/execution`
- `/api/v1/ws/execution/events`
When AUTH_MODE=entra, HTTP and WebSocket access use real Entra bearer-token validation plus Neon-backed tenant admission. Runtime settings writes are intentionally limited to APP_ENV=local.
The canonical schema lives in openapi.yaml.
From the repo root:
```shell
uv sync --all-extras
uv run fleet web
```

Frontend contributors should use pnpm inside src/frontend:

```shell
cd src/frontend
pnpm install --frozen-lockfile
pnpm run dev
pnpm run api:check
pnpm run type-check
pnpm run lint:robustness
pnpm run test:unit
pnpm run build
```

This repo explicitly uses pnpm for frontend work even though the packaged frontend is built with Vite under the hood.
The maintained backend is easiest to read in this order:
1. **Recursive DSPy runtime core**
   - `src/fleet_rlm/runtime/agent/*`
   - `src/fleet_rlm/runtime/models/*`
   - `src/fleet_rlm/integrations/daytona/*`
2. **Thin transport shell**
   - `src/fleet_rlm/api/main.py`
   - `src/fleet_rlm/api/routers/ws/*`
   - `src/fleet_rlm/api/runtime_services/*`
3. **Offline DSPy quality and optimization layer**
   - `src/fleet_rlm/runtime/quality/*`
That means:
- `runtime/agent/agent.py` and `runtime/agent/runtime.py` are the main cognition loop.
- `integrations/daytona/interpreter.py` and `integrations/daytona/runtime.py` are the execution and durable-memory substrate.
- FastAPI/WebSocket modules are transport: auth, request parsing, session extraction, lifecycle, and event-envelope delivery.
The supported app surfaces are Workbench, Volumes, Optimization, and Settings. Legacy taxonomy, skills, memory, and analytics routes are no longer first-class product surfaces and should fall through to /404.
- Keep the backend thin: transport + sandbox orchestration only, no business logic in API layers.
- Preserve one shared frontend and WebSocket contract instead of parallel runtime modes.
- Ship a UI that surfaces the runtime's streaming events, code execution, and artifacts rather than hiding them.
- Expose both a user-facing Web UI and integration surfaces for CLI, HTTP, and WebSocket workflows.
Common maintenance commands from the repo root:
```shell
# Clear caches and local generated artifacts
make clean

# Regenerate the canonical FastAPI schema after backend contract or doc-metadata changes
uv run python scripts/openapi_tools.py generate

# Validate schema quality improvements in-flight
uv run python scripts/openapi_tools.py validate

# Sync frontend OpenAPI artifacts after the root spec changes
cd src/frontend
pnpm run api:sync
```

Repo-level validation:
```shell
make test-fast
make quality-gate
make release-artifacts
make release-check

# Focused backend/runtime regression lane
uv run pytest -q tests/ui/server/test_api_contract_routes.py tests/ui/server/test_router_runtime.py tests/ui/ws/test_chat_stream.py tests/unit/integrations/daytona/test_config.py tests/unit/integrations/daytona/test_runtime.py tests/unit/integrations/daytona/test_interpreter.py tests/unit/runtime/agent/test_chat_agent_runtime.py -m "not live_llm and not live_daytona and not benchmark"
```

Focused docs validation:
```shell
uv run python scripts/check_docs_quality.py
uv run python scripts/validate_release.py hygiene
uv run python scripts/validate_release.py metadata
```

Use this order for Daytona work:

1. Set `DAYTONA_API_KEY`, `DAYTONA_API_URL`, and optional `DAYTONA_TARGET`.
2. Run `uv run fleet-rlm daytona-smoke --repo <url> [--ref <branch-or-sha>]`.
In local/default-local source checkouts, Daytona config resolution prefers repo .env / .env.local values over inherited shell exports so branch-local validation uses the checkout's intended credentials.
This repo treats DAYTONA_API_BASE_URL as a misconfiguration. Use DAYTONA_API_URL instead.
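The precedence and rejection rules above can be sketched as a small resolver. The real resolution lives in the Daytona integration; the function below is illustrative, with placeholder values, and only the rules — repo `.env` wins over the shell, `DAYTONA_API_BASE_URL` is rejected — come from the text:

```python
# Sketch of local-checkout config resolution: repo .env / .env.local values
# win over inherited shell exports, and DAYTONA_API_BASE_URL is treated as
# a misconfiguration. Illustrative only; not the real resolver.

def resolve_daytona_config(dotenv: dict[str, str],
                           environ: dict[str, str]) -> dict[str, str]:
    if "DAYTONA_API_BASE_URL" in dotenv or "DAYTONA_API_BASE_URL" in environ:
        raise ValueError("DAYTONA_API_BASE_URL is a misconfiguration; use DAYTONA_API_URL")
    config: dict[str, str] = {}
    for key in ("DAYTONA_API_KEY", "DAYTONA_API_URL", "DAYTONA_TARGET"):
        if key in dotenv:              # repo .env wins over the shell
            config[key] = dotenv[key]
        elif key in environ:
            config[key] = environ[key]
    return config

print(resolve_daytona_config(
    {"DAYTONA_API_URL": "https://daytona.example/api"},       # from .env
    {"DAYTONA_API_URL": "https://stale.example",              # inherited shell
     "DAYTONA_API_KEY": "shell-key"},
))
```

Here the checkout's `.env` URL shadows the stale shell export, while the shell-only API key is still picked up — matching the branch-local validation behavior described above.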