fleet-rlm

PyPI version License: MIT CI PyPI Downloads


fleet-rlm is a web workspace for running recursive language-model tasks on top of DSPy and Daytona sandboxes. You chat with a ReAct agent in the browser; when a task is larger than a single context window, the agent delegates pieces to isolated sub-sandboxes, each running a bounded dspy.RLM per arXiv 2512.24601v2.

Who it's for. DSPy users who want a UI-driven workspace for long-context tasks, recursive decomposition, and sandboxed code execution — without hand-rolling the transport, persistence, and sandbox plumbing.

What it removes. Writing your own WebSocket transport, session persistence, Daytona sandbox lifecycle, execution-trace UI, and recursive-delegation policy around a DSPy program. fleet-rlm ships all of that behind a single uv run fleet web.

Try it in 30 seconds. See Quick Start below.

Docs · Contributing · Changelog · arXiv paper

Project Status

Solo-maintained by @Zochory. External contributions welcome — see CONTRIBUTING.md. No SLA; issues are reviewed as capacity allows.

Architecture at a Glance

Two layers, both dspy.*, both real:

  • Chat surface: dspy.ReAct for interactive turn-taking. Lives at src/fleet_rlm/runtime/agent/agent.py as FleetAgent.
  • Recursive engine: dspy.RLM running inside a child Daytona sandbox. Built in src/fleet_rlm/runtime/models/builders.py; the recursive sub-query variant is build_recursive_subquery_rlm(). Implements Algorithm 1 from arXiv 2512.24601v2: inputs stored as REPL variables, sub-queries bounded by max_iterations and max_llm_calls.

How the ReAct Agent Delegates to dspy.RLM

The chat agent does not directly hand a task to a child RLM. Delegation is mediated by a specific ReAct tool, delegate_to_rlm, registered the same way as any other tool in the agent's tool registry:

User prompt
   ↓
FleetAgent  (dspy.ReAct, host LLM)
   │   decides the task exceeds one context and picks the tool:
   ↓
delegate_to_rlm(query, context="", document_url="")
   │   — src/fleet_rlm/runtime/tools/rlm_delegate.py
   │   — reads the active Daytona interpreter from a ContextVar
   │   — checks remaining LLM-call budget; returns error if exhausted
   │   — interpreter.build_delegate_child()   ← isolated child Daytona sandbox
   │   — optionally fetches document_url into the child's context
   ↓
build_recursive_subquery_rlm(
    interpreter=child,
    max_iterations=min(child.rlm_max_iterations, remaining_budget),
    max_llm_calls=remaining_budget,
)
   │   constructs the dspy.RLM bound to the child sandbox
   ↓
rlm(prompt=query, context=...)
   │   child RLM runs REPL-variable-mode: may call llm_query(),
   │   sub_rlm(), sub_rlm_batched() to recurse further inside its sandbox
   ↓
{"status": "ok", "answer": "..."}        ← bubbles back into the ReAct trace

Two entry points exist, and they share one budget:

  1. delegate_to_rlm() — from the host ReAct agent's tool registry (above).
  2. sub_rlm() / sub_rlm_batched() — from Python code already running inside a dspy.RLM sandbox, reaching back out through the Daytona bridge to spawn a further child.

Both go through DaytonaInterpreter.build_delegate_child() so child creation follows one backend-owned policy (default: RLM_CHILD_ISOLATION_MODE=auto — fork the parent sandbox if no durable volume is mounted, otherwise create a clean child with a child-specific volume_subpath). rlm_max_llm_calls is a single shared semantic-call budget across the entire recursive tree; sub_rlm_batched() caps sibling parallelism at 4.
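The shared-budget policy above can be sketched in plain Python. This is an illustrative model only, not fleet-rlm's API: the names `SharedBudget` and `delegate` are hypothetical, and the real logic lives in `DaytonaInterpreter.build_delegate_child()` and the tool module.

```python
# Hypothetical sketch of the shared semantic-call budget described above.
# One budget object is threaded through the entire recursive tree.

MAX_SIBLING_PARALLELISM = 4  # sub_rlm_batched() caps siblings at 4


class SharedBudget:
    """Single LLM-call budget shared across all recursive delegations."""

    def __init__(self, max_llm_calls: int):
        self.remaining = max_llm_calls


def delegate(budget: SharedBudget, child_max_iterations: int) -> dict:
    """Mirror of the delegation flow: refuse when exhausted, else cap
    the child's iterations by whatever budget is left."""
    if budget.remaining <= 0:
        return {"status": "error", "answer": "LLM-call budget exhausted"}
    max_iterations = min(child_max_iterations, budget.remaining)
    # ...here the real tool would call build_recursive_subquery_rlm(
    #        interpreter=child,
    #        max_iterations=max_iterations,
    #        max_llm_calls=budget.remaining)...
    return {"status": "ok", "max_iterations": max_iterations}


budget = SharedBudget(max_llm_calls=8)
print(delegate(budget, child_max_iterations=20))  # iterations capped at 8
```

Because the budget is shared rather than per-child, a deep recursion cannot multiply its call allowance by spawning more children.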

Full details, including the local-workspace-snapshot fallback when a parent turn has no repo_url to recreate in the child, live in docs/architecture.md.

Quick Start

Add fleet-rlm to a uv-managed project and launch the Web UI:

# Create a project if you do not already have one
uv init

# Add fleet-rlm to the environment
uv add fleet-rlm

# Start the Web UI + API server
uv run fleet web

Open http://127.0.0.1:8000.

If you already have a uv project, skip uv init and just run uv add fleet-rlm.

Published installs already include built frontend assets, so end users do not need pnpm, Vite, or a separate frontend build step.

Primary Workflows

Use the Web UI

uv run fleet web

This starts the main product surface with:

  • Workbench for adaptive chat and runtime execution
  • Volumes for runtime-backed file browsing
  • Optimization for DSPy evaluation and optimization workflows
  • Settings for runtime configuration and diagnostics

Use terminal chat

uv run fleet-rlm chat --trace-mode compact

Run the API directly

uv run fleet-rlm serve-api --host 127.0.0.1 --port 8000

Runtime Contract

fleet-rlm exposes a Daytona-only runtime contract:

  • execution_mode remains a per-turn execution hint.
  • Requests may include repo_url, repo_ref, context_paths, and batch_concurrency.
  • Durable mounted roots remain memory/, artifacts/, buffers/, and meta/.

The product is goal-first rather than repo-first. Repositories are one possible source of context, alongside local files, staged documents, pasted content, and URLs.
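Put together, a per-turn runtime request under this contract might look like the fragment below. Field values are illustrative only; openapi.yaml is the canonical schema.

```json
{
  "execution_mode": "auto",
  "repo_url": "https://github.com/qredence/fleet-rlm.git",
  "repo_ref": "main",
  "context_paths": ["docs/architecture.md"],
  "batch_concurrency": 2
}
```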

CLI Surfaces

This package exposes two command entrypoints:

  • fleet: lightweight launcher for terminal chat and fleet web
  • fleet-rlm: fuller Typer CLI for API and Daytona flows

Common commands:

# Web UI
uv run fleet web

# Terminal chat
uv run fleet
uv run fleet-rlm chat --trace-mode verbose

# FastAPI server
uv run fleet-rlm serve-api --port 8000

# Experimental Daytona validation
uv run fleet-rlm daytona-smoke --repo https://github.com/qredence/fleet-rlm.git --ref main

HTTP and WebSocket Contract

The current frontend/backend contract centers on:

  • /health
  • /ready
  • GET /api/v1/auth/me
  • GET /api/v1/sessions/state
  • /api/v1/runtime/*
  • POST /api/v1/traces/feedback
  • /api/v1/ws/execution
  • /api/v1/ws/execution/events

When AUTH_MODE=entra, HTTP and WebSocket access use real Entra bearer-token validation plus Neon-backed tenant admission. Runtime settings writes are intentionally limited to APP_ENV=local.

The canonical schema lives in openapi.yaml.
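As an example of consuming this contract, a client connecting to the execution socket under AUTH_MODE=entra supplies a bearer token. The sketch below only builds the URL and header (the token value and helper names are hypothetical; validation happens server-side).

```python
# Sketch: assemble the execution-WebSocket URL and auth header.
# Endpoint paths come from the contract above; the token is a placeholder.

def execution_ws_url(host: str, port: int) -> str:
    return f"ws://{host}:{port}/api/v1/ws/execution"


def auth_headers(bearer_token: str) -> dict:
    # Under AUTH_MODE=entra the backend validates this as an Entra token.
    return {"Authorization": f"Bearer {bearer_token}"}


print(execution_ws_url("127.0.0.1", 8000))
# ws://127.0.0.1:8000/api/v1/ws/execution
```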

Source Development

From the repo root:

uv sync --all-extras
uv run fleet web

Frontend contributors should use pnpm inside src/frontend:

cd src/frontend
pnpm install --frozen-lockfile
pnpm run dev
pnpm run api:check
pnpm run type-check
pnpm run lint:robustness
pnpm run test:unit
pnpm run build

This repo explicitly uses pnpm for frontend work even though the packaged frontend is built with Vite under the hood.

Repo Layout

The maintained backend is easiest to read in this order:

  1. Recursive DSPy runtime core
    • src/fleet_rlm/runtime/agent/*
    • src/fleet_rlm/runtime/models/*
    • src/fleet_rlm/integrations/daytona/*
  2. Thin transport shell
    • src/fleet_rlm/api/main.py
    • src/fleet_rlm/api/routers/ws/*
    • src/fleet_rlm/api/runtime_services/*
  3. Offline DSPy quality and optimization layer
    • src/fleet_rlm/runtime/quality/*

That means:

  • runtime/agent/agent.py and runtime/agent/runtime.py are the main cognition loop.
  • integrations/daytona/interpreter.py and integrations/daytona/runtime.py are the execution and durable-memory substrate.
  • FastAPI/WebSocket modules are transport: auth, request parsing, session extraction, lifecycle, and event-envelope delivery.

The supported app surfaces are Workbench, Volumes, Optimization, and Settings. Legacy taxonomy, skills, memory, and analytics routes are no longer first-class product surfaces and should fall through to /404.

Design Principles

  • Keep the backend thin: transport + sandbox orchestration only, no business logic in API layers.
  • Preserve one shared frontend and WebSocket contract instead of parallel runtime modes.
  • Ship a UI that surfaces the runtime's streaming events, code execution, and artifacts rather than hiding them.
  • Expose both a user-facing Web UI and integration surfaces for CLI, HTTP, and WebSocket workflows.

Maintenance Commands

Common maintenance commands from the repo root:

# Clear caches and local generated artifacts
make clean

# Regenerate the canonical FastAPI schema after backend contract or doc-metadata changes
uv run python scripts/openapi_tools.py generate

# Validate the schema quality improvements in-flight
uv run python scripts/openapi_tools.py validate

# Sync frontend OpenAPI artifacts after the root spec changes
cd src/frontend
pnpm run api:sync

Validation

Repo-level validation:

make test-fast
make quality-gate
make release-artifacts
make release-check

# Focused backend/runtime regression lane
uv run pytest -q tests/ui/server/test_api_contract_routes.py tests/ui/server/test_router_runtime.py tests/ui/ws/test_chat_stream.py tests/unit/integrations/daytona/test_config.py tests/unit/integrations/daytona/test_runtime.py tests/unit/integrations/daytona/test_interpreter.py tests/unit/runtime/agent/test_chat_agent_runtime.py -m "not live_llm and not live_daytona and not benchmark"

Focused docs validation:

uv run python scripts/check_docs_quality.py
uv run python scripts/validate_release.py hygiene
uv run python scripts/validate_release.py metadata

Daytona Notes

Use this order for Daytona work:

  1. Set DAYTONA_API_KEY, DAYTONA_API_URL, and optional DAYTONA_TARGET.
  2. Run uv run fleet-rlm daytona-smoke --repo <url> [--ref <branch-or-sha>].

In local/default-local source checkouts, Daytona config resolution prefers repo .env / .env.local values over inherited shell exports so branch-local validation uses the checkout's intended credentials.

This repo treats DAYTONA_API_BASE_URL as a misconfiguration. Use DAYTONA_API_URL instead.
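The environment setup above can be captured in a repo-local .env or .env.local file. All values below are placeholders:

```
# Daytona credentials for fleet-rlm (placeholder values)
DAYTONA_API_KEY=your-api-key
DAYTONA_API_URL=https://app.daytona.io/api
DAYTONA_TARGET=us  # optional
# Do NOT set DAYTONA_API_BASE_URL; this repo treats it as a misconfiguration.
```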

Documentation Map
