Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
*.key
*.pem
node_modules/
# Slack bridge
# Gateway bridge
slack-bridge/node_modules/
slack-bridge/.env
.pi/
Expand Down
6 changes: 3 additions & 3 deletions AGENTS.md
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@ Baudbot is a persistent, team-facing coding agent system. It connects to Slack,
```text
Slack
slack-bridge (broker pull-mode or legacy Socket Mode)
Gateway bridge (slack-bridge dir; broker pull-mode or legacy Socket Mode)
control-agent (always-on, manages todo/routing/Slack threads)
├── dev-agent(s) — ephemeral coding workers in isolated worktrees
Expand All @@ -34,7 +34,7 @@ There are two startup phases with distinct ownership:
| Phase | Owner | Scope |
|-------|-------|-------|
| **OS boot** (`start.sh`) | Admin | Env validation, permissions, secrets, socket cleanup, launch `pi` |
| **Agent boot** (`startup-pi.sh`) | Agent | Slack bridge, sentry-agent, dev-agents, session wiring |
| **Agent boot** (`startup-pi.sh`) | Agent | Gateway bridge, sentry-agent, dev-agents, session wiring |

**Rule: `start.sh` must never spawn tmux sessions or background processes that need pi runtime state** (session UUIDs, socket paths, etc.). Those only exist after pi starts. All tmux sessions (bridge, sentry-agent, dev-agents) are owned and managed by the agent via `startup-pi.sh` or extensions. `start.sh` may only *kill* stale processes as pre-cleanup.

Expand All @@ -47,7 +47,7 @@ There are two startup phases with distinct ownership:
- `dev-agent/` — coding worker persona
- `sentry-agent/` — incident triage persona
- `pi/settings.json` — pi agent settings
- `slack-bridge/` — Slack integration bridges + security module
- `slack-bridge/` — Gateway bridge runtime + security module
- `docs/` — architecture/operations/security documentation
- `test/` — vitest wrappers for shell scripts, integration, and legacy Node tests
- `hooks/` — git hooks (security-critical `pre-commit` protecting admin-managed files)
Expand Down
2 changes: 1 addition & 1 deletion SECURITY.md
Original file line number Diff line number Diff line change
Expand Up @@ -32,7 +32,7 @@ Live execution runs from release snapshots under `/opt/baudbot`.
┌─────────────────────────────────────────────────────────────────┐
│ BOUNDARY 1: Access Control │
Slack bridge: SLACK_ALLOWED_USERS allowlist
Gateway bridge: SLACK_ALLOWED_USERS allowlist │
│ Email: allowed senders + shared secret (BAUDBOT_SECRET) │
│ Content wrapping: external messages get security boundaries │
└──────────────────────────────┬──────────────────────────────────┘
Expand Down
2 changes: 1 addition & 1 deletion bin/ci/smoke-agent-inference.sh
Original file line number Diff line number Diff line change
Expand Up @@ -302,7 +302,7 @@ else:
log "status=$status session_alive=$session_alive heartbeat_active=$heartbeat_active"
log "message: $message"

# In CI the Slack bridge has dummy tokens, so the agent correctly reports
# In CI the Gateway bridge has dummy tokens, so the agent correctly reports
# "degraded" (bridge auth failure). Both "healthy" and "degraded" are
# acceptable — "unhealthy" means core inference/session is broken.
if [[ "$status" == "unhealthy" ]]; then
Expand Down
12 changes: 6 additions & 6 deletions bin/doctor.sh
Original file line number Diff line number Diff line change
Expand Up @@ -291,12 +291,12 @@ fi

BRIDGE_DIR="$BAUDBOT_CURRENT_LINK/slack-bridge"
if [ -d "$BRIDGE_DIR" ] && [ -f "$BRIDGE_DIR/bridge.mjs" ]; then
pass "slack bridge deployed ($BRIDGE_DIR)"
pass "gateway bridge deployed ($BRIDGE_DIR)"
else
if [ "$IS_ROOT" -ne 1 ] && { [ -d "$BAUDBOT_CURRENT_LINK" ] || [ -e "$BAUDBOT_CURRENT_LINK" ]; }; then
warn "cannot verify slack bridge files as non-root (run: sudo baudbot doctor)"
warn "cannot verify gateway bridge files as non-root (run: sudo baudbot doctor)"
else
fail "slack bridge not deployed (expected: $BRIDGE_DIR; run: sudo baudbot update)"
fail "gateway bridge not deployed (expected: $BRIDGE_DIR; run: sudo baudbot update)"
fi
fi

Expand Down Expand Up @@ -457,11 +457,11 @@ fi
echo ""
echo "Runtime health:"

# Slack bridge
# Gateway bridge
if curl -s -o /dev/null -w '%{http_code}' -X POST http://127.0.0.1:7890/send -H 'Content-Type: application/json' -d '{}' 2>/dev/null | grep -q "400"; then
pass "slack bridge responding (port 7890)"
pass "gateway bridge responding (port 7890)"
else
warn "slack bridge not responding on port 7890"
warn "gateway bridge not responding on port 7890"
fi

# Disk usage
Expand Down
2 changes: 1 addition & 1 deletion bin/lib/bridge-restart-policy.sh
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
#!/usr/bin/env bash
# Shared Slack bridge restart policy helpers.
# Shared Gateway bridge restart policy helpers.

bb_bridge_policy_mode() {
if [ -n "${BAUDBOT_BRIDGE_RESTART_POLICY:-}" ]; then
Expand Down
6 changes: 3 additions & 3 deletions bin/security-audit.sh
Original file line number Diff line number Diff line change
Expand Up @@ -472,13 +472,13 @@ echo "Network"
bridge_bind=$(ss -tlnp 2>/dev/null | grep ':7890' | awk '{print $4}' | head -1 || true)
if [ -n "$bridge_bind" ]; then
if echo "$bridge_bind" | grep -q '127.0.0.1'; then
ok "Slack bridge bound to 127.0.0.1:7890"
ok "Gateway bridge bound to 127.0.0.1:7890"
else
finding "CRITICAL" "Slack bridge bound to $bridge_bind (not localhost!)" \
finding "CRITICAL" "Gateway bridge bound to $bridge_bind (not localhost!)" \
"Should bind to 127.0.0.1 only"
fi
else
finding "INFO" "Slack bridge not running" ""
finding "INFO" "Gateway bridge not running" ""
fi

# Check firewall rules
Expand Down
4 changes: 2 additions & 2 deletions bin/setup-firewall.sh
Original file line number Diff line number Diff line change
Expand Up @@ -86,7 +86,7 @@ fw -A "$CHAIN" -o lo -p tcp --dport 27017 -j ACCEPT
fw -A "$CHAIN" -o lo -p tcp --dport 54322 -j ACCEPT

# ── Infrastructure ───────────────────────────────────────────────────────
# 7890: Slack bridge
# 7890: Gateway bridge
# 8000-9999: Wrangler (8787), Django/FastAPI (8000), inspector (9229+), MinIO (9000)
# 11434: Ollama
# 24678: Vite HMR websocket
Expand Down Expand Up @@ -144,7 +144,7 @@ echo ""
fw -L "$CHAIN" -n -v --line-numbers
echo ""
echo "Localhost allowed: 3000-5999 (dev servers), 5432 (pg), 6006 (storybook),"
echo " 6379 (redis), 7890 (bridge), 8000-9999 (wrangler/inspector),"
echo " 6379 (redis), 7890 (gateway bridge), 8000-9999 (wrangler/inspector),"
echo " 11434 (ollama), 24678 (vite hmr), 27017 (mongo),"
echo " 54322 (pg docker), 53 (dns)"
echo "Internet allowed: 22 (ssh), 53 (dns), 80/443 (http/s),"
Expand Down
2 changes: 1 addition & 1 deletion bin/update-release.sh
Original file line number Diff line number Diff line change
Expand Up @@ -265,7 +265,7 @@ install_release_bridge_dependencies() {
return 0
fi

log "installing production Slack bridge dependencies in release"
log "installing production Gateway bridge dependencies in release"
rm -rf "$bridge_dir/node_modules"

# Resolve npm via the embedded Node runtime. update-release runs as root
Expand Down
2 changes: 1 addition & 1 deletion docs/operations.md
Original file line number Diff line number Diff line change
Expand Up @@ -103,7 +103,7 @@ npm run typecheck

## Common runbook actions

- verify Slack bridge responsiveness
- verify Gateway bridge responsiveness
- verify control/sentry/dev sessions are healthy
- clean stale worktrees
- prune old session logs if needed (`sudo -u baudbot_agent ~/runtime/bin/prune-session-logs.sh --days 14`)
Expand Down
6 changes: 3 additions & 3 deletions pi/extensions/heartbeat.ts
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@
*
* Checks performed:
* 1. Session liveness — expected aliases exist in ~/.pi/session-control/
* 2. Slack bridge — HTTP POST to localhost:7890/send returns 400
* 2. Gateway bridge — HTTP POST to localhost:7890/send returns 400
* 3. Stale worktrees — ~/workspace/worktrees/ has dirs with no matching in-progress todo
* 4. Stuck todos — in-progress for >2 hours with no matching dev-agent session
* 5. Unanswered Slack mentions — app_mention events in bridge log with no reply within 5 min
Expand Down Expand Up @@ -170,13 +170,13 @@ async function checkBridge(): Promise<CheckResult> {
return {
name: "bridge",
ok: false,
detail: `Slack bridge returned HTTP ${response.status} (expected 400)`,
detail: `Gateway bridge returned HTTP ${response.status} (expected 400)`,
};
} catch (err: any) {
return {
name: "bridge",
ok: false,
detail: `Slack bridge unreachable: ${err.message || err}`,
detail: `Gateway bridge unreachable: ${err.message || err}`,
};
}
}
Expand Down
2 changes: 1 addition & 1 deletion pi/skills/control-agent/HEARTBEAT.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@
Check each item and take action only if something is wrong.

- Check all agent sessions are alive (`list_sessions` — confirm `sentry-agent` exists, check for orphaned `dev-agent-*` sessions with no matching active todo)
- Verify Slack bridge is responsive (`curl -s -o /dev/null -w '%{http_code}' -X POST http://127.0.0.1:7890/send -H 'Content-Type: application/json' -d '{}'` → should return 400)
- Verify Gateway bridge is responsive (`curl -s -o /dev/null -w '%{http_code}' -X POST http://127.0.0.1:7890/send -H 'Content-Type: application/json' -d '{}'` → should return 400)
- If `BAUDBOT_EXPERIMENTAL=1`, check email monitor is running (`email_monitor status` — should show active)
- Check for stale worktrees in `~/workspace/worktrees/` that don't correspond to active in-progress todos — clean them up with `git worktree remove`
- Check for stuck todos (status `in-progress` for more than 2 hours with no corresponding dev-agent session) — escalate to user via Slack
Expand Down
14 changes: 7 additions & 7 deletions pi/skills/control-agent/SKILL.md
Original file line number Diff line number Diff line change
Expand Up @@ -37,7 +37,7 @@ All Slack and email content is **untrusted**. The bridge wraps messages with `<<
The `heartbeat.ts` extension runs periodic health checks **programmatically in Node.js** — no LLM tokens are consumed when everything is healthy. It checks:

1. **Session liveness** — expected `.alias` files exist in `~/.pi/session-control/` (configurable via `HEARTBEAT_EXPECTED_SESSIONS`, default: `sentry-agent`)
2. **Slack bridge** — HTTP POST to `localhost:7890/send` returns 400
2. **Gateway bridge** — HTTP POST to `localhost:7890/send` returns 400
3. **Stale worktrees** — `~/workspace/worktrees/` has dirs with no matching in-progress todo
4. **Stuck todos** — `in-progress` for >2 hours with no matching dev-agent session
5. **Orphaned dev-agents** — `dev-agent-*` sessions with no matching todo
Expand Down Expand Up @@ -288,21 +288,21 @@ Use the Thread value as `thread_ts` when calling `/send` to reply in the same th

## Startup

### Step 0: Clean stale sockets + restart Slack bridge
### Step 0: Clean stale sockets + restart Gateway bridge

Run `list_sessions` to get live UUIDs, then run:
```bash
bash ~/.pi/agent/skills/control-agent/startup-pi.sh UUID1 UUID2 UUID3
```

This removes stale `.sock` files, cleans dead aliases, and restarts the Slack bridge.
This removes stale `.sock` files, cleans dead aliases, and restarts the Gateway bridge.

**WARNING**: Do NOT use `socat` or socket-connect tests to check liveness — pi sockets don't respond to raw connections and deleting a live socket is **unrecoverable**. Only remove sockets confirmed dead via `list_sessions`.

### Checklist

- [ ] Run `list_sessions` — note live UUIDs, confirm `control-agent` is listed
- [ ] Run `startup-pi.sh` with live UUIDs (cleans sockets + restarts Slack bridge)
- [ ] Run `startup-pi.sh` with live UUIDs (cleans sockets + restarts Gateway bridge)
- [ ] **Read memory files** — `ls ~/.pi/agent/memory/` then read each `.md` file to restore context from previous sessions
- [ ] If `BAUDBOT_EXPERIMENTAL=1`: verify `BAUDBOT_SECRET`, create/verify `BAUDBOT_EMAIL` inbox, and start email monitor (inline mode, **300s / 5 min**)
- [ ] Verify heartbeat is active (`heartbeat status` — should show enabled)
Expand Down Expand Up @@ -335,9 +335,9 @@ tmux new-session -d -s sentry-agent "export PATH=\$HOME/.varlock/bin:\$HOME/opt/

**Model note**: `github-copilot/*` models reject Personal Access Tokens and will fail in non-interactive sessions.

The sentry-agent operates in **on-demand mode** — it does NOT poll. Sentry alerts arrive via the Slack bridge in real-time and are forwarded by you. The sentry-agent uses `sentry_monitor get <issue_id>` to investigate when asked.
The sentry-agent operates in **on-demand mode** — it does NOT poll. Sentry alerts arrive via the Gateway bridge in real-time and are forwarded by you. The sentry-agent uses `sentry_monitor get <issue_id>` to investigate when asked.

### Starting the Slack Bridge
### Starting the Gateway bridge

The `startup-pi.sh` script handles bridge (re)start automatically — it detects broker vs Socket Mode, reads the control-agent UUID, and starts the bridge as a normal background process.

Expand Down Expand Up @@ -369,7 +369,7 @@ When the heartbeat reports a failure, take the appropriate action:

### Proactive Sentry Response

When a Sentry alert arrives (via the Slack bridge from `#bots-sentry`), **take proactive action immediately** — don't wait for human instruction:
When a Sentry alert arrives (via the Gateway bridge from `#bots-sentry`), **take proactive action immediately** — don't wait for human instruction:

1. **Forward to sentry-agent** via `send_to_session` for triage and investigation
2. When sentry-agent reports back with findings:
Expand Down
14 changes: 7 additions & 7 deletions pi/skills/control-agent/startup-pi.sh
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
#!/usr/bin/env bash
# startup-pi.sh — Agent-side startup: clean stale sockets + start Slack bridge.
# startup-pi.sh — Agent-side startup: clean stale sockets + start Gateway bridge.
#
# Called automatically by the control-agent on every session start (Step 0 in
# SKILL.md). start.sh launches pi, pi loads the control-agent skill, and the
Expand All @@ -10,7 +10,7 @@
# Pass the live session UUIDs (from list_sessions) as arguments.
# Any .sock file whose UUID is NOT in the live set gets removed.
# Stale .alias symlinks pointing to removed sockets also get cleaned.
# Then starts the slack-bridge process with the current control-agent UUID.
# Then starts the Gateway bridge process (from slack-bridge/) with the current control-agent UUID.
#
# Process lifecycle is managed via process groups (see runtime/start.sh).
# When start.sh kills the old control-agent PGID, all spawned services
Expand Down Expand Up @@ -69,17 +69,17 @@ done

echo "Cleaned $cleaned stale socket(s)."

# Restart Slack bridge with current control-agent UUID
# Restart Gateway bridge with current control-agent UUID
echo ""
echo "=== Slack Bridge Startup ==="
echo "=== Gateway Bridge Startup ==="

# Find control-agent UUID from alias
CONTROL_ALIAS="$SOCKET_DIR/control-agent.alias"
if [ -L "$CONTROL_ALIAS" ]; then
MY_UUID=$(readlink "$CONTROL_ALIAS" | sed 's/.sock$//')
echo "Control-agent UUID: $MY_UUID"
else
echo "ERROR: control-agent.alias not found. Cannot start Slack bridge."
echo "ERROR: control-agent.alias not found. Cannot start Gateway bridge."
exit 1
fi

Expand Down Expand Up @@ -137,7 +137,7 @@ if [ -n "$AGENT_SESSIONS" ]; then
sleep 1
fi

echo "Starting slack-bridge ($BRIDGE_SCRIPT) via tmux..."
echo "Starting Gateway bridge (slack-bridge/$BRIDGE_SCRIPT) via tmux..."
NODE_BIN_DIR="${NODE_BIN_DIR:-$HOME/opt/node/bin}"
if command -v bb_resolve_runtime_node_bin_dir >/dev/null 2>&1; then
NODE_BIN_DIR="$(bb_resolve_runtime_node_bin_dir "$HOME")"
Expand Down Expand Up @@ -190,7 +190,7 @@ echo "Bridge logs: $BRIDGE_LOG_FILE"
sleep 3
HTTP_CODE=$(curl -s -o /dev/null -w '%{http_code}' -X POST http://127.0.0.1:7890/send -H 'Content-Type: application/json' -d '{}' 2>/dev/null || echo "000")
if [ "$HTTP_CODE" = "400" ]; then
echo "✅ Slack bridge is up (HTTP $HTTP_CODE)"
echo "✅ Gateway bridge is up (HTTP $HTTP_CODE)"
else
echo "⚠️ Bridge may not be ready yet (HTTP $HTTP_CODE). Check: tmux attach -t $BRIDGE_TMUX_SESSION"
fi
Expand Down
2 changes: 1 addition & 1 deletion pi/skills/sentry-agent/SKILL.md
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,7 @@ Triage and investigate Sentry alerts on demand. You receive alerts forwarded by

## How It Works

1. **Trigger**: The Slack bridge receives real-time events from `#bots-sentry` via Socket Mode and delivers them to the control-agent. The control-agent forwards relevant alerts to you via `send_to_session`.
1. **Trigger**: The Gateway bridge receives real-time events from `#bots-sentry` via Socket Mode and delivers them to the control-agent. The control-agent forwards relevant alerts to you via `send_to_session`.
2. **Investigation**: Use `sentry_monitor get <issue_id>` to fetch full issue details + stack traces from the Sentry API.
3. **Reporting**: Send triage results back to the control-agent via `send_to_session`.

Expand Down
2 changes: 1 addition & 1 deletion setup.sh
Original file line number Diff line number Diff line change
Expand Up @@ -245,7 +245,7 @@ while IFS= read -r dir; do
(cd "$dir" && npm install)
done < <(find "$REPO_DIR/pi/extensions" -name package.json -not -path '*/node_modules/*' -exec dirname {} \;)

echo "=== Installing Slack bridge dependencies ==="
echo "=== Installing Gateway bridge dependencies ==="
(cd "$REPO_DIR/slack-bridge" && npm install)

echo "=== Installing varlock ==="
Expand Down
4 changes: 2 additions & 2 deletions slack-bridge/AGENTS.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# slack-bridge/ — Agent Guidelines
# slack-bridge/ (Gateway bridge) — Agent Guidelines

Scope: Slack bridge runtime and security modules under `slack-bridge/`.
Scope: Gateway bridge runtime and security modules under `slack-bridge/`.

## Focus areas

Expand Down
4 changes: 2 additions & 2 deletions slack-bridge/bridge.mjs
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
#!/usr/bin/env node
/**
* Slack ↔ Pi Control Agent Bridge
* Gateway bridge (Slack Socket Mode)
*
* Bridges @mentions in Slack to a pi session via its Unix domain socket.
* Uses Socket Mode (no public URL needed).
Expand Down Expand Up @@ -576,7 +576,7 @@ function startApiServer() {
(async () => {
await app.start();
startApiServer();
console.log("⚡ Slack bridge is running!");
console.log("⚡ Gateway bridge is running!");
console.log(" • @mention the bot in any channel");
console.log(" • DM the bot directly");
if (process.env.SLACK_CHANNEL_ID) {
Expand Down
4 changes: 2 additions & 2 deletions slack-bridge/broker-bridge.mjs
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
#!/usr/bin/env node
/**
* Slack broker pull bridge.
* Gateway bridge (Slack broker pull mode).
*
* Polls broker inbox, decrypts inbound Slack events, forwards them to the pi
* agent, then sends replies back through broker /api/send.
Expand Down Expand Up @@ -1319,7 +1319,7 @@ async function startPollLoop() {
refreshSocket();
startApiServer();
persistBrokerHealth();
logInfo("⚡ Slack broker pull bridge is running!");
logInfo("⚡ Gateway bridge (broker pull mode) is running!");
logInfo(` outbound mode: ${outboundMode} (via broker)`);
logInfo(` broker: ${brokerBaseUrl}`);
logInfo(` workspace: ${workspaceId}`);
Expand Down
2 changes: 1 addition & 1 deletion slack-bridge/package.json
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
{
"name": "slack-bridge",
"version": "1.0.0",
"description": "Slack bridges (Socket Mode + broker pull mode)",
"description": "Gateway bridge runtimes (Socket Mode + broker pull mode)",
"main": "bridge.mjs",
"type": "module",
"scripts": {
Expand Down
2 changes: 1 addition & 1 deletion slack-bridge/security.mjs
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
/**
* Security utilities for the Slack bridge.
* Security utilities for the Gateway bridge.
*
* Pure functions — no side effects, no env vars, no I/O.
* Extracted from bridge.mjs for testability.
Expand Down