feat: Add adb-coding-assistants-cluster module #227

Open
dgokeeffe wants to merge 2 commits into databricks:main from dgokeeffe:pr/add-coding-assistants-cluster

@dgokeeffe

Summary

This PR adds a new Terraform module for deploying Claude Code CLI on Databricks clusters with MLflow tracing integration.

Key Features

  • Claude Code CLI Installation: Automated installation with Node.js runtime
  • Databricks Authentication: Integration via proxy endpoints using DATABRICKS_TOKEN
  • MLflow Tracing: Native tracing support for Claude Code sessions
  • Remote Development: VS Code/Cursor Remote SSH support
  • Token Management: Helper functions and optional cron automation for token refresh
  • Databricks Skills: Pre-installed patterns and best practices
  • Network Validation: Script to validate connectivity to required dependencies
  • Minimal Installation: Lightweight option for resource-constrained environments

Module Structure

  • modules/adb-coding-assistants-cluster/ - Main module with full features
  • examples/adb-coding-assistants-cluster/ - Example deployment configuration

Init Scripts

  1. install-claude.sh - Full installation with MLflow, token management, and Databricks skills
  2. install-claude-minimal.sh - Minimal installation with basic configuration
  3. vscode-setup.sh - VS Code/Cursor Remote SSH configuration
  4. check-network-deps.sh - Network dependency validation

Authentication

The module configures Claude Code to use Databricks as the model provider:

  • Uses DATABRICKS_TOKEN for authentication
  • Routes requests through Databricks serving endpoints
  • Supports both service principal and profile-based authentication
  • Disables experimental betas for stability
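
As a rough illustration of the bullets above, a settings file of this kind could be generated as follows. This is a minimal sketch, not the PR's actual script: the helper name `write_claude_settings`, the `/serving-endpoints/anthropic` path, and the choice of environment variable names in the JSON are assumptions made for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: write a Claude Code settings file that points the CLI
# at a Databricks serving endpoint. The endpoint path and the env keys are
# assumptions for illustration and are not taken from the PR's scripts.
write_claude_settings() {
  local settings_dir="${1:-$HOME/.claude}"
  local host="${DATABRICKS_HOST:-}"
  local token="${DATABRICKS_TOKEN:-}"

  if [ -z "$host" ] || [ -z "$token" ]; then
    echo "DATABRICKS_HOST and DATABRICKS_TOKEN must be set" >&2
    return 1
  fi

  mkdir -p "$settings_dir"
  (
    # Restrict permissions before the token is written to disk.
    umask 077
    # Note: the token is interpolated as-is, so it must not contain quotes.
    cat > "$settings_dir/settings.json" <<EOF
{
  "env": {
    "ANTHROPIC_BASE_URL": "${host%/}/serving-endpoints/anthropic",
    "ANTHROPIC_AUTH_TOKEN": "${token}",
    "CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS": "1"
  }
}
EOF
  )
}
```

Calling `write_claude_settings` with no argument would target `~/.claude`; passing a directory makes the sketch easy to try in a scratch location first.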

Helper Commands

The init scripts add several helper commands to the cluster:

  • check-claude - Verify installation status
  • claude-refresh-token - Regenerate authentication settings
  • claude-token-status - Check token freshness
  • claude-tracing-enable/disable/status - Manage MLflow tracing
  • claude-vscode-setup - Remote SSH setup guide
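
For readers unfamiliar with this style of helper, a freshness check along the lines of `claude-token-status` might look like the sketch below. It assumes freshness is judged by the settings file's modification time; the function name `token_status`, the default path, and the 45-minute threshold are hypothetical and may not match what the PR implements.

```shell
#!/usr/bin/env bash

# Hypothetical freshness check: treat the settings file's modification time
# as the moment the token was last written. Path and threshold are assumptions.
token_status() {
  local settings_file="${1:-$HOME/.claude/settings.json}"
  local max_age_minutes="${2:-45}"

  if [ ! -f "$settings_file" ]; then
    echo "missing"
    return 2
  fi
  # find prints the path only when the file is older than the threshold.
  if [ -n "$(find "$settings_file" -mmin +"$max_age_minutes" 2>/dev/null)" ]; then
    echo "stale"
    return 1
  fi
  echo "fresh"
}
```

The distinct return codes let a cron job or prompt hook react differently to a missing file and a stale one.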

Test Plan

  • Terraform module validates successfully
  • Init scripts execute without errors on Databricks clusters
  • Claude Code CLI authenticates correctly with Databricks
  • MLflow tracing captures sessions
  • Token refresh helpers work as expected
  • VS Code Remote SSH connects successfully
  • Network dependency checker identifies connectivity issues
  • Minimal installation works in constrained environments

Made with Cursor

Add Terraform module for deploying Claude Code CLI on Databricks
clusters with MLflow tracing integration.

Features:
- Claude Code CLI installation with Node.js runtime
- Databricks authentication integration via proxy endpoints
- MLflow tracing for Claude Code sessions
- VS Code/Cursor Remote SSH support
- Token refresh helpers and cron automation
- Databricks skills for common patterns
- Network dependency validation script
- Minimal installation option for constrained environments

The module includes init scripts that:
- Install Claude Code CLI and dependencies
- Configure authentication via DATABRICKS_TOKEN
- Set up bashrc helpers for token management
- Support profile-based Azure authentication
- Disable experimental betas for stability

Co-authored-by: Cursor <[email protected]>
@dgokeeffe dgokeeffe requested review from a team as code owners February 4, 2026 04:08
@alexott alexott requested a review from Copilot February 4, 2026 14:03

Copilot AI left a comment

Pull request overview

Adds a new Terraform module + example to provision a Databricks cluster that installs/configures Claude Code CLI (with MLflow tracing) via init scripts, plus helper scripts for Remote SSH setup and network dependency checks.

Changes:

  • Introduces adb-coding-assistants-cluster Terraform module (UC volume + init script upload + cluster creation + outputs).
  • Adds installation and helper scripts (full + minimal installer, VS Code/Cursor Remote SSH helper, network dependency checker) and accompanying docs.
  • Adds an end-to-end example deployment (providers/auth options, tfvars template, outputs, documentation) and indexes it in the repo README.

Reviewed changes

Copilot reviewed 20 out of 21 changed files in this pull request and generated 11 comments.

| File | Description |
|------|-------------|
| modules/adb-coding-assistants-cluster/versions.tf | Defines Terraform + Databricks provider constraints for the new module. |
| modules/adb-coding-assistants-cluster/variables.tf | Adds module inputs for cluster/volume/tracing configuration. |
| modules/adb-coding-assistants-cluster/main.tf | Creates UC volume, uploads init script, provisions single-user cluster wiring init script. |
| modules/adb-coding-assistants-cluster/outputs.tf | Exposes cluster and init-script/volume details for consumers. |
| modules/adb-coding-assistants-cluster/README.md | Documents module usage, assumptions, and generated TF docs. |
| modules/adb-coding-assistants-cluster/Makefile | Adds terraform-docs generation/check targets for module docs. |
| modules/adb-coding-assistants-cluster/scripts/install-claude.sh | Full online installer + bash helpers for tokens, tracing, and VS Code guidance. |
| modules/adb-coding-assistants-cluster/scripts/install-claude-minimal.sh | Minimal installer with basic PATH + env var setup. |
| modules/adb-coding-assistants-cluster/scripts/vscode-setup.sh | Standalone Remote SSH setup/check/settings generator for IDEs. |
| modules/adb-coding-assistants-cluster/scripts/check-network-deps.sh | Preflight connectivity validator for required external domains. |
| modules/adb-coding-assistants-cluster/scripts/README.md | Documents the scripts and operational guidance. |
| examples/adb-coding-assistants-cluster/versions.tf | Sets example Terraform version constraint. |
| examples/adb-coding-assistants-cluster/variables.tf | Example inputs including auth selection validation. |
| examples/adb-coding-assistants-cluster/providers.tf | Example provider config (profile vs Azure resource-id path). |
| examples/adb-coding-assistants-cluster/main.tf | Wires example variables into the new module. |
| examples/adb-coding-assistants-cluster/outputs.tf | Prints cluster+volume outputs and a user-facing instruction block. |
| examples/adb-coding-assistants-cluster/README.md | Step-by-step example deployment + post-deploy workflow. |
| examples/adb-coding-assistants-cluster/terraform.tfvars.example | Provides an example tfvars template for quick start. |
| examples/adb-coding-assistants-cluster/Makefile | Adds terraform-docs generation/check targets for example docs. |
| README.md | Adds the new example/module to the repository index tables. |
| .gitignore | Ignores terraform.tfvars and *.plan files (keeps tfvars.example). |

fi
fi

W="${DATABRICKS_HOST}"

Copilot AI Feb 4, 2026

install-claude.sh runs with set -u, but setup_bashrc dereferences DATABRICKS_HOST without a default. If DATABRICKS_HOST is not present in the init-script environment (common), this will abort the init script and can fail cluster startup. Use a safe expansion (e.g., ${DATABRICKS_HOST:-}) and/or avoid substituting host at init time (leave resolution to login-time env vars).

Suggested change
W="${DATABRICKS_HOST}"
W="${DATABRICKS_HOST:-}"
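
To make the failure mode concrete, the sketch below shows the safe expansion in isolation. The function name `resolve_host` and the `unresolved` marker are hypothetical; the point is only that `${VAR:-}` yields an empty string under `set -u` where `${VAR}` would abort.

```shell
#!/usr/bin/env bash
set -u

# With `set -u`, "${DATABRICKS_HOST}" aborts the script when the variable is
# unset; "${DATABRICKS_HOST:-}" expands to an empty string instead.
resolve_host() {
  local host="${DATABRICKS_HOST:-}"
  if [ -z "$host" ]; then
    # Defer resolution to login time rather than failing cluster startup.
    echo "unresolved"
    return 0
  fi
  # Strip one trailing slash so later URL joins stay predictable.
  echo "${host%/}"
}
```

An init script could branch on the `unresolved` case and write a placeholder that the login-time bashrc helper fills in later.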

Comment on lines 588 to 606
local venv_path
venv_path=$(claude-vscode-env 2>/dev/null)

echo "=== VS Code/Cursor settings.json Configuration ==="
echo ""
echo "Add this to your VS Code/Cursor settings.json:"
echo ""
echo "{"
echo " \"remote.SSH.defaultExtensions\": ["
echo " \"ms-Python.python\","
echo " \"ms-toolsai.jupyter\""
echo " ]"
if [ $? -eq 0 ] && [ -n "$venv_path" ]; then
echo ","
echo " \"python.defaultInterpreterPath\": \"$venv_path/bin/python\""
fi
echo "}"
echo ""
if [ $? -eq 0 ] && [ -n "$venv_path" ]; then

Copilot AI Feb 4, 2026

claude-vscode-config checks $? long after venv_path=$(...), but multiple echo calls overwrite $? to 0. This makes the success checks effectively meaningless. Capture the exit status immediately (e.g., rc=$?) or just key off -n "$venv_path" (and/or have claude-vscode-env print nothing on failure) so the conditional reflects the actual detection result.
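
A minimal sketch of the first option (capturing the status right away) is shown here. `detect_venv` is a stand-in for `claude-vscode-env`, whose real behavior is defined in the PR; `print_interpreter_setting` is likewise a hypothetical, trimmed-down version of the config printer.

```shell
#!/usr/bin/env bash

# Stand-in for claude-vscode-env: succeeds and prints the path only when it
# is handed an existing directory. The real helper lives in the PR's script.
detect_venv() {
  [ -n "${1:-}" ] && [ -d "$1" ] && printf '%s\n' "$1"
}

print_interpreter_setting() {
  # Declare first, assign second: `local var=$(cmd)` would report the exit
  # status of `local`, not of the command substitution.
  local venv_path rc
  venv_path=$(detect_venv "${1:-}" 2>/dev/null)
  rc=$?   # capture immediately; any later command overwrites $?

  echo "{"
  if [ "$rc" -eq 0 ] && [ -n "$venv_path" ]; then
    echo "  \"python.defaultInterpreterPath\": \"$venv_path/bin/python\""
  fi
  echo "}"
}
```

Because `rc` is read after several `echo` calls, the conditional still reflects the detection result rather than the status of the last `echo`.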

fi

log "Installing Claude Code CLI..."
if curl -fsSL https://claude.ai/install.sh | bash &>>$L; then

Copilot AI Feb 4, 2026

Piping a remote script directly into bash is a supply-chain risk (no integrity verification, TOCTOU exposure). Prefer downloading to a temporary file, validating integrity (checksum/signature or pinned version), then executing it; at minimum, write the script to disk and log/inspect it before running.
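
One way the download-then-verify flow could look is sketched below. It assumes a pinned SHA-256 digest is available from a trusted source (none is given in this PR) and that `sha256sum` is present on the cluster image; `run_verified` is a hypothetical helper name.

```shell
#!/usr/bin/env bash

# Verify a downloaded installer against a pinned SHA-256 digest, then run it.
# The digest must come from a trusted source; none is published in this PR.
run_verified() {
  local script_path="$1" expected_sha256="$2" actual_sha256
  actual_sha256=$(sha256sum "$script_path" | awk '{print $1}')
  if [ -z "$actual_sha256" ] || [ "$actual_sha256" != "$expected_sha256" ]; then
    echo "checksum mismatch for $script_path" >&2
    return 1
  fi
  bash "$script_path"
}

# Usage sketch (placeholder digest, not executed here):
#   tmp=$(mktemp)
#   curl -fsSL https://claude.ai/install.sh -o "$tmp"
#   run_verified "$tmp" "<pinned sha256>"
```

Writing the installer to disk first also leaves a copy that can be logged or inspected if cluster startup later fails.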

# Install Claude Code CLI
if ! command -v claude >/dev/null 2>&1; then
log "Installing Claude Code CLI..."
curl -fsSL https://claude.ai/install.sh | bash >> "$LOG_FILE" 2>&1

Copilot AI Feb 4, 2026

Same issue as the full installer: curl | bash executes unverified remote content. Use a download + verification step (checksum/signature/pinned version) before execution (or mirror the installer internally for controlled environments).

Suggested change
curl -fsSL https://claude.ai/install.sh | bash >> "$LOG_FILE" 2>&1
npm install -g @anthropic-ai/claude-code >> "$LOG_FILE" 2>&1


# Install MLflow with Databricks support
log "Installing MLflow with Databricks support..."
if pip install --quiet --upgrade "mlflow[databricks]>=3.4" &>>$L; then

Copilot AI Feb 4, 2026

Using pip directly in init scripts can install into an unexpected interpreter (or fail if pip isn’t on PATH / points to a different Python). Prefer python3 -m pip install ... (and consider explicitly targeting the Databricks runtime env if required) so the installed mlflow CLI matches the Python environment you later invoke.

Suggested change
if pip install --quiet --upgrade "mlflow[databricks]>=3.4" &>>$L; then
if python3 -m pip install --quiet --upgrade "mlflow[databricks]>=3.4" &>>$L; then
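
The same idea can be pushed one step further by passing the interpreter explicitly, so installation and later invocation are guaranteed to use the same Python. This is a sketch only; `install_with_interpreter` is a hypothetical wrapper, not a function from the PR.

```shell
#!/usr/bin/env bash

# Run pip through an explicitly chosen interpreter so packages land in the
# environment that will later import them.
install_with_interpreter() {
  local python_bin="$1"
  shift
  if [ -z "$python_bin" ]; then
    echo "no Python interpreter given" >&2
    return 1
  fi
  "$python_bin" -m pip install --quiet --upgrade "$@"
}

# Usage sketch (not executed here):
#   install_with_interpreter "$(command -v python3)" "mlflow[databricks]>=3.4"
```

If the Databricks runtime exposes a specific interpreter path, that path could be passed in place of the `command -v python3` lookup.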

Comment on lines 7 to 9
| Script | Purpose | Network Required |
|--------|---------|------------------|
| `install-claude.sh` | Online installation (default) | ✅ Yes |

Copilot AI Feb 4, 2026

The scripts overview table lists only install-claude.sh, but this directory also includes install-claude-minimal.sh, vscode-setup.sh, and check-network-deps.sh. Also, the doc claims wget is installed, but the installer installs curl git jq (and minimal installs curl git)—either install wget or update the documentation to match actual behavior.

- ✅ **Node.js 20.x** - Required runtime for Claude CLI
- ✅ **Claude Code CLI** - AI coding assistant
- ✅ **MLflow** - For tracing Claude interactions
- ✅ **System tools** - curl, wget, git, jq

Copilot AI Feb 4, 2026

The scripts overview table lists only install-claude.sh, but this directory also includes install-claude-minimal.sh, vscode-setup.sh, and check-network-deps.sh. Also, the doc claims wget is installed, but the installer installs curl git jq (and minimal installs curl git)—either install wget or update the documentation to match actual behavior.

│ Databricks Cluster (on startup) │
│ │
│ 1. Executes init script from volume │
│ 2. Installs Node.js, OpenCode, Claude CLI │

Copilot AI Feb 4, 2026

The module README describes installing/configuring OpenCode (opencode) and generating ~/.opencode/config.json, but the provided init scripts in this PR don’t install OpenCode or add related helpers. Either implement OpenCode installation/configuration in the init script(s), or remove/update these README sections to avoid incorrect guidance.

Comment on lines 45 to 49
│ • DATABRICKS_TOKEN available from environment │
│ • Configs auto-generate: │
│ - ~/.claude/settings.json │
│ - ~/.opencode/config.json │
│ • Commands ready: claude, opencode │

Copilot AI Feb 4, 2026

The module README describes installing/configuring OpenCode (opencode) and generating ~/.opencode/config.json, but the provided init scripts in this PR don’t install OpenCode or add related helpers. Either implement OpenCode installation/configuration in the init script(s), or remove/update these README sections to avoid incorrect guidance.

Comment on lines 209 to 210
local cron_cmd="[ -n \"\$DATABRICKS_TOKEN\" ] && [ -n \"\$DATABRICKS_HOST\" ] && source \"\$HOME/.bashrc\" && _check_and_refresh_token >/dev/null 2>&1"
local cron_job="0 * * * * $cron_cmd"

Copilot AI Feb 4, 2026

cron_cmd and cron_job are defined but not used (the function later installs cron_file directly into crontab). Removing unused locals (or actually using cron_job) will reduce confusion and keep the function as a single source of truth for the scheduled command.

Suggested change
local cron_cmd="[ -n \"\$DATABRICKS_TOKEN\" ] && [ -n \"\$DATABRICKS_HOST\" ] && source \"\$HOME/.bashrc\" && _check_and_refresh_token >/dev/null 2>&1"
local cron_job="0 * * * * $cron_cmd"
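
If the alternative route is preferred (keeping the variables and actually using them), the entry could be built in one place and merged into the crontab as sketched below. The helper names `build_cron_entry` and `merge_cron_entry` are hypothetical; the command string mirrors the one quoted above.

```shell
#!/usr/bin/env bash

# Build the crontab line from one definition so the schedule and the command
# cannot drift apart. The default schedule is hourly, as in the quoted code.
build_cron_entry() {
  local schedule="${1:-0 * * * *}"
  local cmd='[ -n "$DATABRICKS_TOKEN" ] && [ -n "$DATABRICKS_HOST" ] && source "$HOME/.bashrc" && _check_and_refresh_token >/dev/null 2>&1'
  printf '%s %s\n' "$schedule" "$cmd"
}

# Merge the entry into an existing crontab listing, replacing any older
# refresh line and dropping blank lines.
merge_cron_entry() {
  local existing="$1" entry="$2"
  {
    printf '%s\n' "$existing" | grep -vF -- "_check_and_refresh_token"
    printf '%s\n' "$entry"
  } | sed '/^$/d'
}

# Usage sketch (not executed here):
#   merge_cron_entry "$(crontab -l 2>/dev/null)" "$(build_cron_entry)" | crontab -
```

Filtering on the function name before appending keeps repeated runs idempotent, so the refresh job is never scheduled twice.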

- Add safe expansion for DATABRICKS_HOST to prevent crash under set -u
- Remove unused local variables in claude-setup-token-refresh()
- Fix broken $? check in claude-vscode-config() by capturing exit code
- Use python3 -m pip instead of bare pip for safer execution
- Add supply-chain verification comment to minimal installer
- Fix node_type_id description to match actual default value
- Update scripts README with all available scripts
- Remove wget from system tools list (not installed)
- Remove OpenCode references throughout documentation

Co-Authored-By: Claude Opus 4.5 <[email protected]>