RFC: subtree-sync skills from databricks-agent-skills/experimental #530

Closed
jamesbroadhead wants to merge 3 commits into databricks-solutions:main from jamesbroadhead:sync-skills-from-das

@jamesbroadhead
Contributor

Summary (RFC / draft)

Proposes a git subtree-based back-link from this repo's databricks-skills/
to databricks/databricks-agent-skills/experimental/.

Paired with databricks/databricks-agent-skills#73
which establishes the experimental/ directory on the d-a-s side.

What changes

  • databricks-skills/imported/ — new directory, added as a git subtree
    of the experimental-only branch of databricks-agent-skills. Contains the
    25 imported skills (the 26 a-d-k skills minus the dropped databricks-model-serving).
  • .github/workflows/sync-skills-from-das.yml — weekly cron + manual
    dispatch. Runs git subtree pull, opens an auto-PR if there is drift.
  • databricks-skills/SYNC.md — operator runbook + mechanism explainer +
    trade-offs vs submodule / rsync / fork.
  • databricks-skills/README.md — banner block pointing at imported/ and
    SYNC.md, with explicit "do not edit imported/ here" note.

How it works

git subtree can't pull a subdirectory of a remote repo directly — the remote
needs to publish a branch whose root tree is what you want. So:

  • On d-a-s: a workflow runs git subtree split --prefix=experimental --branch=experimental-only
    after each push to main and force-pushes the result. The branch's root is
    the contents of experimental/.
  • On this repo: git subtree pull --prefix=databricks-skills/imported <d-a-s-url> experimental-only --squash
    brings drift in, recorded as a squashed merge commit referencing the
    upstream SHA. git log --grep "Squashed 'databricks-skills/imported/'"
    shows the full sync history; git blame on an imported skill points back
    to its upstream commit.
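
The two commands above can be assembled programmatically, for example when a sync script wants to log or dry-run them before executing anything. This is a minimal sketch, not the workflow in this PR: the helper names `split_command` and `pull_command` are hypothetical, and the remote URL is left as the same placeholder used above. The flags are the ones quoted in this description.

```python
from typing import List

# Values taken from this RFC; the helper names below are hypothetical.
UPSTREAM_PREFIX = "experimental"
PUBLISH_BRANCH = "experimental-only"
LOCAL_PREFIX = "databricks-skills/imported"


def split_command(prefix: str = UPSTREAM_PREFIX, branch: str = PUBLISH_BRANCH) -> List[str]:
    """argv for the d-a-s side: publish `prefix` as the root tree of `branch`."""
    return ["git", "subtree", "split", f"--prefix={prefix}", f"--branch={branch}"]


def pull_command(remote_url: str, prefix: str = LOCAL_PREFIX, branch: str = PUBLISH_BRANCH) -> List[str]:
    """argv for the a-d-k side: squash-merge upstream drift into `prefix`."""
    return ["git", "subtree", "pull", f"--prefix={prefix}", remote_url, branch, "--squash"]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    print(" ".join(split_command()))
    print(" ".join(pull_command("<d-a-s-url>")))
```

Either list can be handed to `subprocess.run(argv, check=True)` once the placeholder URL is replaced with the real remote.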

Status

Draft / RFC. The subtree was added from a one-shot preview branch
(experimental-only-preview)
pushed manually to d-a-s for this demo. Before merge:

  1. d-a-s side — land #73
    first so experimental/ exists on main.
  2. d-a-s side — separate follow-up PR there to add the
    experimental-only auto-publish workflow.
  3. Here — swap the workflow + SYNC.md to reference experimental-only
    (not -preview), re-run subtree-add against the stable branch, and
    force-push so the fresh squash commits replace the ones in this PR.

Open questions

  1. What about the 26 legacy top-level skills? This RFC adds imported/ as
    a new directory; the existing skills at databricks-skills/<name>/ stay
    put. Long-term we probably want all skills to live under imported/ and
    for install_skills.sh to read from one place — but that migration is
    out of scope here.
  2. Conflict policy. If a file under imported/ is edited locally in a-d-k,
    the next git subtree pull will conflict. SYNC.md says "don't edit" but
    nothing enforces it. Should we add a pre-commit hook that rejects edits
    under imported/?
  3. Sync cadence. Currently weekly plus manual dispatch. Should it be
    daily, or triggered on upstream push via repository_dispatch?
  4. Auto-merge. Auto-PR is currently for-review-only. Would we want it
    to auto-merge if CI is green?
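
For the conflict-policy question, the check a pre-commit hook would need is small. The sketch below is an assumption about how such a hook could look, not something this PR ships: `protected_paths` is a hypothetical helper, and the caller is expected to feed it the output of `git diff --cached --name-only`.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

# Directory named in this RFC; everything else here is hypothetical.
PROTECTED_DIR = PurePosixPath("databricks-skills/imported")


def protected_paths(staged: Iterable[str], protected: PurePosixPath = PROTECTED_DIR) -> List[str]:
    """Return the staged paths that fall under the protected directory."""
    hits = []
    for raw in staged:
        path = PurePosixPath(raw)
        # Match the directory itself or anything nested below it,
        # but not siblings that merely share the name prefix.
        if path == protected or protected in path.parents:
            hits.append(raw)
    return hits


if __name__ == "__main__":
    staged = [
        "databricks-skills/imported/databricks-iceberg/SKILL.md",
        "databricks-skills/README.md",
    ]
    offenders = protected_paths(staged)
    if offenders:
        print("Refusing to commit edits under imported/:")
        for path in offenders:
            print(f"  {path}")
```

A real hook would also need a way to let the sync's own commits through, since those legitimately touch imported/.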

Alternatives considered

  • rsync from a fresh clone — simpler workflow, no experimental-only
    branch needed. Trade-off: loses the squashed-merge audit trail and
    git log/blame provenance.
  • git submodule — can't reference a subdirectory of the target repo,
    and end users would need git submodule update to see skill content.
  • Hard fork — diverges silently; defeats the back-link goal.

SYNC.md captures these trade-offs in more detail.

This pull request and its description were written by Claude.

git-subtree-dir: databricks-skills/imported
git-subtree-split: b8781b713f0c80e7e827288e10b3e5db692f6084
Proposes a subtree-based back-link from databricks-solutions/ai-dev-kit
to databricks/databricks-agent-skills/experimental/, replacing the
current ad-hoc copy.

- `databricks-skills/imported/` is added as a `git subtree` of the
  `experimental-only` branch of databricks-agent-skills (a branch whose
  root tree mirrors experimental/, produced via `git subtree split` on
  that side after every push to main).
- `.github/workflows/sync-skills-from-das.yml` runs weekly (and on
  manual dispatch), pulls the subtree, and opens an auto-PR on drift.
- `databricks-skills/SYNC.md` documents the mechanism, manual-sync
  command, and trade-offs vs submodule / rsync / fork.
- `databricks-skills/README.md` gets a banner pointing at imported/.

Paired with databricks/databricks-agent-skills#73 (which establishes
the experimental/ directory on d-a-s side) and a follow-up PR there to
add the split-publish workflow that maintains experimental-only.

For this RFC the subtree was added from a one-shot preview branch
(experimental-only-preview) pushed manually to d-a-s. The follow-up
will replace that with the auto-maintained experimental-only branch.
lennartkats-db pushed a commit to databricks/databricks-agent-skills that referenced this pull request May 24, 2026

## Summary

Adds an `experimental/` directory containing 19 agent skills from
[databricks-solutions/ai-dev-kit](https://github.com/databricks-solutions/ai-dev-kit)
`databricks-skills/`, imported as a snapshot on a **best-effort basis**.
Excluded: `databricks-model-serving` (TODO #1b — different surface than
stable, heavy MCP coupling) and `databricks-spark-declarative-pipelines`
(TODO #5 — different surface than stable `databricks-pipelines`).
`databricks-lakebase-provisioned` is not in the upstream `experimental`
branch either, so absent here.

The manifest now exposes both stable and experimental skills in a
**single `skills` map**. Each entry carries a `repo_dir` field
(`"skills"` or `"experimental"`) that points to the directory the skill
lives in. Consumers derive experimental state from `repo_dir` — there is
no parallel `experimental_skills` map and no per-skill `experimental`
bool.
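
To make the single-map shape concrete, here is a small illustrative fragment and the derivation a consumer would perform. Only the `skills` key and the `repo_dir` field come from this description; the `version` values, the helper names, and the choice of example skills are assumptions for illustration, and the real manifest may carry more fields.

```python
from typing import Dict, List

Manifest = Dict[str, Dict[str, Dict[str, str]]]

# Illustrative fragment only: `skills` and `repo_dir` are from the PR text,
# the rest of the layout is assumed.
EXAMPLE_MANIFEST: Manifest = {
    "skills": {
        "databricks-jobs": {"repo_dir": "skills", "version": "0.2.0"},
        "databricks-iceberg": {"repo_dir": "experimental", "version": "0.0.1"},
    }
}


def is_experimental(entry: Dict[str, str]) -> bool:
    """Experimental status is derived from the directory, not a separate flag."""
    return entry["repo_dir"] == "experimental"


def names_by_status(manifest: Manifest, experimental: bool) -> List[str]:
    """List skill names whose derived status matches `experimental`."""
    return sorted(
        name
        for name, entry in manifest["skills"].items()
        if is_experimental(entry) == experimental
    )


if __name__ == "__main__":
    print("stable:", names_by_status(EXAMPLE_MANIFEST, experimental=False))
    print("experimental:", names_by_status(EXAMPLE_MANIFEST, experimental=True))
```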

Paired with
[databricks/cli#5243](databricks/cli#5243) which
teaches `databricks experimental aitools skills install` to:
- read `repo_dir` and skip experimental entries by default,
- install all of them with `--experimental`,
- install one by name (with `--experimental` required and
`-experimental` suffix on the install dir).
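
The three install behaviours listed above can be modelled as one selection function. This is a Python rendering of the described behaviour under stated assumptions, not the CLI's actual implementation: `install_plan` is a hypothetical name, and the return value (install-directory name mapped to source skill name) is chosen here for illustration.

```python
from typing import Dict, Optional, Tuple

SUFFIX = "-experimental"


def install_plan(
    skills: Dict[str, Dict[str, str]],
    name: Optional[str] = None,
    experimental: bool = False,
) -> Dict[str, str]:
    """Map install-dir name -> source skill name for the requested install."""
    # Experimental entries are keyed by their suffixed install name.
    by_install_name: Dict[str, Tuple[str, bool]] = {}
    for source_name, entry in skills.items():
        is_exp = entry["repo_dir"] == "experimental"
        install_name = source_name + SUFFIX if is_exp else source_name
        by_install_name[install_name] = (source_name, is_exp)

    if name is not None:
        if name not in by_install_name:
            raise KeyError(f"unknown skill: {name}")
        source_name, is_exp = by_install_name[name]
        if is_exp and not experimental:
            raise ValueError(f"{name} is experimental; pass --experimental")
        return {name: source_name}

    # Bulk install: stable only by default, everything with the flag.
    return {
        install_name: source_name
        for install_name, (source_name, is_exp) in by_install_name.items()
        if experimental or not is_exp
    }
```

Keeping the unsuffixed source name alongside the suffixed install name is what lets the installer fetch from `experimental/<name>/` while writing to a `-experimental` directory.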

## Source

Synced from
[`f9b404b`](databricks-solutions/ai-dev-kit@f9b404b)
on `databricks-solutions/ai-dev-kit:experimental`.

Initial import was
[`9c7a5b3`](databricks-solutions/ai-dev-kit@9c7a5b3)
(head of [a-d-k PR
#533](databricks-solutions/ai-dev-kit#533) on
the `appkit-on-experimental` branch). PR #533 has since merged into
`experimental` (`7b07f18`), and the branch has two further commits worth
pulling:
-
[`0ebc38b`](databricks-solutions/ai-dev-kit@0ebc38b)
"Surface silent failures in installer + dashboard skill" — updates
`databricks-aibi-dashboards/SKILL.md` (CLI flag JSON-vs-flag form).
-
[`f9b404b`](databricks-solutions/ai-dev-kit@f9b404b)
"Replace mas_manager.py with native supervisor-agents CLI" — updates
`databricks-agent-bricks/SKILL.md` + `2-supervisor-agents.md` to the new
supervisor-agents CLI group (Beta, CLI 0.299.2+); removes the 667-line
`scripts/mas_manager.py` shim.

Both pulled in commit `10baa35`. The fork branch's installer-side fixes
(`5d2e6ac` / `39c349c` / `dd2257c`) are a-d-k tooling and don't touch
`databricks-skills/`, so nothing to pull from there.

~~**Landing dependency**: a-d-k PR #533 should merge before this PR so
the first periodic sync from a-d-k doesn't conflict.~~ **Resolved** — PR
#533 merged upstream. The rename (`databricks-app-python` →
`databricks-apps-python`) is preserved in the merged version, which
prevents a third skill-name collision with d-a-s's stable
`databricks-apps`.

## Direction caveat — please read

In the Apr 28 thread ([Slack
link](https://databricks.slack.com/archives/C0AKALZU65P/p1778088227285599?thread_ts=1774540245.454779&cid=C0AKALZU65P)),
Dustin's stated plan was to move `databricks-agent-skills` skills
**into** `ai-dev-kit`'s `experimental` branch as defaults. **This PR
goes the other direction** (a-d-k content → d-a-s/experimental). I don't
see any d-a-s commits from Dustin yet, and the timing has slipped.
Opening this so we have something concrete to iterate on — happy to drop
it if the original direction is still preferred.

## TODOs / caveats for iteration

1. **Name collisions.** Resolved in this PR:

- **1a. `databricks-jobs` — merged into stable.** Imported the
comprehensive reference content from a-d-k's `databricks-jobs` skill
into `skills/databricks-jobs/`, bumping version to `0.2.0`. The merged
skill keeps stable's scaffolding workflow + `parent: databricks-core`
hierarchy + Codex `agents/openai.yaml` + compatibility note, and adds
the experimental's full task-types reference (9 types), trigger types
(6), notifications/health/retries/queues, and 7 worked end-to-end
examples. Layered structure: SKILL.md as overview + four reference files
(`task-types.md`, `triggers-schedules.md`,
`notifications-monitoring.md`, `examples.md`). Cleanups during merge:
dropped trigger-spam description, normalized
`/Workspace/Users/user@example.com/...` paths to
`/Workspace/Shared/...`. The experimental copy is removed.

With the single-map manifest shape, collisions are no longer possible —
`_add_skill` raises if the same skill name shows up under both `skills/`
and `experimental/`, so any future drift fails generation loudly.

- **1b. `databricks-model-serving` — dropped from this PR.** After a
deep compare, the two skills cover almost entirely different surfaces:
stable is **ops-focused** (manage existing endpoints via CLI:
`serving-endpoints
create/get/query/update-config/build-logs/put-ai-gateway/get-permissions/...`,
AI Gateway, traffic config, app integration via `databricks-apps`
skill); experimental is **dev-focused** (build & ship MLflow models /
GenAI agents: autolog → `mlflow.pyfunc.log_model` →
`databricks.agents.deploy()` → query, with full Classical ML / Custom
PyFunc / `ResponsesAgent` + LangGraph / UCFunctionToolkit /
VectorSearchRetrieverTool coverage). Near-zero content overlap.
Experimental version also has heavy MCP-tool dependency (60+ refs to
ai-dev-kit's `manage_serving_endpoint`, `manage_workspace_files`,
`manage_jobs`, `manage_job_runs`, `execute_code` that don't exist in the
d-a-s/`databricks experimental aitools` flow). Removed
`experimental/databricks-model-serving/` from this PR; manifest
regenerated. **Follow-up**: port the high-value dev-side content into
the stable skill — classical-ml autolog patterns
(`mlflow.{sklearn,xgboost,lightgbm,pytorch,tensorflow,spark}.autolog()`),
Custom PyFunc signatures, `ResponsesAgent` pattern with the
`create_text_output_item` helper-method gotcha, `UCFunctionToolkit` +
`VectorSearchRetrieverTool` with resource passthrough for auth, the
Foundation Model API endpoint table. Strip MCP refs; replace with
CLI/SDK equivalents. Owners: @databricks/eng-apps-devex (per
CODEOWNERS).
2. ~~**CODEOWNERS for `experimental/`**~~ **Resolved.** Per @simonfaltum
review: the top 10 a-d-k contributors (>=10 commits at import time) are
now Code Owners of `/experimental/` alongside the d-a-s maintainers
(@lennartkats-db, @simonfaltum, @databricks/eng-apps-devex), so their
review satisfies the Required-Code-Owner-Review branch protection.
Maintainer review still works as an alternate path.
3. ~~**No sync mechanism with upstream a-d-k.**~~ **Resolved with a
paired RFC.** Two-part plan:
- **Pre-lock (this PR)**: periodic manual re-syncs from upstream
`ai-dev-kit` into `experimental/`. Documented in
`experimental/README.md`.
- **Post-lock (follow-up)**: invert the direction. a-d-k becomes the
consumer; `databricks-skills/imported/` in a-d-k is a `git subtree` of
this repo's `experimental/`. RFC PR opened against a-d-k:
databricks-solutions/ai-dev-kit#530 (draft). To
make subtree work, d-a-s needs to publish an `experimental-only` branch
via `git subtree split --prefix=experimental` after every push to main —
that's a small workflow to add here in a follow-up PR. A one-shot
preview branch `experimental-only-preview` was pushed to this repo to
enable the RFC demo and should be deleted once the auto-publish workflow
lands.
4. ~~**No agent metadata.**~~ **Resolved.** Imported skills install fine
on Codex CLI — the missing `agents/openai.yaml` was a cosmetic gap, not
a functional blocker (skill files still get copied; only the marketplace
UI metadata is absent). `scripts/skills.py` now auto-generates
`agents/openai.yaml` + copies shared assets for each experimental skill
on `generate`, using SKILL.md frontmatter as the source. Stubs are only
written when missing, so upstream a-d-k can override by shipping its own
files in the skill. The auto-generated names are titlecased from the
skill key — most look good (`Databricks Iceberg`, `Databricks Genie`); a
few degrade gracefully (`Databricks Aibi Dashboards`). Refining those is
a follow-up.
5. ~~**`databricks-pipelines` was deliberately excluded.**~~
**Resolved.** a-d-k doesn't ship a `databricks-pipelines` skill under
that name, but it *does* ship `databricks-spark-declarative-pipelines`
covering the same product. After a deep compare, that experimental
version covers a different surface than stable: scaffolding (`databricks
pipelines init` + bundle/MCP workflow A/B/C), DLT migration guide,
language-selection rules, per-language performance reference. The stable
skill covers feature reference (decision tree, common traps, format
options, fine-grained per-feature × per-language refs). Partial overlap;
experimental's DAB-coupled workflow is the exact concern Dustin flagged
in the Apr 28 Slack thread for demo-generator flows. **Removed
`experimental/databricks-spark-declarative-pipelines/` from this PR**.
**Follow-up TODO** (post-merge): port the high-value pieces into stable
`skills/databricks-pipelines/` — DLT migration guide, workflow A/B/C
decision matrix, per-language performance reference, language-selection
rules. Strip MCP-tool refs. Owners: @lennartkats-db / @camielstee-db
(per CODEOWNERS).
6. ~~**`spark-python-data-source` naming exception.**~~ **Kept as-is.**
The skill is about the OSS Apache Spark 4+ PySpark DataSource API
(building custom connector libraries), not a Databricks product — only
lightly flavored with Databricks idioms. The convention break is
acceptable given the content.
7. ~~**Versioning.**~~ **Resolved.** Bumped the
`extract_version_from_skill` fallback in `scripts/skills.py` from
`0.0.0` → `0.0.1` so the manifest never reports `0.0.0` (which some
tools treat as "unset"). Applies to skills that currently have no
explicit `version:` in their SKILL.md frontmatter. Skills with an
explicit version are unchanged. The change is sync-safe: when upstream
a-d-k eventually adds version fields, those win; until then, the
manifest reports the floor.
8. ~~**`installed_dir` for experimental skills.**~~ **All experimental
skills install under a `-experimental` suffix.** Every experimental
skill installs to `~/.claude/skills/<name>-experimental/` regardless of
whether there's a stable skill with the colliding base name. Implemented
in [databricks/cli#5243](databricks/cli#5243)
via a new `SourceName` field on `SkillMeta`: the install-side manifest
key (and install dir) carry the `-experimental` suffix; `SourceName`
preserves the unsuffixed name for fetching from `experimental/<name>/`
in this repo. Users see at a glance which installed skills are
experimental.
9. ~~**Excluded a-d-k content.**~~ **Confirmed scope.** Excluded:
`TEMPLATE/` (template, not a skill), `install_skills.sh` +
`install_genie_code_skills.py` (a-d-k's installers — we use the cli
installer instead), `databricks-builder-app/` (a Python app for a-d-k's
builder UI), `databricks-mcp-server/` (the a-d-k MCP server — separate
concern from skills), `databricks-tools-core/` (Python lib used by a-d-k
tooling — no experimental skill references it), `hooks/hooks.json`
(a-d-k plugin lifecycle hooks tied to
`${CLAUDE_PLUGIN_ROOT}/.claude-plugin/setup.sh`/`check_update.sh`;
plugin-specific, not skill content), plus top-level repo metadata
(`.github/`, `LICENSE.md`, `README.md`, `VERSION`, `install.{sh,ps1}`,
etc.). Verified no experimental skill cross-references any excluded
path.
10. ~~**README placement.**~~ **Verified.** `experimental/README.md`
retains the adapted a-d-k skill list with a top warning block; the root
`README.md` has an "Experimental Skills" section with an
install-by-name example. Three concrete fixes applied during the
verification pass: (a) dropped the stale `databricks-model-serving`
collision example since that skill was removed from the PR, (b) install
commands updated to include the `-experimental` suffix + flag per TODO
#8's resolution, (c) added a short note in `experimental/README.md`
explaining why the in-repo dir names don't carry the suffix (it's added
at install time).
11. ~~**Manifest shape.**~~ **Resolved.** Replaced the original two-map
design (top-level `skills` + `experimental_skills` plus per-skill
`experimental` bool) with a single `skills` map where each entry's
`repo_dir` field is the source of truth. Rationale: the directory
location in the repo already determines status, so it's the natural
single source. Consumers derive experimental state from `repo_dir` (see
cli's `SkillMeta.IsExperimental`). The manifest generator
(`scripts/skills.py`) raises a clear error if the same skill name
appears under both `skills/` and `experimental/`, so future drift fails
generation rather than silently overwriting.

## Test plan

- [x] `python3 scripts/skills.py generate` regenerates the manifest
cleanly.
- [x] `python3 scripts/skills.py validate` passes.
- [ ] CI green on this branch.
- [ ] Manual: `databricks experimental aitools skills install` (no flag)
installs only stable skills.
- [ ] Manual: `databricks experimental aitools skills install
--experimental` installs both.
- [ ] Manual: `databricks experimental aitools skills install
databricks-iceberg-experimental` errors because the skill is experimental
and `--experimental` was not passed.
- [ ] Manual: `databricks experimental aitools skills install
databricks-iceberg-experimental --experimental` installs that one skill.

This pull request and its description were written by Claude.
@jamesbroadhead
Contributor Author

👋 Claude here on James's behalf — closing this RFC in favour of a simpler plan.

The original RFC proposed inverting the sync direction post-lock via git subtree, with d-a-s publishing an experimental-only branch on every push to main and a-d-k pulling it in as a subtree under databricks-skills/imported/. That required new infrastructure on both sides (subtree-publish workflow on d-a-s, subtree-consume workflow on a-d-k) for relatively modest payoff.

New plan, post-d-a-s-#73 merge:

  1. databricks-agent-skills is the source of truth for all skills (stable + experimental).
  2. a-d-k stops shipping skill content for anything that lives in d-a-s. Instead, each skill's directory in databricks-skills/<name>/ becomes a tombstone: a single SKILL.md that redirects users to install via the CLI:
    databricks aitools install <name> [--experimental]
  3. Install scripts (install_skills.sh, install_genie_code_skills.py) get replaced with redirect messages pointing at databricks aitools install.
  4. README updated to point at databricks/databricks-agent-skills.
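
As a rough illustration of step 2, a tombstone SKILL.md could be generated along these lines. This is a sketch under assumptions, not the follow-up PR's implementation: the function name `tombstone_skill_md`, the frontmatter fields, and the wording are hypothetical, and the install command is copied from the plan above.

```python
def tombstone_skill_md(name: str, experimental: bool = False) -> str:
    """Render a redirect-only SKILL.md for a skill that now lives in d-a-s."""
    flag = " --experimental" if experimental else ""
    command = f"databricks aitools install {name}{flag}"
    return (
        "---\n"
        f"name: {name}\n"
        # Frontmatter fields are assumed; adjust to whatever SKILL.md requires.
        "description: Moved to databricks/databricks-agent-skills; install via the Databricks CLI.\n"
        "---\n"
        "\n"
        f"# {name} has moved\n"
        "\n"
        "This skill is now maintained in databricks/databricks-agent-skills.\n"
        "Install it with:\n"
        "\n"
        f"    {command}\n"
    )


if __name__ == "__main__":
    print(tombstone_skill_md("databricks-iceberg", experimental=True))
```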

Advantages over the subtree approach:

  • No new infrastructure on either repo.
  • Single source-of-truth: contributors PR directly to d-a-s.
  • Existing users land on the tombstones and get a clear redirect.
  • Genie Code installs follow the same path (via the CLI).

Follow-up PR against a-d-k incoming with the tombstone implementation.

Closing this RFC.

(comment posted by Claude)
