Add databricks-serverless-storage-check skill #82
Open
GabbysCode wants to merge 1 commit into
…disk handoffs

Adds a new skill `databricks-serverless-storage-check` that ships an executable preflight scanner for the antipattern where parent/child tasks share state through /local_disk0, /tmp, or trustedTemp paths -- the failure seen in serverless jobs that fail with `INTERNAL_ERROR: [Errno 13] Permission denied` on local-disk paths.

The scanner (scripts/preflight.py, stdlib-only, AST + regex) supports five input modes (--notebook, --dir, --job-yaml, --job-id, --run-id) and 7 detection rules (FANOUT001-006 plus ENV001 which routes env-sync errors to support escalation). All 7 self-tests pass.

Complementary to databricks-serverless-migration (single-notebook migration). Added a one-line cross-reference from that skill's data-access table pointing here for multi-task fan-out concerns.

Includes the required agents/openai.yaml (hand-authored) and SKILL_METADATA entry in scripts/skills.py; manifest regenerated and `python3 scripts/skills.py validate` passes.

Signed-off-by: GABRIELLE DOMPREH <Gabby.dompreh@databricks.com>
Summary
Adds `databricks-serverless-storage-check`: a skill that ships an executable preflight scanner detecting the antipattern where serverless tasks share state through `/local_disk0`, `/tmp`, or `trustedTemp` paths. This is the failure mode behind `INTERNAL_ERROR: [Errno 13] Permission denied: '/local_disk0/.../trustedTemp.../...'`, where a parent task writes to local disk and a child task on a different node cannot read it.

Complementary to the existing `databricks-serverless-migration` skill (which covers single-notebook migration and correctly recommends `/local_disk0/tmp` for intra-task scratch). This new skill covers the cross-task case.

Contents
- `skills/databricks-serverless-storage-check/SKILL.md`
- `skills/databricks-serverless-storage-check/agents/openai.yaml`
- `skills/databricks-serverless-storage-check/scripts/preflight.py`: five input modes (`--notebook` / `--dir` / `--job-yaml` / `--job-id` / `--run-id`), `--json` flag, exit codes 0/1/2
- `skills/databricks-serverless-storage-check/scripts/test_preflight.py`: … `python3` and no third-party deps
- `skills/databricks-serverless-storage-check/references/pattern-catalog.md`
- `skills/databricks-serverless-storage-check/references/remediation-guide.md`: … `taskValues` / pipeline-downstream handoffs, plus what-not-to-do anti-examples

Detection rules
| Rule | What it detects |
| --- | --- |
| `FANOUT001` | … `dbutils.notebook.run`, `taskValues.set`, or job-task parameter (resolved through variable assignments and dict/list/tuple literals) |
| `FANOUT002` | … (`widgets.get` / `taskValues.get`) reads from a `/local_disk0` or `/tmp` path |
| `FANOUT003` | … |
| `FANOUT004` | … `pipeline_task` immediately downstream of a `notebook_task` that wrote to local temp |
| `FANOUT005` | … `dbutils.fs.cp` local-to-local inside a notebook invoked by a multi-task job (heuristic) |
| `FANOUT006` | … `/local_disk0/spark-*/trustedTemp/...` anywhere in source |
| `ENV001` | `--run-id` mode only: routes `ENVIRONMENT_SETUP_ERROR.PYTHON_NOTEBOOK_ENVIRONMENT` to support escalation (not a fixable pattern) |

Sibling cross-reference
Adds one line to `skills/databricks-serverless-migration/SKILL.md` (Category B: Data Access table) clarifying that `/local_disk0/tmp` is per-task scratch only and pointing to this skill for cross-task concerns. Flagged here because it touches a sibling skill.

Validation
- `python3 scripts/skills.py validate`: `Everything is up to date.`
- `python3 skills/databricks-serverless-storage-check/scripts/test_preflight.py`: 7/7 passing (BSI repro → blockers + exit 2; DAB sibling-shared `/tmp` → warning + exit 1; clean `/Volumes` notebook → 0 findings + exit 0; env-sync regex; BSI signature; exit-code resolution; JSON shape)
- … (… `/local_disk0/spark-.../trustedTemp-.../...`, child reads via widget) triggers `FANOUT001` + `FANOUT002` + `FANOUT006` as blockers
- `databricks` CLI is required only for `--job-id` / `--run-id` modes

Checklist
- `python3 scripts/skills.py validate` passes
- `SKILL_METADATA` entry added in `scripts/skills.py`
- `agents/openai.yaml` hand-authored
- `SKILL.md` body under 250 lines (149 lines)
- … `trustedTemp`, `local_disk0`, `permission denied`, `fan-out`, `cross-task`
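
Reviewer note: for anyone who wants a feel for the AST + regex approach before opening `scripts/preflight.py`, here is a minimal sketch of a `FANOUT006`-style check together with exit-code resolution. It is not the code in this PR. The two regexes, the `LOCAL_PATH` rule id, the function names, and the severity-to-exit-code mapping are assumptions for illustration; the mapping only mirrors the 0/1/2 outcomes listed under Validation.

```python
import ast
import re

# Assumed pattern: approximates the trustedTemp signature quoted in this PR.
TRUSTED_TEMP_RE = re.compile(r"/local_disk0/spark-[^/]*/trustedTemp")
# Assumed pattern: any string that starts with a node-local path prefix.
LOCAL_PATH_RE = re.compile(r"^(?:/local_disk0|/tmp)(?:/|$)")

# Assumed mapping: blockers exit 2, warnings exit 1, no findings exit 0.
SEVERITY_EXIT = {"blocker": 2, "warning": 1}


def scan_source(source: str) -> list[dict]:
    """Return findings for local-disk path literals in Python source."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        # Only inspect string constants; comments are never visited.
        if not (isinstance(node, ast.Constant) and isinstance(node.value, str)):
            continue
        if TRUSTED_TEMP_RE.search(node.value):
            rule, severity = "FANOUT006", "blocker"
        elif LOCAL_PATH_RE.match(node.value):
            # "LOCAL_PATH" is a hypothetical rule id, not one from this PR.
            rule, severity = "LOCAL_PATH", "warning"
        else:
            continue
        findings.append(
            {"rule": rule, "severity": severity,
             "line": node.lineno, "path": node.value}
        )
    # ast.walk is breadth-first, so sort to report in source order.
    findings.sort(key=lambda f: f["line"])
    return findings


def resolve_exit_code(findings: list[dict]) -> int:
    """The highest severity among the findings decides the exit code."""
    return max((SEVERITY_EXIT.get(f["severity"], 0) for f in findings), default=0)
```

Walking string constants rather than grepping raw text means commented-out paths do not produce findings, at the cost of missing paths that are assembled at runtime. The summary above suggests the real scanner goes further by resolving variable assignments and dict/list/tuple literals.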