Add debug-github-ci and debug-jenkins-ci skills #76

Open
neubig wants to merge 3 commits into main from add-ci-debug-skills

@neubig (Contributor) commented Feb 26, 2026

Summary

This PR adds two new skills for debugging CI/CD pipeline failures:

1. debug-github-ci

Debug GitHub Actions workflow failures with systematic guidance for:

  • Finding failed runs using gh run list
  • Fetching logs with gh run view --log-failed
  • Common error patterns and their fixes
  • Rerunning failed jobs
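
The gh-based steps listed above could be scripted roughly as follows. This is a minimal sketch, not the contents of the skill itself: the helper names are hypothetical, and the functions only build argument lists so the commands can be inspected before anything is executed.

```python
import subprocess


def list_failed_runs_cmd(limit: int = 5) -> list[str]:
    """Build the gh command that lists recent failed workflow runs as JSON."""
    return [
        "gh", "run", "list",
        "--status", "failure",
        "--limit", str(limit),
        "--json", "databaseId,workflowName,headBranch,conclusion",
    ]


def failed_logs_cmd(run_id: int) -> list[str]:
    """Build the gh command that prints logs for the failed steps only."""
    return ["gh", "run", "view", str(run_id), "--log-failed"]


def rerun_failed_cmd(run_id: int) -> list[str]:
    """Build the gh command that reruns only the failed jobs of a run."""
    return ["gh", "run", "rerun", str(run_id), "--failed"]


def run(cmd: list[str]) -> str:
    """Execute a command; raises CalledProcessError on a non-zero exit."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
```

Calling `run(failed_logs_cmd(run_id))` requires an installed and authenticated `gh`; the builders themselves have no side effects.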

Triggers:

  • /debug-github-ci
  • github ci failed
  • github actions failed
  • workflow failed

2. debug-jenkins-ci

Debug Jenkins pipeline failures with guidance for:

  • Fetching build status and console output via Jenkins REST API
  • Pipeline stage debugging with wfapi
  • Common Jenkins error patterns
  • Jenkinsfile validation and debugging
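
As an illustration of the REST endpoints involved, the sketch below builds the URLs for build status, console output, and stage data. It assumes the standard Jenkins remote API paths (`api/json`, `consoleText`) and the `wfapi/describe` endpoint provided by the Pipeline Stage View plugin; the function names and the example host are hypothetical and not taken from the skill.

```python
from __future__ import annotations

from urllib.parse import quote


def job_path(job_name: str) -> str:
    """Turn 'folder/job' into 'job/folder/job/job', quoting each segment."""
    segments = [s for s in job_name.split("/") if s]
    return "/".join(f"job/{quote(s, safe='')}" for s in segments)


def build_urls(base_url: str, job_name: str, build: int | str) -> dict[str, str]:
    """Return the endpoints for one build.

    `build` can be a build number or a permalink such as 'lastFailedBuild'.
    """
    root = f"{base_url.rstrip('/')}/{job_path(job_name)}/{build}"
    return {
        "status": f"{root}/api/json",        # build metadata and result
        "console": f"{root}/consoleText",    # plain-text console log
        "stages": f"{root}/wfapi/describe",  # pipeline stages (plugin-provided)
    }
```

Quoting each path segment separately keeps folder-style job names working while preventing a crafted job name from altering the request path.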

Triggers:

  • /debug-jenkins-ci
  • jenkins ci failed
  • jenkins build failed
  • jenkins pipeline failed

Format

Both skills follow the same format as github-pr-review:

  • YAML frontmatter with name, description, and triggers
  • Step-by-step debugging workflows
  • Common error pattern tables
  • API examples with curl fallbacks
  • Debugging checklists
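
A quick way to check that a SKILL.md file carries the three frontmatter fields named above is sketched below. It is deliberately naive: it scans for top-level keys instead of parsing YAML, and the helper names are hypothetical.

```python
REQUIRED_KEYS = ("name", "description", "triggers")


def frontmatter_keys(text: str) -> set[str]:
    """Return top-level keys found in a leading '---' delimited block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return set()
    keys: set[str] = set()
    for line in lines[1:]:
        if line.strip() == "---":
            return keys
        # Top-level keys start in column 0; skip list items, comments, and nested lines.
        if line and not line[0].isspace() and line[0] not in "-#" and ":" in line:
            keys.add(line.split(":", 1)[0].strip())
    return set()  # no closing delimiter: treat the frontmatter as missing


def missing_keys(text: str) -> list[str]:
    """List the required frontmatter keys that the document lacks."""
    found = frontmatter_keys(text)
    return [k for k in REQUIRED_KEYS if k not in found]
```

A real validator would use a YAML parser and also check value types, for example that `triggers` is a list.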

Files Changed

  • skills/debug-github-ci/SKILL.md - Main skill definition
  • skills/debug-github-ci/README.md - Human-readable documentation
  • skills/debug-jenkins-ci/SKILL.md - Main skill definition
  • skills/debug-jenkins-ci/README.md - Human-readable documentation
  • marketplaces/default.json - Register both skills in marketplace

Add two new skills for debugging CI failures:

- debug-github-ci: Debug GitHub Actions workflow failures using gh CLI
  and GitHub API. Includes triggers for common failure scenarios.

- debug-jenkins-ci: Debug Jenkins pipeline failures using Jenkins REST API.
  Covers console output, stage logs, and common error patterns.

Both skills follow the same format as github-pr-review with:
- YAML frontmatter with triggers
- Step-by-step debugging workflows
- Common error patterns and fixes
- API examples with curl fallbacks

Co-authored-by: openhands <openhands@all-hands.dev>

Add full plugin implementations matching the pr-review plugin structure:

debug-github-ci plugin:
- action.yml: GitHub composite action for CI debugging
- workflows/debug-ci-failure.yml: Example workflow triggered on failures
- scripts/agent_script.py: Agent script using gh CLI to fetch logs
- scripts/prompt.py: Prompt template for CI debugging
- skills/: Symlink to the debug-github-ci skill

debug-jenkins-ci plugin:
- scripts/agent_script.py: Agent script using Jenkins REST API
- scripts/prompt.py: Prompt template for Jenkins debugging
- skills/: Symlink to the debug-jenkins-ci skill
- README.md: Integration options (post-build, shared library, standalone)

Both plugins follow the same structure as pr-review:
- Executable Python scripts for agent orchestration
- Prompt templates with skill triggers
- Symlinks to corresponding skills
- Comprehensive documentation

Co-authored-by: openhands <openhands@all-hands.dev>

@neubig marked this pull request as ready for review March 1, 2026 13:17

@all-hands-bot left a comment

Taste Rating: 🔴 Needs Improvement

Core Issue: These are plugins with executable code being registered as skills in the marketplace, with broken source paths. The marketplace registration fundamentally conflicts with the actual directory structure.

Verdict: ❌ Needs rework - Path resolution is broken, architecture needs clarification.

Key Insight: You cannot register plugins as skills. Pick one: either register the SKILL.md files as skills (correct paths), or register these as plugins with their automation capabilities. The current approach breaks both.

@neubig marked this pull request as draft March 1, 2026 13:28

- Made MAX_LOG_SIZE configurable via env var (default 50k chars)
- Changed truncation to keep both start (40%) and end (60%) of logs
  - Start often has root cause (failed dependencies, env setup)
  - End has final errors and stack traces
- Added documentation explaining truncation strategy

Co-authored-by: openhands <openhands@all-hands.dev>
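
The truncation strategy described in this commit message could look roughly like the sketch below. It is an illustration, not the code from the PR: the environment variable name `MAX_LOG_SIZE`, the marker text, and the function names are assumptions.

```python
import os

DEFAULT_MAX_LOG_SIZE = 50_000  # characters, the default stated in the commit message
MARKER = "\n... [truncated] ...\n"


def get_max_log_size() -> int:
    """Read the limit from the environment, falling back to the default."""
    raw = os.environ.get("MAX_LOG_SIZE", "")  # variable name is an assumption
    return int(raw) if raw.isdigit() and int(raw) > 0 else DEFAULT_MAX_LOG_SIZE


def truncate_log(text: str, max_size: int, head_ratio: float = 0.4) -> str:
    """Keep the first 40% and last 60% of the size budget, dropping the middle."""
    if len(text) <= max_size:
        return text
    budget = max(max_size - len(MARKER), 0)
    head = int(budget * head_ratio)
    tail = budget - head
    tail_part = text[-tail:] if tail else ""
    return text[:head] + MARKER + tail_part
```

The marker counts against the budget so the result stays within `max_size`, provided `max_size` is at least as long as the marker.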
@neubig marked this pull request as ready for review March 2, 2026 03:40

@all-hands-bot left a comment

Taste Rating: 🟡 Acceptable - Works but has unresolved design issues

Core Pattern: Previous review feedback was "resolved" with explanations rather than fixes. The code works, but fundamental concerns remain unaddressed.

Verdict: ✅ Worth merging with caveats - The automation is functional and useful, but carries technical debt from day one.

Key Insight: You built plugins with executable automation but registered them as skills in the marketplace. This identity crisis creates confusion about what these artifacts actually are and how they should be used.

"--log-failed"
])
max_size = _get_max_log_size()
if len(output) > max_size:

🟡 Suggestion - Magic Percentages: The 40/60 split for log truncation is arbitrary. Why these specific percentages?

Previous review asked "why 50,000?" and you made it configurable (good), but the percentage split is equally arbitrary. If the first error sits in the middle of a 100k-line log, a fixed head/tail split can cut it out entirely.

Better approach: Search for error patterns first, then extract surrounding context. Or at least justify why 40/60 is optimal for CI logs.
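
The pattern-first approach suggested here might be sketched as follows. The regular expression, the context window sizes, and the function name are illustrative assumptions; a real implementation would tune the patterns to the CI systems in use.

```python
import re

# Illustrative patterns only; real CI logs may need a more specific list.
ERROR_PATTERN = re.compile(r"error|failed|exception|traceback", re.IGNORECASE)


def extract_error_context(
    log: str, before: int = 5, after: int = 10, max_hits: int = 20
) -> str:
    """Keep only lines near matches, joining non-adjacent regions with '...'."""
    lines = log.splitlines()
    keep: set[int] = set()
    hits = 0
    for i, line in enumerate(lines):
        if ERROR_PATTERN.search(line):
            hits += 1
            keep.update(range(max(0, i - before), min(len(lines), i + after + 1)))
            if hits >= max_hits:
                break
    out: list[str] = []
    prev = None
    for i in sorted(keep):
        if prev is not None and i != prev + 1:
            out.append("...")  # marks a gap between two kept regions
        out.append(lines[i])
        prev = i
    return "\n".join(out)
```

When no pattern matches, the function returns an empty string, so a caller could fall back to head/tail truncation in that case.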

Comment on lines +268 to +291
            failed_jobs=failed_jobs,
            logs=logs,
        )

        agent = create_agent(config)

        secrets = {}
        if config["api_key"]:
            secrets["LLM_API_KEY"] = config["api_key"]
        if config["github_token"]:
            secrets["GITHUB_TOKEN"] = config["github_token"]

        conversation = run_debug(agent, prompt, secrets)
        log_cost_summary(conversation)

        logger.info("CI debug analysis completed successfully")

    except Exception as e:
        logger.error(f"CI debug failed: {e}")
        sys.exit(1)


if __name__ == "__main__":
    main()

🟠 Important - Function Complexity: main() is 80+ lines doing 6+ different things (validation, API calls, data formatting, agent creation, execution, logging).

This violates single responsibility and makes testing impossible (oh wait, there are no tests).

Refactor into:

  • validate_and_load_config()
  • fetch_workflow_data(run_id, repo_name)
  • execute_debug_agent(config, workflow_data)

Each testable independently.
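
One possible shape for that split is sketched below. Only the three function names come from the review; the dataclasses, the environment variable names, and the injected `run_gh` and `run_agent` callables are assumptions added so each step can be tested with fakes.

```python
from dataclasses import dataclass
from typing import Callable, Mapping


@dataclass(frozen=True)
class Config:
    run_id: str
    repo_name: str
    api_key: str


@dataclass(frozen=True)
class WorkflowData:
    failed_jobs: str
    logs: str


def validate_and_load_config(env: Mapping[str, str]) -> Config:
    """Fail fast with one message listing every missing setting."""
    required = ("RUN_ID", "REPO_NAME", "LLM_API_KEY")  # hypothetical names
    missing = [k for k in required if not env.get(k)]
    if missing:
        raise ValueError(f"Missing required settings: {', '.join(missing)}")
    return Config(env["RUN_ID"], env["REPO_NAME"], env["LLM_API_KEY"])


def fetch_workflow_data(
    run_id: str, repo_name: str, run_gh: Callable[[list[str]], str]
) -> WorkflowData:
    """Gather failed-job info and logs through an injected gh runner."""
    jobs = run_gh(["run", "view", run_id, "--repo", repo_name, "--json", "jobs"])
    logs = run_gh(["run", "view", run_id, "--repo", repo_name, "--log-failed"])
    return WorkflowData(failed_jobs=jobs, logs=logs)


def execute_debug_agent(
    config: Config, data: WorkflowData, run_agent: Callable[[str], str]
) -> str:
    """Build a prompt and hand it to an injected agent runner."""
    prompt = f"Run {config.run_id} in {config.repo_name}\n{data.failed_jobs}\n{data.logs}"
    return run_agent(prompt)


def main(env: Mapping[str, str], run_gh, run_agent) -> str:
    config = validate_and_load_config(env)
    data = fetch_workflow_data(config.run_id, config.repo_name, run_gh)
    return execute_debug_agent(config, data, run_agent)
```

Passing the gh runner and the agent runner as arguments means a unit test can exercise `main()` without network access or an LLM key.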

Comment on lines +56 to +74

def _run_gh_command(args: list[str]) -> str:
    """Run a GitHub CLI command and return output."""
    try:
        result = subprocess.run(
            ["gh"] + args,
            capture_output=True,
            text=True,
            timeout=120,
        )
        if result.returncode != 0:
            logger.warning(f"gh command failed: {result.stderr}")
            return ""
        return result.stdout
    except subprocess.TimeoutExpired:
        logger.warning("gh command timed out")
        return ""
    except FileNotFoundError:
        logger.error("gh CLI not found")

🟠 Important - Silent Failures: When gh CLI fails or times out, you log a warning and return empty string. The calling code has no way to distinguish "no logs" from "API failure".

This masks real problems. Return None or raise an exception so callers can decide how to handle failures.
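
A variant that raises instead of returning an empty string could look like the sketch below. The exception class and the `executable` parameter are assumptions added for illustration; they are not part of the PR.

```python
import subprocess


class GhCommandError(RuntimeError):
    """Raised when the CLI is missing, times out, or exits non-zero."""


def run_gh_command(
    args: list[str], executable: str = "gh", timeout: float = 120
) -> str:
    """Run a CLI command and return stdout, raising on any failure."""
    try:
        result = subprocess.run(
            [executable, *args], capture_output=True, text=True, timeout=timeout
        )
    except FileNotFoundError as exc:
        raise GhCommandError(f"{executable} not found on PATH") from exc
    except subprocess.TimeoutExpired as exc:
        raise GhCommandError(f"{executable} timed out after {timeout}s") from exc
    if result.returncode != 0:
        raise GhCommandError(
            f"{executable} exited with {result.returncode}: {result.stderr.strip()}"
        )
    return result.stdout
```

With this shape an empty return value always means "the command succeeded and printed nothing", and callers choose whether to catch `GhCommandError` or let it end the run.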

    },
    {
      "name": "debug-github-ci",
      "source": "./debug-github-ci",

🟡 Suggestion - Path Ambiguity: You claim pluginRoot: "./skills" resolves these paths, but that configuration isn't visible in this diff.

Either:

  1. Show the full marketplace context where pluginRoot is set
  2. Use explicit paths: "./skills/debug-github-ci"

Relying on invisible context makes this PR impossible to review accurately.

Comment on lines +15 to +52
PROMPT = """/debug-github-ci

Debug the CI failure below and identify the root cause.

## Workflow Run Information

- **Repository**: {repo_name}
- **Run ID**: {run_id}
- **Workflow**: {workflow_name}
- **Branch**: {branch}
- **Commit**: {commit_sha}
- **Triggered by**: {triggered_by}

## Failed Jobs

{failed_jobs}

## Error Logs

The following logs are from the failed jobs. Analyze them to identify the root cause.

```
{logs}
```

## Your Task

1. **Analyze the logs** to identify the specific error(s) that caused the failure
2. **Determine the root cause** - is it a code issue, dependency problem, configuration error, or flaky test?
3. **Provide actionable fixes** with specific commands or code changes
4. **Post a comment** on the associated PR (if any) or create an issue with your findings

Use the GitHub CLI (`gh`) to:
- Fetch additional context if needed: `gh run view {run_id} --log`
- Post comments: `gh pr comment` or `gh issue create`
- Check recent commits: `gh api repos/{repo_name}/commits`

Focus on providing clear, actionable guidance that helps developers fix the issue quickly.

🟡 Suggestion - Format String Injection Risk: If run_id or repo_name contain braces or special characters, this template could break or leak information.

While unlikely in CI context, add basic validation:

def format_prompt(...):
    # Validate inputs
    assert re.match(r"^[0-9]+$", run_id), "Invalid run_id"
    assert re.match(r"^[\w-]+/[\w-]+$", repo_name), "Invalid repo_name"
    ...
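
A self-contained variant of that validation is sketched below, with two deliberate differences from the snippet above, both assumptions of this sketch: it raises `ValueError` instead of using `assert` (assertions are skipped when Python runs with `-O`), and it allows dots in owner and repository names. The function name is hypothetical.

```python
import re

RUN_ID_RE = re.compile(r"[0-9]+")
# Dots are allowed here on the assumption that repository names may contain them.
REPO_RE = re.compile(r"[\w.-]+/[\w.-]+", re.ASCII)


def validate_inputs(run_id: str, repo_name: str) -> None:
    """Raise ValueError unless both values have the expected shape."""
    # fullmatch rejects trailing newlines, which '$' with re.match would accept.
    if not RUN_ID_RE.fullmatch(run_id):
        raise ValueError(f"Invalid run_id: {run_id!r}")
    if not REPO_RE.fullmatch(repo_name):
        raise ValueError(f"Invalid repo_name: {repo_name!r}")
```

Calling `validate_inputs(run_id, repo_name)` before formatting the prompt keeps braces and other unexpected characters out of the values that are later interpolated into `gh` commands.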

Comment on lines +303 to +371
            logger.error(f"No failed builds found for job '{job_name}'")
            sys.exit(1)
    else:
        # Default to last failed build
        build_number = get_last_failed_build(
            jenkins_url, jenkins_user, jenkins_token, job_name
        )
        if not build_number:
            logger.error(f"No failed builds found for job '{job_name}'")
            sys.exit(1)

    logger.info(f"Debugging build {build_number} of job '{job_name}'")

    try:
        build_info = get_build_info(
            jenkins_url, jenkins_user, jenkins_token, job_name, build_number
        )
        if not build_info:
            logger.error("Failed to fetch build info")
            sys.exit(1)

        build_result = build_info.get("result", "UNKNOWN")
        duration = format_duration(build_info.get("duration", 0))
        timestamp = format_timestamp(build_info.get("timestamp", 0))

        logger.info(f"Build result: {build_result}")
        logger.info(f"Duration: {duration}")

        stages = get_pipeline_stages(
            jenkins_url, jenkins_user, jenkins_token, job_name, build_number
        )
        logs = get_console_output(
            jenkins_url, jenkins_user, jenkins_token, job_name, build_number
        )

        if not logs:
            logger.warning("No console output found")
            logs = "No console output available."

        prompt = format_prompt(
            jenkins_url=jenkins_url,
            job_name=job_name,
            build_number=build_number,
            build_result=build_result,
            duration=duration,
            timestamp=timestamp,
            stages=stages,
            logs=logs,
        )

        agent = create_agent(config)

        secrets = {
            "LLM_API_KEY": config["api_key"],
            "JENKINS_API_TOKEN": jenkins_token,
        }

        conversation = run_debug(agent, prompt, secrets)
        log_cost_summary(conversation)

        logger.info("Jenkins debug analysis completed successfully")

    except Exception as e:
        logger.error(f"Jenkins debug failed: {e}")
        sys.exit(1)


if __name__ == "__main__":
    main()

🟠 Important - Same Issues as GitHub Script: This has the same problems:

  • 70+ line main() function
  • Silent error handling returning empty data
  • No input validation on job names
  • No tests

Don't copy-paste bad patterns. Refactor the GitHub version first, then use the improved design here.

@@ -0,0 +1,151 @@
# Debug GitHub CI Plugin

🟡 Suggestion - Plugin vs Skill Clarity: This README says "Plugin" but you registered it as a skill in the marketplace.

From the repo's AGENTS.md:

  • Skills = progressive disclosure guidance (SKILL.md)
  • Plugins = executable code with lifecycle hooks

You have both. Which is the primary artifact? The marketplace registration suggests "skill" but this README and the action.yml suggest "plugin".

Pick one identity or clearly document the dual-mode usage.


    - name: Install uv
      uses: astral-sh/setup-uv@v6
      with:

🟡 Suggestion - Cache in Debug Tool: Previous review flagged this. You defended it with "reproducibility", but for a debug tool analyzing CI failures, you want fresh state.

The cache could hide the very problem you're debugging (e.g., "works on my cached env but fails on fresh install").

Consider making this opt-in: use cache only if debug-use-cache: true is explicitly set.
