[feature] CI build failure helper bot #594
Conversation
Created a reusable AI triage workflow and the suggestion script for the bot. Fixes #524
> Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in your CodeRabbit settings, and use the review commands and checkboxes below to manage reviews.
📝 Walkthrough

Adds a GenAI-powered CI triage system: a new Python script at .github/scripts/ai_suggest.py that reads failed_logs.txt and an optional repo_context.xml, requires GEMINI_API_KEY, initializes a Google GenAI client, calls gemini-2.5-flash-lite to generate a Markdown report, and exposes get_error_logs() and main(); and a reusable GitHub Actions workflow at .github/workflows/reusable-ai-triage.yml that prepares a runner (Python, deps), fetches CI logs, packs repo context with repomix, creates a GitHub App token, runs the script, and conditionally posts the generated solution.md as a PR comment.

Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor GHA as GitHub Actions
    participant Checkout as Checkout (reusable + PR)
    participant Runner as Runner (Python, pip)
    participant Logs as Logs Fetcher
    participant Repomix as Repomix (pack)
    participant Script as ai_suggest.py
    participant GenAI as Google GenAI (gemini-2.5-flash-lite)
    participant App as GitHub App (token)
    participant Comment as PR Commenter
    GHA->>Checkout: checkout reusable workflow & PR code
    GHA->>Runner: setup Python, install deps (google-genai, repomix)
    GHA->>Logs: fetch CI logs (run_id)
    Logs-->>GHA: failed_logs.txt (or placeholder)
    GHA->>Repomix: pack repo -> repo_context.xml
    GHA->>App: create GitHub App token (APP_ID, PRIVATE_KEY)
    App-->>GHA: token
    GHA->>Script: run ai_suggest.py (GEMINI_API_KEY, failed_logs, repo_context)
    Script->>GenAI: send prompt to gemini-2.5-flash-lite
    GenAI-->>Script: Markdown report
    Script-->>GHA: write solution.md
    GHA->>Comment: post PR comment using token (if solution.md non-empty)
    Comment-->>GHA: comment posted / skipped
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 4 passed checks (4 passed)
Actionable comments posted: 9
🧹 Nitpick comments (2)
.github/scripts/ai_suggest.py (2)
12-14: Tail-only truncation drops early failures.
`content[-15000:]` sends only the last 15,000 characters. CI failures that occur early (e.g., a dependency installation error before the test runner even starts) will be silently truncated. Consider taking from both the head and tail, or extracting only the failure section:
🔧 Proposed alternative
```diff
- return content[-15000:]
+ # Preserve both the beginning (setup errors) and end (test output)
+ if len(content) <= 15000:
+     return content
+ head = content[:5000]
+ tail = content[-10000:]
+ return f"{head}\n\n[...truncated...]\n\n{tail}"
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/scripts/ai_suggest.py around lines 12 - 14, The current reader opens log_file and returns only content[-15000:], which loses early failures; update the log reading logic used around the with-open block (where log_file is read into variable content) to preserve head and tail or to extract the failure section instead of only the tail — e.g., read the whole file into content, detect/regex for failure markers (stack traces, "ERROR", "FAIL", or CI-specific sections) and return that slice, or return a concatenation of the first N and last M characters (keeping variable name content and the with-open scope intact so callers relying on content still work).
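The head-and-tail strategy described above can be sketched as a small standalone function; the function name and split sizes here are illustrative, not taken from the PR:

```python
def truncate_log(content: str, limit: int = 15000, head: int = 5000) -> str:
    """Keep the start (setup errors) and the end (test output) of a long log."""
    if len(content) <= limit:
        return content
    tail = limit - head
    # Join the first `head` and last `tail` characters around a visible marker.
    return f"{content[:head]}\n\n[...truncated...]\n\n{content[-tail:]}"
```

Short inputs pass through unchanged, so callers that rely on the full `content` still work for small logs.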
11-16: Broad/silent exception handling across three sites (Ruff BLE001, S110).
- Lines 15–16: `except Exception` hides unexpected errors from log reads; at minimum propagate with a logged message (the error is already returned as a string, which is fine, but the blind catch masks unexpected issues like encoding errors).
- Lines 32–33: `except Exception: pass` silently discards context-read failures. If `repo_context.xml` exists but is unreadable, the model receives the stale default string with no indication of the problem.
- Line 75: `except Exception` swallows any non-generation error (e.g., network timeout, authentication failure), making triage of workflow failures harder.
🔧 Proposed fix
```diff
- except Exception as e:
-     return f"Error reading logs: {e}"
+ except OSError as e:
+     return f"Error reading logs: {e}"

- except Exception:
-     pass
+ except OSError as e:
+     print(f"Warning: could not read repo_context.xml: {e}", file=sys.stderr)

- except Exception as e:
-     print(f"Generation Failed: {e}")
+ except (genai.errors.APIError, OSError) as e:
+     print(f"Generation Failed: {e}")
```
Also applies to: 28-33, 75-76
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/scripts/ai_suggest.py around lines 11 - 16, The code uses broad "except Exception" and silent "except Exception: pass" around file reads and generation which hides real errors; update the try/excepts around the log_file read (where content = f.read()), the repo_context.xml read, and the generation block so they catch specific exceptions (e.g., FileNotFoundError, PermissionError, OSError, UnicodeDecodeError, requests.exceptions.RequestException as appropriate) instead of Exception, and on failure either log the exception with its message/traceback (include the exception variable) and return a clear error string or re-raise/propagate after wrapping; remove silent "pass" and ensure the code that uses repo context receives a visible error or fallback with logged details rather than swallowing failures.
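A minimal sketch of the narrower error handling the review asks for, assuming the same `failed_logs.txt` layout; the function name is illustrative:

```python
import sys

def read_log(path: str = "failed_logs.txt") -> str:
    """Read a log file, surfacing (not swallowing) read failures."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return f.read()
    except FileNotFoundError:
        return "No failed logs found."
    except (OSError, UnicodeDecodeError) as e:
        # Log the real cause to stderr instead of a blind `except Exception`.
        print(f"Warning: could not read {path}: {e}", file=sys.stderr)
        return f"Error reading logs: {e}"
```

Missing files remain a normal, expected case; only genuine I/O or decoding problems are reported to stderr.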
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- .github/scripts/ai_suggest.py
- .github/workflows/reusable-ai-triage.yml
🧰 Additional context used
🪛 actionlint (1.7.11)
.github/workflows/reusable-ai-triage.yml
[error] 45-45: the runner of "actions/setup-python@v4" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
🪛 Ruff (0.15.1)
.github/scripts/ai_suggest.py
[warning] 15-15: Do not catch blind exception: Exception
(BLE001)
[error] 32-33: try-except-pass detected, consider logging the exception
(S110)
[warning] 32-32: Do not catch blind exception: Exception
(BLE001)
[warning] 75-75: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
- GitHub Check: Python==3.13 | django~=5.1.0
- GitHub Check: Python==3.11 | django~=5.1.0
- GitHub Check: Python==3.10 | django~=4.2.0
- GitHub Check: Python==3.12 | django~=4.2.0
- GitHub Check: Python==3.12 | django~=5.1.0
- GitHub Check: Python==3.10 | django~=5.0.0
- GitHub Check: Python==3.11 | django~=5.2.0
- GitHub Check: Python==3.13 | django~=5.2.0
- GitHub Check: Python==3.10 | django~=5.1.0
- GitHub Check: Python==3.12 | django~=5.2.0
- GitHub Check: Python==3.11 | django~=5.0.0
- GitHub Check: Python==3.10 | django~=5.2.0
- GitHub Check: Python==3.12 | django~=5.0.0
- GitHub Check: Python==3.11 | django~=4.2.0
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/scripts/ai_suggest.py:
- Around line 56-64: The current prompt string assigned to the variable prompt
forces the assistant to "Fix this failing test." which contradicts the triage
scope defined in system_instruction; change the prompt generation so it first
instructs the model to triage the input into one of the three categories from
system_instruction (Code Style/QA, Commit Message, Test Failure) and only
proceed to propose fixes when the category is Test Failure, otherwise return the
appropriate QA or commit guidance; update the prompt variable to explicitly
mention using {error_log} and {repo_context} as context for triage and
downstream actions and reference system_instruction in the directive so the
model follows the three-category flow.
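One way to restructure the prompt along these lines can be sketched as follows; the category names come from the review, but the wording and function name are illustrative:

```python
def build_triage_prompt(error_log: str, repo_context: str) -> str:
    """Ask for classification first; propose fixes only for Test Failure."""
    categories = ["Code Style/QA", "Commit Message", "Test Failure"]
    return (
        "First, triage the CI failure below into exactly one category: "
        + ", ".join(categories) + ".\n"
        "Only if the category is 'Test Failure', propose a fix; otherwise "
        "return the appropriate QA or commit-message guidance.\n\n"
        f"CI log:\n{error_log}\n\nRepository context:\n{repo_context}\n"
    )
```

The triage directive precedes the fix request, so the model commits to a category before proposing changes.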
- Around line 66-76: The print of the Gemini response can emit "None" when
response.text is None; update the handling around client.models.generate_content
so you check response.text (the variable `response` from the
`client.models.generate_content` call) and replace None with a safe fallback
(e.g., an empty string or a clear message like "[no content returned]") before
formatting the "## Report" output and writing/posting it; modify the except
block to also emit the same safe fallback when an exception occurs so
`response.text` is never interpolated as "None" in the output.
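The None-safe fallback described above is a one-line guard; this small sketch (names illustrative) shows the shape:

```python
def render_report(text) -> str:
    """Never interpolate None into the report body."""
    safe = text if text else "[no content returned]"
    return f"## Report\n\n{safe}"
```

The same fallback string can be emitted from the except block, so downstream steps never see a literal "None".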
In @.github/workflows/reusable-ai-triage.yml:
- Around line 62-65: The Pack Context step currently runs repomix over the
entire PR checkout (repomix --include "**/*" ...) which risks sending secrets;
update the Pack Context job to (1) compute the minimal file set (e.g., use git
diff --name-only against the base branch or use the GitHub event changed_files
list) and pass only those paths to repomix instead of "**/*" so only changed
files are serialized, (2) add a pre-flight secret-scan step (run gitleaks or
trufflehog on the checkout and fail or redact on findings) before calling
repomix, and (3) add a clear comment/metadata in the workflow documentation
indicating that the repomix output is transmitted to an external API so
maintainers can opt out; reference the repomix invocation and the Pack Context
step when making these changes.
- Around line 45-47: Update the GitHub Action step that uses the
actions/setup-python action by changing its version reference from
actions/setup-python@v4 to actions/setup-python@v5; locate the step that
contains the "uses: actions/setup-python@v4" entry (the setup-python step) and
replace the tag so the workflow uses the newer v5 release, keeping the existing
python-version input (python-version: "3.10") unchanged.
- Around line 49-52: The workflow installs unpinned dependencies; change the pip
and npm install steps to pin google-genai and repomix to known-good versions
(replace google-genai and repomix in the "Install Tools" step with explicit
versions, e.g., google-genai==<version> and repomix@<version>) and optionally
generate and reference a requirements.txt with hashes or commit a
package-lock.json to the repo to enforce reproducible installs and stronger
supply-chain guarantees.
- Around line 80-85: Guard against posting an empty solution.md and avoid
duplicate bot comments by first checking the contents of solution.md and
existing PR comments before running gh pr comment: if solution.md is empty (zero
bytes or only whitespace) skip posting and log a message; for deduplication,
query the PR comments (using the PR_NUM/REPO env vars and gh api/gh pr view) to
find an existing bot comment and update it (e.g., use gh pr comment --edit-last
or gh api to PATCH the found comment) instead of unconditionally running gh pr
comment "$PR_NUM" --repo "$REPO" --body-file solution.md; make these checks in
the Post Comment step that uses GH_TOKEN, PR_NUM and REPO.
- Line 78: The workflow step "Run AI Analysis" references a non-existent script
trusted_scripts/.github/scripts/ai_fix.py; update that invocation to use the
committed script name trusted_scripts/.github/scripts/ai_suggest.py (or rename
ai_suggest.py to ai_fix.py if you prefer) so the command python
trusted_scripts/.github/scripts/ai_suggest.py > solution.md succeeds and
produces solution.md for the subsequent "Post Comment" step.
- Around line 54-60: The "Fetch CI Logs" step runs "gh run view $RUN_ID --repo
$REPO --log-failed > failed_logs.txt" but doesn't handle a non-zero exit or
empty output; update that step to capture the command exit code and verify
failed_logs.txt is present and non-empty after running "gh run view", and if the
command failed or the file is empty print a clear message and exit early (e.g.,
exit 0) to skip downstream steps; reference the "Fetch CI Logs" step, the "gh
run view" invocation and the generated failed_logs.txt when implementing this
check.
- Around line 30-35: The checkout step currently uses github.action_repository
and github.action_ref (and the suggested github.workflow_ref), which refer to
the running action or caller workflow rather than the reusable workflow itself;
update the actions/checkout step (the block that uses actions/checkout@v4 and
sets with.repository and with.ref) to either hardcode the reusable workflow's
repository and ref, or derive them from the OIDC token claims that identify the
called workflow by using job_workflow_ref for with.repository/ref and
job_workflow_sha for with.ref (or job_workflow_ref as repo and job_workflow_sha
as ref), ensuring you remove references to
github.action_repository/github.action_ref/github.workflow_ref.
---
Nitpick comments:
In @.github/scripts/ai_suggest.py:
- Around line 12-14: The current reader opens log_file and returns only
content[-15000:], which loses early failures; update the log reading logic used
around the with-open block (where log_file is read into variable content) to
preserve head and tail or to extract the failure section instead of only the
tail — e.g., read the whole file into content, detect/regex for failure markers
(stack traces, "ERROR", "FAIL", or CI-specific sections) and return that slice,
or return a concatenation of the first N and last M characters (keeping variable
name content and the with-open scope intact so callers relying on content still
work).
- Around line 11-16: The code uses broad "except Exception" and silent "except
Exception: pass" around file reads and generation which hides real errors;
update the try/excepts around the log_file read (where content = f.read()), the
repo_context.xml read, and the generation block so they catch specific
exceptions (e.g., FileNotFoundError, PermissionError, OSError,
UnicodeDecodeError, requests.exceptions.RequestException as appropriate) instead
of Exception, and on failure either log the exception with its message/traceback
(include the exception variable) and return a clear error string or
re-raise/propagate after wrapping; remove silent "pass" and ensure the code that
uses repo context receives a visible error or fallback with logged details
rather than swallowing failures.
Added safety nets in workflow and improved prompt in suggestion script. Fixes #524
Actionable comments posted: 3
🧹 Nitpick comments (2)
.github/workflows/reusable-ai-triage.yml (1)
60-60: Quote `$RUN_ID` and `$REPO` in the shell command.
Unquoted variables are susceptible to word splitting and glob expansion.
♻️ Proposed fix
```diff
- gh run view $RUN_ID --repo $REPO --log-failed > failed_logs.txt
+ gh run view "$RUN_ID" --repo "$REPO" --log-failed > failed_logs.txt
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/reusable-ai-triage.yml at line 60, The shell command uses unquoted variables RUN_ID and REPO which can cause word-splitting or glob expansion; update the gh run view invocation so the variables are quoted (use "$RUN_ID" and "$REPO") in the command that currently reads gh run view $RUN_ID --repo $REPO --log-failed > failed_logs.txt to ensure safe expansion and prevent accidental splitting or globbing.
.github/scripts/ai_suggest.py (1)
32-33: Silent `except: pass` masks repo-context read errors — add a log.
When `repo_context.xml` exists but is unreadable, the silent `pass` means the script falls back to "No repository context available." with no indication of the cause. Ruff flags this as S110/BLE001.
♻️ Proposed fix
```diff
+ import sys
  ...
  try:
      with open("repo_context.xml", "r") as f:
          repo_context = f.read()
- except Exception:
-     pass
+ except Exception as e:
+     print(f"Warning: could not read repo_context.xml: {e}", file=sys.stderr)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/scripts/ai_suggest.py around lines 32 - 33, Replace the silent "except Exception: pass" that swallows errors when reading "repo_context.xml" with a logged failure: inside the except block for the try that opens/parses repo_context.xml, call the module's logger (e.g., logger.exception or logging.exception) or process_logger.error with the exception info so the error and stacktrace are recorded (use logger.exception to include traceback); keep falling back to "No repository context available" but ensure the exception is logged for debugging.
📒 Files selected for processing (2)
- .github/scripts/ai_suggest.py
- .github/workflows/reusable-ai-triage.yml
🧰 Additional context used
🪛 Ruff (0.15.1)
.github/scripts/ai_suggest.py
[warning] 15-15: Do not catch blind exception: Exception
(BLE001)
[error] 32-33: try-except-pass detected, consider logging the exception
(S110)
[warning] 32-32: Do not catch blind exception: Exception
(BLE001)
[warning] 79-79: Do not catch blind exception: Exception
(BLE001)
🔇 Additional comments (2)
.github/scripts/ai_suggest.py (2)
7-16: LGTM — minor broad-exception catch noted.
The tail-truncation (`content[-15000:]`) and the `os.path.exists` guard are correct. The `except Exception` on line 15 is flagged by Ruff (BLE001) but is an acceptable top-level fallback here since the return value propagates a descriptive error string.
69-69: `gemini-2.5-flash-lite` is a valid Gemini API model identifier. No action required — the model is officially supported as a stable/GA model by Google's Gemini API.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/scripts/ai_suggest.py:
- Line 22: Change the diagnostic prints to write to stderr so only the final
report goes to stdout: replace the stdout prints that emit "Skipping: No API Key
found.", "Generation returned an empty response; skipping report.", and
"Generation Failed: {e}" with calls that write to sys.stderr (e.g., print(...,
file=sys.stderr) or sys.stderr.write(... + "\n")), ensure sys is imported at top
if not already, and leave the actual report output (the variable/statement that
prints the report at line 76) as the only stdout write; update the messages in
the same functions/locations that currently use those exact strings.
In @.github/workflows/reusable-ai-triage.yml:
- Line 34: The workflow step currently pins the reusable workflow to a hardcoded
ref "ref: issues/524-ci-failure-bot", which will break after that branch is
deleted; update the Checkout Reusable Workflow step to use a stable ref (for
example replace that literal with the target branch like "ref: master" or "ref:
main") or make it configurable (use a workflow input or GitHub context such as
github.ref) so the reusable workflow reference won't be removed when the feature
branch is deleted.
- Line 68: The repomix invocation has a typo in its --output argument producing
"repo_context.xmlrepo_context.xml", so ai_suggest.py's
os.path.exists("repo_context.xml") check always fails; update the repomix
command (the line containing repomix --include ... --output ...) to write the
correct single path (e.g., --output ../repo_context.xml or --output
repo_context.xml as appropriate for the job's working directory) so the file
name matches what ai_suggest.py expects and the Pack Context step can find the
repository context.
---
Duplicate comments:
In @.github/workflows/reusable-ai-triage.yml:
- Around line 89-93: The workflow currently posts a new PR comment every run
using gh pr comment "$PR_NUM" --repo "$REPO" --body-file solution.md with no
deduplication; fix this by adding a deterministic marker or hash to the comment
body (e.g., append <!-- ai-triage-id: <hash> --> to solution.md) and, before
calling gh pr comment, query existing PR comments via gh api (or gh pr view/gh
api repos/:owner/:repo/issues/:PR_NUM/comments) to see if a comment with that
marker/hash already exists and skip posting (or update that comment) if found;
modify the step that writes/uses solution.md and the condition around gh pr
comment "$PR_NUM" --repo "$REPO" --body-file solution.md to perform the lookup
and conditional post/update.
- Around line 65-68: The Pack Context step currently uses repomix --include
"**/*" which serializes the entire repo (including secrets); change it to only
package PR-changed files or a safe include list and add strict ignore patterns
for sensitive files: replace --include "**/*" with either a generated list of
changed files (e.g., from git diff --name-only) piped into repomix, or a
restrictive include like --include "src/**,lib/**,package.json" and expand
--ignore to explicitly exclude
**/.env*,**/*.key,**/*.pem,**/*.crt,**/*.p12,**/secrets*,**/private-keys/**,**/.git/**
and node_modules; update the "Pack Context" step and the repomix invocation
accordingly to ensure only intended files are sent to the external API.
---
Nitpick comments:
In @.github/scripts/ai_suggest.py:
- Around line 32-33: Replace the silent "except Exception: pass" that swallows
errors when reading "repo_context.xml" with a logged failure: inside the except
block for the try that opens/parses repo_context.xml, call the module's logger
(e.g., logger.exception or logging.exception) or process_logger.error with the
exception info so the error and stacktrace are recorded (use logger.exception to
include traceback); keep falling back to "No repository context available" but
ensure the exception is logged for debugging.
In @.github/workflows/reusable-ai-triage.yml:
- Line 60: The shell command uses unquoted variables RUN_ID and REPO which can
cause word-splitting or glob expansion; update the gh run view invocation so the
variables are quoted (use "$RUN_ID" and "$REPO") in the command that currently
reads gh run view $RUN_ID --repo $REPO --log-failed > failed_logs.txt to ensure
safe expansion and prevent accidental splitting or globbing.
All diagnostic/error messages now use sys.stderr and fixed repomix command. Fixes #524
Actionable comments posted: 1
🧹 Nitpick comments (3)
.github/scripts/ai_suggest.py (3)
28-32: `repo_context` has no size cap, unlike `error_log`.
`error_log` is sensibly capped at 15,000 characters (line 15), but `repo_context` is read in full. For larger repos, repomix output can be several MB, inflating token costs significantly. Consider applying the same truncation strategy.
💡 Proposed fix
```diff
- repo_context = f.read()
+ repo_context = f.read()[-50000:]  # cap to ~12k tokens
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/scripts/ai_suggest.py around lines 28 - 32, The repo_context variable is read in full and can be very large; mirror the error_log truncation by capping repo_context (e.g., to 15_000 chars) or reusing the same truncation helper when reading "repo_context.xml": after reading into repo_context in the block that assigns repo_context, apply the same substring/truncate logic or function used for error_log to limit its length and optionally add an ellipsis to indicate truncation.
33-34: Silent `except: pass` drops repo-context read errors without any signal.
If `repo_context.xml` exists but fails to read (encoding error, permissions issue, etc.), the exception is silently discarded and the stale default "No repository context available." is used with no indication to the operator. At minimum, log to `sys.stderr`.
🔧 Proposed fix
```diff
- except Exception:
-     pass
+ except Exception as e:
+     print(f"Warning: Could not read repo_context.xml: {e}", file=sys.stderr)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/scripts/ai_suggest.py around lines 33 - 34, The silent except in the ai_suggest.py block swallows errors when reading repo_context.xml; change the bare "except Exception: pass" to "except Exception as e" and surface the failure (e.g., write a descriptive message and the exception info to sys.stderr or call logging.exception) so operators see encoding/permission/read errors instead of silently falling back to "No repository context available."; update the except block around the repo context read logic to include the exception variable and an stderr/log call.
36-37: Wasted API call when no logs are present.
When `failed_logs.txt` is absent, `get_error_logs()` returns the sentinel string "No failed logs found.". `main()` does not check for this, so it still initialises the client, builds the prompt, and makes a paid Gemini API call, potentially posting a meaningless or confusing report.
💡 Proposed fix
```diff
  error_log = get_error_logs()
+ if error_log == "No failed logs found.":
+     print("Skipping: No failure logs to analyse.", file=sys.stderr)
+     return
  system_instruction = """
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/scripts/ai_suggest.py around lines 36 - 37, main() currently always proceeds after calling get_error_logs() even when it returns the sentinel "No failed logs found.", causing unnecessary client init and Gemini API calls; update main() to check the return of get_error_logs() (compare to the exact sentinel "No failed logs found.") and short-circuit (return or exit) when no logs are present so you skip initializing the Gemini client, building the prompt, and making any paid API call (i.e., do not call build_prompt(), init/instantiate the Gemini client, or send the request when error_log equals the sentinel).
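The short-circuit check described above can be factored into a tiny guard that runs before any client initialisation; the function name is illustrative:

```python
import sys

SENTINEL = "No failed logs found."

def should_run_analysis(error_log: str) -> bool:
    """Return False (and log why) before any client init or paid API call."""
    if error_log == SENTINEL:
        print("Skipping: No failure logs to analyse.", file=sys.stderr)
        return False
    return True
```

Comparing against the exact sentinel keeps the guard cheap and avoids touching the GenAI client at all when there is nothing to triage.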
📒 Files selected for processing (2)
- .github/scripts/ai_suggest.py
- .github/workflows/reusable-ai-triage.yml
🚧 Files skipped from review as they are similar to previous changes (1)
- .github/workflows/reusable-ai-triage.yml
🧰 Additional context used
🪛 Ruff (0.15.1)
.github/scripts/ai_suggest.py
[warning] 16-16: Do not catch blind exception: Exception
(BLE001)
[error] 33-34: try-except-pass detected, consider logging the exception
(S110)
[warning] 33-33: Do not catch blind exception: Exception
(BLE001)
[warning] 83-83: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/scripts/ai_suggest.py:
- Around line 57-66: The prompt currently interpolates untrusted variables
error_log and repo_context directly into the f-string assigned to prompt in
ai_suggest.py; fix by wrapping both injected blocks with explicit, unique
boundary markers (e.g., "-----BEGIN CI LOG-----" / "-----END CI LOG-----" and
"-----BEGIN REPO CONTEXT-----" / "-----END REPO CONTEXT-----") when building
prompt, and pre-sanitize error_log and repo_context by removing or escaping
control-like lines and model-directive patterns (e.g., lines starting with
"ignore", "do not", "assistant:", "user:", "system:", or containing "write:",
"respond with", "generate", or XML/HTML tags that could act as directives)
before interpolation; update the prompt construction around the prompt variable
so it uses the delimited, sanitized strings instead of raw
error_log/repo_context.
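The delimiter-and-sanitize approach can be sketched as follows; the boundary-marker format and the directive patterns are illustrative, not an exhaustive filter:

```python
import re

# Lines that look like instructions to the model rather than log content.
DIRECTIVE = re.compile(r"^\s*(ignore|do not|system:|assistant:|user:)", re.IGNORECASE)

def fence_block(label: str, text: str) -> str:
    """Wrap untrusted text in unique boundary markers, dropping directive-like lines."""
    clean = "\n".join(
        line for line in text.splitlines() if not DIRECTIVE.match(line)
    )
    return f"-----BEGIN {label}-----\n{clean}\n-----END {label}-----"
```

The markers let the system instruction tell the model to treat everything between them as data, while the regex pass strips the most obvious injection attempts before interpolation.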
---
Nitpick comments:
In @.github/scripts/ai_suggest.py:
- Around line 28-32: The repo_context variable is read in full and can be very
large; mirror the error_log truncation by capping repo_context (e.g., to 15_000
chars) or reusing the same truncation helper when reading "repo_context.xml":
after reading into repo_context in the block that assigns repo_context, apply
the same substring/truncate logic or function used for error_log to limit its
length and optionally add an ellipsis to indicate truncation.
- Around line 33-34: The silent except in the ai_suggest.py block swallows
errors when reading repo_context.xml; change the bare "except Exception: pass"
to "except Exception as e" and surface the failure (e.g., write a descriptive
message and the exception info to sys.stderr or call logging.exception) so
operators see encoding/permission/read errors instead of silently falling back
to "No repository context available."; update the except block around the repo
context read logic to include the exception variable and an stderr/log call.
- Around line 36-37: main() currently always proceeds after calling
get_error_logs() even when it returns the sentinel "No failed logs found.",
causing unnecessary client init and Gemini API calls; update main() to check the
return of get_error_logs() (compare to the exact sentinel "No failed logs
found.") and short-circuit (return or exit) when no logs are present so you skip
initializing the Gemini client, building the prompt, and making any paid API
call (i.e., do not call build_prompt(), init/instantiate the Gemini client, or
send the request when error_log equals the sentinel).
Make step 1 for bot generation and use it to fetch logs. Fixes #524
Actionable comments posted: 1
📒 Files selected for processing (1)
.github/workflows/reusable-ai-triage.yml
🔇 Additional comments (5)
.github/workflows/reusable-ai-triage.yml (5)
30-35: LGTM! Token generation is correctly placed before any steps that need it.
44-49: LGTM. Checking out the PR code at the specific commit SHA for analysis is appropriate.
51-59: LGTM! Python version and pinned dependency versions look good.
77-81: LGTM. Running the analysis script from the trusted checkout (not the PR code) is the right security boundary.
72-75: The `--ignore` patterns passed via CLI cannot be overridden by local config files. `repomix` merges settings in this order: defaults → config-file values → CLI overrides, meaning the `--ignore "**/*.lock,**/*.json,**/.env*,**/*.secret"` flag takes precedence over any local `repomix.config.*` file in the `pr_code` directory. While `repomix` does search for and read local config files from the current directory, the specific security concern about ignore patterns being overridden is not valid. Likely an incorrect or invalid review comment.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/reusable-ai-triage.yml:
- Around line 5-17: Add a new workflow input named base_repo alongside
pr_number, head_sha, head_repo, and run_id; then update the "Fetch CI Logs" step
(which currently runs gh run view $RUN_ID --repo $HEAD_REPO) to use base_repo
for the --repo flag and update the "Post Comment" step (which runs gh pr comment
"$PR_NUM" --repo $HEAD_REPO) to also use base_repo; ensure the new input is
required and referenced via the same variable name (base_repo) in both the gh
run view and gh pr comment invocations so upstream repo operations use the
repository that owns the PR and CI run.
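Sketched in YAML, the suggested input could look like this; the input name comes from the review above, but the surrounding keys and descriptions are assumptions for illustration:

```yaml
on:
  workflow_call:
    inputs:
      base_repo:
        description: "Repository that owns the PR and CI run (owner/name)"
        required: true
        type: string

# Later steps would then reference it, e.g.:
#   gh run view "$RUN_ID" --repo "${{ inputs.base_repo }}" --log-failed
#   gh pr comment "$PR_NUM" --repo "${{ inputs.base_repo }}" --body-file solution.md
```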
---
Duplicate comments:
In @.github/workflows/reusable-ai-triage.yml:
- Around line 37-42: The checkout step named "Checkout Reusable Workflow"
currently hardcodes ref: issues/524-ci-failure-bot which will break after merge;
update that ref value to master (or to a persistent branch/tag) in the step that
uses actions/checkout@v4 (the block with name: Checkout Reusable Workflow and
path: trusted_scripts) so the reusable workflow checks out the permanent branch
instead of the temporary feature branch.
- Around line 66-70: The step aborts if the `gh run view $RUN_ID --repo $REPO
--log-failed` command exits non-zero, so change the run line so the `gh run
view` invocation is allowed to fail without stopping the shell (e.g., append `||
true` to that command) so the subsequent empty-file guard that writes to
`failed_logs.txt` can run; locate the `gh run view ... > failed_logs.txt`
invocation in the workflow and make it tolerant of errors so the guard lines
that check `failed_logs.txt` execute.
- Around line 83-93: The workflow currently always creates a new PR comment with
gh pr comment "$PR_NUM" --repo "$REPO" --body-file solution.md which causes
duplicate bot comments on re-runs; change the Post Comment step to detect an
existing bot comment and edit it instead (for example, use gh pr view/list
--repo "$REPO" to find the last comment by the bot or gh pr comment --edit-last)
and, if found, update that comment with the contents of solution.md, otherwise
create a new comment—update references in the step using the same env vars
(GH_TOKEN, PR_NUM, REPO) and the body-file solution.md so the logic replaces or
edits the previous bot comment rather than always posting a new one.
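The tolerant log-fetch fix from the second item above can be sketched as follows. `gh` is stubbed with a shell function here so the control flow is runnable without GitHub access; in the real workflow step the stub line would be removed:

```shell
# Stub standing in for the GitHub CLI: simulate a fetch that fails with no output.
gh() { return 1; }

# Allow the log fetch to fail without aborting the step.
gh run view "$RUN_ID" --repo "$REPO" --log-failed > failed_logs.txt || true

# Guard: if nothing was fetched, write a placeholder so later steps have input.
if [ ! -s failed_logs.txt ]; then
  echo "No failed logs found or inaccessible run." > failed_logs.txt
fi
cat failed_logs.txt
```

Because of the `|| true`, the guard always runs, so `failed_logs.txt` is guaranteed to exist and be non-empty.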
Added base_repo input to prevent fetch logs failure. Fixes #524
🧹 Nitpick comments (2)
.github/workflows/reusable-ai-triage.yml (2)
33-35: Consider pinning third-party actions to commit SHAs.
`actions/create-github-app-token@v1`, `actions/checkout@v4`, and `actions/setup-python@v5` use mutable version tags. A compromised tag push could silently inject malicious code into the workflow, which is particularly impactful for `create-github-app-token` since it handles `APP_ID` and `PRIVATE_KEY`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/reusable-ai-triage.yml around lines 33 - 35, The workflow uses mutable tags for third-party actions (actions/create-github-app-token@v1, actions/checkout@v4, actions/setup-python@v5) which are risky; update the steps (e.g., the step with id generate-token) to reference the corresponding pinned commit SHAs instead of the floating tags—find the exact commit SHA for each action repo (create-github-app-token, checkout, setup-python) and replace the `@vX` tag with @<sha> for each uses: entry, and optionally add a comment with the action repo+tag you pinned for future audits.
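The pinned form could look like this sketch; the SHA placeholders below are not real commits and must be resolved from each action's release page before use:

```yaml
steps:
  # Resolve each tag to its current commit SHA before pinning (placeholders shown).
  - uses: actions/create-github-app-token@<sha-for-v1>  # v1
  - uses: actions/checkout@<sha-for-v4>                 # v4
  - uses: actions/setup-python@<sha-for-v5>             # v5
```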
29-31: Set a `timeout-minutes` on the job to avoid stalled runners. Without it, a hung Gemini API call or Python process can hold the runner for the full 6-hour GitHub default.
⏱️ Proposed fix

```diff
 jobs:
   analyze:
     runs-on: ubuntu-latest
+    timeout-minutes: 15
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/reusable-ai-triage.yml around lines 29 - 31, Add a timeout-minutes setting to the GitHub Actions job named analyze to prevent hung processes from tying up runners; edit the jobs -> analyze block and add a timeout-minutes (e.g., 30) key at the same indentation level as runs-on so the job will be cancelled after the specified number of minutes.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
.github/workflows/reusable-ai-triage.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
- GitHub Check: Python==3.11 | django~=5.0.0
- GitHub Check: Python==3.13 | django~=5.2.0
- GitHub Check: Python==3.13 | django~=5.1.0
- GitHub Check: Python==3.12 | django~=5.2.0
- GitHub Check: Python==3.12 | django~=5.0.0
- GitHub Check: Python==3.11 | django~=4.2.0
- GitHub Check: Python==3.10 | django~=4.2.0
- GitHub Check: Python==3.11 | django~=5.2.0
- GitHub Check: Python==3.10 | django~=5.2.0
- GitHub Check: Python==3.12 | django~=5.1.0
- GitHub Check: Python==3.12 | django~=4.2.0
- GitHub Check: Python==3.11 | django~=5.1.0
- GitHub Check: Python==3.10 | django~=5.0.0
- GitHub Check: Python==3.10 | django~=5.1.0
🔇 Additional comments (1)
.github/workflows/reusable-ai-triage.yml (1)
59-62: Both versions exist and are installable; no issues found. Registry checks confirm:
- `repomix@0.3.5` exists on npm and is installable
- `google-genai==1.16.1` exists on PyPI and is installable
- `ai_suggest.py` (line 70) explicitly uses `gemini-2.5-flash-lite`, which the installed version supports

The pinned versions are compatible with the workflow's requirements.
Likely an incorrect or invalid review comment.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In @.github/workflows/reusable-ai-triage.yml:
- Around line 40-44: The checkout step "Checkout Reusable Workflow" currently
pins the repository ref to the feature branch "issues/524-ci-failure-bot";
update the step that uses actions/checkout@v4 (the block with "repository:
openwisp/openwisp-utils" and "ref: issues/524-ci-failure-bot") to use "ref:
master" so callers of the reusable workflow don't fail when the feature branch
is deleted.
- Around line 64-73: The step running the gh run view command can exit the job
on non-zero status so the empty-file guard for failed_logs.txt never runs;
update the "Fetch CI Logs" run block to ensure gh run view failures are
tolerated (e.g., append a no-fail suffix like "|| true" or temporarily disable
exit-on-error) when invoking gh run view $RUN_ID --repo $REPO --log-failed so
the subsequent check for [ ! -s failed_logs.txt ] always executes and
failed_logs.txt is created when appropriate; reference the gh run view
invocation, the RUN_ID/REPO env vars, and the failed_logs.txt filename when
making the change.
- Around line 86-96: The "Post Comment" step currently always runs gh pr comment
which appends duplicate AI analysis messages; change it to detect and update the
bot's previous comment instead of always creating a new one. Modify the step
that uses GH_TOKEN/PR_NUM/REPO and solution.md so it first searches existing
comments for the bot (via gh api or gh pr view comments filtered by actor) and
if found uses gh api to update that comment or gh pr comment --edit-last,
otherwise creates a new comment; ensure the logic references the existing "Post
Comment" step and the gh pr comment command so the job edits the prior bot
comment when present.
- Around line 75-78: The workflow currently packs the entire repo via the "Pack
Context" step using the repomix invocation; instead scope the pack to only
changed files and run a pre-flight secret scan: first add a step before the
"Pack Context" step that runs a secrets scanner (e.g., gitleaks or trufflehog)
against the PR diff (use the PR merge-base / git diff to limit scope) and fail
the job on findings, then change the repomix invocation in the "Pack Context"
step so it consumes only the changed file list (generate a file-list via git
diff --name-only or the GitHub PR files API and pass that to repomix --include
or an equivalent repomix option) instead of packing the entire tree; reference
the repomix command in this step and the new secrets-scan step names when
implementing.
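The changed-files scoping could be sketched like this. `git` and `repomix` are stubbed with shell functions so the flow runs standalone; the stubs would be dropped in the real step, and `repomix --include` is assumed to accept a comma-separated pattern list:

```shell
# Stubs standing in for the real tools (remove in the actual workflow step).
git()     { printf 'src/app.py\nREADME.md\n'; }  # real: git diff --name-only "$BASE_SHA"...HEAD
repomix() { echo "repomix $*"; }                 # real: npx repomix

# Build a comma-separated list of files changed in the PR.
changed=$(git diff --name-only | paste -sd, -)

# Pack only those files instead of the whole tree.
if [ -n "$changed" ]; then
  repomix --include "$changed" -o repo_context.xml
fi
```

A secrets scan (e.g., gitleaks) over the same diff would run as a separate step before this one.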
---
Nitpick comments:
In @.github/workflows/reusable-ai-triage.yml:
- Around line 33-35: The workflow uses mutable tags for third-party actions
(actions/create-github-app-token@v1, actions/checkout@v4,
actions/setup-python@v5) which are risky; update the steps (e.g., the step with
id generate-token) to reference the corresponding pinned commit SHAs instead of
the floating tags—find the exact commit SHA for each action repo
(create-github-app-token, checkout, setup-python) and replace the `@vX` tag with
@<sha> for each uses: entry, and optionally add a comment with the action
repo+tag you pinned for future audits.
- Around line 29-31: Add a timeout-minutes setting to the GitHub Actions job
named analyze to prevent hung processes from tying up runners; edit the jobs ->
analyze block and add a timeout-minutes (e.g., 30) key at the same indentation
level as runs-on so the job will be cancelled after the specified number of
minutes.
Added explicit boundary markers so the model can distinguish instructions from data. Fixes #524
What is done:
TODO: [COMPLETED]
Important: Need to change the checkout reusable workflow ref to master before merging into master.
Added dynamic header instructions in system instructions prompt. Fixes #524
This is what the caller looks like:

@coderabbitai review
Improved log handling by adding truncation and changing the repomix command. Fixes #524
Updated workflows to use the latest genai SDK version and added built-in client retry logic. Fixes #524
1. Error string from …

```python
# get_error_logs() can return this on exception:
return f"Error reading logs: {e}"
# But main() only guards against this exact string:
if error_log == "No failed logs found.":
```

If the file exists but reading it throws an exception, the error message string (e.g., `Error reading logs: …`) passes the guard and is treated as a real CI log.

2. Workflow fallback string also bypasses the Python guard → bogus comment —

```shell
if [ ! -s failed_logs.txt ]; then
  echo "No failed logs found or inaccessible run." > failed_logs.txt
fi
```

This writes a non-empty placeholder that does not match the exact string the Python guard checks. This triggers in a realistic scenario: …

3. Fix: either use …

4. …

```python
sys.exit(1)  # on empty response
sys.exit(1)  # on API exception
```

The "Post Comment" step (line 97) has correct logic to gracefully skip when `solution.md` is empty:

```shell
if [ ! -s solution.md ]; then
  echo "AI analysis produced no output; skipping comment."
  exit 0
fi
```

But …

Moderate — Security

5. Outdated action versions in a secret-handling workflow — This workflow handles a GitHub App private key and a Gemini API key — it is the most sensitive workflow in the repo. Using older mutable tags increases supply-chain attack surface. Should at minimum match the repo convention (…).

6. … This is both: … The same truncation strategy used for logs should be applied here, or at minimum a hard cap.

7. No sanitization or size cap on AI-generated output before posting to PR — …

8. No …

Minor — Inefficiencies

9. Repomix packs entire repo with many irrelevant file types — For diagnosing CI failures, most of these file types (…) …

10. No concurrency control — duplicate comments possible — If a contributor force-pushes or CI is re-triggered on the same PR, multiple instances of this workflow can run simultaneously with no guard. Each posts its own comment, resulting in duplicate or conflicting bot comments on the same PR. A `concurrency` block would prevent this:

```yaml
concurrency:
  group: ai-triage-${{ inputs.pr_number }}
  cancel-in-progress: true
```
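A minimal sketch of how the guard mismatches described above could be closed. The function and constant names are assumptions for illustration, not the script's actual identifiers:

```python
# Sketch (assumed names): centralize the sentinel strings so main() can skip
# every "no usable logs" case, not just one exact string.
NO_LOGS_SENTINEL = "No failed logs found."
WORKFLOW_PLACEHOLDER = "No failed logs found or inaccessible run."
READ_ERROR_PREFIX = "Error reading logs:"


def get_error_logs(path="failed_logs.txt"):
    try:
        with open(path, encoding="utf-8") as f:
            content = f.read().strip()
    except OSError as e:
        return f"{READ_ERROR_PREFIX} {e}"
    return content or NO_LOGS_SENTINEL


def should_skip_analysis(error_log):
    # True for the sentinel, the workflow's placeholder, and any read error.
    return (
        error_log == NO_LOGS_SENTINEL
        or error_log.startswith(WORKFLOW_PLACEHOLDER)
        or error_log.startswith(READ_ERROR_PREFIX)
    )
```

With this shape, both the workflow's fallback text and read errors are caught by one predicate instead of a single exact-string comparison.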
Added truncation in the AI suggestion script and concurrency control in CI. Fixes #524
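Head-plus-tail truncation of the kind this commit describes could look like the following sketch; the character budget is an assumption, not the script's actual limit:

```python
MAX_LOG_CHARS = 100_000  # assumed budget; tune to the model's context window


def truncate_log(text: str, limit: int = MAX_LOG_CHARS) -> str:
    """Keep the start and end of an oversized log, which usually hold the
    command that ran and the final error, and drop the middle."""
    if len(text) <= limit:
        return text
    half = limit // 2
    return text[:half] + "\n... [log truncated] ...\n" + text[-half:]
```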
Pin to hash for v2.0.6 to prevent supply chain vulnerability. Fixes #524
nemesifier
left a comment
Great work @stktyagi, it looks almost ready to me I have 2 comments below.
Improved system instructions for commit message and added model env var. Fixes #524
Fixed system instructions to address multiple issues. Fixes #524
nemesifier
left a comment
actions/ci-failure-bot > actions/bot-ci-failure
workflows/ci-failure-bot.yml > workflows/bot-ci-failure.yml
workflows/reusable-ci-failure-bot.yml > workflows/reusable-bot-ci-failure.yml
Refactored file and directory names according to new convention. Fixes #524
nemesifier
left a comment
I have a few remaining comments below.
@coderabbitai review, please focus only on real issues: security, bugs, performance issues.
One minor non-blocking note: in the "Fetch CI Logs" step,

```shell
gh run view $RUN_ID --repo $REPO --log-failed > failed_logs.txt || true
```

Since `gh run view` can legitimately fail for an inaccessible run, the `|| true` correctly lets the empty-file guard handle that case. No critical security, bug, or performance issues remain. [approve]
Let me read the current state of the files before reviewing.
Bug (minor robustness): In the file-reading loop, only `UnicodeDecodeError` is caught:

```python
try:
    with open(filepath, "r", encoding="utf-8") as f:
        content = f.read()
except UnicodeDecodeError:
    continue
```

Other IO errors (…) would propagate; broaden the handler:

```python
except (UnicodeDecodeError, OSError):
    continue
```
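A runnable version of the suggested handler; the loop shape and helper name here are assumed for illustration, not the script's actual code:

```python
def read_text_files(paths):
    """Collect the contents of readable UTF-8 files, skipping anything that
    is binary, missing, or unreadable instead of crashing the script."""
    chunks = []
    for filepath in paths:
        try:
            with open(filepath, "r", encoding="utf-8") as f:
                chunks.append(f.read())
        except (UnicodeDecodeError, OSError):
            # OSError covers FileNotFoundError, PermissionError,
            # IsADirectoryError, and similar IO failures.
            continue
    return chunks
```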
continuePerformance/cost (minor): 1.5 MB repo context per CI failure
Both items are non-critical. Approving. [approve]
Added google-genai for installation and fixed model nomenclature. Fixes #524
Fixed null check and fallback to the default Gemini model. Fixes #524
nemesifier
left a comment
There was a problem hiding this comment.
The code is clean enough, testing samples in openwisp/bot-testing-ground#30 and openwisp/bot-testing-ground#40 show not only that the outcome matches my expectations but also that security was taken seriously.
I am merging! 👏🙏
Next steps: add this to all repos!

Created reusable AI CI failure bot helper workflow and analysis script for the bot.
Fixes #524
Checklist
Reference to Existing Issue
Closes #524.
Description of Changes
This PR introduces a CI failure bot that posts suggestions about formatting issues, code fixes, and CI failure reasons as a comment on the contributor's PR.