[copilot-session-insights] Daily Copilot Agent Session Analysis — 2026-03-31 #23680
Closed
Replies: 1 comment
This discussion was automatically closed because it expired on 2026-04-01T11:50:25.986Z.
Executive Summary
Key Metrics
📈 Session Trends Analysis
Completion Patterns
Completion rate has climbed from 42.1% (Mar 30) to 46.9% (Mar 31), continuing recovery from a dip on Mar 20 (60%). The increase in action_required sessions is largely attributable to review agents (Security Review, PR Nitpick Reviewer) operating by design: these aren't failures but review completions. Genuine failures dropped to just 1 today vs. 3 yesterday.
Duration & Efficiency
Average session duration continues to fall (3.5m → 2.8m → 2.4m) while the number of copilot agent sessions is growing rapidly (5 → 19 → 32). The low median (0.6 min) versus higher mean (2.4 min) shows a bimodal distribution: fast review agents (<1 min) alongside longer-running agents like Q and /cloclo (~3-6 min).
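To illustrate why a low median next to a higher mean points toward two clusters of sessions, here is a minimal sketch. The durations are hypothetical values chosen for illustration, not the report's raw data, and the variable names are assumptions.

```python
from statistics import mean, median

# Hypothetical durations in minutes: many fast review-agent runs
# plus a few longer-running agent sessions. Not the report's raw data.
fast_runs = [0.4, 0.5, 0.5, 0.6, 0.6, 0.7, 0.8]
long_runs = [3.0, 4.5, 6.0]
durations = fast_runs + long_runs

avg = mean(durations)
mid = median(durations)

# A mean well above the median is consistent with a long right tail
# or a second, slower cluster of sessions.
print(f"mean={avg:.2f} min, median={mid:.2f} min, ratio={avg / mid:.1f}x")
```

The gap between mean and median is only a hint: it is consistent with a bimodal distribution but does not prove one. A histogram of the real durations would be needed to confirm the two clusters.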
Branch Analysis
Branch-level breakdown (3 branches)
- copilot/fix-yaml-indentation-bug: 12 sessions, 10 success (83%) ✅
- copilot/investigate-documentation-unbloat-failure-again: 8 sessions, 2 success, 6 action_required
  - action_required: expected behavior for review bots flagging items
- copilot/update-cli-mcp-versions: 30 sessions, 11 success, 10 skipped, 6 action_required, 2 failure, 1 cancelled
Success Factors ✅
- Targeted bug fix tasks: copilot/fix-yaml-indentation-bug achieved 83% success rate. Specific, well-scoped tasks with a single clear objective outperform open-ended investigations.
- PR comment response tasks: Addressing comment on PR #23644 = 100% success (1/1). Consistent with prior data (100% across all observed runs).
- /cloclo agent efficiency: 4/6 sessions succeeded (67%), higher than Q (40%) and Scout (60%). Short task focus likely helps.
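The per-branch success rates discussed above can be recomputed from the status counts in the branch-level breakdown. This is a minimal sketch: the counts are copied from the report, while the dictionary layout and the `success_rate` helper are assumptions made for illustration.

```python
# Status counts per branch, copied from the branch-level breakdown.
branches = {
    "copilot/fix-yaml-indentation-bug": {"total": 12, "success": 10},
    "copilot/investigate-documentation-unbloat-failure-again": {
        "total": 8, "success": 2, "action_required": 6,
    },
    "copilot/update-cli-mcp-versions": {
        "total": 30, "success": 11, "skipped": 10,
        "action_required": 6, "failure": 2, "cancelled": 1,
    },
}


def success_rate(counts: dict) -> float:
    """Return the share of sessions that ended in `success`, as a percentage."""
    return 100.0 * counts.get("success", 0) / counts["total"]


rates = {name: round(success_rate(c)) for name, c in branches.items()}
for name, rate in rates.items():
    print(f"{name}: {rate}%")
```

Note that this counts only `success` as a completion. Because review agents report `action_required` when they finish normally, the rate for review-heavy branches understates how many sessions actually did their job.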
Failure Signals ⚠️
- Archie consistently skipping: 3 sessions across 2 branches, all skipped. Suggests an eligibility condition that Archie checks is not being met. This is silent: no visible error, just skipped work.
- Changeset Generator failure: 1 session, failure (3.3 min duration). Failures in tooling/automation agents tend to block downstream workflows.
- Review agents on investigation branches: 6/8 sessions on investigate-documentation-unbloat-failure-again returned action_required. While expected for review bots, the volume suggests the PR needs significant human attention before merging.
- Q agent cancellation: 1 Q session cancelled on copilot/update-cli-mcp-versions. Possible timeout or competing run triggering cancellation.
Prompt Quality Analysis 📝
High-Quality Prompt Characteristics
Low-Quality Prompt Characteristics
- investigate-documentation-unbloat-failure-again: the trailing "again" signals repeated failure without new context, making it hard for agents to know what changed
Notable Observations
Loop Detection
Tool Usage
action_required is their success state
Context Issues
Trends Over Time (3-run snapshot)
Actionable Recommendations
For Users Writing Task Descriptions
- Prefer specific over investigative framing: Replace "investigate X failure again" with "identify root cause of X: [specific symptoms]". Include what's already been tried.
  - Instead of: investigate-documentation-unbloat-failure-again
  - Prefer: fix-documentation-build-timeout-caused-by-large-asset-imports
- Reference concrete acceptance criteria: Link to the failing CI step, error message, or expected output. Tasks like "Addressing comment on PR #N" succeed because the comment provides exact expected behavior.
- Avoid compound tasks: Single-objective tasks (bug fix, PR comment) achieve 70-100% success. Multi-step investigations (update-cli-mcp-versions) require more iterations and have higher retry/skip rates.
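The framing advice above could be turned into a quick pre-flight check on branch or task names. This is a hypothetical heuristic, not part of any Copilot tooling: the term list contains only the two words this report singles out, and the `vague_terms` helper is an assumption made for illustration.

```python
import re

# Hypothetical heuristic: words this report associates with open-ended
# or repeated tasks. Not an official or exhaustive list.
VAGUE_TERMS = {"investigate", "again"}


def vague_terms(task_name: str) -> list[str]:
    """Return the vague terms found in a branch or task name, in order."""
    words = re.split(r"[^a-z0-9]+", task_name.lower())
    return [w for w in words if w in VAGUE_TERMS]


bad = vague_terms("investigate-documentation-unbloat-failure-again")
good = vague_terms("fix-documentation-build-timeout-caused-by-large-asset-imports")
print(bad, good)
```

A check like this only flags wording. It cannot tell whether a description includes the symptoms, prior attempts, or acceptance criteria that the recommendations ask for.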
For System Improvements
- Archie eligibility visibility (High impact): Archie silently skips with no explanation in metadata. Adding a log message or comment on the PR explaining why it skipped would help users understand what's needed.
- Conversation log OAuth integration (High impact): The OAuth gap blocking conversation transcript access persists for 10+ days. True behavioral analysis (loop detection, reasoning quality, tool usage patterns) requires this data. Without it, insights are limited to metadata patterns.
- Action-required classification (Medium impact): Distinguish between review-bot action_required (expected/success) vs. true action-required blockers. Currently both show the same status, inflating apparent failure rates.
For Tool Development
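One way tooling might implement the action-required classification recommended above is sketched here. The two review agent names come from this report; the `Session` type, the `classify` function, and the category labels `review_completed` and `blocked` are hypothetical, and identifying review agents by name is an assumption rather than a description of how the workflow actually works.

```python
from dataclasses import dataclass

# Assumption: review agents are identified by name. The two names come from
# the report; how a real pipeline identifies review agents is not specified.
REVIEW_AGENTS = {"Security Review", "PR Nitpick Reviewer"}


@dataclass
class Session:
    agent: str
    conclusion: str  # e.g. "success", "action_required", "failure", "skipped"


def classify(session: Session) -> str:
    """Map a raw conclusion to a reporting category (hypothetical scheme)."""
    if session.conclusion == "action_required":
        if session.agent in REVIEW_AGENTS:
            return "review_completed"  # expected outcome for a review bot
        return "blocked"  # a non-review agent needs human action
    return session.conclusion  # all other conclusions pass through unchanged


sessions = [
    Session("Security Review", "action_required"),
    Session("PR Nitpick Reviewer", "action_required"),
    Session("Q", "action_required"),
    Session("Changeset Generator", "failure"),
]
labels = [classify(s) for s in sessions]
print(labels)
```

Keeping the raw conclusion and deriving the category separately means the original status is never lost, so the split can be revised later without re-collecting data.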
Statistical Summary
Next Steps
- investigate-documentation-unbloat-failure-again PR: 6 reviewers flagged actionable items
- update-cli-mcp-versions
Analysis generated automatically on 2026-03-31
Run ID: §23795150206
Workflow: Copilot Session Insights