Skip to content

Adopt Production-Grade Loop: minimax implementer + 5-reviewer cycle#66

Open
ktsaou wants to merge 4 commits into
masterfrom
protocol/production-grade-loop
Open

Adopt Production-Grade Loop: minimax implementer + 5-reviewer cycle#66
ktsaou wants to merge 4 commits into
masterfrom
protocol/production-grade-loop

Conversation

@ktsaou

@ktsaou ktsaou commented Jun 10, 2026

Copy link
Copy Markdown
Member

Summary

Encode the operator's new operating model in the project contract:

  • AGENTS.md: new "Production-Grade Loop" section, Hard Rule SOW-0001 Chunk 10: pricing data + temporal tiers + refresh script #11, updated branch-protection + merge workflow.
  • project-workflow SKILL: roles table (CTO / implementer=minimax / 5 reviewers), updated cycle, Step 4/6/7/8 reflect the new protocol.
  • project-delegation SKILL: "The implementer is minimax" section, backup rotation order, new forbidden block wording.
  • project-second-opinions SKILL: the 5-reviewer set (glm, mimo, minimax, qwen, deepseek), PRODUCTION GRADE vote, P0/P1/P2/P3 stop conditions, claim-verification steps, deprecation note for codex/gemini on the production loop.

Why

  • Operator requested a tighter feedback loop: every non-trivial change gets five independent opinions.
  • "I don't trust the development skills of the implementer. You shouldn't either." → implementer is one model; reviewers are five.
  • Implementer ≠ Reviewer. The minimax instance that writes code is not the same instance that reviews it.
  • The protocol must survive compactions, so the contract is in AGENTS.md and the runtime enforcement is in the three skills.

Out of scope

  • No production code changed.
  • No SOW affected.
  • No CI changes.
  • The pre-existing in-flight PR Complete SOW-0055 ingest write-model cleanup #63 (SOW-0055) keeps its current cubic P2 finding; that handling happens on a separate commit/merge after this lands.

How the new protocol works (CTO/integration check)

After this lands, every non-trivial SOW follows this loop:

  1. CTO updates specs + failing tests
  2. CTO delegates to minimax (implementer)
  3. minimax writes code, runs local gates, returns diff
  4. CTO verifies the diff (read, re-run tests, re-run gates)
  5. CTO runs the 5 reviewers (glm, mimo, minimax-fresh, qwen, deepseek) in parallel
  6. Each reviewer votes PRODUCTION GRADE or NEEDS WORK (P0–P3)
  7. CTO verifies every claim, addresses P0/P1/P2, documents P3
  8. CTO merges when 5/5 PG (or only P3 noise) + gates green + CI green

Reviewer will be invoked on this PR per the new protocol.


Summary by cubic

Adopts the Production-Grade Loop (implementer minimax with a 5-reviewer cycle), pins exact model paths, and completes the ingest write-model cleanup with behavior-preserving refactors, stronger protocol rules (P2 vs P3, backup implementer rotation, reviewer availability), and new characterization tests.

  • Refactors

    • Extracted focused helpers in internal/ingest/writer.go (event dispatcher), catalog_migrate.go (table-scoped updates), resolver.go (named transactional steps), rollup_refresh.go (shared bucket pass), and rollup_backfill.go (prep/run/log split) without changing behavior.
    • Added characterization tests for writer apply/log flow, resolver transactional rollback (no stray notify rows), catalog cascade identity/cost migration, and rollup carry-forward across multiple refreshes.
    • Moved SOW-0055 to done/; opened SOW-0060 to track splitting the large writer file. No schema changes.
  • Migration

    • Production-Grade Loop: implementer is minimax; reviewers are glm, mimo, minimax (fresh), qwen, deepseek. CTO runs reviewers; merge only on 5/5 PRODUCTION GRADE with gates/CI green. Protocol lives in AGENTS.md and is enforced by project-workflow, project-delegation, and project-second-opinions.
    • Clarifications: P2 vs P3 split (P2 re-triggers the full cycle; P3 does not); backup implementer rotation removes them from the reviewer set and fills the slot from ad-hoc (codex, gemini, claude, kimi); single reviewer unavailability substitutes from ad-hoc (two+ is a hard stall); SOW review recording clarified; “user” → “operator”; discipline checklist and “Recommended” signals updated.
    • Model pins: minimax = llm-netdata-cloud/minimax-m3-coder; deepseek = llm-netdata-cloud/deepseek-v4-pro; qwen = llm-netdata-cloud/qwen3.7-plus. Roles table lists versions; Hard Rule SOW-0001 Chunk 10: pricing data + temporal tiers + refresh script #11 wording polished.
    • project-second-opinions is a required first check; contract/process changes also run the loop.

Written for commit 3e74b2d. Summary will update on new commits.

Review in cubic

ktsaou added 2 commits June 8, 2026 00:17
Reduce ingest write-model complexity with behavior-preserving helper extraction and characterization coverage. Move SOW-0055 to completed and track residual writer file-size debt in SOW-0060.
Encode the operator's new operating model in the project contract:

- AGENTS.md: new "Production-Grade Loop" section, Hard Rule #11, updated
  branch-protection + merge workflow.
- project-workflow SKILL: roles table (CTO / implementer=minimax / 5
  reviewers), updated cycle, Step 4/6/7/8 reflect the new protocol.
- project-delegation SKILL: "The implementer is minimax" section, backup
  rotation order, new forbidden block wording.
- project-second-opinions SKILL: the 5-reviewer set (glm, mimo, minimax,
  qwen, deepseek), PRODUCTION GRADE vote, P0/P1/P2/P3 stop conditions,
  claim-verification steps, deprecation note for codex/gemini on the
  production loop.

This makes the protocol survive compactions: AGENTS.md is loaded at
startup, and the three skills are the runtime enforcement.

No production code changed. No SOW affected. Pure contract update.
@codacy-production

codacy-production Bot commented Jun 10, 2026

Copy link
Copy Markdown

Not up to standards ⛔

🔴 Issues 4 medium · 5 minor

Alerts:
⚠ 9 issues (≤ 0 issues of at least minor severity)

Results:
9 new issues

Category Results
BestPractice 4 medium
Comprehensibility 5 minor

View in Codacy

🟢 Metrics 140 complexity · 41 duplication

Metric Results
Complexity 140
Duplication 41

View in Codacy

NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.

ktsaou added 2 commits June 10, 2026 10:53
P0 (durable specs were stale):
- specs/workflow.md: Roles section + External Review section rewritten
  to match the 5-reviewer Production-Grade Loop.
- specs/second-opinions.md: TL;DR, "show the user the prompts"
  rule, reviewer table, and ad-hoc subset all updated to the new
  protocol.

P1 (must fix):
- Implementer model pinned to llm-netdata-cloud/minimax-m3-coder
  everywhere (AGENTS.md, project-delegation, project-workflow,
  specs/second-opinions, specs/workflow). The reviewer-minimax uses
  the same model. Reviewer minimax invocation path also pinned.
- Hard Rule #4 now says "5/5 PRODUCTION GRADE" instead of the
  weaker "at least one round".
- Hard Rule #3 now references minimax and the Production-Grade
  Loop (was the generic "spawned subagents").
- "What the operator sees" / Reporting to the Operator: removed
  "file paths with line numbers as evidence" from the operator
  report in project-workflow SKILL — that lives in SOW ## Reviews
  and the PR description only.

P2 (fixed in this PR):
- AGENTS.md stop conditions: P2 wording made explicit ("fix in
  the same PR, re-trigger the cycle; merge only when 5/5 PG or
  only P3 noise remains"); hard-stall language clarified to
  business-level escalation only.
- "When the loop runs" gained a "Contract / process changes"
  bullet: AGENTS.md, project skills, specs, CI config all go
  through the 5-reviewer cycle.
- project-workflow SKILL: model name in Roles table updated to
  match; cross-reference now specifies the "Production-Grade
  Loop" section.
- project-delegation SKILL: implementer vs reviewer wording
  differentiated.
- project-second-opinions SKILL: ad-hoc reviewer table no
  longer lists the 5 production reviewers (they're reserved
  for the production cycle); the contradictory
  "are-not-used-for-ad-hoc" note is gone.
- project-second-opinions SKILL: deepseek invocation path fixed
  to llm-netdata-cloud/deepseek-v4-pro.
- project-second-opinions SKILL: SOW review template gained
  the "Vote PRODUCTION GRADE / NEEDS WORK" instruction so
  reviewer responses are machine-parseable.
- Required First Checks: project-second-opinions added to the
  minimum load list (it's the runtime enforcement of the loop).

P3 (documented, no fix this round):
- Spec → Test → Code Protocol updated to reference the
  5-reviewer cycle.
- "Delegation Protocol" updated to mention the 5-reviewer
  cycle is delegated too — but only the CTO runs it.
- Loop step 8 uses → consistently.
…ementer

P1 fixes (must fix before merge):
- Stop conditions split: P2 and P3 are now distinct bullets. P2 explicitly
  requires re-triggering the full 5-reviewer cycle; P3 does not. The
  previous grouping "P2/P3 NEEDS WORK → fix in this PR; merge when 5/5
  PG or only P3 noise remains" hid the re-trigger requirement for P2
  and contradicted the P2 definition. Applied in AGENTS.md and
  project-second-opinions SKILL.
- Backup implementer rotation: clarified that the rotated reviewer is
  REMOVED from the 5-reviewer cycle for that PR (so the implementer is
  not also reviewing their own work) and the 5-reviewer set is filled
  by substituting from the ad-hoc set (codex/gemini/claude/kimi).
  Applied in AGENTS.md and project-delegation SKILL.

P2 fixes (fixed in this PR):
- "Reviewer unavailability" paragraph added to AGENTS.md and
  project-second-opinions SKILL: a single unavailable reviewer is
  substituted from the ad-hoc set with a SOW ## Reviews log entry;
  two or more simultaneous unavailability escalates as a hard stall.
- Recording Reviews section in spec/second-opinions.md rewritten to
  match the SKILL workflow: reviewer attribution + vote, claim
  verification verdict, fix applied, final PG count. No raw prompts
  or reviewer output in the SOW.
- "the user" -> "the operator" in spec/second-opinions.md.
- "Recommended" section in spec/second-opinions.md reframed as
  "signals that the change is non-trivial and therefore mandatory
  under the 5-reviewer cycle" (was blurring mandatory/optional).
- Discipline checklist in spec/workflow.md updated to "5/5 PRODUCTION
  GRADE, or only P3 noise with documented disposition".
- Ad-hoc reviewer set paragraph in project-second-opinions SKILL
  simplified (removed the self-contradictory "reserved but CTO may
  still invoke" wording).
- qwen model bumped to llm-netdata-cloud/qwen3.7-plus (matches
  workstation litellm config; the "always pin to latest stable"
  policy applies). Applied in spec/second-opinions.md and
  project-second-opinions SKILL.

P3 polish (fixed in this PR):
- project-workflow SKILL roles table now includes the implementer
  model version and the reviewer model versions.
- AGENTS.md Hard Rule #11 wording cleaned up: "minimax" no longer
  appears twice in the reviewer list; uses "minimax implementer" /
  "minimax reviewer" labels.

No new regressions. No production code changed.

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 issues found across 19 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="internal/ingest/sow55_characterization_writer_test.go">

<violation number="1" location="internal/ingest/sow55_characterization_writer_test.go:107">
P2: The JSON path used to assert removal of `attr.x` is incorrect, so the test can pass even if `attr.x` was not removed.</violation>
</file>

<file name="internal/ingest/sow55_characterization_resolver_test.go">

<violation number="1" location="internal/ingest/sow55_characterization_resolver_test.go:96">
P2: Rollback assertion for `root_session_id` is under-specified and can false-pass on incorrect values.</violation>
</file>

Tip: cubic can generate docs of your entire codebase and keep them up to date. Try it here.

Re-trigger cubic

if gotCN != "parent:agent:xyz" {
t.Errorf("childNativeId stash = %q, want %q (must survive nil-extras re-emit)", gotCN, "parent:agent:xyz")
}
gotAttr := scanString(t, db, `SELECT IFNULL(json_extract(extras_json,'$.attr.x'),'') FROM ops WHERE id=?`, opID)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: The JSON path used to assert removal of attr.x is incorrect, so the test can pass even if attr.x was not removed.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At internal/ingest/sow55_characterization_writer_test.go, line 107:

<comment>The JSON path used to assert removal of `attr.x` is incorrect, so the test can pass even if `attr.x` was not removed.</comment>

<file context>
@@ -0,0 +1,153 @@
+	if gotCN != "parent:agent:xyz" {
+		t.Errorf("childNativeId stash = %q, want %q (must survive nil-extras re-emit)", gotCN, "parent:agent:xyz")
+	}
+	gotAttr := scanString(t, db, `SELECT IFNULL(json_extract(extras_json,'$.attr.x'),'') FROM ops WHERE id=?`, opID)
+	if gotAttr != "" {
+		t.Errorf("attr.x present after nil-extras re-emit = %q, want empty (graft must drop non-aiViewer keys)", gotAttr)
</file context>
Suggested change
gotAttr := scanString(t, db, `SELECT IFNULL(json_extract(extras_json,'$.attr.x'),'') FROM ops WHERE id=?`, opID)
gotAttr := scanString(t, db, `SELECT IFNULL(json_extract(extras_json,'$."attr.x"'),'') FROM ops WHERE id=?`, opID)

if got := scanString(t, db, `SELECT IFNULL(parent_session_id,'') FROM sessions WHERE id=?`, childID); got != "" {
t.Errorf("child parent_session_id = %q, want empty (linkage must roll back on notify failure)", got)
}
if got := scanString(t, db, `SELECT IFNULL(root_session_id,'') FROM sessions WHERE id=?`, childID); got == parentID {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Rollback assertion for root_session_id is under-specified and can false-pass on incorrect values.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At internal/ingest/sow55_characterization_resolver_test.go, line 96:

<comment>Rollback assertion for `root_session_id` is under-specified and can false-pass on incorrect values.</comment>

<file context>
@@ -0,0 +1,130 @@
+	if got := scanString(t, db, `SELECT IFNULL(parent_session_id,'') FROM sessions WHERE id=?`, childID); got != "" {
+		t.Errorf("child parent_session_id = %q, want empty (linkage must roll back on notify failure)", got)
+	}
+	if got := scanString(t, db, `SELECT IFNULL(root_session_id,'') FROM sessions WHERE id=?`, childID); got == parentID {
+		t.Errorf("child root_session_id = %q (= parent), want still self (linkage rolled back)", got)
+	}
</file context>
Suggested change
if got := scanString(t, db, `SELECT IFNULL(root_session_id,'') FROM sessions WHERE id=?`, childID); got == parentID {
if got := scanString(t, db, `SELECT IFNULL(root_session_id,'') FROM sessions WHERE id=?`, childID); got != childID {

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant