plan(seed-sentinel-security-eval): seed Cisco foundry-security-spec as Sentinel capability by jankneumann · Pull Request #188 · jankneumann/agentic-coding-tools

jankneumann · 2026-05-26T12:26:00Z

Adapt the Cisco foundry-security-spec (agentic AI security evaluation) into an
OpenSpec seed change. All ~35 foundry clarification markers resolved up front.

- constitution.md: 11 principles + Deviation D-1 (multi-vendor exception to
  single-provider reproducibility) mitigated by verdict-provenance
- specs/sentinel-security-eval: 8 roles, finding lifecycle, 3-leg evidence gate,
  structure-based fingerprint, Validator-only exploited flag, coverage+yield
  auto-stop, auto-block, sandbox-by-infrastructure, CVSS-v4/CWE/needs-review policy
- design.md: role->existing-capability binding table (Approach A), seed/roadmap
  boundary, deviation analysis, deferred-extension preconditions
- proposal/tasks/work-packages/contracts stub; seed-only, no role logic

https://claude.ai/code/session_01VMF1MX95ryHATWjUpa9QMt


…om project.md

Wire the Sentinel constitution + capability into openspec/project.md Domain Context, and mark Phase 1-3 seed tasks complete.

https://claude.ai/code/session_01VMF1MX95ryHATWjUpa9QMt

…ementation

Decompose the seed into 19 prioritized, dependency-ordered candidates: 14 scheduled across 5 phases (foundation -> knowledge -> detection/triage -> validation/reporting/coverage -> operability) + 5 deferred extension roles recorded as BLOCKED with adopt-when preconditions. Each candidate binds to its mapped existing capability per the seed's design.md D1. DAG validated acyclic.

https://claude.ai/code/session_01VMF1MX95ryHATWjUpa9QMt

Replace the "tolerated reproducibility liability" framing of Deviation D-1 with a multi-vendor consensus mechanism: within-vendor consistency -> cross-vendor calibration -> principled synthesis (confirmed/unconfirmed/disagreement), reusing parallel-infrastructure's ConsensusSynthesizer. Governing rule: never mix raw cross-vendor outputs on one scale.

- constitution.md D-1: 5 binding mitigations + calibration-quality residual risk
- spec: new "Multi-Vendor Verdict Consensus and Calibration" requirement (4 scenarios); consensus-aware provenance; calibrated (not averaged) severity
- design.md: ConsensusSynthesizer binding row + rewritten D3 analysis
- roadmap: new sentinel-verdict-consensus item (P4); Reporter now depends on it

https://claude.ai/code/session_01VMF1MX95ryHATWjUpa9QMt

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ffc351d8f7


chatgpt-codex-connector · 2026-05-26T12:29:17Z

+A Sentinel finding SHALL progress through the states `candidate → verdict-assigned → confirmed → [validated] → published`. The five verdicts and their surfacing rules SHALL be: `true-positive` (surfaced), `false-positive` (internal), `needs-review` (surfaced to humans), `not-applicable` (internal), `code-quality` (internal) (foundry §7.2, FR-085–FR-093; Constitution II).
+


Add a publication path for needs-review findings

The lifecycle here requires every published finding to pass through confirmed, but confirmed is only reachable from true-positive (see the later scenario), while needs-review is explicitly required to be surfaced to humans elsewhere in this spec. In runs where evidence is incomplete, implementers following this state machine will either be forced to suppress needs-review items (missing required human review) or violate the lifecycle contract, so the state transitions need an explicit needs-review publication path.
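One way to close this gap is to give `needs-review` its own edge to `published`. The sketch below is illustrative only: the names `State` and `allowed_next`, and the direct `verdict-assigned -> published` edge for `needs-review`, are assumptions made for this example and are not defined by the spec under review.

```python
from enum import Enum
from typing import Optional, Set


class State(Enum):
    CANDIDATE = "candidate"
    VERDICT_ASSIGNED = "verdict-assigned"
    CONFIRMED = "confirmed"
    VALIDATED = "validated"
    PUBLISHED = "published"


def allowed_next(state: State, verdict: Optional[str]) -> Set[State]:
    """Return the states reachable from `state` for a finding with `verdict`."""
    if state is State.CANDIDATE:
        return {State.VERDICT_ASSIGNED}
    if state is State.VERDICT_ASSIGNED:
        if verdict == "true-positive":
            return {State.CONFIRMED}
        if verdict == "needs-review":
            # Assumed edge: surface to humans without passing through `confirmed`.
            return {State.PUBLISHED}
        # false-positive, not-applicable, code-quality stay internal.
        return set()
    if state is State.CONFIRMED:
        # `[validated]` is optional in the spec's state list.
        return {State.VALIDATED, State.PUBLISHED}
    if state is State.VALIDATED:
        return {State.PUBLISHED}
    return set()  # `published` is terminal
```

With an edge like this, an implementer can surface `needs-review` items without suppressing them and without violating the `confirmed`-only-from-`true-positive` rule. Whether such findings should carry a distinct published label is left open here.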


chatgpt-codex-connector · 2026-05-26T12:29:17Z

+
+### Requirement: Multi-Vendor Verdict Consensus and Calibration
+
+Sentinel SHALL combine per-vendor results into a verdict through principled synthesis rather than by placing raw outputs from different vendors on a shared scale (Deviation D-1; Constitution I, V). Each vendor SHALL apply the rubric uniformly so its own scale is self-consistent (within-vendor consistency). Before results from different vendors are combined, their scales SHALL be calibrated to a common reference using owned, versioned calibration configuration (cross-vendor calibration). Calibrated per-vendor results SHALL then be synthesized into a consensus verdict classified as `confirmed`, `unconfirmed`, or `disagreement` with each vendor's disposition recorded, reusing the `parallel-infrastructure` consensus substrate (`ConsensusSynthesizer`). The synthesized consensus verdict — not a lone vendor's — SHALL be what the Reporter publishes. Cross-vendor `disagreement` SHALL be surfaced for human attention rather than silently averaged.


Reconcile consensus verdict taxonomy with triage verdicts

This requirement says cross-vendor synthesis produces verdicts confirmed/unconfirmed/disagreement and that this synthesized verdict is what Reporter publishes, but other requirements define publication behavior and labels around true-positive/needs-review/etc. Without a normative mapping between these two verdict taxonomies, different implementations can publish incompatible states for the same finding, breaking dedup/comparison and operator workflows across runs.
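A normative mapping could take a shape like the following. This is a minimal sketch under stated assumptions: the function `synthesize`, the `quorum` parameter, and the rule that `unconfirmed` and `disagreement` both publish as `needs-review` are hypothetical, chosen only to show how the two taxonomies could be tied together. It does not describe the actual `ConsensusSynthesizer` API, and it takes already calibrated per-vendor triage verdicts as input.

```python
from collections import Counter
from typing import Dict, Tuple


def synthesize(dispositions: Dict[str, str], quorum: int = 2) -> Tuple[str, str]:
    """Map calibrated per-vendor triage verdicts to (consensus, published_label).

    `dispositions` maps a vendor name to that vendor's triage verdict,
    e.g. {"vendor-a": "true-positive"}. All names here are illustrative.
    """
    if not dispositions:
        raise ValueError("at least one vendor disposition is required")
    counts = Counter(dispositions.values())
    if len(counts) > 1:
        # Vendors conflict: surface to humans instead of averaging.
        return "disagreement", "needs-review"
    (verdict,) = counts  # exactly one distinct verdict remains
    if len(dispositions) >= quorum:
        # Enough vendors agree: publish the shared triage verdict.
        return "confirmed", verdict
    # Unanimous but below quorum, e.g. a lone vendor (assumed policy).
    return "unconfirmed", "needs-review"
```

For example, `synthesize({"a": "true-positive", "b": "true-positive"})` yields `("confirmed", "true-positive")`, while a split between `true-positive` and `false-positive` yields `("disagreement", "needs-review")`. Returning both values keeps the consensus classification available for provenance while the triage label drives publication.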


claude added 4 commits May 26, 2026 11:49

docs(seed-sentinel-security-eval): reference Sentinel constitution fr…

90258dd


jankneumann wants to merge 4 commits into main from claude/cisco-foundry-spec-integration-WV05K
