plan(seed-sentinel-security-eval): seed Cisco foundry-security-spec as Sentinel capability#188
jankneumann wants to merge 4 commits into
…s Sentinel capability

Adapt the Cisco foundry-security-spec (agentic AI security evaluation) into an OpenSpec seed change. All ~35 foundry clarification markers resolved up front.

- constitution.md: 11 principles + Deviation D-1 (multi-vendor exception to single-provider reproducibility) mitigated by verdict-provenance
- specs/sentinel-security-eval: 8 roles, finding lifecycle, 3-leg evidence gate, structure-based fingerprint, Validator-only exploited flag, coverage+yield auto-stop, auto-block, sandbox-by-infrastructure, CVSS-v4/CWE/needs-review policy
- design.md: role->existing-capability binding table (Approach A), seed/roadmap boundary, deviation analysis, deferred-extension preconditions
- proposal/tasks/work-packages/contracts stub; seed-only, no role logic

https://claude.ai/code/session_01VMF1MX95ryHATWjUpa9QMt
…om project.md

Wire the Sentinel constitution + capability into openspec/project.md Domain Context, and mark Phase 1-3 seed tasks complete.

https://claude.ai/code/session_01VMF1MX95ryHATWjUpa9QMt
…ementation

Decompose the seed into 19 prioritized, dependency-ordered candidates: 14 scheduled across 5 phases (foundation -> knowledge -> detection/triage -> validation/reporting/coverage -> operability) + 5 deferred extension roles recorded as BLOCKED with adopt-when preconditions. Each candidate binds to its mapped existing capability per the seed's design.md D1. DAG validated acyclic.

https://claude.ai/code/session_01VMF1MX95ryHATWjUpa9QMt
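
The "DAG validated acyclic" claim is the kind of check that can be reproduced mechanically. The following is a minimal sketch, not the validation used in this PR: it assumes the roadmap can be expressed as a mapping from each node to the set of nodes it depends on, and the node names below are hypothetical phase-level stand-ins rather than the 19 actual candidates.

```python
from graphlib import CycleError, TopologicalSorter


def validate_acyclic(deps: dict[str, set[str]]) -> list[str]:
    """Return one dependency-respecting order, or raise ValueError on a cycle."""
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError as exc:
        # graphlib reports the offending cycle as the second element of args.
        raise ValueError(f"roadmap has a dependency cycle: {exc.args[1]}") from exc


# Hypothetical nodes (assumption): one per scheduled phase, each depending
# on the phase before it.
roadmap = {
    "foundation": set(),
    "knowledge": {"foundation"},
    "detection-triage": {"knowledge"},
    "validation-reporting-coverage": {"detection-triage"},
    "operability": {"validation-reporting-coverage"},
}

order = validate_acyclic(roadmap)
```

A deferred item that depended back on an earlier node would surface as a `ValueError` naming the cycle, rather than as a silently unschedulable roadmap.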
Replace the "tolerated reproducibility liability" framing of Deviation D-1 with a multi-vendor consensus mechanism: within-vendor consistency -> cross-vendor calibration -> principled synthesis (confirmed/unconfirmed/disagreement), reusing parallel-infrastructure's ConsensusSynthesizer. Governing rule: never mix raw cross-vendor outputs on one scale.

- constitution.md D-1: 5 binding mitigations + calibration-quality residual risk
- spec: new "Multi-Vendor Verdict Consensus and Calibration" requirement (4 scenarios); consensus-aware provenance; calibrated (not averaged) severity
- design.md: ConsensusSynthesizer binding row + rewritten D3 analysis
- roadmap: new sentinel-verdict-consensus item (P4); Reporter now depends on it

https://claude.ai/code/session_01VMF1MX95ryHATWjUpa9QMt
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ffc351d8f7
A Sentinel finding SHALL progress through the states `candidate → verdict-assigned → confirmed → [validated] → published`. The five verdicts and their surfacing rules SHALL be: `true-positive` (surfaced), `false-positive` (internal), `needs-review` (surfaced to humans), `not-applicable` (internal), `code-quality` (internal) (foundry §7.2, FR-085–FR-093; Constitution II).
Add a publication path for `needs-review` findings
The lifecycle here requires every published finding to pass through confirmed, but confirmed is only reachable from true-positive (see the later scenario), while needs-review is explicitly required to be surfaced to humans elsewhere in this spec. In runs where evidence is incomplete, implementers following this state machine will either be forced to suppress needs-review items (missing required human review) or violate the lifecycle contract, so the state transitions need an explicit needs-review publication path.
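
The gap described above can be made concrete with a small reachability check. This is a sketch under stated assumptions, not the spec's state machine: the transition table is inferred from the quoted lifecycle line and this comment, and the `needs_review_path` flag models one hypothetical fix (a direct `verdict-assigned → published` edge for `needs-review`), which the spec does not currently define.

```python
from enum import Enum


class State(Enum):
    CANDIDATE = "candidate"
    VERDICT_ASSIGNED = "verdict-assigned"
    CONFIRMED = "confirmed"
    VALIDATED = "validated"
    PUBLISHED = "published"


def next_states(state: State, verdict: str, needs_review_path: bool) -> set:
    """Allowed successor states for a finding that carries the given verdict."""
    if state is State.CANDIDATE:
        return {State.VERDICT_ASSIGNED}
    if state is State.VERDICT_ASSIGNED:
        if verdict == "true-positive":
            return {State.CONFIRMED}
        # Hypothetical edge (assumption): lets needs-review reach publication.
        if verdict == "needs-review" and needs_review_path:
            return {State.PUBLISHED}
        return set()
    if state is State.CONFIRMED:
        # `validated` is optional in the quoted lifecycle, so both edges exist.
        return {State.VALIDATED, State.PUBLISHED}
    if state is State.VALIDATED:
        return {State.PUBLISHED}
    return set()


def can_publish(verdict: str, needs_review_path: bool = False) -> bool:
    """Depth-first search: is `published` reachable from `candidate`?"""
    seen, frontier = set(), [State.CANDIDATE]
    while frontier:
        state = frontier.pop()
        if state is State.PUBLISHED:
            return True
        if state in seen:
            continue
        seen.add(state)
        frontier.extend(next_states(state, verdict, needs_review_path))
    return False
```

With the lifecycle as quoted, `can_publish("needs-review")` is `False` even though the verdict is required to be surfaced to humans; enabling the hypothetical edge makes it `True` while leaving the internal verdicts unpublishable.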
### Requirement: Multi-Vendor Verdict Consensus and Calibration
Sentinel SHALL combine per-vendor results into a verdict through principled synthesis rather than by placing raw outputs from different vendors on a shared scale (Deviation D-1; Constitution I, V). Each vendor SHALL apply the rubric uniformly so its own scale is self-consistent (within-vendor consistency). Before results from different vendors are combined, their scales SHALL be calibrated to a common reference using owned, versioned calibration configuration (cross-vendor calibration). Calibrated per-vendor results SHALL then be synthesized into a consensus verdict classified as `confirmed`, `unconfirmed`, or `disagreement` with each vendor's disposition recorded, reusing the `parallel-infrastructure` consensus substrate (`ConsensusSynthesizer`). The synthesized consensus verdict, not a lone vendor's, SHALL be what the Reporter publishes. Cross-vendor `disagreement` SHALL be surfaced for human attention rather than silently averaged.
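
The calibrate-then-synthesize order in this requirement can be illustrated as follows. This is a minimal sketch under loud assumptions: the linear `Calibration` form, the `threshold`, the unanimity rule, and the `synthesize` function are all hypothetical, and nothing here reflects the actual `ConsensusSynthesizer` API, which this page does not show.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Calibration:
    """Hypothetical per-vendor linear map onto a shared 0..1 reference scale."""
    scale: float
    offset: float = 0.0

    def apply(self, raw: float) -> float:
        return min(1.0, max(0.0, raw * self.scale + self.offset))


def synthesize(raw_scores, calibrations, threshold=0.5):
    """Calibrate each vendor's score, then classify the set of dispositions.

    Returns (consensus, dispositions); dispositions records each vendor's
    calibrated yes/no so that provenance is preserved.
    """
    if not raw_scores:
        raise ValueError("at least one vendor result is required")
    dispositions = {}
    for vendor, raw in raw_scores.items():
        # A vendor with no calibration entry raises KeyError instead of being
        # mixed in on its raw scale.
        dispositions[vendor] = calibrations[vendor].apply(raw) >= threshold
    votes = set(dispositions.values())
    if votes == {True}:
        consensus = "confirmed"
    elif votes == {False}:
        consensus = "unconfirmed"
    else:
        consensus = "disagreement"  # surfaced for humans, never averaged
    return consensus, dispositions


# Assumed example: vendor-a scores on 0..1, vendor-b scores on 0..10.
calibrations = {"vendor-a": Calibration(1.0), "vendor-b": Calibration(0.1)}
agree, _ = synthesize({"vendor-a": 0.8, "vendor-b": 7.0}, calibrations)
split, votes = synthesize({"vendor-a": 0.8, "vendor-b": 3.0}, calibrations)
```

Comparing the raw values directly would treat `7.0` and `0.8` as points on one scale, which is what the requirement forbids; in this sketch the comparison happens only after each vendor's own calibration has been applied.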
Reconcile consensus verdict taxonomy with triage verdicts
This requirement says cross-vendor synthesis produces verdicts confirmed/unconfirmed/disagreement and that this synthesized verdict is what Reporter publishes, but other requirements define publication behavior and labels around true-positive/needs-review/etc. Without a normative mapping between these two verdict taxonomies, different implementations can publish incompatible states for the same finding, breaking dedup/comparison and operator workflows across runs.
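
One possible shape for such a normative mapping is sketched below. Every rule in it is an illustrative assumption rather than something the spec states: in particular, routing `disagreement` and an `unconfirmed` `true-positive` to `needs-review` is only one candidate policy, and `published_label` is a hypothetical name.

```python
from typing import Optional

TRIAGE = {"true-positive", "false-positive", "needs-review",
          "not-applicable", "code-quality"}
CONSENSUS = {"confirmed", "unconfirmed", "disagreement"}
INTERNAL = {"false-positive", "not-applicable", "code-quality"}


def published_label(triage: str, consensus: str) -> Optional[str]:
    """Map (triage verdict, consensus class) to one published label.

    Returns None when the finding stays internal. Hypothetical policy.
    """
    if triage not in TRIAGE or consensus not in CONSENSUS:
        raise ValueError(f"unknown verdict pair: {triage!r}, {consensus!r}")
    if consensus == "disagreement":
        # The requirement says disagreement is surfaced for human attention.
        return "needs-review"
    if triage in INTERNAL:
        return None
    if triage == "true-positive" and consensus == "confirmed":
        return "true-positive"
    # Assumption: anything surfaced but not confirmed goes to human review.
    return "needs-review"
```

Because the function is total over both taxonomies, two implementations that share it cannot publish different states for the same finding, which is the dedup and comparison concern raised above.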