Providing sufficient baseline rules and identifying appropriate special skills #5
Hannah, first let me apologize for the criminally late response to your post. These are excellent questions that get to the heart of some of the trickier design decisions in SpecOps. Let me address each one:

On identifying the right level of seed documentation

The short answer is: I probably didn't get it right up front in developing this approach, and you shouldn't expect to either. The methodology deliberately frames instruction sets as starting at Level 1 (basic guidance sufficient to begin) and evolving through practice.

What guides the initial scope is a principle from empirical benchmarking: focused instruction sets with 2–3 targeted modules consistently outperform broad, comprehensive documentation packages. More content is not better content; overly large instruction sets can actually degrade agent performance compared to concise ones.

In practice, this means starting with enough guidance to get legible output, then iterating. When the agent consistently misses a pattern or produces malformed specs, that's a signal to add a specific rule. When an instruction is routinely ignored or leads to hallucinated structure, that's a signal to remove it. The path from rough to refined is as much about editing out what doesn't help as adding what does.

On determining which rule sets become dedicated skills vs. inline guidance

The test to apply: Is this procedural knowledge the model is unlikely to already have? General programming patterns (what a loop is, how functions work) don't need instruction sets, because models know that. But something like how a state's categorical eligibility rules interact with federal SNAP guidelines, or how COBOL PERFORM THRU clauses should be documented, is specialized procedural knowledge that won't be in training data. That's where a dedicated skill earns its overhead.

A secondary filter: Can the instruction set be scoped to a specific, repeatable class of tasks? "Understanding COBOL" is too broad. "Extracting business rules from COBOL conditional logic" is appropriately narrow. The narrower the task alignment, the more effective the skill.

On the concern about missing business information: this is a real gap

Your instinct here is well-founded, and it points to something I think of as a discovery-layer or routing problem. Phase 1 of the methodology (Discovery and Assessment) includes a knowledge assessment step specifically to surface what kinds of business logic and edge cases are likely present in a system before the AI agent starts generating specs. But you're right that this is largely a human-driven activity today: domain experts reviewing the system inventory, not an automated classification pass.

The idea you're describing, a SpecOps layer that first characterizes what types of business information are present in legacy code and then selects or confirms the appropriate skill, is a compelling direction. Think of it as a "skill router" or meta-instruction-set. I've thought about this as a future pattern, but it hasn't been prototyped in a reference implementation yet. If you or others at your agency are in a position to explore this, it would be a genuinely valuable contribution to the community. The underlying research (the SkillsBench study referenced in INSTRUCTION-SETS.md) supports the idea that skill selection matters significantly: models benefit from targeted, well-matched instruction sets, not broad coverage.

Happy to discuss further if you like. What type of system are you looking to apply this to?
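For anyone who wants to experiment with the "skill router" idea, here is a minimal sketch of what a first pass might look like. It is not part of any SpecOps reference implementation (none exists for this pattern yet), and every name in it (`Skill`, `SKILLS`, `route_skills`, the skill names, and the regex signals) is hypothetical. It assumes plain keyword matching is enough to characterize the source, which a real discovery pass would almost certainly need to improve on.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    """A narrowly scoped instruction set (hypothetical structure)."""
    name: str
    # Regex patterns whose presence in the source hints that this skill applies.
    signals: tuple[str, ...]


# Illustrative skills only; the names and signals are assumptions for this sketch.
SKILLS = (
    Skill("cobol-conditional-business-rules", (r"\bIF\b", r"\bEVALUATE\b")),
    Skill("cobol-perform-thru-documentation", (r"\bPERFORM\b.+\bTHRU\b",)),
)


def route_skills(source: str, skills=SKILLS, min_hits: int = 1) -> list[tuple[str, int]]:
    """Return (skill name, hit count) pairs for skills whose signals appear in the source."""
    scored = []
    for skill in skills:
        hits = sum(
            len(re.findall(pattern, source, flags=re.IGNORECASE))
            for pattern in skill.signals
        )
        if hits >= min_hits:
            scored.append((skill.name, hits))
    # Strongest match first; a domain expert would still confirm the selection.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


sample = """
    IF APPLICANT-INCOME > INCOME-LIMIT
        MOVE 'N' TO ELIGIBLE-FLAG
    END-IF.
    PERFORM 2000-CALC-BENEFIT THRU 2000-EXIT.
"""

for name, hits in route_skills(sample):
    print(f"{name}: {hits} signal hit(s)")
```

The router only proposes candidates ranked by signal count. Keeping a human confirmation step after it matches the point above that the knowledge assessment is still largely expert-driven today.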
Good morning, I hope you are well. I am interested in applying a SpecOps approach at my agency. In reviewing the source files, I noticed that each instruction set included initial documentation of key rules and edge cases, almost like "seed" documentation to give the model sufficient information to interpret the code correctly.
A few questions about this:
Thanks for your time!