v0.10.12 — Fix system prompt scoring (issue #50)
Fix: Score all keyword dimensions against user text only
Reported by: @Machiel692 in #50 — thank you for the exceptionally detailed bug report and follow-up testing!
Problem
When ClawRouter is used as an OpenClaw plugin, the system prompt (~6,000 tokens with 20+ tool definitions) contains keywords that match nearly every scoring dimension. This caused all requests to score identically (~0.47) regardless of user intent, making blockrun/auto routing completely non-functional.
Root Cause
13 of 15 scoring dimensions in classifyByRules() scored against the concatenated system prompt + user message. The user's actual query (<1% of scored text) had no measurable impact on the score.
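A minimal sketch of the failure mode (keyword lists, names, and the scoring formula here are illustrative, not the actual ClawRouter source):

```typescript
// Illustrative sketch of the pre-fix behavior: keyword dimensions were
// scored against the concatenated system prompt + user message.
const CODING_KEYWORDS = ["function", "refactor", "typescript", "compile"];

function keywordScore(text: string, keywords: string[]): number {
  const lower = text.toLowerCase();
  const hits = keywords.filter((k) => lower.includes(k)).length;
  return hits / keywords.length;
}

// A large tool-definition system prompt matches the keywords on its own...
const systemPrompt =
  "Tools: run_typescript(code), compile_project(), refactor_file(path), call_function(name)";

// ...so the user's actual query barely moves the score:
const trivial = keywordScore(systemPrompt + " What time is it?", CODING_KEYWORDS);
const complex = keywordScore(systemPrompt + " Refactor this TypeScript function", CODING_KEYWORDS);
console.log(trivial, complex); // both 1 — the system prompt dominates
```

With a 6,000-token system prompt, every dimension saturates the same way, which is why all queries converged on the same ~0.47 score.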
Fix
Changed all keyword-based scoring dimensions to score against userText only (the user's message), matching the pattern already established for reasoningMarkers and scoreAgenticTask. The tokenCount dimension still uses total context size since that legitimately affects model selection.
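The fix, in the same sketch form (again with illustrative names; the real dimensions live in classifyByRules()):

```typescript
// Illustrative sketch of the post-fix behavior (hypothetical names,
// not the actual ClawRouter source).
const CODING_KEYWORDS = ["function", "refactor", "typescript", "compile"];

function keywordScore(text: string, keywords: string[]): number {
  const lower = text.toLowerCase();
  const hits = keywords.filter((k) => lower.includes(k)).length;
  return hits / keywords.length;
}

function classify(systemPrompt: string, userText: string) {
  return {
    // Keyword dimensions now look only at the user's message.
    coding: keywordScore(userText, CODING_KEYWORDS),
    // tokenCount still uses the full context, since total size
    // legitimately affects which model can serve the request.
    tokenCount: Math.ceil((systemPrompt.length + userText.length) / 4), // rough token estimate
  };
}

const systemPrompt =
  "Tools: run_typescript(code), compile_project(), refactor_file(path)";
console.log(classify(systemPrompt, "What time is it?").coding);                  // 0
console.log(classify(systemPrompt, "Refactor this TypeScript function").coding); // 0.75
```

Scoring only userText lets trivial and complex queries diverge even under a huge system prompt, while tokenCount keeps reflecting the real context size.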
Before vs After
| Query | Score Before | Score After |
|---|---|---|
| "What time is it?" | ~0.47 | 0.080 |
| "What's the weather?" | ~0.47 | 0.080 |
| Complex coding task | ~0.47 | 0.182 |
| Math proof (reasoning) | ~0.47 | 0.260 |
Scores now differentiate properly across query complexity levels.
Testing
- 214 unit tests pass
- 40 e2e tests pass (including new OpenClaw-scale system prompt scenario)
Files Changed
| File | Change |
|---|---|
| src/router/rules.ts | Score all keyword dimensions against userText only |
| test/e2e.ts | Add OpenClaw-scale system prompt e2e test |
| package.json | Version bump 0.10.11 → 0.10.12 |
Fixes #50