v0.10.12 — Fix system prompt scoring (issue #50)

@1bcMax 1bcMax released this 26 Feb 06:05
9f60017

Fix: Score all keyword dimensions against user text only

Reported by: @Machiel692 in #50 — thank you for the exceptionally detailed bug report and follow-up testing!

Problem

When ClawRouter is used as an OpenClaw plugin, the system prompt (~6,000 tokens with 20+ tool definitions) contains keywords that match nearly every scoring dimension. This caused all requests to score identically (~0.47) regardless of user intent, making blockrun/auto routing completely non-functional.

Root Cause

13 of 15 scoring dimensions in classifyByRules() scored against the concatenated system prompt + user message. The user's actual query (<1% of scored text) had no measurable impact on the score.
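
The dilution described above can be reproduced in a few lines. This is a minimal sketch, not ClawRouter's actual code: the keyword list, the `countKeywordMatches` helper, and the stand-in system prompt are all hypothetical and exist only to show why matching against the concatenated text hides the user's query.

```typescript
// Hypothetical keyword list for a single scoring dimension (assumption, not from the repo).
const CODE_KEYWORDS: string[] = ["function", "class", "import", "async"];

// Hypothetical helper: counts how many distinct keywords occur in the text.
function countKeywordMatches(text: string, keywords: string[]): number {
  const lower = text.toLowerCase();
  return keywords.filter((kw) => lower.includes(kw)).length;
}

// Short stand-in for a large system prompt full of tool definitions.
const systemPrompt =
  "You can call tools. Each tool is a function; import results with async calls inside a class.";

const simpleQuery = "What time is it?";
const codingQuery = "Write an async function that retries a request.";

// Scoring the concatenated text: the system prompt already matches every keyword,
// so both queries produce the same count.
const combinedSimple = countKeywordMatches(`${systemPrompt} ${simpleQuery}`, CODE_KEYWORDS);
const combinedCoding = countKeywordMatches(`${systemPrompt} ${codingQuery}`, CODE_KEYWORDS);

// Scoring the user text alone: the two queries are now distinguishable.
const userOnlySimple = countKeywordMatches(simpleQuery, CODE_KEYWORDS);
const userOnlyCoding = countKeywordMatches(codingQuery, CODE_KEYWORDS);

console.log({ combinedSimple, combinedCoding, userOnlySimple, userOnlyCoding });
```

In this toy setup the combined counts are identical for both queries, while the user-only counts differ, which mirrors the identical ~0.47 scores reported in the issue.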

Fix

Changed all keyword-based scoring dimensions to score against userText only (the user's message), matching the pattern already established for reasoningMarkers and scoreAgenticTask. The tokenCount dimension still uses total context size since that legitimately affects model selection.
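
The shape of the fix can be sketched as follows. This is an illustration under stated assumptions rather than the real `classifyByRules()`: `sketchClassify`, `scoreKeywords`, `estimateTokens`, the `codePresence` dimension name, the keyword lists, and the 4,000-token threshold are all hypothetical. Only the split itself (keyword dimensions read `userText`, `tokenCount` reads the whole context) comes from the release notes.

```typescript
interface DimensionScore {
  name: string;
  score: number;
}

// Hypothetical keyword dimension: fraction of the listed keywords found in the text.
function scoreKeywords(name: string, text: string, keywords: string[]): DimensionScore {
  const lower = text.toLowerCase();
  const hits = keywords.filter((kw) => lower.includes(kw)).length;
  return { name, score: hits / keywords.length };
}

// Rough token estimate (about 4 characters per token); an assumption for this sketch.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Hypothetical classifier showing which input each dimension reads.
function sketchClassify(systemPrompt: string, userText: string): DimensionScore[] {
  const totalTokens = estimateTokens(systemPrompt) + estimateTokens(userText);
  return [
    // Keyword dimensions look only at the user's message.
    scoreKeywords("codePresence", userText, ["function", "class", "import"]),
    scoreKeywords("reasoningMarkers", userText, ["prove", "theorem", "step by step"]),
    // tokenCount still reflects the full context (system prompt + user message).
    { name: "tokenCount", score: totalTokens > 4000 ? 1 : 0 },
  ];
}

// Usage: a keyword-heavy system prompt no longer affects the keyword dimensions.
const noisyPrompt = "function class import prove ".repeat(1000);
console.log(sketchClassify(noisyPrompt, "What time is it?"));
console.log(sketchClassify(noisyPrompt, "Prove the theorem"));
```

Keeping `tokenCount` on the full context means a long system prompt can still push a request toward a larger-context model, while the keyword dimensions reflect only what the user asked.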

Before vs After

| Query | Score Before | Score After |
| --- | --- | --- |
| "What time is it?" | ~0.47 | 0.080 |
| "What's the weather?" | ~0.47 | 0.080 |
| Complex coding task | ~0.47 | 0.182 |
| Math proof (reasoning) | ~0.47 | 0.260 |

Scores now differentiate properly across query complexity levels.

Testing

  • 214 unit tests pass
  • 40 e2e tests pass (including new OpenClaw-scale system prompt scenario)

Files Changed

| File | Change |
| --- | --- |
| src/router/rules.ts | Score all keyword dimensions against userText only |
| test/e2e.ts | Add OpenClaw-scale system prompt e2e test |
| package.json | Version bump 0.10.11 → 0.10.12 |

Fixes #50