During the 2026-04-28 wave-3 tokenizer review, the C1 added vocab had one residual PostScript-name token: /hyphenminus.
Corpus spot checks showed contexts such as:
4.600/hyphenminus5.600 KDa
0.3/hyphenminus 100%
50.000/hyphenminus250.000 κατοίκους
The cleaner strips standalone foo /hyphenminus bar, and correctly preserves URL/path contexts such as https://example.org/a/hyphenminus/b, but it currently preserves numeric slash ranges because URL_LIKE_TOKEN_REGEX treats tokens like 4.600/hyphenminus5.600 as URL-like host/path tokens.
Suggested later fix:
- keep protecting real URLs,
www. links, and host/path tokens with alphabetic host components
- do not classify purely numeric dotted ranges followed by slash as URL-like
- add tests for numeric
/hyphenminus stripping and real URL preservation
This is a small cleanup follow-up, not a blocker for the completed tokenizer wave.
During the 2026-04-28 wave-3 tokenizer review, the C1 added vocab had one residual PostScript-name token:
/hyphenminus.Corpus spot checks showed contexts such as:
4.600/hyphenminus5.600 KDa0.3/hyphenminus 100%50.000/hyphenminus250.000 κατοίκουςThe cleaner strips standalone
foo /hyphenminus bar, and correctly preserves URL/path contexts such ashttps://example.org/a/hyphenminus/b, but it currently preserves numeric slash ranges becauseURL_LIKE_TOKEN_REGEXtreats tokens like4.600/hyphenminus5.600as URL-like host/path tokens.Suggested later fix:
www.links, and host/path tokens with alphabetic host components/hyphenminusstripping and real URL preservationThis is a small cleanup follow-up, not a blocker for the completed tokenizer wave.