Skip to content

Introduce KsonTreeWalker<N>#347

Open
holodorum wants to merge 24 commits intokson-org:mainfrom
holodorum:treewalker
Open

Introduce KsonTreeWalker<N>#347
holodorum wants to merge 24 commits intokson-org:mainfrom
holodorum:treewalker

Conversation

@holodorum
Copy link
Copy Markdown
Collaborator

Introduce KsonTreeWalker, a generic interface for navigating JSON-like trees, and rewrite tree navigation as walker-parameterized algorithms. This lets KsonValue trees and raw AST nodes share the same navigation logic.

The AST walker (AstNodeWalker) is the key motivation: it operates directly on the parser's output, treating error nodes as leaves, so IDE features like path building, completions, and hover work on partially-typed documents where KsonValue conversion would fail.

KsonValueNavigation is eliminated, its logic now lives as extension functions on KsonTreeWalker, available to any tree representation.

@holodorum holodorum force-pushed the treewalker branch 2 times, most recently from e1c6df6 to e8b1f72 Compare March 23, 2026 15:02
holodorum and others added 22 commits March 23, 2026 20:24
The pull configuration model (a1c9d2b) introduced an inherent race:
the server receives a didChangeConfiguration notification and then
makes an async round-trip to pull the new settings.  A format request
issued immediately after a config change can arrive before that pull
completes, producing output with stale settings.

Fix by polling in the test: format and re-check until the expected
output appears, with a timeout.  Extract a `formatAndAwait` helper
to keep the test bodies clean and document the race in its JSDoc.
Move document symbol and semantic token logic into Kotlin where it has
direct AST access. DocumentSymbolBuilder recursively walks the KsonValue
tree. SemanticTokenBuilder uses AST-aware key token collection instead
of fragile lookahead heuristics. Both use domain-pure enums with no LSP
coupling.
Replace tree-walking and token classification logic with thin wrappers
that delegate to kson-tooling. TypeScript now only maps domain-pure
Kotlin enums to LSP constants. DocumentSymbolService drops from 152 to
~60 LOC; SemanticTokensService from 104 to ~50 LOC.
Tighten assertions across the integration test suite: replace >= checks
with exact values for semantic token counts, document highlight counts,
and document symbol children. Add error-handling tests that verify each
service handler returns a safe default when its underlying service throws.
These tests exercise the try/catch boundaries that protect the LSP
connection from individual service failures.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Token-stream based FoldingRangeService that matches open/close pairs
({}, [], $tag..$$) using a stack. Only creates fold ranges when the
open and close tokens are on different lines. Server advertises
foldingRangeProvider capability; 8 new tests cover single-line,
multi-line, nested, embed, and mixed scenarios.

Note: registering a foldingRangeProvider replaces VS Code's built-in
indentation/bracket folding for the language, so this service must
provide ranges for all structural constructs — not just embeds.
Walks the KsonValue AST to build a parent chain of ranges from
innermost node to full document. Handles objects (key → property →
container), arrays (element → array), and leaf values. Deduplicates
identical adjacent ranges. 9 new tests verify hierarchy correctness
including nested objects, arrays, and the strictly-expanding property.
Delete IndexedDocumentSymbols.ts and SymbolPositionIndex.test.ts,
replaced by Kotlin-side SiblingKeyBuilder. Clean up KsonDocument.ts
and KsonTextDocumentService.ts to remove IndexedDocumentSymbols usage.
The 5 non-diagnostic builders in kson-tooling each independently called
KsonCore.parseToAst, so a single document change could trigger up to 6
parses. ToolingDocument wraps a single error-tolerant parse result that
is shared across all builder calls.

Refactored builders to accept pre-parsed data instead of raw strings:
- SemanticTokenBuilder.build now takes tokens + ast
- FoldingRangeBuilder.build now takes tokens
- SelectionRangeBuilder.build now takes KsonValue
- SiblingKeyBuilder.build now takes KsonValue
- DocumentSymbolBuilder.build unchanged (already took KsonValue)

KsonTooling.parse(content) creates a ToolingDocument; the 5 feature
methods now accept ToolingDocument instead of String. Diagnostics keeps
its own strict parse path for accurate error messages.

Because ToolingDocument parses with ignoreErrors=true, features like
folding and semantic tokens now return partial results for broken
documents rather than empty — better editor behavior.
KsonDocument now holds a lazily-created ToolingDocument, created once
per content change via getToolingDocument(). All five non-diagnostic
TS services now accept ToolingDocument directly, so a single document
change triggers one parse instead of five.

Services updated to accept ToolingDocument:
- DocumentSymbolService.getDocumentSymbols
- SemanticTokensService.getSemanticTokens
- FoldingRangeService.getFoldingRanges
- SelectionRangeService.getSelectionRanges
- DocumentHighlightService.getDocumentHighlights

To enable SelectionRangeService to take ToolingDocument (instead of
KsonDocument for getFullDocumentRange()), KsonTooling.getEnclosingRanges
now includes the full-document range as the outermost entry, computed
from the EOF token position. This eliminates the previous contract
where the caller had to append it.

DiagnosticService unchanged — keeps its own strict parse path via
validateDocument(content, schemaContent).
With ignoreErrors=true, the parser produces an AST where the root is a
valid KsonRootImpl but child nodes may contain AstNodeError nodes. The
error-walking step is skipped, so hasErrors() returns false, and the
lazy ksonValue property proceeds to call toKsonValue() which throws
ShouldNotHappenException or UnsupportedOperationException on those
error nodes.

The fix: catch RuntimeException in ksonValue and return null, which is
already the documented contract for "errors during parse." This removes
the need for callers to defensively catch — ToolingDocument had a
20-line workaround for exactly this, now replaced by a direct delegation
to parseResult.ksonValue.
The document symbol tree was being rebuilt from the KsonValue AST on
every call to getDocumentSymbols() and getSiblingKeys(). Since
getSiblingKeys() fires on every cursor position change via
DocumentHighlightService, this meant a full recursive tree construction
on every keystroke in large documents.

ToolingDocument now lazily builds and caches the symbol tree, shared
across both getDocumentSymbols() and getSiblingKeys(). SiblingKeyBuilder
accepts the pre-built symbol list instead of rebuilding it from
KsonValue.
The four schema-aware methods (getSchemaInfoAtLocation,
getSchemaLocationAtLocation, resolveRefAtLocation,
getCompletionsAtLocation) previously accepted raw content strings and
re-parsed them on every call. This meant the document was parsed up to
3 times and the schema once per invocation.

These methods now accept ToolingDocument, which callers cache. This
eliminates redundant parsing across repeated calls (e.g. hover then
completion on the same document version).

Internal collaborators are updated to match:
- KsonValuePathBuilder accepts an optional pre-parsed KsonValue for
  navigation, still doing its own strict parse for token context only
- SchemaFilteringService accepts KsonValue? instead of String
- ResolvedSchemaContext accepts pre-parsed values instead of strings

ToolingDocument gains two new properties:
- content: exposes the raw string for KsonValuePathBuilder's strict
  token analysis and recovery path
- strictKsonValue: a lazily-computed strict parse for accurate location
  information, since gap-free parsing (used by ToolingDocument) produces
  slightly broader node spans that include trailing whitespace

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
KsonDocument now caches a ToolingDocument for its associated schema,
and KsonSchemaDocument does the same for its metaschema. The three
schema-aware services (HoverService, CompletionService,
DefinitionService) pass these cached ToolingDocuments to the Kotlin
API instead of raw content strings.

This eliminates all redundant parsing in the schema-aware path:
across repeated hover, completion, and definition requests on the same
document version, neither the document nor the schema is re-parsed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Kotlin:
- Make ResolvedSchemaContext.resolveAndFilterSchemas return non-nullable:
  the null-parse check moved to callers, so the method always returns a
  value. Remove dead ?: return null checks from all four call sites.
- Fix KsonValuePathBuilder test to use gap-free ksonValue (via
  KsonTooling.parse) matching the production path, instead of strict
  parse which trivially equals the fallback.

TypeScript:
- Restructure DefinitionService to avoid non-null assertions on schema
  URI retrieval. Compute schemaUri alongside schemaToolingDoc in a
  single branch, using optional chaining instead of !.
- Add isKsonSchemaDocument check to CompletionService, matching the
  pattern in HoverService and DefinitionService. Previously completions
  would silently return null when editing schema files.
- Remove restating comment in HoverService.
In gap-free mode, KsonMarker.done() records lastTokenIndex as
getTokenIndex()-1. Because advanceLexer() skips past WHITESPACE and
COMMENT tokens, this index can point to a trailing whitespace token
rather than the meaningful closing token (e.g. "}"). Since
AstNodeImpl.location computes from sourceTokens.last(), AST node
locations would extend into the whitespace following the construct —
potentially spanning additional lines.

Fix done() to walk lastTokenIndex back past WHITESPACE tokens for
non-error nodes, so the marker always ends at the last meaningful
token.  Two important distinctions from a naive "skip all ignored
tokens" approach:

- Only skip WHITESPACE, not COMMENT — comments carry semantic content
  that nodes must retain (e.g. trailing comments on formatted output).

- Skip the walk-back entirely for ERROR nodes — error nodes must
  retain all their tokens including trailing whitespace so they can
  faithfully reconstruct the original source.  (The gap-free comment
  in KsonCore.parseToAst explicitly notes this: "we tokenize gapFree
  when we are errorTolerant so that error nodes can reconstruct their
  whitespace".)

Also removes the existing dropLastWhile workaround in
ObjectPropertyNodeError which was compensating for this same issue.

With gap-free positions now matching strict positions, eliminate two
sources of redundant parsing:

- Remove ToolingDocument.strictKsonValue (lazy strict re-parse of
  schema documents). Schema-aware methods now use ksonValue directly.

- Refactor KsonValuePathBuilder to accept pre-lexed tokens from
  ToolingDocument, filtering out WHITESPACE/COMMENT internally. This
  eliminates the strict re-parse that previously happened on every
  hover, completion, and definition request. Falls back to strict
  parse when no tokens are provided (backward compatibility).

Update SelectionRangeTest expectations to match corrected gap-free
positions. Add tests for gap-free/strict position equivalence and
for the pre-lexed tokens path in KsonValuePathBuilder.
KsonTreeWalker<N> is an interface that decouples tree-navigation
algorithms from any specific tree representation.

TreeNavigation provides generic algorithms parameterized by the walker:
- navigateWithJsonPointer: follow a JSON Pointer through any tree
- navigateToLocationWithPointer: find the deepest node at a cursor
  position and build the JSON Pointer path from root

These algorithms are derived from the concrete implementations in
KsonValueNavigation but work with any KsonTreeWalker, enabling
both KsonValue trees and IntelliJ PSI trees to share the same
navigation logic.
Two key changes that work together:

1. AstNodeWalker — a KsonTreeWalker<AstNode> that walks the parser's
   AST directly. Unlike KsonValueWalker (which requires a fully valid
   KsonValue tree), the AST includes error nodes for syntactically
   broken parts. Error nodes are treated as leaves that navigation
   passes through, so path building works on partially-typed documents
   where KsonValue conversion would fail. This replaces the old
   recovery mechanism (inserting `[]` and re-parsing) with a cleaner
   approach: the tree is always available, error nodes are just there.

2. Schema-aware KsonTooling methods now accept ToolingDocument instead
   of raw strings, eliminating redundant parsing. ToolingDocument
   stores the content string, exposes meaningfulTokens (WHITESPACE and
   COMMENT filtered out, reusing Lexer.ignoredTokens), and provides
   the root AST node for the walker.

Internal refactoring:
- KsonValuePathBuilder takes ToolingDocument, uses AstNodeWalker for
  navigation and meaningfulTokens for cursor-context analysis — zero
  re-parsing
- SchemaFilteringService accepts KsonValue? instead of String
- SchemaInformation uses TreeNavigation for document navigation
- ResolvedSchemaContext uses pre-parsed values from ToolingDocument
Move the walker package (KsonTreeWalker, KsonValueWalker, AstNodeWalker,
TreeNavigation) from kson-tooling-lib to the root project so that both
projects can use the generic tree navigation algorithms.

This enables eliminating KsonValueNavigation entirely:
- navigateWithJsonPointer and navigateToLocationWithPointer were already
  duplicated in TreeNavigation as generic <N> algorithms
- navigateWithJsonPointerGlob (with wildcard, glob pattern, and recursive
  descent support) is now generalized into TreeNavigation as well, using
  the KsonTreeWalker interface instead of KsonValue type checks

EmbedBlockResolver and SchemaIdLookup now use TreeNavigation +
KsonValueWalker directly. The KsonValueNavigation object, its
LocationNavigationResult type, and its dedicated test file are deleted.
All glob/wildcard/recursive-descent tests are migrated to
TreeNavigationTest alongside the existing pointer and location tests.
…ng, polish tests

Four focused improvements found during review of the treewalker commits:

1. Replace Pair<String, N> with TreeProperty<N> in KsonTreeWalker.
   getObjectProperties now returns List<TreeProperty<N>> instead of
   List<Pair<String, N>>, giving callers .name and .value instead of
   the anonymous .first and .second.

2. Extract processedStringContent() helper in KsonValuePathBuilder.
   The duplicated QuotedStringNode construction for escape-processing
   STRING_CONTENT tokens is consolidated into a single private method.

3. Rename navigateToLocation test methods in TreeNavigationTest to use
   backtick-quoted descriptive names, matching the style of the
   surrounding navigateWithJsonPointer and navigateWithJsonPointerGlob
   tests.

4. Add missing trailing newlines to TreeNavigationTest.kt and
   KsonValuePathBuilderTest.kt.
…eWalker

The navigation algorithms (navigateWithJsonPointer, navigateWithJsonPointerGlob,
navigateToLocationWithPointer) are now extension functions on KsonTreeWalker<N>
rather than static methods on a TreeNavigation object.

This eliminates the redundant `walker` parameter at every call site: the walker
is the receiver, so callers write `walker.navigateWithJsonPointer(root, pointer)`
instead of `TreeNavigation.navigateWithJsonPointer(walker, root, pointer)`.

Internally, a private TreeNavigator class captures the walker once so the
recursive helpers (navigateByParsedTokens, navigateRecursive, processSingleToken,
etc.) can reference it directly without threading it as a parameter.

The TreeNavigation object is eliminated entirely. TreeNavigationResult remains
as a top-level data class in the same file.
KsonDocument previously maintained two separate parse paths: a strict
Kson.analyze() for $schema extraction and a lazy error-tolerant
ToolingDocument for editor features. This was redundant — both produced
a KsonValue tree, and error-tolerant parsing is sufficient for $schema
lookup.

Move $schema extraction into ToolingDocument.schemaId on the Kotlin
side (where it has proper access to the internal KsonValue types), and
make KsonDocument a simple wrapper around a pre-parsed ToolingDocument
passed in at construction. This eliminates one full parse per keystroke.
…elper

Remove the processedStringContent helper in KsonValuePathBuilder in favor
of direct QuotedStringContentTransformer usage.  Tighten KDoc in
TreeNavigation and KsonTreeWalker to avoid restating what's already
documented on the referenced types.
The walker interface had 6 methods (isObject, isArray, getObjectProperties,
getArrayElements, getStringValue, getObjectProperty) but every production
call site followed the same pattern: check isObject/isArray, then call the
corresponding get method. A sealed return type collapses the type check
and data access into a single `when` match.

The interface is now just two methods: getChildren (returning
NodeChildren.Object, NodeChildren.Array, or NodeChildren.Leaf) and
getLocation. This gives us compiler-enforced exhaustive matching at
every call site, so adding a new node kind in the future would produce
compile errors everywhere it needs handling.

getStringValue had zero production callers and is removed. getObjectProperty
had exactly one (the Literal branch in TreeNavigation), which now uses
children.properties directly.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants