
feat: add query owner context metadata #560

Merged: buger merged 6 commits into main from feat/semantic-owner-context-557-558 on May 6, 2026

Conversation

buger (Collaborator) commented May 6, 2026

Summary

  • add probe query --with-context / --owner-context for opt-in JSON owner-block metadata
  • introduce a shared language-agnostic source context layer for AST match, owner, comments, enclosing symbols, and enclosing calls
  • reuse parsed source context per file while formatting query/search JSON, avoiding parse-per-match behavior
  • improve search JSON metadata with language, owner symbols, qualified owners, enclosing symbols/calls, leading comments, and classified text matches
  • improve JS/TS symbols output by unwrapping exported declarations to their semantic inner names
  • handle multiline block comments, HTML comments, and multiline strings when classifying search match metadata
  • document the implemented query/search JSON context surfaces
  • add mixed TS/Go req-id search JSON regressions for the --no-merge evidence path, including negative string/comment hit classification

Implemented: probe query --with-context

probe query now has a new opt-in flag:

probe query 'fetch($$$ARGS)' ./src -l typescript --format json --with-context

The default query JSON remains backward-compatible. With --with-context, each result keeps the existing fields and additionally includes:

  • schema_version: "probe.query.context.v1" at the top level
  • language: inferred source language
  • pattern: source pattern and nullable pattern ID
  • match: exact AST match node type, content, line range, and column range
  • owner: smallest useful enclosing source block where available
  • owner.symbol and owner.qualified_symbol
  • owner.node_type, owner.scope, owner.lines, and owner.columns
  • owner.comments: attached leading/source comments as raw source facts
  • owner.enclosing_symbols: containing class/module/impl-style symbols where knowable
  • owner.enclosing_call / owner.enclosing_calls: generic call context for callback/call nesting where knowable
  • owner.content: owning block source text when available
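Collapsed into a single sketch, a --with-context result might look like the following. Field values, the results wrapper, and the file field are illustrative guesses, not output from a real run; only the field names bulleted above are taken from the PR.

```json
{
  "schema_version": "probe.query.context.v1",
  "results": [
    {
      "file": "src/api/client.ts",
      "language": "typescript",
      "pattern": { "source": "fetch($$$ARGS)", "id": null },
      "match": {
        "node_type": "call_expression",
        "content": "fetch(url, options)",
        "lines": [42, 42],
        "columns": [10, 29]
      },
      "owner": {
        "symbol": "loadUser",
        "qualified_symbol": "ApiClient.loadUser",
        "node_type": "method_definition",
        "scope": "function",
        "lines": [40, 48],
        "columns": [2, 3],
        "comments": [
          { "lines": [38, 39], "text": "// Fetches the user record." }
        ],
        "enclosing_symbols": [
          { "kind": "class", "name": "ApiClient", "line": 12 }
        ],
        "enclosing_calls": [],
        "content": "loadUser(id: string) { ... }"
      }
    }
  ]
}
```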

This is intentionally generic. Probe reports source facts only; it does not parse requirement IDs, policy annotations, checklist semantics, test frameworks, or security meanings.

Implemented: search JSON source metadata

Search JSON keeps its existing shape but now includes additional source facts useful for text-first evidence discovery:

  • language
  • owner_symbol for common owner blocks, including TS/JS methods and exported const arrows
  • owner_qualified_symbol, e.g. PolicyService.evaluatePolicy
  • enclosing_symbols, e.g. containing class/module/impl symbols
  • enclosing_call / enclosing_calls, e.g. generic describe(...) / it(...) callback call chains without framework interpretation
  • leading_comments with line ranges and raw text
  • matches with text, line/column range, kind (comment, string, code), and comment_role when applicable
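For one hypothetical search hit, those additive fields might look like this. The shape and values are illustrative only; the field names follow the list above.

```json
{
  "file": "fixture/policy.service.ts",
  "language": "typescript",
  "owner_symbol": "evaluatePolicy",
  "owner_qualified_symbol": "PolicyService.evaluatePolicy",
  "enclosing_symbols": [
    { "kind": "class", "name": "PolicyService", "line": 1 }
  ],
  "enclosing_calls": [],
  "leading_comments": [
    { "lines": [7, 8], "text": "// Implements SYS-REQ-424." }
  ],
  "matches": [
    {
      "text": "SYS-REQ-424",
      "lines": [7, 7],
      "kind": "comment",
      "comment_role": "leading"
    }
  ]
}
```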

The added regressions cover the intended search path:

probe search --allow-tests --strict-elastic-syntax --max-results 20 --no-merge \
  --format json '"SYS-REQ-424" OR "SYS-REQ-425"' ./fixture

They verify TS method ownership, exported const arrow ownership, qualified class-method identity, nested callback call chains, test callback scope, leading comments, classified comment matches, string-literal hits, and loose comment hits. Probe still does not interpret which comments are valid evidence; downstream tools decide that from the raw source facts.

Performance / maintainability notes

  • Query/search context generation now uses a reusable ParsedSourceContext abstraction.
  • JSON formatting keeps a per-command HashMap<PathBuf, Option<ParsedSourceContext>>, so each source file is parsed at most once per formatting pass.
  • Unsupported or unparseable files are cached as None, so repeated results from the same unsupported file do not retry parsing.
  • This keeps the PR scoped: no new product mode or global parser cache, just efficient reuse inside the existing output path.
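The parse-once reuse described above can be sketched as follows. The types and file names here are hypothetical stand-ins for the real ParsedSourceContext and its parse method, not the actual implementation:

```rust
use std::collections::HashMap;
use std::path::PathBuf;

// Hypothetical stand-in for the real ParsedSourceContext, which holds a
// tree-sitter tree plus the file's source text.
struct ParsedSourceContext;

// Stand-in for ParsedSourceContext::parse: unreadable or unsupported
// files yield None, and that None is cached too, so they are not retried.
fn parse(path: &PathBuf) -> Option<ParsedSourceContext> {
    std::fs::read_to_string(path).ok().map(|_| ParsedSourceContext)
}

// Counts how many parse attempts a stream of per-match file paths triggers
// when results are formatted through a per-command cache.
fn parse_attempts(result_paths: &[PathBuf]) -> usize {
    let mut cache: HashMap<PathBuf, Option<ParsedSourceContext>> = HashMap::new();
    let mut attempts = 0;
    for path in result_paths {
        // or_insert_with runs the closure only on the first miss per path,
        // so each file is parsed at most once per formatting pass.
        cache.entry(path.clone()).or_insert_with(|| {
            attempts += 1;
            parse(path)
        });
    }
    attempts
}

fn main() {
    let results = vec![
        PathBuf::from("a.ts"),
        PathBuf::from("a.ts"), // second match in the same file: cache hit
        PathBuf::from("missing.go"),
    ];
    // Two distinct paths, so exactly two parse attempts.
    assert_eq!(parse_attempts(&results), 2);
    println!("ok");
}
```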

JS/TS fixes

  • probe search -o json can now report owner symbols for TS/JS method_definition blocks.
  • probe search -o json can now report owner symbols for exported const arrow functions even when the returned block is an export_statement.
  • probe symbols now unwraps common TS/JS export_statement / declare_statement wrappers and returns semantic names like PolicyService and normalizeDecision instead of generic export_statement names.

Architecture notes

  • Keeps existing default output compatible; richer query metadata is opt-in via --with-context.
  • Uses existing parser/language abstractions rather than adopting the issue contract verbatim.
  • Shares source-context helpers between query and search while keeping their JSON contracts compatible with existing command behavior.
  • Treats this as the strong Probe-native slice: structural query context and neutral search/source metadata.

Out of scope for this PR

  • batch pattern execution / --patterns-file
  • new search --semantic-blocks behavior
  • new extract --semantic-block behavior
  • wildcard query syntax changes beyond existing ast-grep support
  • policy interpretation of comments, requirement IDs, frameworks, or checklist semantics

Tests

  • cargo fmt --all -- --check
  • cargo clippy --all-targets --all-features -- -D warnings
  • cargo test --lib
  • cargo test --test integration_tests
  • cargo test --test query_command_tests
  • cargo test --test query_command_json_tests test_query_json_with_context_reports_owner_block
  • cargo test --test json_format_tests test_search_json_no_merge_reports_req_id_source_metadata
  • cargo test --test json_format_tests test_search_json_classifies_req_id_noise_without_policy_interpretation
  • npm test -- --runInBand npm/tests/unit/search-delegate.test.js
  • git diff --check

Refs #557
Refs #558

buger added 2 commits May 6, 2026 06:05
Add --with-context for query JSON output, backed by a shared language-agnostic source context helper.

Also expose compatible search JSON metadata for language, leading comments, and classified text matches, and unwrap JS/TS export statements in symbols output.
Track block comments, HTML comments, and multiline quoted strings when adding search match metadata so the new JSON fields do not regress existing search comment handling.
probelabs Bot (Contributor) commented May 6, 2026

PR Overview: Add Query Owner Context Metadata

Summary

This PR introduces a language-agnostic source context layer for Probe's query and search commands, enabling rich owner-block metadata extraction without interpreting domain semantics. The implementation adds opt-in JSON context for structural queries and enhances search JSON output with source facts.

Key Changes

1. New --with-context Flag for Query Command

Usage:

probe query 'fetch($$$ARGS)' ./src -l typescript --format json --with-context

Adds to each result:

  • schema_version: "probe.query.context.v1"
  • language: Inferred source language
  • pattern: Source pattern and nullable pattern ID
  • match: Exact AST match node type, content, line/column ranges
  • owner: Smallest enclosing source block with:
    • symbol and qualified_symbol (e.g., PolicyService.evaluatePolicy)
    • node_type, scope, lines, columns
    • comments: Leading/source comments as raw facts
    • enclosing_symbols: Containing class/module/impl symbols
    • enclosing_call / enclosing_calls: Generic call context for callbacks
    • content: Owning block source text

2. Enhanced Search JSON Output

New fields added to existing search JSON:

  • language: Inferred language when available
  • owner_symbol: Owning symbol (methods, exported const arrows)
  • owner_qualified_symbol: Qualified owner (e.g., PolicyService.evaluatePolicy)
  • enclosing_symbols: Containing class/module/impl symbols
  • enclosing_call / enclosing_calls: Generic call chains (e.g., describe(...) / it(...))
  • leading_comments: Raw leading comments with line ranges
  • matches: Classified text matches with:
    • kind: "comment", "string", or "code"
    • comment_role: "leading" when applicable

3. New semantic_context.rs Module (982 lines)

Core responsibilities:

  • AST traversal for owner node detection
  • Symbol extraction from tree-sitter nodes
  • Comment parsing (block, line, HTML)
  • String/comment classification for matches
  • Scope classification (function, module, declaration, test)
  • Enclosing symbol and call chain collection

Key types:

  • ParsedSourceContext: Caches parsed tree for a file
  • QuerySourceContext: Context for query matches
  • OwnerContext: Rich owner block metadata
  • SourceComment, SourceMatch, EnclosingSymbol, EnclosingCall

4. JS/TS Symbol Extraction Fixes

Changes in src/extract/symbols.rs:

  • Added semantic_symbol_node() to unwrap export_statement / declare_statement wrappers
  • Now returns semantic names like PolicyService instead of generic export_statement
  • Handles exported const arrow functions correctly

Test coverage:

test_exported_typescript_symbols_use_inner_declaration_names()

5. CLI and NPM Integration

CLI changes (src/cli.rs):

  • Added --with-context / --owner-context flag to query command

NPM changes:

  • npm/src/query.js: Added withContext flag mapping
  • npm/src/tools/common.js: Updated querySchema with with_context field
  • npm/src/tools/vercel.js: Pass withContext through to query execution

6. Comprehensive Test Coverage

New tests in tests/json_format_tests.rs:

  • test_search_json_no_merge_reports_req_id_source_metadata: Validates TS method ownership, exported const arrows, qualified symbols, nested callback chains, leading comments, and classified matches
  • test_search_json_classifies_req_id_noise_without_policy_interpretation: Ensures string literals, loose comments, and non-annotation comments are classified correctly without domain interpretation

New test in tests/query_command_json_tests.rs:

  • test_query_json_with_context_reports_owner_block: Validates query context output with owner block, comments, and match metadata

Architecture & Impact

Component Relationships

graph TD
    A[query --with-context] --> B[format_and_print_query_results]
    B --> C[ParsedSourceContext::parse]
    C --> D[query_source_context]
    D --> E[find_owner_node]
    D --> F[build_owner_context]
    F --> G[extract_symbol_name_from_node]
    F --> H[collect_enclosing_symbols]
    F --> I[collect_enclosing_calls]
    F --> J[leading_comments_before_node]
    
    K[search -o json] --> L[format_and_print_json_results]
    L --> C
    L --> M[search_owner_context]
    M --> F
    L --> N[leading_comments_from_block]
    L --> O[classify_text_matches_in_block]
    
    P[semantic_context.rs] -.-> C
    P -.-> D
    P -.-> M
    P -.-> F

Data Flow for Query Context

sequenceDiagram
    participant User
    participant CLI
    participant Query
    participant SemanticContext
    participant TreeSitter
    
    User->>CLI: probe query 'pattern' --with-context
    CLI->>Query: handle_query(with_context=true)
    Query->>Query: query_file() - find matches
    Query->>SemanticContext: ParsedSourceContext::parse(file)
    SemanticContext->>TreeSitter: parse source code
    TreeSitter-->>SemanticContext: AST tree
    Query->>SemanticContext: query_source_context(byte_start, byte_end)
    SemanticContext->>SemanticContext: find_smallest_covering_node()
    SemanticContext->>SemanticContext: find_owner_node()
    SemanticContext->>SemanticContext: build_owner_context()
    SemanticContext-->>Query: QuerySourceContext
    Query->>Query: Build JSON with owner fields
    Query-->>User: JSON with schema_version, match, owner

Affected System Components

  1. Query Command (src/query.rs)

    • Added with_context to QueryOptions
    • Modified format_and_print_query_results() to include context when enabled
    • Caches parsed files per query run
  2. Search Output (src/search/search_output.rs)

    • Enhanced JsonResult struct with new fields
    • Integrated semantic_context functions for owner extraction
    • Added match classification and comment extraction
  3. Symbol Extraction (src/extract/symbols.rs)

    • Added semantic_symbol_node() for wrapper unwrapping
    • Fixed TS/JS exported declaration handling
  4. NPM Package

    • Updated query tool schema and execution
    • Added withContext parameter support
  5. Documentation

    • Updated CLI reference (docs/probe-cli/query.md)
    • Updated output formats reference (docs/reference/output-formats.md)
    • Updated SDK tools reference (docs/probe-agent/sdk/tools-reference.md)

Scope Discovery & Context Expansion

Immediate Impact

Directly affected modules:

  • src/query.rs - Query formatting and options
  • src/search/search_output.rs - JSON output formatting
  • src/extract/symbols.rs - Symbol extraction logic
  • src/semantic_context.rs - NEW - Core context extraction
  • src/cli.rs - CLI argument definitions
  • src/main.rs - Main entry point wiring
  • src/lib.rs - Module exports

Test files:

  • tests/json_format_tests.rs - Search JSON regressions
  • tests/query_command_json_tests.rs - Query JSON regressions
  • tests/query_command_tests.rs - Updated for new option
  • tests/lib_usage.rs - Updated for new option

Related Files to Review

Language implementations (for owner node detection):

  • src/language/rust.rs - Rust-specific node handling
  • src/language/typescript.rs - TS/JS-specific node handling
  • src/language/go.rs - Go-specific node handling
  • src/language/language_trait.rs - Language trait definitions

Search integration:

  • src/search/file_processing.rs - SearchResult creation
  • src/models.rs - SearchResult and ParentContext models

NPM integration:

  • npm/src/tools/common.js - Shared schemas
  • npm/src/tools/vercel.js - Vercel AI SDK integration
  • npm/src/query.js - Query execution wrapper

Design Decisions

  1. Opt-in via flag: Default behavior unchanged; richer metadata requires explicit --with-context
  2. Source facts only: Probe reports raw metadata; downstream tools interpret semantics
  3. Shared context layer: Both query and search use semantic_context module
  4. Backward compatible: Existing JSON output structure preserved; new fields are additive
  5. Language-agnostic: Works across all supported languages using tree-sitter

Out of Scope (Explicitly Deferred)

  • Batch pattern execution / --patterns-file
  • New search --semantic-blocks behavior
  • New extract --semantic-block behavior
  • Wildcard query syntax changes
  • Policy interpretation of comments, requirements, or test frameworks

Review Notes

Key areas to focus on:

  1. src/semantic_context.rs:130-180 - AST traversal and owner detection logic
  2. src/semantic_context.rs:200-350 - Multi-language symbol extraction patterns
  3. src/semantic_context.rs:750-880 - Lexical scanner for match classification
  4. src/query.rs:396-443 - Query JSON formatting with context
  5. src/search/search_output.rs:574-680 - Search JSON integration
  6. src/extract/symbols.rs:144-165 - TS/JS export wrapper handling

Performance considerations:

  • File parsing is cached per query run via HashMap<PathBuf, Option<ParsedSourceContext>>
  • Each file is parsed once, then reused for all matches in that file
  • Comment and string classification uses lexical scanning as a fallback when AST node info is insufficient

Testing strategy:

  • Mixed TS/Go test fixtures validate cross-language behavior
  • Negative test cases ensure string literals and loose comments are classified correctly
  • Tests verify Probe does not add domain-specific interpretation

References

Core implementation:

  • src/semantic_context.rs:1-982 - Complete new module
  • src/query.rs:327-443 - Query context integration
  • src/search/search_output.rs:563-727 - Search JSON enhancements
  • src/extract/symbols.rs:106-165 - Symbol extraction fixes

CLI and integration:

  • src/cli.rs:370-380 - CLI flag definition
  • src/main.rs:846-871 - Main wiring
  • npm/src/query.js:18-27 - NPM flag mapping

Tests:

  • tests/json_format_tests.rs:95-365 - Search regressions
  • tests/query_command_json_tests.rs:311-401 - Query regressions
  • src/extract/symbols.rs:629-665 - Symbol extraction tests

Documentation:

  • docs/probe-cli/query.md:119-225 - CLI documentation
  • docs/reference/output-formats.md:56-183 - Output format reference
  • docs/probe-agent/sdk/tools-reference.md:163-190 - SDK reference
Metadata
  • Review Effort: 4 / 5
  • Primary Label: feature

Powered by Visor from Probelabs

Last updated: 2026-05-06T10:57:22.776Z | Triggered by: pr_updated | Commit: 647705b


probelabs Bot (Contributor) commented May 6, 2026


Architecture Issues (10)

Severity Location Issue
🟠 Error src/semantic_context.rs:570
extract_owner_symbol_from_source() duplicates logic already present in search_output.rs::extract_owner_symbol(). Both functions parse code strings to extract symbol names using similar patterns (func, fn, def, class, etc.). This creates maintenance burden and inconsistency risks.
💡 Suggestion: Consolidate symbol extraction logic into a single shared utility. The search_output.rs version could be refactored to call semantic_context::extract_owner_symbol_from_source, or both could delegate to a common parser abstraction.
🟡 Warning src/semantic_context.rs:982
The new semantic_context.rs module is 982 lines with significant complexity. It implements multiple responsibilities: AST node traversal, comment extraction, lexical analysis for string/comment classification, symbol extraction, and context building. This violates single responsibility principle and creates a large, difficult-to-maintain module.
💡 Suggestion: Consider splitting this module into smaller, focused modules: (1) AST traversal utilities, (2) comment extraction, (3) lexical analysis for match classification, (4) symbol/name extraction, (5) context building orchestration. Each would have clear boundaries and be independently testable.
🟡 Warning src/semantic_context.rs:680
The classify_match_kind_at() function implements a custom lexical state machine to distinguish between code, comments, and strings. This is a special-case solution that reimplements language-specific parsing logic already handled by tree-sitter parsers.
💡 Suggestion: Use tree-sitter's existing node type information to determine if a match is in a comment or string literal. The AST already knows node types like 'comment', 'string', and 'string_fragment'; leverage that instead of reimplementing lexical analysis.
🟡 Warning src/semantic_context.rs:820
normalize_owner_node() contains special-case logic for variable_declarator and arguments nodes. This pattern of checking specific parent/child relationships creates fragile code that breaks with language grammar changes.
💡 Suggestion: Replace hardcoded parent/child checks with a more general abstraction. Consider a trait or strategy pattern where each language can define its own owner normalization rules, or use tree-sitter's field name queries more generically.
🟡 Warning src/search/search_output.rs:915
classify_scope() was partially refactored to use classify_scope_from_node() from semantic_context, but only for the shared_scope case. The remaining function-specific logic (comment blocks attached to functions, etc.) creates inconsistency where some scope classification is centralized but some remains local.
💡 Suggestion: Complete the refactoring by moving all scope classification logic into semantic_context. The local checks for function signatures in code content should also be centralized to ensure consistent behavior across search and query commands.
🟡 Warning src/query.rs:388
format_and_print_query_results() now accepts pattern and with_context parameters that are only used for JSON output. This creates unnecessary coupling - the function signature changes for all formats when only JSON needs these parameters.
💡 Suggestion: Refactor to use a context object or builder pattern for JSON-specific metadata. Alternatively, split JSON formatting into a separate function that takes the additional parameters, keeping the main formatter signature simple.
🟡 Warning src/semantic_context.rs:140
extract_owner_symbol_from_source() contains extensive language-specific pattern matching with hardcoded prefixes like 'export function ', 'async function ', 'export const ', etc. This creates a maintenance burden and doesn't scale well across languages.
💡 Suggestion: Replace hardcoded string patterns with a data-driven approach using language-specific configuration. Define patterns in a structured format (e.g., HashMap<Language, Vec<Pattern>>) that can be extended without modifying core logic.
🟡 Warning src/semantic_context.rs:320
leading_comments_from_block() implements custom comment parsing logic with state tracking for block comments. This duplicates tree-sitter's comment node tracking and creates edge cases (e.g., nested block comments, HTML comments).
💡 Suggestion: Use tree-sitter's comment nodes directly instead of manual parsing. Query the AST for comment nodes preceding the target node, which would be more reliable and handle all language-specific comment syntaxes correctly.
🟡 Warning src/extract/symbols.rs:109
semantic_symbol_node() specifically handles 'export_statement' and 'declare_statement' wrapper nodes for TypeScript/JavaScript. This is a language-specific special case that doesn't generalize to other languages with similar wrapper patterns.
💡 Suggestion: Generalize this pattern by adding a language trait method like 'unwrap_semantic_node()' that each language implementation can override. This would make the abstraction consistent across all languages rather than hardcoding TS/JS behavior.
🟡 Warning src/semantic_context.rs:1
The ParsedSourceContext struct and its methods mix parsing, caching, and context extraction concerns. The parse() method returns Option but doesn't distinguish between parse failures and unsupported languages, making error handling opaque.
💡 Suggestion: Separate parsing from context extraction. Create a Result-based parse function that can distinguish errors, and a separate context builder that takes the parsed AST. Add explicit language support checking rather than failing silently with None.
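The suggested language-trait hook could be sketched as below. The trait and method names are hypothetical (taken from the suggestion, not from Probe's actual language trait), and the kind-name mapping is simplified; the real implementation would operate on tree-sitter nodes rather than kind strings:

```rust
// Hypothetical trait hook: each language may unwrap wrapper node kinds
// into the semantic declaration they contain; the default is a no-op.
trait LanguageSupport {
    fn unwrap_semantic_node<'a>(&self, kind: &'a str) -> &'a str {
        kind
    }
}

struct TypeScript;

impl LanguageSupport for TypeScript {
    fn unwrap_semantic_node<'a>(&self, kind: &'a str) -> &'a str {
        // TS/JS wraps declarations in export/declare statements.
        match kind {
            "export_statement" | "declare_statement" => "declaration",
            other => other,
        }
    }
}

fn main() {
    assert_eq!(TypeScript.unwrap_semantic_node("export_statement"), "declaration");
    assert_eq!(TypeScript.unwrap_semantic_node("class_declaration"), "class_declaration");
    println!("ok");
}
```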

Performance Issues (7)

Severity Location Issue
🟡 Warning src/query.rs:398-445
Query JSON output with --with-context may parse the same file multiple times. The HashMap cache stores Option<ParsedSourceContext> and calls ParsedSourceContext::parse() inside or_insert_with for each match. For queries returning many matches from the same file, this causes redundant file I/O and tree-sitter parsing.
💡 Suggestion: Restructure the caching to parse each file only once. The current implementation calls parse() repeatedly for each result from the same file. Consider parsing once and storing the actual ParsedSourceContext result rather than calling parse() multiple times.
🟡 Warning src/search/search_output.rs:630-670
Search JSON output may parse the same file multiple times when processing multiple results from the same file. The parsed_files HashMap stores Option<ParsedSourceContext> and calls ParsedSourceContext::parse() for each result, potentially causing redundant file I/O and parsing.
💡 Suggestion: Restructure the caching to parse each file only once. The current implementation calls parse() inside the or_insert_with closure for every result from the same file, which is inefficient.
🟡 Warning src/semantic_context.rs:818-870
classify_text_matches_in_block performs repeated string allocations and to_lowercase() conversions for each keyword on each line. For large code blocks with many keywords, this creates significant temporary string allocations.
💡 Suggestion: Consider using case-insensitive matching without allocating new strings, or normalize the code once and search against it. The current approach calls to_lowercase() on both needle and haystack for every match attempt.
🟡 Warning src/semantic_context.rs:642-680
find_search_owner_node collects all owner nodes in a line range into a Vec, then finds the minimum. This traverses potentially large portions of the AST and allocates memory for all candidates before selecting one.
💡 Suggestion: Consider tracking the best candidate during traversal instead of collecting all candidates. This would avoid allocating the Vec and storing all intermediate nodes.
🟡 Warning src/semantic_context.rs:770-810
classify_match_kind_at performs character-by-character scanning of all code lines up to the target position. For large code blocks, this scans potentially thousands of characters to classify a single match.
💡 Suggestion: Consider using the tree-sitter AST to determine if a position is in a comment or string literal, rather than manual lexical scanning. The AST already has this information.
🟡 Warning src/semantic_context.rs:872-920
scan_line_until performs character-by-character scanning with manual state tracking. This is reimplementing lexical analysis that tree-sitter already provides.
💡 Suggestion: Use tree-sitter's node information to determine if a byte offset is within a comment or string node. The AST already has this information parsed.
🟡 Warning src/semantic_context.rs:410-440
extract_owner_symbol_from_source iterates through lines with multiple string operations for many different patterns. This is called frequently and performs many temporary string allocations.
💡 Suggestion: Consider using regex compilation or a more efficient pattern matching approach. The current approach performs many string operations even when no match is found.
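One of the allocation-free alternatives suggested above (avoiding to_lowercase() per match attempt) can be sketched with ASCII-case-insensitive byte comparison. This is an illustration of the suggestion, not code from the PR, and it assumes the req-id keywords are ASCII:

```rust
// Case-insensitive substring check that allocates no lowered copies:
// compare fixed-size byte windows with eq_ignore_ascii_case instead of
// calling to_lowercase() on both needle and haystack.
fn contains_ignore_ascii_case(haystack: &str, needle: &str) -> bool {
    if needle.is_empty() {
        return true;
    }
    haystack
        .as_bytes()
        .windows(needle.len())
        .any(|window| window.eq_ignore_ascii_case(needle.as_bytes()))
}

fn main() {
    assert!(contains_ignore_ascii_case("// SYS-REQ-424 handled here", "sys-req-424"));
    assert!(!contains_ignore_ascii_case("plain code line", "sys-req"));
    println!("ok");
}
```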

Quality Issues (11)

Severity Location Issue
🟡 Warning src/semantic_context.rs:898
Function extract_owner_symbol_from_source iterates over first 20 lines using take(20) without documenting why this limit is appropriate. This arbitrary cutoff could miss symbol declarations in longer function prologues or complex decorators.
💡 Suggestion: Add documentation explaining that the 20-line limit is a heuristic for performance, or make it a configurable constant. Consider whether this should instead search until finding the first non-comment, non-blank line with a declaration.
🟡 Warning tests/json_format_tests.rs:263
Test assertion expects exactly 5 results but does not document why this specific count is expected. If fixture data changes, this test will fail without clear indication of what the correct count should be.
💡 Suggestion: Add a comment explaining why 5 results are expected, or use a more flexible assertion that validates the presence of specific expected results rather than an exact count.
🟡 Warning src/semantic_context.rs:898
Function extract_owner_symbol_from_source has high cyclomatic complexity with multiple nested if-let chains and pattern matching. This makes the function difficult to test, maintain, and extend for new language patterns.
💡 Suggestion: Refactor into smaller helper functions for each language or pattern type. Consider a strategy pattern where language-specific extractors are registered and tried in sequence.
🟡 Warning src/semantic_context.rs:898
The extract_owner_symbol_from_source function contains duplicated logic for extracting names after keywords. Multiple calls to extract_name_after_keyword with different prefixes could be consolidated.
💡 Suggestion: Create a list of prefix patterns and iterate over them, or use a macro to generate the repeated pattern matching code.
🟡 Warning src/semantic_context.rs:898
The test module in semantic_context.rs lacks comprehensive negative test cases. While test_classify_text_matches_in_comments_and_strings tests positive cases, there are no tests for edge cases like empty code, code with only comments, or malformed source that could cause panics.
💡 Suggestion: Add tests for negative cases: empty strings, code with no symbols, code with only whitespace/comments, and malformed syntax that should not crash the parser.
🟡 Warning src/semantic_context.rs:898
No tests verify error handling in ParsedSourceContext::parse when file reading fails or parsing fails. The function uses Option extensively but tests do not verify graceful failure.
💡 Suggestion: Add tests that verify parse() returns None for non-existent files, unreadable files, or files with syntax errors that cause tree parsing to fail.
🟡 Warning src/semantic_context.rs:898
The semantic_context module mixes concerns: language-specific symbol extraction, comment parsing, lexical analysis for string/comment classification, and AST traversal. This violates single responsibility principle.
💡 Suggestion: Consider splitting into submodules: language (symbol extraction), comments (comment parsing), lexical (string/comment classification), and ast (tree traversal utilities).
🟡 Warning src/semantic_context.rs:898
The classify_match_kind_at function performs complex lexical state machine parsing but has no dedicated unit tests. It is only tested indirectly through test_classify_text_matches_in_comments_and_strings.
💡 Suggestion: Add dedicated tests for classify_match_kind_at covering: nested quotes, escaped characters, block comments spanning multiple lines, mixed comment and string on the same line, and edge cases like unclosed comments/strings.
🟡 Warning tests/json_format_tests.rs:263
Test test_search_json_no_merge_reports_req_id_source_metadata has complex fixture setup with multiple files and many assertions. This makes the test brittle - changes to output format or fixture structure will require updating many assertions.
💡 Suggestion: Break this test into smaller, focused tests. Create helper functions to reduce duplication. Consider using table-driven tests for the different result type validations.
🟡 Warning src/semantic_context.rs:898
The scan_line_until function uses byte indexing with idx += ch.len_utf8() but does not validate that idx never exceeds line.len(). While the loop condition should prevent this, there is no explicit bounds checking.
💡 Suggestion: Add explicit bounds checking or use a safer iteration pattern. Consider adding debug assertions to catch potential out-of-bounds access in testing.
🟡 Warning src/extract/symbols.rs:109
Test test_exported_typescript_symbols_use_inner_declaration_names only tests positive cases where export_statement wrappers are successfully unwrapped. It does not test edge cases like nested exports, re-exports, or export statements without semantic declarations.
💡 Suggestion: Add tests for: export of already-exported symbols, re-export statements (export { X } from './y'), type-only exports, and default exports to ensure semantic_symbol_node handles all cases correctly.

Powered by Visor from Probelabs

Last updated: 2026-05-06T10:50:52.789Z | Triggered by: pr_updated | Commit: 647705b


buger (Collaborator, Author) commented May 6, 2026

Thanks, this PR now covers the important first slice for Proof's source-discovery path. The scoped split in the description is right: Probe should expose source facts, while Proof keeps requirement/test-policy interpretation and language-specific instrumentation.

For Proof's planned cleanup, this is enough to move req-id text discovery/autolink toward Probe:

  • search --allow-tests --no-merge -o json can return annotation-bearing blocks.
  • Search JSON now exposes language, owner_symbol, leading_comments, and classified matches.
  • That lets Proof reject string-literal hits and apply its own Implements / Verifies / MCDC comment policy without reparsing every file.
  • Common TS/JS owners like methods and exported const arrows are covered.

What is still missing before Proof can remove the remaining non-instrumentation AST/source readers:

  1. Search-level enclosing context

    query --with-context exposes owner.qualified_symbol, owner.enclosing_symbols, and owner.enclosing_calls, but search only exposes owner_symbol.

    For Proof, req-id discovery starts with text search (SYS-REQ-*, SW-REQ-*, etc.), not structural query patterns. When a req-id appears in a JS/TS callback block, Proof still needs generic context like:

    {
      "owner_symbol": null,
      "enclosing_calls": [
        {"callee": "describe", "first_arg_literal": "normalization", "line": 13},
        {"callee": "it", "first_arg_literal": "normalizes decisions", "line": 15}
      ]
    }

    Probe should not interpret this as a test framework. It only needs to expose the AST call chain. Proof can decide whether describe / it matters.

  2. Search-level qualified owners / containing symbols

    Search can now return owner_symbol, but Proof still cannot reliably distinguish PolicyService.evaluatePolicy from another evaluatePolicy in the same file/module without either reparsing or accepting weaker evidence identity.

    A lightweight search-side equivalent of the query owner fields would be enough:

    {
      "owner_symbol": "evaluatePolicy",
      "owner_qualified_symbol": "PolicyService.evaluatePolicy",
      "enclosing_symbols": [
        {"kind": "class", "name": "PolicyService", "line": 1}
      ]
    }
  3. Search regression for negative text hits

    The new search regression covers the positive path well. It would be useful to also assert the negative/source-classification cases through the real search JSON path:

    • req id inside a string -> matches[].kind == "string"
    • req id in a loose non-annotation comment -> matches[].kind == "comment" but no domain interpretation from Probe
    • leading annotation comment -> matches[].kind == "comment" and comment_role == "leading"

    Proof can then make its policy decision entirely from Probe JSON.

  4. Semantic block mode remains a later gap

    --no-merge is good enough for the first integration. Longer term, an explicit search mode that guarantees one semantic owner per result would let evidence tooling avoid relying on merge heuristics:

    probe search --semantic-blocks --allow-tests -o json '"SYS-REQ-424"'

    Not asking for that in this PR, just calling out that it remains the thing that would let Proof remove more local source-block cleanup.

With those additions, Proof could use Probe as the generic source-reader for trace/evidence discovery across Go/JS/TS and reserve local AST parsing for cases that truly require transformation or language execution semantics, such as MC/DC instrumentation and coverage/tool-specific processing.

@buger buger merged commit 6839a0c into main May 6, 2026
18 of 19 checks passed