Skip to content

fix: unknown lucene field falls through in search#2422

Open
karl-power wants to merge 2 commits into
mainfrom
lucene-unknown-field-falls-through-as-a-raw-sql-identifier
Open

fix: unknown lucene field falls through in search#2422
karl-power wants to merge 2 commits into
mainfrom
lucene-unknown-field-falls-through-as-a-raw-sql-identifier

Conversation

@karl-power

@karl-power karl-power commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

Summary

Why

When a Lucene search field couldn't be resolved to a real column, the CustomSchemaSQLSerializerV2 fall-through emitted it verbatim as a raw SQL identifier (queryParser.ts, the old // It might be an alias, let's just try the column branch). That had two problems:

  • Typos / non-existent fields produced SQL like WHERE myTypo = '...', which ClickHouse rejects with a confusing Unknown identifier error instead of simply returning no rows.
  • The fall-through was the only thing making legitimate SELECT-alias references work (ClickHouse resolves SELECT aliases in WHERE), but it couldn't tell an alias apart from a typo — both were emitted blindly.
  • Why WITH-clause aliases matter — saved-search alerts: a SAVED_SEARCH alert's query selects count(), not the saved search's select, so the saved search's select aliases are injected as expression WITH clauses by computeAliasWithClauses (in the alert task — unchanged here). Including those in the alias set keeps a Lucene alert WHERE such as body:wrong (where the saved search declares toString(Body) AS body) resolving to the alias instead of collapsing to (1 = 0). Without this, gating the fall-through would have silently regressed saved-search alerts that reference a select alias in their WHERE — they'd stop firing.

This change makes that distinction explicit: a field that matches a known SELECT alias is emitted as a bare identifier; anything genuinely unknown resolves to the no-match predicate (1 = 0).

What changed

  • Gate the fall-through on known aliases (queryParser.ts): CustomSchemaConfig / CustomSchemaSQLSerializerV2 gain an optional selectAliases: Set<string>. The resolver now returns the bare identifier only when selectAliases.has(field); otherwise it returns found: false, which renders as (1 = 0).

  • Collect aliases from the chart config (renderChartConfig.ts): new extractSelectAliases({ selectLists, withClauses }) helper gathers the identifiers a Lucene WHERE may legally reference, from three sources:

    • array-form select ({ valueExpression, alias }[]) — reads alias directly;
    • string-form select (raw SQL, e.g. a source's defaultTableSelectExpression such as 'Timestamp, ServiceName as service, Body') — parsed via the existing chSqlToAliasMap() helper to recover declared aliases. This matters because the default search/events view uses string-form selects, so without it the default view would lose alias resolution;
    • expression-form WITH clauses — contributes a clause's name only when it declares an expression alias (isSubquery === false, i.e. WITH (expr) AS ident). Subquery CTEs (WITH ident AS (subquery)) are excluded, since they name a table-like source rather than a column usable in WHERE.

    The set is threaded through renderWhereExpressionStr into the serializer.

  • Null-safety fix (queryParser.ts): the exact-match branch now treats a null/undefined materialized-columns lookup the same as the existing catch path (?? new Map()), so a missing lookup proceeds with no materialized columns instead of throwing on .entries().

Behavior change

Lucene field Before After
Real column (ServiceName:foo) resolves resolves (unchanged)
SELECT alias (Content:foo where Body AS Content) raw identifier (worked by luck) resolves explicitly
Unknown / typo (myTypo:foo) raw identifier → ClickHouse error (1 = 0) → no rows

Example that now works end-to-end: SELECT Body AS Content … WHERE Content = '…'

How to test on Vercel preview

Preview routes: /search

Steps:

  1. Go to the search page.
  2. Try some of the search examples from the Behavior change section above.
  3. Ensure alerts fire correctly.

References

  • Linear Issue: Closes HDX-4367

@karl-power karl-power changed the title fix: unknown lucene field falls through fix: unknown lucene field falls through in search Jun 5, 2026
@changeset-bot

changeset-bot Bot commented Jun 5, 2026

Copy link
Copy Markdown

🦋 Changeset detected

Latest commit: 6f2a0c4

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
@hyperdx/common-utils Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@vercel

vercel Bot commented Jun 5, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
hyperdx-oss Ready Ready Preview, Comment Jun 8, 2026 10:04am
hyperdx-storybook Ready Ready Preview, Comment Jun 8, 2026 10:04am

Request Review

@github-actions github-actions Bot added the review/tier-2 Low risk — AI review + quick human skim label Jun 5, 2026
@github-actions

github-actions Bot commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

🔵 Tier 2 — Low Risk

Small, isolated change with no API route or data model modifications.

Why this tier:

  • Standard feature/fix — introduces new logic or modifies core functionality

Review process: AI review + quick human skim (target: 5–15 min). Reviewer validates AI assessment and checks for domain-specific concerns.
SLA: Resolve within 4 business hours.

Stats
  • Production files changed: 2
  • Production lines changed: 105 (+ 440 in test files, excluded from tier calculation)
  • Branch: lucene-unknown-field-falls-through-as-a-raw-sql-identifier
  • Author: karl-power

To override this classification, remove the review/tier-2 label and apply a different review/tier-* label. Manual overrides are preserved on subsequent pushes.

@karl-power karl-power force-pushed the lucene-unknown-field-falls-through-as-a-raw-sql-identifier branch from 09e5e96 to a6ed715 Compare June 5, 2026 11:44
@github-actions

github-actions Bot commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

E2E Test Results

All tests passed • 197 passed • 3 skipped • 1354s

Status Count
✅ Passed 197
❌ Failed 0
⚠️ Flaky 5
⏭️ Skipped 3

Tests ran across 4 shards in parallel.

View full report →

@github-actions

github-actions Bot commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

Deep Review

Gating the unknown-field fall-through is the right call, and the implementation is sound: every getColumnForField consumer short-circuits on !found to the (1 = 0) predicate, the alias path is strictly narrower than the prior unconditional verbatim emission (no new injection surface), and the ?? new Map() addition fixes a real latent .entries() crash that the null-returning test mock now exercises. No ship-blockers. The findings below are a coverage gap in alias collection plus test/clarity recommendations.

✅ No critical issues found.

🟡 P2 -- recommended

  • packages/common-utils/src/core/renderChartConfig.ts:1112 -- extractSelectAliases collects aliases only from select and groupBy, so a Lucene WHERE referencing a WITH expr AS name expression-alias (isSubquery: false) now resolves to (1 = 0) and silently returns zero rows, where the prior fall-through emitted the bare identifier that ClickHouse resolved.
    • Fix: Also feed expression-alias with entries (isSubquery === false) into the alias set, excluding subquery CTEs which are table names rather than field identifiers.
    • correctness
  • packages/common-utils/src/core/renderChartConfig.ts:1112 -- the groupBy argument to extractSelectAliases is collected but no test exercises a Lucene WHERE that references a groupBy-declared alias, so that branch is unverified.
    • Fix: Add a renderChartConfig test with a groupBy alias referenced from a Lucene WHERE, asserting it resolves to a bare identifier.
    • testing
  • packages/common-utils/src/queryParser.ts:1602 -- a negated unknown field (-NotAColumn:foo) returns (1 = 0) before isNegatedField is applied, so an excluded unknown field matches nothing instead of everything, and no test pins this newly-reachable behavior.
    • Fix: Add a test for -NotAColumn:foo pinning the emitted predicate, and confirm whether an excluded unknown field should match all rows or none.
    • testing, correctness
🔵 P3 nitpicks (4)
  • packages/common-utils/src/queryParser.ts:1602 -- the found: false branch still returns columnExpression: field and columnType: 'Unknown', which no caller reads, making the not-found result look like a usable column.
    • Fix: Convert the return type to a discriminated union ({ found: false } | { found: true; columnExpression; columnType; ... }) so the not-found path carries no phantom fields.
    • kieran-typescript, maintainability
  • packages/common-utils/src/core/renderChartConfig.ts:996 -- the comment claims the scaffold mirrors a SELECT * FROM `t` form, but the code emits SELECT ${...} FROM t (no backticks, no *) and the referenced scaffold differs.
    • Fix: Align the scaffold to the backtick form used elsewhere or drop the inaccurate cross-reference.
    • maintainability
  • packages/common-utils/src/core/renderChartConfig.ts:993 -- the comment states chSqlToAliasMap swallows parse failures and returns {}, but it logs via console.error and returns whatever aliases it parsed before the throw.
    • Fix: Reword the comment to reflect that it logs and returns the partial alias map.
    • kieran-typescript, maintainability
  • packages/common-utils/src/__tests__/renderChartConfig.test.ts:2117 -- the new alias-resolution tests assert with toContain rather than exact toBe, so a regression that leaks extra raw SQL around the alias could still pass.
    • Fix: Compare the full parameterizedQueryToSql output with toBe, matching the strong assertions used in queryParser.test.ts.
    • testing

Reviewers (6): correctness, testing, maintainability, kieran-typescript, security, performance.

Testing gaps:

  • No test exercises getMaterializedColumnsLookupTable returning null to confirm the ?? new Map() guard (the new alias tests take the fall-through path, not exactMatch).
  • No test merges aliases across select + groupBy simultaneously, nor recovers a bracket/array-index alias (e.g. ResourceAttributes['x'] as y) from a string-form select.

@karl-power karl-power marked this pull request as draft June 5, 2026 14:38
@greptile-apps

greptile-apps Bot commented Jun 8, 2026

Copy link
Copy Markdown

Greptile Summary

This PR fixes a correctness bug in the Lucene-to-SQL serializer where an unresolvable field was emitted as a raw SQL identifier (causing ClickHouse Unknown identifier errors on typos) by gating the fall-through on a new selectAliases: Set<string> that lists every alias a WHERE clause may legally reference.

  • queryParser.ts: CustomSchemaSQLSerializerV2 gains an optional selectAliases set; the previously unconditional "try the column" fall-through now returns found: true only for known aliases, and found: false (→ (1 = 0)) for everything else. A null-safety fix prevents .entries() from throwing when getMaterializedColumnsLookupTable returns null/undefined.
  • renderChartConfig.ts: New extractSelectAliases helper collects aliases from array-form selects, string-form selects (parsed via chSqlToAliasMap), and expression-form WITH clauses (isSubquery === false); subquery CTEs are correctly excluded. The set is computed and threaded to all five Lucene call sites: main where, filters, per-aggregate aggCondition, valueExpression, and having.
  • Tests: Comprehensive new suites cover alias resolution and the no-match predicate across all call sites, including the saved-search alert pattern (expression-form WITH aliases).

Confidence Score: 5/5

Safe to merge — all Lucene call sites now receive the alias set, and unknown fields consistently resolve to the no-match predicate instead of raw SQL identifiers.

The fix is precise and well-bounded: the serializer fall-through is gated on a Set computed from the chart config's own select aliases and expression WITH clauses. Every Lucene-capable call site in renderChartConfig (where, filters, aggCondition, valueExpression, having) now threads the alias set through. Comprehensive tests cover all paths including the saved-search alert pattern. The null-safety fix for getMaterializedColumnsLookupTable is a strict improvement. No existing behaviour is regressed.

No files require special attention.

Important Files Changed

Filename Overview
packages/common-utils/src/queryParser.ts Adds selectAliases set to CustomSchemaSQLSerializerV2; gates the fall-through identifier on membership in that set; unknown fields now return found: false(1 = 0). Null-safety fix for getMaterializedColumnsLookupTable is clean.
packages/common-utils/src/core/renderChartConfig.ts New extractSelectAliases helper gathers aliases from array-form selects, string-form selects (via chSqlToAliasMap), and expression-form WITH clauses. selectAliases is now threaded into all five Lucene call sites: main where, filters, aggCondition, valueExpression, and having.
packages/common-utils/src/tests/queryParser.test.ts Adds tests for known-alias resolution and unknown-field no-match predicate at the serializer level.
packages/common-utils/src/tests/renderChartConfig.test.ts Comprehensive new test suites covering all Lucene call sites: array-form and string-form select aliases, expression-form WITH aliases, subquery CTE exclusion, filters, aggCondition, valueExpression, and having paths.
.changeset/healthy-pans-grab.md Patch changeset entry for the fix.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["Lucene WHERE field lookup"] --> B{Real column or JSON field?}
    B -- Yes --> C[Return columnExpression with correct type]
    B -- No --> D{field in selectAliases?}
    D -- Yes --> E["Return found:true, columnExpression = field"]
    D -- No --> F["Return found:false → (1 = 0)"]
    G["extractSelectAliases(config)"] --> H[Array-form selects: col.alias]
    G --> I[String-form selects: chSqlToAliasMap]
    G --> J["WITH clauses (isSubquery === false only)"]
    H & I & J --> K["Set passed to CustomSchemaSQLSerializerV2"]
    K --> A
Loading

Fix All in Claude Code Fix All in Conductor Fix All in Cursor Fix All in Codex

Reviews (2): Last reviewed commit: "greptile feedback: other call sites were..." | Re-trigger Greptile

Comment thread packages/common-utils/src/queryParser.ts
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

review/tier-2 Low risk — AI review + quick human skim

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants