Skip to content

feat: use text index to power filters and autocomplete#2376

Open
knudtty wants to merge 16 commits into
mainfrom
aaron/fts-index-as-aggregate-view
Open

feat: use text index to power filters and autocomplete#2376
knudtty wants to merge 16 commits into
mainfrom
aaron/fts-index-as-aggregate-view

Conversation

@knudtty

@knudtty knudtty commented May 29, 2026

Copy link
Copy Markdown
Contributor

Summary

The new text indexes can power filters and autocomplete and ease the metadataMVs. So let's do it!

References

  • Linear Issue: Closes HDX-4229

@changeset-bot

changeset-bot Bot commented May 29, 2026

Copy link
Copy Markdown

⚠️ No Changeset found

Latest commit: 1181d48

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@vercel

vercel Bot commented May 29, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
hyperdx-oss Ready Ready Preview, Comment Jun 9, 2026 9:52pm
hyperdx-storybook Ready Ready Preview, Comment Jun 9, 2026 9:52pm

Request Review

@github-actions github-actions Bot added the review/tier-4 Critical — deep review + domain expert sign-off label May 29, 2026
@github-actions

github-actions Bot commented May 29, 2026

Copy link
Copy Markdown
Contributor

🔴 Tier 4 — Critical

Touches auth, data models, config, tasks, OTel pipeline, ClickHouse, or CI/CD.

Why this tier:

  • Large diff: 1174 production lines changed (threshold: 1000)

Review process: Deep review from a domain expert. Synchronous walkthrough may be required.
SLA: Schedule synchronous review within 2 business days.

Stats
  • Production files changed: 4
  • Production lines changed: 1174 (+ 981 in test files, excluded from tier calculation)
  • Branch: aaron/fts-index-as-aggregate-view
  • Author: knudtty

To override this classification, remove the review/tier-4 label and apply a different review/tier-* label. Manual overrides are preserved on subsequent pushes.

@github-actions

github-actions Bot commented May 29, 2026

Copy link
Copy Markdown
Contributor

E2E Test Results

All tests passed • 191 passed • 3 skipped • 1310s

Status Count
✅ Passed 191
❌ Failed 0
⚠️ Flaky 4
⏭️ Skipped 3

Tests ran across 4 shards in parallel.

View full report →

@github-actions

github-actions Bot commented May 29, 2026

Copy link
Copy Markdown
Contributor

Deep Review

🔴 P0/P1 — must fix

  • packages/common-utils/src/core/metadata.ts:498getMapKeys text-index queries have no try/catch, so a transient error from mergeTreeTextIndex propagates out instead of falling through to the MV/scan tiers as the sibling getMapValues cascade does.
    • Fix: Wrap the keysIndex block (lines ~498-518) and the itemsIndex block (lines ~519-539) in try/catch with console.warn, mirroring the pattern in getMapValues at lines 819-865.
    • correctness, reliability
  • packages/common-utils/src/core/metadata.ts:955getMapValues main-table-scan fallback uses Promise.all(keys.map(...)); one rejected per-key query discards every successful sibling result, concentrating blast radius vs. the prior per-key API.
    • Fix: Replace Promise.all with Promise.allSettled (or wrap each per-key body in try/catch returning []) and aggregate the successful entries.
    • adversarial, performance, reliability

🟡 P2 — recommended

  • packages/common-utils/src/core/metadata.ts:786getMapValues no longer wraps the query path in cache.getOrFetch; every call re-runs the full cascade and the fetchPerKeyFallback paths at lines 1670/1815 issue uncached ClickHouse queries on repeat.
    • Fix: Restore a result-level cache.getOrFetch keyed on (connectionId, db, table, column, sorted-keys, dateRange, timestampValueExpression), or cache inside the fallback paths.
    • correctness, kieran-typescript, performance, reliability
  • packages/common-utils/src/core/metadata.ts:847 — Text-index queries (this site and getMapKeys at lines ~505 and ~526) pass this.getClickHouseSettings() with no max_execution_time/timeout_overflow_mode, while the MV branch bounds execution at 15s; a slow mergeTreeTextIndex scan stalls to the server default before cascading.
    • Fix: Apply {...this.getClickHouseSettings(), timeout_overflow_mode: 'break', max_execution_time: 15, max_rows_to_read: '0'} consistent with the MV branch.
    • performance, reliability
  • packages/common-utils/src/core/metadata.ts:422partsOverlapFilter returns chSql\1`whentimestampValueExpressionis absent; theuseMetadata.tsx -> getAllFieldsAndValues -> getMapValueschain never threadstimestampValueExpression, so text-index queries scan ALL parts even when the caller passed a narrow dateRange`.
    • Fix: Thread timestampValueExpression through getAllFieldsAndValues -> getMapValues, or have partsOverlapFilter resolve it from source metadata when not supplied.
    • adversarial, correctness
  • packages/common-utils/src/core/metadata.ts:835 — Text-index split uses position(token, sep), which returns the FIRST occurrence; a map whose KEY contains the separator literal (e.g., key a=b with separator =) parses as key a / value b=c while the OR-chain startsWith still matches the row, returning corrupted pairs only on the text-index path.
    • Fix: Compute the split offset from each prefixed key's known length (the OR-chain already constrains the prefix), or validate at index-discovery time that map keys cannot contain the separator.
    • adversarial, correctness
  • packages/common-utils/src/core/metadata.ts:515getMapKeys text-index and rollup branches use cache.get + manual cache.set instead of cache.getOrFetch, losing the pendingQueries dedup that previously joined concurrent identical fetches; two simultaneous autocomplete callers now fire identical mergeTreeTextIndex queries.
    • Fix: Wrap each branch's query in await this.cache.getOrFetch(cacheKey, async () => {...}).
    • performance
  • packages/common-utils/src/core/metadata.ts:863 — New catch blocks at lines 578, 863, 922 blanket-swallow ALL errors as console.warn, including non-recoverable failures (auth, syntax, abort); the user-facing error eventually surfaces from the final tier and the root cause is invisible, doubling time-to-error.
    • Fix: Re-throw on AbortError and on non-transient ClickHouse error codes (ACCESS_DENIED, SYNTAX_ERROR), or attach an aggregated tier-failure diagnostic to the final thrown error.
    • reliability
🔵 P3 nitpicks (19)
  • packages/common-utils/src/queryParser.ts:19 — Re-export statements interleaved between two import blocks trip the project's simple-import-sort/imports: 'error' lint rule and the pre-commit hook will reformat.
    • Fix: Move export type { KvItemsInfo, KvItemsLookup } and export { parseKvItems... } below all imports.
    • maintainability, project-standards
  • packages/common-utils/src/core/metadata.ts:440getMapKeys does not accept or thread AbortSignal, while getMapValues was updated to accept one; aborts from autocomplete UI leave getMapKeys queries running server-side.
    • Fix: Add signal?: AbortSignal to getMapKeys and thread it into all three clickhouseClient.query calls.
    • reliability
  • packages/common-utils/src/core/metadata.ts:1720 — The new fetchPerKeyFallback repeats the spread-after-defaults clickhouse_settings pattern, so user settings silently override the in-function safety caps (same shape as the pre-existing site at line 951).
    • Fix: Put ...this.getClickHouseSettings() first, then the explicit max_rows_to_read and read_overflow_mode overrides last.
    • correctness
  • packages/api/src/tasks/checkAlerts/__tests__/renderAlertTemplate.test.ts:36 — Mock getMapValues: jest.fn().mockResolvedValue([]) returns a bare array, but the new production signature returns Map<string, string[]>; latent landmine for any future test that exercises a path consuming .get().
    • Fix: Change both renderAlertTemplate.test.ts:36 and checkAlerts.test.ts:1109 to mockResolvedValue(new Map()).
    • adversarial, api-contract, correctness, kieran-typescript, testing
  • packages/common-utils/src/core/kvItems.ts:241 — Two parallel dispatch shapes (KV_ITEMS_STRATEGIES array iterated inline in queryParser.ts, plus the parseKvItemsDefaultExpression dispatcher used by metadata.ts) duplicate the same strategy loop.
    • Fix: Drop the inline loop in queryParser.ts and call parseKvItemsDefaultExpression, or stop exporting one of the two shapes.
    • maintainability
  • packages/common-utils/src/core/metadata.ts:375 — Local norm = (s) => s.replace(/\s+/g, '').replace(/\/g, '')is byte-identical tonormalizeChExpressioninqueryParser.ts`; both expressions must stay in sync forever.
    • Fix: Hoist a single normalizeClickHouseExpression into core/utils.ts and import from both modules.
    • maintainability
  • packages/common-utils/src/core/metadata.ts:460getMapKeys cache key dropped the alignedDateRange MV branch; two callers in the same MV bucket no longer share cache entries.
    • Fix: Restore the aligned-bucket variant for the MV path or document the deliberate change.
    • correctness, maintainability
  • packages/common-utils/src/core/metadata.ts:582 — When getMapKeys text-index returns empty and the MV path succeeds, the MV value is cached under a shared key; a later call after the text index catches up still returns the cached MV result.
    • Fix: Distinguish cache entries by source path, or shorten TTL for fallback-tier results.
  • packages/common-utils/src/core/metadata.ts:1895 — Text-index cache key in getAllFieldsAndValues uses raw dateRange[0]/[1] while the MV cache key uses alignedDateRange; equivalent callers may double-cache identical results.
    • Fix: Use alignedDateRange for the text-index cache key, or document the divergence.
    • correctness, maintainability
  • packages/common-utils/src/core/metadata.ts:951settings spread ...this.getClickHouseSettings() after explicit max_rows_to_read/read_overflow_mode overrides the safety caps (pre-existing pattern, restated for parity with the new sibling at 1720).
    • Fix: Put the spread first, then the explicit overrides last.
    • correctness, kieran-typescript
  • packages/common-utils/src/core/metadata.ts:397getMapColumnTextIndexes silently drops a skip index when an alias matches by expression but its default_expression isn't a recognized KV-items shape.
    • Fix: Log a debug warning, or record a keys-only entry as a fallback.
    • correctness
  • packages/common-utils/src/core/metadata.ts:1947excludedIds.sort() mutates the array in place inside a cache-key template literal with no explanatory comment.
    • Fix: Use [...excludedIds].sort() and add // sort for deterministic cache key.
    • maintainability
  • packages/common-utils/src/core/metadata.ts:355getMapColumnTextIndexes is exposed publicly (no private modifier); if only intended for internal use plus tests, narrow the surface.
    • Fix: Mark private; tests can still spy via jest.spyOn(Metadata.prototype as any, 'getMapColumnTextIndexes').
    • api-contract
  • packages/common-utils/src/core/metadata.ts:134MapColumnTextIndexes allows the {} empty state because both fields are optional.
    • Fix: Tighten to a discriminated union requiring at least one of keysIndex or itemsIndex.
    • kieran-typescript, maintainability
  • packages/common-utils/src/core/kvItems.ts:71 — Tokenizer rejects backtick-quoted ClickHouse identifiers (e.g. `LogAttributes`), silently disabling text-index detection for those schemas.
    • Fix: Add a backtick-identifier branch to tokenizeExpression that reads `...` as a single identifier token and strips the backticks.
    • adversarial
  • packages/common-utils/src/core/metadata.ts:499 — Text-index keys-only path has no ORDER BY; the MV path ranks by sum(count) DESC, so autocomplete shows arbitrary (not popular) keys after this PR.
    • Fix: Accept the documented divergence or prefer the MV path when both are available for ranking quality.
    • adversarial, performance
  • packages/common-utils/src/__tests__/metadata.test.ts:1580 — Tests named "no cache poisoning" only verify a single invocation; a second-call probe is needed to prove the cache wasn't actually written.
    • Fix: After the first await, issue an identical call and assert that the underlying ClickHouse query fires again.
    • testing
  • packages/common-utils/src/__tests__/metadata.test.ts:2348 — Items-index .each table only covers default_type === 'ALIAS'; the MATERIALIZED predicate branch is uncovered.
    • Fix: Add a MATERIALIZED row to the parameterized table.
    • testing
  • packages/common-utils/src/__tests__/metadata.test.ts:2265(prototype.method as jest.Mock).mockRestore() + finally re-spy pattern leaks if finally doesn't run; couples test ordering to a global beforeAll spy shape.
    • Fix: Extract a withRealGetMapColumnTextIndexes(md, fn) helper, or scope the un-stubbing via beforeEach/afterEach inside the describe block.
    • maintainability, testing

Reviewers (7): adversarial, api-contract, correctness, kieran-typescript, maintainability, performance, project-standards, reliability, testing

Testing gaps:

  • No test covers a text-index tier failure in getMapKeys falling through to the MV/scan tiers (the cascade contract is not enforced by tests today).
  • No test exercises getMapValues Promise.all partial-failure semantics — one per-key query failing alongside successes.
  • No test covers getMapValues text-index with a multi-character separator (position(...) + separator.length arithmetic is untested for non-1 lengths).
  • No test for getMapValues({ keys: [] }) early-return or for getMapColumnTextIndexes engine === 'Distributed' early-return.
  • No test asserts AbortSignal propagation through getMapKeys (also blocked on threading the parameter through).

@greptile-apps

greptile-apps Bot commented Jun 8, 2026

Copy link
Copy Markdown

Greptile Summary

This PR wires the new mergeTreeTextIndex table function (CH ≥ 26.3) into the metadata layer so that filters and autocomplete can be served from text skip-indices instead of — or before — the kv rollup materialized views. It also extracts the KV items expression parser into a shared kvItems.ts module and batches getMapValues to accept multiple keys at once.

  • getMapColumnTextIndexes introspects system.data_skipping_indices to discover text(tokenizer=array) indices for map columns, guarding against Distributed tables and CH < 26.3 versions.
  • getMapKeys / getMapValues / getAllKeyValues / getAllFieldsAndValues each gain a text-index fast path that tries mergeTreeTextIndex(…) first, then falls through to the MV rollup, then to a main-table scan.
  • The kv rollup MV SQL drops the three attribute-map UNION ALL branches (ResourceAttributes, LogAttributes, ScopeAttributes), relying on the text index path to cover those columns on CH ≥ 26.3.

Confidence Score: 4/5

The text-index fast path and the three-level cascade (text-index → MV → scan) are architecturally sound, but open items from earlier review rounds — the incomplete time-range overlap predicate in partsOverlapFilter, the absent caching in the new getMapValues, and the removal of attribute-map branches from the MV for fresh CH < 26.3 deployments — should be resolved before merging.

The cascade fallback logic and the new getMapColumnTextIndexes guard (version check + Distributed check) are correct. The main concerns that remain from the review history involve the system.parts overlap filter missing the 'part fully contains query range' case, which causes the text-index path to silently under-read on any query whose window is fully bracketed by a single large part, and the removal of rollup coverage for attribute map columns on CH < 26.3 deployments where no text-index path exists.

packages/common-utils/src/core/metadata.ts — the partsOverlapFilter predicate and the absence of caching in getMapValues; docker/otel-collector/schema/seed/00006_otel_logs_rollups.sql — the MV now omits attribute map columns, which matters for fresh deployments below CH 26.3.

Important Files Changed

Filename Overview
docker/otel-collector/schema/seed/00006_otel_logs_rollups.sql Removes ResourceAttributes, LogAttributes, and ScopeAttributes from the kv rollup MV; only NativeColumn entries remain. The IF NOT EXISTS guard means existing deployments keep the old MV, but fresh deployments on CH < 26.3 lose rollup coverage for attribute map columns entirely.
packages/common-utils/src/core/metadata.ts Adds getMapColumnTextIndexes, partsOverlapFilter, and text-index paths to getMapKeys, getMapValues, getAllKeyValues, and getAllFieldsAndValues. Previous review threads identified the partsOverlapFilter overlap predicate as incorrect and getMapValues as having no caching. Additionally, fetchPerKeyFallback redundantly re-issues text-index and MV queries for keys already proven empty.
packages/common-utils/src/core/kvItems.ts New module housing the expression parser previously inlined in queryParser.ts. Both parseKvItemsExpression (inline-cast :: form) and parseKvItemsCastExpression (CAST function form) are correctly tokenized and validated; boundary checks and pos === tokens.length guards look solid.
packages/common-utils/src/tests/metadata.test.ts Adds ~1,000 lines of tests covering all new text-index code paths. Tests correctly use system.parts/min_time/max_time assertions matching the implementation. The getMapValues API is updated from a single key to a keys array returning Map<string, string[]>.
packages/common-utils/src/core/clickhouseVersion.ts Adds supportsMergeTreeTextIndex gated at CH 26.3.0.0; follows the exact same pattern as the existing supportsDirectReadMap. Tests validate both version boundaries and the undefined guard.
packages/common-utils/src/queryParser.ts Removes the inlined KvItemsInfo, KvItemsLookup, tokenizer, and parser code (now in kvItems.ts) and replaces them with imports and re-exports. The re-export block appears between two import groups which is unconventional but compiles correctly.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[getMapKeys / getMapValues / getAllKeyValues] --> B{getMapColumnTextIndexes\nCH >= 26.3 & local table?}
    B -- No index / CH < 26.3 --> D{metadataMVs\n& dateRange?}
    B -- keysIndex exists --> C1[mergeTreeTextIndex\ntoken AS key]
    B -- itemsIndex exists --> C2[mergeTreeTextIndex\nsplitByString sep token]
    C1 -- empty --> C2
    C2 -- empty --> D
    C1 -- results --> Z[Return results]
    C2 -- results --> Z
    D -- yes --> E[kvRollupTable\nfiltered by bucket]
    D -- no --> F[Main-table scan\nSELECT DISTINCT col key]
    E -- empty --> F
    E -- results --> Z
    F --> Z
Loading

Fix All in Claude Code Fix All in Conductor Fix All in Cursor Fix All in Codex

Reviews (4): Last reviewed commit: "fix fallback behavior" | Re-trigger Greptile

Comment thread packages/common-utils/src/core/metadata.ts
Comment thread packages/common-utils/src/core/metadata.ts Outdated
Comment thread packages/common-utils/src/core/metadata.ts
Comment thread docker/otel-collector/schema/seed/00006_otel_logs_rollups.sql
Comment thread packages/common-utils/src/core/metadata.ts
Comment thread packages/common-utils/src/core/metadata.ts
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

review/tier-4 Critical — deep review + domain expert sign-off

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant