Skip to content

Conversation

@v0y4g3r
Copy link
Contributor

@v0y4g3r v0y4g3r commented Nov 29, 2025

I hereby agree to the terms of the GreptimeDB CLA.

Refer to a related PR or issue link (optional)

What's changed and what's your intention?

Summary

This PR optimizes TSID (Time Series ID) generation in the metric engine by replacing mur3::Hasher128 with fxhash::FxHasher and introducing a fast-path optimization for rows without null values. The changes result in 5-6x performance improvement for typical use cases.

Changes

Hash Algorithm Migration

  • Replaced mur3::Hasher128 with fxhash::FxHasher for TSID hashing
  • Removed mur3 dependency, added fxhash dependency
  • Changed hash output from 128-bit to 64-bit (TSID only uses 64 bits)

Fast-Path Optimization

  • Pre-computes label name hash during index construction when no nulls are present
  • Fast path: Reuses pre-computed label name hash, only hashes label values
  • Slow path: Recomputes label name hash when null values are detected (fallback)

Performance Benchmarks

  • Added criterion benchmark suite comparing:
    • Original mur3 implementation
    • Current fxhash fast path
    • Current fxhash slow path
  • Benchmarks cover 2, 5, and 10 label scenarios plus null handling

Performance Results

Benchmark results show significant performance improvements:

Scenario Original (mur3) Current (fxhash) Speedup
Small (2 labels) 46.4 ns 8.2 ns 5.6x
Medium (5 labels) 98.2 ns 16.9 ns 5.8x
Large (10 labels) 184.4 ns 35.7 ns 5.2x
Slow Path (with nulls) 60.4 ns 24.3 ns 2.5x
image

Technical Details

Fast Path Implementation

When a row has no null label values, the implementation:

  1. Uses the pre-computed label name hash (computed once during index construction)
  2. Only hashes the label values
  3. Combines them to generate the final TSID

Slow Path Implementation

When null values are detected:

  1. Recomputes label name hash for non-null labels only
  2. Uses that hash as seed for hashing label values
  3. Generates the final TSID

Breaking Changes

⚠️ TSID values will change due to the hash algorithm switch.

PR Checklist

Please convert it to a draft if some of the following conditions are not met.

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR requires documentation updates.
  • API changes are backward compatible.
  • Schema or data changes are backward compatible.

 perf(metric-engine): replace mur3 with fxhash for faster TSID generation

 - Switches from mur3::Hasher128 to fxhash::FxHasher for TSID hashing
 - Pre-computes label-name hash when no nulls are present, avoiding redundant work
 - Adds fast-path for rows without nulls; falls back to slow path otherwise
 - Updates Cargo.toml and lockfile to reflect dependency change

Signed-off-by: Lei, HUANG <[email protected]>
 fix: only check primary-key labels for null when re-using cached hash

 - Rename has_null() → has_null_labels() and restrict the check to the
   primary-key columns so that non-label NULLs do not force a full
   TSID re-computation.
 - Update expected hashes in tests to match the new logic.

Signed-off-by: Lei, HUANG <[email protected]>
 test: add comprehensive TSID generation tests for label ordering and null handling

Signed-off-by: Lei, HUANG <[email protected]>
 bench: add criterion benchmark for TSID generator

 - Compare original mur3 vs current fxhash fast/slow paths
 - Test 2, 5, 10 label sets plus null-value slow path
 - Add mur3 & criterion dev-deps; register bench target

Signed-off-by: Lei, HUANG <[email protected]>
 test: stabilize metric-engine tests by fixing non-deterministic row order

 - Add ORDER BY to SELECTs in TTL tests to ensure consistent output
 - Update expected __tsid values after hash function change
 - Swap expected OTLP metric rows to match new ordering

Signed-off-by: Lei, HUANG <[email protected]>
@v0y4g3r v0y4g3r requested review from a team, WenyXu and waynexia as code owners November 29, 2025 08:35
@github-actions github-actions bot added size/M docs-not-required This change does not impact docs. breaking-change This pull request contains breaking changes. labels Nov 29, 2025
 refactor: simplify Default impls and remove redundant code

 - Replace manual Default for TsidGenerator with derive
 - Remove unnecessary into_iter() call
 - Simplify Option::unwrap_or_else to unwrap_or

Signed-off-by: Lei, HUANG <[email protected]>
@v0y4g3r v0y4g3r force-pushed the feat/change-tsid-gen branch from d0eb80b to ac0e314 Compare November 30, 2025 06:09
@v0y4g3r v0y4g3r mentioned this pull request Dec 1, 2025
3 tasks
@WenyXu WenyXu requested a review from Copilot December 2, 2025 04:29
Copilot finished reviewing on behalf of WenyXu December 2, 2025 04:33
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR replaces the mur3::Hasher128 with fxhash::FxHasher for TSID (Time Series ID) generation in the metric engine, achieving 5-6x performance improvement through both a faster hash algorithm and a smart fast-path optimization. The change is breaking as TSID values will differ, requiring comprehensive test updates to reflect new hash outputs and ensure deterministic ordering.

Key changes:

  • Migrated from 128-bit mur3 hash to 64-bit fxhash for TSID generation
  • Introduced fast-path optimization that pre-computes label name hashes when no nulls are present
  • Added comprehensive test coverage for TSID generation edge cases and invariants
  • Updated all affected tests with ORDER BY clauses and new expected TSID values

Reviewed changes

Copilot reviewed 11 out of 12 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/metric-engine/src/row_modifier.rs Core implementation: replaced mur3::Hasher128 with fxhash::FxHasher, added fast/slow path logic, pre-computes label name hash in IterIndex, added comprehensive unit tests for TSID generation invariants
src/metric-engine/benches/bench_tsid_generator.rs New benchmark suite comparing original mur3 vs current fxhash implementation across different label counts and null-handling scenarios
src/metric-engine/src/engine/put.rs Updated test expectations with new TSID values resulting from hash algorithm change
src/metric-engine/Cargo.toml Added fxhash dependency, moved mur3 to dev-dependencies for benchmarking, added benchmark configuration
Cargo.lock Dependency lock file updates for fxhash and criterion additions
tests/cases/standalone/common/ttl/metric_engine_ttl.sql Added ORDER BY clauses to queries for deterministic result ordering
tests/cases/standalone/common/ttl/metric_engine_ttl.result Updated expected results with correct ordering and new TSID values
tests/cases/standalone/common/ttl/database_ttl_with_metric_engine.sql Added ORDER BY clauses to queries for deterministic result ordering
tests/cases/standalone/common/ttl/database_ttl_with_metric_engine.result Updated expected results with correct ordering and new TSID values
tests/cases/standalone/common/insert/logical_metric_table.result Updated expected TSID values and result ordering to reflect new hash algorithm
tests-integration/tests/region_migration.rs Added ORDER BY clauses to test queries for deterministic results
tests-integration/tests/http.rs Updated expected result ordering in HTTP endpoint tests

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@WenyXu WenyXu requested a review from evenyag December 2, 2025 06:11
@WenyXu WenyXu added this pull request to the merge queue Dec 2, 2025
Merged via the queue into GreptimeTeam:main with commit 931556d Dec 2, 2025
49 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

breaking-change This pull request contains breaking changes. docs-not-required This change does not impact docs. size/M

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants