perf(metric-engine)!: Replace mur3 with fxhash for faster TSID generation #7316

v0y4g3r · 2025-11-29T08:35:21Z

I hereby agree to the terms of the GreptimeDB CLA.

Refer to a related PR or issue link (optional)

What's changed and what's your intention?

Summary

This PR optimizes TSID (Time Series ID) generation in the metric engine by replacing mur3::Hasher128 with fxhash::FxHasher and introducing a fast-path optimization for rows without null values. The changes result in 5-6x performance improvement for typical use cases.

Changes

Hash Algorithm Migration

Replaced mur3::Hasher128 with fxhash::FxHasher for TSID hashing
Removed mur3 dependency, added fxhash dependency
Changed hash output from 128-bit to 64-bit (TSID only uses 64 bits)

Fast-Path Optimization

Pre-computes label name hash during index construction when no nulls are present
Fast path: Reuses pre-computed label name hash, only hashes label values
Slow path: Recomputes label name hash when null values are detected (fallback)

Performance Benchmarks

Added criterion benchmark suite comparing:
- Original mur3 implementation
- Current fxhash fast path
- Current fxhash slow path
Benchmarks cover 2, 5, and 10 label scenarios plus null handling

Performance Results

Benchmark results show significant performance improvements:

Scenario	Original (mur3)	Current (fxhash)	Speedup
Small (2 labels)	46.4 ns	8.2 ns	5.6x
Medium (5 labels)	98.2 ns	16.9 ns	5.8x
Large (10 labels)	184.4 ns	35.7 ns	5.2x
Slow Path (with nulls)	60.4 ns	24.3 ns	2.5x

Technical Details

Fast Path Implementation

When a row has no null label values, the implementation:

Uses the pre-computed label name hash (computed once during index construction)
Only hashes the label values
Combines them to generate the final TSID

Slow Path Implementation

When null values are detected:

Recomputes label name hash for non-null labels only
Uses that hash as seed for hashing label values
Generates the final TSID

Breaking Changes

⚠️ TSID values will change due to the hash algorithm switch.

PR Checklist

Please convert it to a draft if some of the following conditions are not met.

I have written the necessary rustdoc comments.
I have added the necessary unit tests and integration tests.
This PR requires documentation updates.
API changes are backward compatible.
Schema or data changes are backward compatible.

perf(metric-engine): replace mur3 with fxhash for faster TSID generation - Switches from mur3::Hasher128 to fxhash::FxHasher for TSID hashing - Pre-computes label-name hash when no nulls are present, avoiding redundant work - Adds fast-path for rows without nulls; falls back to slow path otherwise - Updates Cargo.toml and lockfile to reflect dependency change Signed-off-by: Lei, HUANG <[email protected]>

fix: only check primary-key labels for null when re-using cached hash - Rename has_null() → has_null_labels() and restrict the check to the primary-key columns so that non-label NULLs do not force a full TSID re-computation. - Update expected hashes in tests to match the new logic. Signed-off-by: Lei, HUANG <[email protected]>

test: add comprehensive TSID generation tests for label ordering and null handling Signed-off-by: Lei, HUANG <[email protected]>

bench: add criterion benchmark for TSID generator - Compare original mur3 vs current fxhash fast/slow paths - Test 2, 5, 10 label sets plus null-value slow path - Add mur3 & criterion dev-deps; register bench target Signed-off-by: Lei, HUANG <[email protected]>

test: stabilize metric-engine tests by fixing non-deterministic row order - Add ORDER BY to SELECTs in TTL tests to ensure consistent output - Update expected __tsid values after hash function change - Swap expected OTLP metric rows to match new ordering Signed-off-by: Lei, HUANG <[email protected]>

refactor: simplify Default impls and remove redundant code - Replace manual Default for TsidGenerator with derive - Remove unnecessary into_iter() call - Simplify Option::unwrap_or_else to unwrap_or Signed-off-by: Lei, HUANG <[email protected]>

Copilot

Pull request overview

This PR replaces the mur3::Hasher128 with fxhash::FxHasher for TSID (Time Series ID) generation in the metric engine, achieving 5-6x performance improvement through both a faster hash algorithm and a smart fast-path optimization. The change is breaking as TSID values will differ, requiring comprehensive test updates to reflect new hash outputs and ensure deterministic ordering.

Key changes:

Migrated from 128-bit mur3 hash to 64-bit fxhash for TSID generation
Introduced fast-path optimization that pre-computes label name hashes when no nulls are present
Added comprehensive test coverage for TSID generation edge cases and invariants
Updated all affected tests with ORDER BY clauses and new expected TSID values

Reviewed changes

Copilot reviewed 11 out of 12 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
`src/metric-engine/src/row_modifier.rs`	Core implementation: replaced `mur3::Hasher128` with `fxhash::FxHasher`, added fast/slow path logic, pre-computes label name hash in `IterIndex`, added comprehensive unit tests for TSID generation invariants
`src/metric-engine/benches/bench_tsid_generator.rs`	New benchmark suite comparing original mur3 vs current fxhash implementation across different label counts and null-handling scenarios
`src/metric-engine/src/engine/put.rs`	Updated test expectations with new TSID values resulting from hash algorithm change
`src/metric-engine/Cargo.toml`	Added `fxhash` dependency, moved `mur3` to dev-dependencies for benchmarking, added benchmark configuration
`Cargo.lock`	Dependency lock file updates for fxhash and criterion additions
`tests/cases/standalone/common/ttl/metric_engine_ttl.sql`	Added ORDER BY clauses to queries for deterministic result ordering
`tests/cases/standalone/common/ttl/metric_engine_ttl.result`	Updated expected results with correct ordering and new TSID values
`tests/cases/standalone/common/ttl/database_ttl_with_metric_engine.sql`	Added ORDER BY clauses to queries for deterministic result ordering
`tests/cases/standalone/common/ttl/database_ttl_with_metric_engine.result`	Updated expected results with correct ordering and new TSID values
`tests/cases/standalone/common/insert/logical_metric_table.result`	Updated expected TSID values and result ordering to reflect new hash algorithm
`tests-integration/tests/region_migration.rs`	Added ORDER BY clauses to test queries for deterministic results
`tests-integration/tests/http.rs`	Updated expected result ordering in HTTP endpoint tests

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

src/metric-engine/src/row_modifier.rs

v0y4g3r added 5 commits November 28, 2025 22:29

feat/change-tsid-gen:

32d210c

test: add comprehensive TSID generation tests for label ordering and null handling Signed-off-by: Lei, HUANG <[email protected]>

feat/change-tsid-gen:

ab5e8f2

bench: add criterion benchmark for TSID generator - Compare original mur3 vs current fxhash fast/slow paths - Test 2, 5, 10 label sets plus null-value slow path - Add mur3 & criterion dev-deps; register bench target Signed-off-by: Lei, HUANG <[email protected]>

v0y4g3r requested review from a team, WenyXu and waynexia as code owners November 29, 2025 08:35

github-actions bot added size/M docs-not-required This change does not impact docs. breaking-change This pull request contains breaking changes. labels Nov 29, 2025

feat/change-tsid-gen:

ac0e314

refactor: simplify Default impls and remove redundant code - Replace manual Default for TsidGenerator with derive - Remove unnecessary into_iter() call - Simplify Option::unwrap_or_else to unwrap_or Signed-off-by: Lei, HUANG <[email protected]>

v0y4g3r force-pushed the feat/change-tsid-gen branch from d0eb80b to ac0e314 Compare November 30, 2025 06:09

v0y4g3r mentioned this pull request Dec 1, 2025

Release v1.0.0-beta.2 #7319

Open

3 tasks

WenyXu requested a review from Copilot December 2, 2025 04:29

Copilot started reviewing on behalf of WenyXu December 2, 2025 04:30 View session

Copilot finished reviewing on behalf of WenyXu December 2, 2025 04:33

Copilot AI reviewed Dec 2, 2025

View reviewed changes

src/metric-engine/src/row_modifier.rs Show resolved Hide resolved

src/metric-engine/src/row_modifier.rs Show resolved Hide resolved

WenyXu requested a review from evenyag December 2, 2025 06:11

WenyXu approved these changes Dec 2, 2025

View reviewed changes

evenyag approved these changes Dec 2, 2025

View reviewed changes

WenyXu added this pull request to the merge queue Dec 2, 2025

Merged via the queue into GreptimeTeam:main with commit 931556d Dec 2, 2025
49 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

perf(metric-engine)!: Replace mur3 with fxhash for faster TSID generation #7316

perf(metric-engine)!: Replace mur3 with fxhash for faster TSID generation #7316

Uh oh!

v0y4g3r commented Nov 29, 2025

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

perf(metric-engine)!: Replace mur3 with fxhash for faster TSID generation #7316

perf(metric-engine)!: Replace mur3 with fxhash for faster TSID generation #7316

Uh oh!

Conversation

v0y4g3r commented Nov 29, 2025

Refer to a related PR or issue link (optional)

What's changed and what's your intention?

Summary

Changes

Hash Algorithm Migration

Fast-Path Optimization

Performance Benchmarks

Performance Results

Technical Details

Fast Path Implementation

Slow Path Implementation

Breaking Changes

PR Checklist

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants