perf(metric-engine)!: Replace mur3 with fxhash for faster TSID generation #7316
perf(metric-engine): replace mur3 with fxhash for faster TSID generation
- Switches from mur3::Hasher128 to fxhash::FxHasher for TSID hashing
- Pre-computes label-name hash when no nulls are present, avoiding redundant work
- Adds fast-path for rows without nulls; falls back to slow path otherwise
- Updates Cargo.toml and lockfile to reflect dependency change
Signed-off-by: Lei, HUANG <[email protected]>

fix: only check primary-key labels for null when re-using cached hash
- Rename has_null() → has_null_labels() and restrict the check to the primary-key columns so that non-label NULLs do not force a full TSID re-computation.
- Update expected hashes in tests to match the new logic.
Signed-off-by: Lei, HUANG <[email protected]>

test: add comprehensive TSID generation tests for label ordering and null handling
Signed-off-by: Lei, HUANG <[email protected]>

bench: add criterion benchmark for TSID generator
- Compare original mur3 vs current fxhash fast/slow paths
- Test 2, 5, 10 label sets plus null-value slow path
- Add mur3 & criterion dev-deps; register bench target
Signed-off-by: Lei, HUANG <[email protected]>

test: stabilize metric-engine tests by fixing non-deterministic row order
- Add ORDER BY to SELECTs in TTL tests to ensure consistent output
- Update expected __tsid values after hash function change
- Swap expected OTLP metric rows to match new ordering
Signed-off-by: Lei, HUANG <[email protected]>

refactor: simplify Default impls and remove redundant code
- Replace manual Default for TsidGenerator with derive
- Remove unnecessary into_iter() call
- Simplify Option::unwrap_or_else to unwrap_or
Signed-off-by: Lei, HUANG <[email protected]>
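The "fix" commit above narrows the null check to primary-key (label) columns. A minimal sketch of that idea follows. The row representation (`&[Option<&str>]`) and the `label_indices` parameter are assumptions made for illustration, not the signature used in row_modifier.rs.

```rust
/// Returns true if any primary-key (label) column of `row` is NULL.
///
/// Hypothetical layout: `row[i]` is the value of column `i`, with `None`
/// modelling NULL, and `label_indices` lists the positions of the label
/// columns. NULLs in other columns are ignored on purpose, so they never
/// force the slow path.
fn has_null_labels(row: &[Option<&str>], label_indices: &[usize]) -> bool {
    label_indices.iter().any(|&i| row[i].is_none())
}
```

With a check like this, a NULL in a non-label column leaves the cached label-name hash usable.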
Force-pushed from d0eb80b to ac0e314.
Pull request overview
This PR replaces mur3::Hasher128 with fxhash::FxHasher for TSID (Time Series ID) generation in the metric engine. It reports a 5-6x performance improvement, which comes from two sources: a faster hash algorithm and a fast path for rows without null labels. The change is breaking because TSID values will differ, so tests are updated to reflect the new hash outputs and to ensure deterministic ordering.
Key changes:
- Migrated from 128-bit mur3 hash to 64-bit fxhash for TSID generation
- Introduced fast-path optimization that pre-computes label name hashes when no nulls are present
- Added comprehensive test coverage for TSID generation edge cases and invariants
- Updated all affected tests with ORDER BY clauses and new expected TSID values
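To make the first bullet concrete, here is a rough sketch of the Fx-style mixing step: one rotate, one XOR and one multiply per 64-bit word. `FxLikeHasher` and `fx_like_hash` are hypothetical names. The byte handling is simplified, so this is not bit-compatible with `fxhash::FxHasher`.

```rust
use std::hash::Hasher;

/// Multiplicative constant used by Fx-style 64-bit hashing.
const SEED64: u64 = 0x51_7c_c1_b7_27_22_0a_95;

/// Simplified Fx-style hasher, for illustration only.
#[derive(Default, Clone, Copy)]
struct FxLikeHasher {
    state: u64,
}

impl FxLikeHasher {
    /// Fold one 64-bit word into the state: rotate, xor, multiply.
    #[inline]
    fn mix(&mut self, word: u64) {
        self.state = (self.state.rotate_left(5) ^ word).wrapping_mul(SEED64);
    }
}

impl Hasher for FxLikeHasher {
    fn write(&mut self, bytes: &[u8]) {
        // Consume the input as 8-byte little-endian words, zero-padding the tail.
        for chunk in bytes.chunks(8) {
            let mut buf = [0u8; 8];
            buf[..chunk.len()].copy_from_slice(chunk);
            self.mix(u64::from_le_bytes(buf));
        }
    }

    fn write_u64(&mut self, word: u64) {
        self.mix(word);
    }

    fn finish(&self) -> u64 {
        self.state
    }
}

/// Convenience wrapper: hash a byte slice from a zero initial state.
fn fx_like_hash(bytes: &[u8]) -> u64 {
    let mut hasher = FxLikeHasher::default();
    hasher.write(bytes);
    hasher.finish()
}
```

The per-word work is a handful of arithmetic instructions and the output is a single `u64`. The hash is not designed to resist adversarial collisions.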
Reviewed changes
Copilot reviewed 11 out of 12 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/metric-engine/src/row_modifier.rs | Core implementation: replaced mur3::Hasher128 with fxhash::FxHasher, added fast/slow path logic, pre-computes label name hash in IterIndex, added comprehensive unit tests for TSID generation invariants |
| src/metric-engine/benches/bench_tsid_generator.rs | New benchmark suite comparing original mur3 vs current fxhash implementation across different label counts and null-handling scenarios |
| src/metric-engine/src/engine/put.rs | Updated test expectations with new TSID values resulting from hash algorithm change |
| src/metric-engine/Cargo.toml | Added fxhash dependency, moved mur3 to dev-dependencies for benchmarking, added benchmark configuration |
| Cargo.lock | Dependency lock file updates for fxhash and criterion additions |
| tests/cases/standalone/common/ttl/metric_engine_ttl.sql | Added ORDER BY clauses to queries for deterministic result ordering |
| tests/cases/standalone/common/ttl/metric_engine_ttl.result | Updated expected results with correct ordering and new TSID values |
| tests/cases/standalone/common/ttl/database_ttl_with_metric_engine.sql | Added ORDER BY clauses to queries for deterministic result ordering |
| tests/cases/standalone/common/ttl/database_ttl_with_metric_engine.result | Updated expected results with correct ordering and new TSID values |
| tests/cases/standalone/common/insert/logical_metric_table.result | Updated expected TSID values and result ordering to reflect new hash algorithm |
| tests-integration/tests/region_migration.rs | Added ORDER BY clauses to test queries for deterministic results |
| tests-integration/tests/http.rs | Updated expected result ordering in HTTP endpoint tests |
I hereby agree to the terms of the GreptimeDB CLA.
Refer to a related PR or issue link (optional)
What's changed and what's your intention?
Summary
This PR optimizes TSID (Time Series ID) generation in the metric engine by replacing mur3::Hasher128 with fxhash::FxHasher and introducing a fast-path optimization for rows without null values. The changes result in a 5-6x performance improvement for typical use cases.
Changes
Hash Algorithm Migration
- Replaced mur3::Hasher128 with fxhash::FxHasher for TSID hashing
- Removed mur3 dependency, added fxhash dependency
Fast-Path Optimization
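As a rough illustration of the fast and slow paths this PR describes, the sketch below caches a hash of all label names and reuses it when a row has no null labels. When a label is null, it re-hashes only the names that are present. Everything here is an assumption made for illustration: `LabelSchema`, `hash_label_names`, `hash_label_values`, the `0xff` separator byte, and the use of `std`'s `DefaultHasher` as a stand-in for `fxhash::FxHasher`. It is not the code in row_modifier.rs.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Hash label names in column order, writing a separator byte after each one.
fn hash_label_names<'a>(names: impl Iterator<Item = &'a str>) -> u64 {
    let mut hasher = DefaultHasher::new(); // stand-in for fxhash::FxHasher
    for name in names {
        hasher.write(name.as_bytes());
        hasher.write_u8(0xff);
    }
    hasher.finish()
}

/// Seed the hasher with the label-name hash, then feed the label values.
fn hash_label_values<'a>(name_hash: u64, values: impl Iterator<Item = &'a str>) -> u64 {
    let mut hasher = DefaultHasher::new();
    hasher.write_u64(name_hash);
    for value in values {
        hasher.write(value.as_bytes());
        hasher.write_u8(0xff);
    }
    hasher.finish()
}

/// Hypothetical per-table state: label names plus their pre-computed hash.
struct LabelSchema {
    names: Vec<&'static str>,
    all_names_hash: u64,
}

impl LabelSchema {
    fn new(names: Vec<&'static str>) -> Self {
        let all_names_hash = hash_label_names(names.iter().copied());
        Self { names, all_names_hash }
    }

    /// `row[i]` is the value of label `names[i]`; `None` models a NULL label.
    fn tsid(&self, row: &[Option<&str>]) -> u64 {
        if row.iter().all(Option::is_some) {
            // Fast path: no null labels, so the cached name hash applies as-is.
            hash_label_values(self.all_names_hash, row.iter().flatten().copied())
        } else {
            // Slow path: hash only the names and values of labels that are present.
            let present = self.names.iter().zip(row).filter(|(_, v)| v.is_some());
            let name_hash = hash_label_names(present.clone().map(|(name, _)| *name));
            hash_label_values(name_hash, present.filter_map(|(_, v)| *v))
        }
    }
}
```

In this sketch a row with a null label hashes the same as a row from a schema that lacks that label. That is a property of the sketch itself. The PR text does not say whether the real implementation guarantees it.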
Performance Benchmarks
Performance Results
Benchmark results show significant performance improvements:
Technical Details
Fast Path Implementation
When a row has no null label values, the implementation:
Slow Path Implementation
When null values are detected:
Breaking Changes
PR Checklist
Please convert it to a draft if some of the following conditions are not met.