ClickHouse Performance Optimizations by Tencent #412
This is excellent, thanks! This PR against the ClickBench repository is similar in spirit to @kitaisreal's Ursa (i.e. a research fork of ClickHouse). If all PRs are being integrated into the main codebase anyway, perhaps we don't need this PR (or can keep it open and continuously update it for the time being)?
Thanks for the feedback! I'd actually prefer to have this PR merged into the ClickBench repository for a few reasons:
Interesting: the string layout modification mentioned there is also implemented in ByConity (as BigString). I’ve encountered a similar need when working on the projection index feature (row-level index), where faster row seeking on string columns is critical. I’ll look into whether we can achieve this in a backward-compatible way.
As long as the results are reproducible, let's merge.
Yes, it is entirely possible.
Thanks, I wanted it for a long time!
Hmm, I was actually thinking of a different strategy: keep using the same type, but recognize the underlying streams, and if there's a separate size stream, apply a new serde logic accordingly. This behavior would only apply to MergeTree's wide format, which I believe should be sufficient.
Maybe we can try. Although, having to look up an additional file looks hacky.
Sure, a different serde in
I've just merged an additional optimization from my team that addresses the Q23 issue. With this fix, the results should now be fully reproducible without any manual post-processing. I've updated @rschu1ze @nickitat Could you help re-run the benchmarks and update the results on both Thanks a lot!
This submission builds on top of the latest ClickHouse with a series of performance optimizations, developed with support from Tencent. Each optimization has been carefully validated and is intended to be contributed upstream incrementally through individual PRs—some of which have already been merged.
Benchmark results were generated using artifacts built by the official CI pipeline of #81944, with great help from @nickitat — thank you!
The following optimizations are included:
1. Push TopN threshold to `MergeTreeSource`
Pushes the TopN threshold into `MergeTreeSource` to enable early filtering during the read phase. By passing the (N–1)th threshold value from the TopN state, rows below the threshold can be skipped earlier, reducing IO and improving performance.
2. Precompute hashes and prefetch for prealloc variants (previous prealloc optimization)
For `ColumnsHashing` implementations that support the prealloc strategy:
Also introduced the `optimize_trivial_group_by_limit_query` setting, which applies `max_rows_to_group_by` for trivial `GROUP BY LIMIT` queries to avoid unnecessary aggregation work.
3. Extend string hash map with inlined hash
The string hash map is optimized by combining string length and hash into a single 8-byte value. Since most string lengths and CRC32 hashes fit within 4 bytes, combining them:
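A minimal sketch of this packing, written in Python rather than the C++ of the actual implementation. The field order (length in the upper 32 bits, hash in the lower 32 bits), the helper names, and the use of `zlib.crc32` as a stand-in hash function are assumptions made for illustration, not details taken from the PR.

```python
import zlib
from typing import Tuple

MASK32 = 0xFFFFFFFF


def pack_len_hash(key: bytes) -> int:
    """Pack a key's length and 32-bit hash into one 64-bit integer.

    Assumed layout: length in the upper 32 bits, hash in the lower 32 bits.
    """
    length = len(key)
    if length > MASK32:
        raise ValueError("length does not fit in 32 bits")
    # zlib.crc32 is only a stand-in for whichever CRC32 variant the engine uses.
    hash32 = zlib.crc32(key) & MASK32
    return (length << 32) | hash32


def unpack_len_hash(packed: int) -> Tuple[int, int]:
    """Split a packed value back into (length, hash)."""
    return packed >> 32, packed & MASK32


def keys_equal(a: bytes, b: bytes) -> bool:
    """Compare two keys, checking the packed value before the bytes.

    A real table would store the packed value in each cell; it is
    recomputed here only to keep the sketch short.
    """
    if pack_len_hash(a) != pack_len_hash(b):
        return False  # differing length or hash: no byte comparison needed
    return a == b
```

With a layout like this, a single 8-byte comparison can reject keys that differ in either length or hash before any string bytes are compared.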
4. Optimize index analysis with earlier QCC filtering (#82380)
Refactored the integration of Query Condition Cache (QCC) with index analysis:
This notably accelerates short queries when index analysis is the dominant cost.
5. Optimize single `COUNT()` aggregation on `NOT NULL` columns (#82104)
When an aggregation query only includes a single `COUNT()` on a `NOT NULL` column:
This reduces memory usage and CPU overhead, significantly speeding up the aggregation.
6. Rewrite regular expression functions into simplified forms (#81992)
Primarily targets Q28. Introduced the `optimize_rewrite_regexp_functions` setting (enabled by default), allowing the optimizer to rewrite certain calls to `replaceRegexpAll`, `replaceRegexpOne`, and `extract` into simpler and faster forms when specific patterns are detected.
Additionally:
Enabled `count_distinct_optimization` by default with several related edge cases fixed.
All these optimizations have been tested and validated via the ClickHouse CI pipeline. Although benchmarked on ClickBench, they were made possible thanks to the extensive support and real-world production environment provided by Tencent (TCHouse-C). I'm continuously working on additional improvements, and will persist in contributing until ClickHouse achieves top-tier performance on ClickBench once more :)
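To illustrate the kind of rewrite item 6 describes, here is a small Python sketch. The description does not list which patterns are detected, so the rule below is a hypothetical example: a pattern anchored with a leading `^` (and containing no alternation) can match at most once, so a replace-all call can be turned into a replace-first call. The function names `rewrite_call`, `replace_all`, and `replace_one` are made up for this sketch, and Python's `re` module stands in for the regex engine used by ClickHouse.

```python
import re


def rewrite_call(func: str, pattern: str) -> str:
    """Hypothetical rule: anchored replace-all becomes replace-first.

    Patterns containing '|' are conservatively left alone, because the
    leading '^' would then anchor only the first alternative.
    """
    if func == "replaceRegexpAll" and pattern.startswith("^") and "|" not in pattern:
        return "replaceRegexpOne"
    return func


def replace_all(pattern: str, repl: str, s: str) -> str:
    return re.sub(pattern, repl, s)


def replace_one(pattern: str, repl: str, s: str) -> str:
    return re.sub(pattern, repl, s, count=1)


# A URL-to-domain pattern invented for this example.
pattern = r"^https?://(?:www\.)?([^/]+)/.*$"
url = "https://www.example.com/path/page"

# The anchored pattern matches at most once, so both forms agree.
assert replace_all(pattern, r"\1", url) == replace_one(pattern, r"\1", url)

# With alternation the two forms differ, so the rule must not fire.
assert replace_all(r"^a|b", "x", "abb") != replace_one(r"^a|b", "x", "abb")
```

The check for `|` is deliberately conservative: it also skips some patterns that would be safe to rewrite, which is an acceptable trade-off for a sketch that must never change query results.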