-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Hot shard design documentation #12413
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Result of foundationdb-pr-clang-ide on Linux RHEL 9
|
Result of foundationdb-pr-clang-ide on Linux RHEL 9
|
Result of foundationdb-pr-clang-arm on Linux CentOS 7
|
Result of foundationdb-pr-clang on Linux RHEL 9
|
Result of foundationdb-pr-cluster-tests on Linux RHEL 9
|
Result of foundationdb-pr-clang-arm on Linux CentOS 7
|
Result of foundationdb-pr-cluster-tests on Linux RHEL 9
|
Result of foundationdb-pr on Linux RHEL 9
|
Result of foundationdb-pr-clang on Linux RHEL 9
|
Result of foundationdb-pr on Linux RHEL 9
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great!
I think it's also worth documenting the max shard size is dynamically calculated in
int64_t getMaxShardSize(double dbSizeEstimate) {
int64_t size = std::min((SERVER_KNOBS->MIN_SHARD_BYTES + (int64_t)std::sqrt(std::max<double>(dbSizeEstimate, 0)) *
SERVER_KNOBS->SHARD_BYTES_PER_SQRT_BYTES) *
SERVER_KNOBS->SHARD_BYTES_RATIO,
(int64_t)SERVER_KNOBS->MAX_SHARD_BYTES);
i.e., a formula of (MIN_SHARD_BYTES + sqrt(DB_size) * SHARD_BYTES_PER_SQRT_BYTES) * SHARD_BYTES_RATIO
, and then max'ed at MAX_SHARD_BYTES
.
|
||
A shard is "hot" when it absorbs a disproportionate share of the cluster's read or write workload, driving CPU or bandwidth saturation on its replica servers [6]. | ||
|
||
Storage servers continually sample bytes-read, ops-read, and shard sizes; a shard whose read bandwidth density exceeds configured thresholds is tagged as read-hot so the distributor can identify the offending range [4]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is also write hot. Right now, only write-hot shard triggers split.
- Randomize key prefixes so consecutive writes land on different shard ranges; for example, hash user IDs or add a short random salt before the natural key. This way inserts will scatter instead of piling onto one shard. | ||
- If you need to store a counter, consider sharding them across N disjoint keys (e.g., counter/<bucket>/…) and aggregate in clients or background jobs; this keeps the per-key mutation rate below the commit proxy’s hot-shard throttle. | ||
- If you are storing append-only logs, split them into multiple partitions (such as log/<partition>/<ts>), rotating partitions over time rather than funneling through a single key path. | ||
- Avoid “read-modify-write” cycles. Use FDB's atomic operations (like ADD) when possible, and throttle/queue work in clients so they don’t stampede on that hot key. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/ADD/ATOMIC_ADD/
- Master switch to enable/disable hot shard throttling at commit proxies | ||
- When enabled, commit proxies track hot shards and reject transactions writing to them | ||
- Disabled by default as the feature is experimental | ||
- Location: fdbclient/ServerKnobs.cpp:999 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can delete this, since the actual line number will likely be changed. Or use a permanent link instead.
**`HOT_SHARD_THROTTLING_EXPIRE_AFTER`** (double, default: `3.0` seconds) | ||
- Duration after which a throttled hot shard expires and is removed from the throttle list | ||
- Prevents indefinite throttling if load decreases | ||
- Location: fdbclient/ServerKnobs.cpp:1000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
**`HOT_SHARD_THROTTLING_TRACKED`** (int64_t, default: `1`) | ||
- Maximum number of hot shards to track and throttle per storage server | ||
- Limits the size of the hot shard list to prevent excessive memory usage | ||
- Location: fdbclient/ServerKnobs.cpp:1001 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
**`HOT_SHARD_MONITOR_FREQUENCY`** (double, default: `5.0` seconds) | ||
- How often Ratekeeper queries storage servers for hot shard information | ||
- Lower values provide faster hot shard detection but increase RPC overhead | ||
- Location: fdbclient/ServerKnobs.cpp:1002 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
Hot shard documentation
[X]explains what a hot shard is
[X] - explains how to avoid a hot shard (e.g. structure data and/or access patterns in a certain way)
[ ] - explains the server management side options for fixing a hot shard, should it arise. Please emphasize that the options are rather bad / near non-existent and hot shards should be avoided by design for this reason
Code-Reviewer Section
The general pull request guidelines can be found here.
Please check each of the following things and check all boxes before accepting a PR.
For Release-Branches
If this PR is made against a release-branch, please also check the following:
release-branch
ormain
if this is the youngest branch)