Commit d78a7fc

Rollup merge of rust-lang#39988 - arthurprs:hm-adapt2, r=alexcrichton
Simplify/fix adaptive hashmap

Please see rust-lang#38368 (comment) for context. The shift length math is broken.

It turns out that checking the shift length is complicated. Simulations show that a threshold of 2000 only gets the probability down to ~1e-7 when the hashmap load factor is 90% (Rust goes up to 90.9% as of today). That's probably not good enough to go into the stdlib with pluggable hashers.

So this PR simplifies the adaptive behavior to only consider displacement, which is much safer and very useful by itself.

There are two comments because one of them is already being tested to be merged by bors.
2 parents 33c1912 + 25b1488 commit d78a7fc
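
The commit message leans on simulations of probe behavior at a 90% load factor. The following is a minimal sketch of that kind of experiment, not the simulation the author ran: it fills a Robin Hood linear-probing table with pseudo-random home slots and reports the mean and maximum displacement. It measures displacement only (the quantity this commit keeps), not forward-shift length. The names `XorShift64` and `simulate`, the table layout, and the generator are all assumptions made for illustration.

```rust
/// Small xorshift64* generator so the sketch needs no external crates.
/// (Illustrative choice; the seed must be non-zero.)
struct XorShift64(u64);

impl XorShift64 {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x >> 12;
        x ^= x << 25;
        x ^= x >> 27;
        self.0 = x;
        x.wrapping_mul(0x2545_F491_4F6C_DD1D)
    }
}

/// Fills a table of `capacity` slots (a power of two) up to `load` using
/// Robin Hood linear probing and returns (mean displacement, max displacement).
fn simulate(capacity: usize, load: f64, seed: u64) -> (f64, usize) {
    assert!(capacity.is_power_of_two());
    assert!(load > 0.0 && load < 1.0);
    let mask = capacity - 1;
    let mut rng = XorShift64(seed);
    // Each occupied slot stores the home index of the element living there.
    let mut slots: Vec<Option<usize>> = vec![None; capacity];
    let n = (capacity as f64 * load) as usize;
    assert!(n > 0);

    for _ in 0..n {
        let mut home = ((rng.next() >> 32) as usize) & mask;
        let mut pos = home;
        let mut disp = 0usize; // displacement of the element currently in hand
        loop {
            let occupant = slots[pos];
            match occupant {
                None => {
                    slots[pos] = Some(home);
                    break;
                }
                Some(other_home) => {
                    let other_disp = pos.wrapping_sub(other_home) & mask;
                    if other_disp < disp {
                        // Robin Hood step: evict the less-displaced occupant
                        // and keep probing with it instead.
                        slots[pos] = Some(home);
                        home = other_home;
                        disp = other_disp;
                    }
                }
            }
            pos = (pos + 1) & mask;
            disp += 1;
        }
    }

    let mut total = 0usize;
    let mut max = 0usize;
    for (pos, slot) in slots.iter().enumerate() {
        if let Some(home) = slot {
            let d = pos.wrapping_sub(*home) & mask;
            total += d;
            max = max.max(d);
        }
    }
    (total as f64 / n as f64, max)
}
```

Calling `simulate(1 << 16, 0.9, 0x9E37_79B9_7F4A_7C15)` gives one sample at 90% load. A single run only shows typical behavior; estimating probabilities as small as those quoted in the commit message would need far more trials than this sketch performs.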

1 file changed: +11 -23 lines changed (src/libstd/collections/hash)

src/libstd/collections/hash/map.rs

@@ -182,46 +182,37 @@ impl DefaultResizePolicy {
 // ----------------------
 // To protect against degenerate performance scenarios (including DOS attacks),
 // the implementation includes an adaptive behavior that can resize the map
-// early (before its capacity is exceeded) when suspiciously long probe or
-// forward shifts sequences are encountered.
+// early (before its capacity is exceeded) when suspiciously long probe sequences
+// are encountered.
 //
 // With this algorithm in place it would be possible to turn a CPU attack into
 // a memory attack due to the aggressive resizing. To prevent that the
-// adaptive behavior only triggers when the map occupancy is half the maximum occupancy.
+// adaptive behavior only triggers when the map is at least half full.
 // This reduces the effectiveness of the algorithm but also makes it completely safe.
 //
 // The previous safety measure also prevents degenerate interactions with
 // really bad quality hash algorithms that can make normal inputs look like a
 // DOS attack.
 //
 const DISPLACEMENT_THRESHOLD: usize = 128;
-const FORWARD_SHIFT_THRESHOLD: usize = 512;
 //
-// The thresholds of 128 and 512 are chosen to minimize the chance of exceeding them.
+// The threshold of 128 is chosen to minimize the chance of exceeding it.
 // In particular, we want that chance to be less than 10^-8 with a load of 90%.
 // For displacement, the smallest constant that fits our needs is 90,
-// so we round that up to 128. For the number of forward-shifted buckets,
-// we choose k=512. Keep in mind that the run length is a sum of the displacement and
-// the number of forward-shifted buckets, so its threshold is 128+512=640.
-// Even though the probability of having a run length of more than 640 buckets may be
-// higher than the probability we want, it should be low enough.
+// so we round that up to 128.
 //
 // At a load factor of α, the odds of finding the target bucket after exactly n
 // unsuccessful probes[1] are
 //
 // Pr_α{displacement = n} =
 // (1 - α) / α * ∑_{k≥1} e^(-kα) * (kα)^(k+n) / (k + n)! * (1 - kα / (k + n + 1))
 //
-// We use this formula to find the probability of loading half of triggering the adaptive behavior
+// We use this formula to find the probability of triggering the adaptive behavior
 //
 // Pr_0.909{displacement > 128} = 1.601 * 10^-11
 //
-// FIXME: Extend with math for shift threshold in [2]
-//
 // 1. Alfredo Viola (2005). Distributional analysis of Robin Hood linear probing
 //    hashing with buckets.
-// 2. http://www.cs.tau.ac.il/~zwick/Adv-Alg-2015/Linear-Probing.pdf
-

 /// A hash map implementation which uses linear probing with Robin Hood bucket
 /// stealing.
@@ -494,7 +485,7 @@ fn robin_hood<'a, K: 'a, V: 'a>(bucket: FullBucketMut<'a, K, V>,
                                 mut hash: SafeHash,
                                 mut key: K,
                                 mut val: V)
-                                -> (usize, &'a mut V) {
+                                -> &'a mut V {
     let start_index = bucket.index();
     let size = bucket.table().size();
     // Save the *starting point*.
@@ -519,15 +510,14 @@ fn robin_hood<'a, K: 'a, V: 'a>(bucket: FullBucketMut<'a, K, V>,
                 Empty(bucket) => {
                     // Found a hole!
                     let bucket = bucket.put(hash, key, val);
-                    let end_index = bucket.index();
                     // Now that it's stolen, just read the value's pointer
                     // right out of the table! Go back to the *starting point*.
                     //
                     // This use of `into_table` is misleading. It turns the
                     // bucket, which is a FullBucket on top of a
                     // FullBucketMut, into just one FullBucketMut. The "table"
                     // refers to the inner FullBucketMut in this context.
-                    return (end_index - start_index, bucket.into_table().into_mut_refs().1);
+                    return bucket.into_table().into_mut_refs().1;
                 }
                 Full(bucket) => bucket,
             };
@@ -2128,18 +2118,16 @@ impl<'a, K: 'a, V: 'a> VacantEntry<'a, K, V> {
     pub fn insert(self, value: V) -> &'a mut V {
         match self.elem {
             NeqElem(bucket, disp) => {
-                let (shift, v_ref) = robin_hood(bucket, disp, self.hash, self.key, value);
-                if disp >= DISPLACEMENT_THRESHOLD || shift >= FORWARD_SHIFT_THRESHOLD {
+                if disp >= DISPLACEMENT_THRESHOLD {
                     *self.long_probes = true;
                 }
-                v_ref
+                robin_hood(bucket, disp, self.hash, self.key, value)
             },
             NoElem(bucket, disp) => {
                 if disp >= DISPLACEMENT_THRESHOLD {
                     *self.long_probes = true;
                 }
-                let bucket = bucket.put(self.hash, self.key, value);
-                bucket.into_mut_refs().1
+                bucket.put(self.hash, self.key, value).into_mut_refs().1
             },
         }
     }
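
After this change, the only signal that feeds the adaptive behavior is the displacement seen at insertion time, and the source comments state that an early resize is allowed only once the map is at least half full. A minimal sketch of that decision logic follows. Only `DISPLACEMENT_THRESHOLD` and the `long_probes` flag come from the diff; the `Policy` type, the `Growth` enum, and the `decide` function are hypothetical names for illustration and do not reflect how the standard library structures this check.

```rust
/// Value taken from the diff above.
const DISPLACEMENT_THRESHOLD: usize = 128;

/// Hypothetical outcome of a resize decision.
#[derive(Debug, PartialEq)]
enum Growth {
    Stay,   // no resize needed
    Normal, // usable capacity is exhausted
    Early,  // long probe seen and the map is at least half full
}

/// Hypothetical holder for the flag that `VacantEntry::insert` sets.
struct Policy {
    long_probes: bool,
}

impl Policy {
    /// Mirrors the check in the diff: flag the map when an insertion
    /// lands `DISPLACEMENT_THRESHOLD` or more slots from its home bucket.
    fn record_insert(&mut self, disp: usize) {
        if disp >= DISPLACEMENT_THRESHOLD {
            self.long_probes = true;
        }
    }

    /// Decides whether to grow, given the element count and the number of
    /// elements the table can hold before a regular resize (an assumed input).
    fn decide(&self, len: usize, usable_capacity: usize) -> Growth {
        if len >= usable_capacity {
            Growth::Normal
        } else if self.long_probes && len * 2 >= usable_capacity {
            // The "at least half full" guard keeps an attacker from turning
            // a CPU attack into a memory attack via repeated early resizes.
            Growth::Early
        } else {
            Growth::Stay
        }
    }
}
```

With the flag set, `decide(40, 100)` still returns `Growth::Stay` because the map is under half full, while `decide(50, 100)` returns `Growth::Early`.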

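
The displacement formula quoted in the source comments can be evaluated numerically by truncating the sum over k and working in log space to avoid overflow in the factorial. The sketch below does that under stated assumptions: the function names, the truncation limits `k_max` and `n_max`, and the log-factorial table are illustrative choices, and the code has not been checked against the 1.601 * 10^-11 figure given in the comments.

```rust
/// Table of ln(i!) for i in 0..=max, built by cumulative summation.
fn ln_factorials(max: usize) -> Vec<f64> {
    let mut table = vec![0.0; max + 1];
    for i in 1..=max {
        table[i] = table[i - 1] + (i as f64).ln();
    }
    table
}

/// Pr_alpha{displacement = n}, with the infinite sum over k truncated at
/// `k_max`. `ln_fact` must cover indices up to `k_max + n`.
fn displacement_pmf(alpha: f64, n: usize, k_max: usize, ln_fact: &[f64]) -> f64 {
    let mut sum = 0.0;
    for k in 1..=k_max {
        let ka = k as f64 * alpha;
        let m = k + n;
        // ln of e^(-k*alpha) * (k*alpha)^(k+n) / (k+n)!
        let ln_term = -ka + m as f64 * ka.ln() - ln_fact[m];
        sum += ln_term.exp() * (1.0 - ka / (m as f64 + 1.0));
    }
    (1.0 - alpha) / alpha * sum
}

/// Pr_alpha{displacement > threshold}, truncating the sum over n at `n_max`.
fn displacement_tail(alpha: f64, threshold: usize, n_max: usize, k_max: usize) -> f64 {
    let ln_fact = ln_factorials(n_max + k_max);
    ((threshold + 1)..=n_max)
        .map(|n| displacement_pmf(alpha, n, k_max, &ln_fact))
        .sum()
}
```

For example, `displacement_tail(0.909, 128, 400, 10_000)` estimates the probability of exceeding the threshold at the maximum load factor. The terms decay slowly in k when the load factor is close to 1, so `k_max` has to be in the thousands there; smaller load factors converge much faster.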