Avoid most allocations in `Canonicalizer`. #52342

nnethercote · 2018-07-13T13:06:09Z

Extra allocations are a significant cost of NLL, and the most common
ones come from within `Canonicalizer`. In particular, `canonical_var()`
contains this code:

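    indices
        .entry(kind)
        .or_insert_with(|| {
            // `push` on rustc's `IndexVec` returns the new element's index,
            // so both pushes must yield the same `CanonicalVar`.
            let cvar1 = variables.push(info);
            let cvar2 = var_values.push(kind);
            assert_eq!(cvar1, cvar2);
            cvar1
        })
        .clone()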

`variables` and `var_values` are `Vec`s. `indices` is a `HashMap` used
to track which elements have been inserted into `var_values`. If `kind`
hasn't been seen before, `indices`, `variables` and `var_values` all get
a new element. (The number of elements in each container is always the
same.) This results in lots of allocations.
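For reference, the pattern described above can be reproduced outside rustc with std's `Vec` and `HashMap`. This is a sketch under assumptions: `Kind` and `Info` are hypothetical stand-ins (here `u32` and `&'static str`), and because std's `Vec::push` returns `()`, the index is read from `len()` before each push to mimic `IndexVec::push`.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the rustc types; not the real definitions.
type Kind = u32;
type Info = &'static str;

struct Baseline {
    variables: Vec<Info>,
    var_values: Vec<Kind>,
    indices: HashMap<Kind, usize>,
}

impl Baseline {
    fn new() -> Self {
        Baseline { variables: Vec::new(), var_values: Vec::new(), indices: HashMap::new() }
    }

    // Return the index for `kind`, pushing onto both vectors the first time
    // `kind` is seen. Every new kind grows all three containers, and each of
    // them allocates on the heap as soon as it is non-empty.
    fn canonical_var(&mut self, info: Info, kind: Kind) -> usize {
        let variables = &mut self.variables;
        let var_values = &mut self.var_values;
        *self.indices.entry(kind).or_insert_with(|| {
            let cvar1 = variables.len();
            variables.push(info);
            let cvar2 = var_values.len();
            var_values.push(kind);
            assert_eq!(cvar1, cvar2);
            cvar1
        })
    }
}
```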

In practice, most of the time these containers only end up holding a few
elements. This PR changes them to avoid heap allocations in the common
case, by changing the `Vec`s to `SmallVec`s and only using `indices`
once enough elements are present. (When the number of elements is small,
a direct linear search of `var_values` is as good as or better than a
hashmap lookup.)
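A minimal sketch of the hybrid lookup, assuming std types only: plain `Vec`s stand in for `SmallVec`s (so this shows the lookup strategy, not the allocation savings), and a hypothetical `INLINE_CAP` of 8 plays the role of the point where an eight-element `SmallVec` spills to the heap. The PR itself checks `var_values.is_array()` rather than a length; `Hybrid` and its fields are illustrative names, not rustc's.

```rust
use std::collections::HashMap;
use std::hash::Hash;

// Hypothetical threshold standing in for a SmallVec's inline capacity.
const INLINE_CAP: usize = 8;

struct Hybrid<I, K> {
    variables: Vec<I>,
    var_values: Vec<K>,
    // Stays empty until `var_values` grows past INLINE_CAP.
    indices: HashMap<K, usize>,
}

impl<I, K: Copy + Eq + Hash> Hybrid<I, K> {
    fn new() -> Self {
        Hybrid { variables: Vec::new(), var_values: Vec::new(), indices: HashMap::new() }
    }

    fn canonical_var(&mut self, info: I, kind: K) -> usize {
        if self.var_values.len() <= INLINE_CAP {
            // Small case: a linear scan of `var_values` replaces the hash map.
            if let Some(i) = self.var_values.iter().position(|&k| k == kind) {
                return i;
            }
            let i = self.var_values.len();
            self.variables.push(info);
            self.var_values.push(kind);
            // Having just grown past the threshold, build `indices` from
            // everything pushed so far; later lookups rely on it.
            if self.var_values.len() > INLINE_CAP {
                assert!(self.indices.is_empty());
                self.indices = self.var_values.iter().enumerate().map(|(j, &k)| (k, j)).collect();
            }
            i
        } else {
            // Large case: the hash map is authoritative.
            let variables = &mut self.variables;
            let var_values = &mut self.var_values;
            *self.indices.entry(kind).or_insert_with(|| {
                variables.push(info);
                var_values.push(kind);
                var_values.len() - 1
            })
        }
    }
}
```

Small tables never touch the `HashMap` at all, which is where the savings in the common case come from.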

The changes to `variables` are straightforward and contained within
`Canonicalizer`. The changes to `indices` are more complex but also
contained within `Canonicalizer`. The changes to `var_values` are more
intrusive because they require defining a new type
`SmallCanonicalVarValues` -- which is to `CanonicalVarValues` as
`SmallVec` is to `Vec` -- and passing stack-allocated values of that type
in from outside.
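Passing the values "in from outside" is an out-parameter arrangement: the caller owns a buffer in its own stack frame and lends it by `&mut`, so the callee never moves a large inline array out through its return value. This sketch assumes a hypothetical fixed-capacity `InlineBuf` in place of `SmallVec` (it panics instead of spilling once 8 elements are pushed) and a toy `canonicalize` that maps each input value to the index of its first occurrence.

```rust
// Hypothetical stack-allocated buffer; unlike SmallVec it never spills
// to the heap and simply panics when a ninth element is pushed.
struct InlineBuf {
    items: [u32; 8],
    len: usize,
}

impl InlineBuf {
    fn new() -> Self {
        InlineBuf { items: [0; 8], len: 0 }
    }
    fn push(&mut self, x: u32) {
        self.items[self.len] = x; // out-of-bounds index panics when full
        self.len += 1;
    }
    fn as_slice(&self) -> &[u32] {
        &self.items[..self.len]
    }
}

// Hypothetical canonicalization step: each distinct value becomes the index
// of its first occurrence, and the originals are recorded in `var_values`.
// The caller supplies the storage, so nothing large is copied on return.
fn canonicalize<const N: usize>(input: [u32; N], var_values: &mut InlineBuf) -> [usize; N] {
    let mut out = [0usize; N];
    for (slot, &x) in out.iter_mut().zip(input.iter()) {
        *slot = match var_values.as_slice().iter().position(|&v| v == x) {
            Some(i) => i,
            None => {
                var_values.push(x);
                var_values.len - 1
            }
        };
    }
    out
}
```

A caller would write `let mut var_values = InlineBuf::new();` and then `canonicalize(value, &mut var_values)`, keeping the buffer alive in its own frame for as long as it needs the values.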

All this speeds up a number of NLL "check" builds, the best (coercions-check) by 2.4%.

r? @nikomatsakis

nnethercote · 2018-07-13T13:08:06Z

Here are the NLL builds with a non-negligible speed-up:

coercions-check
        avg: -2.4%      min: -2.4%      max: -2.4%
sentry-cli-check
        avg: -0.9%      min: -0.9%      max: -0.9%
regex-check
        avg: -0.9%      min: -0.9%      max: -0.9%
webrender-check
        avg: -0.8%      min: -0.8%      max: -0.8%
clap-rs-check
        avg: -0.8%      min: -0.8%      max: -0.8%
cargo-check
        avg: -0.8%      min: -0.8%      max: -0.8%
serde-check
        avg: -0.8%      min: -0.8%      max: -0.8%
ripgrep-check
        avg: -0.7%      min: -0.7%      max: -0.7%
style-servo-check
        avg: -0.7%?     min: -0.7%?     max: -0.7%?
regression-31157-check
        avg: -0.7%      min: -0.7%      max: -0.7%
tokio-webpush-simple-check
        avg: -0.6%      min: -0.6%      max: -0.6% 
encoding-check
        avg: -0.6%      min: -0.6%      max: -0.6%
syn-check
        avg: -0.5%      min: -0.5%      max: -0.5% 
futures-check
        avg: -0.5%      min: -0.5%      max: -0.5%
piston-image-check
        avg: -0.5%      min: -0.5%      max: -0.5%
crates.io-check
        avg: -0.4%      min: -0.4%      max: -0.4%

Mark-Simulacrum · 2018-07-13T13:31:37Z

src/librustc/infer/canonical/canonicalizer.rs

    where
        V: TypeFoldable<'tcx> + Lift<'gcx>,
    {
+        let mut _var_values = SmallVec::new();


Can probably drop the leading underscore here?

Mark-Simulacrum · 2018-07-13T13:33:34Z

src/librustc/infer/canonical/canonicalizer.rs

+                assert_eq!(variables.len(), var_values.len());
+
+                // If `var_values` has become big enough to be heap-allocated,
+                // fill up `indices` to hasten subsequent lookups.


AFAICT, the code will not behave correctly if this prefill is removed, so perhaps we should update the comment so it doesn't say "hasten subsequent lookups"? That is, this isn't an optimization but a requirement.

I'll change it to "facilitate"
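To see why the prefill is a requirement rather than an optimization, consider the lookup rule in isolation. This sketch assumes `u32` kinds and a hypothetical `INLINE_CAP` of 8: once `var_values` holds more than that, only `indices` is consulted, so any kind pushed before the spill is invisible unless `indices` was filled with it.

```rust
use std::collections::HashMap;

// Hypothetical inline capacity, matching the 8 in SmallVec<[Kind<'tcx>; 8]>.
const INLINE_CAP: usize = 8;

// While `var_values` is small it is scanned linearly; once it has grown
// past INLINE_CAP, only `indices` is consulted.
fn lookup(var_values: &[u32], indices: &HashMap<u32, usize>, kind: u32) -> Option<usize> {
    if var_values.len() <= INLINE_CAP {
        var_values.iter().position(|&k| k == kind)
    } else {
        indices.get(&kind).copied()
    }
}
```

If `lookup` misses, the caller pushes `kind` again, giving one kind two indices and breaking the invariant that each container holds exactly one entry per distinct kind.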

Mark-Simulacrum · 2018-07-13T13:34:00Z

src/librustc/infer/canonical/canonicalizer.rs

+                // If `var_values` has become big enough to be heap-allocated,
+                // fill up `indices` to hasten subsequent lookups.
+                if !var_values.is_array() {
+                    for (i, &kind) in var_values.iter().enumerate() {


Might as well do an indices.reserve here

I realize that I can use collect here instead.
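Filling `indices` with `collect` rather than a loop of `insert` calls might look like this sketch (`u32` is a hypothetical element type). In current std, collecting into an empty `HashMap` reserves capacity from the iterator's size hint, and an `enumerate` over a slice reports its exact length, so a separate `reserve` call is typically unnecessary.

```rust
use std::collections::HashMap;

// Map each kind to its position in `var_values` in a single pass.
fn build_indices(var_values: &[u32]) -> HashMap<u32, usize> {
    var_values
        .iter()
        .enumerate()
        .map(|(i, &kind)| (kind, i))
        .collect()
}
```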

nnethercote · 2018-07-16T01:15:10Z

New version addresses all the comments.

nikomatsakis · 2018-07-16T11:33:28Z

Nice.

nikomatsakis

Seems like a good change! Got a few questions.

nikomatsakis · 2018-07-16T12:08:42Z

src/librustc/infer/canonical/mod.rs

@@ -74,6 +75,10 @@ pub struct CanonicalVarValues<'tcx> {
    pub var_values: IndexVec<CanonicalVar, Kind<'tcx>>,
 }

+/// Like CanonicalVarValues, but for use in places where a SmallVec is
+/// appropriate.
+pub type SmallCanonicalVarValues<'tcx> = SmallVec<[Kind<'tcx>; 8]>;


Why not just use this everywhere...?

Because CanonicalVarValues derives Clone, Debug, PartialEq, Eq, Hash, RustcDecodable, and RustcEncodable. In contrast, SmallVec doesn't define any of those. Also, I don't know what the impact would be of possibly copying SmallVecs (which are quite large, in terms of the number of bytes they take up on the stack) in lots of other places.
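To get a feel for the size concern, this sketch compares a `Vec` with a rough model of an eight-element inline buffer. The model (`InlineModel`) and the pointer-sized `Kind` are assumptions; the real `SmallVec` layout differs in detail.

```rust
use std::mem::size_of;

// Hypothetical pointer-sized stand-in for `Kind<'tcx>`.
type Kind = usize;

// Rough model of an inline buffer with room for 8 elements plus a length.
#[allow(dead_code)]
struct InlineModel {
    items: [Kind; 8],
    len: usize,
}

// Returns (size of a Vec header, size of the inline model) in bytes.
fn sizes() -> (usize, usize) {
    (size_of::<Vec<Kind>>(), size_of::<InlineModel>())
}
```

On a typical 64-bit target the `Vec` header is three words while the model needs nine, and every by-value move of such a buffer copies all of it, whether or not the slots are in use.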

nikomatsakis · 2018-07-16T12:09:25Z

src/librustc/infer/canonical/canonicalizer.rs

@@ -295,7 +304,8 @@ impl<'cx, 'gcx, 'tcx> Canonicalizer<'cx, 'gcx, 'tcx> {
        infcx: Option<&'cx InferCtxt<'cx, 'gcx, 'tcx>>,
        tcx: TyCtxt<'cx, 'gcx, 'tcx>,
        canonicalize_region_mode: CanonicalizeRegionMode,
-    ) -> (Canonicalized<'gcx, V>, CanonicalVarValues<'tcx>)
+        var_values: &'cx mut SmallCanonicalVarValues<'tcx>
+    ) -> Canonicalized<'gcx, V>


Why not just return this? I guess it's more efficient this way...?

Yes. Returning the SmallCanonicalVarValues by value (and thus copying it) reduces the size of the win by about 20-25%. I figure we need every saving we can get for NLL!

nikomatsakis · 2018-07-16T12:10:28Z

src/librustc/infer/canonical/canonicalizer.rs

+                // fill up `indices` to facilitate subsequent lookups.
+                if !var_values.is_array() {
+                    assert!(indices.is_empty());
+                    ::std::mem::replace(


Why replace and not *indices = ...? Seems simpler.
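The two spellings the reviewer contrasts can be compared directly. In this sketch (hypothetical helper names, `u32` kinds), both leave `indices` holding the same map; the only difference is that `mem::replace` also hands back the old map, which in the PR's code would simply be discarded. Recent versions of std mark `mem::replace` as `#[must_use]`, with a note suggesting plain assignment when the old value is not needed.

```rust
use std::collections::HashMap;
use std::mem;

// `mem::replace` swaps in the new map and returns the previous one.
fn rebuild_with_replace(indices: &mut HashMap<u32, usize>, var_values: &[u32]) -> HashMap<u32, usize> {
    mem::replace(indices, var_values.iter().enumerate().map(|(i, &k)| (k, i)).collect())
}

// Plain assignment through the `&mut`: the old map is dropped in place.
fn rebuild_with_assign(indices: &mut HashMap<u32, usize>, var_values: &[u32]) {
    *indices = var_values.iter().enumerate().map(|(i, &k)| (k, i)).collect();
}
```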


nnethercote · 2018-07-17T03:49:06Z

Comments have been addressed. r? @nikomatsakis

nikomatsakis · 2018-07-17T20:08:48Z

@bors r+ -- seems good. we can revisit later, as I would like to do an even more aggressive optimization here anyway (#48417)

bors · 2018-07-17T20:08:49Z

📌 Commit 7cc5277 has been approved by nikomatsakis

bors · 2018-07-18T00:46:07Z

⌛ Testing commit 7cc5277 with merge f686885...


bors · 2018-07-18T03:05:22Z

☀️ Test successful - status-appveyor, status-travis
Approved by: nikomatsakis
Pushing f686885 to master...

rust-highfive assigned nikomatsakis Jul 13, 2018

rust-highfive added the S-waiting-on-review label Jul 13, 2018

Mark-Simulacrum reviewed Jul 13, 2018


nnethercote force-pushed the CanonicalVar branch from c93eb64 to 4fbdc01 July 16, 2018 01:14

nikomatsakis reviewed Jul 16, 2018


nnethercote force-pushed the CanonicalVar branch from 4fbdc01 to 7cc5277 July 17, 2018 03:48

bors added S-waiting-on-bors and removed S-waiting-on-review labels Jul 17, 2018

bors merged commit 7cc5277 into rust-lang:master Jul 18, 2018

nnethercote deleted the CanonicalVar branch July 18, 2018 04:07
