Skip to content

Avoid most allocations in Canonicalizer. #52342

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 18, 2018

Conversation

nnethercote
Copy link
Contributor

Extra allocations are a significant cost of NLL, and the most common
ones come from within Canonicalizer. In particular, canonical_var()
contains this code:

indices
.entry(kind)
.or_insert_with(|| {
    let cvar1 = variables.push(info);
    let cvar2 = var_values.push(kind);
    assert_eq!(cvar1, cvar2);
    cvar1
})
.clone()

variables and var_values are Vecs. indices is a HashMap used
to track what elements have been inserted into var_values. If kind
hasn't been seen before, indices, variables and var_values all get
a new element. (The number of elements in each container is always the
same.) This results in lots of allocations.

In practice, most of the time these containers only end up holding a few
elements. This PR changes them to avoid heap allocations in the common
case, by changing the Vecs to SmallVecs and only using indices
once enough elements are present. (When the number of elements is small,
a direct linear search of var_values is as good or better than a
hashmap lookup.)

The changes to variables are straightforward and contained within
Canonicalizer. The changes to indices are more complex but also
contained within Canonicalizer. The changes to var_values are more
intrusive because they require defining a new type
SmallCanonicalVarValues -- which is to CanonicalVarValues as
SmallVec is to `Vec -- and passing stack-allocated values of that type
in from outside.

All this speeds up a number of NLL "check" builds, the best by 2%.

r? @nikomatsakis

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 13, 2018
@nnethercote
Copy link
Contributor Author

Here are the NLL builds with a non-negligible speed-up:

coercions-check
        avg: -2.4%      min: -2.4%      max: -2.4%
sentry-cli-check
        avg: -0.9%      min: -0.9%      max: -0.9%
regex-check
        avg: -0.9%      min: -0.9%      max: -0.9%
webrender-check
        avg: -0.8%      min: -0.8%      max: -0.8%
clap-rs-check
        avg: -0.8%      min: -0.8%      max: -0.8%
cargo-check
        avg: -0.8%      min: -0.8%      max: -0.8%
serde-check
        avg: -0.8%      min: -0.8%      max: -0.8%
ripgrep-check
        avg: -0.7%      min: -0.7%      max: -0.7%
style-servo-check
        avg: -0.7%?     min: -0.7%?     max: -0.7%?
regression-31157-check
        avg: -0.7%      min: -0.7%      max: -0.7%
tokio-webpush-simple-check
        avg: -0.6%      min: -0.6%      max: -0.6% 
encoding-check
        avg: -0.6%      min: -0.6%      max: -0.6%
syn-check
        avg: -0.5%      min: -0.5%      max: -0.5% 
futures-check
        avg: -0.5%      min: -0.5%      max: -0.5%
piston-image-check
        avg: -0.5%      min: -0.5%      max: -0.5%
crates.io-check
        avg: -0.4%      min: -0.4%      max: -0.4%

where
V: TypeFoldable<'tcx> + Lift<'gcx>,
{
let mut _var_values = SmallVec::new();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can probably drop the leading underscore here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

assert_eq!(variables.len(), var_values.len());

// If `var_values` has become big enough to be heap-allocated,
// fill up `indices` to hasten subsequent lookups.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT, the code will not behave correctly if this prefill is removed -- perhaps we should update the comment to not say "hasten subsequent lookups"? i.e. this isn't an optimization but a requirement

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll change it to "facilitate"

// If `var_values` has become big enough to be heap-allocated,
// fill up `indices` to hasten subsequent lookups.
if !var_values.is_array() {
for (i, &kind) in var_values.iter().enumerate() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might as well do an indices.reserve here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize that I can use collect here instead.

@nnethercote
Copy link
Contributor Author

New version addresses all the comment.

@nikomatsakis
Copy link
Contributor

Nice.

Copy link
Contributor

@nikomatsakis nikomatsakis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a good change! Got a few questions.

@@ -74,6 +75,10 @@ pub struct CanonicalVarValues<'tcx> {
pub var_values: IndexVec<CanonicalVar, Kind<'tcx>>,
}

/// Like CanonicalVarValues, but for use in places where a SmallVec is
/// appropriate.
pub type SmallCanonicalVarValues<'tcx> = SmallVec<[Kind<'tcx>; 8]>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just use this everywhere...?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because CanonicalVarValue derives Clone, Debug, PartialEq, Eq, Hash, RustcDecodable, and RustcEncodable. In contrast, SmallVec doesn't define any of those. Also, I don't know what the impact of possible copying of SmallVecs (which are quite large, in terms of the number of bytes they take up on the stack) in lots of other places.

@@ -295,7 +304,8 @@ impl<'cx, 'gcx, 'tcx> Canonicalizer<'cx, 'gcx, 'tcx> {
infcx: Option<&'cx InferCtxt<'cx, 'gcx, 'tcx>>,
tcx: TyCtxt<'cx, 'gcx, 'tcx>,
canonicalize_region_mode: CanonicalizeRegionMode,
) -> (Canonicalized<'gcx, V>, CanonicalVarValues<'tcx>)
var_values: &'cx mut SmallCanonicalVarValues<'tcx>
) -> Canonicalized<'gcx, V>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just return this? I guess it's more efficient this way...?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Copying the SmallCanonicalVars reduces the size of the win by about 20--25%. I figure we need every saving we can get for NLL!

// fill up `indices` to facilitate subsequent lookups.
if !var_values.is_array() {
assert!(indices.is_empty());
::std::mem::replace(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why replace and not *indices = ...? Seems simpler.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True!

Extra allocations are a significant cost of NLL, and the most common
ones come from within `Canonicalizer`. In particular, `canonical_var()`
contains this code:

    indices
	.entry(kind)
	.or_insert_with(|| {
	    let cvar1 = variables.push(info);
	    let cvar2 = var_values.push(kind);
	    assert_eq!(cvar1, cvar2);
	    cvar1
	})
	.clone()

`variables` and `var_values` are `Vec`s. `indices` is a `HashMap` used
to track what elements have been inserted into `var_values`. If `kind`
hasn't been seen before, `indices`, `variables` and `var_values` all get
a new element. (The number of elements in each container is always the
same.) This results in lots of allocations.

In practice, most of the time these containers only end up holding a few
elements. This PR changes them to avoid heap allocations in the common
case, by changing the `Vec`s to `SmallVec`s and only using `indices`
once enough elements are present. (When the number of elements is small,
a direct linear search of `var_values` is as good or better than a
hashmap lookup.)

The changes to `variables` are straightforward and contained within
`Canonicalizer`. The changes to `indices` are more complex but also
contained within `Canonicalizer`. The changes to `var_values` are more
intrusive because they require defining a new type
`SmallCanonicalVarValues` -- which is to `CanonicalVarValues` as
`SmallVec` is to `Vec -- and passing stack-allocated values of that type
in from outside.

All this speeds up a number of NLL "check" builds, the best by 2%.
@nnethercote
Copy link
Contributor Author

Comments have been addressed. r? @nikomatsakis

@nikomatsakis
Copy link
Contributor

@bors r+ -- seems good. we can revisit later, as I would like to do an even more aggressive optimization here anyway (#48417)

@bors
Copy link
Collaborator

bors commented Jul 17, 2018

📌 Commit 7cc5277 has been approved by nikomatsakis

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 17, 2018
@bors
Copy link
Collaborator

bors commented Jul 18, 2018

⌛ Testing commit 7cc5277 with merge f686885...

bors added a commit that referenced this pull request Jul 18, 2018
Avoid most allocations in `Canonicalizer`.

Extra allocations are a significant cost of NLL, and the most common
ones come from within `Canonicalizer`. In particular, `canonical_var()`
contains this code:

    indices
	.entry(kind)
	.or_insert_with(|| {
	    let cvar1 = variables.push(info);
	    let cvar2 = var_values.push(kind);
	    assert_eq!(cvar1, cvar2);
	    cvar1
	})
	.clone()

`variables` and `var_values` are `Vec`s. `indices` is a `HashMap` used
to track what elements have been inserted into `var_values`. If `kind`
hasn't been seen before, `indices`, `variables` and `var_values` all get
a new element. (The number of elements in each container is always the
same.) This results in lots of allocations.

In practice, most of the time these containers only end up holding a few
elements. This PR changes them to avoid heap allocations in the common
case, by changing the `Vec`s to `SmallVec`s and only using `indices`
once enough elements are present. (When the number of elements is small,
a direct linear search of `var_values` is as good or better than a
hashmap lookup.)

The changes to `variables` are straightforward and contained within
`Canonicalizer`. The changes to `indices` are more complex but also
contained within `Canonicalizer`. The changes to `var_values` are more
intrusive because they require defining a new type
`SmallCanonicalVarValues` -- which is to `CanonicalVarValues` as
`SmallVec` is to `Vec -- and passing stack-allocated values of that type
in from outside.

All this speeds up a number of NLL "check" builds, the best by 2%.

r? @nikomatsakis
@bors
Copy link
Collaborator

bors commented Jul 18, 2018

☀️ Test successful - status-appveyor, status-travis
Approved by: nikomatsakis
Pushing f686885 to master...

@bors bors merged commit 7cc5277 into rust-lang:master Jul 18, 2018
@nnethercote nnethercote deleted the CanonicalVar branch July 18, 2018 04:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants