- 
                Notifications
    You must be signed in to change notification settings 
- Fork 13.9k
          Avoid most allocations in Canonicalizer.
          #52342
        
          New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| Here are the NLL builds with a non-negligible speed-up:  | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can probably drop the leading underscore here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAICT, the code will not behave correctly if this prefill is removed -- perhaps we should update the comment to not say "hasten subsequent lookups"? i.e. this isn't an optimization but a requirement
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll change it to "facilitate"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might as well do an indices.reserve here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realize that I can use collect here instead.
c93eb64    to
    4fbdc01      
    Compare
  
    | New version addresses all the comment. | 
| Nice. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like a good change! Got a few questions.
        
          
                src/librustc/infer/canonical/mod.rs
              
                Outdated
          
        
      There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just use this everywhere...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because CanonicalVarValue derives Clone, Debug, PartialEq, Eq, Hash, RustcDecodable, and RustcEncodable. In contrast, SmallVec doesn't define any of those. Also, I don't know what the impact of possible copying of SmallVecs (which are quite large, in terms of the number of bytes they take up on the stack) in lots of other places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just return this? I guess it's more efficient this way...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Copying the SmallCanonicalVars reduces the size of the win by about 20--25%. I figure we need every saving we can get for NLL!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why replace and not *indices = ...? Seems simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True!
Extra allocations are a significant cost of NLL, and the most common
ones come from within `Canonicalizer`. In particular, `canonical_var()`
contains this code:
    indices
	.entry(kind)
	.or_insert_with(|| {
	    let cvar1 = variables.push(info);
	    let cvar2 = var_values.push(kind);
	    assert_eq!(cvar1, cvar2);
	    cvar1
	})
	.clone()
`variables` and `var_values` are `Vec`s. `indices` is a `HashMap` used
to track what elements have been inserted into `var_values`. If `kind`
hasn't been seen before, `indices`, `variables` and `var_values` all get
a new element. (The number of elements in each container is always the
same.) This results in lots of allocations.
In practice, most of the time these containers only end up holding a few
elements. This PR changes them to avoid heap allocations in the common
case, by changing the `Vec`s to `SmallVec`s and only using `indices`
once enough elements are present. (When the number of elements is small,
a direct linear search of `var_values` is as good or better than a
hashmap lookup.)
The changes to `variables` are straightforward and contained within
`Canonicalizer`. The changes to `indices` are more complex but also
contained within `Canonicalizer`. The changes to `var_values` are more
intrusive because they require defining a new type
`SmallCanonicalVarValues` -- which is to `CanonicalVarValues` as
`SmallVec` is to `Vec -- and passing stack-allocated values of that type
in from outside.
All this speeds up a number of NLL "check" builds, the best by 2%.
    4fbdc01    to
    7cc5277      
    Compare
  
    | Comments have been addressed. r? @nikomatsakis | 
| 📌 Commit 7cc5277 has been approved by  | 
Avoid most allocations in `Canonicalizer`.
Extra allocations are a significant cost of NLL, and the most common
ones come from within `Canonicalizer`. In particular, `canonical_var()`
contains this code:
    indices
	.entry(kind)
	.or_insert_with(|| {
	    let cvar1 = variables.push(info);
	    let cvar2 = var_values.push(kind);
	    assert_eq!(cvar1, cvar2);
	    cvar1
	})
	.clone()
`variables` and `var_values` are `Vec`s. `indices` is a `HashMap` used
to track what elements have been inserted into `var_values`. If `kind`
hasn't been seen before, `indices`, `variables` and `var_values` all get
a new element. (The number of elements in each container is always the
same.) This results in lots of allocations.
In practice, most of the time these containers only end up holding a few
elements. This PR changes them to avoid heap allocations in the common
case, by changing the `Vec`s to `SmallVec`s and only using `indices`
once enough elements are present. (When the number of elements is small,
a direct linear search of `var_values` is as good or better than a
hashmap lookup.)
The changes to `variables` are straightforward and contained within
`Canonicalizer`. The changes to `indices` are more complex but also
contained within `Canonicalizer`. The changes to `var_values` are more
intrusive because they require defining a new type
`SmallCanonicalVarValues` -- which is to `CanonicalVarValues` as
`SmallVec` is to `Vec -- and passing stack-allocated values of that type
in from outside.
All this speeds up a number of NLL "check" builds, the best by 2%.
r? @nikomatsakis
    | ☀️ Test successful - status-appveyor, status-travis | 
Extra allocations are a significant cost of NLL, and the most common
ones come from within
Canonicalizer. In particular,canonical_var()contains this code:
variablesandvar_valuesareVecs.indicesis aHashMapusedto track what elements have been inserted into
var_values. Ifkindhasn't been seen before,
indices,variablesandvar_valuesall geta new element. (The number of elements in each container is always the
same.) This results in lots of allocations.
In practice, most of the time these containers only end up holding a few
elements. This PR changes them to avoid heap allocations in the common
case, by changing the
Vecs toSmallVecs and only usingindicesonce enough elements are present. (When the number of elements is small,
a direct linear search of
var_valuesis as good or better than ahashmap lookup.)
The changes to
variablesare straightforward and contained withinCanonicalizer. The changes toindicesare more complex but alsocontained within
Canonicalizer. The changes tovar_valuesare moreintrusive because they require defining a new type
SmallCanonicalVarValues-- which is toCanonicalVarValuesasSmallVecis to `Vec -- and passing stack-allocated values of that typein from outside.
All this speeds up a number of NLL "check" builds, the best by 2%.
r? @nikomatsakis