Avoid most allocations in `Canonicalizer`. #52342
Here are the NLL builds with a non-negligible speed-up:

    where
        V: TypeFoldable<'tcx> + Lift<'gcx>,
    {
        let mut _var_values = SmallVec::new();
Can probably drop the leading underscore here?
Ok.
    assert_eq!(variables.len(), var_values.len());

    // If `var_values` has become big enough to be heap-allocated,
    // fill up `indices` to hasten subsequent lookups.
AFAICT, the code will not behave correctly if this prefill is removed -- perhaps we should update the comment to not say "hasten subsequent lookups"? i.e. this isn't an optimization but a requirement
I'll change it to "facilitate"
    // If `var_values` has become big enough to be heap-allocated,
    // fill up `indices` to hasten subsequent lookups.
    if !var_values.is_array() {
        for (i, &kind) in var_values.iter().enumerate() {
Might as well do an `indices.reserve` here.
I realize that I can use `collect` here instead.
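The `collect`-based prefill mentioned in this thread could look roughly like the sketch below. It is not the PR's actual code: `build_indices` is a hypothetical helper, and `char` and `usize` are stand-ins for rustc's `Kind<'tcx>` and `CanonicalVar`.

```rust
use std::collections::HashMap;

/// Builds a value -> position map from a slice in a single pass.
/// `char` and `usize` are illustrative stand-ins for `Kind<'tcx>` and
/// `CanonicalVar`; the function name is hypothetical.
fn build_indices(var_values: &[char]) -> HashMap<char, usize> {
    var_values
        .iter()
        .enumerate()
        // Swap (position, value) into (key, value) order for the map.
        .map(|(i, &kind)| (kind, i))
        .collect()
}
```

Because the iterator reports its length through its size hint, `collect` can size the map up front, which is likely why a separate `reserve` call becomes unnecessary.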
c93eb64 to 4fbdc01
New version addresses all the comments.
Nice.
Seems like a good change! Got a few questions.
@@ -74,6 +75,10 @@ pub struct CanonicalVarValues<'tcx> {
     pub var_values: IndexVec<CanonicalVar, Kind<'tcx>>,
 }
+
+/// Like CanonicalVarValues, but for use in places where a SmallVec is
+/// appropriate.
+pub type SmallCanonicalVarValues<'tcx> = SmallVec<[Kind<'tcx>; 8]>;
Why not just use this everywhere...?
Because `CanonicalVarValues` derives `Clone`, `Debug`, `PartialEq`, `Eq`, `Hash`, `RustcDecodable`, and `RustcEncodable`. In contrast, `SmallVec` doesn't define any of those. Also, I don't know what the impact would be of possibly copying `SmallVec`s (which are quite large, in terms of the number of bytes they take up on the stack) in lots of other places.
@@ -295,7 +304,8 @@ impl<'cx, 'gcx, 'tcx> Canonicalizer<'cx, 'gcx, 'tcx> {
         infcx: Option<&'cx InferCtxt<'cx, 'gcx, 'tcx>>,
         tcx: TyCtxt<'cx, 'gcx, 'tcx>,
         canonicalize_region_mode: CanonicalizeRegionMode,
-    ) -> (Canonicalized<'gcx, V>, CanonicalVarValues<'tcx>)
+        var_values: &'cx mut SmallCanonicalVarValues<'tcx>
+    ) -> Canonicalized<'gcx, V>
Why not just return this? I guess it's more efficient this way...?
Yes. Copying the `SmallCanonicalVarValues` reduces the size of the win by about 20--25%. I figure we need every saving we can get for NLL!
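The trade-off in this exchange, filling caller-owned storage versus returning a large inline buffer by value, can be sketched as follows. Everything here is hypothetical: `InlineBuf` is a simplified stand-in for `SmallCanonicalVarValues`, and the function names are invented for illustration. Whether the by-value version actually incurs a copy depends on the optimizer, so this shows only the two API shapes, not a measurement.

```rust
/// Hypothetical fixed-capacity inline buffer, standing in for
/// `SmallCanonicalVarValues`. All of its storage lives inside the struct.
struct InlineBuf {
    len: usize,
    items: [u64; 8],
}

impl InlineBuf {
    fn new() -> Self {
        InlineBuf { len: 0, items: [0; 8] }
    }

    fn push(&mut self, value: u64) {
        assert!(self.len < self.items.len(), "inline capacity exceeded");
        self.items[self.len] = value;
        self.len += 1;
    }

    fn as_slice(&self) -> &[u64] {
        &self.items[..self.len]
    }
}

/// Out-parameter style: the caller owns the buffer and the callee fills it
/// in place, so only a pointer crosses the call boundary.
fn fill_squares(n: u64, out: &mut InlineBuf) -> u64 {
    let mut sum = 0;
    for i in 0..n {
        out.push(i * i);
        sum += i * i;
    }
    sum
}

/// Return-by-value style: the whole buffer travels back with the result,
/// which may require copying all of its bytes.
fn make_squares(n: u64) -> (u64, InlineBuf) {
    let mut buf = InlineBuf::new();
    let sum = fill_squares(n, &mut buf);
    (sum, buf)
}
```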
    // fill up `indices` to facilitate subsequent lookups.
    if !var_values.is_array() {
        assert!(indices.is_empty());
        ::std::mem::replace(
Why `replace` and not `*indices = ...`? Seems simpler.
True!
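For reference, the two forms under discussion have the same effect when the old value is not needed. A minimal sketch, with hypothetical function names and `HashMap<char, usize>` standing in for the real `indices` type:

```rust
use std::collections::HashMap;
use std::mem;

/// Overwrites `*indices` via `mem::replace`, discarding the returned old map.
fn set_with_replace(indices: &mut HashMap<char, usize>, fresh: HashMap<char, usize>) {
    // `mem::replace` hands back the previous value; it is dropped right away.
    drop(mem::replace(indices, fresh));
}

/// Overwrites `*indices` by plain assignment through the mutable reference.
fn set_with_assign(indices: &mut HashMap<char, usize>, fresh: HashMap<char, usize>) {
    // Assignment drops the old map and moves `fresh` into place.
    *indices = fresh;
}
```

`mem::replace` is mainly useful when the caller wants the old value back; when that value is thrown away, plain assignment states the intent more directly.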
Extra allocations are a significant cost of NLL, and the most common ones come from within `Canonicalizer`. In particular, `canonical_var()` contains this code:

    indices
        .entry(kind)
        .or_insert_with(|| {
            let cvar1 = variables.push(info);
            let cvar2 = var_values.push(kind);
            assert_eq!(cvar1, cvar2);
            cvar1
        })
        .clone()

`variables` and `var_values` are `Vec`s. `indices` is a `HashMap` used to track what elements have been inserted into `var_values`. If `kind` hasn't been seen before, `indices`, `variables` and `var_values` all get a new element. (The number of elements in each container is always the same.) This results in lots of allocations.

In practice, most of the time these containers only end up holding a few elements. This PR changes them to avoid heap allocations in the common case, by changing the `Vec`s to `SmallVec`s and only using `indices` once enough elements are present. (When the number of elements is small, a direct linear search of `var_values` is as good or better than a hashmap lookup.)

The changes to `variables` are straightforward and contained within `Canonicalizer`. The changes to `indices` are more complex but also contained within `Canonicalizer`. The changes to `var_values` are more intrusive because they require defining a new type `SmallCanonicalVarValues` -- which is to `CanonicalVarValues` as `SmallVec` is to `Vec` -- and passing stack-allocated values of that type in from outside.

All this speeds up a number of NLL "check" builds, the best by 2%.
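The lookup strategy described in this message can be sketched with standard-library types only. This is an illustration under stated assumptions, not the PR's implementation: `VarInterner` and `INLINE_LIMIT` are hypothetical names, `char` stands in for `Kind<'tcx>`, and a plain `Vec` stands in for `SmallVec`. A `Vec` still heap-allocates, so the sketch models only the switch from linear search to hash lookup, not the allocation savings.

```rust
use std::collections::HashMap;

/// Element count handled by linear search before switching to the map.
/// 8 mirrors the inline capacity of `SmallCanonicalVarValues`; using it as
/// the switch-over point is an assumption of this sketch.
const INLINE_LIMIT: usize = 8;

/// Hypothetical interner mapping each distinct value to a stable index.
struct VarInterner {
    var_values: Vec<char>,
    /// Stays empty until `var_values` grows past `INLINE_LIMIT`.
    indices: HashMap<char, usize>,
}

impl VarInterner {
    fn new() -> Self {
        VarInterner { var_values: Vec::new(), indices: HashMap::new() }
    }

    /// Returns the index of `kind`, inserting it if it has not been seen.
    fn canonical_var(&mut self, kind: char) -> usize {
        if self.var_values.len() <= INLINE_LIMIT {
            // Small case: linear search, `indices` is not consulted.
            if let Some(i) = self.var_values.iter().position(|&k| k == kind) {
                return i;
            }
            self.var_values.push(kind);
            let idx = self.var_values.len() - 1;
            if self.var_values.len() > INLINE_LIMIT {
                // Just crossed the limit. Later lookups go through the map
                // only, so it must be prefilled with every existing element;
                // this is required for correctness, not just for speed.
                assert!(self.indices.is_empty());
                self.indices = self
                    .var_values
                    .iter()
                    .enumerate()
                    .map(|(i, &k)| (k, i))
                    .collect();
            }
            idx
        } else {
            // Large case: hash lookup, pushing only for unseen values.
            let var_values = &mut self.var_values;
            *self.indices.entry(kind).or_insert_with(|| {
                var_values.push(kind);
                var_values.len() - 1
            })
        }
    }
}
```

Interning eight distinct values leaves `indices` empty. The ninth distinct value triggers the prefill, after which repeated values resolve through the map without growing `var_values`.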
4fbdc01 to 7cc5277
Comments have been addressed. r? @nikomatsakis
📌 Commit 7cc5277 has been approved by |
☀️ Test successful - status-appveyor, status-travis