-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Rust: Rework call resolution and type inference for calls #20282
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Rust: Rework call resolution and type inference for calls #20282
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.
e4cfb86
to
4a8c37c
Compare
4a8c37c
to
e75d79e
Compare
61866bf
to
2d1ed65
Compare
2d1ed65
to
3d19a06
Compare
3d19a06
to
153c10b
Compare
153c10b
to
e161d4c
Compare
dd45f7b
to
a20c440
Compare
f45d2d5
to
f9f8782
Compare
0c1d6df
to
41275ca
Compare
| main.rs:212:24:212:33 | source(...) | main.rs:1:1:3:1 | fn source | | ||
| main.rs:214:5:214:11 | sink(...) | main.rs:5:1:7:1 | fn sink | | ||
| main.rs:228:10:228:14 | * ... | main.rs:235:5:237:5 | fn deref | | ||
| main.rs:236:11:236:15 | * ... | main.rs:235:5:237:5 | fn deref | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These targets were actually wrong, so it is correct that they are now removed.
b3ddde3
to
63744d1
Compare
// `app` uses inconsistent type parameter instantiations | ||
exists(TypeParameter tp | | ||
potentialInstantiationOf(app, abs, tm) and | ||
app.getTypeAt(getNthTypeParameterPath(tm, tp, _)) != | ||
app.getTypeAt(getNthTypeParameterPath(tm, tp, _)) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check has been removed, because it would sometimes apply to entities that were assigned multiple copies of the same type, existing in different versions of a library.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds great, and I agree the DCA and QA results look fantastic overall. As does the reduction of inconsistencies in tests. Thank you for the explanation, thorough testing and explaining the impact here so there won't be any surprises after this is merged - and keeping track of some of the regressions that are part of this net improvement. 👍
I haven't looked at the code changes yet. @paldepind is probably the better reviewer for this, but I'll have a look as well.
88c9f0f
to
2ad8e2b
Compare
Rebased to resolve merge conflicts in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked through the code, I don't have much to say there. It's a complicated area of code and thus important it stays well documented.
I also tried running a data flow query on one of the projects that slowed down the most on DCA - it does seem like something may be going wrong in the type inference recursion in some cases. Given that there's a net performance improvement on DCA (and QA) I don't think this needs to block merging the PR, but it might be worth looking into afterwards.
Already on it :-) |
6e54fa0
to
8a25e32
Compare
8a25e32
to
89da13c
Compare
89da13c
to
b6bbffd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Submitting the comments that I have thus far, in case you want to look at some of this before I'm completely done.
private newtype TNode = | ||
TTrait(Trait t) { relevantTraitVisible(_, t) } or | ||
TItemNode(ItemNode i) or | ||
TElement(Element e) { relevantTraitVisible(e, _) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would performance be worse if instead of a newtype we just used AstNode
(or maybe Locatable
) for doublyBoundedFastTC
? One could hope that it's performance only depends on the sizes of the predicates and not the type that they're on?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to use a newtype
to distinguish source and sink nodes from all the intermediate nodes, because edges out of sources and into sinks are different from all other edges.
* [1]: https://doc.rust-lang.org/stable/reference/items/associated-items.html#r-items.associated.fn.method.self-ty | ||
*/ | ||
pragma[nomagic] | ||
predicate complexSelfRoot(Type root, TypeParameter tp) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This predicate does two orthogonal things. 1/ Picking the types that can appear for self
and 2/ getting the first positional type parameter of a type. I think it would be worthwhile to have to former as a separate predicate called something like validSelfType
?
s instanceof BoxStruct | ||
or | ||
s instanceof RcStruct | ||
or | ||
s instanceof ArcStruct | ||
or | ||
s instanceof PinStruct |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s instanceof BoxStruct | |
or | |
s instanceof RcStruct | |
or | |
s instanceof ArcStruct | |
or | |
s instanceof PinStruct | |
s instanceof BoxStruct or | |
s instanceof RcStruct or | |
s instanceof ArcStruct or | |
s instanceof PinStruct |
class FunctionTypePosition extends TFunctionTypePosition { | ||
predicate isSelf() { this.asArgumentPosition().isSelf() } | ||
|
||
int asPositional() { result = this.asArgumentPosition().asPosition() } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it be nicer to reuse the same name? It also seems more natural to say that the predicate returns a "position" rather than a "positional".
int asPositional() { result = this.asArgumentPosition().asPosition() } | |
int asPosition() { result = this.asArgumentPosition().asPosition() } |
private predicate hasTypeParameterAt(TypePath path, TypeParameter tp) { | ||
this.getDeclaredTypeAt(path) = tp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be natural as a predicate with result:
private predicate hasTypeParameterAt(TypePath path, TypeParameter tp) { | |
this.getDeclaredTypeAt(path) = tp | |
private TypeParameter getTypeParameterAt(TypePath path) { | |
result = this.getDeclaredTypeAt(path) |
* `i` is `type`. | ||
*/ | ||
pragma[nomagic] | ||
predicate assocFunctionTypeAtPath( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
predicate assocFunctionTypeAtPath( | |
predicate assocFunctionTypeAt( |
private predicate assocFunctionInfo( | ||
Function f, string name, int arity, ImplOrTraitItemNode i, FunctionTypePosition pos, | ||
AssocFunctionType t | ||
) { | ||
f = i.getASuccessor(name) and | ||
arity = f.getParamList().getNumberOfParams() and | ||
t.appliesTo(f, pos, i) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that (f, i, pos)
uniquely determines and is uniquely determined by t
, we're including the same data twice. Could we remove one of them like this?
private predicate assocFunctionInfo( | |
Function f, string name, int arity, ImplOrTraitItemNode i, FunctionTypePosition pos, | |
AssocFunctionType t | |
) { | |
f = i.getASuccessor(name) and | |
arity = f.getParamList().getNumberOfParams() and | |
t.appliesTo(f, pos, i) | |
} | |
private predicate assocFunctionInfo(AssocFunctionType t, string name, int arity) { | |
t.getFunction() = t.getImpl().getASuccessor(name) and | |
arity = t.getFunction().getParamList().getNumberOfParams() | |
} |
Same question for several predicates below, like methodInfo
, methodCallNonBlanketCandidate
, etc.
abstract class MethodCall extends Expr { | ||
abstract predicate hasNameAndArity(string name, int arity); | ||
|
||
override TypeParameter getTypeParameter(TypeParameterPosition ppos) { | ||
typeParamMatchPosition(this.getGenericParamList().getATypeParam(), result, ppos) | ||
} | ||
abstract Expr getArgument(ArgumentPosition pos); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class duplicates a bunch of stuff already present in Call
. For instance, the getArgument
overrides below are identical to the getArgument
implementations in Call
.
I think we should be able to extend Call
, remove hasNameAndArity
(which duplicates getMethodName
, and remove getNumberOfArguments
) and getArgument
(which duplicates getArgument
).
abstract class MethodCall extends Expr { | |
abstract predicate hasNameAndArity(string name, int arity); | |
override TypeParameter getTypeParameter(TypeParameterPosition ppos) { | |
typeParamMatchPosition(this.getGenericParamList().getATypeParam(), result, ppos) | |
} | |
abstract Expr getArgument(ArgumentPosition pos); | |
abstract class MethodCall extends Call { | |
MethodCall() { exists(super.getMethodName()) } |
The suggestion doesn't work as-is, because there is a difference in which CallExpr
s are considered method calls. For instance, this call is considered a method call by MethodCall
but not by Call
. I suppose we should just change Call
to behave like MethodCall
is currently doing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to ultimately get rid of the Call
class, which is why I duplicated some of the logic here.
* [1]: https://doc.rust-lang.org/reference/expressions/method-call-expr.html#r-expr.method.candidate-receivers | ||
*/ | ||
pragma[nomagic] | ||
Type getACandidateReceiverTypeAt(TypePath path, string derefChainBorrow) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that there can only be a single borrow at the end, could we encode this with a separate boolean instead of in the string?
Type getACandidateReceiverTypeAt(TypePath path, string derefChainBorrow) { | |
Type getACandidateReceiverTypeAt(TypePath path, string derefChain, boolean borrow) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will, at some point, need to distinguish borrows from mut
borrows, at which point a boolean
does not suffice; but perhaps that is OK for now.
exists(FunctionTypePosition pos | | ||
assocFunctionInfo(m, name, arity, i, pos, selfType) and | ||
strippedType = selfType.getTypeAt(strippedTypePath) and | ||
isComplexRootStripped(strippedTypePath, strippedType) and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this work if we are in fact adding a method to one of the types that can appear for self, like Box
or Pin
?
Overview
This PR rewrites how we do call resolution and type inference for calls, to make it more faithful to what actually happens in the compiler.
Impact
The changes to expected test output shows that this PR resolves many shortcomings, as well as removes a lot of inconsistencies.
DCA is excellent: On some projects we achieve a whopping ~90 % reduction in analysis time, which follows the decrease in
Nodes With Type At Length Limit
for those projects. Oncoreutils
andrendiation
(incorrectly tagged asradiance
in the run), however, we see increases in both analysis time andNodes With Type At Length Limit
.I also did a QA run, which confirms the overall reduction in analysis time:
Top 50 largest absolute deltas
The QA run also showed that we have resolved analysis timeouts/failures for 29 projects:
Projects timeout/failure before
williamlion218/rust-sgx-sdk
zkMIPS/zkMIPS
zama-ai/tfhe-rs
veloren/veloren
kentakom1213/kyopro
TimTheBig/geo-3d
Univa/rumcake
10XGenomics/cellranger
ricosjp/truck
okaponta/atcoder-rust
jblindsay/whitebox-tools
futureversecom/trn-seed
mycroft/challenges
galacticcouncil/hydration-node
ChristopherBiscardi/advent-of-code
gasp-xyz/gasp-monorepo
dimforge/nalgebra
Apollo-Lab-Yale/apollo-rust
awsdocs/aws-doc-sdk-examples
rickyota/genoboost
SparkyPotato/radiance
strawlab/strand-braid
10XGenomics/spaceranger
sarah-quinones/faer-rs
hackmad/pbrt-v3-rs
attack68/rateslib
wingrew/thcore
Gleb-Zaslavsky/RustedSciThe
feos-org/feos
However, timeouts/failures have been introduced for 7 new projects
Projects timeout/failure after
golemfactory/yagna
carthage-software/mago
MaterializeInc/materialize
stencila/stencila
typedb/typedb
Feodor2/Mypal68
mattwparas/steel
In summary, both DCA and QA indicate overall performans wins, which is not necessarily expected (and certainly not the case for many earlier iterations of this PR), as this PR extends on the kinds of calls we are able to resolve.
For the reviewer
Note for review: As usual, commit-by-commit review is encouraged. As for the changes to
TypeInference.qll
, I very much recommend using split diff view.Method call resolution
According to the spec, when resolving a method call
x.m()
:Before this PR, we handled the above in a very ad hoc way, where we did attempt to model implicit dereferencing and borrowing, but we did not model the construction of candidate receiver types and prioritized lookup order. In particular, if
x
had type&Foo
, we would only lookup the method inFoo
.With this PR, we model prioritized method lookup in the list of candidate receiver types in the module
MethodResolution
, but instead of constructing the full list of candidate receiver types, we recursively compute a set of candidates, only adding a new candidate receiver type to the set when we can rule out that the method cannot be found for the current candidate:Care must be taken to ensure that the
not current_candidate matches method
check is monotonic, which we achieve using the monotonicisNotInstantiationOf
predicate from the shared type inference library.Method lookup
For a given candidate receiver type
C
, we need to match that type against the type of theself
parameters of all potential call targets, taking into account thatself
parameters can have both explicit types and use shorthand syntax. Further care must be taken for methods that are inherited (either a trait method with a default implementation inherited by animpl
block or a trait method inherited by a sub trait), so it only makes sense to talk about the type of aself
parameter in the context of a givenimpl
block or trait where that method is available (either directly or inherited). We model this using the classAssocFunctionType
in the newly introducedFunctionType.qll
library.As before this PR, we use the
IsInstantiationOf
library for matchingC
against a givenAssocFunctionType
typeS
, now distinguishing between the following three cases:C
that matches the blanket type parameter also satisfies the blanket constraint. This means that blanket implementations are now also taken into account in the context of auto-dereferencing/borrowing.S
represents the type of aself
parameter for a method in a trait: In caseC
is e.g.dyn Trait
, then we wantC
to matchS
, but only if the traits match up as well. We achieve this by substituting in the trait in bothS
andC
before performing theIsInstantiationOf
check.IsInstantiationOf
check.Method call type inference
When we have identified a valid call target for
x.m()
with a given candidate receiver typeC
, we need to use that type as well when doing type inference. Before this PR, we used theMatching
module from the shared type inference library, but now we instead useMatchingWithEnvironment
, where we recordC
in the environment via the sequence of auto-dereferences and borrows that happened to obtainC
. This means we replace the ad hoc handling mentioned earlier, because we now have explicit knowledge about dereferencing/borrowing. The implementation is in the newMethodCallMatchingInput
module.Non-method call resolution
Resolution of non-method calls is much easier, since there is no such thing as auto-dereferencing and borrowing, even if the target is a method (
Foo::m(&x)
vsx.m()
). However, as for method call resolution, we still need to take three cases into account:IsInstantiationOf
check.IsInstantiationOf
check, using an argument or the call context, when it provides information about the return type.The implementation is in the module
NonMethodResolution
for calls that target non-methods, and in theMethodResolution
module for calls that target methods.Non-method call type inference
When the call is an operator call, we need to take into account that implicit borrowing may happen. For example,
x == y
is syntactic sugar forPartialEq::eq(&x, &y)
, so in order for the types to properly match up, we adjust the types of the operator, by stripping away the&
s. This is done in the newOperationMatchingInput
module.When the call is not an operator call, we can match types directly, which happens in the
NonMethodCallMatchingInput
module for calls that target non-methods, and in theMethodCallMatchingInput
module for calls that target methods.Future work
Deref
trait when performing auto-dereferencing and unsized coercions. With this PR, however, it should be much easier to support that.C_i
, we will still lookup inC_(i+1)
as well. Supporting this is not straightforward, since we need a monotonic way of checking blanket constraint non-satisfaction.