Skip to content

Commit a353969

Browse files
committed
Auto merge of rust-lang#124756 - xFrednet:uu-ownership-analysis, r=<try>
[EXPERIMENT] Ownership analysis [DO NOT MERGE] TL;DR: I want to collect data about ownership in Rust's ecosystem. I modified Clippy to extract things from MIR. This PR is intended for a crater run, but should definitely not be merged. --- *Roses are red,* *Violets are blue,* *Can I borrow,* *A pattern from you?* So, what is this project all about? Well, last year I was looking for a topic for my master's thesis. I had the incredible luck that the programming languages group at my university is not only a collection of incredibly cool people but also has some interest in Rust. It just so happens that `@amandasystems` is part of the group. We then spent some time looking for a thesis project in the domain of Rust's memory ownership model, and finally settled on the idea to analyze how the memory ownership model is used in practice. For Polonius, it could be interesting to know how many bodies contain borrows and if the changes from the NLL RFC are actually used, etc. ### Implementation Disclaimer, this PR contains some of the most convoluted and spaghettified code I've ever written. Do not use this as an example for anything! I'm pretty sure that the insufficiencies of this PR single-handedly increases the compile time by 100%. It's honestly a miracle that the whole thing doesn't ICE on every function. So at the beginning, I decided to use MIR as that is the IR that the borrow check uses, and it contains all the borrows and moves which might not be explicit in the HIR. While all nice and good it turns out that MIR also gets writ of information which I would have wanted for my analysis. To be clear, MIR is an excellent tool for what it was designed to do, but its loss of information has been a struggle for this project specifically. I would like to collect way more metadata, more complex patterns, and data how named references are used, but the deadline for my thesis is coming up. We decided to limit the scope to named owned variables and see how the analysis goes. If I find some interesting things there is potential to extend this work. ### Run I decided to implement the analysis on Clippy for the shorter compile times and familiar codebase. This branch disables all other lints by default to safe on some runtime. The crater run with Clippy from master might also expose some ICEs which could be interesting for us. The results of this will be aggregated and published as part of my thesis. The time imposed limitations of this implementation probably mean that the results have to be taken with a metric ton of salt, but they should be sufficient to pass my thesis and know if more work in this area would be interesting. I don't know if there is much more too to say. Let's hope: * The CI passes * This run goes smoothly * Results in some interesting data * Teaches me how to implement the whole thing better the next time. --- r? `@ghost`
2 parents 872a856 + 4d7e11d commit a353969

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

76 files changed

+11428
-123
lines changed

Cargo.lock

+1
Original file line numberDiff line numberDiff line change
@@ -671,6 +671,7 @@ dependencies = [
671671
"semver",
672672
"serde",
673673
"serde_json",
674+
"smallvec",
674675
"tempfile",
675676
"toml 0.7.8",
676677
"unicode-normalization",

src/tools/clippy/.gitignore

+2
Original file line numberDiff line numberDiff line change
@@ -4,6 +4,8 @@ rustc-ice-*
44
# Used by CI to be able to push:
55
/.github/deploy_key
66
out
7+
output.txt
8+
rustc-ice*
79

810
# Compiled files
911
*.o

src/tools/clippy/CHANGELOG.md

+1
Original file line numberDiff line numberDiff line change
@@ -5171,6 +5171,7 @@ Released 2018-09-13
51715171
[`borrow_as_ptr`]: https://rust-lang.github.io/rust-clippy/master/index.html#borrow_as_ptr
51725172
[`borrow_deref_ref`]: https://rust-lang.github.io/rust-clippy/master/index.html#borrow_deref_ref
51735173
[`borrow_interior_mutable_const`]: https://rust-lang.github.io/rust-clippy/master/index.html#borrow_interior_mutable_const
5174+
[`borrow_pats`]: https://rust-lang.github.io/rust-clippy/master/index.html#borrow_pats
51745175
[`borrowed_box`]: https://rust-lang.github.io/rust-clippy/master/index.html#borrowed_box
51755176
[`box_collection`]: https://rust-lang.github.io/rust-clippy/master/index.html#box_collection
51765177
[`box_default`]: https://rust-lang.github.io/rust-clippy/master/index.html#box_default

src/tools/clippy/clippy_lints/Cargo.toml

+4-3
Original file line numberDiff line numberDiff line change
@@ -17,8 +17,8 @@ declare_clippy_lint = { path = "../declare_clippy_lint" }
1717
itertools = "0.12"
1818
quine-mc_cluskey = "0.2"
1919
regex-syntax = "0.8"
20-
serde = { version = "1.0", features = ["derive"] }
21-
serde_json = { version = "1.0", optional = true }
20+
serde = { version = "1.0", features = ["derive", "std"] }
21+
serde_json = { version = "1.0" }
2222
tempfile = { version = "3.3.0", optional = true }
2323
toml = "0.7.3"
2424
regex = { version = "1.5", optional = true }
@@ -27,14 +27,15 @@ unicode-script = { version = "0.5", default-features = false }
2727
semver = "1.0"
2828
rustc-semver = "1.1"
2929
url = "2.2"
30+
smallvec = { version = "1.8.1", features = ["union", "may_dangle"] }
3031

3132
[dev-dependencies]
3233
walkdir = "2.3"
3334

3435
[features]
3536
deny-warnings = ["clippy_config/deny-warnings", "clippy_utils/deny-warnings"]
3637
# build clippy with internal lints enabled, off by default
37-
internal = ["serde_json", "tempfile", "regex"]
38+
internal = ["tempfile", "regex"]
3839

3940
[package.metadata.rust-analyzer]
4041
# This crate uses #[feature(rustc_private)]

src/tools/clippy/clippy_lints/src/assigning_clones.rs

+4-5
Original file line numberDiff line numberDiff line change
@@ -2,9 +2,9 @@ use clippy_config::msrvs::{self, Msrv};
22
use clippy_utils::diagnostics::span_lint_and_then;
33
use clippy_utils::macros::HirNode;
44
use clippy_utils::sugg::Sugg;
5-
use clippy_utils::{is_trait_method, path_to_local};
5+
use clippy_utils::{is_trait_method, local_is_initialized, path_to_local};
66
use rustc_errors::Applicability;
7-
use rustc_hir::{self as hir, Expr, ExprKind, Node};
7+
use rustc_hir::{self as hir, Expr, ExprKind};
88
use rustc_lint::{LateContext, LateLintPass};
99
use rustc_middle::ty::{self, Instance, Mutability};
1010
use rustc_session::impl_lint_pass;
@@ -36,6 +36,7 @@ declare_clippy_lint! {
3636
/// Use instead:
3737
/// ```rust
3838
/// struct Thing;
39+
///
3940
/// impl Clone for Thing {
4041
/// fn clone(&self) -> Self { todo!() }
4142
/// fn clone_from(&mut self, other: &Self) { todo!() }
@@ -163,9 +164,7 @@ fn is_ok_to_suggest<'tcx>(cx: &LateContext<'tcx>, lhs: &Expr<'tcx>, call: &CallC
163164
// TODO: This check currently bails if the local variable has no initializer.
164165
// That is overly conservative - the lint should fire even if there was no initializer,
165166
// but the variable has been initialized before `lhs` was evaluated.
166-
if let Some(Node::LetStmt(local)) = cx.tcx.hir().parent_id_iter(local).next().map(|p| cx.tcx.hir_node(p))
167-
&& local.init.is_none()
168-
{
167+
if !local_is_initialized(cx, local) {
169168
return false;
170169
}
171170
}
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,294 @@
1+
//! This module analyzes the relationship between the function signature and
2+
//! the internal dataflow. Specifically, it checks for the following things:
3+
//!
4+
//! - Might an owned argument be returned
5+
//! - Are arguments stored in `&mut` loans
6+
//! - Are dependent loans returned
7+
//! - Might a returned loan be `'static`
8+
//! - Are all returned values const
9+
//! - Is the unit type returned
10+
//! - How often do `&mut self` refs need to be `&mut`
11+
//! - Are all the dependencies from the function signature used.
12+
13+
#![warn(unused)]
14+
15+
use super::prelude::*;
16+
use super::{calc_fn_arg_relations, has_mut_ref, visit_body, BodyStats};
17+
18+
use clippy_utils::ty::for_each_region;
19+
20+
mod pattern;
21+
pub use pattern::*;
22+
mod flow;
23+
use flow::DfWalker;
24+
use rustc_middle::ty::Region;
25+
26+
#[derive(Debug)]
27+
pub struct BodyAnalysis<'a, 'tcx> {
28+
info: &'a AnalysisInfo<'tcx>,
29+
pats: BTreeSet<BodyPat>,
30+
data_flow: IndexVec<Local, SmallVec<[MutInfo; 2]>>,
31+
stats: BodyStats,
32+
}
33+
34+
/// This indicates an assignment to `to`. In most cases, there is also a `from`.
35+
#[derive(Debug, Clone)]
36+
enum MutInfo {
37+
/// A different place was copied or moved into this one
38+
Place(Local),
39+
Const,
40+
Arg,
41+
Calc,
42+
/// This is typical for loans and function calls.
43+
Dep(Vec<Local>),
44+
/// A value was constructed from this data
45+
Ctor(Vec<Local>),
46+
/// This is not an assignment, but the notification that a mut borrow was created
47+
Loan(Local),
48+
MutRef(Local),
49+
}
50+
51+
impl<'a, 'tcx> BodyAnalysis<'a, 'tcx> {
52+
fn new(info: &'a AnalysisInfo<'tcx>, arg_ctn: usize) -> Self {
53+
let mut data_flow: IndexVec<Local, SmallVec<[MutInfo; 2]>> = IndexVec::default();
54+
data_flow.resize(info.locals.len(), SmallVec::default());
55+
56+
(0..arg_ctn).for_each(|idx| data_flow[Local::from_usize(idx + 1)].push(MutInfo::Arg));
57+
58+
let mut pats = BTreeSet::default();
59+
if arg_ctn == 0 {
60+
pats.insert(BodyPat::NoArguments);
61+
}
62+
63+
Self {
64+
info,
65+
pats,
66+
data_flow,
67+
stats: Default::default(),
68+
}
69+
}
70+
71+
pub fn run(
72+
info: &'a AnalysisInfo<'tcx>,
73+
def_id: LocalDefId,
74+
hir_sig: &rustc_hir::FnSig<'_>,
75+
context: BodyContext,
76+
) -> (BodyInfo, BTreeSet<BodyPat>) {
77+
let mut anly = Self::new(info, hir_sig.decl.inputs.len());
78+
79+
visit_body(&mut anly, info);
80+
anly.check_fn_relations(def_id);
81+
82+
let body_info = BodyInfo::from_sig(hir_sig, info, context);
83+
84+
anly.stats.arg_relation_possibly_missed_due_generics =
85+
info.stats.borrow().arg_relation_possibly_missed_due_generics;
86+
anly.stats.arg_relation_possibly_missed_due_to_late_bounds =
87+
info.stats.borrow().arg_relation_possibly_missed_due_to_late_bounds;
88+
info.stats.replace(anly.stats);
89+
(body_info, anly.pats)
90+
}
91+
92+
fn check_fn_relations(&mut self, def_id: LocalDefId) {
93+
let mut rels = calc_fn_arg_relations(self.info.cx.tcx, def_id);
94+
let return_rels = rels.remove(&RETURN_LOCAL).unwrap_or_default();
95+
96+
// Argument relations
97+
for (child, maybe_parents) in &rels {
98+
self.check_arg_relation(*child, maybe_parents);
99+
}
100+
101+
self.check_return_relations(&return_rels, def_id);
102+
}
103+
104+
fn check_return_relations(&mut self, sig_parents: &[Local], def_id: LocalDefId) {
105+
self.stats.return_relations_signature = sig_parents.len();
106+
107+
let arg_ctn = self.info.body.arg_count;
108+
let args: Vec<_> = (0..arg_ctn).map(|i| Local::from(i + 1)).collect();
109+
110+
let mut checker = DfWalker::new(self.info, &self.data_flow, RETURN_LOCAL, &args);
111+
checker.walk();
112+
113+
for arg in &args {
114+
if checker.found_connection(*arg) {
115+
// These two branches are mutually exclusive:
116+
if sig_parents.contains(arg) {
117+
self.stats.return_relations_found += 1;
118+
}
119+
// FIXME: It would be nice if we can say, if an argument was
120+
// returned, but it feels like all we can say is that there is an connection between
121+
// this and the other thing else if !self.info.body.local_decls[*
122+
// arg].ty.is_ref() { println!("Track owned argument returned");
123+
// }
124+
}
125+
}
126+
127+
// check for static returns
128+
let mut all_regions_static = true;
129+
let mut region_count = 0;
130+
let fn_sig = self.info.cx.tcx.fn_sig(def_id).instantiate_identity().skip_binder();
131+
for_each_region(fn_sig.output(), &mut |region: Region<'_>| {
132+
region_count += 1;
133+
all_regions_static &= region.is_static();
134+
});
135+
136+
// Check if there can be static returns
137+
if region_count >= 1 && !all_regions_static {
138+
let mut pending_return_df = std::mem::take(&mut self.data_flow[RETURN_LOCAL]);
139+
let mut checked_return_df = SmallVec::with_capacity(pending_return_df.len());
140+
// We check for every assignment, if it's constant and therefore static
141+
while let Some(return_df) = pending_return_df.pop() {
142+
self.data_flow[RETURN_LOCAL].push(return_df);
143+
144+
let mut checker = DfWalker::new(self.info, &self.data_flow, RETURN_LOCAL, &args);
145+
checker.walk();
146+
let all_const = checker.all_const();
147+
148+
checked_return_df.push(self.data_flow[RETURN_LOCAL].pop().unwrap());
149+
150+
if all_const {
151+
self.pats.insert(BodyPat::ReturnedStaticLoanForNonStatic);
152+
break;
153+
}
154+
}
155+
156+
checked_return_df.append(&mut pending_return_df);
157+
self.data_flow[RETURN_LOCAL] = checked_return_df;
158+
}
159+
}
160+
161+
fn check_arg_relation(&mut self, child: Local, maybe_parents: &[Local]) {
162+
let mut checker = DfWalker::new(self.info, &self.data_flow, child, maybe_parents);
163+
checker.walk();
164+
165+
self.stats.arg_relations_signature += maybe_parents.len();
166+
self.stats.arg_relations_found += checker.connection_count();
167+
}
168+
}
169+
170+
impl<'a, 'tcx> Visitor<'tcx> for BodyAnalysis<'a, 'tcx> {
171+
fn visit_assign(&mut self, target: &Place<'tcx>, rval: &Rvalue<'tcx>, _loc: mir::Location) {
172+
match rval {
173+
Rvalue::Ref(_reg, BorrowKind::Fake(_), _src) => {
174+
#[allow(clippy::needless_return)]
175+
return;
176+
},
177+
Rvalue::Ref(_reg, kind, src) => {
178+
self.stats.ref_stmt_ctn += 1;
179+
180+
let is_mut = matches!(kind, BorrowKind::Mut { .. });
181+
if is_mut {
182+
self.data_flow[src.local].push(MutInfo::MutRef(target.local));
183+
}
184+
if matches!(src.projection.as_slice(), [mir::PlaceElem::Deref]) {
185+
// &(*_1) => Copy
186+
self.data_flow[target.local].push(MutInfo::Place(src.local));
187+
return;
188+
}
189+
190+
// _1 = &_2 => simple loan
191+
self.data_flow[target.local].push(MutInfo::Loan(src.local));
192+
},
193+
Rvalue::Cast(_, op, _) | Rvalue::Use(op) => {
194+
let event = match &op {
195+
Operand::Constant(_) => MutInfo::Const,
196+
Operand::Copy(from) | Operand::Move(from) => MutInfo::Place(from.local),
197+
};
198+
self.data_flow[target.local].push(event);
199+
},
200+
Rvalue::CopyForDeref(from) => {
201+
self.data_flow[target.local].push(MutInfo::Place(from.local));
202+
},
203+
Rvalue::Repeat(op, _) => {
204+
let event = match &op {
205+
Operand::Constant(_) => MutInfo::Const,
206+
Operand::Copy(from) | Operand::Move(from) => MutInfo::Ctor(vec![from.local]),
207+
};
208+
self.data_flow[target.local].push(event);
209+
},
210+
// Constructed Values
211+
Rvalue::Aggregate(_, fields) => {
212+
let args = fields
213+
.iter()
214+
.filter_map(rustc_middle::mir::Operand::place)
215+
.map(|place| place.local)
216+
.collect();
217+
self.data_flow[target.local].push(MutInfo::Ctor(args));
218+
},
219+
// Casts should depend on the input data
220+
Rvalue::ThreadLocalRef(_)
221+
| Rvalue::NullaryOp(_, _)
222+
| Rvalue::AddressOf(_, _)
223+
| Rvalue::Discriminant(_)
224+
| Rvalue::ShallowInitBox(_, _)
225+
| Rvalue::Len(_)
226+
| Rvalue::BinaryOp(_, _)
227+
| Rvalue::UnaryOp(_, _)
228+
| Rvalue::CheckedBinaryOp(_, _) => {
229+
self.data_flow[target.local].push(MutInfo::Calc);
230+
},
231+
}
232+
}
233+
234+
fn visit_terminator(&mut self, term: &Terminator<'tcx>, loc: Location) {
235+
let TerminatorKind::Call {
236+
destination: dest,
237+
args,
238+
..
239+
} = &term.kind
240+
else {
241+
return;
242+
};
243+
244+
for arg in args {
245+
if let Some(place) = arg.node.place() {
246+
let ty = self.info.body.local_decls[place.local].ty;
247+
if has_mut_ref(ty) {
248+
self.data_flow[place.local].push(MutInfo::Calc);
249+
}
250+
}
251+
}
252+
253+
assert!(dest.just_local());
254+
self.data_flow[dest.local].push(MutInfo::Calc);
255+
256+
let rels = &self.info.terms[&loc.block];
257+
for (target, sources) in rels {
258+
self.data_flow[*target].push(MutInfo::Dep(sources.clone()));
259+
}
260+
}
261+
}
262+
263+
pub(crate) fn update_pats_from_stats(pats: &mut BTreeSet<BodyPat>, info: &AnalysisInfo<'_>) {
264+
let stats = info.stats.borrow();
265+
266+
if stats.ref_stmt_ctn > 0 {
267+
pats.insert(BodyPat::Borrow);
268+
}
269+
270+
if stats.owned.named_borrow_count > 0 {
271+
pats.insert(BodyPat::OwnedNamedBorrow);
272+
}
273+
if stats.owned.named_borrow_mut_count > 0 {
274+
pats.insert(BodyPat::OwnedNamedBorrowMut);
275+
}
276+
277+
if stats.owned.arg_borrow_count > 0 {
278+
pats.insert(BodyPat::OwnedArgBorrow);
279+
}
280+
if stats.owned.arg_borrow_mut_count > 0 {
281+
pats.insert(BodyPat::OwnedArgBorrowMut);
282+
}
283+
284+
if stats.owned.two_phased_borrows > 0 {
285+
pats.insert(BodyPat::OwnedTwoPhaseBorrow);
286+
}
287+
288+
if stats.owned.borrowed_for_closure_count > 0 {
289+
pats.insert(BodyPat::OwnedClosureBorrow);
290+
}
291+
if stats.owned.borrowed_mut_for_closure_count > 0 {
292+
pats.insert(BodyPat::OwnedClosureBorrowMut);
293+
}
294+
}

0 commit comments

Comments
 (0)