Skip to content

Commit cb0d39b

Browse files
authored
Merge pull request #972 from rylev/meaningful-not-signficiant
Use the term 'relevant' instead of 'significant' in try run results
2 parents e9ebec3 + bdcfe3e commit cb0d39b

File tree

3 files changed

+61
-34
lines changed

3 files changed

+61
-34
lines changed

docs/glossary.md

+1
Original file line numberDiff line numberDiff line change
@@ -36,6 +36,7 @@ The following is a glossary of domain specific terminology. Although benchmarks
3636
* **test result delta**: the difference between two test results for the same metric and test case.
3737
* **significant test result delta**: a test result delta that is above some threshold that we have determined to be an actual change in performance and not noise.
3838
* **noisy test case**: a test case for which the non-significant test result deltas are on average above some threshold. Non-noisy data tends to be very close to zero, and so a test case that produces non-significant test result deltas that are not close enough to zero on average are classified as noisy.
39+
* **relevant test result delta**: a synonym for *significant test result delta* in situations where the term "significant" might be ambiguous and readers may potentially interpret *significant* as "large" or "statistically significant". For example, in try run results, we use the term relevant instead of significant.
3940
* **highly variable test case**: a test case that frequently produces (over some threshold) significant test result deltas. This differs from sensitive benchmarks in that the deltas aren't necessarily large, they just happen more frequently than one would expect, and it is unclear why the significant deltas are happening. Note: highly variable test cases can be classified as noisy. They are different from classically noisy benchmarks in that they often produce deltas that are above the significance threshold. We cannot therefore be 100% if they are variable because they are noisy or because parts of the compiler being exercised by these benchmarks are just being changed very often.
4041
* **highly sensitive benchmark**: a test case that tends to produce large (over some threshold) significant test result deltas. This differs from highly variable test cases in that it is usually clear what type of changes tend to result in significant test result deltas. For example, some of the stress tests are highly sensitive; they produce significant test result deltas when parts of the compiler that they are stressing change and the percentage change is typically quite large.
4142

site/src/comparison.rs

+42-18
Original file line numberDiff line numberDiff line change
@@ -141,7 +141,7 @@ async fn populate_report(
141141
}
142142

143143
pub struct ComparisonSummary {
144-
/// Significant comparisons ordered by magnitude
144+
/// Significant comparisons ordered by magnitude from largest to smallest
145145
comparisons: Vec<TestResultComparison>,
146146
}
147147

@@ -168,8 +168,15 @@ impl ComparisonSummary {
168168
Some(ComparisonSummary { comparisons })
169169
}
170170

171+
/// Gets the overall direction and magnitude of the changes
172+
///
173+
/// Returns `None` if there are no relevant changes.
174+
pub fn direction_and_magnitude(&self) -> Option<(Direction, Magnitude)> {
175+
self.direction().zip(self.magnitude())
176+
}
177+
171178
/// The direction of the changes
172-
pub fn direction(&self) -> Option<Direction> {
179+
fn direction(&self) -> Option<Direction> {
173180
if self.comparisons.len() == 0 {
174181
return None;
175182
}
@@ -225,6 +232,16 @@ impl ComparisonSummary {
225232
}
226233
}
227234

235+
/// Get the largest magnitude of any change in the comparison.
236+
///
237+
/// Returns `None` if there are no relevant_changes
238+
fn magnitude(&self) -> Option<Magnitude> {
239+
[self.largest_improvement(), self.largest_regression()]
240+
.iter()
241+
.filter_map(|c| c.map(|c| c.magnitude()))
242+
.max_by(|c1, c2| c1.cmp(c2))
243+
}
244+
228245
pub fn relevant_changes<'a>(&'a self) -> [Option<&TestResultComparison>; 2] {
229246
match self.direction() {
230247
Some(Direction::Improvement) => [self.largest_improvement(), None],
@@ -234,15 +251,12 @@ impl ComparisonSummary {
234251
}
235252
}
236253

237-
pub fn largest_improvement(&self) -> Option<&TestResultComparison> {
238-
self.comparisons
239-
.iter()
240-
.filter(|s| !s.is_regression())
241-
.next()
254+
fn largest_improvement(&self) -> Option<&TestResultComparison> {
255+
self.comparisons.iter().find(|s| s.is_improvement())
242256
}
243257

244-
pub fn largest_regression(&self) -> Option<&TestResultComparison> {
245-
self.comparisons.iter().filter(|s| s.is_regression()).next()
258+
fn largest_regression(&self) -> Option<&TestResultComparison> {
259+
self.comparisons.iter().find(|s| s.is_regression())
246260
}
247261

248262
pub fn confidence(&self) -> ComparisonConfidence {
@@ -744,6 +758,10 @@ impl TestResultComparison {
744758
b > a
745759
}
746760

761+
fn is_improvement(&self) -> bool {
762+
!self.is_regression()
763+
}
764+
747765
fn is_significant(&self) -> bool {
748766
self.relative_change().abs() > self.signifcance_threshold()
749767
}
@@ -798,7 +816,7 @@ impl TestResultComparison {
798816
write!(
799817
summary,
800818
"{} {} in {}",
801-
magnitude,
819+
magnitude.display_as_title(),
802820
self.direction(),
803821
match link {
804822
Some(l) => format!("[instruction counts]({})", l),
@@ -853,8 +871,8 @@ impl std::fmt::Display for Direction {
853871
}
854872

855873
/// The relative size of a performance change
856-
#[derive(Debug)]
857-
enum Magnitude {
874+
#[derive(Debug, PartialOrd, PartialEq, Ord, Eq)]
875+
pub enum Magnitude {
858876
Small,
859877
Medium,
860878
Large,
@@ -865,17 +883,23 @@ impl Magnitude {
865883
fn is_medium_or_above(&self) -> bool {
866884
!matches!(self, Self::Small)
867885
}
868-
}
869886

870-
impl std::fmt::Display for Magnitude {
871-
fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
872-
let s = match self {
887+
pub fn display_as_title(&self) -> &'static str {
888+
match self {
873889
Self::Small => "Small",
874890
Self::Medium => "Moderate",
875891
Self::Large => "Large",
876892
Self::VeryLarge => "Very large",
877-
};
878-
f.write_str(s)
893+
}
894+
}
895+
896+
pub fn display(&self) -> &'static str {
897+
match self {
898+
Self::Small => "small",
899+
Self::Medium => "moderate",
900+
Self::Large => "large",
901+
Self::VeryLarge => "very large",
902+
}
879903
}
880904
}
881905

site/src/github.rs

+18-16
Original file line numberDiff line numberDiff line change
@@ -605,29 +605,31 @@ async fn categorize_benchmark(
605605
};
606606
const DISAGREEMENT: &str = "If you disagree with this performance assessment, \
607607
please file an issue in [rust-lang/rustc-perf](https://github.com/rust-lang/rustc-perf/issues/new).";
608-
let (summary, direction) = match ComparisonSummary::summarize_comparison(&comparison) {
609-
Some(s) if s.confidence().is_atleast_probably_relevant() => {
610-
let direction = s.direction().unwrap();
611-
(s, direction)
612-
}
613-
_ => {
614-
return (
615-
format!(
616-
"This benchmark run did not return any significant changes.\n\n{}",
617-
DISAGREEMENT
618-
),
619-
None,
620-
)
621-
}
622-
};
608+
let (summary, (direction, magnitude)) =
609+
match ComparisonSummary::summarize_comparison(&comparison) {
610+
Some(s) if s.confidence().is_atleast_probably_relevant() => {
611+
let direction_and_magnitude = s.direction_and_magnitude().unwrap();
612+
(s, direction_and_magnitude)
613+
}
614+
_ => {
615+
return (
616+
format!(
617+
"This benchmark run did not return any relevant changes.\n\n{}",
618+
DISAGREEMENT
619+
),
620+
None,
621+
)
622+
}
623+
};
623624

624625
let category = match direction {
625626
Direction::Improvement => "improvements 🎉",
626627
Direction::Regression => "regressions 😿",
627628
Direction::Mixed => "mixed results 🤷",
628629
};
629630
let mut result = format!(
630-
"This change led to significant {} in compiler performance.\n",
631+
"This change led to {} relevant {} in compiler performance.\n",
632+
magnitude.display(),
631633
category
632634
);
633635
for change in summary.relevant_changes().iter().filter_map(|c| *c) {

0 commit comments

Comments
 (0)