Skip to content

perf: improve performance of 2018 day 2 part 2 #11

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jul 12, 2025
Merged
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
39 changes: 21 additions & 18 deletions src/year2018/day02.rs
Original file line number Diff line number Diff line change
@@ -1,5 +1,4 @@
//! # Inventory Management System
use crate::util::hash::*;

pub fn parse(input: &str) -> Vec<&[u8]> {
input.lines().map(str::as_bytes).collect()
Expand Down Expand Up @@ -44,29 +43,33 @@ pub fn part1(input: &[&[u8]]) -> u32 {
}

pub fn part2(input: &[&[u8]]) -> String {
let width = input[0].len();
// Manually compare all IDs, as it is faster than other methods considering there are so few total IDs
for i in 0..input.len() {
for ii in i + 1..input.len() {
let id1 = input[i];
let id2 = input[ii];

let mut seen = FastSet::with_capacity(input.len());
let mut buffer = [0; 32];

// Use a set to check for duplicates after replacing a single character with '*' in each column.
for column in 0..width {
for &id in input {
buffer[0..width].copy_from_slice(id);
buffer[column] = b'*';
let mut diff = false;
for (a, b) in id1.iter().zip(id2) {
Copy link
Owner

@maneatingape maneatingape Jul 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This causes a performance regression on my input, 87 µs to 129 µs.

Two nested loops have worst case quadratic complexity O(n²).
Hashing is a constant time operation so the original solution worst case complexity is O(1) * 26 * O(n) = O(n).

So it looks like you're getting lucky, while I'm getting unlucky with the input order.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am no expert in O(N) notation, but wouldn't the N here be the number of IDs, so it should be lower than O(N^2) (second loop starts at the index of the first loop)?

Anyway, fair enough about the performance being worse. Just luck based that it's so much faster for my input then. I'll close the PR if that's alright with you, but out of curiosity would mind sharing your input too?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am no expert in O(N) notation, but wouldn't the N here be the number of IDs, so it should be lower than O(N^2) (second loop starts at the index of the first loop)?

Say you have 10 ids. First inner loop iteration will check 9 items, next iteration 8 and so on...
9 + 8 + 7 + ... + 2 + 1 = n(n + 1) / 2 = (1/2)(n² + n)
So you save half the comparisons but it's still overall quadratic.

Anyway, fair enough about the performance being worse. Just luck based that it's so much faster for my input then. I'll close the PR if that's alright with you, but out of curiosity would mind sharing your input too?

If you re-order the elements (for example swap the first and second halves) you can change the performance drastically. With a few random swaps I got benchmarks from 33 µs to 250 µs.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think it might be possible to improve the performance by taking advantage of the special structure of the input (always 26 lower case ASCII characters).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you save half the comparisons but it's still overall quadratic.

I seeee, thank you for helping me understand.

If you re-order the elements (for example swap the first and second halves) you can change the performance drastically

Ah yes, can confirm moving one of the solutions can make it much worse. I really did get quite lucky.

I still think it might be possible to improve the performance by taking advantage of the special structure of the input (always 26 lower case ASCII characters).

I'll have a look at it again later and see if I can come up with anything (unless you do first)

if a != b {
if diff {
diff = false;
break;
}
diff = true;
}
}

if !seen.insert(buffer) {
// Convert to String
return buffer
if diff {
// Build the string of characters which are the same between both IDs
return id1
.iter()
.filter(|&&b| b.is_ascii_lowercase())
.map(|&b| b as char)
.zip(id2)
.filter(|&(a, b)| a == b)
.map(|(a, _)| char::from(*a))
.collect();
}
}

seen.clear();
}

unreachable!()
}