perf: improve performance of 2018 day 2 part 2 #11
Conversation
Force-pushed 02094ef to f6c50ce
I really appreciate anyone taking the time to make performance improvements or showing how solutions could be better, so all PRs are much appreciated!
src/year2018/day02.rs (Outdated)
```rust
buffer[0..width].copy_from_slice(id);
buffer[column] = b'*';
let mut diff = false;
for (a, b) in id1.iter().zip(id2) {
```
This causes a performance regression on my input, 87 µs to 129 µs.
Two nested loops have worst case quadratic complexity O(n²). Hashing is a constant time operation, so the original solution's worst case complexity is O(1) * 26 * O(n) = O(n).
So it looks like you're getting lucky, while I'm getting unlucky with the input order.
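For reference, the original set-based approach (visible in the diff context above) can be sketched stand-alone. This is a minimal version: it uses std's `HashSet` in place of the crate's `FastSet`, and the `find_common` name is made up for the example.

```rust
use std::collections::HashSet;

// Set-based pairing: for each column, insert every ID with that column
// replaced by a wildcard byte. A failed insert means two IDs differ only
// in that column.
fn find_common(ids: &[&[u8]]) -> Option<String> {
    let width = ids[0].len();
    let mut seen = HashSet::with_capacity(ids.len());

    for column in 0..width {
        for &id in ids {
            // Wildcard one column so that two IDs differing only there
            // produce identical keys.
            let mut buffer = id.to_vec();
            buffer[column] = b'*';

            if !seen.insert(buffer) {
                // Found the pair: return the common letters, skipping the column.
                return Some(
                    id.iter()
                        .enumerate()
                        .filter(|&(i, _)| i != column)
                        .map(|(_, &b)| char::from(b))
                        .collect(),
                );
            }
        }
        seen.clear();
    }
    None
}

fn main() {
    let ids: Vec<&[u8]> = vec![b"fghij", b"fguij"];
    assert_eq!(find_common(&ids), Some(String::from("fgij")));
}
```

Each ID is visited once per column, so the work is bounded by the number of IDs times the ID width, regardless of input order.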
I am no expert in O(N) notation, but wouldn't the N here be the number of IDs, so it should be lower than O(N^2) (second loop starts at the index of the first loop)?
Anyway, fair enough about the performance being worse. Just luck based that it's so much faster for my input then. I'll close the PR if that's alright with you, but out of curiosity, would you mind sharing your input too?
> I am no expert in O(N) notation, but wouldn't the N here be the number of IDs, so it should be lower than O(N^2) (second loop starts at the index of the first loop)?
Say you have 10 ids. First inner loop iteration will check 9 items, next iteration 8 and so on...
9 + 8 + 7 + ... + 2 + 1 = n(n - 1) / 2 = (1/2)(n² - n)
So you save half the comparisons but it's still overall quadratic.
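The sum above can be checked mechanically; a small sketch counting the comparisons the skip-ahead inner loop actually performs:

```rust
// Count the comparisons made by the pairwise approach: the inner loop
// starts one past the outer index, so for n IDs the total is
// (n - 1) + (n - 2) + ... + 1 = n(n - 1) / 2, which is still O(n²).
fn pairwise_comparisons(n: usize) -> usize {
    let mut count = 0;
    for i in 0..n {
        for _ in i + 1..n {
            count += 1;
        }
    }
    count
}

fn main() {
    // 10 IDs -> 9 + 8 + ... + 1 = 45 comparisons.
    assert_eq!(pairwise_comparisons(10), 45);
    assert_eq!(pairwise_comparisons(10), 10 * 9 / 2);
}
```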
> Anyway, fair enough about the performance being worse. Just luck based that it's so much faster for my input then. I'll close the PR if that's alright with you, but out of curiosity, would you mind sharing your input too?
If you re-order the elements (for example swap the first and second halves) you can change the performance drastically. With a few random swaps I got benchmarks from 33 µs to 250 µs.
I still think it might be possible to improve the performance by taking advantage of the special structure of the input (always 26 lower case ASCII characters).
> So you save half the comparisons but it's still overall quadratic.
I seeee, thank you for helping me understand.
> If you re-order the elements (for example swap the first and second halves) you can change the performance drastically
Ah yes, can confirm moving one of the solutions can make it much worse. I really did get quite lucky.
> I still think it might be possible to improve the performance by taking advantage of the special structure of the input (always 26 lower case ASCII characters).
I'll have a look at it again later and see if I can come up with anything (unless you do first).
BTW I welcome PRs (even if they don't always work out). It's always an interesting discussion and could spark an improvement either in this or another solution.
Sure yeah. Thank you for being so understanding.
…prefix and suffix of IDs into the hashset
I tried a couple different (more complicated) approaches without much luck. Went back to the original approach and changed it from storing the ID with a character replaced in the set, to just storing the prefix and suffix around the character, and I believe there is some performance gained from that.
If you're interested @maneatingape, it was just a simple change but for me it's giving around a 30-45% speed up:

```rust
pub fn part2(input: &[&[u8]]) -> String {
    let width = input[0].len();
    let mut seen = FastSet::with_capacity(input.len());

    // Use a set to check for duplicates by comparing the prefix and suffix of IDs excluding one
    // column at a time.
    for column in 0..width {
        for &id in input {
            let prefix = &id[..column];
            let suffix = &id[column + 1..];

            if !seen.insert([prefix, suffix]) {
                // Convert to String
                return prefix.iter().chain(suffix).cloned().map(char::from).collect();
            }
        }
        seen.clear();
    }

    unreachable!()
}
```
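As a runnable reference, here is a self-contained version of the snippet above with std's `HashSet` swapped in for the crate's `FastSet` (an assumption made so the sketch compiles stand-alone), plus a small usage example:

```rust
use std::collections::HashSet;

// Same idea as the snippet above: the set key is [prefix, suffix], two
// borrowed slices of the ID with one column removed, so no per-ID buffer
// copy is needed. A failed insert means two IDs differ only in `column`.
fn part2(input: &[&[u8]]) -> String {
    let width = input[0].len();
    let mut seen = HashSet::with_capacity(input.len());

    for column in 0..width {
        for &id in input {
            let prefix = &id[..column];
            let suffix = &id[column + 1..];

            if !seen.insert([prefix, suffix]) {
                // Join the slices around the differing column into a String.
                return prefix.iter().chain(suffix).cloned().map(char::from).collect();
            }
        }
        seen.clear();
    }

    // The puzzle guarantees exactly one matching pair exists.
    unreachable!()
}

fn main() {
    let input: Vec<&[u8]> = vec![b"abcde", b"fghij", b"klmno", b"pqrst", b"fguij"];
    // fghij and fguij differ only in the third column.
    assert_eq!(part2(&input), "fgij");
}
```

Storing two borrowed slices instead of an owned, mutated copy of the ID is where the speed-up comes from: hashing still touches every byte, but the allocation and copy per ID per column disappear.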
Neat! Faster and more elegant. 79 µs => 49 µs.
Missed that lint - not getting these from clippy locally, but yeah.
Hi again. This is a quick one for improving the performance of 2018 day 2 part 2 (85μs => 15μs for me). Let me know if you want any changes, and no hard feelings if you don't want the PR at all.