Use immediately invoked lambdas to make error checking less expensive #2249
Comments
How portable between compilers are these attributes? Do you have any example model, or at least a microbenchmark using some semi-realistic set of function calls, that shows any significant speedup from this?
As portable as
Here's a benchmark just calling the https://gist.github.com/SteveBronder/84ba7d4bfb0f5ded4289f3b702f6b940
tbh I'm pretty surprised by these answers. Being something like 30% faster is way more than I expected.
Can you try to rerun the benchmark without the …
Oh, I think you're right! Running … I think the pattern lends itself nicely to writing error checks that inline better, so I'll leave this issue open in case anyone wants to have a go at it.
Description
Something kind of neat I was reading the other day:
https://rigtorp.se/iife/
The tl;dr is that by using some compiler attributes and immediately invoked lambdas, we can shrink the error-checking functions: the code that builds and throws the error message moves out of line into its own function, so the part that actually gets inlined is just the check itself. Since the hot path then takes up less space, the compiler is more willing to inline it. This is also nice because we don't really care about performance once a function has errored, so it makes sense to put that code on a cold path.
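A minimal sketch of the pattern, assuming GCC or Clang: the __attribute__((noinline, cold)) spelling is a GNU extension, so the hypothetical COLD_PATH_SKETCH macro expands to nothing on other compilers and you just get an ordinary lambda (which is one answer to the portability question). check_positive is a made-up example for illustration, not Stan's actual check function or message format.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical macro: GNU-style attributes understood by GCC and Clang.
// On other compilers it expands to nothing and the lambda is a plain lambda.
#if defined(__GNUC__) || defined(__clang__)
#define COLD_PATH_SKETCH __attribute__((noinline, cold))
#else
#define COLD_PATH_SKETCH
#endif

// Hypothetical check function (not Stan's API) showing the pattern.
inline void check_positive(const char* function, const char* name, double y) {
  if (y > 0) {
    return;  // hot path: just a compare and a return (NaN falls through)
  }
  // Cold path: an immediately invoked lambda marked noinline + cold, so the
  // string building and the throw live in a separate out-of-line function
  // and don't bloat the inlined body of check_positive.
  [&]() COLD_PATH_SKETCH {
    throw std::domain_error(std::string(function) + ": " + name
                            + " must be positive, but is "
                            + std::to_string(y));
  }();
}
```

The lambda captures by reference so the cold function can still see the arguments, and the trailing () invokes it on the spot; nothing about the error message or exception type changes compared to writing the throw inline.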
Example
I have an example on godbolt below with our current version of check_range() and a version that uses immediately invoked lambdas plus some compiler attributes. You can ctrl+f for "runner" to see the main change: the block of asm in compiler #2 (.L28) is tagged [clone .cold], which means GCC has split those instructions out into a separate cold section, away from the hot code, so they are unlikely to be pulled into the instruction cache during normal execution. For compiler #1, the block of asm it jumps to (.L50) sits right alongside the hot code, so it is much more likely to get fetched along with it (for example, when the CPU fetches ahead or speculates down that branch).
https://gcc.godbolt.org/z/6zWTcY
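For a self-contained version of the comparison, here is a simplified, hypothetical stand-in for check_range() written both ways. The function names, the 1-based bounds test, and the message text are assumptions for illustration; they don't match Stan's real check_range() signature or error messages.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical macro: GNU attributes on GCC/Clang, nothing elsewhere.
#if defined(__GNUC__) || defined(__clang__)
#define COLD_PATH_SKETCH __attribute__((noinline, cold))
#else
#define COLD_PATH_SKETCH
#endif

// "Current" style (hypothetical simplification): the message-building code
// sits right after the branch, so inlining the check drags it along.
inline void check_range_inline(const char* function, const char* name,
                               int max, int index) {
  if (index >= 1 && index <= max) {
    return;
  }
  std::stringstream msg;
  msg << function << ": accessing element out of range. index " << index
      << " out of range; expecting index to be between 1 and " << max
      << " (" << name << ")";
  throw std::out_of_range(msg.str());
}

// IIFE style: same check, but the error path is an immediately invoked
// lambda marked noinline + cold, so only the compare and a call stay inline.
inline void check_range_iife(const char* function, const char* name,
                             int max, int index) {
  if (index >= 1 && index <= max) {
    return;
  }
  [&]() COLD_PATH_SKETCH {
    std::stringstream msg;
    msg << function << ": accessing element out of range. index " << index
        << " out of range; expecting index to be between 1 and " << max
        << " (" << name << ")";
    throw std::out_of_range(msg.str());
  }();
}
```

Compiling both with GCC at -O2 and comparing the asm should show the lambda in check_range_iife emitted as its own out-of-line function. Whether GCC also splits the error path of check_range_inline out on its own depends on the compiler version and flags, so it's worth checking on godbolt rather than assuming.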
I'm not sure how much these would save, but we do call these functions a lot.
Expected Output
Should be the same
Current Version:
v3.4.0