Description
Something kind of neat I was reading about the other day.
The tl;dr is that by using some compiler attributes and immediately invoked lambdas we can shrink the error-handling code inside the check functions. Since those error paths then take up less space in the caller, the compiler has more room to inline the hot path. This is also nice because we don't really care about performance when a function errors, so it makes sense to put that code on a cold path.
Example
I have an example below on godbolt that has our current version of `check_range()` and a version that uses the immediately invoked lambdas and some compiler attributes. You can ctrl+f for "runner" to see the main change: the block of asm in compiler #2 (`.L28`) has the tag `[clone .cold]`, which means the compiler has split those instructions out of the hot path, so they should rarely get pulled into the instruction cache. For compiler #1, the block of asm it jumps to (`.L50`) sits inline with the hot code, so it is likely to be fetched along with it.
https://gcc.godbolt.org/z/6zWTcY
I'm not sure how much these would save, but we do call these functions a lot.
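For anyone who doesn't want to dig through the godbolt link, here is a minimal sketch of the pattern. The names `COLD_PATH` and `check_range_sketch` are made up for illustration, and the signature and message are not the library's actual `check_range()`. It assumes GCC or Clang, which accept a GNU-style attribute after a lambda's parameter list; on other compilers the macro expands to nothing.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical macro: marks a function as unlikely to run and keeps it out of line.
#if defined(__GNUC__) || defined(__clang__)
#define COLD_PATH __attribute__((noinline, cold))
#else
#define COLD_PATH
#endif

// Illustrative stand-in for a 1-based range check; not the real check_range().
inline void check_range_sketch(const char* function, const char* name,
                               int max, int index) {
  if (index >= 1 && index <= max) {
    return;  // hot path: only the comparison ends up in the caller
  }
  // Cold path: an immediately invoked lambda holds the message building
  // and the throw, so that code can be emitted away from the hot path.
  [&]() COLD_PATH {
    std::ostringstream msg;
    msg << function << ": index " << index << " out of range for " << name
        << "; expecting index to be between 1 and " << max;
    throw std::out_of_range(msg.str());
  }();
}
```

The idea is that the caller only pays for the comparison and a jump, while the stream setup and exception machinery live in the out-of-line lambda. Whether the compiler actually emits a `[clone .cold]` block depends on the compiler and optimization level.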
Expected Output
Should be the same
Current Version:
v3.4.0