adds macro for making bounds checks a no-op #2423

Merged: 14 commits, Mar 19, 2021

Collaborator

@SteveBronder SteveBronder commented Mar 13, 2021

Summary

Per issue #2420, Stan now always performs bounds checks for single-index access by default, which causes a noticeable slowdown for users. This PR adds a STAN_NO_RANGE_CHECKS macro that, when defined, turns the range and bounds checks into no-ops. It does so through a second macro, STAN_NO_RANGE_CHECK_RETURN, which expands to return; when STAN_NO_RANGE_CHECKS is defined.

At first I thought we could just use NDEBUG, but that macro seems to exist specifically to turn asserts such as eigen_assert() or the standard assert() on and off. Our bounds checks are a kind of assert, but I think it's safer to have a new macro.

The user could define this in the makefile, or we could add a compiler option such as -fno-debug that defines STAN_NO_RANGE_CHECKS at the top of the Stan model.
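A minimal sketch of the mechanism described in this summary, assuming the helper macro expands to `return` and sits as the first statement of each check function. `check_range_sketch` and its signature are hypothetical stand-ins for illustration, not the actual Stan Math code.

```cpp
#include <stdexcept>
#include <string>

// Normally supplied by the user, e.g. -DSTAN_NO_RANGE_CHECKS on the
// compiler command line or in the makefile. Defined here so the sketch
// shows the no-op behavior.
#define STAN_NO_RANGE_CHECKS

#ifdef STAN_NO_RANGE_CHECKS
// Leave the check function before any work is done.
#define STAN_NO_RANGE_CHECK_RETURN return
#else
// Expand to nothing so the body of the check runs as usual.
#define STAN_NO_RANGE_CHECK_RETURN
#endif

// Hypothetical stand-in for a 1-based index range check.
inline void check_range_sketch(const char* function, int max, int index) {
  STAN_NO_RANGE_CHECK_RETURN;
  if (index < 1 || index > max) {
    throw std::out_of_range(std::string(function) + ": index out of range");
  }
}
```

With the macro defined, `check_range_sketch("demo", 3, 10)` returns immediately; with the `#define` removed, the same call throws `std::out_of_range`.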

Tests

tbh I'm not really sure how to test this, since it changes behavior in the preprocessor. We could have a separate Jenkins node that runs just the tests for these errors? Since the checks are all no-ops, it would be pretty fast.

Side Effects

Yes this whole PR is a side effect to turn off bounds checks.

Release notes

Adds the STAN_NO_RANGE_CHECKS macro, which turns off bounds and range checks when defined.

Checklist

  • Math issue [FR] No bounds checks when NDEBUG flag is set #2420

  • Copyright holder: Steve Bronder

    The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:
    - Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
    - Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

  • the basic tests are passing

    • unit tests pass (to run, use: ./runTests.py test/unit)
    • header checks pass (make test-headers)
    • dependencies checks pass (make test-math-dependencies)
    • docs build (make doxygen)
    • code passes the built-in C++ standards checks (make cpplint)
  • the code is written in idiomatic C++ and changes are documented in the doxygen

  • the new changes are tested

@rok-cesnovar
Member

If we are going to define a separate macro, which I think is the right call here, we might as well give it a more descriptive name. Most Stan users are not C++ developers, and NDEBUG does not mean much to them. Maybe STAN_NO_RANGE_CHECKS or just NO_RANGE_CHECKS.

@rok-cesnovar
Member

As for testing, I think the easiest option is a GitHub Action job that runs a list of tests. These short standalone tests are perfect for GHA, as it's free.

@SteveBronder
Collaborator Author

> If we will be defining a separate macro, which I think is the right call here, might as well give it a more descriptive name? Most of Stan users are not C++ developers and NDEBUG does not mean much to them. Maybe STAN_NO_RANGE_CHECKS or just NO_RANGE_CHECKS.

I have to think about that. Messing with macros is certainly an advanced-user feature. Right now I use STAN_NO_RANGE_CHECKS itself for the return;, though I could check whether it is defined and then #undef and re-#define it.

> As for testing, I think easiest is a Github Action job that runs a list of tests. These standalone short tests are perfect for GHA as its free.

Nice! Yeah, that makes total sense. I'll add the tests and make a GH action.
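One way such a standalone test could look, as a rough sketch: the macro is defined before any checked code is seen (as a dedicated CI job might do with a compiler flag), and the test asserts that an out-of-range access no longer throws. All `*_sketch` names and `SKETCH_RANGE_CHECK_RETURN` are hypothetical, and plain helper functions stand in for a test framework.

```cpp
// Define the flag first, as a dedicated test target would do via -D.
#define STAN_NO_RANGE_CHECKS

#include <cstddef>
#include <exception>
#include <stdexcept>

#ifdef STAN_NO_RANGE_CHECKS
#define SKETCH_RANGE_CHECK_RETURN return
#else
#define SKETCH_RANGE_CHECK_RETURN
#endif

// Hypothetical 1-based index check that becomes a no-op under the flag.
inline void check_index_sketch(std::size_t size, std::size_t i) {
  SKETCH_RANGE_CHECK_RETURN;
  if (i < 1 || i > size) {
    throw std::out_of_range("index out of range");
  }
}

// Returns true when fn() completes without throwing.
template <typename F>
bool runs_without_throwing(F&& fn) {
  try {
    fn();
    return true;
  } catch (const std::exception&) {
    return false;
  }
}

// The property the no-op build should satisfy: a bad index is ignored.
inline bool no_op_checks_pass() {
  return runs_without_throwing([] { check_index_sketch(3, 99); });
}
```

In a GoogleTest suite the same property would presumably be written with `EXPECT_NO_THROW`.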

@stan-buildbot
Contributor


| Name | Old Result | New Result | Ratio | Performance change (1 - new / old) |
|---|---|---|---|---|
| gp_pois_regr/gp_pois_regr.stan | 3.49 | 3.47 | 1.01 | 0.6% faster |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 1.0 | 0.24% faster |
| eight_schools/eight_schools.stan | 0.11 | 0.11 | 1.01 | 1.31% faster |
| gp_regr/gp_regr.stan | 0.16 | 0.16 | 0.98 | -1.88% slower |
| irt_2pl/irt_2pl.stan | 5.25 | 5.24 | 1.0 | 0.09% faster |
| performance.compilation | 90.87 | 88.9 | 1.02 | 2.16% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 9.06 | 9.06 | 1.0 | 0.1% faster |
| pkpd/one_comp_mm_elim_abs.stan | 29.64 | 28.98 | 1.02 | 2.23% faster |
| sir/sir.stan | 129.03 | 128.78 | 1.0 | 0.2% faster |
| gp_regr/gen_gp_data.stan | 0.03 | 0.04 | 0.94 | -6.43% slower |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 3.12 | 3.1 | 1.01 | 0.72% faster |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.37 | 0.37 | 1.01 | 0.87% faster |
| arK/arK.stan | 1.91 | 1.9 | 1.01 | 0.85% faster |
| arma/arma.stan | 0.72 | 0.72 | 1.01 | 0.84% faster |
| garch/garch.stan | 0.52 | 0.51 | 1.02 | 2.16% faster |

Mean result: 1.00312493554

Jenkins Console Log
Blue Ocean
Commit hash: b9fe223


Machine information
ProductName: Mac OS X
ProductVersion: 10.11.6
BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Member

@bbbales2 bbbales2 left a comment

Comment.

@@ -74,6 +74,7 @@ struct bounded<T_y, T_low, T_high, true> {
template <typename T_y, typename T_low, typename T_high>
inline void check_bounded(const char* function, const char* name, const T_y& y,
                          const T_low& low, const T_high& high) {
  STAN_NO_RANGE_AND_SIZE_CHECK;

Member

This is used to check that real values are in a certain range (e.g. probabilities bounded in [0, 1]) as well as that indexes are in range.

It looks like this pull is only turning off range checks, but why not value checks too? I think you'd turn them off in similar situations for the same reasons.
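To illustrate the point that one bounded check serves both purposes, here is a simplified sketch. `check_bounded_sketch` mirrors the signature in the hunk above, but its body, the exception type, and the two wrapper functions are assumptions made for illustration, not the Stan Math implementation.

```cpp
#include <sstream>
#include <stdexcept>

// Simplified stand-in: throws when y lies outside [low, high].
// Written as !(low <= y && y <= high) so that NaN also fails the check.
template <typename T_y, typename T_low, typename T_high>
inline void check_bounded_sketch(const char* function, const char* name,
                                 const T_y& y, const T_low& low,
                                 const T_high& high) {
  if (!(low <= y && y <= high)) {
    std::ostringstream msg;
    msg << function << ": " << name << " is " << y << ", but must be in ["
        << low << ", " << high << "]";
    throw std::domain_error(msg.str());
  }
}

// Index-style use: is a 1-based index inside a container of given size?
inline void check_index_example(int i, int size) {
  check_bounded_sketch("demo", "index", i, 1, size);
}

// Value-style use: is a probability inside [0, 1]?
inline void check_probability_example(double p) {
  check_bounded_sketch("demo", "theta", p, 0.0, 1.0);
}
```

Placing an early-return macro at the top of the shared template would therefore silence both kinds of call at once, which is the concern raised in this comment.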

Collaborator Author

Oh, I forgot this was used for both. Umm, I think @syclik had some opinions on this. Also pinging @wds15. I figure this is something users have to explicitly turn on, so I'm more in favor of also skipping value checks.

Contributor

Value checks? Like what?

Member

Not NaN, not infinite, positive definite, valid unit vector, valid simplex -- this sort of thing.

Contributor

@wds15 wds15 Mar 16, 2021

That's a different story, which one should be able to turn off separately (if at all), I think. Something like STAN_FAST_MATH (a synonym for "not safe math, do it fast").

Collaborator Author

Are we also not going to remove size checks?

Member

> but what is wrong with two macros which do a more tailored thing themselves

I don't see why one would be useful without the other: unsafe is unsafe, and it's just hard to remember multiple things.

> not going to remove size checks?

Optimizing out size checks makes sense with this macro, I think. check_bounded is probably the only one that needs to go if we're not using values.

If we're going for STAN_NDEBUG in the future, does it make sense to just start with that? Or STAN_UNSAFE or STAN_NO_DEBUG since Rok pointed out NDEBUG probably doesn't mean much to people? Then in the future if we decide to split that into multiple things we can and the overall super-unsafe easy-mode flag is still there.
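The "umbrella flag that can later be split" idea could be sketched roughly as follows. STAN_UNSAFE is one of the names floated in this comment; STAN_NO_VALUE_CHECKS and the two helper functions are purely hypothetical and exist only to show how a single flag could fan out into finer-grained ones.

```cpp
// Normally passed by the user (e.g. -DSTAN_UNSAFE); defined here so the
// sketch shows the fan-out.
#define STAN_UNSAFE

// The umbrella flag switches on every finer-grained flag that the user
// has not already set explicitly.
#ifdef STAN_UNSAFE
#ifndef STAN_NO_RANGE_CHECKS
#define STAN_NO_RANGE_CHECKS
#endif
#ifndef STAN_NO_VALUE_CHECKS  // hypothetical flag name
#define STAN_NO_VALUE_CHECKS
#endif
#endif

// Compile-time helpers reporting which checks remain active.
constexpr bool range_checks_enabled() {
#ifdef STAN_NO_RANGE_CHECKS
  return false;
#else
  return true;
#endif
}

constexpr bool value_checks_enabled() {
#ifdef STAN_NO_VALUE_CHECKS
  return false;
#else
  return true;
#endif
}
```

Under this layout a user who wants only the range checks removed defines STAN_NO_RANGE_CHECKS alone, while the umbrella flag stays available as the easy all-off switch.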

Collaborator Author

I think I'd like to leave the size checks for another time, because removing them changes the exceptions that are thrown by some functions, like check_symmetric().

Collaborator Author

> If we're going for STAN_NDEBUG in the future, does it make sense to just start with that? Or STAN_UNSAFE or STAN_NO_DEBUG since Rok pointed out NDEBUG probably doesn't mean much to people? Then in the future if we decide to split that into multiple things we can and the overall super-unsafe easy-mode flag is still there.

Let's do this in a separate PR once we sort out what that means. For now I'd like to just have an option to turn off the range checks; then in another PR or an issue we can sort out what we want to be able to turn on and off.

Member

Sounds good

@t4c1
Contributor

t4c1 commented Mar 15, 2021

> I use STAN_NO_RANGE_CHECKS for the return;, though I could check if it exists and then undef and def it

You should probably rename that to STAN_NO_RANGE_CHECKS_RETURN and define that if a user defines STAN_NO_RANGE_CHECKS.

Comment on lines 7 to 21
#ifdef __has_attribute
#if __has_attribute(noinline) && __has_attribute(cold)
/**
* Functions tagged with this attribute are not inlined and moved
* to a cold branch to tell the CPU to not attempt to pre-fetch
* the associated function.
*/
#define STAN_COLD_PATH __attribute__((noinline, cold))
#else
#define STAN_COLD_PATH
#endif
#else
#define STAN_COLD_PATH
#endif
#else

Contributor

These changes seem unrelated to what this PR is doing.

Collaborator Author

This is a small fix: if someone is using a compiler other than gcc or clang, we are not guaranteed that __has_attribute is defined. To compile on those, we first make sure that macro is defined, and only then check whether the compiler has the noinline and cold attributes.

Contributor

@t4c1 t4c1 Mar 17, 2021

Ok, but you could simplify this into:

Suggested change
-#ifdef __has_attribute
-#if __has_attribute(noinline) && __has_attribute(cold)
-/**
-* Functions tagged with this attribute are not inlined and moved
-* to a cold branch to tell the CPU to not attempt to pre-fetch
-* the associated function.
-*/
-#define STAN_COLD_PATH __attribute__((noinline, cold))
-#else
-#define STAN_COLD_PATH
-#endif
-#else
-#define STAN_COLD_PATH
-#endif
-#else
+#ifdef __has_attribute
+#if __has_attribute(noinline) && __has_attribute(cold)
+/**
+* Functions tagged with this attribute are not inlined and moved
+* to a cold branch to tell the CPU to not attempt to pre-fetch
+* the associated function.
+*/
+#define STAN_COLD_PATH __attribute__((noinline, cold))
+#endif
+#endif
+#ifndef STAN_COLD_PATH
+#define STAN_COLD_PATH
+#endif
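As a sketch of how the simplified fallback might be used, the snippet below defines STAN_COLD_PATH as suggested and tags an error helper with it, so the throwing code stays out of the hot path. `throw_out_of_range_sketch` and `checked_get_sketch` are hypothetical names for illustration; on compilers without `__has_attribute` the macro simply expands to nothing.

```cpp
#include <stdexcept>
#include <string>

// Define the attribute only when the compiler reports support for it.
#ifdef __has_attribute
#if __has_attribute(noinline) && __has_attribute(cold)
#define STAN_COLD_PATH __attribute__((noinline, cold))
#endif
#endif
// Single fallback: an empty macro on every other compiler.
#ifndef STAN_COLD_PATH
#define STAN_COLD_PATH
#endif

// Hypothetical error helper kept out of line and marked as rarely run.
STAN_COLD_PATH void throw_out_of_range_sketch(const char* name, int index) {
  throw std::out_of_range(std::string(name) + ": index "
                          + std::to_string(index) + " out of range");
}

// Hot path: the common case is a comparison and a load; the error
// construction lives in the cold helper.
inline int checked_get_sketch(const int* data, int size, int index) {
  if (index < 0 || index >= size) {
    throw_out_of_range_sketch("data", index);
  }
  return data[index];
}
```

The single `#ifndef STAN_COLD_PATH` fallback replaces the two separate `#else` branches, so the empty definition appears only once.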

@stan-buildbot
Contributor


| Name | Old Result | New Result | Ratio | Performance change (1 - new / old) |
|---|---|---|---|---|
| gp_pois_regr/gp_pois_regr.stan | 3.42 | 3.4 | 1.01 | 0.73% faster |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 0.97 | -2.69% slower |
| eight_schools/eight_schools.stan | 0.11 | 0.11 | 0.97 | -3.32% slower |
| gp_regr/gp_regr.stan | 0.16 | 0.16 | 1.01 | 0.93% faster |
| irt_2pl/irt_2pl.stan | 5.33 | 5.24 | 1.02 | 1.8% faster |
| performance.compilation | 90.57 | 88.71 | 1.02 | 2.04% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 8.89 | 9.07 | 0.98 | -2.02% slower |
| pkpd/one_comp_mm_elim_abs.stan | 30.68 | 29.3 | 1.05 | 4.5% faster |
| sir/sir.stan | 131.63 | 130.73 | 1.01 | 0.68% faster |
| gp_regr/gen_gp_data.stan | 0.04 | 0.04 | 1.0 | 0.01% faster |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 3.08 | 3.08 | 1.0 | -0.1% slower |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.38 | 0.37 | 1.02 | 2.36% faster |
| arK/arK.stan | 1.93 | 1.9 | 1.02 | 1.66% faster |
| arma/arma.stan | 0.63 | 0.65 | 0.97 | -2.72% slower |
| garch/garch.stan | 0.51 | 0.52 | 0.98 | -1.8% slower |

Mean result: 1.00185071967

Jenkins Console Log
Blue Ocean
Commit hash: 2f2fe23


Machine information
ProductName: Mac OS X
ProductVersion: 10.11.6
BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Member

@bbbales2 bbbales2 left a comment

This all looks good to me. Since we've been doing this pull by committee, I'll leave it to someone else to also look over and click the merge button.

@@ -2,13 +2,48 @@
#define STAN_MATH_PRIM_META_COMPILER_ATTRIBUTES_HPP

#ifdef __GNUC__
#ifndef likely

Member

Is it that old gccs didn't have likely and new ones do?

Collaborator Author

Another package could already define it.
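A sketch of why the `#ifndef likely` guard matters: if some other header has already defined `likely`, the guarded block leaves that definition alone instead of redefining the macro. The macro bodies below (using GCC's `__builtin_expect`) and the first `#define`, which simulates the other package, are assumptions for illustration; only the `#ifdef __GNUC__` / `#ifndef likely` lines come from the hunk above.

```cpp
// Stand-in for another package's header that defines `likely` first.
#define likely(x) (x)

#ifdef __GNUC__
#ifndef likely  // skipped here: the earlier definition is kept
#define likely(x) __builtin_expect(!!(x), 1)
#endif
#ifndef unlikely
#define unlikely(x) __builtin_expect(!!(x), 0)
#endif
#else
#ifndef likely
#define likely(x) (x)
#endif
#ifndef unlikely
#define unlikely(x) (x)
#endif
#endif

// Hypothetical use: negative inputs are hinted as the rare case.
inline int clamp_to_zero_sketch(int v) {
  if (unlikely(v < 0)) {
    return 0;
  }
  return v;
}
```

Without the guard, the second `#define likely` would redefine an existing macro with a different body, which compilers diagnose.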

7 participants