Feature/elementwise checks revert revert #2007

Merged
andrjohns merged 15 commits into develop from feature/elementwise_checks_revert_revert on Aug 20, 2020

Conversation

bbbales2
Member

@bbbales2 bbbales2 commented Aug 8, 2020

Summary

This reverts the revert, putting back the elementwise checks that @peterwicksstringfield added in #1798.

We reverted that pull request before the release because of a performance regression (https://discourse.mc-stan.org/t/cmdstan-2-24-release-candidate-now-available/16818/39).

The trick was the optimization discussed here: #1798 (comment)

Eigen must do some tricks to make its .isFinite() check fast. For Eigen types, we run the fast check first to see whether anything isn't finite, and only then run the slower elementwise checks that produce the good error messages.
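To make the two-stage idea concrete, here is a minimal sketch. It is not the code in this pull request: `all_finite_fast` and `check_finite_sketch` are hypothetical names, `std::vector<double>` stands in for an Eigen type, and `std::all_of` stands in for Eigen's fast finiteness check. The message format and the 1-based index are also assumptions of the sketch.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not a Stan Math function): a cheap pass that only
// answers "is every element finite?" and builds no error message.
inline bool all_finite_fast(const std::vector<double>& x) {
  return std::all_of(x.begin(), x.end(),
                     [](double v) { return std::isfinite(v); });
}

// Hypothetical two-stage check: run the cheap screen first and fall back to
// the slower elementwise loop only when the screen reports a problem.
inline void check_finite_sketch(const char* function, const char* name,
                                const std::vector<double>& x) {
  if (all_finite_fast(x)) {
    return;  // common case: nothing to report, no per-element bookkeeping
  }
  // Slow path: find the first offending element so the message can name it.
  for (std::size_t i = 0; i < x.size(); ++i) {
    if (!std::isfinite(x[i])) {
      std::ostringstream msg;
      // The 1-based index in the message is an assumption of this sketch.
      msg << function << ": " << name << "[" << (i + 1) << "] is " << x[i]
          << ", but must be finite!";
      throw std::domain_error(msg.str());
    }
  }
}
```

The slow path only runs when the input is already known to be bad, so the cost of building a detailed message is paid only on the error path.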

Tests

This includes the new tests from #1798, but adds none on top of those.

Side Effects

Just the side effects of #1798 (a bugfix for sparse stuff)

Release notes

New functions is_nonnegative and is_positive_finite to parallel check_nonnegative and check_positive_finite. They signal failure by returning false instead of by throwing std::domain_error.

Functions check_not_nan, check_nonnegative, check_positive, check_finite, check_positive_finite, is_not_nan, is_nonnegative, is_positive, is_scal_finite, and is_positive_finite now operate on nested containers.

Clearer error messages when csr_u_to_z is called with out-of-range indices.

check_positive now throws a domain error when given an unsigned 0.

(Copied those from #1798)
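As an illustration of the first two release notes, the sketch below shows how an `is_*` function can recurse through nested containers and how a `check_*` counterpart can be built on top of it. The names `is_nonnegative_sketch` and `check_nonnegative_sketch` are hypothetical; the real Stan Math functions have different signatures and richer error messages.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical simplified versions, for illustration only.

// Scalar base case: true when x >= 0 (false for NaN).
template <typename T>
bool is_nonnegative_sketch(const T& x) {
  return x >= 0;
}

// Recursive case: a std::vector passes when every element passes, so
// std::vector<std::vector<double>> and deeper nestings work too.
template <typename T>
bool is_nonnegative_sketch(const std::vector<T>& x) {
  for (const auto& xi : x) {
    if (!is_nonnegative_sketch(xi)) {
      return false;
    }
  }
  return true;
}

// The check_* counterpart signals failure by throwing std::domain_error
// instead of returning false.
template <typename T>
void check_nonnegative_sketch(const char* function, const char* name,
                              const T& x) {
  if (!is_nonnegative_sketch(x)) {
    throw std::domain_error(std::string(function) + ": " + name
                            + " must be nonnegative");
  }
}
```

For example, `is_nonnegative_sketch(std::vector<std::vector<double>>{{0.0, 1.0}, {-2.0}})` returns false, while passing the same argument to `check_nonnegative_sketch` throws.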

Checklist

  • Math issue Deeply nested containers and error checks #1635 (don't close)

  • Copyright holder: Peter Wicks Stringfield

    The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:
    - Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
    - Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

  • the basic tests are passing

    • unit tests pass (to run, use: ./runTests.py test/unit)
    • header checks pass, (make test-headers)
    • dependencies checks pass, (make test-math-dependencies)
    • docs build, (make doxygen)
    • code passes the built in C++ standards checks (make cpplint)
  • the code is written in idiomatic C++ and changes are documented in the doxygen

  • the new changes are tested

@wds15
Contributor

wds15 commented Aug 8, 2020

Should I run the model that uncovered the performance regression on this PR?

@bbbales2
Member Author

bbbales2 commented Aug 8, 2020

@wds15 that's probably the easiest way to review. I used that model to check if I was actually changing anything.

@stan-buildbot
Contributor


| Name | Old Result | New Result | Ratio | Performance change (1 - new / old) |
| --- | --- | --- | --- | --- |
| gp_pois_regr/gp_pois_regr.stan | 4.02 | 4.13 | 0.97 | -2.87% slower |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 1.0 | -0.28% slower |
| eight_schools/eight_schools.stan | 0.09 | 0.09 | 0.99 | -0.86% slower |
| gp_regr/gp_regr.stan | 0.2 | 0.19 | 1.02 | 1.67% faster |
| irt_2pl/irt_2pl.stan | 5.41 | 5.3 | 1.02 | 2.04% faster |
| performance.compilation | 87.74 | 86.66 | 1.01 | 1.23% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 7.81 | 8.15 | 0.96 | -4.39% slower |
| pkpd/one_comp_mm_elim_abs.stan | 27.21 | 26.33 | 1.03 | 3.25% faster |
| sir/sir.stan | 130.95 | 129.81 | 1.01 | 0.87% faster |
| gp_regr/gen_gp_data.stan | 0.04 | 0.04 | 0.99 | -0.73% slower |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 2.99 | 3.32 | 0.9 | -11.06% slower |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.39 | 0.39 | 1.0 | -0.12% slower |
| arK/arK.stan | 1.78 | 1.85 | 0.96 | -3.87% slower |
| arma/arma.stan | 0.59 | 0.73 | 0.8 | -24.26% slower |
| garch/garch.stan | 0.53 | 0.61 | 0.86 | -15.81% slower |

Mean result: 0.96895430916

Jenkins Console Log
Blue Ocean
Commit hash: c602571


Machine information
ProductName: Mac OS X
ProductVersion: 10.11.6
BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Collaborator

@andrjohns andrjohns left a comment

All looking pretty good (assuming the speed issue has been fixed), just some minor comments

@bbbales2
Member Author

I added a couple of screen functions to check whether the full checks are necessary.

The tests are a bit awkwardly long (I just copy-pasted from check_finite_test and check_not_nan_test), but I figured that was the easiest way to do them.

Collaborator

@andrjohns andrjohns left a comment

All looking good to me, just a couple of global statements in the tests. Otherwise good to go!

@@ -3,10 +3,10 @@
#include <limits>
#include <vector>

const char* function = "check_positive";
Collaborator

Missed a global here

#include <stan/math/prim/fun/constants.hpp>
#include <vector>

using stan::math::is_positive_finite;
Collaborator

Little global

@wds15
Contributor

wds15 commented Aug 11, 2020

The timing of my benchmark on this is not looking good:

> ## 2.24.0
> new_time
   user  system elapsed 
175.047   0.709 176.083 
> ## develop on this PR
> dev_time
   user  system elapsed 
177.532   0.801 226.577 

Do I need to turn on lto for develop??

@bbbales2
Member Author

> Do I need to turn on lto for develop??

Perhaps, but can you upload the model and data? The one I was using took about 6.2 seconds vs. 6.5 seconds, but I was comparing develop with this change to develop without it.

@wds15
Contributor

wds15 commented Aug 11, 2020

Same model, same data as before

@bbbales2
Member Author

@wds15 I checked this on my copy of the model, and this branch doesn't have performance regressions vs. develop. var_value got merged after 2.24, so I assume that is what you're seeing, and -flto would hopefully correct that (but that's a different issue).

Test file and data:
blrm_test.zip

Run command:

./blrm sample num_warmup=5000 num_samples=5000 data file=blrm.data.R random seed=1

Runtime this branch:

#  Elapsed Time: 13.337 seconds (Warm-up)
#                16.734 seconds (Sampling)
#                30.071 seconds (Total)

Runtime develop:

#  Elapsed Time: 13.367 seconds (Warm-up)
#                16.903 seconds (Sampling)
#                30.27 seconds (Total)

Last line of output this branch:

-76.4678,1,0.205705,4,15,0,122.916

Last line of output develop:

-76.4678,1,0.205705,4,15,0,122.916

@wds15
Contributor

wds15 commented Aug 11, 2020

OK, let me rerun with lto then. I thought it would not be needed for this model, since I don't really see where Jacobians are needed (only a gradient, so no setting adjoints to zero)... but let's check.

@wds15
Contributor

wds15 commented Aug 12, 2020

Ok, so with -flto everything is fine:

> ## 2.24.1 with lto
> new_time
   user  system elapsed 
174.819   1.042 176.906 
> ## develop on this PR with lto
> dev_time
   user  system elapsed 
177.024   1.020 179.240 

We need lto to be on by default on develop... I will file the issue for that now.

@stan-buildbot
Contributor


| Name | Old Result | New Result | Ratio | Performance change (1 - new / old) |
| --- | --- | --- | --- | --- |
| gp_pois_regr/gp_pois_regr.stan | 4.23 | 4.26 | 0.99 | -0.74% slower |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 0.99 | -1.24% slower |
| eight_schools/eight_schools.stan | 0.09 | 0.09 | 1.01 | 0.57% faster |
| gp_regr/gp_regr.stan | 0.2 | 0.2 | 0.99 | -0.56% slower |
| irt_2pl/irt_2pl.stan | 5.42 | 5.26 | 1.03 | 2.81% faster |
| performance.compilation | 88.95 | 87.48 | 1.02 | 1.65% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 7.9 | 8.22 | 0.96 | -4.05% slower |
| pkpd/one_comp_mm_elim_abs.stan | 26.39 | 26.94 | 0.98 | -2.1% slower |
| sir/sir.stan | 131.9 | 131.1 | 1.01 | 0.61% faster |
| gp_regr/gen_gp_data.stan | 0.05 | 0.05 | 1.0 | 0.2% faster |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 3.15 | 3.31 | 0.95 | -4.78% slower |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.38 | 0.41 | 0.94 | -6.41% slower |
| arK/arK.stan | 1.78 | 1.83 | 0.97 | -2.76% slower |
| arma/arma.stan | 0.73 | 0.59 | 1.24 | 19.47% faster |
| garch/garch.stan | 0.53 | 0.61 | 0.87 | -14.73% slower |

Mean result: 0.997031123861

Jenkins Console Log
Blue Ocean
Commit hash: 14f08d9


Machine information
ProductName: Mac OS X
ProductVersion: 10.11.6
BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

@bbbales2
Member Author

@andrjohns the tests passed, so this is ready for review again

@andrjohns andrjohns merged commit a406e8a into develop Aug 20, 2020
@rok-cesnovar rok-cesnovar deleted the feature/elementwise_checks_revert_revert branch August 20, 2020 06:45