-
-
Notifications
You must be signed in to change notification settings - Fork 193
Vectorize checks called by compiler #2556
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…4.1 (tags/RELEASE_600/final)
…4.1 (tags/RELEASE_600/final)
…4.1 (tags/RELEASE_600/final)
Jenkins Console Log Machine informationProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010CPU: G++: Clang: |
@rok-cesnovar @serban-nicusor-toptal it looks like the github actions are taking v long and it won't let me look at the raw logs. Is there some way to see what's going on? |
Jenkins Console Log Machine informationProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010CPU: G++: Clang: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to time myself out at 30 comments. A lot of them are redundant around two themes:
- specifying what the function does in the first sentence of the doc
- clarifying strict vs. non-strict comparison
- efficiency concerns around the eager use of
make_iter_name
- documenting what the constraints enforce
- not confusing types and values in the doc
Other than the efficiency tests, these are all minor. I'm happy to help with the doc changes once it's clarified for each test whether it's a strict inequality or not.
* elements are all positive. Note that Cholesky factors need not | ||
* be square, but require at least as many rows M as columns N | ||
* (i.e., M >= N). | ||
* @tparam StdVec A standard vector with inner type inheriting from `MatrixBase` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[question]
Should this be checked? Or is that all that's going to get passed to it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only have a check_cholesky_factor
that works for this definition so if someone tried passing something like a std::vector<std::vector<double>>
it would just fail to compile
@@ -16,7 +19,7 @@ namespace math { | |||
* elements are all positive. Note that Cholesky factors need not | |||
* be square, but require at least as many rows M as columns N | |||
* (i.e., M >= N). | |||
* @tparam EigMat Type of the Cholesky factor (must be derived from \c | |||
* @tparam Mat Type of the Cholesky factor (must be derived from \c |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[optional]
What convention does Eigen use for these matrix args? I think it'd be nice to follow that. I think they may call it Derived
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For Eigen they use Derived
in places like
template<typename Derived>
bool solveInPlace(MatrixBase<Derived> &bAndX)
Where Derived
is the inner type used in the CRTP of MatrixBase
. Here we accept anything derived from MatrixBase
. This function also takes in `var_valueEigen::Matrix types and I'll update the docs to reflect that
* factor, if number of rows is less than the number of columns, | ||
* if there are 0 columns, or if any element in matrix is NaN | ||
*/ | ||
template <typename StdVec, require_std_vector_t<StdVec>* = nullptr> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[optional]
I'd rather keep shorter template parameters, like just V
for standard vectors or maybe C
for generic containers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For very generic functions I like using V
or T
etc. but for functions with specific requirements I like that the template parameter's name gives an idea of what the requirement is to use the function.
void check_cholesky_factor(const char* function, const char* name, | ||
const StdVec& y) { | ||
for (size_t i = 0; i < y.size(); ++i) { | ||
check_cholesky_factor(function, internal::make_iter_name(name, i).c_str(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs a performance evaluation as it's going to proactively create string names for each entry, which is pretty expensive.
[question]
Is there a way to be lazy and avoid creating the name until the check fails?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with running a little performance check here, I def expect it to be slower for small matrices though hopefully not much.
Is there a way to be lazy and avoid creating the name until the check fails?
I tried thinking about this but the only thing I could figure out is to change all of the checks to take in a lambda instead of a const char* that doesn't evaluate until a throw occurs.
void check_cholesky_factor_corr(const char* function, const char* name, | ||
const StdVec& y) { | ||
for (size_t i = 0; i < y.size(); ++i) { | ||
check_cholesky_factor_corr(function, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same question for all of these and efficiency.
stan/math/prim/err/check_less.hpp
Outdated
}(); | ||
} | ||
} | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[question]
What happens with things like arrays of matrices? In that case, there are more than two indexes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this function is template by
template <typename T_y, typename T_high,
require_all_matrix_t<T_y, T_high>* = nullptr>
So it requires the template is either an Eigen matrix types or var_value<MatrixXd>
types. The ones below templated with requires for std_vectors
are the ones that look over arrays (and arrays of matrices etc.)
stan/math/prim/err/check_less.hpp
Outdated
} | ||
|
||
/** | ||
* Check if each element of <code>y</code> is strictly less than each associated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[question]
Is this strictly less or less than or equal? I'm confused about what's being checked in these. For our constraints, all the checks should be less than or equal in order to deal with rounding/underflow errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we have check_less
and check_less_or_equal
which works well with the rounding/underflow errors. There's other parts of Stan math which use check_less
, should those be changed to check_less_or_equal
? I think we would want to do that in a separate PR
stan/math/prim/err/check_simplex.hpp
Outdated
* @tparam C Eigen column type, either 1 if we have a column vector | ||
* or -1 if we have a row vector. Moreover, we either have | ||
* R = 1 and C = -1 or R = -1 and C = 1. | ||
* @tparam T A type inheriting from EigenBase. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[optional]
Only sentences should have periods after them. Sorry the doc's already so inconsistent.
@@ -48,6 +52,26 @@ void check_unit_vector(const char* function, const char* name, | |||
} | |||
} | |||
|
|||
/** | |||
* Check if the each element in a standard vector is a unit vector. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should say "unit Euclidean length" to specify what unit vector means here. "unit_vector" is a type in the Stan language.
Jenkins Console Log Machine informationProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010CPU: G++: Clang: |
@bob-carpenter fixed up the everything so this is ready for another look |
@bob-carpenter actually hang on I ran the performance checks and there's a regression I need to fix |
…4.1 (tags/RELEASE_600/final)
…o fix/check-less-greater
…4.1 (tags/RELEASE_600/final)
…4.1 (tags/RELEASE_600/final)
@bob-carpenter Alright so I think I have everything sorted out. I made a benchmark here for the vector case and the results seem good. The number in This PR--------------------------------------------------------------------
Benchmark Time CPU Iterations
--------------------------------------------------------------------
check_ge/2/manual_time 26.7 ns 57.1 ns 26208169
check_ge/4/manual_time 26.8 ns 57.1 ns 26002894
check_ge/8/manual_time 27.6 ns 57.9 ns 25198473
check_ge/16/manual_time 30.1 ns 60.4 ns 23274275
check_ge/32/manual_time 37.4 ns 67.6 ns 18707376
check_ge/64/manual_time 62.7 ns 93.0 ns 11294734
check_ge/128/manual_time 91.9 ns 122 ns 7793109
check_ge/256/manual_time 157 ns 187 ns 4503566
check_ge/512/manual_time 271 ns 301 ns 2528503
check_ge/1024/manual_time 521 ns 552 ns 1232462
check_ge/2048/manual_time 1007 ns 1037 ns 679082
check_ge/4096/manual_time 1978 ns 2008 ns 354471 Develop--------------------------------------------------------------------
Benchmark Time CPU Iterations
--------------------------------------------------------------------
check_ge/2/manual_time 24.8 ns 55.1 ns 28259239
check_ge/4/manual_time 25.9 ns 56.2 ns 26950003
check_ge/8/manual_time 27.6 ns 57.9 ns 25538389
check_ge/16/manual_time 30.4 ns 60.8 ns 22979588
check_ge/32/manual_time 37.1 ns 67.4 ns 18895034
check_ge/64/manual_time 58.5 ns 88.7 ns 11736093
check_ge/128/manual_time 91.4 ns 122 ns 7773196
check_ge/256/manual_time 151 ns 181 ns 4606845
check_ge/512/manual_time 261 ns 291 ns 2675818
check_ge/1024/manual_time 506 ns 536 ns 1348204
check_ge/2048/manual_time 1019 ns 1049 ns 694327
check_ge/4096/manual_time 1968 ns 1998 ns 356119 But what should I benchmark against for the ones that have To benchmark this right now I'm doing this for the current PR where we use the vectorized version and like what stanc3 does here for develop where we just a loop over There's def a big cost for constructing these names correctly. If we're not cool with paying that cost then I'm fine with just reporting This PR
Develop
|
So this PR would then also fix stan-dev/stanc3#676 is what you are saying? And that fix comes with a performance penalty? Not sure I completely follow. |
Yes it also does that fix (once these are used in the compiler). The performance penalty comes from constructing the string for the index number. |
…4.1 (tags/RELEASE_600/final)
…4.1 (tags/RELEASE_600/final)
Jenkins Console Log Machine informationProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010CPU: G++: Clang: |
@bob-carpenter this is ready for another look (and if you can look at the perf tests of the above) |
Interesting---so this isn't introducing any performance regressions in our current code? Should we run that again to be sure? I don't think we'd be willing to take a performance regression to check indices. Or is the code we'd be replacing also inefficient? I may be wrong here (and would very much like to be), but the code pattern seems to be something like this:
That is, the message string gets constructed eagerly. If that's not the case, then no worries on this PR and I just misunderstood. Rather than the above, it's more efficient to use this pattern:
The problem is passing the indexes down to the embedded check. I think the way to do that in this code would be to have the check functions take in zero or more indexes which would get appended to the names if there are errors. It'd require some fast footwork to do it in general with variadic last arguments consisting of the list of current indices. That way, you don't ever need to construct a string until there's an error and the string construction loop gets unfolded without any redundant copying. Do you think that'd be feasible? Or worthwhile? If we really are checking eagerly, I think it'd be a huge speed win. And it'd explain why everyone's been wanting to remove checks from the code! |
Yes that's a great idea!! With those changes the speed checks for This PR
Develop
|
Though I only implemented that for the
I can do (2) but imo I'd rather do it in another PR since this PR is pretty darn big already. I think we can do quite a lot to actually speed up these checks. Like I think |
…4.1 (tags/RELEASE_600/final)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Especially for the extensive answers to all of the questions I had reading the code.
Jenkins Console Log Machine informationProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010CPU: G++: Clang: |
Summary
This PR vectorizes the
check_*
functions that the compiler generates for objects created intransformed parameters/data / generated quantities
The stan compiler currently calls several of the
check_*
functions for transformed params (specifically the ones here where each check is generated ascheck_<matched_bound_name>
). For several of these checks the compiler will generate afor
loop to iterate over each underlying element of the matrix/vector/array, but for the new matrix type we don't want iterations like this that have to look at individual elements.So with this PR instead of stanc generating the following to test if an array of vectors is greater than or equal to another vector,
it can just generate
You'll notice this also fixes a bug in the compiler where if an error occurred we would call the name of the thrown object
"tp_9[sym1__, sym2__]"
. In this impl we clean that up so that the actual iteration number for arrays / vectors / matrices is thrown such astp_9[1][5]
.Tests
Tests were changed for each check to test the vectorized version of the inputs (which then checks the underlying impls for matrices and scalars.) Tests can be run with
Side Effects
Nope!
Release notes
Vectorize checks called by stanc compiler
Checklist
Math issue How to add static matrix? #1805
Copyright holder: Steve Bronder
The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to the license the submitted work under the following licenses:
- Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
- Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
the basic tests are passing
./runTests.py test/unit
)make test-headers
)make test-math-dependencies
)make doxygen
)make cpplint
)the code is written in idiomatic C++ and changes are documented in the doxygen
the new changes are tested