
use double loop for operator-/+ #2223



Merged: 25 commits, merged Dec 12, 2020


SteveBronder
Collaborator

@SteveBronder SteveBronder commented Dec 1, 2020

Summary

Fixes the linear-indexing bug found in #2213 by looping over rows and columns instead of using a single linear index.
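Here is a minimal sketch of why one linear loop goes wrong for add(x, x.transpose()), written in plain C++ with no Eigen or Stan. ColMajor, TransposeView, adjoint_linear and adjoint_double_loop are hypothetical stand-ins. The sketch assumes that linear index k on the transpose touches storage slot k of x, which mirrors how linear access on an Eigen transpose expression forwards to the underlying storage. In the reverse pass of res = a + b, each operand should receive res.adj() elementwise. So for b = x.transpose(), the correct adjoint of x is res.adj() + res.adj().transpose().

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical column-major matrix: element (i, j) lives at data[i + j * rows].
struct ColMajor {
  std::size_t rows, cols;
  std::vector<double> data;
  ColMajor(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
  double& operator()(std::size_t i, std::size_t j) { return data[i + j * rows]; }
  double operator()(std::size_t i, std::size_t j) const {
    return data[i + j * rows];
  }
};

// Hypothetical transpose view: element (i, j) of the view is element (j, i)
// of src, but linear index k forwards straight to src.data[k] (assumed to
// mirror linear access on an Eigen transpose expression).
struct TransposeView {
  ColMajor& src;
  double& operator()(std::size_t i, std::size_t j) { return src(j, i); }
  double& linear(std::size_t k) { return src.data[k]; }
};

// Buggy reverse pass for res = x + x^T: one linear loop shared by both operands.
ColMajor adjoint_linear(const ColMajor& res_adj) {
  assert(res_adj.rows == res_adj.cols);  // x + x^T needs a square x
  ColMajor x_adj(res_adj.rows, res_adj.cols);
  TransposeView xt_adj{x_adj};
  for (std::size_t k = 0; k < res_adj.data.size(); ++k) {
    x_adj.data[k] += res_adj.data[k];     // operand x
    xt_adj.linear(k) += res_adj.data[k];  // operand x^T: same slot as above!
  }
  return x_adj;
}

// Fixed reverse pass: (i, j) names the same logical element in every operand,
// whatever its storage order. Columns outer, rows inner matches column-major data.
ColMajor adjoint_double_loop(const ColMajor& res_adj) {
  assert(res_adj.rows == res_adj.cols);
  ColMajor x_adj(res_adj.rows, res_adj.cols);
  TransposeView xt_adj{x_adj};
  for (std::size_t j = 0; j < res_adj.cols; ++j) {
    for (std::size_t i = 0; i < res_adj.rows; ++i) {
      x_adj(i, j) += res_adj(i, j);   // operand x
      xt_adj(i, j) += res_adj(i, j);  // operand x^T, i.e. x(j, i)
    }
  }
  return x_adj;
}
```

With res.adj() = [[1, 2], [3, 4]], the double loop gives [[2, 5], [5, 8]] (G + G^T). The linear loop gives [[2, 4], [6, 8]] because both operands write to the same storage slot.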

Tests

Adds tests checking that add(x, x.transpose()) and subtract(x, x.transpose()) produce the correct adjoints.

Side Effects

Nope

Release notes

Checklist

@stan-buildbot
Contributor


| Name | Old Result | New Result | Ratio | Performance change (1 - new / old) |
|---|---|---|---|---|
| gp_pois_regr/gp_pois_regr.stan | 3.46 | 3.54 | 0.98 | -2.19% slower |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 0.98 | -2.05% slower |
| eight_schools/eight_schools.stan | 0.12 | 0.11 | 1.02 | 2.1% faster |
| gp_regr/gp_regr.stan | 0.16 | 0.17 | 0.98 | -1.78% slower |
| irt_2pl/irt_2pl.stan | 5.81 | 5.83 | 1.0 | -0.42% slower |
| performance.compilation | 88.28 | 85.89 | 1.03 | 2.71% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 8.43 | 8.43 | 1.0 | -0.07% slower |
| pkpd/one_comp_mm_elim_abs.stan | 29.06 | 29.44 | 0.99 | -1.31% slower |
| sir/sir.stan | 138.07 | 131.86 | 1.05 | 4.49% faster |
| gp_regr/gen_gp_data.stan | 0.05 | 0.04 | 1.03 | 2.61% faster |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 2.92 | 2.93 | 1.0 | -0.06% slower |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.39 | 0.39 | 1.01 | 0.95% faster |
| arK/arK.stan | 1.79 | 1.8 | 0.99 | -0.64% slower |
| arma/arma.stan | 0.73 | 0.74 | 0.99 | -1.13% slower |
| garch/garch.stan | 0.61 | 0.61 | 1.0 | 0.17% faster |

Mean result: 1.00263634182

Jenkins Console Log
Blue Ocean
Commit hash: be3c985


Machine information ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

@bbbales2 bbbales2 self-assigned this Dec 3, 2020
Member

@bbbales2 bbbales2 left a comment


Is there a chance this could be a problem:

inline auto add(const Var& a, const VarMat& b) {

Will this work if the input is row major?

Will this work if the input is a block of another matrix?

@t4c1
Contributor

t4c1 commented Dec 4, 2020

@bbbales2 Blocks do not support linear indexing, so you cannot traverse them using a single loop. So the linked file will not work with blocks.
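To make the block point concrete, here is a plain C++ sketch. BlockView and both functions are hypothetical stand-ins, not Eigen::Block. Element (i, j) of a block inside a column-major parent lives at start + i + j * outer_stride, where outer_stride is the parent's row count, so the block is not contiguous in memory. A single loop over start + k for k < rows * cols therefore runs off the end of each block column into elements that belong only to the parent.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical rows x cols block inside a column-major parent buffer.
struct BlockView {
  std::vector<double>& parent;  // parent storage, column major
  std::size_t start;            // storage index of the block's (0, 0) element
  std::size_t rows, cols;       // block dimensions
  std::size_t outer_stride;     // parent's row count: gap between block columns
  double& operator()(std::size_t i, std::size_t j) {
    return parent[start + i + j * outer_stride];
  }
};

// Wrong: treats the block as rows * cols contiguous elements.
void add_scalar_linear(BlockView b, double s) {
  for (std::size_t k = 0; k < b.rows * b.cols; ++k) {
    b.parent[b.start + k] += s;
  }
}

// Right: (i, j) indexing honors the outer stride.
void add_scalar_double_loop(BlockView b, double s) {
  assert(b.rows <= b.outer_stride);  // a block column must fit in a parent column
  for (std::size_t j = 0; j < b.cols; ++j) {
    for (std::size_t i = 0; i < b.rows; ++i) {
      b(i, j) += s;
    }
  }
}
```

Take a 2 x 2 block at (0, 0) of a 3 x 3 parent. The linear loop updates parent element (2, 0), which is outside the block, and never reaches block element (1, 1).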

@stan-buildbot
Contributor


| Name | Old Result | New Result | Ratio | Performance change (1 - new / old) |
|---|---|---|---|---|
| gp_pois_regr/gp_pois_regr.stan | 3.53 | 3.49 | 1.01 | 1.09% faster |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 1.01 | 1.17% faster |
| eight_schools/eight_schools.stan | 0.12 | 0.11 | 1.04 | 4.2% faster |
| gp_regr/gp_regr.stan | 0.17 | 0.16 | 1.0 | 0.21% faster |
| irt_2pl/irt_2pl.stan | 5.81 | 5.78 | 1.01 | 0.6% faster |
| performance.compilation | 86.7 | 85.68 | 1.01 | 1.18% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 8.44 | 8.44 | 1.0 | -0.04% slower |
| pkpd/one_comp_mm_elim_abs.stan | 28.62 | 30.9 | 0.93 | -7.97% slower |
| sir/sir.stan | 129.15 | 134.61 | 0.96 | -4.23% slower |
| gp_regr/gen_gp_data.stan | 0.05 | 0.04 | 1.06 | 5.38% faster |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 2.92 | 2.95 | 0.99 | -1.1% slower |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.38 | 0.38 | 1.01 | 0.72% faster |
| arK/arK.stan | 1.77 | 1.79 | 0.99 | -0.72% slower |
| arma/arma.stan | 0.74 | 0.74 | 1.0 | -0.3% slower |
| garch/garch.stan | 0.61 | 0.61 | 1.0 | 0.49% faster |

Mean result: 1.0013317653

Jenkins Console Log
Blue Ocean
Commit hash: cb9af42



@bbbales2
Member

bbbales2 commented Dec 8, 2020

@SteveBronder I want to get this specific fix in so #2213 can move again, so I went to investigate the scalar + matrix comment here: #2223 (review)

I added two tests to add, subtract, elt_multiply, and elt_divide, and I updated a spot in the tests where we were doing linear indexing and shouldn't have been.

This is an easy way we can test that functions work with transposes:

TEST(MathMixMatFun, add_transpose_test) {
  auto f = [](const auto& x) {
    return x + x.transpose();
  };

  Eigen::MatrixXd x(2, 2);
  x << 1.0, 2.0, 3.0, 4.0;  // initialize; otherwise the entries are indeterminate

  stan::test::expect_ad_matvar(f, x);
}

Maybe there should also be a test of just x.transpose(). Your tests were already good enough for operator+ and operator-, but I added these tests for elt_multiply and elt_divide too.
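The transpose tests are effective because every element of x feeds the result at two positions, (k, l) and (l, k). The following is our own worked derivation, not taken from the test suite, of what these tests check. Here G is the adjoint of the result R, \bar{X} is the adjoint of the input X, and \circ and \oslash denote elementwise multiply and divide:

```latex
\begin{aligned}
R = X + X^\top &\;\Rightarrow\; \bar{X} = G + G^\top \\
R = X - X^\top &\;\Rightarrow\; \bar{X} = G - G^\top \\
R = X \circ X^\top &\;\Rightarrow\; \bar{X}_{kl} = X_{lk}\,\bigl(G_{kl} + G_{lk}\bigr) \\
R = X \oslash X^\top &\;\Rightarrow\; \bar{X}_{kl} = \frac{G_{kl}}{X_{lk}} - \frac{G_{lk}\,X_{lk}}{X_{kl}^{2}}
\end{aligned}
```

A loop that pairs element (i, j) of one operand with the wrong element of another misplaces the G_{lk} terms. These tests catch that whenever X and G are not symmetric.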

The other test I added was:

TEST(MathMixMatFun, add_transpose_test_scalar) {
  auto f = [](const auto& y, const auto& x) {
    return stan::math::add(y, x.block(0, 0, 2, 2));
  };

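  Eigen::MatrixXd x(3, 3);
  x << 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0;  // initialize; otherwise the entries are indeterminate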

  stan::test::expect_ad_matvar(f, 1.0, x);
}

And I get compile errors. It seems like arena_t and related traits don't quite work with block types yet. This test should expose the bug discussed in #2223 (review).

Anyway, I pushed that stuff up here. I would like to get the original fix you had in place for #2213 (just getting operator+ to work), so that that pull can move along too.

So feel free to remove any bits I added that you don't want in this pull; we can do those later (for instance, we don't have to fix block types for varmat here if they are messed up). The elt_divide and elt_multiply x + x.transpose() tests should stay, though, and so should the test_ad changes.

@bbbales2
Member

bbbales2 commented Dec 8, 2020

"...have to fix block types for varmat here if they are messed up"

And if block types are messed up and there is no easy fix, then the test for #2223 (review) won't work either, which means that issue can't be verified or fixed. So if we don't fix it here, make a note in #2245.

@SteveBronder
Collaborator Author

Thanks for writing these! I think it's a good idea to fix them here, so I'll update today.

@SteveBronder
Collaborator Author

I took a hard look at this, and there were a couple of underlying issues:

  • I missed a loop (yeesh!)

  • One nice thing about var<matrix> is that when we apply arena_t<> to something like var_value<Eigen::Block<>>, we just get back var_value<Eigen::Block<>>, so no copies are made. But this is bad for return types: we don't want our functions returning something odd like a var_value<Eigen::Block<>> (I think this would fail anyway). So I added a return_var_matrix_t<> that returns plain types, and I changed promote_var_matrix_t<> so that it passes through var_values with inner expression types. return_var_matrix_t<> always returns a var_value with a plain inner type, while promote_var_matrix_t<var_value<Eigen::Block<>>> just returns var_value<Eigen::Block<>>.

I also made a specialization of plain_type_t<> so that when it sees a var_value<Eigen::Block<>>, it runs plain_type_t over the inner type to get back a var_value<Eigen::Matrix>. I went through a bunch of the functions and standardized and fixed every place where this could happen.
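The difference between the two traits can be sketched with toy types in plain C++. Everything in namespace sketch (Matrix, Block, var_value, plain_type_t, promote_var_matrix_t, return_var_matrix_t) is a hypothetical simplification that only mirrors the behavior described above for a single var_value argument. It is not Stan Math's actual definition; the real promote_var_matrix_t takes several type arguments, as in promote_var_matrix_t<T2, T1, T2> below.

```cpp
#include <type_traits>

namespace sketch {

struct Matrix {};                                   // stand-in for a plain Eigen::Matrix
template <typename T> struct Block {};              // stand-in for Eigen::Block<T>
template <typename T> struct var_value { T val; };  // stand-in for var_value<T>

// plain_type_t: maps an expression to the plain type it evaluates to.
template <typename T> struct plain_type { using type = T; };
template <typename T> struct plain_type<Block<T>> { using type = T; };
// The specialization described above: look through var_value at its inner type.
template <typename T> struct plain_type<var_value<T>> {
  using type = var_value<typename plain_type<T>::type>;
};
template <typename T> using plain_type_t = typename plain_type<T>::type;

// promote: passes a var_value with an inner expression type through unchanged,
// so arena storage can keep the block without copying.
template <typename T> struct promote_var_matrix;
template <typename T> struct promote_var_matrix<var_value<T>> {
  using type = var_value<T>;
};
template <typename T>
using promote_var_matrix_t = typename promote_var_matrix<T>::type;

// return: always hands back a var_value over a plain inner type.
template <typename T> struct return_var_matrix;
template <typename T> struct return_var_matrix<var_value<T>> {
  using type = var_value<plain_type_t<T>>;
};
template <typename T>
using return_var_matrix_t = typename return_var_matrix<T>::type;

}  // namespace sketch

using VarBlock = sketch::var_value<sketch::Block<sketch::Matrix>>;
static_assert(std::is_same<sketch::promote_var_matrix_t<VarBlock>, VarBlock>::value,
              "promote passes var_value<Block> through");
static_assert(std::is_same<sketch::return_var_matrix_t<VarBlock>,
                           sketch::var_value<sketch::Matrix>>::value,
              "return yields var_value over a plain type");
```

Keeping the block type on the arena side avoids a copy. The plain return type keeps expression types out of what a function hands back to its caller.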

@bbbales2
Member

bbbales2 commented Dec 8, 2020

@SteveBronder sounds good, I'll get a review for this tomorrow

Member

@bbbales2 bbbales2 left a comment


Review!

Edit: I fixed a few minor things directly and left comments on everything I had questions about.

using return_t = promote_var_matrix_t<T2, T1, T2>;
arena_t<return_t> res = value_of(A) * value_of(arena_B).array();
reverse_pass_callback([A, arena_B, res]() mutable {
arena_B.adj().array() += value_of(A) * res.adj().array();
Member


That's a bug with A right? I wonder how this was passing. This should have segfaulted or given us a sanitizer error (unless A's destructor was called some other way or something).

Member


Well, it's not a bug if A is a scalar, but I'm not sure whether something else is going on, because if it's a scalar, then it may as well have stayed the way it was.

Collaborator Author


It wasn't a bug; it just looked very much like one. I started fixing it because at first I thought it was a bug, but then I realized A is just a scalar type, so it's fine. Still, if it looks like a bug, it's probably bad code at the very least, so I kept the fix in here.
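Whether a captured A is harmless can be checked at compile time. This plain C++ sketch assumes, as we understand Stan Math's reverse mode, that callbacks handed to reverse_pass_callback are stored in arena memory whose destructors never run. Under that assumption, a closure that owns heap memory would leak it. safe_to_store_in_arena is a hypothetical helper that only asks whether the closure type is trivially destructible, and std::vector stands in for an Eigen matrix.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Hypothetical check: if a callback lives in memory whose destructors never
// run, the closure must not own anything that needs destroying.
template <typename F>
constexpr bool safe_to_store_in_arena(const F&) {
  return std::is_trivially_destructible<F>::value;
}

// Scalar A: the closure holds a plain double copy, so there is nothing to free.
inline bool scalar_capture_is_safe() {
  double A = 2.0;
  auto callback = [A]() { return A * 2.0; };
  return safe_to_store_in_arena(callback);
}

// Matrix-like A: capturing by value copies a heap buffer into the closure,
// which would leak if the closure's destructor never runs.
inline bool matrix_capture_is_safe() {
  std::vector<double> A(4, 2.0);
  auto callback = [A]() { return A[0] * 2.0; };
  return safe_to_store_in_arena(callback);
}
```

This is also why non-scalar operands are usually copied into an arena_t<> (as with arena_B above) before they are captured.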

@@ -135,31 +135,37 @@ template <typename T1, typename T2, require_not_matrix_t<T1>* = nullptr,
require_not_row_and_col_vector_t<T1, T2>* = nullptr>
inline auto multiply(const T1& A, const T2& B) {
Member


The docs on this function say the first argument should be a scalar, but the code makes it look like it's been upgraded to handle vector/row vector arguments. Why the change?

Collaborator Author


See above

@@ -66,7 +66,7 @@ inline plain_type_t<T> matrix_power(const T& M, const int n) {
arena_M.adj() += adj_M + adj_C;
});

- return res;
+ return ret_type(res);
Member


Good catch!

reverse_pass_callback([ret, b]() mutable { b.adj() -= ret.adj().sum(); });
- return ret_type(ret);
+ return plain_type_t<ret_type>(ret);
Member


plain_type_t is not necessary here or on line 235.

@bbbales2
Member

bbbales2 commented Dec 9, 2020

@SteveBronder looks like the tests failed with some sort of RowMajor/Column Major problem:

stan::math::test::type_name() [Arg = stan::math::var_value<Eigen::Matrix<double, -1, 1, 0, -1, 1>, void> returns a  stan::math::test::type_name() [Arg = stan::math::var_value<Eigen::Matrix<double, 1, 1, 1, 1, 1>, void> but should return  stan::math::test::type_name() [Arg = stan::math::var_value<Eigen::Matrix<double, 1, 1, 0, 1, 1>, void>

@bbbales2
Member

@rok-cesnovar @serban-nicusor-toptal the build here is saying gelman-group-linux went offline. Is that something that can be worked around or do we need to reboot a computer somewhere?

@rok-cesnovar
Member

Needs a physical reboot I am afraid.

We are working on enabling OpenCL tests on the Windows Columbia machine so we have two machines that can run this test.

@bbbales2
Member

Oooo, and this one lives on campus? Is this a Goodrich thing or a G-man thing?

@rok-cesnovar
Member

I believe Nic usually writes to Ben G.; other than that, I have no idea where it resides.

@bbbales2
Member

Okeedokee I will write the Newyorkerz

@serban-nicusor-toptal
Contributor

serban-nicusor-toptal commented Dec 10, 2020

Hey, the Linux machine is back online! Sorry for the trouble. (It went through some maintenance and its IP changed.)

@SteveBronder
Collaborator Author

@serban-nicusor-toptal what version of g++ are we using on Jenkins?

I'm seeing

g++ -Werror  -std=c++1y -pthread -D_REENTRANT -Wno-sign-compare -Wno-ignored-attributes      -I lib/stan_math/lib/tbb_2019_U8/include   -O0 -I src -I . -I lib/stan_math/ -I lib/stan_math/lib/eigen_3.3.9 -I lib/stan_math/lib/boost_1.72.0 -I lib/stan_math/lib/sundials_5.5.0/include -I lib/stan_math/lib/benchmark_1.5.1/googletest/googletest/include -I lib/stan_math/lib/benchmark_1.5.1/googletest/googletest -I lib/stan_math/lib/benchmark_1.5.1/googletest/googletest/include -I lib/stan_math/lib/benchmark_1.5.1/googletest/googletest      -DBOOST_DISABLE_ASSERTS       -DGTEST_HAS_PTHREAD=0 -DGTEST_HAS_PTHREAD=0  -c -o /dev/null -include test/test-models/good/function-signatures/distributions/univariate/continuous/wiener/wiener_log_27.hpp test/test-model-main.cpp
g++: internal compiler error: Terminated (program cc1plus)
Please submit a full bug report,

@serban-nicusor-toptal
Contributor

serban-nicusor-toptal commented Dec 11, 2020

Linux integration tests run on the linux tag, so they run on both aws and gelman-linux:
gelman-linux: g++ (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
aws: g++ (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609

Edit: I've updated aws to g++ (Ubuntu 9.3.0-10ubuntu2~16.04) 9.3.0 and restarted the job here

@stan-buildbot
Contributor


| Name | Old Result | New Result | Ratio | Performance change (1 - new / old) |
|---|---|---|---|---|
| gp_pois_regr/gp_pois_regr.stan | 3.44 | 3.47 | 0.99 | -0.95% slower |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 1.06 | 5.75% faster |
| eight_schools/eight_schools.stan | 0.12 | 0.11 | 1.05 | 5.19% faster |
| gp_regr/gp_regr.stan | 0.15 | 0.15 | 1.02 | 1.59% faster |
| irt_2pl/irt_2pl.stan | 5.83 | 5.46 | 1.07 | 6.32% faster |
| performance.compilation | 88.17 | 85.75 | 1.03 | 2.74% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 8.42 | 8.39 | 1.0 | 0.38% faster |
| pkpd/one_comp_mm_elim_abs.stan | 29.62 | 29.7 | 1.0 | -0.27% slower |
| sir/sir.stan | 141.36 | 141.7 | 1.0 | -0.24% slower |
| gp_regr/gen_gp_data.stan | 0.04 | 0.04 | 0.99 | -1.26% slower |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 2.94 | 2.92 | 1.01 | 0.82% faster |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.4 | 0.38 | 1.06 | 5.29% faster |
| arK/arK.stan | 2.46 | 2.49 | 0.99 | -1.28% slower |
| arma/arma.stan | 0.6 | 0.59 | 1.01 | 0.5% faster |
| garch/garch.stan | 0.6 | 0.61 | 0.99 | -0.61% slower |

Mean result: 1.01699344156

Jenkins Console Log
Blue Ocean
Commit hash: 02eaf83



- for (Eigen::Index i = 0; i < A_vm_f.size(); ++i) {
- for (Eigen::Index j = 0; j < A_mv_f.size(); ++j) {
+ for (Eigen::Index j = 0; j < A_mv_f.cols(); ++j) {
+ for (Eigen::Index i = 0; i < A_vm_f.rows(); ++i) {
Member


Lol, two sets of indices and I screwed up two things. oof.

Member

@bbbales2 bbbales2 left a comment


Good!

@bbbales2 bbbales2 merged commit 261dec0 into develop Dec 12, 2020

7 participants