Skip to content

Simplifying existing autodiff to prepare for adding var_value<Matrix<T, R, C>> support to rev/fun #2019

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 491 commits into from

Conversation

bbbales2
Copy link
Member

@bbbales2 bbbales2 commented Aug 18, 2020

Summary

This is an attempt to reimplement reverse mode autodiff using reverse_pass_callback in preparation for adding var_value<Matrix<T, R, C>> to rev/fun.

Side Effects

It's possible this is slower than the old stuff. Dunno till it's benchmarked.

Release notes

Re-implemented reverse mode autodiff using reverse_pass_callback and new arena tools

Checklist

  • Math issue Simplifying current reverse mode functions #2018

  • Copyright holder: Columbia University

    The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to the license the submitted work under the following licenses:
    - Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
    - Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

  • the basic tests are passing

    • unit tests pass (to run, use: ./runTests.py test/unit)
    • header checks pass, (make test-headers)
    • dependencies checks pass, (make test-math-dependencies)
    • docs build, (make doxygen)
    • code passes the built in C++ standards checks (make cpplint)
  • the code is written in idiomatic C++ and changes are documented in the doxygen

  • the new changes are tested

t4c1 and others added 30 commits August 5, 2020 14:55
…var_value's grad so that it is only available when the value_type of the var_value is not a container
…/math into vari_base_ops_partials

# Conflicts:
#	stan/math/rev/functor/ad_stack_matrix.hpp
return {new internal::determinant_vari<T::RowsAtCompileTime,
T::ColsAtCompileTime>(m)};
auto arena_m = to_arena(m);
auto m_lu = arena_m.val().partialPivLu();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the same algorithm that was being used previously. The implementation just looks different, see #2120

@bbbales2
Copy link
Member Author

bbbales2 commented Oct 5, 2020

@SteveBronder this is ready for review. I'm gonna work on the red light, but I tried to go through everything and update the code style to be like what we're converging on.

I also benchmarked and optimized everything. It should be good to go. I'm going to update the benchmarks overnight. You can browse them as of last night here (they're pretty okay): https://bbbales2.shinyapps.io/math-shiny-perf/

I wrote these things to be as fast as possible for mat<var>. We might have to branch some of them for var<mar>, but probably a lot should just be a matter of changing the return type calculations and adding tests. I did a lot of testing breaking stuff out into different variations for different template argument types. If it wasn't faster, I went back to the single if-statement version.

@SteveBronder
Copy link
Collaborator

Running through these now. Because it's a larger PR it's a bit easier for me to make another branch with suggested changes I'm working on / benching over here

feature/convert_to_tadj1...feature/convert-to-tadj1-changes

The tl;dr so far is that there's a good few functions here we know for a fact are much better and should port them into a separate PR so they can go in this release. But there's a handful of functions that I'm wishy washy on (gp_exp_quad_cov and the like among them) and need more work.

Why don't we cherry pick the ones that are much faster and are probably used a lot like

  • unit_vector_constrain
  • tcrossprod
  • simplex_constrain
  • log_softmax
  • mdivide_left_tri
  • ordered_constrain
  • divide

@bbbales2
Copy link
Member Author

bbbales2 commented Oct 6, 2020

Yo you around for a call? I don't like breaking this up.

The point was to convert stuff over. If there is something that is too slow, let's fix it now while we're going through it.

I see mat<var> stuff showing up here too which was not the intention of this pull either, though I'm not opposed to working on a mega-pull with that stuff too.

@bbbales2
Copy link
Member Author

bbbales2 commented Oct 8, 2020

I checked out the benchmarks for this. The comparison we want I think is this branch vs. develop.

So on the benchmark page (https://bbbales2.shinyapps.io/math-shiny-perf/)

Set the left dropdown to develop_50d56cc7ce_matvar and the right dropdown to convert_to_tadj1_654aa5119a_matvar

With that I think the minimum list of functions we want to move are:

matrix_power
tcrossprod (have to look at graphs for this -- the table is messed up)
log_softmax
multiply_lower_self_tri_etc_etc
rows_dot_product
log_determinant
mdivide_left
mdivide_left_tri
columns_dot_product

I would also like determinant, trace_quad_form and quad_form, but at this point the benchmarks get fishy.

It says this branch vs. development that the minimum speedup for dot product is 0.9 and the maximum speedup is 1.5. Similarly for matrix multiply. I don't believe it since neither of those things change, so something is going on. It might be that we need to randomize the order of the test repetition or running two at a time is bad.

I only want to do mat<var>. We can pick up var<mat> after Monday.

@bbbales2 bbbales2 marked this pull request as draft October 15, 2020 15:51
@bbbales2
Copy link
Member Author

@SteveBronder could you post the gp_exp_quad_cov benchmark you were using?

@SteveBronder
Copy link
Collaborator

I was just using the one you wrote

@bbbales2
Copy link
Member Author

There's a handful of functions like gp_exp_quad_cov for the general case that are much much slower than what's in develop right now (I ran the gp benchmark example with that branch and it's 13,000% slower!)

Was that a 13% (makes sense) or 13000% (I would not expect this and this would bear further investigation)?

@SteveBronder
Copy link
Collaborator

Ah that came from the cmdstan performance tests repo

https://github.com/stan-dev/performance-tests-cmdstan

You can also see it here when the performance tests ran on this PR

#2019 (comment)

@bbbales2
Copy link
Member Author

OOOo, nice, I was not paying attention to those numbers! Thanks

@bbbales2
Copy link
Member Author

I think we've done the majority of the conversions here, so I'm gonna close this. The issue itself should stay open cause there's still a lot of simplification/cleanup we can do.

@bbbales2 bbbales2 closed this Jan 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants