-
-
Notifications
You must be signed in to change notification settings - Fork 193
Changed calculation of log_sum_exp(x1, x2) #1772
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Jenkins Console Log Machine informationProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010CPU: G++: Clang: |
Jenkins Console Log Machine informationProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010CPU: G++: Clang: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here yah go!
@@ -22,8 +22,8 @@ class log_sum_exp_vv_vari : public op_vv_vari { | |||
log_sum_exp_vv_vari(vari* avi, vari* bvi) | |||
: op_vv_vari(log_sum_exp(avi->val_, bvi->val_), avi, bvi) {} | |||
void chain() { | |||
avi_->adj_ += adj_ * calculate_chain(avi_->val_, val_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to get rid of calculate_chain everywhere? It seems like a strange function.
Also used here (which should be fixed as part of this pull):
math/stan/math/rev/fun/log_sum_exp.hpp
Line 37 in b513358
avi_->adj_ += adj_ * calculate_chain(avi_->val_, val_); |
In stan/math/rev/fun/log_diff_exp.hpp and in stan/math/rev/fun/log1p_exp.hpp.
EXPECT_FLOAT_EQ(a.adj(), 1.0); | ||
|
||
var a1 = 1e50; | ||
var a2 = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once you remove the other calculate_chain thing we'll need some tests where a1 is a var and a2 is a double.
The first rule of naming is that names should be short conventional placeholders where possible and mean something everywhere else. The second rule of naming is that names should be as short as possible, because big ones obfuscate the structure of your code and we want to do everything in our power to make the actual code readable. The third rule is to use your judgement in balancing the first and second rules. With The What is being computed? A partial derivative. So we can call the function something like Finally, always let your functions pull things apart if it doesn't involve extra overhead. So you can pass the P.S. This was just to give you an idea of how to think about naming. We're not going to micromanage every name in every program! |
Thanks @bob-carpenter and @bbbales2 Is it better to change the function
|
Keep the function (it removes code duplication) and rename it (so it self documents). |
I think it makes sense to get rid of this one. |
Ben's the official reviewer, so his opinion trumps mine in this one! It's really not as big a deal as we're making it here. It's just good to set out conventions when you start coding so you don't have to think later. |
Mm, I'm not convinced we should keep it. Really not much code duplication.
It also lives in rev, which I think is the wrong place and the comment on the implementation doesn't seem right:
|
And honestly is that just wrong? It just seems wrong to me. What is happening here.
|
In
So it works out that
I need to check how it's being used in the other functions to make sure that this switch would work. |
Aaaah, thanks for that clarification. |
Quick suggestion (not a blocking review comment), some of these could use For
For
|
Jenkins Console Log Machine informationProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010CPU: G++: Clang: |
Thanks for handling this! Few thoughts: I actually played with this issue in #1677 (as it was a conveninent example to play with). As the tools for testing identitites are not going to be ready soon (we decided with @bob-carpenter to take the longer route there), I am however not sure you can get much from it (although I do use a bit more identities for testing). I second @andrjohns on the Generally, I would suggest to make each test loop over a range of values instead of using just one value. |
…stable/2017-11-14)
Jenkins Console Log Machine informationProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010CPU: G++: Clang: |
Jenkins Console Log Machine informationProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010CPU: G++: Clang: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One question and then this is good! Thanks!
@@ -17,7 +16,7 @@ class log_diff_exp_vv_vari : public op_vv_vari { | |||
log_diff_exp_vv_vari(vari* avi, vari* bvi) | |||
: op_vv_vari(log_diff_exp(avi->val_, bvi->val_), avi, bvi) {} | |||
void chain() { | |||
avi_->adj_ += adj_ * calculate_chain(avi_->val_, val_); | |||
avi_->adj_ -= adj_ / expm1(bvi_->val_ - avi_->val_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no way to use inv_logit here? If so, that's fine, just checking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see how you could because one has the extra 1 term inside and the other outside.
1 / expm1(x) = 1 / exp(x - 1)
inv_logit(u) = 1 / (1 + exp(-u))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh good point. I was too lazy to actually think through the math at all.
Summary
The partial derivatives of
log_sum_exp(x, y)
were losing accuracy in the regime wherex
andy
were equal and greater than around1e14
. In the originallog_sum_exp(x1, x2)
function, the derivative oflog_sum_exp
with respect tox1
was evaluated by this formula:exp(x1 - (y + log(exp(x1 - y) + exp(x2 - y))))
where
y
is the maximum ofx1
andx2
. This will result in a loss of accuracy wheny
is close tox1
andlog(exp(x1-y) + exp(x2 - y))
is small. For example, whenx1 = 1e20
andx2 = 1e20
x1 - (y + log(exp(x1 - y) + exp(x2 - y)))
evaluates to
0
.I changed the calculation of the partial derivative of
log_sum_exp
with respect tox1
to be1 / (1 + exp(x2 - x1))
and the partial derivative of
log_sum_exp
with respect tox2
to be1 / (1 + exp(x1 - x2))
.Tests
I wrote a couple of new tests to check that the partials are accurate with large and equal argument values as in @martinmodrak 's issue raised in #1679.
Side Effects
None
Checklist
Math issue Derivatives of log_sum_exp are wrong for large arguments #1679
Copyright holder: Columbia University
The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to the license the submitted work under the following licenses:
- Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
- Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
the basic tests are passing
the code is written in idiomatic C++ and changes are documented in the doxygen
the new changes are tested