Gradient Sign Bug #1

Open
AlexHarn opened this issue Apr 19, 2018 · 6 comments
Labels: bug (Something isn't working)

@AlexHarn (Owner)

Under some circumstances, the sign of the gradient seems to be flipped, so it points in the wrong direction.

AlexHarn added the bug label on Apr 19, 2018
AlexHarn self-assigned this on Apr 19, 2018
AlexHarn added a commit that referenced this issue on Apr 19, 2018: "…in the wrong direction. Flipping the sign leads to proper convergence."

@AlexHarn (Owner, Author)

15d5ac0 demonstrates the bug.

@AlexHarn (Owner, Author)

3d62745 shows one way to flip the sign back, which leads to convergence. This is not a fix; it simply demonstrates the bug.

@AlexHarn (Owner, Author) commented Apr 19, 2018

[Figure "grid": scan of the loss function over the absorption and scattering parameters]
This is a visual representation of the loss function with the "true" parameters in the middle (100 m for absorption and 25 m for scattering). As expected, there is a strong negative correlation between absorption and scattering when using only total hit counts without time information.

However, fitting only one of the two parameters while keeping the other constant should work fine. It does work as expected for absorption, but for scattering the gradient points in the wrong direction, which is completely unexpected. Flipping the sign as demonstrated in commit 3d62745 leads to proper convergence, which makes no sense.
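
For illustration, this is a minimal sketch of what "flipping the sign" amounts to, assuming a TF 1.x-style optimizer. The variable name and the placeholder loss are made up; this is not the code from commit 3d62745:

```python
import tensorflow as tf

# Hypothetical stand-ins for the real simulation (not the code from 3d62745):
scattering_length = tf.Variable(20.0, name="scattering_length")
loss = tf.square(scattering_length - 25.0)  # placeholder loss for illustration

optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)
grads_and_vars = optimizer.compute_gradients(loss, var_list=[scattering_length])

# The workaround: negate the gradient before applying the update, so the
# descent direction is reversed for this parameter.
flipped = [(-g, v) for g, v in grads_and_vars]
train_step = optimizer.apply_gradients(flipped)
```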

The bug also arises under other circumstances, but this specific example will be used as the basis for a minimal working example to nail the bug down.

@AlexHarn (Owner, Author) commented Apr 20, 2018

Attempts to replicate the bug in a simplified one-dimensional simulation (minimal_1d.py) have been unsuccessful so far.

@AlexHarn (Owner, Author) commented Apr 20, 2018

Changing the scattering function to something else, for example reversing the direction vectors into exactly the opposite direction with some given probability as demonstrated in 3fa66c6, makes the bug disappear.
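
Roughly, that simplified scattering could look like the following sketch (hypothetical names and signature; the actual implementation in 3fa66c6 may differ): each photon's direction vector is reversed with probability flip_prob and left unchanged otherwise.

```python
import tensorflow as tf

def simplified_scatter(directions, flip_prob):
    """Hypothetical sketch, not the code from 3fa66c6.

    directions: (N, 3) tensor of unit direction vectors.
    flip_prob:  scalar probability of reversing a photon's direction.
    """
    # Draw one uniform number per photon and reverse the rows where it falls
    # below flip_prob; all other directions stay unchanged.
    flip = tf.random_uniform([tf.shape(directions)[0]]) < flip_prob
    return tf.where(flip, -directions, directions)
```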

So one might assume the bug can be pinned down to the scattering. But changing the loss function to something else while keeping the original scattering, for example a loss based on the moments of the final positions (which would not be possible on real data) as I did in c535128, also "resolves" the bug.
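
In the same spirit, a moment-based loss compares low-order moments of the simulated final positions to the same moments computed from the "true" simulation. A hedged sketch (hypothetical names, not the code from c535128):

```python
import tensorflow as tf

def moment_loss(final_positions, data_moments):
    """Hypothetical sketch of a moment-based loss (not the code from c535128).

    final_positions: (N, 3) simulated photon end points.
    data_moments:    (6,) per-axis mean and variance from the "true" simulation.
    """
    mean = tf.reduce_mean(final_positions, axis=0)
    var = tf.reduce_mean(tf.square(final_positions - mean), axis=0)
    sim_moments = tf.concat([mean, var], axis=0)
    # Squared difference between simulated and "data" moments.
    return tf.reduce_sum(tf.square(sim_moments - data_moments))
```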

Maybe there are multiple independent issues here which flip the gradient for some reason?

@AlexHarn (Owner, Author) commented Apr 26, 2018

We found the underlying issue. The bug is not actually a bug, but expected behavior. It is demonstrated in bug_test.py.

The problem is that there is no gradient with respect to the number of loop iterations, even though that dependence is crucial for correct behavior. Convergence can still happen by chance, if the gradient derived from the loop body happens to point in the right direction, which explains the weird behavior. More on this can be read here. On p. 15 it explicitly says "This means that we assume that pred is not trainable.", where pred indirectly refers to the number of loop iterations.
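
A minimal sketch of the effect, in the spirit of bug_test.py but not identical to it, written against the TF 1.x API: the result depends on the trainable variable only through the loop condition (pred), so tf.gradients sees no differentiable path at all.

```python
import tensorflow as tf  # TF 1.x API

x = tf.Variable(3.0)

def cond(i, acc):
    # The number of iterations depends on x only through this condition...
    return tf.less(tf.cast(i, tf.float32), x)

def body(i, acc):
    # ...while the body itself does not use x at all.
    return [i + 1, acc + 1.0]

_, total = tf.while_loop(cond, body, [tf.constant(0), tf.constant(0.0)])

# tf.while_loop treats the condition (pred) as non-trainable, so no gradient
# flows through the iteration count.
grad = tf.gradients(total, x)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(total))  # 3.0 -- the value clearly depends on x
    print(grad)             # [None] -- but the gradient does not see it
```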

So how do we solve or work around this? Tricking the gradient into pointing in the right direction does not seem like a safe option in higher dimensions.
