Synray commented Aug 15, 2023

Instead of averaging every parameter's gradient at the end, average the output gradient once at the start, which reduces the number of divisions. This is equivalent because the backward pass is linear in the incoming gradient, so the `1/n` factor propagates backward to every parameter's gradient.
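Here is a minimal sketch of the equivalence. It assumes a hypothetical single linear layer with a mean-squared-error loss, which is not the code this PR touches. `forward_backward` and its `scale_output_grad` flag are illustrative names. With the flag off, the `1/n` factor is applied to each of the `d + 1` parameter gradients after the backward pass. With the flag on, it is applied once to the `n` entries of the output gradient, and the parameter gradients come out already averaged.

```python
# Hypothetical linear model pred = w . x + b with mean squared error,
# used only to illustrate averaging the output gradient vs. the parameter gradients.

def forward_backward(xs, ys, w, b, scale_output_grad):
    """Return (dw, db) for the mean of (pred_i - y_i)^2 over the batch."""
    n = len(xs)
    # Forward pass: one scalar prediction per example.
    preds = [sum(wi * xi for wi, xi in zip(w, x)) + b for x in xs]
    # Gradient of each per-example loss (pred_i - y_i)^2 w.r.t. pred_i.
    dout = [2.0 * (p - y) for p, y in zip(preds, ys)]
    if scale_output_grad:
        dout = [g / n for g in dout]      # n divisions, one per output entry
    # Backward pass: accumulate parameter gradients over the batch.
    dw = [sum(g * x[j] for g, x in zip(dout, xs)) for j in range(len(w))]
    db = sum(dout)
    if not scale_output_grad:
        dw = [g / n for g in dw]          # one division per weight...
        db = db / n                       # ...plus one for the bias
    return dw, db


xs = [[1.0, 2.0], [3.0, -1.0], [0.5, 0.5]]
ys = [1.0, 0.0, 2.0]
end = forward_backward(xs, ys, [0.1, -0.2], 0.3, scale_output_grad=False)
start = forward_backward(xs, ys, [0.1, -0.2], 0.3, scale_output_grad=True)
print(end)
print(start)
```

Both calls should return the same gradients up to floating-point rounding. The number of divisions only drops when the output gradient has fewer entries than the model has parameters.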
