Refactor lexicon softmax #549
Conversation
Overall, looks good, but I didn't understand the is_modifying_softmax_layer function. Is this necessary? If not, it'd be good to avoid unnecessary complication. If it is, could you document it a little bit?
xnmt/modelparts/scorers.py (Outdated)
```diff
@@ -126,27 +127,29 @@ def __init__(self,

   def calc_scores(self, x: dy.Expression) -> dy.Expression:
     return self.output_projector.transform(x)
+
+  def is_modifying_softmax_layer(self):
```
What does this mean?
So basically it means that we can't use dy.pickneglogsoftmax, because we are modifying the value of the softmax layer before picking the loss. dy.pickneglogsoftmax is a convenient function, but it can't be used when we are doing:
- label smoothing
- linear interpolation on the softmax

So this method is called to check that condition and use the correct DyNet function accordingly. Also, thanks to this inheritance, I don't have to copy-paste the calc_loss function from the Softmax object.
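For reference, here is a tiny standalone snippet (illustrative only, not part of the PR) showing that dy.pickneglogsoftmax is just a fused version of log_softmax followed by pick, which is why the fused op is only safe when nothing modifies the softmax output before the loss is picked:

```python
import dynet as dy

dy.renew_cg()
scores = dy.inputVector([1.0, 2.0, 0.5])  # unnormalized scores for 3 classes
y = 1                                     # gold class index

# Fused op: softmax + negative log-likelihood of index y in one call.
fused = dy.pickneglogsoftmax(scores, y)
# Explicit path: materialize the log-probabilities, then pick.
explicit = -dy.pick(dy.log_softmax(scores), y)

print(fused.value(), explicit.value())    # prints the same value twice
```

If the log-probabilities are changed between the log_softmax and the pick (label smoothing, interpolation with a lexicon distribution, ...), only the explicit path remains correct.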
I added some comments on the latest push
Ok, let me check again. I think "modifying" is too vague though. Try to use a more descriptive name.
Can you tell me one good name for this operation?
OK, let me confirm: what do you mean by "pickneglogsoftmax cannot be used"? Why can it not be used? To elaborate:
- calc_scores() calculates an unnormalized log probability
- calc_log_probs() calculates a normalized log probability
- calc_probs() calculates a normalized probability

As long as these functions are implemented correctly, we should always be able to use pickneglogsoftmax, correct? If any of these functions doesn't do the desired action in your implementation here, we should fix the functions. This is an absolute must.

Or, if this is an efficiency concern, where in some cases it's faster/more memory-efficient to calculate scores (log probs) and in other cases it's faster/more memory-efficient to calculate probabilities, we could have a function called something like prefer_scores_to_probs(), where if it's true we use calc_scores() and if it's false we use calc_probs().
Does this make sense?
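To make the proposed contract concrete, here is a rough sketch of what such an interface could look like (assumed for illustration, not taken from the PR; prefer_scores_to_probs() is the hypothetical name from the comment above):

```python
import dynet as dy

class Scorer(object):
  """Illustrative contract for the three scoring methods discussed above."""

  def calc_scores(self, x: dy.Expression) -> dy.Expression:
    """Unnormalized log probabilities (logits)."""
    raise NotImplementedError()

  def calc_log_probs(self, x: dy.Expression) -> dy.Expression:
    """Normalized log probabilities; by default just log_softmax over the scores."""
    return dy.log_softmax(self.calc_scores(x))

  def calc_probs(self, x: dy.Expression) -> dy.Expression:
    """Normalized probabilities; by default just softmax over the scores."""
    return dy.softmax(self.calc_scores(x))

  def prefer_scores_to_probs(self) -> bool:
    """Hypothetical efficiency hint: True means the loss code may use the fused,
    score-based ops; False means it should go through calc_log_probs()/calc_probs()."""
    return True
```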
Yep these are correct:
- calc_scores() calculates an unnormalized log probability
- calc_log_probs() calculates a normalized log probability
- calc_probs() calculates a normalized probability
> As long as these functions are implemented correctly, we should always be able to use pickneglogsoftmax, correct?
I think this is not correct. For example, the current implementation on the master branch separates when to use dy.pick and dy.pickneglogsoftmax (xnmt/modelparts/scorers.py, lines 134 to 149 at c644660):
```python
if self.label_smoothing == 0.0:
  # single mode
  if not batchers.is_batched(y):
    loss = dy.pickneglogsoftmax(scores, y)
  # minibatch mode
  else:
    loss = dy.pickneglogsoftmax_batch(scores, y)
else:
  log_prob = dy.log_softmax(scores)
  if not batchers.is_batched(y):
    pre_loss = -dy.pick(log_prob, y)
  else:
    pre_loss = -dy.pick_batch(log_prob, y)
  ls_loss = -dy.mean_elems(log_prob)
  loss = ((1 - self.label_smoothing) * pre_loss) + (self.label_smoothing * ls_loss)
```
This is because the loss itself can't be derived by looking at the scores (the unnormalized log probs) of the model. For example, when we are interpolating the softmax probability directly, the loss should be the log probability of the interpolated model, right?
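For concreteness, a minimal sketch of the interpolated case being described (the names lex_prob and alpha are illustrative, not the PR's implementation):

```python
import dynet as dy

def interpolated_softmax_loss(scores: dy.Expression,
                              lex_prob: dy.Expression,
                              y: int,
                              alpha: float = 0.5) -> dy.Expression:
  """Loss for a probability-level interpolation between the model softmax
  and an external (e.g. lexicon) distribution:

      p_interp = (1 - alpha) * softmax(scores) + alpha * lex_prob
      loss     = -log(p_interp[y])

  The pick has to happen *after* the interpolation, so this cannot be
  written as dy.pickneglogsoftmax(scores, y).
  """
  model_prob = dy.softmax(scores)
  interp = (1 - alpha) * model_prob + alpha * lex_prob
  return -dy.log(dy.pick(interp, y))
```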
I see, this is correct. I guess we need to separate out the loss function
from the scorer.
Graham
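One possible shape of that separation, sketched only for discussion (class and method names here are hypothetical, not from the PR):

```python
import dynet as dy

class CrossEntropyLoss(object):
  """Sketch of pulling the loss out of the scorer: the loss only asks the
  scorer for normalized log probabilities, so it stays correct however the
  scorer produces them (plain softmax, lexicon interpolation, ...)."""

  def __init__(self, label_smoothing: float = 0.0):
    self.label_smoothing = label_smoothing

  def calc_loss(self, scorer, x: dy.Expression, y: int) -> dy.Expression:
    log_prob = scorer.calc_log_probs(x)  # scorer-specific normalization
    nll = -dy.pick(log_prob, y)          # negative log-likelihood of the gold index
    if self.label_smoothing == 0.0:
      return nll
    ls = -dy.mean_elems(log_prob)        # uniform label-smoothing term
    return (1 - self.label_smoothing) * nll + self.label_smoothing * ls
```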
Alright, shall it be done in this pull request, or shall we make one more issue for this? For now, I think I can use a name like …
I am opening a new issue after this merge.
This pull request solves #457. A minimal amount of code has been changed to implement it, and I also corrected example/19_lexbias.yaml.