Refactor lexicon softmax #549

Merged: 5 commits merged from lexicon_softmax into master on Nov 21, 2018
Conversation

philip30 (Contributor):
This pull request solves #457. A minimal amount of code was changed to implement it, and I also corrected example/19_lexbias.yaml.

@neubig (Contributor) left a comment:
Overall, looks good, but I didn't understand the is_modifying_softmax_layer function. Is this necessary? If not, it'd be good to avoid unnecessary complication. If it is, then could you document it a little bit?

@@ -126,27 +127,29 @@ def __init__(self,

def calc_scores(self, x: dy.Expression) -> dy.Expression:
  return self.output_projector.transform(x)

def is_modifying_softmax_layer(self):
neubig (Contributor):

What does this mean?

philip30 (Contributor Author):

So basically it means that we can't use dy.pickneglogsoftmax, because we are modifying the values of the softmax layer before picking the loss. dy.pickneglogsoftmax is a convenient function, but it can't be used when we are doing:

  1. label smoothing
  2. linear interpolation on the softmax

So this method is called to check that condition and to use the correct DyNet function accordingly. Also, thanks to this inheritance I don't have to copy and paste the calc_loss function from the Softmax object.
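
To illustrate (a minimal, hypothetical sketch, not the actual xnmt code), the base Softmax can branch on this flag inside calc_loss, and a subclass that modifies the softmax output only needs to override the flag:

import dynet as dy

class Softmax:
  def is_modifying_softmax_layer(self):
    # plain softmax: the loss can use the fused dy.pickneglogsoftmax
    return False

  def calc_loss(self, x, y):
    # calc_scores() / calc_log_probs() are provided by the concrete scorer
    if not self.is_modifying_softmax_layer():
      return dy.pickneglogsoftmax(self.calc_scores(x), y)
    # modified softmax: pick from the explicitly normalized log-probs
    return -dy.pick(self.calc_log_probs(x), y)

class LexiconSoftmax(Softmax):  # hypothetical subclass name
  def is_modifying_softmax_layer(self):
    # the softmax output is altered (e.g. interpolated with a lexicon),
    # so the loss has to come from the modified distribution
    return True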

philip30 (Contributor Author), Nov 20, 2018:

I added some comments on the latest push

neubig (Contributor):
Ok, let me check again. I think "modifying" is too vague though. Try to use a more descriptive name.

philip30 (Contributor Author):

Can you suggest a good name for this operation?

neubig (Contributor):

OK, let me confirm: what do you mean by "pickneglogsoftmax cannot be used"? Why can it not be used? To elaborate,

  • calc_scores() calculates an unnormalized log probability
  • calc_log_probs() calculates a normalized log probability
  • calc_probs() calculates a normalized probability

As long as these functions are implemented correctly, we should always be able to use pickneglogsoftmax, correct? If any of these functions don't do the desired action in your implementation here, we should fix the functions. This is an absolute must.

Or, if this is an efficiency concern, where in some cases it's faster or more memory-efficient to calculate scores (unnormalized log probs) and in other cases it's faster or more memory-efficient to calculate probabilities, we could have a function called something like prefer_scores_to_probs(): if it's true we use calc_scores(), and if it's false we use calc_probs().

Does this make sense?
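
For concreteness, a minimal illustrative sketch of that relationship in DyNet (the function and argument names here are just for illustration, not the xnmt API):

import dynet as dy

def calc_log_probs(scores):
  # normalized log probability = log-softmax of the unnormalized scores
  return dy.log_softmax(scores)

def calc_probs(scores):
  # normalized probability = softmax of the unnormalized scores
  return dy.softmax(scores)

def neg_log_lik(scores, y):
  # for a plain softmax, equivalent to -dy.pick(calc_log_probs(scores), y)
  return dy.pickneglogsoftmax(scores, y)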

philip30 (Contributor Author):

Yep, these are correct:

  • calc_scores() calculates an unnormalized log probability
  • calc_log_probs() calculates a normalized log probability
  • calc_probs() calculates a normalized probability

"As long as these functions are implemented correctly, we should always be able to use pickneglogsoftmax, correct?"

I think this is not correct. For example, the current implementation in the master branch separates when to use dy.pick and dy.pickneglogsoftmax:

if self.label_smoothing == 0.0:
  # single mode
  if not batchers.is_batched(y):
    loss = dy.pickneglogsoftmax(scores, y)
  # minibatch mode
  else:
    loss = dy.pickneglogsoftmax_batch(scores, y)
else:
  log_prob = dy.log_softmax(scores)
  if not batchers.is_batched(y):
    pre_loss = -dy.pick(log_prob, y)
  else:
    pre_loss = -dy.pick_batch(log_prob, y)
  ls_loss = -dy.mean_elems(log_prob)
  loss = ((1 - self.label_smoothing) * pre_loss) + (self.label_smoothing * ls_loss)

This is because the loss itself can't be derived just from the scores (the unnormalized log probabilities) of the model. For example, when we are interpolating the softmax probabilities directly, the loss should come from the log probability of the interpolated model, right?
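
A minimal, hypothetical sketch of that interpolated case (lex_prob is a precomputed lexicon probability vector and alpha the interpolation weight; neither name is from the actual code):

import dynet as dy

def interpolated_loss(scores, lex_prob, alpha, y):
  # mix the model's softmax output with the lexicon distribution
  model_prob = dy.softmax(scores)
  mixed_prob = (1 - alpha) * model_prob + alpha * lex_prob
  # the loss is the negative log probability under the interpolated model,
  # which dy.pickneglogsoftmax(scores, y) would not compute
  return -dy.log(dy.pick(mixed_prob, y))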

@neubig (Contributor) commented Nov 20, 2018 via email

@philip30 (Contributor Author) commented Nov 20, 2018:

Alright, shall it be done in this pull request, or shall we open another issue for it? For now, I think I can use a name like can_loss_be_derived_from_scores() for the same method?

@philip30 (Contributor Author):
I will open a new issue after this merge.

@philip30 philip30 merged commit 91444ee into master Nov 21, 2018
@philip30 philip30 deleted the lexicon_softmax branch November 21, 2018 01:09