Refactor lexicon softmax #549

Merged: 5 commits merged from lexicon_softmax into master on Nov 21, 2018
Conversation

philip30 (Contributor):
This pull request solves #457. A minimal amount of code was changed to implement it, and I also corrected example/19_lexbias.yaml.

@neubig (Contributor) left a comment:
Overall, looks good, but I didn't understand the is_modifying_softmax_layer function. Is this necessary? If not, it'd be good to avoid unnecessary complication. If it is, then could you document it a little bit?

@@ -126,27 +127,29 @@ def __init__(self,

def calc_scores(self, x: dy.Expression) -> dy.Expression:
  return self.output_projector.transform(x)

def is_modifying_softmax_layer(self):
neubig (Contributor):

What does this mean?

philip30 (Contributor Author):

So basically it means that we can't use dy.pickneglogsoftmax, because we are modifying the values of the softmax layer before picking the loss. dy.pickneglogsoftmax is a convenient function, but it can't be used when we are doing:

  1. label smoothing
  2. linear interpolation on the softmax

So this method is called to check that condition and to use the correct DyNet function accordingly. Also, thanks to this inheritance I don't have to copy and paste the calc_loss function from the Softmax object.
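
To illustrate (a minimal, hypothetical sketch, not the actual xnmt code), the base Softmax can branch on this flag inside calc_loss, and a subclass that modifies the softmax output only needs to override the flag:

import dynet as dy

class Softmax:
  def is_modifying_softmax_layer(self):
    # plain softmax: the loss can use the fused dy.pickneglogsoftmax
    return False

  def calc_loss(self, x, y):
    # calc_scores() / calc_log_probs() are provided by the concrete scorer
    if not self.is_modifying_softmax_layer():
      return dy.pickneglogsoftmax(self.calc_scores(x), y)
    # modified softmax: pick from the explicitly normalized log-probs
    return -dy.pick(self.calc_log_probs(x), y)

class LexiconSoftmax(Softmax):  # hypothetical subclass name
  def is_modifying_softmax_layer(self):
    # the softmax output is altered (e.g. interpolated with a lexicon),
    # so the loss has to come from the modified distribution
    return True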

philip30 (Contributor Author), Nov 20, 2018:

I added some comments on the latest push

neubig (Contributor):
Ok, let me check again. I think "modifying" is too vague though. Try to use a more descriptive name.

philip30 (Contributor Author):

Can you suggest a good name for this operation?

neubig (Contributor):

OK, let me confirm: what do you mean by "pickneglogsoftmax cannot be used"? Why can it not be used? To elaborate,

  • calc_scores() calculates an unnormalized log probability
  • calc_log_probs() calculates a normalized log probability
  • calc_probs() calculates a normalized probability

As long as these functions are implemented correctly, we should always be able to use pickneglogsoftmax, correct? If any of these functions don't do the desired action in your implementation here, we should fix the functions. This is an absolute must.

Or, if this is an efficiency concern, where in some cases it's faster or more memory-efficient to calculate scores (unnormalized log probs) and in other cases it's faster or more memory-efficient to calculate probabilities, we could have a function called something like prefer_scores_to_probs(): if it's true we use calc_scores(), and if it's false we use calc_probs().

Does this make sense?
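
For concreteness, a minimal illustrative sketch of that relationship in DyNet (the function and argument names here are just for illustration, not the xnmt API):

import dynet as dy

def calc_log_probs(scores):
  # normalized log probability = log-softmax of the unnormalized scores
  return dy.log_softmax(scores)

def calc_probs(scores):
  # normalized probability = softmax of the unnormalized scores
  return dy.softmax(scores)

def neg_log_lik(scores, y):
  # for a plain softmax, equivalent to -dy.pick(calc_log_probs(scores), y)
  return dy.pickneglogsoftmax(scores, y)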

philip30 (Contributor Author):

Yep, these are correct:

  • calc_scores() calculates an unnormalized log probability
  • calc_log_probs() calculates a normalized log probability
  • calc_probs() calculates a normalized probability

"As long as these functions are implemented correctly, we should always be able to use pickneglogsoftmax, correct?"

I think this is not correct. For example, the current implementation in the master branch separates when to use dy.pick and dy.pickneglogsoftmax:

if self.label_smoothing == 0.0:
  # single mode
  if not batchers.is_batched(y):
    loss = dy.pickneglogsoftmax(scores, y)
  # minibatch mode
  else:
    loss = dy.pickneglogsoftmax_batch(scores, y)
else:
  log_prob = dy.log_softmax(scores)
  if not batchers.is_batched(y):
    pre_loss = -dy.pick(log_prob, y)
  else:
    pre_loss = -dy.pick_batch(log_prob, y)
  ls_loss = -dy.mean_elems(log_prob)
  loss = ((1 - self.label_smoothing) * pre_loss) + (self.label_smoothing * ls_loss)

This is because the loss itself can't be derived just from the scores (the unnormalized log probabilities) of the model. For example, when we are interpolating the softmax probabilities directly, the loss should come from the log probability of the interpolated model, right?
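
A minimal, hypothetical sketch of that interpolated case (lex_prob is a precomputed lexicon probability vector and alpha the interpolation weight; neither name is from the actual code):

import dynet as dy

def interpolated_loss(scores, lex_prob, alpha, y):
  # mix the model's softmax output with the lexicon distribution
  model_prob = dy.softmax(scores)
  mixed_prob = (1 - alpha) * model_prob + alpha * lex_prob
  # the loss is the negative log probability under the interpolated model,
  # which dy.pickneglogsoftmax(scores, y) would not compute
  return -dy.log(dy.pick(mixed_prob, y))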

@neubig (Contributor) commented Nov 20, 2018 via email

@philip30 (Contributor Author) commented Nov 20, 2018:

Alright, shall it be done in this pull request, or shall we open another issue for it? For now, I think I can use a name like can_loss_be_derived_from_scores() for the same method?

@philip30 (Contributor Author):
I will open a new issue after this merge.

@philip30 philip30 merged commit 91444ee into master Nov 21, 2018
@philip30 philip30 deleted the lexicon_softmax branch November 21, 2018 01:09