Labels: docs (Documentation related), needs triage (Waiting to be triaged by maintainers)
Description
📚 Documentation
Hi,
In src/lightning/pytorch/demos/transformer.py, an encoder-decoder transformer is used for next-token prediction. However, as in the conventional encoder-decoder setup, the model sees the entire src because no src mask is applied, and the prediction target is simply the source shifted right by one token. Wouldn't this leak future information, since the model can just copy the (i+1)-th token of the src when predicting the i-th token of the target? A minimal sketch of the concern follows.
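To make the concern concrete, here is a minimal sketch of the setup being described. This is not the demo's actual code: the names, sizes, and plain `nn.Transformer` usage are assumptions chosen to mirror the pattern (causal mask on the decoder only, full source visible, target = source shifted by one).

```python
import torch
from torch import nn

# Illustrative "language modeling" pair: tgt is src shifted right by one
# token, so tgt[i] == src[i + 1]. (Assumed shapes, not the demo's code.)
vocab, d_model, t = 1000, 32, 8
data = torch.randint(0, vocab, (1, t + 1))  # (batch, time) token ids
src, tgt = data[:, :-1], data[:, 1:]

emb = nn.Embedding(vocab, d_model)
model = nn.Transformer(d_model=d_model, nhead=4, batch_first=True)

# Causal mask on the decoder's self-attention only -- the conventional setup.
tgt_mask = model.generate_square_subsequent_mask(t)

# No src_mask / memory_mask: encoder self-attention and decoder
# cross-attention attend over the whole source. Encoder position i + 1
# encodes src[i + 1], which is exactly the label tgt[i], so decoder step i
# can copy its answer out of the encoder memory instead of predicting it.
out = model(emb(src), emb(tgt), tgt_mask=tgt_mask)
print(out.shape)  # (1, 8, 32)
```

If the analysis above is right, passing a causal `src_mask` (and a matching `memory_mask` for cross-attention), or switching the demo to a decoder-only setup, would close the leak.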