Switchable PyTorch backend #580
Commits:

- fix typo
- fix cudnn lstm
- better cudnn lstm padding check
- fix tb-reporting LR for dynet optimizers
- fix longtensor device
- cudnn lstm: move seq_lengths to device
- fix to beam search stopping criteria (neulab#572)
- torch.no_grad() for LossEvalTask
- no_grad() for inference code
- update doc string
- unit tests for cudnn lstm (passing even though training behavior seems buggy)
- comment for cudnn lstm
- save memory by freeing training data
- fix a unit test
- initial resource code
- fix type annot
- implement ResourceFile synta
- resolve ResourceFile when loading saved models
- made resource naming and _remove_data_dir() compatible
- more convenient message for existing log files
- support recent pyyaml
- new 'pretend' settings
- standard example: revert back no epochs
- fix error when trying to subsample more sentences than are in the training set
- fix previous fix
- cudnn lstm: use total_length option
- attempted cudnn lstm fix
- removed unused code in cudnn lstm
- fix missing train=True events in multi task training
- attempt transposed plot fix
- fix code indentation in unicode tokenizer
- OOVStatisticsReporter: don't crash in case of empty hypo
- SkipOutOfMemory for simple training regimen (pytorch only)
- cleaned up manual tests; fix grad logging
- fix missing desc string in WER/CER scores
This addresses #420 and implements switchable DyNet / PyTorch backends for XNMT. Different backends have different advantages, such as autobatching in DyNet versus multi-GPU support, mixed-precision training, and CTC training in PyTorch, any of which can be critical in certain situations. Another motivation is that it can be easier to replicate prior work when using the same deep learning framework.
All technical details are described in the updated doc, so please take a look there. I did my best to keep the changes as unobtrusive as possible, which was relatively easy given the similar design principles of DyNet and PyTorch. Switchable backends imply somewhat increased maintenance effort for some of the core modeling code, although this code is fairly stable now, so I think things should be fine in this respect. For advanced features, I don't think we need to aim for keeping the two backends in parallel.
The status is as follows:
There is one minor breaking change: saved model files now use a dash instead of a period, e.g. "Linear.9c2beb79" -> "Linear-9c2beb79". This is because PyTorch complains when model names contain a period. Old saved models will need to be renamed manually before use.
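A one-off script along these lines should handle the renaming (a minimal sketch; the model directory path is an assumption to be adjusted to your setup):

```python
import os

# Hypothetical path to the data directory of an old saved model.
model_dir = "models/my_model.data"

for name in os.listdir(model_dir):
    # Old format: "Linear.9c2beb79" -> new format: "Linear-9c2beb79".
    # Only the separator between component name and hash changes.
    if "." in name:
        new_name = name.replace(".", "-", 1)  # replace the first period only
        os.rename(os.path.join(model_dir, name),
                  os.path.join(model_dir, new_name))
```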
One potential question that might be raised about the chosen design is why DyNet and PyTorch code are mixed in the same Python modules, as opposed to having clean separate modules for each. The main reason is to allow for a clean implementation of default components. For example, `DefaultTranslator` is backend-independent, and uses `bare(embedders.SimpleWordEmbedder)` as the default for its `src_embedder` init argument. `embedders.SimpleWordEmbedder` has two different implementations, `embedders.SimpleWordEmbedderDynet` and `embedders.SimpleWordEmbedderTorch`, and `embedders.SimpleWordEmbedder` will point to the appropriate one given the active backend. Moving both implementations to different modules would require importing things from the base module, leading to circular imports (e.g., `xnmt.embedders` and `xnmt.embedders_dynet` would both import each other). Nevertheless, I did make sure that running with either backend works even without the other backend installed in the Python environment.

There are a few extra changes and fixes that are not central to the PyTorch backend, but were very helpful for debugging and unit testing; see the commit list above.
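A minimal sketch of how this kind of backend-dependent aliasing can work (the backend flag and class bodies here are illustrative assumptions, not the actual xnmt API):

```python
# embedders.py -- illustrative sketch of backend-dependent aliasing.
import os

# Hypothetical backend selection, e.g. via an environment variable.
_BACKEND = os.environ.get("XNMT_BACKEND", "dynet")


class SimpleWordEmbedderDynet:
    """DyNet implementation (only used when the DyNet backend is active)."""
    def embed(self, word_id):
        raise NotImplementedError


class SimpleWordEmbedderTorch:
    """PyTorch implementation (only used when the PyTorch backend is active)."""
    def embed(self, word_id):
        raise NotImplementedError


# Bind the public name to the implementation matching the active backend.
# Keeping both classes in this one module lets backend-independent code
# (e.g. DefaultTranslator) refer to `embedders.SimpleWordEmbedder` directly,
# avoiding the circular imports that separate per-backend modules would cause.
SimpleWordEmbedder = (SimpleWordEmbedderDynet if _BACKEND == "dynet"
                      else SimpleWordEmbedderTorch)
```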
— Matthias