Switchable PyTorch backend #580

Closed
msperber wants to merge 223 commits from the torchpr branch

Conversation

msperber (Contributor) commented Dec 9, 2019

This addresses #420 and implements switchable DyNet / PyTorch backends for XNMT. The backends have different advantages, such as autobatching in DyNet versus multi-GPU support, mixed precision training, and CTC training in PyTorch, any of which can be critical in certain situations. Another motivation is that it can be easier to replicate prior work when using the same deep learning framework.

All technical details are described in the updated doc, so please take a look there. I did my best to keep the changes as unobtrusive as possible, which was relatively easy given the similar design principles of DyNet and PyTorch. Switchable backends imply a somewhat increased maintenance effort for some of the core modeling code, but this code is fairly stable now, so things should be fine in this respect. For advanced features, I don't think we need to aim for keeping both backends in parallel.

The status is as follows:

  • Most example configs are supported with both backends, with the exception of a few advanced features (17_minrisk, 18_lexiconbias, 21_char_segment) that are not implemented for the PyTorch backend.
  • Most unit tests run with both backends. Those that don't support the PyTorch backend are skipped automatically when that backend is active.
  • I did comprehensive checks of activations, gradients, and updates, as well as complete training curves, to confirm that both backends perform the same computations (modulo numerical stability).
  • The 3 recipes are tested and produce similar results with both backends.
  • Speed is roughly similar with both backends. The PyTorch backend needs less GPU memory and also introduces a new CUDNN-based LSTM, which has fewer features but is significantly faster.
  • DyNet-trained models can be loaded with the PyTorch backend and evaluated or fine-tuned from there. The opposite direction is currently not implemented, since reading serialized PyTorch models is less straightforward.

There is one minor breaking change: saved model files now use a dash instead of a period, e.g. “Linear.9c2beb79” -> “Linear-9c2beb79”, because PyTorch complains when model names contain a period. Old saved models need to be renamed manually before they can be loaded.
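
For anyone with existing checkpoints, a small script along the following lines could perform the rename. This is only an illustrative sketch: it assumes the saved components sit as files directly inside the model's data directory, which may not match your layout.

```python
# Illustrative helper for migrating old saved models to the new naming scheme
# ("Linear.9c2beb79" -> "Linear-9c2beb79"). The flat-directory layout assumed
# here is an assumption; adapt the path handling to your actual model folder.
import os
import re
import sys

def rename_saved_components(model_dir: str) -> None:
  # Matches names like "Linear.9c2beb79": component name, a period, then a hex id.
  pattern = re.compile(r"^(\w+)\.([0-9a-f]+)$")
  for name in os.listdir(model_dir):
    match = pattern.match(name)
    if match:
      new_name = f"{match.group(1)}-{match.group(2)}"
      os.rename(os.path.join(model_dir, name), os.path.join(model_dir, new_name))

if __name__ == "__main__":
  rename_saved_components(sys.argv[1])
```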

One potential question about the chosen design is why DyNet and PyTorch code are mixed in the same Python modules, as opposed to having clean separate modules for each. The main reason is to allow a clean implementation of default components. For example, DefaultTranslator is backend-independent and uses bare(embedders.SimpleWordEmbedder) as the default for its src_embedder init argument. embedders.SimpleWordEmbedder has two different implementations, embedders.SimpleWordEmbedderDynet and embedders.SimpleWordEmbedderTorch, and embedders.SimpleWordEmbedder points to the appropriate one given the active backend. Moving the two implementations into separate modules would require importing things from the base module, leading to circular imports (e.g., xnmt.embedders and xnmt.embedders_dynet would both import each other). Nevertheless, I made sure that running with either backend works even without the other backend installed in the Python environment.
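
To make this concrete, the dispatch pattern roughly looks like the minimal sketch below. It is schematic rather than XNMT's actual code; the current_backend() helper and the hard-coded backend string are assumptions used only for illustration.

```python
# Schematic sketch of per-backend dispatch within a single module. The helper
# names (current_backend, SimpleWordEmbedder*) mirror the description above
# but are illustrative, not XNMT's exact code.

def current_backend() -> str:
  # In XNMT this would be determined once from the chosen backend setting;
  # hard-coded here for illustration.
  return "dynet"

class SimpleWordEmbedderDynet:
  """DyNet implementation of the word embedder."""

class SimpleWordEmbedderTorch:
  """PyTorch implementation of the word embedder."""

# The shared name points at the implementation for the active backend, so a
# backend-independent component such as DefaultTranslator can refer to
# embedders.SimpleWordEmbedder without caring which backend is loaded.
SimpleWordEmbedder = (SimpleWordEmbedderDynet if current_backend() == "dynet"
                      else SimpleWordEmbedderTorch)
```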

There are a few extra changes and fixes that are not central to the PyTorch backend, but were very helpful for debugging and unit testing:

  • fixes for loss reports, which were incorrect with the “avg” loss_comb_method, and for tensorboard logging step counters, which were not working correctly
  • a new --settings=pretend mode that runs training / evaluation on a single input and then finishes (useful as a quick sanity check that everything runs smoothly before launching a long training; see the sketch after this list)
  • extended tensorboard support
  • more flexible parameter initialization, especially for components with multiple parameter matrices, and direct initialization from given numpy arrays
  • a few other minor details
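
As a rough illustration of the pretend mode mentioned above, a settings preset of this kind would simply cap how much data gets touched. The class and attribute names below are assumptions chosen for illustration, not the exact fields of XNMT's settings module.

```python
# Hedged sketch of a "pretend" settings preset: run training/evaluation on a
# single input and stop, as a quick sanity check before a long training run.
# Attribute names are illustrative assumptions, not XNMT's actual settings.

class SettingsPretend:
  OVERWRITE_LOG = True     # don't complain about existing log files
  MAX_NUM_TRAIN_SENTS = 1  # touch only one training example
  MAX_NUM_DEV_SENTS = 1    # and only one dev/eval example
  MAX_NUM_EPOCHS = 1       # finish after a single pass
```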

— Matthias

msperber added 27 commits May 10, 2019 14:55
fix typo

fix cudnn lstm

better cudnn lstm padding check

fix tb-reporting LR for dynet optimizers

fix longtensor device

cudnn lstm: move seq_lengths to device

fix to beam search stopping criteria (neulab#572)

torch.no_grad() for LossEvalTask

no_grad() for inference code

update doc string

unit tests for cudnn lstm (passing even though training behavior seems buggy)

comment for cudnn lstm

save memory by freeing training data

fix a unit test

initial resource code

fix type annot

implement ResourceFile syntax

resolve ResourceFile when loading saved models

made resource naming and _remove_data_dir() compatible

more convenient message for existing log files

support recent pyyaml

new 'pretend' settings

standard example: revert back no epochs

fix error when trying to subsample more sentences than are in the training set

fix previous fix

cudnn lstm: use total_length option

attempted cudnn lstm fix

removed unused code in cudnn lstm

fix missing train=True events in multi task training

attempt transposed plot fix

fix code indentation in unicode tokenizer

OOVStatisticsReporter: don't crash in case of empty hypo

SkipOutOfMemory for simple training regimen (pytorch only)

cleaned up manual tests; fix grad logging

fix missing desc string in WER/CER scores
@msperber msperber requested a review from neubig December 9, 2019 15:48
@msperber msperber closed this Dec 9, 2019
@msperber msperber deleted the torchpr branch December 9, 2019 16:16