Switchable PyTorch backend #581
Merged
Conversation
fix typo fix cudnn lstm better cudnn lstm padding check fix tb-reporting LR for dynet optimizers fix longtensor device cudnn lstm: move seq_lengths to device fix to beam search stopping criteria (neulab#572) torch.no_grad() for LossEvalTask no_grad() for inference code update doc string unit tests for cudnn lstm (passing even though training behavior seems buggy) comment for cudnn lstm save memory by freeing training data fix a unit test initial resource code fix type annot implement ResourceFile synta resolve ResourceFile when loading saved models made resource naming and _remove_data_dir() compatible more convenient message for existing log files support recent pyyaml new 'pretend' settings standard example: revert back no epochs fix error when trying to subsample more sentences than are in the training set fix previous fix cudnn lstm: use total_length option attempted cudnn lstm fix removed unused code in cudnn lstm fix missing train=True events in multi task training attempt transplosed plot fix fix code indentation in unicode tokenizer OOVStatisticsReporter: don't crash in case of empty hypo SkipOutOfMemory for simple training regimen (pytorch only) cleaned up manual tests; fix grad logging fix missing desc string in WER/CER scores
worked on optimizers, model saving+reverting tested model loading param init working handling device run dynet backend w/o torch installed WIP: toward using torch backend w/o dynet installed introduced decorators for backends WIP: toward working w/o dynet installed finished separating out dynet and pytorch code settings and command line arguments remove dynet_profiling flag building API doc works fixed some unit test problems make tensorboard optional, it's causing some interference with unit tests run unit tests in either dynet or torch mode bugfix: reload_example skip unsupported test_beam_search + better error message WIP: bug fixes + skip unit tests unsupported by torch backend all unit tests running or skipped if unsupported by backend merge dynet/torch classifier unit tests to use same config file backend-agnostic LM running update .gitignore seq_labeler works independently of backend torch/GPU fixes fix loss function add missing call to optimizer.step() init forget gate params to 1 flexible loop-based lstm functional init forget gate biases to 1 fix bug when no mask is set fix mask device fix LSTMCell device fixed case of multiple layers for flexible lstm implemented variational dropout for LSTMs fixed device for dropout masks remove unused code wiring together of uni LSTMs works regardless of backend seq2seq standard example working fix torch MLP attender on GPU NoBridge works with torchh backend missing embedder features; fix multi-layer bilstm torch version of DenseWordEmbedder GPU fix for dense embedder small bugfix small fix another small fix another try speech example working fix reporting ensembling working runnable self-attention torch version attempt at fixing longtensor device attempt at fixing longtensor device fix self-attention lineaar transforms device fix layer-norm device another device fix doc update fixes to kftt recipe fix broadcasting issue workaround for speech features for very short audios remove unused code label smoothing w/ pytorch backend fix linear bridge and multilayer rnn decoder for torch backend resolve deprecation warning added amsgrad minor cleanup refactor transforms fix to lazy expression sequence fixed downsampling for TransformSeqTransducer made param initialization more convenient introduce BaseParamCollection to reduce code duplication pytorch version of batchnorm mini cleanup CNN and transposed sequence tensors hide InitializableModuleList(nn.ModuleList) from dynet backend fix previous fix fixed typo fixed None check MaxPoolCNNLayer: pooling optional remove unused files h5/npz reader refactored and support delta features fix masking for subsampling MaxPoolCNNLayer fix reverting tranposed torch tensors implemented DotAttenderTorch adam and sgd support all pytorch-implemented features, including weight decay fix unit tests add some unit tests supported by torch backend by now WIP: fixed more unit tests fix for torch 0.4.1 more 0.4.1 fixes fix label smoothing fix unit test all unit tests passing less verbose data loading uncomment tensorboard logging remove unused commandline_args move train loss tracker fix loss tracking when losses are averaged across minibatches fix tensorboard step counter minor doc fix implemented skip_noisy consistency rename for layer norm fix feat stacking for older numpy version separate out clip_grads and rescale_grads fix major bug: pytorch gradients were not reset properly fix sentpiece output proc clean up comments fix typo small code simplification small code simplification clean up import fix for same batch 
multitask regimen set pytorch seed fix numpy resize issue fix typo bug check for cudnn lstm fix cnn device expr seq gpu fix fix typo remove import allow minor upgrade of pyyaml anomaly detection remove some comments fix loss tracker when using multiple losses fix batched L2 norm computation for fix_norm option safer expression sequence arguments checks update tensorboard writer to support histograms fix LazyNumpyExpressionSequenceDynet with transposed tensors print torch computation graph fix print_cg_torch gitignore visualized computation graphs update gitignore fixed feedback loss for batch size > 1 fix reporting of sentence losses tensorboard visualize embeddings TensorboardCustomWriter coding style fix add_scalars check/delete both .log and .log.tb fix skip_noisy when parts of the params have not received gradients fix loss tracker behavior for non-accumulative mode: accumulate minibatches since last report instead of reporting only most recent minibatch at time of report calc_context tensor dimensions consistent between dynet and torch backends tensorboard-log gradient norm fix commit that made dy/torch dimension consistent fix UniLSTMSeqTransducer, both torch and dynet implementations had bugs fix dropout mask batch size for per-timestep rnn unfolding fix embedder with numpy initializer safer check for TB logger being ready WIP: traceable tensor methods first version of trace working small bugfix to trace fix dim() for ReversedExpressionSequenceTorch include decoder state and final transducer state numpy initializer for torch backend for consistent behavior: dynet's numpy initializer checks dimensions of input array fix torch's lstm forget gate initialization fix to LazyNumpyExpressionSequenceTorch turn off tracing by default trying file reorg move tiny model fix reload test added manual test (WIP) remove test data from examples data dir implemented InitializerSequence fix switch of H/C when using lstm as decoder manual training unit test running added two-layer manual test refactor InitializerSequence to use __getitem__ update bi-lstm's handling of sequence initializers add manual test w/ bi-lstms expanded manual tests disable sparse dynet updates better error msg for mismatching init arrays manual gradients unit test introduce ManualTestingBaseClass seq2seq grad check more work on unit tests; singled out failing tests for seq2seq training with more than one step working on manual tests updated lstm params match with some tricks effectively disable the redundant lstm bias_hh all manual tests passing now WIP: manual full LAS test work on full las manual test mlp attender supports manual init better error msg pyramidal lstm supports param_init, bias_init intermediate las model: passing the trained weights manual check added fix_norm to test fix (minor?) bug with label smoothing added label smoothing to manual test more work on manual tests produced a test failing with SGD as well (not only Adam) attender fix? 
manual classifier tests: refactor + better precision working on basic seq2seq test basic s2s tests refactored worked up to failing mlp att test cleaned up mlp attender and tests finished unit test refactoring round add basic sec2sec grad test + clean up some manual tests fairly complete and passing WIP: load dynet weights into pytorch tensors loading dynet models into pytorch backend works cleaner solution for ignoring redundant lstm bias remove reference to outdated backward hook grad rescaling unit test fix lattice attender: incorrect var name simplified / unified grad clip configuration fix tensorboardx version document tensor tools fix type annotation fix variational recurrent dropout consistent use of sent_len() replace usages of dim() by more readable semantic accessors TB: always log grads, + log LR
Thanks so much for this!
Ah, sure. I think I know where these come from & will try to fix.
Torchpr fix
I've pushed the fix, sorry for the delay.
Thanks a bunch! I am not going to be able to review this in detail in a timely manner, but I've looked at the overall structure and it looks good. I'll just go ahead and merge, and we can iterate on any further improvements.
This addresses #420 and implements switchable DyNet / PyTorch backends for XNMT. Different backends have different advantages, such as autobatching in DyNet versus multi-GPU support, mixed-precision training, and CTC training in PyTorch, any of which can be critical in certain situations. Another motivation is that it can be easier to replicate prior work when using the same deep learning framework.
All technical details are described in the updated doc, so please take a look there. I did my best to keep the changes as unobtrusive as possible, which was relatively easy given the similar design principles of DyNet and PyTorch. Switchable backends do imply somewhat increased maintenance effort for some of the core modeling code, but that code is fairly stable by now, so I think things should be fine in this respect. For advanced features, I don't think we need to aim to keep both backends in parallel.
The status is as follows:
There is one minor breaking change: saved model files now use a dash instead of a period, e.g. "Linear.9c2beb79" -> "Linear-9c2beb79". This is because PyTorch complains when model names contain a period. Old saved models would need to be renamed manually before use.
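For reference, a migration along the following lines might work. This is only a sketch under assumptions not stated in the PR: it presumes the old component names (e.g. "Linear.9c2beb79") appear as plain file names inside the saved model's data directory and that the suffix is an 8-character hex string; the `migrate_saved_model` helper is hypothetical, so adjust it to the actual layout of your saved models.

```python
# Hypothetical migration helper: rename old saved-model files whose component
# names contain a period (e.g. "Linear.9c2beb79") to the new dash-separated
# form ("Linear-9c2beb79"). Assumes the names are plain file names inside the
# model's data directory; adjust if your saved-model layout differs.
import os
import re
import sys


def migrate_saved_model(data_dir: str) -> None:
    # Match names like "Linear.9c2beb79": component name, period, 8 hex chars.
    pattern = re.compile(r"^(?P<name>.+)\.(?P<suffix>[0-9a-f]{8})$")
    for fname in os.listdir(data_dir):
        match = pattern.match(fname)
        if match:
            new_name = f"{match.group('name')}-{match.group('suffix')}"
            os.rename(os.path.join(data_dir, fname),
                      os.path.join(data_dir, new_name))
            print(f"renamed {fname} -> {new_name}")


if __name__ == "__main__":
    migrate_saved_model(sys.argv[1])
```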
One potential question that might be raised about the chosen design is why DyNet and PyTorch code are mixed in the same Python modules, as opposed to having cleanly separated modules for each backend. The main reason is to allow for a clean implementation of default components. For example, `DefaultTranslator` is backend-independent and uses `bare(embedders.SimpleWordEmbedder)` as the default for its `src_embedder` init argument. `embedders.SimpleWordEmbedder` has two implementations, `embedders.SimpleWordEmbedderDynet` and `embedders.SimpleWordEmbedderTorch`, and will point to the appropriate one given the active backend (a toy sketch of this dispatch pattern follows below). Moving the two implementations into separate modules would require importing things from the base module, leading to circular imports (e.g., `xnmt.embedders` and `xnmt.embedders_dynet` would both import each other). Nevertheless, I did make sure that running with either backend works even when the other backend is not installed in the Python environment.

There are also a few extra changes and fixes that are not central to the PyTorch backend but were very helpful for debugging and unit testing.
— Matthias