Releases · NervanaSystems/neon
MergeSum, Colornoise layers, CSV batchwriter
- New MergeSum, Colornoise layers (see the MergeSum sketch after this list)
- support for aspect_ratio scaling augmentation
- updated IMDB sentiment analysis example
- generic CSV batchwriter
- various build and deserialization bugfixes, doc updates
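The MergeSum container mentioned above lends itself to residual-style blocks. Below is a minimal sketch assuming the neon 1.x layer API (Conv, Activation, Rectlin, Kaiming); the exact constructor arguments and the empty identity branch are illustrative assumptions, not code from this release.

```python
from neon.initializers import Kaiming
from neon.layers import Activation, Conv, MergeSum
from neon.transforms import Rectlin

init = Kaiming()
# MergeSum sums the outputs of its branches elementwise, so both branches
# must produce matching shapes (padding=1 keeps spatial dims here).
conv_branch = [Conv((3, 3, 64), padding=1, init=init, activation=Rectlin()),
               Conv((3, 3, 64), padding=1, init=init)]
skip_branch = []  # identity path (placeholder; a skip/pass-through layer may be needed)
layers = [MergeSum([conv_branch, skip_branch]),
          Activation(Rectlin())]
```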
Kepler GPU support, updated data loader and serialization, expanded model zoo
- Kepler GPU kernel support [#80]
- new dataloader format, updated docs [#115, #170]
- new serialization format (see the save/load sketch after this list)
- FastRCNN implementation, ROI pooling support [#135]
- deep residual nets implementation and example
- expanded model zoo
- Ticker dataset and copy, repeat copy tasks
- autodiff transpose support [#173]
- numerous bug fixes and documentation updates.
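For the new serialization format, the typical round trip looks roughly like the sketch below; it assumes the Model.save_params / Model.load_params calls from the neon 1.x API, and the file name and pre-built `layers` list are placeholders.

```python
from neon.models import Model

# after training: persist weights and layer configuration (assumed API)
model.save_params('trained_model.p')

# later: rebuild the same topology, then restore the saved weights
restored = Model(layers=layers)
restored.load_params('trained_model.p')
```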
Lookuptable, LRN kernels, deterministic Conv, bugfixes and docs
- CUDA kernels for lookuptable layer (up to 4x speedup)
- support for deterministic Conv layer updates
- LRN layer support
- custom dataset walkthrough utilizing bAbI data
- reduced number of threads in deep reduction EW kernels [#171]
- additional (de)serialization routines [#106]
- CPU tensor slicing fix
- corrections for PrecisionRecall, MultiLabelStats [#148]
- explicitly specify python2.7 for virtualenv [#155]
- default to SM50 when no working GPU found [#186]
- Add alpha to ELU activation [#164] (see the sketch after this list)
- deconv callback fix [#162]
- various documentation updates [#151, #152]
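The alpha knob added to the ELU activation in #164 would be used roughly as below; this assumes Explin is neon's ELU implementation and that the new parameter is named alpha.

```python
from neon.initializers import Gaussian
from neon.layers import Affine
from neon.transforms import Explin

# Explin (exponential linear unit) with the newly exposed alpha parameter (assumed name)
layer = Affine(nout=512, init=Gaussian(scale=0.01), activation=Explin(alpha=0.5))
```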
Bi-directional RNNs, ELUs, data shuffling, GPU kernel compile speedups
- Add support for bidirectional RNNs and LSTMs
- added ELU, leaky ReLU activations
- significantly faster GPU kernel builds (using ptx instead of cuda-c)
- data shuffling enhancements, removal of old data loader code.
- caffe conv, pool, dropout layer matching and compatibility flags
- add scheduling support for RMSProp (see the sketch after this list)
- callback enhancements, additional unit tests
- documentation auditing, added links to introductory video tutorials
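A hedged sketch of the new RMSProp scheduling support, assuming RMSProp accepts the same `schedule` keyword as neon's other optimizers and that Schedule takes step_config/change arguments.

```python
from neon.optimizers import RMSProp, Schedule

# drop the learning rate by 10x at epochs 20 and 40 (assumed keyword names)
opt = RMSProp(learning_rate=0.001,
              schedule=Schedule(step_config=[20, 40], change=0.1))
```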
QA demo, CPU speedups, deconv and histogram visualizations
- deconvolution and weight histogram visualization examples and documentation
- CPU convolution and pooling layer speedups (~2x faster)
- bAbI question and answer interactive demo, dataset support.
- various ImageLoader enhancements.
- interactive usage improvements (shortcut Callback import, multiple Callbacks init, doc updates, single item batch size support)
- set default verbosity level to warning
- CIFAR10 example normalization updates
- CUDA detection enhancements [#132]
- only parse batch_writer arguments when used as a script, allow undefined
global_mean [#137, #140]
New data loader, deconv visualization, recurrent weight loading
- completely re-written C++ multithreaded dataloader
- new weight initialization options for recurrent layers (see the sketch after this list)
- Added deconvolution visualization support (guided backprop)
- new bAbI question answering example network
- Improved performance of cifar10_allcnn, word_lstm examples
- new CUDA-C max and avg pooling kernels
- Additional bugfixes and documentation updates
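The new recurrent weight initialization options might look like the following; `init_inner` for the hidden-to-hidden weights and the other keyword names are assumptions about this release's LSTM signature.

```python
from neon.initializers import GlorotUniform, Orthonormal
from neon.layers import LSTM
from neon.transforms import Logistic, Tanh

lstm = LSTM(128,
            init=GlorotUniform(),      # input-to-hidden weights
            init_inner=Orthonormal(),  # hidden-to-hidden (recurrent) weights, assumed keyword
            activation=Tanh(),
            gate_activation=Logistic())
```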
Bugfixes, benchmarking, and timeseries
- Callback initialization bug fix [#127]
- IMDB LSTM example bug fix [#130]
- Added cuda-convnet2 style binary dropout variant
- Added benchmark function to model (separate fprop, bprop, update timings; see the sketch after this list)
- Remove h_buffer references in lieu of outputs for recurrent layers
- Multi-cost output buffer bugfix for inference [#131]
- New timeseries prediction and generation example
- Change Callback initialization to re-support named arguments. Separate out
these arguments in argparser. [#128]
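The model benchmark function would be called roughly as below; the argument names (niterations, nskip) and return structure are assumptions, and `train_set`, `cost`, and `opt` stand in for an already constructed dataset, cost, and optimizer.

```python
# report separate fprop / bprop / update timings over a few warmed-up iterations (assumed API)
timings = model.benchmark(train_set, cost=cost, optimizer=opt,
                          niterations=20, nskip=2)
print(timings)
```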
LayerContainers, Sentiment analysis, and more
- Sentiment analysis support (LSTM lookupTable based), new IMDB example
- Support for merge and branch layer stacks via LayerContainers (see the sketch after this list)
- Sequential, Tree, MergeBroadcast, MergeMultiStream
- Support for freezing layer stacks
- Adagrad optimizer support
- new GPU kernels for fast compounding batch norm, conv and pooling engine updates, new kernel build system and flags.
- Modifications for Caffe support
- conv, pooling, P/Q updates, dropout layer normalization more in-line with Caffe approach. NOTE: this breaks backwards compatibility with some strided conv/pool related models serialized using older versions of neon as the output sizes may now be different. See the FAQ for more info.
- serialization enhancements to make caffe model import/export easier
- use per-channel mean subtraction instead of single global. NOTE: this breaks backwards compatibility with ImgMaster saved datasets prior to this revision. To correct, please use the included update_dataset_cache.py script in the util directory.
- Default training cost display during progress bar is now calculated on a rolling window basis rather than from the beginning of each epoch
- Separate Layer configuration and initialization steps
- YAML based alexnet example
- Callback enhancements.
- now pass args instead of having to spell out callbacks in each example
- Changed validation callback to loss callback, validation_frequency now evaluation_frequency
- Generic metric callback.
- Various bug fixes
- non-contiguous array get for GPUTensors
- 1D slicing returns 2D matrices
- bin/neon serialization fixes for RNNs
- 3D conv fixes for fprop, bprop
- batch norm inference fix
- bias layer size fix
- Documentation updates and improvements
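A hedged sketch of a branched stack built with the new LayerContainers; the list-of-branches form and the `merge` keyword of MergeBroadcast are assumptions based on later neon documentation.

```python
from neon.initializers import Gaussian
from neon.layers import Affine, MergeBroadcast
from neon.transforms import Rectlin, Softmax

init = Gaussian(scale=0.01)
branch_a = [Affine(nout=100, init=init, activation=Rectlin())]
branch_b = [Affine(nout=100, init=init, activation=Rectlin())]
# run both branches on the same input and stack their outputs (assumed merge mode)
layers = [MergeBroadcast(layers=[branch_a, branch_b], merge="stack"),
          Affine(nout=10, init=init, activation=Softmax())]
```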
Primarily bug fix release
- Ensure root logging handler setup [#82]
- C++ utility for CUDA compatibility checking [#83]
- Add predict function to models [#86]
- Fix bug in learning rate schedule impacting deserialization
- Speed up batch norm computation
- Average gradients in OpTree, fix tests
- Use inference mode for fprop during validation
- Add top-k misclassification metric
- Simplify maxas install, make vis requirements optional, doc updates.
Multi GPU support
This release implements support for multi GPU processing using the "weird trick" parallelization scheme (data parallel for local layers, model parallel for fully-connected layers) and cleans up the previously existing MPI-based parallel code.
Multi GPU is only supported on newer Maxwell based cards using the NervanaGPU backend.
Older, Kepler based cards using the cudanet backend are no longer supported (some models and datasets will still work, but others may raise DeprecationWarnings). Users of these cards are encouraged to remain on the 0.8.2 release until we back-port NervanaGPU to support Kepler cards.
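Purely as an illustration of the hybrid scheme described above (not the neon API), the per-layer placement decision amounts to something like the following pseudocode; `is_local`, `replica`, and `shard` are hypothetical helpers.

```python
def place_layer(layer, gpus):
    """Hypothetical sketch of "weird trick" placement: data parallel for
    local (conv/pool) layers, model parallel for fully-connected layers."""
    if layer.is_local():
        # replicate weights; each GPU processes its own shard of the minibatch
        return [layer.replica(gpu) for gpu in gpus]
    else:
        # split the weight matrix across GPUs; each GPU sees the full minibatch
        return [layer.shard(gpu, num_shards=len(gpus)) for gpu in gpus]
```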