
Releases: NervanaSystems/neon

MergeSum, Colornoise layers, CSV batchwriter

05 Feb 22:22
  • New MergeSum, Colornoise layers (MergeSum sketched below)
  • support for aspect_ratio scaling augmentation
  • updated IMDB sentiment analysis example
  • generic CSV batchwriter
  • various build and deserialization bugfixes, doc updates
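
A minimal sketch of the new MergeSum container, assuming the neon 1.x layer API: each argument is a list of layers forming one branch, and the branch outputs are summed element-wise. The branch sizes and initializer below are illustrative, not taken from the release.

```python
from neon.initializers import Gaussian
from neon.layers import Affine, MergeSum
from neon.transforms import Rectlin

init = Gaussian(scale=0.01)

# Each branch is a list of layers; MergeSum adds their outputs element-wise,
# so every branch must produce the same output shape (nout=100 here).
branch1 = [Affine(nout=100, init=init, activation=Rectlin())]
branch2 = [Affine(nout=100, init=init, activation=Rectlin())]

layers = [MergeSum([branch1, branch2])]
```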

Kepler GPU support, updated data loader and serialization, expanded model zoo

31 Jan 17:51
  • Kepler GPU kernel support [#80]
  • new dataloader format, updated docs [#115, #170]
  • new serialization format
  • FastRCNN implementation, ROI pooling support [#135]
  • deep residual nets implementation and example (residual module sketched below)
  • expanded model zoo
  • Ticker dataset with copy and repeat-copy tasks
  • autodiff transpose support [#173]
  • numerous bug fixes and documentation updates
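
A hedged sketch of a residual module in the style neon's example settled on, expressed with the MergeSum container from the newer release above plus an identity SkipNode branch; the filter sizes and Kaiming initializer are illustrative assumptions.

```python
from neon.initializers import Kaiming
from neon.layers import Activation, Conv, MergeSum, SkipNode
from neon.transforms import Rectlin

def residual_module(nfm):
    """Two 3x3 convs summed with an identity shortcut, then ReLU."""
    init = Kaiming()
    mainpath = [Conv((3, 3, nfm), init=init, padding=1, activation=Rectlin()),
                Conv((3, 3, nfm), init=init, padding=1)]
    sidepath = [SkipNode()]  # identity shortcut
    # sum the two paths, then apply the post-sum nonlinearity
    return [MergeSum([mainpath, sidepath]), Activation(Rectlin())]
```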

Lookuptable, LRN kernels, deterministic Conv, bugfixes and docs

14 Jan 06:43
  • CUDA kernels for lookuptable layer (up to 4x speedup)
  • support for deterministic Conv layer updates
  • LRN layer support
  • custom dataset walkthrough utilizing bAbI data
  • reduced number of threads in deep reduction EW kernels [#171]
  • additional (de)serialization routines [#106]
  • CPU tensor slicing fix
  • corrections for PrecisionRecall, MultiLabelStats [#148]
  • explicitly specify python2.7 for virtualenv [#155]
  • default to SM50 when no working GPU found [#186]
  • Add alpha to ELU activation [#164] (see below)
  • deconv callback fix [#162]
  • various documentation updates [#151, #152]
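
On the ELU change in #164: in neon the ELU activation is the Explin transform, and alpha scales the negative saturation, f(x) = alpha * (exp(x) - 1) for x < 0. A small sketch, assuming that transform name:

```python
from neon.transforms import Explin

elu = Explin(alpha=1.0)        # standard ELU
elu_soft = Explin(alpha=0.5)   # gentler negative saturation
```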

Bi-directional RNNs, ELUs, data shuffling, GPU kernel compile speedups

15 Dec 07:56
  • Add support for bidirectional RNNs and LSTMs (see the sketch after this list)
  • added ELU, leaky ReLU activations
  • significantly faster GPU kernel builds (using ptx instead of cuda-c)
  • data shuffling enhancements, removal of old data loader code
  • caffe conv, pool, dropout layer matching and compatibility flags
  • add scheduling support for RMSProp
  • callback enhancements, additional unit tests
  • documentation auditing, added links to introductory video tutorials
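
A minimal sketch of constructing one of the new bidirectional layers. The BiLSTM constructor arguments follow neon's recurrent layer conventions; the hidden size and initializer here are assumptions.

```python
from neon.initializers import GlorotUniform
from neon.layers import BiLSTM
from neon.transforms import Logistic, Tanh

# forward and backward passes over the sequence, outputs concatenated
layer = BiLSTM(128, init=GlorotUniform(), activation=Tanh(),
               gate_activation=Logistic())
```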

QA demo, CPU speedups, deconv and histogram visualizations

01 Dec 23:37
  • deconvolution and weight histogram visualization examples and documentation
  • CPU convolution and pooling layer speedups (~2x faster)
  • bAbI question and answer interactive demo, dataset support
  • various ImageLoader enhancements (usage example below)
  • interactive usage improvements (shortcut Callback import, multiple Callbacks
    init, doc updates, single item batch size support)
  • set default verbosity level to warning
  • CIFAR10 example normalization updates
  • CUDA detection enhancements [#132]
  • only parse batch_writer arguments when used as a script, allow undefined
    global_mean [#137, #140]
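
For context on the ImageLoader items, a hedged usage sketch; the macrobatch path is hypothetical and the argument names assume the ImageLoader interface of this release series:

```python
from neon.data import ImageLoader

# repo_dir points at macrobatches produced by the batch writer (hypothetical path)
train = ImageLoader(repo_dir='/path/to/macrobatches', set_name='train',
                    inner_size=32, scale_range=(32, 40), shuffle=True)
```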

New data loader, deconv visualization, recurrent weight loading

18 Nov 04:53
  • completely rewritten multithreaded C++ dataloader
  • new weight initialization options for recurrent layers
  • Added deconvolution visualization support (guided backprop; see the sketch after this list)
  • new bAbI question answering example network
  • Improved performance of cifar10_allcnn, word_lstm examples
  • new CUDA-C max and avg pooling kernels
  • Additional bugfixes and documentation updates
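
A hedged sketch of turning on the guided-backprop deconv visualization via the callback system, assuming the argparse-driven Callbacks construction used by the examples of this era and a configured model, train_set and valid_set:

```python
from neon.callbacks.callbacks import Callbacks

callbacks = Callbacks(model, train_set, args, eval_set=valid_set)
callbacks.add_deconv_callback(train_set, valid_set)  # collect deconv visualization data
model.fit(train_set, optimizer=opt, num_epochs=args.epochs,
          cost=cost, callbacks=callbacks)
```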

Bugfixes, benchmarking, and timeseries

06 Nov 19:14
  • Callback initialization bug fix [#127]
  • IMDB LSTM example bug fix [#130]
  • Added cuda-convnet2 style binary dropout variant
  • Added benchmark function to model (separate fprop, bprop, update timings; example below)
  • Remove h_buffer references in favor of outputs for recurrent layers
  • Multi-cost output buffer bugfix for inference [#131]
  • New timeseries prediction and generation example
  • Change Callback initialization to re-support named arguments. Separate out
    these arguments in argparser. [#128]
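
A short sketch of the new benchmark hook on Model; the niterations/nskip arguments and the returned per-phase timing structure are assumptions based on typical usage:

```python
# model, train_set, cost and opt are assumed to be already configured
timings = model.benchmark(train_set, cost=cost, optimizer=opt,
                          niterations=20, nskip=2)
# timings reports fprop, bprop and update phases separately, e.g. timings['fprop']
```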

LayerContainers, Sentiment analysis, and more

30 Oct 21:25
  • Sentiment analysis support (LSTM and LookupTable based), new IMDB example
  • Support for merge and branch layer stacks via LayerContainers (sketched after this list)
    • Sequential, Tree, MergeBroadcast, MergeMultiStream
  • Support for freezing layer stacks
  • Adagrad optimizer support
  • new GPU kernels for fast compounding batch norm, conv and pooling engine
    updates, new kernel build system and flags.
  • Modifications for Caffe support
    • conv, pooling, P/Q updates, dropout layer normalization more in line with
      the Caffe approach. NOTE: this breaks backwards compatibility with some
      strided conv/pool related models serialized using older versions of neon,
      as the output sizes may now be different. See the FAQ for more info.
    • serialization enhancements to make caffe model import/export easier
    • use per-channel mean subtraction instead of single global. NOTE: this
      breaks backwards compatibility with ImgMaster saved datasets prior to this
      revision. To correct, please use the included update_dataset_cache.py
      script in the util directory.
  • Default training cost display during progress bar is now calculated on a
    rolling window basis rather than from the beginning of each epoch
  • Separate Layer configuration and initialization steps
  • YAML based alexnet example
  • Callback enhancements.
    • now pass args instead of having to spell out callbacks in each example
    • Changed validation callback to loss callback, validation_frequency now
      evaluation_frequency
    • Generic metric callback.
  • Various bug fixes
    • non-contiguous array get for GPUTensors
    • 1D slicing returns 2D matrices
    • bin/neon serialization fixes for RNNs
    • 3D conv fixes for fprop, bprop
    • batch norm inference fix
    • bias layer size fix
  • Documentation updates and improvements
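
A hedged sketch of composing a branch-and-merge stack with the new LayerContainers: two convolutional branches concatenated along the feature depth via MergeBroadcast, wrapped in a Sequential container. Shapes and initializer are illustrative; the branches are sized so both produce matching spatial output, which the depth merge requires.

```python
from neon.initializers import Gaussian
from neon.layers import Affine, Conv, MergeBroadcast, Sequential
from neon.transforms import Rectlin, Softmax

init = Gaussian(scale=0.01)
# 3x3 unpadded and 5x5 with padding=1 both shrink H and W by 2,
# so the outputs can be concatenated along the feature (depth) axis
branch1 = [Conv((3, 3, 32), init=init, activation=Rectlin())]
branch2 = [Conv((5, 5, 32), init=init, padding=1, activation=Rectlin())]

layers = Sequential([MergeBroadcast(layers=[branch1, branch2], merge="depth"),
                     Affine(nout=10, init=init, activation=Softmax())])
```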

Primarily bug fix release

30 Oct 21:23
  • Ensure root logging handler setup [#82]
  • C++ utility for CUDA compatibility checking [#83]
  • Add predict function to models [#86]
  • Fix bug in learning rate schedule impacting deserialization
  • Speed up batch norm computation
  • Average gradients in OpTree, fix tests
  • Use inference mode for fprop during validation
  • Add top-k misclassification metric (usage example below)
  • Simplify maxas install, make vis requirements optional, doc updates.
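
A brief sketch of evaluating with the new top-k metric, assuming the TopKMisclassification transform name and the usual Model.eval flow:

```python
from neon.transforms import TopKMisclassification

# returns one value per metric name reported by the metric object
# (log-loss, top-1 and top-k misclassification for this metric)
metrics = model.eval(valid_set, metric=TopKMisclassification(k=5))
print(metrics)
```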

Multi GPU support

20 Jul 18:27
Pre-release

This release implements support for multi-GPU processing using "weird trick" parallelization (data parallelism for local layers, model parallelism for fully-connected layers) and cleans up the previously existing MPI-based parallel code.

Multi-GPU is only supported on newer Maxwell-based cards using the NervanaGPU backend.

Older Kepler-based cards using the cudanet backend are no longer supported (some models and datasets will still work, but others may raise DeprecationWarnings). Users of these cards are encouraged to remain on the 0.8.2 release until we back-port NervanaGPU to support Kepler cards.
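
To make the "weird trick" split above concrete, a conceptual numpy sketch (not neon's API): fully-connected layers shard the weight matrix across workers while every worker sees the full minibatch, whereas local layers keep a full weight copy and shard the minibatch.

```python
import numpy as np

def fc_model_parallel(x, weight_slices):
    """Model parallelism for a fully-connected layer (conceptual only).

    x: (batch, nin) full minibatch seen by every worker.
    weight_slices: per-worker column slices of the weight matrix, each
    (nin, nout / nworkers). Concatenating the partial outputs stands in
    for the all-gather a real multi-GPU implementation would perform.
    """
    return np.concatenate([x @ w for w in weight_slices], axis=1)

# Local (conv/pool) layers instead run data parallel: each worker fprops its
# own shard of the minibatch through a full weight copy, and gradients are
# averaged across workers after bprop.
```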