- Reductions sum, mean, var, std, max, min, argmax, and argmin now accept a keepdims option.
- The same reductions now return a GPUArray instead of an ndarray if axis=None.
- Switch to PEP 440 version numbering.
- Replace distribute_setup.py with ez_setup.py.
- Improve support for latest NVIDIA GPUs.
- Direct links to online NVIDIA documentation in CUBLAS, CUFFT wrapper docstrings.
- Add wrappers for CUSOLVER in CUDA 7.0.
- Add skcuda namespace package that contains all modules in scikits.cuda namespace.
- Add more wrappers for CUBLAS 5 functions (enh. by Teodor Moldovan, Sander Dieleman).
- Add support for CULA Dense Free R17 (enh. by Alex Rubinsteyn).
- Memoize elementwise kernel used by ifft scaling (#37).
- Speed up misc.maxabs using reduction and kernel memoization.
- Speed up misc.cumsum using scan and kernel memoization.
- Speed up linalg.conj and misc.diff using elementwise kernel and memoization.
- Speed up special.{sici,exp1,expi} using elementwise kernel and memoization.
- Add wrappers for experimental multi-GPU CULA routines in CULA Dense R14+.
- Use ldconfig to find library paths rather than libdl (#39).
- Fix win32 platform detection.
- Add Cholesky factorization/solve routines (enh. by Steve Taylor).
- Fix Cholesky factorization/solve routines (fix by Thomas Unterthiner).
- Enable dot() function to operate inplace (enh. by Thomas Unterthiner).
- Python 3 compatibility improvements (enh. by Thomas Unterthiner).
- Support for Fortran-order arrays in dot() and cho_solve() (enh. by Thomas Unterthiner).
- CULA-based matrix inversion (enh. by Thomas Unterthiner).
- Add add_diag() function (enh. by Thomas Unterthiner).
- Use cublas*copy in diag() function (enh. by Thomas Unterthiner).
- Improved Mac OS X compatibility (enh. by Michael M. Forbes).
- Find CUBLAS version even when it is only accessible via LD_LIBRARY_PATH (enh. by Frédéric Bastien).
- Get both major and minor version numbers from CUBLAS library when determining version.
- Handle unset LD_LIBRARY_PATH variable (fix by Jan Schlüter).
- Fix library search on Mac OS X (fix by capdevc).
- Fix library search on Windows.
- Add Windows support to CULA wrappers.
- Enable specification of memory pool allocator to linalg functions (enh. by Thomas Unterthiner).
- Improve misc.select_block_grid_sizes() logic to handle different GPU hardware.
- Compute transpose using CUDA 5.0 CUBLAS functions rather than with inefficient naive kernel.
- Use ReadTheDocs theme when building HTML docs locally.
- Support additional cufftPlanMany() parameters when creating FFT plans (enh. by Gregory R. Lee).
- Improved Python 3.4 compatibility (enh. by Eric Larson).
- Avoid unnecessary import of cublas when importing fft module (enh. by Eric Larson).
- Matrix trace function (enh. by Thomas Unterthiner).
- Functions for computing simple axis-wise stats over matrices (enh. by Thomas Unterthiner).
- Matrix add_dot, add_matvec, div_matvec, mult_matvec functions (enh. by Thomas Unterthiner).
- Faster dot_diag implementation using CUBLAS matrix-matrix multiplication (enh. by Thomas Unterthiner).
- Memoize SourceModule calls to speed up various high-level functions (enh. by Thomas Unterthiner).
- Function for computing matrix determinant (enh. by Thomas Unterthiner).
- Function for computing min/max and argmin/argmax along a matrix axis (enh. by Thomas Unterthiner).
- Set default value of the parameter 'overwrite' to False in all linalg functions.
- Elementwise arithmetic operations with broadcasting up to 2 dimensions (enh. by David Wei Chiang).
- Add complex exponential integral.
- Fix typo in cublasCgbmv.
- Use CUBLAS v2 API, add preliminary support for CUBLAS 5 functions.
- Detect CUBLAS version without initializing the GPU.
- Work around numpy bug #1898.
- Fix issues with pycuda installations done via easy_install/pip.
- Add support for specifying streams when creating FFT plans.
- Successfully find CULA R13a libraries.
- Raise exceptions when functions in the full release of CULA Dense are invoked without the library installed.
- Perform post-FFT scaling in-place.
- Fix broken Python 2.6 compatibility (#19).
- Download distribute for package installation if it isn't available.
- Prevent absence of CULA from causing import errors (enh. by Jacob Frelinger).
- FFT batch tests and FFTW mode configuration (enh. by Lars Pastewka).
- Fix bug preventing installation with pip.
- Fix bug in cutoff_invert kernel.
- Add get_compute_capability function and other goodies to misc module.
- Use pycuda-complex.hpp to improve kernel readability.
- Add integrate module.
- Add unit tests for high-level functions.
- Automatically determine device used by current context.
- Support batched and multidimensional FFT operations.
- Extend dot() function to support implicit transpose/Hermitian.
- Support for in-place computation of singular vectors in svd() function.
- Simplify kernel launch setup.
- More CULA routine wrappers.
- Wrappers for CULA R11 auxiliary routines.
- Add support for some functions in the premium version of CULA toolkit.
- Add wrappers for all LAPACK functions in basic CULA toolkit.
- Fix pinv() to properly invert complex matrices.
- Add Hermitian transpose.
- Add tril function.
- Fix missing library detection.
- Include missing CUDA headers in package.
- Add documentation.
- Update copyright information.
- First public release.