Change Log

Release 0.5.0 - (under development)

The reductions sum, mean, var, std, max, min, argmax, and argmin accept a keepdims option.
The same reductions now return a GPUArray instead of an ndarray if axis=None.
Switch to PEP 440 version numbering.
Replace distribute_setup.py with ez_setup.py.
Improve support for latest NVIDIA GPUs.
Add direct links to online NVIDIA documentation in CUBLAS and CUFFT wrapper docstrings.
Add wrappers for CUSOLVER in CUDA 7.0.
Add skcuda namespace package that contains all modules in scikits.cuda namespace.
Add more wrappers for CUBLAS 5 functions (enh. by Teodor Moldovan, Sander Dieleman).
Add support for CULA Dense Free R17 (enh. by Alex Rubinsteyn).
Memoize elementwise kernel used by ifft scaling (#37).
Speed up misc.maxabs using reduction and kernel memoization.
Speed up misc.cumsum using scan and kernel memoization.
Speed up linalg.conj and misc.diff using elementwise kernel and memoization.
Speed up special.{sici,exp1,expi} using elementwise kernel and memoization.
Add wrappers for experimental multi-GPU CULA routines in CULA Dense R14+.
Use ldconfig to find library paths rather than libdl (#39).
Fix win32 platform detection.
Add Cholesky factorization/solve routines (enh. by Steve Taylor).
Fix Cholesky factorization/solve routines (fix by Thomas Unterthiner).
Enable dot() function to operate inplace (enh. by Thomas Unterthiner).
Python 3 compatibility improvements (enh. by Thomas Unterthiner).
Support for Fortran-order arrays in dot() and cho_solve() (enh. by Thomas Unterthiner).
CULA-based matrix inversion (enh. by Thomas Unterthiner).
Add add_diag() function (enh. by Thomas Unterthiner).
Use cublas*copy in diag() function (enh. by Thomas Unterthiner).
Improved MacOS X compatibility (enh. by Michael M. Forbes).
Find CUBLAS version even when it is only accessible via LD_LIBRARY_PATH (enh. by Frédéric Bastien).
Get both major and minor version numbers from CUBLAS library when determining version.
Handle unset LD_LIBRARY_PATH variable (fix by Jan Schlüter).
Fix library search on MacOS X (fix by capdevc).
Fix library search on Windows.
Add Windows support to CULA wrappers.
Enable specification of memory pool allocator to linalg functions (enh. by Thomas Unterthiner).
Improve misc.select_block_grid_sizes() logic to handle different GPU hardware.
Compute transpose using CUDA 5.0 CUBLAS functions rather than with inefficient naive kernel.
Use ReadTheDocs theme when building HTML docs locally.
Support additional cufftPlanMany() parameters when creating FFT plans (enh. by Gregory R. Lee).
Improved Python 3.4 compatibility (enh. by Eric Larson).
Avoid unnecessary import of cublas when importing fft module (enh. by Eric Larson).
Matrix trace function (enh. by Thomas Unterthiner).
Functions for computing simple axis-wise stats over matrices (enh. by Thomas Unterthiner).
Matrix add_dot, add_matvec, div_matvec, mult_matvec functions (enh. by Thomas Unterthiner).
Faster dot_diag implementation using CUBLAS matrix-matrix multiplication (enh. by Thomas Unterthiner).
Memoize SourceModule calls to speed up various high-level functions (enh. by Thomas Unterthiner).
Function for computing matrix determinant (enh. by Thomas Unterthiner).
Function for computing min/max and argmin/argmax along a matrix axis (enh. by Thomas Unterthiner).
Set default value of the parameter 'overwrite' to False in all linalg functions.
Elementwise arithmetic operations with broadcasting up to 2 dimensions (enh. by David Wei Chiang).

Release 0.042 - (March 10, 2013)

Add complex exponential integral.
Fix typo in cublasCgbmv.
Use CUBLAS v2 API, add preliminary support for CUBLAS 5 functions.
Detect CUBLAS version without initializing the GPU.
Work around numpy bug #1898.
Fix issues with pycuda installations done via easy_install/pip.
Add support for specifying streams when creating FFT plans.
Successfully find CULA R13a libraries.
Raise exceptions when functions in the full release of CULA Dense are invoked without the library installed.
Perform post-fft scaling in-place.
Fix broken Python 2.6 compatibility (#19).
Download distribute for package installation if it isn't available.
Prevent absence of CULA from causing import errors (enh. by Jacob Frelinger).
FFT batch tests and FFTW mode configuration (enh. by Lars Pastewka).

Release 0.041 - (May 22, 2011)

Fix bug preventing installation with pip.

Release 0.04 - (May 11, 2011)

Fix bug in cutoff_invert kernel.
Add get_compute_capability function and other goodies to misc module.
Use pycuda-complex.hpp to improve kernel readability.
Add integrate module.
Add unit tests for high-level functions.
Automatically determine device used by current context.
Support batched and multidimensional FFT operations.
Extend dot() function to support implicit transpose/Hermitian.
Support for in-place computation of singular vectors in svd() function.
Simplify kernel launch setup.
More CULA routine wrappers.
Wrappers for CULA R11 auxiliary routines.

Release 0.03 - (November 22, 2010)

Add support for some functions in the premium version of the CULA toolkit.
Add wrappers for all LAPACK functions in the basic CULA toolkit.
Fix pinv() to properly invert complex matrices.
Add Hermitian transpose.
Add tril function.
Fix missing library detection.
Include missing CUDA headers in package.

Release 0.02 - (September 21, 2010)

Add documentation.
Update copyright information.

Release 0.01 - (September 17, 2010)

First public release.
