v8.1.0: Updated types and many Ops improvements
✨ New features and improvements
- Added support for mypy 0.950 and pydantic v1.9.0, added bound types throughout layers and ops (#599).
- Made all `NumpyOps` CPU kernels generic (#627).
- Made all custom CUDA kernels generic (#603).
- Added bounds checks for `NumpyOps` (#618).
- Fixed out-of-bounds writes in `NumpyOps` and `CupyOps` (#664).
- Reduced unnecessary zero-init allocations (#632).
- Fixed reductions when applied to zero-length sequences (#637).
- Added `NumpyOps.cblas` to get a table of C BLAS functions (#643, #700).
- Improved type-casting in `NumpyOps.asarray` (#656).
- Simplified `CupyOps.asarray` (#661).
- Fixed `Model.copy()` for layers used more than once (#659).
- Fixed potential race in `Shim` (#677).
- Convert numpy arrays using dlpack in `xp2tensorflow` and `xp2torch` when possible (#686).
- Improved speed of `HashEmbed` by avoiding large temporary arrays (#696).
- Added `Ops.reduce_last` and `Ops.reduce_first` (#710); see the sketch after this list.
- Numerous test suite improvements.
- Experimental: Add support for Metal Performance Shaders with PyTorch nightlies (#685).
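The new `Ops.reduce_first` and `Ops.reduce_last` methods work on a block of concatenated rows plus per-sequence lengths, like the existing `reduce_sum`/`reduce_mean` ops. A minimal sketch; the `(values, indices)` return shape assumed here is based on the other reduce_* ops, not a definitive signature:

```python
from thinc.api import NumpyOps

ops = NumpyOps()

# Three sequences of lengths 2, 3 and 1, concatenated into a single 2d array.
X = ops.xp.arange(12, dtype="f").reshape(6, 2)
lengths = ops.asarray1i([2, 3, 1])

# reduce_first selects the first row of each sequence, reduce_last the last one;
# the second return value is assumed to hold the indices used for the backward pass.
firsts, starts_ends = ops.reduce_first(X, lengths)
lasts, last_indices = ops.reduce_last(X, lengths)

print(firsts)  # rows 0, 2 and 5 of X
print(lasts)   # rows 1, 4 and 5 of X
```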
🔴 Bug fixes
- Fix issue #707: Fix label smoothing threshold for `to_categorical`.
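For context, `to_categorical` (available from `thinc.api`) accepts a smoothing value, and this release corrects the threshold above which that value is rejected. A minimal sketch, assuming the keyword argument is named `label_smoothing` and leaving the exact validation bound unspecified:

```python
import numpy
from thinc.api import to_categorical

labels = numpy.asarray([0, 1, 2, 1], dtype="i")

# Plain one-hot encoding.
one_hot = to_categorical(labels, n_classes=3)

# With label smoothing, part of the probability mass is moved off the true
# class onto the other classes. v8.1.0 fixes the check that rejects smoothing
# values that are too large for the number of classes.
smoothed = to_categorical(labels, n_classes=3, label_smoothing=0.1)
print(smoothed[0])  # the true class no longer receives the full mass of 1.0
```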
⚠️ Backwards incompatibilities
- In most cases the typing updates allow many casts and ignores to be removed, but types may also need minor modifications following the updates for mypy and pydantic.
- `get_array_module` now returns `None` for non-numpy/cupy array input rather than returning `numpy` by default (see the sketch after this list).
- The `prefer_gpu` and `require_gpu` functions no longer set the default PyTorch `torch.Tensor` type to `torch.cuda.FloatTensor`. This means that wrapped PyTorch models cannot assume that Tensors are allocated on a CUDA GPU after calling these functions. For example:

  ```python
  # Before Thinc v8.1.0, this Tensor would be allocated on the GPU after
  # {prefer,require}_gpu. Now it will be allocated as a CPU tensor by default.
  token_mask = torch.arange(max_seq_len)

  # To ensure correct allocation, specify the device where the Tensor should be
  # allocated. `input` refers to the input of the model.
  token_mask = torch.arange(max_seq_len, device=input.device)
  ```

  This change brings Thinc's behavior in line with how device memory allocation is normally handled in PyTorch.
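A minimal sketch of the new `get_array_module` behavior, assuming the function is imported from `thinc.util`:

```python
import numpy
from thinc.util import get_array_module

X = numpy.zeros((2, 3), dtype="f")
xp = get_array_module(X)  # the numpy module (or cupy for a CuPy array)

# A plain list is neither a numpy nor a cupy array: before v8.1.0 this call
# returned numpy as a fallback, now it returns None.
assert get_array_module([1.0, 2.0, 3.0]) is None
```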
👥 Contributors
@adrianeboyd, @danieldk, @honnibal, @ines, @kadarakos, @koaning, @richardpaulhudson, @shadeMe, @svlandeg