2 | 2 |
3 | 3 | Notable changes to this project will be documented in this file.
4 | 4 |
5 |   | -## [Unreleased]
  | 5 | +## 0.3.0 - 2/7/22
6 | 6 |
7 | 7 | ### TLDR
8 | 8 |
@@ -31,62 +31,62 @@ pull in `cust_core` in GPU crates for deriving `DeviceCopy` without cfg shenanig
31 | 31 |
32 | 32 | ### Removed
33 | 33 |
34 |    | -- Deleted `DeviceBox::wrap`, use `DeviceBox::from_raw`.
35 |    | -- Deleted `DeviceSlice::as_ptr` and `DeviceSlice::as_mut_ptr`. Use `DeviceSlice::as_device_ptr` then `DevicePointer::as_(mut)_ptr`.
36 |    | -- Deleted `DeviceSlice::chunks` and consequently `DeviceChunks`.
37 |    | -- Deleted `DeviceSlice::chunks_mut` and consequently `DeviceChunksMut`.
38 |    | -- Deleted `DeviceSlice::from_slice` and `DeviceSlice::from_slice_mut` because it was unsound.
39 |    | -- Deleted `DevicePointer::as_raw_mut` (use `DevicePointer::as_mut_ptr`).
40 |    | -- Deleted `DevicePointer::wrap` (use `DevicePointer::from_raw`).
   | 34 | +- `DeviceBox::wrap`, use `DeviceBox::from_raw`.
   | 35 | +- `DeviceSlice::as_ptr` and `DeviceSlice::as_mut_ptr`. Use `DeviceSlice::as_device_ptr` then `DevicePointer::as_(mut)_ptr`.
   | 36 | +- `DeviceSlice::chunks` and consequently `DeviceChunks`.
   | 37 | +- `DeviceSlice::chunks_mut` and consequently `DeviceChunksMut`.
   | 38 | +- `DeviceSlice::from_slice` and `DeviceSlice::from_slice_mut` because it was unsound.
   | 39 | +- `DevicePointer::as_raw_mut` (use `DevicePointer::as_mut_ptr`).
   | 40 | +- `DevicePointer::wrap` (use `DevicePointer::from_raw`).
41 | 41 | - `DeviceSlice` no longer implements `Index` and `IndexMut`, switching away from `[T]` made this impossible to implement.
42 | 42 | Instead you can now use `DeviceSlice::index` which behaves the same.
43 | 43 | - `vek` is no longer re-exported.
44 | 44 |
45 | 45 | ### Deprecated
46 | 46 |
47 |    | -- Deprecated `Module::from_str`, use `Module::from_ptx` and pass `&[]` for options.
48 |    | -- Deprecated `Module::load_from_string`, use `Module::from_ptx_cstr`.
   | 47 | +- `Module::from_str`, use `Module::from_ptx` and pass `&[]` for options.
   | 48 | +- `Module::load_from_string`, use `Module::from_ptx_cstr`.
49 | 49 |
50 | 50 | ### Added
51 | 51 |
52 |    | -- Added `cust::memory::LockedBox`, same as `LockedBuffer` except for single elements.
53 |    | -- Added `cust::memory::cuda_malloc_async`.
54 |    | -- Added `cust::memory::cuda_free_async`.
55 |    | -- Added `impl AsyncCopyDestination<LockedBox<T>> for DeviceBox<T>` for async HtoD/DtoH memcpy.
56 |    | -- Added `DeviceBox::new_async`.
57 |    | -- Added `DeviceBox::drop_async`.
58 |    | -- Added `DeviceBox::zeroed_async`.
59 |    | -- Added `DeviceBox::uninitialized_async`.
60 |    | -- Added `DeviceBuffer::uninitialized_async`.
61 |    | -- Added `DeviceBuffer::drop_async`.
62 |    | -- Added `DeviceBuffer::zeroed`.
63 |    | -- Added `DeviceBuffer::zeroed_async`.
64 |    | -- Added `DeviceBuffer::cast`.
65 |    | -- Added `DeviceBuffer::try_cast`.
66 |    | -- Added `DeviceSlice::set_8` and `DeviceSlice::set_8_async`.
67 |    | -- Added `DeviceSlice::set_16` and `DeviceSlice::set_16_async`.
68 |    | -- Added `DeviceSlice::set_32` and `DeviceSlice::set_32_async`.
69 |    | -- Added `DeviceSlice::set_zero` and `DeviceSlice::set_zero_async`.
70 |    | -- Added the `bytemuck` feature which is enabled by default.
71 |    | -- Added mint integration behind `impl_mint`.
72 |    | -- Added half integration behind `impl_half`.
73 |    | -- Added glam integration behind `impl_glam`.
74 |    | -- Added experimental linux external memory import APIs through `cust::external::ExternalMemory`.
75 |    | -- Added `DeviceBuffer::as_slice`.
76 |    | -- Added `DeviceVariable`, a simple wrapper around `DeviceBox<T>` and `T` which allows easy management of a CPU and GPU version of a type.
77 |    | -- Added `DeviceMemory`, a trait describing any region of GPU memory that can be described with a pointer + a length.
78 |    | -- Added `memcpy_htod`, a wrapper around `cuMemcpyHtoD_v2`.
79 |    | -- Added `mem_get_info` to query the amount of free and total memory.
80 |    | -- Added `DevicePointer::as_ptr` and `DevicePointer::as_mut_ptr` for `*const T` and `*mut T`.
81 |    | -- Added `DevicePointer::from_raw` for `CUdeviceptr -> DevicePointer<T>` with a safe function.
82 |    | -- Added `DevicePointer::cast`.
83 |    | -- Added dependency on `cust_core` for `DeviceCopy`.
84 |    | -- Added `ModuleJitOption`, `JitFallback`, `JitTarget`, and `OptLevel` for specifying options when loading a module. Note that
   | 52 | +- `cust::memory::LockedBox`, same as `LockedBuffer` except for single elements.
   | 53 | +- `cust::memory::cuda_malloc_async`.
   | 54 | +- `cust::memory::cuda_free_async`.
   | 55 | +- `impl AsyncCopyDestination<LockedBox<T>> for DeviceBox<T>` for async HtoD/DtoH memcpy.
   | 56 | +- `DeviceBox::new_async`.
   | 57 | +- `DeviceBox::drop_async`.
   | 58 | +- `DeviceBox::zeroed_async`.
   | 59 | +- `DeviceBox::uninitialized_async`.
   | 60 | +- `DeviceBuffer::uninitialized_async`.
   | 61 | +- `DeviceBuffer::drop_async`.
   | 62 | +- `DeviceBuffer::zeroed`.
   | 63 | +- `DeviceBuffer::zeroed_async`.
   | 64 | +- `DeviceBuffer::cast`.
   | 65 | +- `DeviceBuffer::try_cast`.
   | 66 | +- `DeviceSlice::set_8` and `DeviceSlice::set_8_async`.
   | 67 | +- `DeviceSlice::set_16` and `DeviceSlice::set_16_async`.
   | 68 | +- `DeviceSlice::set_32` and `DeviceSlice::set_32_async`.
   | 69 | +- `DeviceSlice::set_zero` and `DeviceSlice::set_zero_async`.
   | 70 | +- the `bytemuck` feature which is enabled by default.
   | 71 | +- mint integration behind `impl_mint`.
   | 72 | +- half integration behind `impl_half`.
   | 73 | +- glam integration behind `impl_glam`.
   | 74 | +- experimental linux external memory import APIs through `cust::external::ExternalMemory`.
   | 75 | +- `DeviceBuffer::as_slice`.
   | 76 | +- `DeviceVariable`, a simple wrapper around `DeviceBox<T>` and `T` which allows easy management of a CPU and GPU version of a type.
   | 77 | +- `DeviceMemory`, a trait describing any region of GPU memory that can be described with a pointer + a length.
   | 78 | +- `memcpy_htod`, a wrapper around `cuMemcpyHtoD_v2`.
   | 79 | +- `mem_get_info` to query the amount of free and total memory.
   | 80 | +- `DevicePointer::as_ptr` and `DevicePointer::as_mut_ptr` for `*const T` and `*mut T`.
   | 81 | +- `DevicePointer::from_raw` for `CUdeviceptr -> DevicePointer<T>` with a safe function.
   | 82 | +- `DevicePointer::cast`.
   | 83 | +- dependency on `cust_core` for `DeviceCopy`.
   | 84 | +- `ModuleJitOption`, `JitFallback`, `JitTarget`, and `OptLevel` for specifying options when loading a module. Note that
85 | 85 | `ModuleJitOption::MaxRegisters` does not seem to work currently, but NVIDIA is looking into it.
86 | 86 | You can achieve the same goal by compiling the ptx to cubin using nvcc then loading that: `nvcc --cubin foo.ptx -maxrregcount=REGS`
87 |    | -- Added `Module::from_fatbin`.
88 |    | -- Added `Module::from_cubin`.
89 |    | -- Added `Module::from_ptx` and `Module::from_ptx_cstr`.
   | 87 | +- `Module::from_fatbin`.
   | 88 | +- `Module::from_cubin`.
   | 89 | +- `Module::from_ptx` and `Module::from_ptx_cstr`.
90 | 90 | - `Stream`, `Module`, `Linker`, `Function`, `Event`, `UnifiedBox`, `ArrayObject`, `LockedBuffer`, `LockedBox`, `DeviceSlice`, `DeviceBuffer`, and `DeviceBox` all now impl `Send` and `Sync`, this makes
91 | 91 | it much easier to write multigpu code. The CUDA API is fully thread-safe except for graph objects.
92 | 92 |