Releases · xlite-dev/LeetCUDA
v2.4.1 Pack LayerNorm
What's Changed
- [Nsight] Add nsys/ncu usage, ptx/sass by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/44
- [DotProd][FP16] support f16x8_pack kernel by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/45
- [LayerNorm][FP16] Add pack support for f16x8 LD/ST by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/46 (packed LD/ST sketch below)
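The f16x8 packing in #45/#46 is about moving 8 half values with a single 128-bit memory transaction instead of 8 scalar accesses. A minimal sketch of the idea on an elementwise add, assuming an `LDST128BITS`-style reinterpret macro like the one named in #41 and sm_53+ for half2 arithmetic (kernel name and macro definition here are illustrative, not the repo's exact code):

```cuda
#include <cuda_fp16.h>

// Treat 8 consecutive halves as one float4 so the compiler emits a single
// 128-bit load/store (assumed to mirror the LDST128BITS macro from #41).
#define LDST128BITS(value) (reinterpret_cast<float4*>(&(value))[0])

// Elementwise add as an example: each thread moves and processes 8 halves.
__global__ void elementwise_add_f16x8_pack(half* a, half* b, half* c, int N) {
  int idx = 8 * (blockIdx.x * blockDim.x + threadIdx.x);
  if (idx + 7 < N) {
    float4 ra = LDST128BITS(a[idx]);   // one 128-bit load instead of 8 scalar loads
    float4 rb = LDST128BITS(b[idx]);
    float4 rc;
    half2* ha = reinterpret_cast<half2*>(&ra);
    half2* hb = reinterpret_cast<half2*>(&rb);
    half2* hc = reinterpret_cast<half2*>(&rc);
    #pragma unroll
    for (int i = 0; i < 4; ++i) hc[i] = __hadd2(ha[i], hb[i]);  // packed half2 math
    LDST128BITS(c[idx]) = rc;          // one 128-bit store
  }
}
```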
 
Full Changelog: DefTruth/CUDA-Learn-Notes@v2.4...v2.4.1
v2.4 Pack Reduce LDST
What's Changed
- [Reduce][Kernel] Pack f16/bf16x8 & fp8/i8x16 LD/ST by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/43 (packed reduce sketch below)
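The packed-reduce change in #43 applies the same 128-bit trick to reduction kernels: each thread pulls 8 halves (or 16 fp8/int8 bytes) per load before reducing. A rough sketch for the f16x8 case, accumulating in fp32; the macro, kernel name, and the shared-memory/atomic finish are illustrative assumptions, not the repo's exact structure:

```cuda
#include <cuda_fp16.h>

#define LDST128BITS(value) (reinterpret_cast<float4*>(&(value))[0])

// Butterfly warp reduction: after 5 steps every lane holds the warp-wide sum.
__device__ __forceinline__ float warp_reduce_sum(float v) {
  #pragma unroll
  for (int offset = 16; offset > 0; offset >>= 1)
    v += __shfl_xor_sync(0xffffffff, v, offset);
  return v;
}

// Each thread loads 8 halves with one 128-bit access, accumulates in fp32,
// then the block combines warp sums through shared memory and atomically
// adds its result into *out.
__global__ void block_sum_f16x8_pack(half* x, float* out, int N) {
  int idx = 8 * (blockIdx.x * blockDim.x + threadIdx.x);
  float sum = 0.0f;
  if (idx + 7 < N) {
    float4 r = LDST128BITS(x[idx]);                // 8 halves per load
    half2* h = reinterpret_cast<half2*>(&r);
    #pragma unroll
    for (int i = 0; i < 4; ++i) {
      float2 f = __half22float2(h[i]);             // widen to fp32 before summing
      sum += f.x + f.y;
    }
  }
  __shared__ float warp_sums[32];
  int lane = threadIdx.x % 32;
  int warp = threadIdx.x / 32;
  sum = warp_reduce_sum(sum);
  if (lane == 0) warp_sums[warp] = sum;
  __syncthreads();
  if (warp == 0) {
    int num_warps = (blockDim.x + 31) / 32;
    sum = (lane < num_warps) ? warp_sums[lane] : 0.0f;
    sum = warp_reduce_sum(sum);
    if (lane == 0) atomicAdd(out, sum);
  }
}
```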
 
Full Changelog: DefTruth/CUDA-Learn-Notes@v2.3.1...v2.4
v2.3.1 f16x8 Pack Elementwise
What's Changed
- [FA2][Half] Add FA2 f16_mma_m16n8k16 kernel by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/35 (m16n8k16 MMA sketch below)
- [Refactor][7/N] CUDA Learn Notes refactor Part-7 by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/36
- Clamped input range in Sigmoid kernel to prevent overflow by @Phoenix8215 in https://github.com/DefTruth/CUDA-Learn-Notes/pull/37
- [Sigmoid][F16] Add f16x8_pack kernel, boost ~1.5x by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/39 (clamped sigmoid sketch below)
- [Elementwise][Half] support f16x8_pack kernel, boost 1.1x by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/40
- [FlashAttention] replace FLOAT4 with LDST128BITS macro by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/41
- [RELU][FP16] Add f16x8_pack kernel, boost 2.1x by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/42
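The FA2 kernel in #35 is built around the m16n8k16 tensor-core MMA with f16 accumulators. A minimal PTX wrapper for that instruction (SM80+); only the mnemonic and register counts are fixed by the PTX ISA, the wrapper name and fragment plumbing here are illustrative:

```cuda
#include <cstdint>

// One m16n8k16 tensor-core MMA, f16 in / f16 accumulate.
// Per thread: A fragment = 4 x b32 regs (8 halves), B = 2 x b32 (4 halves),
// C and D = 2 x b32 (4 halves each).
__device__ __forceinline__ void hmma_m16n8k16_f16(
    uint32_t& d0, uint32_t& d1,
    uint32_t a0, uint32_t a1, uint32_t a2, uint32_t a3,
    uint32_t b0, uint32_t b1,
    uint32_t c0, uint32_t c1) {
  asm volatile(
      "mma.sync.aligned.m16n8k16.row.col.f16.f16.f16.f16 "
      "{%0, %1}, {%2, %3, %4, %5}, {%6, %7}, {%8, %9};\n"
      : "=r"(d0), "=r"(d1)
      : "r"(a0), "r"(a1), "r"(a2), "r"(a3),
        "r"(b0), "r"(b1), "r"(c0), "r"(c1));
}
```

The rest of such a kernel is the plumbing around this primitive: staging Q/K/V tiles in shared memory, loading per-thread fragments (typically via ldmatrix), and running the online-softmax rescaling loop.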
 
New Contributors
- @Phoenix8215 made their first contribution in https://github.com/DefTruth/CUDA-Learn-Notes/pull/37
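#37 and #39 combine two ideas: clamp the sigmoid input so the f16 exponential cannot overflow, and move 8 halves per 128-bit access. A sketch putting both together; the clamp bounds, macro names, and kernel name are illustrative assumptions and the repo's exact constants may differ:

```cuda
#include <cuda_fp16.h>

#define LDST128BITS(value) (reinterpret_cast<float4*>(&(value))[0])

// Illustrative clamp bounds: hexp() overflows half (max ~65504) once |x| goes
// much past 11, so the input is clamped before the f16 exponential.
#define MIN_EXP_F16 -11.0f
#define MAX_EXP_F16  11.0f

__global__ void sigmoid_f16x8_pack(half* x, half* y, int N) {
  int idx = 8 * (blockIdx.x * blockDim.x + threadIdx.x);
  if (idx + 7 < N) {
    float4 r = LDST128BITS(x[idx]);                // 8 halves, one 128-bit load
    half* h  = reinterpret_cast<half*>(&r);
    half one = __float2half(1.0f);
    #pragma unroll
    for (int i = 0; i < 8; ++i) {
      float v = fminf(fmaxf(__half2float(h[i]), MIN_EXP_F16), MAX_EXP_F16);
      half e  = hexp(__hneg(__float2half(v)));     // exp(-x) stays finite in f16
      h[i]    = __hdiv(one, __hadd(one, e));       // 1 / (1 + exp(-x))
    }
    LDST128BITS(y[idx]) = r;                       // 8 halves, one 128-bit store
  }
}
```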
 
Full Changelog: DefTruth/CUDA-Learn-Notes@v2.3...v2.3.1
v2.3 Refactor 6/N
What's Changed
- [Refactor][6/N] CUDA Learn Notes refactor Part-6 by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/17
- [Refactor][5/N] CUDA Learn Notes refactor Part-6 by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/18
- [LayerNorm][Half] support fp16x8 packed LayerNorm by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/19
- [Reduce][Half] add HALF2 & BFLOAT2 macro by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/21
- [RMSNorm][Half] support fp16x8 packed RMSNorm by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/22 (RMSNorm sketch below)
- [Bugfix][Kernel] fixed block-count calculation errors in some kernels by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/23
- [Elementwise][Half] support fp16x8 packed Elementwise by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/24
- [Elementwise][Half] support fp16x8 packed Elementwise by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/25
- [RELU][Half] support fp16x8 RELU kernel by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/26
- [RMSNorm] support f16x8_f32 RMSNorm by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/28
- [RMSNorm][Kernel] Add FLOAT2/HALF2_VARIANCE macro by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/29
- [LayerNorm][Kernel] Add HALF2 SUM/SUB/VAR macro by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/30
- [HGEMM] Add sliced_k & t_8x8_sliced_k_f16x4 kernels by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/31
- [HGEMV][Half] support hgemv k32/k128/f16 by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/32 (HGEMV sketch below)
- [FlashAttention] Refactor flash_attn_1_fwd_f32 kernel by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/33
- Bump up to v2.3 by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/34
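The packed LayerNorm/RMSNorm work (#19, #22, #28, #29) follows the same pattern: 128-bit loads of 8 halves, statistics accumulated in fp32 (or via HALF2-style macros), then a packed scale on the way out. A warp-per-row RMSNorm sketch under the simplifying assumption K = 256 and a scalar gain g; names and layout are illustrative, not the repo's exact kernels:

```cuda
#include <cuda_fp16.h>

#define LDST128BITS(value) (reinterpret_cast<float4*>(&(value))[0])

// One warp per row; assumes K == 256 so 32 lanes x 8 halves cover the row.
// y[row] = x[row] * rsqrt(mean(x[row]^2) + eps) * g
__global__ void rms_norm_f16x8_pack(half* x, half* y, float g, float eps, int K) {
  int row  = blockIdx.x * (blockDim.x / 32) + threadIdx.x / 32;
  int lane = threadIdx.x % 32;
  int idx  = row * K + lane * 8;

  float4 r = LDST128BITS(x[idx]);                  // 8 halves per lane, one load
  half2* h = reinterpret_cast<half2*>(&r);

  float sq_sum = 0.0f;
  #pragma unroll
  for (int i = 0; i < 4; ++i) {
    float2 f = __half22float2(h[i]);               // accumulate x^2 in fp32
    sq_sum += f.x * f.x + f.y * f.y;
  }
  #pragma unroll
  for (int offset = 16; offset > 0; offset >>= 1)  // warp-wide sum of squares
    sq_sum += __shfl_xor_sync(0xffffffff, sq_sum, offset);

  float scale = rsqrtf(sq_sum / (float)K + eps) * g;
  half2 s2 = __float2half2_rn(scale);
  #pragma unroll
  for (int i = 0; i < 4; ++i) h[i] = __hmul2(h[i], s2);  // packed scale
  LDST128BITS(y[idx]) = r;                         // 8 halves per lane, one store
}
```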
 
Full Changelog: DefTruth/CUDA-Learn-Notes@v2.2...v2.3
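For the HGEMV kernels in #32, the k32 flavor can be pictured as one warp per output row, with lanes striding the row 32 columns at a time and a shuffle reduction at the end. A sketch under that assumption; the kernel name and signature are illustrative, and the k128/f16 variants additionally vectorize the loads:

```cuda
#include <cuda_fp16.h>

// y = A * x with A row-major MxK in half. One warp per output row; each lane
// strides the row 32 columns at a time (the "k32" flavor assumes K % 32 == 0).
__global__ void hgemv_k32_f16(half* A, half* x, half* y, int M, int K) {
  int row  = blockIdx.x * (blockDim.x / 32) + threadIdx.x / 32;
  int lane = threadIdx.x % 32;
  if (row >= M) return;

  float acc = 0.0f;
  for (int k = lane; k < K; k += 32)               // lanes read adjacent columns
    acc += __half2float(A[row * K + k]) * __half2float(x[k]);

  #pragma unroll
  for (int offset = 16; offset > 0; offset >>= 1)  // reduce the partial dot product
    acc += __shfl_xor_sync(0xffffffff, acc, offset);

  if (lane == 0) y[row] = __float2half(acc);
}
```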
v2.2 Refactor 5/N
What's Changed
- [Refactor][5/N] CUDA Learn Notes refactor Part-5 by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/15
- Bump up to v2.2 by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/16
 
Full Changelog: DefTruth/CUDA-Learn-Notes@2.1...v2.2
v2.1 Refactor 4/N Part-4
What's Changed
- [Refactor][4/N] CUDA Learn Notes refactor Part-4 by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/10
- [Refactor][4/N] CUDA Learn Notes refactor Part-4 by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/11
- [Refactor][4/N] CUDA Learn Notes refactor Part-4 by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/12
- [Refactor][4/N] CUDA Learn Notes refactor Part-4 by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/13
- [Refactor][4/N] CUDA Learn Notes refactor Part-4 by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/14
 
Full Changelog: DefTruth/CUDA-Learn-Notes@v2.0...2.1
v2.0 Refactor 4/N
Full Changelog: DefTruth/CUDA-Learn-Notes@v0.8...v2.0
v0.8
What's Changed
- Bump up to v0.8 by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/9
 
Full Changelog: DefTruth/CUDA-Learn-Notes@v0.7...v0.8
CUDA Learn Notes v0.7
What's Changed
- Bump up to v0.7 by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/8
 
New Contributors
- @DefTruth made their first contribution in https://github.com/DefTruth/CUDA-Learn-Notes/pull/8
 
Full Changelog: DefTruth/CUDA-Learn-Notes@v0.6...v0.7
CUDA Learn Notes v0.5
Full Changelog: DefTruth/CUDA-Learn-Notes@v0.3...v0.5