Pull requests: NVIDIA/TransformerEngine
Pull requests list
[PyTorch] Bugfix for wgrad bulk overlap conflict when dgrad overlap is reduce-scatter
bug
Something isn't working
#1341
opened Nov 18, 2024 by
denera
•
Review required
6 of 13 tasks
Build with uv instead of just pip
#1324
opened Nov 8, 2024 by
jennifgcrl
•
Review required
5 of 13 tasks
TP communication overlap: enable the overlap between GEMM chunk at Ho…
#1311
opened Nov 4, 2024 by
erhoo82
1 of 13 tasks
[JAX] Collective GEMM custom op with nvte_cublas_gemm (no comm. overlap)
jax
#1307
opened Nov 2, 2024 by
denera
7 of 17 tasks
[PyTorch] Add heuristics for initializing FP8 params
enhancement
New feature or request
#1300
opened Oct 30, 2024 by
timmoon10
8 of 13 tasks
attention_mask fill with -inf for UnfusedDotProductAttention
#1268
opened Oct 18, 2024 by
Agoniii
1 of 13 tasks
Draft: reduce cudagraph mem via preallocations
#1253
opened Oct 15, 2024 by
JimmyZhang12
13 tasks
Save CUDA Graph memory by reusing input and output tensors
#1234
opened Oct 9, 2024 by
buptzyb
5 of 13 tasks
Draft: Use fused push_send_recv kernel for TP AG and RS overlaps
#1200
opened Sep 24, 2024 by
erhoo82
13 tasks
[PyTorch] Avoid saving fp8_tensors in certain scenarios
#1143
opened Aug 28, 2024 by
cyanguwa
8 of 13 tasks
Fix param input order for cudagraph
bug
Something isn't working
#1138
opened Aug 27, 2024 by
yifeis-nv
4 of 13 tasks
Add high_precision_init_val to model params when using fp8_model_init
#1121
opened Aug 19, 2024 by
kunlunl
8 of 13 tasks
Use pyproject.toml to specify build requirements
build
Build system
#1061
opened Jul 30, 2024 by
ksivaman
6 of 13 tasks