NVIDIA / TransformerEngine Public

Pull requests: NVIDIA/TransformerEngine

60 Open 1,030 Closed

Pull requests list

Add paged attention support

#1355 opened Dec 4, 2024 by cyanguwa

8 of 13 tasks

[PyTorch] Bugfix for wgrad bulk overlap conflict when dgrad overlap is reduce-scatter

Labels: bug

#1341 opened Nov 18, 2024 by denera

6 of 13 tasks

[C/JAX] Comm+GEMM Overlap API for TE/JAX

Labels: enhancement, jax

#1337 opened Nov 15, 2024 by denera • Draft

3 of 13 tasks

Build with uv instead of just pip

#1324 opened Nov 8, 2024 by jennifgcrl

5 of 13 tasks

TP communication overlap: enable the overlap between GEMM chunk at Ho…

#1311 opened Nov 4, 2024 by erhoo82

1 of 13 tasks

[JAX] Collective GEMM custom op with nvte_cublas_gemm (no comm. overlap)

Labels: jax

#1307 opened Nov 2, 2024 by denera

7 of 17 tasks

[PyTorch] Add heuristics for intializing FP8 params

Labels: enhancement

#1300 opened Oct 30, 2024 by timmoon10

8 of 13 tasks

Offloading example

#1299 opened Oct 29, 2024 by sanandaraj5597

[PyTorch] Fix autocast deprecation warnings

#1277 opened Oct 21, 2024 by yaox12

13 tasks

[PyTorch] Remove sequence parallel check for setting dropout RNG context

#1272 opened Oct 18, 2024 by ksivaman • Draft

1 of 13 tasks

attention_mask fill with -inf for UnfusedDotProductAttention

#1268 opened Oct 18, 2024 by Agoniii

1 of 13 tasks

Draft: reduce cudagraph mem via preoallcations

#1253 opened Oct 15, 2024 by JimmyZhang12

13 tasks

[pyTorch] Infrastructure for C++ QuantizedTensor

#1251 opened Oct 14, 2024 by ptrendx • Draft

13 tasks

fused out correction in CP

#1248 opened Oct 14, 2024 by xiaoyao0115

12 tasks

Save CUDA Graph memory by reusing input and output tensors

#1234 opened Oct 9, 2024 by buptzyb

5 of 13 tasks

[PyTorch] Improve CP P2P efficiency

#1208 opened Sep 26, 2024 by yenchenlin

1 of 6 tasks

Draft: Use fused push_send_recv kernel for TP AG and RS overlaps

#1200 opened Sep 24, 2024 by erhoo82

13 tasks

[WIP] [PyTorch] Proof-of-concept for using operation-based API in modules

#1173 opened Sep 10, 2024 by timmoon10 • Draft

2 of 13 tasks

Fix autocast deprecation warning.

#1167 opened Sep 6, 2024 by jondeaton

[PyTorch] Avoid saving fp8_tensors in certain scenarios

#1143 opened Aug 28, 2024 by cyanguwa

8 of 13 tasks

Norms Refractor

#1140 opened Aug 27, 2024 by phu0ngng • Draft

5 of 13 tasks

Fix param input order for cudagraph

Labels: bug

#1138 opened Aug 27, 2024 by yifeis-nv

4 of 13 tasks

Add high_precision_init_val to model params when using fp8_model_init

#1121 opened Aug 19, 2024 by kunlunl

8 of 13 tasks

Use pyproject.toml to specify build requirements

Labels: build

#1061 opened Jul 30, 2024 by ksivaman

6 of 13 tasks

Change condition for ub tp overlap.

#1055 opened Jul 29, 2024 by Victarry

1 of 13 tasks
