forked from NVIDIA/TransformerEngine
Paged attention changes to THD attention #3 (Draft)
sudhakarsingh27 wants to merge 124 commits into sudhakarsingh27:te_gemma_generation_support from cyanguwa:paged_attention
Conversation
Fix an int conversion error. (Jennifer Zhou)
Debug ONNX export with te.Sequential: ONNX export assumes that all state-dict objects are tensors, even extra state. (Tim Moon)
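For background, the fix pattern implied here is to make the extra state itself a tensor. A minimal sketch, assuming PyTorch's get_extra_state/set_extra_state hooks; the module and its fp8_meta dict are illustrative stand-ins, not TE's actual classes:

```python
import io
import pickle

import torch


class ExtraStateModule(torch.nn.Module):
    """Illustrative module whose extra state is stored as a tensor.

    ONNX export walks the state dict and assumes every entry is a
    tensor, so non-tensor metadata is pickled into a uint8 tensor.
    """

    def __init__(self):
        super().__init__()
        self.fp8_meta = {"scale": 1.0, "amax_history_len": 16}  # hypothetical metadata

    def get_extra_state(self) -> torch.Tensor:
        # Serialize the metadata dict into raw bytes, then wrap the
        # bytes in a 1-D uint8 tensor so the state dict stays all-tensor.
        buf = io.BytesIO()
        pickle.dump(self.fp8_meta, buf)
        return torch.frombuffer(bytearray(buf.getvalue()), dtype=torch.uint8)

    def set_extra_state(self, state: torch.Tensor) -> None:
        # Recover the metadata dict from the uint8 tensor.
        self.fp8_meta = pickle.loads(state.numpy().tobytes())
```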
…tructure (NVIDIA#1326)
* Remove manual FP8 scale update for FP8 params
* Lint fixes
(Tim Moon, Kirthi Shankar Sivamani)
…#1334)
* Limit to one call of ctx.saved_tensors per autograd bwd
(Kirthi Shankar Sivamani)
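Each access to ctx.saved_tensors re-runs the unpack machinery (including any saved-tensor hooks), so reading it once into locals avoids redundant work in backward. A minimal sketch of the pattern; the toy Square function is not TE code:

```python
import torch


class Square(torch.autograd.Function):
    """Toy autograd function illustrating a single ctx.saved_tensors access."""

    @staticmethod
    def forward(ctx, x: torch.Tensor) -> torch.Tensor:
        ctx.save_for_backward(x)
        return x * x

    @staticmethod
    def backward(ctx, grad_out: torch.Tensor) -> torch.Tensor:
        # Unpack saved tensors exactly once; each repeated access to
        # ctx.saved_tensors re-runs unpack hooks and sanity checks.
        (x,) = ctx.saved_tensors
        return 2.0 * x * grad_out


y = Square.apply(torch.randn(4, requires_grad=True))
y.sum().backward()
```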
* Add activation ops
* Fix lint warnings
* Update to use QuantizedTensor
* Respect PyTorch autograd dtype
* Rename CastFloat8 op to Quantize
* Add support for fused dSwiGLU-cast-transpose
(Tim Moon)
(Przemek Tredak)
…1333) Use CMAKE_CURRENT_SOURCE_DIR instead of CMAKE_SOURCE_DIR. (Kenichi Maehashi)
Fix GQA error message. (Charlene Yang)
* Handle deprecated `hidden_size` arg in norm modules
* Support initializing norm ops on CPU
* Add integration test for Megatron-LM (later renamed Mcore integration test)
* Handle case in RMSNorm where hidden dim is not provided
(Tim Moon)
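A minimal sketch of the deprecated-argument handling; RMSNormConfig and normalized_shape are illustrative stand-ins, not TE's actual signature:

```python
import warnings
from typing import Optional


class RMSNormConfig:
    """Illustrative config that accepts a deprecated `hidden_size` alias."""

    def __init__(self, normalized_shape: Optional[int] = None, **kwargs):
        # Accept the old keyword, warn, and map it onto the new one.
        if "hidden_size" in kwargs:
            warnings.warn(
                "`hidden_size` is deprecated; use `normalized_shape` instead",
                DeprecationWarning,
            )
            if normalized_shape is None:
                normalized_shape = kwargs.pop("hidden_size")
            else:
                kwargs.pop("hidden_size")
        if kwargs:
            raise TypeError(f"Unexpected arguments: {sorted(kwargs)}")
        if normalized_shape is None:
            # Mirrors the "hidden dim not provided" case handled in the commit.
            raise TypeError("The hidden dimension must be provided")
        self.normalized_shape = normalized_shape
```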
Add helper function to convert C++ container to string. (Tim Moon)
* Align RNG tracker with Megatron
* Fix module_params order and warmup bug in cudagraph
* Add fp8_group argument and fix FP8 accuracy issue for cudagraph
* Add TE modules and weights filters to support MoE models
* Revert self.fp8
* Use hooks to filter module params (later reverted; reverts commit 73a22e2)
* Filter all TE modules in hooks
* Update graph.py
* Revert CudaRNGStatesTracker
* Remove filtering module params
(Robin Zhang, Yifei Song, Xin Yao, Tim Moon)
Moved framework-agnostic THD kernels to common. (Michael Goldfarb)
* retain_graph=True for grouped GEMM
* Remove an unnecessary retain_graph=True
* Make retain_graph in graph capture configurable
* Fix typo
(Xiaowei Ren)
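For context on the configurable retain_graph: when a backward pass is replayed over the same autograd graph (for example, warmup replays before CUDA-graph capture), the graph must be retained or the second call hits freed buffers. A minimal sketch; capture_backward is a hypothetical helper, not the TE API:

```python
import torch


def capture_backward(loss: torch.Tensor, retain_graph: bool = False) -> None:
    """Hypothetical helper: run backward, optionally keeping the graph alive."""
    loss.backward(retain_graph=retain_graph)


x = torch.randn(8, requires_grad=True)
loss = (x * x).sum()
capture_backward(loss, retain_graph=True)  # graph kept alive for replay
capture_backward(loss)                     # safe only because of the line above
```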
Update list of CI users (two commits). (Tim Moon)
…age (NVIDIA#1308)
* Draft implementation
* Fix compile errors
* Remove print
* Edit comments
* Edit the bulk-overlap test case
* Add version guard and runtime version guard; fix the version guard
(Youngeun Kwon)
(Charlene Yang; eight commits, plus two pre-commit.ci auto-fix commits)
…1347) Scale sequence length in CP tests to avoid tiny sizes. (Michael Goldfarb)
Debug jobs to deploy nightly docs. (Tim Moon)
Store module extra state in a tensor. (Tim Moon)
* Always have padding mask type for both flash and fused attention
* Remove a redundant assert
(Xiaowei Ren)
Debug Mcore integration test: avoid FP8 on Ampere and older; generate synthetic data instead of depending on external data. (Tim Moon)
Fix typo. (Tim Moon, Kirthi Shankar Sivamani)
* Fix fuse_wgrad_accumulation for GroupedLinear
* Update tests
(Xin Yao, Tim Moon)
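For background on fuse_wgrad_accumulation: instead of materializing a fresh weight gradient every backward, the wgrad is accumulated in place into a persistent main_grad buffer (a Megatron-style convention). A toy sketch of the idea, not TE's GroupedLinear:

```python
import torch


class FusedWgradLinear(torch.autograd.Function):
    """Toy linear layer that accumulates wgrad into weight.main_grad."""

    @staticmethod
    def forward(ctx, x, weight):
        ctx.save_for_backward(x, weight)
        return x @ weight.t()

    @staticmethod
    def backward(ctx, grad_out):
        x, weight = ctx.saved_tensors
        # Accumulate directly into the preallocated main_grad buffer.
        weight.main_grad.add_(grad_out.t() @ x)
        # Report no "regular" gradient for the weight.
        return grad_out @ weight, None


w = torch.nn.Parameter(torch.randn(4, 3))
w.main_grad = torch.zeros(4, 3)  # persistent accumulation buffer
x = torch.randn(2, 3, requires_grad=True)
FusedWgradLinear.apply(x, w).sum().backward()
```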
(Charlene Yang)
force-pushed from e710273 to 33b430f
* Fix te.Sequential for older PyTorch versions
* Fixes
(Kirthi Shankar Sivamani)
* Add and later remove debug code and debug info
* Typo fixes
* Do not return lse
* Add amax_per_step for quantizers of CP
* Fix FP8 + CP
* Bug fixes and a dtype fix
(Xiaowei Ren)
(Charlene Yang)
force-pushed from 6fcad33 to f5b91c6
(Charlene Yang)
force-pushed from 5b4117b to 7331a4c
(Charlene Yang)
force-pushed from cbad5ea to 0341de7
(Charlene Yang)
force-pushed from c699139 to 6bd61a7
…NVIDIA#1466) Use the same API in optimizer zero_grad as PyTorch optimizers. (Tim Moon)
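A minimal sketch of what matching the PyTorch API means here: zero_grad should take set_to_none (default True) with the same semantics as torch.optim.Optimizer.zero_grad. The toy optimizer below is illustrative, not the TE class:

```python
import torch


class FusedOptimizerToy(torch.optim.Optimizer):
    """Toy optimizer whose zero_grad mirrors the PyTorch signature."""

    def __init__(self, params, lr: float = 1e-3):
        super().__init__(params, defaults={"lr": lr})

    @torch.no_grad()
    def step(self, closure=None):
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is not None:
                    p.add_(p.grad, alpha=-group["lr"])

    def zero_grad(self, set_to_none: bool = True) -> None:
        # Same semantics as torch.optim.Optimizer.zero_grad: either free
        # the gradient tensors or zero them in place.
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                if set_to_none:
                    p.grad = None
                else:
                    p.grad.zero_()
```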
…1498)
* Remove dependency on transformer_engine::Tensor in attention.cu
* Templatize thd_partition_indices_kernel and thd_read_half_tensor_kernel only to force recompilation, rather than directly using the precompiled symbols in libtransformer.so
* Modify attention.cu for the templatized THD kernels; remove dependency on common.h
* Move THD structs from libtransformer.so to the framework-extensions include header; code cleanup
* Consolidate and move thd_utils from common to framework extensions
* Remove template decorators around thd_partition_indices_kernel and thd_read_half_tensor_kernel
(Kshitij Janardan Lakhani)
(Charlene Yang; two commits)
force-pushed from 8f8a81e to 93235dd
(Charlene Yang; two commits)
* Fix
* Reshape input
(Pawel Gadzinski)
* Non-exit tests
* Fixes
(Pawel Gadzinski)
(Charlene Yang; two commits)
Description
Checking how difficult it is to merge Paged Attention changes into THD Attention changes
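For context on what is being merged: paged attention stores the KV cache in fixed-size physical blocks addressed through a per-sequence page table, while the THD layout packs variable-length sequences token-by-token. A toy sketch of the paging bookkeeping; PagedKVCache and its methods are illustrative, not TE's API:

```python
from typing import Dict, List

import torch


class PagedKVCache:
    """Toy paged KV cache: fixed-size blocks plus per-sequence page tables."""

    def __init__(self, num_blocks: int, block_size: int, num_heads: int, head_dim: int):
        self.block_size = block_size
        # Physical storage: [num_blocks, block_size, num_heads, head_dim].
        self.k = torch.zeros(num_blocks, block_size, num_heads, head_dim)
        self.v = torch.zeros_like(self.k)
        self.free_blocks: List[int] = list(range(num_blocks))
        self.page_table: Dict[int, List[int]] = {}  # seq_id -> physical block ids
        self.seq_lens: Dict[int, int] = {}

    def append(self, seq_id: int, k: torch.Tensor, v: torch.Tensor) -> None:
        """Append one token's K/V, allocating a new block when the last one fills."""
        pos = self.seq_lens.get(seq_id, 0)
        if pos % self.block_size == 0:  # first token, or current block is full
            self.page_table.setdefault(seq_id, []).append(self.free_blocks.pop())
        block = self.page_table[seq_id][pos // self.block_size]
        self.k[block, pos % self.block_size] = k
        self.v[block, pos % self.block_size] = v
        self.seq_lens[seq_id] = pos + 1


# Two sequences of different lengths share one physical pool, as in THD batches.
cache = PagedKVCache(num_blocks=8, block_size=4, num_heads=2, head_dim=8)
for seq_id, length in [(0, 5), (1, 2)]:
    for _ in range(length):
        cache.append(seq_id, torch.randn(2, 8), torch.randn(2, 8))
```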