forked from NVIDIA/TransformerEngine
Paged attention changes to THD attention #3 (Draft)
sudhakarsingh27 wants to merge 124 commits into sudhakarsingh27:te_gemma_generation_support from cyanguwa:paged_attention
Conversation
Fix an int conversion error. (Jennifer Zhou)
Debug ONNX export with te.Sequential: ONNX export assumes that all state-dict objects are tensors, even extra state. (Tim Moon)
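For background, the fix pattern implied here is to make the extra state itself a tensor. A minimal sketch, assuming PyTorch's get_extra_state/set_extra_state hooks; the module and its fp8_meta dict are illustrative stand-ins, not TE's actual classes:

```python
import io
import pickle

import torch


class ExtraStateModule(torch.nn.Module):
    """Illustrative module whose extra state is stored as a tensor.

    ONNX export walks the state dict and assumes every entry is a
    tensor, so non-tensor metadata is pickled into a uint8 tensor.
    """

    def __init__(self):
        super().__init__()
        self.fp8_meta = {"scale": 1.0, "amax_history_len": 16}  # hypothetical metadata

    def get_extra_state(self) -> torch.Tensor:
        # Serialize the metadata dict into raw bytes, then wrap the
        # bytes in a 1-D uint8 tensor so the state dict stays all-tensor.
        buf = io.BytesIO()
        pickle.dump(self.fp8_meta, buf)
        return torch.frombuffer(bytearray(buf.getvalue()), dtype=torch.uint8)

    def set_extra_state(self, state: torch.Tensor) -> None:
        # Recover the metadata dict from the uint8 tensor.
        self.fp8_meta = pickle.loads(state.numpy().tobytes())
```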
…tructure (NVIDIA#1326)
* Remove manual FP8 scale update for FP8 params
* Lint fixes
(Tim Moon, Kirthi Shankar Sivamani)
…#1334)
* Limit to one call of ctx.saved_tensors per autograd bwd
(Kirthi Shankar Sivamani)
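Each access to ctx.saved_tensors re-runs the unpack machinery (including any saved-tensor hooks), so reading it once into locals avoids redundant work in backward. A minimal sketch of the pattern; the toy Square function is not TE code:

```python
import torch


class Square(torch.autograd.Function):
    """Toy autograd function illustrating a single ctx.saved_tensors access."""

    @staticmethod
    def forward(ctx, x: torch.Tensor) -> torch.Tensor:
        ctx.save_for_backward(x)
        return x * x

    @staticmethod
    def backward(ctx, grad_out: torch.Tensor) -> torch.Tensor:
        # Unpack saved tensors exactly once; each repeated access to
        # ctx.saved_tensors re-runs unpack hooks and sanity checks.
        (x,) = ctx.saved_tensors
        return 2.0 * x * grad_out


y = Square.apply(torch.randn(4, requires_grad=True))
y.sum().backward()
```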
* Add activation ops
* Fix lint warnings
* Update to use QuantizedTensor
* Respect PyTorch autograd dtype
* Rename CastFloat8 op to Quantize
* Add support for fused dSwiGLU-cast-transpose
(Tim Moon)
(Przemek Tredak)
…1333) Use CMAKE_CURRENT_SOURCE_DIR instead of CMAKE_SOURCE_DIR. (Kenichi Maehashi)
Fix GQA error message. (Charlene Yang)
* Handle deprecated `hidden_size` arg in norm modules
* Support initializing norm ops on CPU
* Add integration test for Megatron-LM (later renamed Mcore integration test)
* Handle case in RMSNorm where hidden dim is not provided
(Tim Moon)
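A minimal sketch of the deprecated-argument handling; RMSNormConfig and normalized_shape are illustrative stand-ins, not TE's actual signature:

```python
import warnings
from typing import Optional


class RMSNormConfig:
    """Illustrative config that accepts a deprecated `hidden_size` alias."""

    def __init__(self, normalized_shape: Optional[int] = None, **kwargs):
        # Accept the old keyword, warn, and map it onto the new one.
        if "hidden_size" in kwargs:
            warnings.warn(
                "`hidden_size` is deprecated; use `normalized_shape` instead",
                DeprecationWarning,
            )
            if normalized_shape is None:
                normalized_shape = kwargs.pop("hidden_size")
            else:
                kwargs.pop("hidden_size")
        if kwargs:
            raise TypeError(f"Unexpected arguments: {sorted(kwargs)}")
        if normalized_shape is None:
            # Mirrors the "hidden dim not provided" case handled in the commit.
            raise TypeError("The hidden dimension must be provided")
        self.normalized_shape = normalized_shape
```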
Add helper function to convert C++ container to string. (Tim Moon)
* Align RNG tracker with Megatron
* Fix module_params order and warmup bug in cudagraph
* Add fp8_group argument and fix FP8 accuracy issue for cudagraph
* Add TE modules and weights filters to support MoE models
* Revert self.fp8
* Use hooks to filter module params (later reverted; reverts commit 73a22e2)
* Filter all TE modules in hooks
* Update graph.py
* Revert CudaRNGStatesTracker
* Remove filtering module params
(Robin Zhang, Yifei Song, Xin Yao, Tim Moon)
Moved framework-agnostic THD kernels to common. (Michael Goldfarb)
* retain_graph=True for grouped GEMM
* Remove an unnecessary retain_graph=True
* Make retain_graph in graph capture configurable
* Fix typo
(Xiaowei Ren)
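For context on the configurable retain_graph: when a backward pass is replayed over the same autograd graph (for example, warmup replays before CUDA-graph capture), the graph must be retained or the second call hits freed buffers. A minimal sketch; capture_backward is a hypothetical helper, not the TE API:

```python
import torch


def capture_backward(loss: torch.Tensor, retain_graph: bool = False) -> None:
    """Hypothetical helper: run backward, optionally keeping the graph alive."""
    loss.backward(retain_graph=retain_graph)


x = torch.randn(8, requires_grad=True)
loss = (x * x).sum()
capture_backward(loss, retain_graph=True)  # graph kept alive for replay
capture_backward(loss)                     # safe only because of the line above
```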
Update list of CI users (two commits). (Tim Moon)
…age (NVIDIA#1308)
* Draft implementation
* Fix compile errors
* Remove print
* Edit comments
* Edit the bulk-overlap test case
* Add version guard and runtime version guard; fix the version guard
(Youngeun Kwon)
(Charlene Yang; eight commits, plus two pre-commit.ci auto-fix commits)
…1347) Scale sequence length in CP tests to avoid tiny sizes. (Michael Goldfarb)
Debug jobs to deploy nightly docs. (Tim Moon)
Store module extra state in a tensor. (Tim Moon)
* Always have padding mask type for both flash and fused attention
* Remove a redundant assert
(Xiaowei Ren)
Debug Mcore integration test: avoid FP8 on Ampere and older; generate synthetic data instead of depending on external data. (Tim Moon)
Fix typo. (Tim Moon, Kirthi Shankar Sivamani)
* Fix fuse_wgrad_accumulation for GroupedLinear
* Update tests
(Xin Yao, Tim Moon)
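For background on fuse_wgrad_accumulation: instead of materializing a fresh weight gradient every backward, the wgrad is accumulated in place into a persistent main_grad buffer (a Megatron-style convention). A toy sketch of the idea, not TE's GroupedLinear:

```python
import torch


class FusedWgradLinear(torch.autograd.Function):
    """Toy linear layer that accumulates wgrad into weight.main_grad."""

    @staticmethod
    def forward(ctx, x, weight):
        ctx.save_for_backward(x, weight)
        return x @ weight.t()

    @staticmethod
    def backward(ctx, grad_out):
        x, weight = ctx.saved_tensors
        # Accumulate directly into the preallocated main_grad buffer.
        weight.main_grad.add_(grad_out.t() @ x)
        # Report no "regular" gradient for the weight.
        return grad_out @ weight, None


w = torch.nn.Parameter(torch.randn(4, 3))
w.main_grad = torch.zeros(4, 3)  # persistent accumulation buffer
x = torch.randn(2, 3, requires_grad=True)
FusedWgradLinear.apply(x, w).sum().backward()
```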
(Charlene Yang)
force-pushed from e710273 to 33b430f
* Fix te.Sequential for older PyTorch versions
* Fixes
(Kirthi Shankar Sivamani)
* Add and later remove debug code and debug info
* Typo fixes
* Do not return lse
* Add amax_per_step for quantizers of CP
* Fix FP8 + CP
* Bug fixes and a dtype fix
(Xiaowei Ren)
(Charlene Yang)
force-pushed from 6fcad33 to f5b91c6
(Charlene Yang)
force-pushed from 5b4117b to 7331a4c
(Charlene Yang)
force-pushed from cbad5ea to 0341de7
(Charlene Yang)
force-pushed from c699139 to 6bd61a7
…NVIDIA#1466) Use the same API in optimizer zero_grad as PyTorch optimizers. (Tim Moon)
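A minimal sketch of what matching the PyTorch API means here: zero_grad should take set_to_none (default True) with the same semantics as torch.optim.Optimizer.zero_grad. The toy optimizer below is illustrative, not the TE class:

```python
import torch


class FusedOptimizerToy(torch.optim.Optimizer):
    """Toy optimizer whose zero_grad mirrors the PyTorch signature."""

    def __init__(self, params, lr: float = 1e-3):
        super().__init__(params, defaults={"lr": lr})

    @torch.no_grad()
    def step(self, closure=None):
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is not None:
                    p.add_(p.grad, alpha=-group["lr"])

    def zero_grad(self, set_to_none: bool = True) -> None:
        # Same semantics as torch.optim.Optimizer.zero_grad: either free
        # the gradient tensors or zero them in place.
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                if set_to_none:
                    p.grad = None
                else:
                    p.grad.zero_()
```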
…1498)
* Remove dependency on transformer_engine::Tensor in attention.cu
* Templatize thd_partition_indices_kernel and thd_read_half_tensor_kernel only to force recompilation, rather than directly using the precompiled symbols in libtransformer.so
* Modify attention.cu for the templatized THD kernels; remove dependency on common.h
* Move THD structs from libtransformer.so to the framework-extensions include header; code cleanup
* Consolidate and move thd_utils from common to framework extensions
* Remove template decorators around thd_partition_indices_kernel and thd_read_half_tensor_kernel
(Kshitij Janardan Lakhani)
(Charlene Yang; two commits)
force-pushed from 8f8a81e to 93235dd
(Charlene Yang; two commits)
* Fix
* Reshape input
(Pawel Gadzinski)
* Non-exit tests
* Fixes
(Pawel Gadzinski)
(Charlene Yang; two commits)
Description
Checking how difficult it is to merge Paged Attention changes into THD Attention changes
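For context on what is being merged: paged attention stores the KV cache in fixed-size physical blocks addressed through a per-sequence page table, while the THD layout packs variable-length sequences token-by-token. A toy sketch of the paging bookkeeping; PagedKVCache and its methods are illustrative, not TE's API:

```python
from typing import Dict, List

import torch


class PagedKVCache:
    """Toy paged KV cache: fixed-size blocks plus per-sequence page tables."""

    def __init__(self, num_blocks: int, block_size: int, num_heads: int, head_dim: int):
        self.block_size = block_size
        # Physical storage: [num_blocks, block_size, num_heads, head_dim].
        self.k = torch.zeros(num_blocks, block_size, num_heads, head_dim)
        self.v = torch.zeros_like(self.k)
        self.free_blocks: List[int] = list(range(num_blocks))
        self.page_table: Dict[int, List[int]] = {}  # seq_id -> physical block ids
        self.seq_lens: Dict[int, int] = {}

    def append(self, seq_id: int, k: torch.Tensor, v: torch.Tensor) -> None:
        """Append one token's K/V, allocating a new block when the last one fills."""
        pos = self.seq_lens.get(seq_id, 0)
        if pos % self.block_size == 0:  # first token, or current block is full
            self.page_table.setdefault(seq_id, []).append(self.free_blocks.pop())
        block = self.page_table[seq_id][pos // self.block_size]
        self.k[block, pos % self.block_size] = k
        self.v[block, pos % self.block_size] = v
        self.seq_lens[seq_id] = pos + 1


# Two sequences of different lengths share one physical pool, as in THD batches.
cache = PagedKVCache(num_blocks=8, block_size=4, num_heads=2, head_dim=8)
for seq_id, length in [(0, 5), (1, 2)]:
    for _ in range(length):
        cache.append(seq_id, torch.randn(2, 8), torch.randn(2, 8))
```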