Generation tutorial for Gemma model #829

Open
wants to merge 264 commits into
base: main
264 commits
a2f7c72
[PyTorch] Fix kernel_bulk launch config (#775)
ksivaman Apr 12, 2024
87cc803
Add SM margin to LayerNorm in inference (#772)
erhoo82 Apr 12, 2024
d0e02cf
[PyTorch] Don't use autograd hook for bwd reduction (#781)
ksivaman Apr 15, 2024
a25a2fe
[C/PyTorch] Add FP8 DPA and MHA (#768)
cyanguwa Apr 16, 2024
2921464
Changed VERSION to 1.7.0dev
ksivaman Apr 16, 2024
ea9f6be
[PyTorch] Use __torch_function__ as a class method (#783)
ksivaman Apr 16, 2024
324bafb
[PyTorch] TE checkpoint pass-through logic fix (#782)
denera Apr 16, 2024
f998fee
Add new users to TE CI
ptrendx Apr 16, 2024
a27264b
Support Low Rank Adaptation (LoRA). (#745)
mingxu1067 Apr 16, 2024
7e9dbca
[PyTorch] Misc fixes for release_v1.6 (#784)
ksivaman Apr 17, 2024
4a8a807
[UB] Adding configurable timeout for userbuffer and improving error r…
shamisp Apr 17, 2024
fc2a8bc
[PyTorch] Fix for type checking failure on custom callables (#790)
denera Apr 18, 2024
df28cea
[JAX] Fixing CI failure due to incorrect use of `static_argnums` in j…
denera Apr 18, 2024
346e7da
NVRTC kernels for cast-transpose (#258)
timmoon10 Apr 19, 2024
5b9e2e4
[PyTorch] Stop storing fused weight tensor in linear modules (#719)
timmoon10 Apr 19, 2024
3ba02f1
[JAX] Allow multi-dims for dgamma and dbeta in LN descriptor. (#780)
mingxu1067 Apr 19, 2024
fab53a4
[PyTorch] Remove unnecessary Pylint overrides (#794)
timmoon10 Apr 22, 2024
165225a
[JAX] Unifying GeLU and GeGLU in LayerNorm MLP (#765)
phu0ngng Apr 24, 2024
cb60166
[PyTorch] Avoid using LRU cache for cu_seqlens (#798)
ksivaman Apr 24, 2024
b1a4efc
Update README.rst (#806)
Apr 24, 2024
a06ab9a
[JAX] SwiGLU Implementation (#773)
phu0ngng Apr 24, 2024
f339c42
Add attention bias and qkv format to context parallelism (#726)
xrennvidia Apr 26, 2024
36297ef
FP8 Support for MCore MoE (#648)
Victarry Apr 29, 2024
b394dee
Add module level filter for deprecation warning in common (#813)
ksivaman Apr 29, 2024
086df06
[PyTorch] Fix tp_group_initialized error (#819)
cyanguwa Apr 29, 2024
9ac388a
[PyTorch] Skip context parallel tests on architectures below sm80 (#799)
cyanguwa Apr 29, 2024
f6aca0a
[PyTorch] Fix linter warnings from unused args (#816)
timmoon10 Apr 30, 2024
53be633
Added pull request template (#793)
ptrendx Apr 30, 2024
996ed0d
Fix ring_exchange RS to support CUDA graph capture (#811)
vasunvidia Apr 30, 2024
46fc3b0
Avoid amax roll for non-run modules (#825)
ksivaman Apr 30, 2024
850b790
Handle the scaling factor when amax is too tiny that leads to an infi…
jinzex May 1, 2024
cd0f62f
[JAX] Support FP8 training for Pipeline Parallelism when Micro-batch …
mingxu1067 May 1, 2024
da9ee4d
[PyTorch] Miscellanous fixes for FP8 DPA module (#804)
cyanguwa May 2, 2024
4afb291
[JAX] Enhance JAX unit tests (#796)
zlsh80826 May 2, 2024
5db9ed9
[JAX] Generalizing Activation Primitives (#810)
phu0ngng May 3, 2024
8e75d91
[PyTorch] Update FP8 recipe test to handle recipe changes (#834)
timmoon10 May 7, 2024
4af821b
Update FA version (#838)
ksivaman May 9, 2024
9607e95
[JAX] Fixes for the issue with ActLuPrimitive in PAXML (#837)
phu0ngng May 9, 2024
e0f3157
Not completely done gemma
Mar 21, 2024
746deba
something
Mar 21, 2024
e582840
Version which works
Mar 22, 2024
59eb22d
Fixed kv_channels
pggPL Mar 22, 2024
7a7fe6f
Fixed potential bug with fc1 loading
pggPL Mar 27, 2024
64718a1
Gemma generation
pggPL Apr 4, 2024
219bb07
Fp8 generation and evaluation
Apr 12, 2024
8a5ba9b
Fp8 generation and evaluation
pggPL Apr 12, 2024
39de0e8
changes
Apr 17, 2024
1d3105c
Fixed Llama tutorial. Changed batch size and added fused=True.
Mar 21, 2024
70aa1f3
Tutorial updated but not complete yet.
Mar 22, 2024
b52a733
Tutorial notebook reseted - removed fuse=true
Mar 22, 2024
bd6aa42
Removed fused=true
Mar 22, 2024
91dd83e
Batch size back to 8
Mar 22, 2024
7edce8e
Typo and commented out line
pggPL Mar 22, 2024
ef9db44
fixed whitespace
Mar 27, 2024
ccb7f26
fixed whitespace
Mar 27, 2024
187d7fc
Added comment to attention line. Fixed potential bug with loading wei…
Mar 27, 2024
59eaf7c
Comments
Mar 27, 2024
72e5017
Models cast added again
Mar 27, 2024
12edbcf
Weight download info
pggPL Mar 27, 2024
3e77434
Moved parameter gate_proj_size to config
pggPL Mar 27, 2024
42235da
gate_proj_size removed and put immediate_size instead
pggPL Mar 28, 2024
18ff645
add THD support for arbitrary_seqlen backend
cyanguwa Apr 15, 2024
906f74e
update test results
cyanguwa Apr 15, 2024
bd8a7dc
THD generation
Apr 24, 2024
eb76011
Cuda graphs generation (which seems to be working)
Apr 29, 2024
41045ab
fp8 cuda_graphs generation
May 1, 2024
c696641
attention.py
May 1, 2024
d572eb6
attention.py
May 1, 2024
d94c505
Low level fixes
pggPL May 3, 2024
78125c4
pybind
pggPL May 3, 2024
3ad4714
Prepare attention for generalized kernel
pggPL May 4, 2024
6dc12bc
Prepare attention for generalized kernel
pggPL May 4, 2024
894cf58
Drafts of tutorials
pggPL May 6, 2024
b03543b
Drafts of tutorials
pggPL May 6, 2024
d0b6289
File permission updates
pggPL May 7, 2024
370dd1e
File permission updates
pggPL May 7, 2024
3363a67
Removed draft attention_copy.cu
pggPL May 7, 2024
9ea62c3
New vesrion of tutorial markdown
pggPL May 7, 2024
7259dc9
HF finetuing introcution
pggPL May 8, 2024
be68a5d
HF finetuing introcution
pggPL May 8, 2024
1bfc9b7
HF finetuing introcution
pggPL May 8, 2024
894c645
Fused attn temporary fix
pggPL May 8, 2024
e1e5fa8
Bug fix
pggPL May 9, 2024
79af381
.h file ifx
pggPL May 9, 2024
ef70a25
generate_sample_text() add
pggPL May 9, 2024
53a50fb
Removed files
pggPL May 9, 2024
9dbbdd4
Removed files
pggPL May 9, 2024
2e3bebd
Removed files
pggPL May 9, 2024
b12416b
whitespace fix
pggPL May 9, 2024
306b94b
Attention pictures
pggPL May 9, 2024
b8f25fd
temp fix
pggPL May 10, 2024
394f736
temp fix
pggPL May 10, 2024
eb689ce
temp fix
pggPL May 10, 2024
036ed5a
zero centered gamma
pggPL May 10, 2024
9c7880c
refactor of replace_params()
pggPL May 10, 2024
b05cfa6
refactor of replace_params()
pggPL May 10, 2024
ee698e7
Minor refactors of te_gemma.py
pggPL May 13, 2024
9ec603a
Refactored te_gemma.py
pggPL May 14, 2024
942a2db
Minor chenges
pggPL May 14, 2024
2048c6e
Te gemma
pggPL May 15, 2024
167631d
Attention
pggPL May 15, 2024
8db2699
attention.py refactor
pggPL May 15, 2024
a0e35dc
attention.py refactor
pggPL May 15, 2024
40ce474
fp8_model_init tutorial
pggPL May 15, 2024
ae64bdf
images
pggPL May 15, 2024
4677576
images
pggPL May 15, 2024
f1e727a
images
pggPL May 15, 2024
20538a5
Added nice images
pggPL May 16, 2024
5fa76f4
Added nice images
pggPL May 16, 2024
62ec2f4
Small code refactors
pggPL May 16, 2024
2c0ea1f
Small code refactors
pggPL May 16, 2024
0f16bf8
Small code refactors
pggPL May 16, 2024
cd2566f
Small code refactors
pggPL May 16, 2024
3501548
Cosmetic change
pggPL May 17, 2024
d5ef40c
te_gemma fix
pggPL May 17, 2024
c94b36b
bug fix
pggPL May 18, 2024
1a7c0d3
fused=True
pggPL May 18, 2024
afbaa3f
fused=True
pggPL May 18, 2024
f153720
new rope kernel (not working)
pggPL May 20, 2024
8f572e3
merge with THD branch
pggPL May 21, 2024
65e6b57
Times for finetuning
pggPL May 21, 2024
d82cb9f
Times for finetuning
pggPL May 21, 2024
bc26c4d
Times for finetuning
pggPL May 21, 2024
183f1f1
fixes
pggPL May 22, 2024
d23e2b3
Minor fixes
pggPL May 22, 2024
967be16
Minor fixes
pggPL May 22, 2024
4bf081b
Minor fixes
pggPL May 22, 2024
a541e63
Minor fixes
pggPL May 22, 2024
27defac
fix
pggPL May 22, 2024
600ff90
Images
pggPL May 23, 2024
3ec4e9a
Merge branch 'main' of github.com:pggPL/TransformerEngine into main7
pggPL May 23, 2024
2ded3d5
Merge branch 'main' of github.com:pggPL/TransformerEngine into main7
pggPL May 23, 2024
1649057
Merge branch 'main7' into new_main3
pggPL May 23, 2024
b5ba6d6
fix
pggPL May 23, 2024
fcfda2c
fix
pggPL May 23, 2024
1d7c997
fix
pggPL May 23, 2024
117f2f9
fix
pggPL May 23, 2024
c65eee7
git fix
pggPL May 23, 2024
6ec8926
git fix
pggPL May 23, 2024
4da9fee
git fix
pggPL May 23, 2024
f16868b
git fix
pggPL May 23, 2024
c439a76
git fix
pggPL May 23, 2024
448df78
git fix
pggPL May 23, 2024
63a98b7
git fix
pggPL May 23, 2024
f64acd3
Attention.py refactoring
pggPL May 24, 2024
c8e4510
Attention.py refactoring
pggPL May 24, 2024
954257d
te_gemma.py refactoring
pggPL May 24, 2024
6e35fcb
Not THD attention generation
pggPL May 28, 2024
4a2a936
Tutorial fixes
pggPL May 29, 2024
3222fde
Tutorial fixes
pggPL May 29, 2024
1f64ac5
Tutorial fixes
pggPL May 29, 2024
f6bb973
requirements.txt
pggPL May 29, 2024
56f3771
requirements.txt
pggPL May 29, 2024
d1b94c2
requirements.txt
pggPL May 29, 2024
c39fe07
changed prompts
pggPL May 29, 2024
03c92fe
changed prompts
pggPL May 29, 2024
491bc1d
changed prompts
pggPL May 29, 2024
e1763c6
changed prompts
pggPL May 29, 2024
330be41
notebook fix
pggPL May 29, 2024
023c9b7
notebook fix
pggPL May 29, 2024
67ecbbe
notebook fix
pggPL May 29, 2024
12e0bfb
notebook fix
pggPL May 29, 2024
1f89a9a
notebook fix
pggPL May 29, 2024
00c0dd0
Merge branch 'main' of github.com:pggPL/TransformerEngine into main7
pggPL May 29, 2024
51d0437
Merge branch 'main7' into new_main3
pggPL May 29, 2024
1524e9f
te gemma merge update
pggPL May 29, 2024
a80e02d
te gemma merge update
pggPL May 29, 2024
f6aad30
Image remove
pggPL May 29, 2024
e5f48ff
Image remove
pggPL May 29, 2024
ee3b029
Merge branch 'main' of github.com:pggPL/TransformerEngine into main7
pggPL May 30, 2024
6fe920d
attention merge
pggPL May 30, 2024
1895ca1
Merge branch 'main7' into new_main3
pggPL May 30, 2024
e4d6b07
merge
pggPL May 30, 2024
d3ba406
merge
pggPL May 30, 2024
90cbb5c
merge
pggPL May 30, 2024
9fbfeb8
merge
pggPL May 30, 2024
bea0566
merge
pggPL May 30, 2024
7b327ae
merge
pggPL May 30, 2024
b4c1fd1
merge
pggPL May 30, 2024
385a9b5
merge
pggPL May 30, 2024
3788b4d
merge
pggPL May 30, 2024
8cf9b4d
merge
pggPL May 30, 2024
8917d5d
merge
pggPL May 30, 2024
7c23ba7
lint/license
pggPL May 30, 2024
d43c596
lint fixes
pggPL May 30, 2024
9a22188
lint fixes
pggPL May 30, 2024
2aae0e6
added test to ci
pggPL May 30, 2024
daa251f
Updated fused rope test
pggPL Jun 1, 2024
a61e896
Added test
pggPL Jun 1, 2024
e06d425
llama renamed
pggPL Jun 1, 2024
36f342f
tests
pggPL Jun 1, 2024
218eb78
tests
pggPL Jun 2, 2024
933d4d8
tests
pggPL Jun 2, 2024
2a818e6
whitespace fix
pggPL Jun 2, 2024
9c0c805
whitespace fix
pggPL Jun 2, 2024
d132245
whitespace fix
pggPL Jun 2, 2024
ad82b68
whitespace fix
pggPL Jun 2, 2024
cb1b753
whitespace fix
pggPL Jun 2, 2024
400920e
review of files for gen
pggPL Jun 2, 2024
947ddad
pixtures
pggPL Jun 3, 2024
0442e98
pixtures
pggPL Jun 3, 2024
933e1b6
pixtures
pggPL Jun 3, 2024
1b989bd
pixtures
pggPL Jun 3, 2024
316ab65
pixtures
pggPL Jun 3, 2024
ae93046
pixtures
pggPL Jun 3, 2024
98399f0
pixtures
pggPL Jun 3, 2024
68800c5
pixtures
pggPL Jun 3, 2024
19a016a
inference params optimization
pggPL Jun 3, 2024
132c3a7
inference params optimization
pggPL Jun 3, 2024
183bf4d
tests
pggPL Jun 3, 2024
9359bd6
attention
pggPL Jun 3, 2024
e50f660
attention
pggPL Jun 3, 2024
c4bdb7d
description
pggPL Jun 3, 2024
59aede4
lint
pggPL Jun 3, 2024
2daffeb
lint
pggPL Jun 3, 2024
372d10e
lint
pggPL Jun 3, 2024
6e75eb3
lint
pggPL Jun 3, 2024
532eabd
Merge branch 'main' of github.com:pggPL/TransformerEngine into main7
pggPL Jun 3, 2024
9e35a51
Merge branch 'main7' into new_main3
pggPL Jun 3, 2024
5c7bd98
skip thd test for not hopper
pggPL Jun 3, 2024
9b288d0
Merge branch 'main' into Gemma-generation
pggPL Jun 3, 2024
8a268b0
tutorial fix
pggPL Jun 3, 2024
3dbb493
Merge branch 'main' of github.com:NVIDIA/TransformerEngine into new_m…
pggPL Jun 4, 2024
3c410ce
tutorial fix
pggPL Jun 4, 2024
fbbb0c4
docs fix
pggPL Jun 4, 2024
da13415
Merge branch 'Gemma-generation' of github.com:pggPL/TransformerEngine…
pggPL Jun 4, 2024
a8591c9
docs fix
pggPL Jun 4, 2024
d22308d
svg
pggPL Jun 4, 2024
51e31a3
fix
pggPL Jun 4, 2024
da8272d
fix test
pggPL Jun 5, 2024
e32528f
Merge branch 'main' into Gemma-generation
sudhakarsingh27 Jun 5, 2024
27f8052
removed one file
pggPL Jun 6, 2024
09f9ac0
Merge branch 'main' of github.com:NVIDIA/TransformerEngine into new_m…
pggPL Jun 6, 2024
5922beb
some merge
pggPL Jun 6, 2024
4d654c2
.contiguous() refactoring
pggPL Jun 6, 2024
7517aaa
Merge branch 'main' into Gemma-generation
sudhakarsingh27 Jun 6, 2024
cfe8219
Merge branch 'main' of github.com:NVIDIA/TransformerEngine into new_m…
pggPL Jun 6, 2024
7ee9f73
new animation
pggPL Jun 6, 2024
8ea89de
Merge branch 'Gemma-generation' of github.com:pggPL/TransformerEngine…
pggPL Jun 6, 2024
24e8c2d
attention.py
pggPL Jun 7, 2024
b7102c3
typoe fix
pggPL Jun 7, 2024
88430b4
new image
pggPL Jun 7, 2024
5b709fc
new image
pggPL Jun 7, 2024
549adc6
images
pggPL Jun 7, 2024
ce15af4
images
pggPL Jun 7, 2024
da2e6e3
images
pggPL Jun 7, 2024
5cb8ed4
Merge branch 'main' into Gemma-generation
sudhakarsingh27 Jun 7, 2024
3464131
Merge branch 'main' into Gemma-generation
sudhakarsingh27 Aug 1, 2024
9103731
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 1, 2024
e4fd1c2
fix typo in attention
sudhakarsingh27 Aug 1, 2024
1 change: 1 addition & 0 deletions docs/examples/te_gemma/media/calibration.svg
1 change: 1 addition & 0 deletions docs/examples/te_gemma/media/calibration_1_half.svg
1 change: 1 addition & 0 deletions docs/examples/te_gemma/media/calibration_2_half.svg
1 change: 1 addition & 0 deletions docs/examples/te_gemma/media/fp8_model_init.svg
1 change: 1 addition & 0 deletions docs/examples/te_gemma/media/fp8_model_init_1_half.svg
1 change: 1 addition & 0 deletions docs/examples/te_gemma/media/fp8_model_init_2_half.svg