llama : support Jamba hybrid Transformer-Mamba models #7531
Status: Open
compilade wants to merge 52 commits into master from compilade/refactor-kv-cache
+628 −260
Commits (52, all by compilade):

271104c wip: llama : separate recurrent states from the KV cache
8db1e4d llama : use std::find for seq_nodes in llama_rs_cache
0028010 llama : state checkpoints for recurrent models
0c8b3b2 llama : correctly handle more edge cases for the rs cache
d66849f Merge branch 'master' into compilade/refactor-kv-cache
a09db95 llama : rename many llama_kv_cache_* functions
c460ff1 Merge branch 'master' into compilade/refactor-kv-cache
b6fafd1 llama : remove useless return value for some llama_cache_* functions
b7ec12e Merge branch 'master' into compilade/refactor-kv-cache
3b57b55 Merge branch 'master' into compilade/refactor-kv-cache
7e13f19 llama : rethink recurrent state cell counts
cbc743e llama : support Jamba
0fd13e9 Merge branch 'master' into compilade/refactor-kv-cache
61a88a1 llama : fix BERT inference without KV cache
ea2e63e convert-hf : check for unprocessed Jamba experts
fc59407 convert-hf : support Mini-Jamba conversion
181dadf llama : fix Jamba quantization sanity checks
3a414b0 llama : sequence-length-aware batch splitting
4e4c41e Merge branch 'master' into compilade/refactor-kv-cache
3587a94 llama : use equal-sequence-length sub-batches for recurrent models
5d3c7b9 Merge branch 'master' into compilade/refactor-kv-cache
72eea49 llama : fix batch split output count for embeddings
18d1c14 llama : minimize swaps when reordering logits
61200ef llama : fix edge case finding batch seq_id of split recurrent cell
eb589d5 llama : avoid copies for simple batch splits
8fb57ac llama : use im2col and mul_mat to perform convolution for Mamba
17f6c1e llama : fix .base() compilation error on Windows
fee3c1d llama : allow doing the equivalent of SSM_CONV with SUM_ROWS and MUL
6840ac0 Merge branch 'master' into compilade/refactor-kv-cache
372482d llama : rename llama_cache to llama_past
43d8d4b examples : replace llama_kv_cache_seq_* with llama_past_seq_*
ff794f5 Merge branch 'master' into compilade/refactor-kv-cache
33425a7 mamba : fix non-contiguous usage of ggml_silu
10c3c41 Merge branch 'master' into compilade/refactor-kv-cache
9b38f8b Merge branch 'master' into compilade/refactor-kv-cache
bc320ef Merge branch 'master' into compilade/refactor-kv-cache
fcb889c llama : session saving and reloading for hybrid models
a03e32a Merge branch 'master' into compilade/refactor-kv-cache
9d3f44d convert_hf : fix Jamba conversion
5f62db7 llama : fix mixed signedness comparison
375de5b llama : use unused n_embd_k_gqa in k_shift
4bb4b22 llama : begin renaming llama_past back to llama_kv_cache
63ac36b Merge branch 'master' into compilade/refactor-kv-cache
124c222 Merge branch 'master' into compilade/refactor-kv-cache
8006f3b llama : remove implicit recurrent state rollbacks
691698e Merge branch 'master' into compilade/refactor-kv-cache
e3fe612 llama : partially apply clang-format style
2bcaf64 Merge branch 'master' into compilade/refactor-kv-cache
908e655 convert : fix jamba conv1d shape squeezing
4682e21 Merge branch 'master' into compilade/refactor-kv-cache
20f8e43 graph : add back hybrid memory graph input
07c252f model : add Jamba to Mamba-specific hparams printing
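Commit fee3c1d allows expressing SSM_CONV (Mamba's depthwise causal conv1d) with element-wise MUL followed by a row sum. The underlying equivalence can be sketched in plain Python for a single channel; the function names and list-based shapes here are illustrative, not the ggml API:

```python
def ssm_conv_direct(x, w):
    # Depthwise causal conv1d for one channel: x is the sample sequence
    # already prepended with d_conv - 1 zeros of history, w the kernel.
    d_conv = len(w)
    n_tok = len(x) - (d_conv - 1)
    return [sum(w[k] * x[t + k] for k in range(d_conv)) for t in range(n_tok)]

def ssm_conv_mul_sum(x, w):
    # Same result as element-wise MUL of the kernel against each sliding
    # window, followed by a sum over each row (the SUM_ROWS + MUL idea).
    d_conv = len(w)
    n_tok = len(x) - (d_conv - 1)
    rows = [[w[k] * x[t + k] for k in range(d_conv)] for t in range(n_tok)]
    return [sum(row) for row in rows]
```

Both formulations produce identical outputs; the second maps onto existing general-purpose tensor ops instead of a dedicated SSM_CONV kernel.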
This pre-tokenizer override is pretty much only used by https://huggingface.co/pszemraj/jamba-900M-v0.13-KIx2. The official Jamba models and finetunes use a sentencepiece tokenizer.model.
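The distinction above is between checkpoints that ship a sentencepiece tokenizer.model and the one known model relying on the override. A conversion script could branch on the file's presence; this is a hypothetical sketch (detect_tokenizer is not a function in the PR):

```python
from pathlib import Path

def detect_tokenizer(model_dir):
    # Official Jamba checkpoints ship a sentencepiece tokenizer.model;
    # when it is absent, fall back to the pre-tokenizer override path.
    # Hypothetical helper, not actual convert_hf_to_gguf.py logic.
    if (Path(model_dir) / "tokenizer.model").is_file():
        return "sentencepiece"
    return "override"
```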