Benchmark Layer Skip #1
base: main
Conversation
I ran this for a large number of models, comparing 2-model speculative decoding and self-speculative decoding, and put the results here (note there is a separate sheet for each task). Cc @gante
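For readers skimming the thread, here is a minimal sketch of the two decoding modes being compared, using the transformers `generate` API. The checkpoint names and the exit-layer index are illustrative assumptions, not the benchmark's actual configuration:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative checkpoints: a LayerSkip-trained Llama 2 as the target, and a
# small draft model assumed to share the target's tokenizer.
target_id = "facebook/layerskip-llama2-7B"
draft_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

tokenizer = AutoTokenizer.from_pretrained(target_id)
model = AutoModelForCausalLM.from_pretrained(target_id, torch_dtype=torch.float16, device_map="auto")
inputs = tokenizer("The theory of relativity states", return_tensors="pt").to(model.device)

# (a) 2-model speculative decoding: a separate, smaller draft model proposes tokens.
draft = AutoModelForCausalLM.from_pretrained(draft_id, torch_dtype=torch.float16, device_map="auto")
out_two_model = model.generate(**inputs, assistant_model=draft, max_new_tokens=64)

# (b) Self-speculative decoding: the model drafts from one of its own early layers
# (requires a transformers version with early-exit assisted decoding support).
out_self_spec = model.generate(**inputs, assistant_early_exit=4, max_new_tokens=64)
```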
Perfect, thank you for opening the PR 💛 I'll merge this one as soon as the …
```diff
@@ -52,7 +53,7 @@ def run_model(args, processor_cls, model_cls, run_prediction_loop):
     tokenizer = processor_cls.from_pretrained(args.model)

     if args.max_gpu_memory is None:  # fails if it doesn't fit in a GPU
-        max_memory = {0: "100GiB", "cpu": "0GiB"}
+        max_memory = {0: "100GiB", "cpu": "50GiB"}
```
n00b question: why do we need to modify `max_memory`?
For Llama 2 70B run on 8 A100 GPUs, I had to replace this line with `max_memory = None` to get it to work.
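For context, `max_memory` is the `from_pretrained` (accelerate) argument that caps how much memory the dispatched weights may occupy per device. Roughly, the two configurations under discussion look like this (checkpoint name illustrative):

```python
from transformers import AutoModelForCausalLM

# Capped: keep the weights on GPU 0, allow up to 50GiB of CPU offload.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-70b-hf",  # illustrative checkpoint
    device_map="auto",
    max_memory={0: "100GiB", "cpu": "50GiB"},
)

# Uncapped: max_memory=None lets accelerate spread the weights across all
# visible devices, which is what made the 8x A100 run work.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-70b-hf",
    device_map="auto",
    max_memory=None,
)
```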
haha, for no good reason: long ago, when I wrote this script, my setup crashed if a) the model didn't fully fit on the GPU or b) I didn't set a limit slightly below the device capacity -- e.g. 20GiB on an RTX 3090 (24GB). It can and should be replaced.
Haha ... OK. I submitted a commit to keep `max_memory` as `None` if it is not specified by the user.
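Presumably the commit amounts to something like the following, reusing the names from the diff above (a hypothetical sketch, not the actual commit):

```python
# Hypothetical sketch: only constrain memory when the user asks for it;
# otherwise let accelerate infer the device map freely.
if args.max_gpu_memory is None:
    max_memory = None
else:
    max_memory = {0: f"{args.max_gpu_memory}GiB", "cpu": "50GiB"}
```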
After merging huggingface/transformers#34240, I think this PR is ready to merge.
Just noticed there are conflicts. Working on resolving them now.
Force-pushed from 7b320bc to adaa02c.
I re-ran the commands to verify that they are working and fixed some further errors that popped up.
Hi @gante, just a gentle ping in case you would like to merge the PR.
I followed this blog and implemented early exit self-speculation here.
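For reference, here is a rough sketch of the early-exit self-speculation loop itself: greedy drafting from an early layer through the shared LM head, then verifying all drafts with one full forward pass. This is a conceptual illustration only; it assumes a Llama-style module layout (`model.model.norm`, `model.lm_head`) and, for clarity, runs the full stack during drafting instead of truncating computation at the exit layer:

```python
import torch

@torch.no_grad()
def greedy_early_exit_speculate(model, ids, exit_layer=4, draft_len=5, max_new_tokens=64):
    """Greedy early-exit self-speculation (sketch: no KV cache, batch size 1)."""
    start = ids.shape[1]
    while ids.shape[1] < start + max_new_tokens:
        # Draft phase: propose draft_len tokens by reading the hidden state at
        # `exit_layer` and pushing it through the final norm + shared LM head.
        draft = ids
        for _ in range(draft_len):
            hidden = model(draft, output_hidden_states=True).hidden_states[exit_layer]
            logits = model.lm_head(model.model.norm(hidden[:, -1]))
            draft = torch.cat([draft, logits.argmax(-1, keepdim=True)], dim=-1)

        # Verify phase: one full forward over the drafted sequence; keep the
        # longest prefix on which the full model agrees with the draft.
        full_logits = model(draft).logits
        preds = full_logits[:, ids.shape[1] - 1 : -1].argmax(-1)  # full-model choices
        proposed = draft[:, ids.shape[1] :]
        n_ok = int((preds == proposed).long().cumprod(-1).sum())

        # Accept the agreeing drafts plus one "free" token from the full model.
        bonus = full_logits[:, ids.shape[1] - 1 + n_ok].argmax(-1, keepdim=True)
        ids = torch.cat([ids, proposed[:, :n_ok], bonus], dim=-1)
    return ids
```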