estimate-memory gives ~64GB for meta-llama/Llama-3.1-70B-Instruct instead of anticipated ~140GB #3379

Open
dvrogozh opened this issue Feb 5, 2025 · 0 comments


Accelerate's estimate-memory gives 64.73 GB for meta-llama/Llama-3.1-70B-Instruct at fp16 precision instead of the anticipated ~140 GB, i.e. num_params * num_bytes_in_dtype. The latter is the general recommendation for the memory capacity needed to accommodate inference. See for example https://huggingface.co/blog/llama31#inference-memory-requirements.
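As a cross-check on the ~140 GB figure, the parameter count can be reproduced from the model's published config (hidden_size 8192, 80 layers, intermediate_size 28672, vocab 128256, 8 KV heads). This is a rough sketch based on the standard Llama architecture, not on what estimate-memory itself computes:

```python
# Rough parameter count for Llama-3.1-70B from its public config values.
hidden, layers, inter, vocab = 8192, 80, 28672, 128256
head_dim, n_kv_heads = 128, 8
kv_dim = head_dim * n_kv_heads  # 1024, smaller than hidden due to grouped-query attention

attn = hidden * hidden * 2 + hidden * kv_dim * 2   # q/o projections + k/v projections
mlp = 3 * hidden * inter                           # gate, up, down projections
norms = 2 * hidden                                 # two RMSNorms per layer
per_layer = attn + mlp + norms

# Add input embeddings, lm_head (not tied for the 70B model), and the final norm.
total = layers * per_layer + 2 * vocab * hidden + hidden
fp16_gb = total * 2 / 1e9  # 2 bytes per parameter at fp16
print(f"{total / 1e9:.2f}B params -> {fp16_gb:.1f} GB at fp16")
```

This lands at roughly 70.5B parameters and ~141 GB at fp16, consistent with the blog post's number and well above the tool's 64.73 GB.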

Is this a mistake in estimate-memory, or is there some magic happening behind the scenes when Transformers loads models that allows it to save memory?

If estimate-memory is indeed mistaken, that's worrisome, since it might impact device_map=auto if the same estimation code is reused to distribute layers across devices. At least estimate-memory does call the regular Accelerate utils method:

from accelerate.utils import (
    calculate_maximum_sizes,
)

# accelerate estimate-memory meta-llama/Llama-3.1-70B-Instruct
Loading pretrained config for `meta-llama/Llama-3.1-70B-Instruct` from `transformers`...
┌────────────────────────────────────────────────────────────────┐
│  Memory Usage for loading `meta-llama/Llama-3.1-70B-Instruct`  │
├───────┬─────────────┬──────────┬───────────────────────────────┤
│ dtype │Largest Layer│Total Size│      Training using Adam      │
├───────┼─────────────┼──────────┼───────────────────────────────┤
│float32│   1.96 GB   │129.46 GB │           517.84 GB           │
│float16│  1002.0 MB  │ 64.73 GB │           258.92 GB           │
│  int8 │   501.0 MB  │ 32.36 GB │              N/A              │
│  int4 │   250.5 MB  │ 16.18 GB │              N/A              │
└───────┴─────────────┴──────────┴───────────────────────────────┘
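Working backwards from the table (and setting aside whether the tool's "GB" means 10^9 or 2^30 bytes), the reported fp16 total implies roughly half the expected parameter count either way, which suggests the size calculation itself, not just unit formatting, is off. A quick sanity check:

```python
# Implied parameter count from the reported fp16 total, under either unit convention.
reported = 64.73                             # "GB" figure from the estimate-memory table
implied_decimal = reported * 1e9 / 2         # if "GB" means 10^9 bytes (2 bytes/param at fp16)
implied_binary = reported * 1024**3 / 2      # if "GB" means 2^30 bytes
actual = 70.6e9                              # published parameter count of Llama-3.1-70B
print(f"implied: {implied_decimal / 1e9:.1f}B-{implied_binary / 1e9:.1f}B "
      f"vs actual {actual / 1e9:.1f}B")
```

Under both conventions the implied count is in the 32-35B range, i.e. the estimate is low by roughly a factor of two.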

CC: @muellerzr @ArthurZucker @SunMarc
