Accelerate's estimate-memory gives 64.73GB for meta-llama/Llama-3.1-70B-Instruct at fp16 precision instead of the anticipated ~140GB, i.e. num_params * num_bytes_in_dtype. The latter is the usual recommendation for how much memory capacity is needed to accommodate inference; see for example https://huggingface.co/blog/llama31#inference-memory-requirements.
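For reference, the ~140GB expectation is just the parameter count multiplied by two bytes per fp16 value. A quick sketch (the 70.6B parameter count is approximate):

```python
# Back-of-the-envelope check of the expected figure
num_params = 70.6e9      # Llama-3.1-70B has roughly 70.6B parameters
bytes_per_param = 2      # fp16 stores 2 bytes per parameter
print(f"{num_params * bytes_per_param / 1e9:.1f} GB")  # -> 141.2 GB, consistent with the ~140GB from the blog post
```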
Is this a mistake by estimate-memory, or is there some magic happening behind the scenes when Transformers loads models that allows memory to be saved?
If estimate-memory indeed makes a mistake, that's worrisome, since it might impact device_map=auto if that path reuses the estimation code to distribute layers across devices (a placement sketch below illustrates the concern). At least estimate-memory does call the regular accelerate utils method, see accelerate/src/accelerate/commands/estimate.py, lines 21 to 22 at f076495.
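Independently of the CLI, the number can be cross-checked by materializing the model on the meta device and counting parameters. A minimal sketch, assuming the public init_empty_weights and from_config APIs (not the exact code path estimate-memory takes):

```python
from accelerate import init_empty_weights
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "meta-llama/Llama-3.1-70B-Instruct"  # gated repo, needs accepted access for the config download
config = AutoConfig.from_pretrained(model_id)

with init_empty_weights():
    # Parameters are created on the "meta" device, so no real memory is allocated.
    model = AutoModelForCausalLM.from_config(config)

num_params = sum(p.numel() for p in model.parameters())
print(f"{num_params / 1e9:.2f}B params -> {num_params * 2 / 1e9:.2f} GB at 2 bytes/param (fp16)")
```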
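On the device_map=auto concern: placement is driven by accelerate's per-module size estimates at the requested dtype, so an undersized total would translate into over-packed devices. A hypothetical illustration using infer_auto_device_map (not a claim about the exact code path device_map="auto" takes; the two 80GiB GPUs are made up):

```python
import torch
from accelerate import infer_auto_device_map, init_empty_weights
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("meta-llama/Llama-3.1-70B-Instruct")
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config)

# The placement decision rests on the per-module size estimates; if those came out
# roughly half the true size, layers would be packed onto too few devices.
device_map = infer_auto_device_map(
    model,
    max_memory={0: "80GiB", 1: "80GiB"},            # two made-up 80GiB GPUs
    no_split_module_classes=["LlamaDecoderLayer"],  # keep decoder blocks intact
    dtype=torch.float16,
)
print(list(device_map.items())[:5])
```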
CC: @muellerzr @ArthurZucker @SunMarc