[QUESTION] Hello, how to accurately calculate memory usage based on operating parameters? #597

Closed
13416157913 opened this issue Nov 20, 2023 · 5 comments

@13416157913

Hello, how can I accurately calculate memory usage from the training parameters?
For example, with the following configuration:
TP_SIZE=1
PP_SIZE=1
WORLD_SIZE=8
MICRO_BATCH_SIZE=2
GLOBAL_BATCH_SIZE=128
--finetune
--sequence-parallel
--num-layers 32
--hidden-size 4096
--num-attention-heads 32
--seq-length 4096
--max-position-embeddings 4096
--no-position-embedding
--use-rotary-position-embeddings
--swiglu
--ffn-hidden-size 11008
--disable-bias-linear
--RMSNorm
--layernorm-epsilon 1e-6
--causal-lm
--distributed-optimizer
--use-flash-attn

@hwdef

hwdef commented Nov 20, 2023

#482
Please check this

@13416157913
Author

#482 Please check this

Thank you very much.

@deepakn94
Collaborator

We now also have a report_theoretical_memory.py script that should take the same set of arguments as pretrain_gpt.py.

You can use it like this:

CUDA_DEVICE_MAX_CONNECTIONS=1 WORLD_SIZE=<WORLD_SIZE> python -u report_theoretical_memory.py ${options}
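
For a rough sanity check by hand, here is a minimal sketch of the weight and optimizer-state part of the estimate for the configuration in the question. It is not the script's implementation. The function names are hypothetical, the vocabulary size (32000) and untied input/output embeddings are assumptions since the question does not give them, and the bytes-per-parameter figures (18 without the distributed optimizer, 6 + 12 / data-parallel size with it) are an assumed mixed-precision Adam accounting. Activation memory is left out entirely, because it depends on recomputation and attention-kernel details.

```python
def count_parameters(num_layers, hidden_size, ffn_hidden_size, vocab_size,
                     tie_embeddings=False):
    """Parameter count for a bias-free decoder with SwiGLU MLPs and RMSNorm."""
    attention = 4 * hidden_size * hidden_size      # Q, K, V and output projections
    mlp = 3 * hidden_size * ffn_hidden_size        # SwiGLU: gate, up and down projections
    norms = 2 * hidden_size                        # two RMSNorm weight vectors per layer
    per_layer = attention + mlp + norms
    # Assumption: separate input embedding and output projection unless tied.
    embeddings = vocab_size * hidden_size * (1 if tie_embeddings else 2)
    final_norm = hidden_size
    return num_layers * per_layer + final_norm + embeddings


def weight_and_optimizer_bytes(num_params, data_parallel_size,
                               distributed_optimizer=True):
    """Bytes per rank for weights, gradients and Adam state (no activations)."""
    # Assumption: 18 bytes per parameter in total, of which 12 bytes are
    # sharded across data-parallel ranks when the distributed optimizer is on.
    if distributed_optimizer:
        bytes_per_param = 6 + 12 / data_parallel_size
    else:
        bytes_per_param = 18
    return num_params * bytes_per_param


# Values from the question; VOCAB_SIZE is a hypothetical placeholder.
TP_SIZE, PP_SIZE, WORLD_SIZE = 1, 1, 8
MICRO_BATCH_SIZE, GLOBAL_BATCH_SIZE = 2, 128
VOCAB_SIZE = 32000

data_parallel_size = WORLD_SIZE // (TP_SIZE * PP_SIZE)
grad_accum_steps = GLOBAL_BATCH_SIZE // (MICRO_BATCH_SIZE * data_parallel_size)

# With TP_SIZE = PP_SIZE = 1, every rank holds the full set of parameters.
params = count_parameters(num_layers=32, hidden_size=4096,
                          ffn_hidden_size=11008, vocab_size=VOCAB_SIZE)
per_rank = weight_and_optimizer_bytes(params, data_parallel_size)

print(f"data-parallel size: {data_parallel_size}")
print(f"gradient accumulation steps: {grad_accum_steps}")
print(f"parameters: {params:,}")
print(f"weights + optimizer state per rank: {per_rank / 2**30:.2f} GiB")
```

Treat the result as a lower bound on per-GPU usage: activations, temporary buffers and allocator fragmentation come on top, and the script's own output should take precedence wherever the two disagree.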


Marking as stale. No activity in 60 days.

@github-actions bot added the "stale" label (No activity in 60 days on issue or PR) on Jan 28, 2024
@deepakn94
Collaborator

Going to close this. Feel free to re-open if you are still running into issues.
