
scale gradient accumulation steps with train batch size to keep effective batch size about the same #33281

Open · winglian wants to merge 2 commits into main from auto_find_batch_size_compensate
Conversation

winglian (Contributor) commented on Sep 3, 2024:

What does this PR do?

goes with huggingface/accelerate#3071

Fixes # (issue)
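
In short: when auto_find_batch_size halves the per-device batch size after an OOM, the gradient accumulation steps are scaled up so the effective batch size stays roughly constant. A standalone Python illustration of the invariant (the numbers are made up for illustration, not taken from the PR):

    # The invariant this PR preserves: effective batch size =
    # per-device batch size * gradient accumulation steps.
    train_batch_size, grad_accum_steps = 32, 2
    effective_batch_size = train_batch_size * grad_accum_steps  # 64

    # auto_find_batch_size halves the per-device batch size after an OOM...
    train_batch_size //= 2  # 16
    # ...and this PR scales gradient accumulation up to compensate.
    grad_accum_steps = effective_batch_size // train_batch_size  # 4

    assert train_batch_size * grad_accum_steps == effective_batch_size  # still 64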

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

winglian force-pushed the auto_find_batch_size_compensate branch from ab90906 to 56cbbba on September 3, 2024 (18:39).
muellerzr (Contributor) left a comment:

Nice start!

Comment on lines +1976 to +1983:

    def reduce_batch_size_fn_wrapper(train_batch_size, args):
        # Capture the original effective batch size before any halving.
        effective_batch_size = args.gradient_accumulation_steps * train_batch_size

        def reduce_batch_size_fn():
            # Halve the per-device batch size, then scale gradient accumulation
            # up so the effective batch size stays (roughly) the same.
            nonlocal train_batch_size
            train_batch_size = train_batch_size // 2
            args.gradient_accumulation_steps = effective_batch_size // train_batch_size
            return train_batch_size

        return reduce_batch_size_fn
muellerzr: Let's put this in trainer_utils I think, since that's where we have find_executable_batch_size defined.

Comment on the lines where the wrapper feeds into find_executable_batch_size:

            args.gradient_accumulation_steps = effective_batch_size // train_batch_size
            return train_batch_size

        return reduce_batch_size_fn

    inner_training_loop = find_executable_batch_size(

muellerzr: trainer_utils's find_executable_batch_size will then also need changing. We also need to be careful about versioning inside that function.
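
For context, transformers' trainer_utils.find_executable_batch_size defers to accelerate's implementation when auto_find_batch_size is set. A hypothetical sketch of the kind of change being discussed, assuming the companion accelerate PR (#3071) adds a reduce_batch_size_fn keyword; the argument name and the final signature are assumptions, not the merged API:

    import functools

    def find_executable_batch_size(
        function=None, starting_batch_size=128, auto_find_batch_size=False, reduce_batch_size_fn=None
    ):
        # Decorator form: defer until the wrapped function is supplied.
        if function is None:
            return functools.partial(
                find_executable_batch_size,
                starting_batch_size=starting_batch_size,
                auto_find_batch_size=auto_find_batch_size,
                reduce_batch_size_fn=reduce_batch_size_fn,
            )

        if auto_find_batch_size:
            from accelerate.utils import find_executable_batch_size as accelerate_find_executable_batch_size

            # The hook that rescales gradient_accumulation_steps should only be
            # forwarded when the installed accelerate is new enough to accept
            # it, hence the versioning concern above.
            return accelerate_find_executable_batch_size(
                function=function,
                starting_batch_size=starting_batch_size,
                reduce_batch_size_fn=reduce_batch_size_fn,  # assumed kwarg from accelerate#3071
            )

        return functools.partial(function, batch_size=starting_batch_size)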


muellerzr: It should be set as >=0.34.0.
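
A minimal sketch of that version gate using only importlib.metadata and packaging; the helper name is hypothetical, and transformers has its own version-checking utilities that the actual PR would likely use instead:

    import importlib.metadata

    from packaging import version

    def accelerate_supports_reduce_batch_size_fn() -> bool:
        # Per the review: the hook from huggingface/accelerate#3071 is
        # expected to require accelerate >= 0.34.0.
        try:
            installed = importlib.metadata.version("accelerate")
        except importlib.metadata.PackageNotFoundError:
            return False
        return version.parse(installed) >= version.parse("0.34.0")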

winglian (author): Sorry, I forgot to commit a file. It should be there now.
