
Conversation

BenjaminBossan (Member) commented on Nov 4, 2025:

The training script of the MetaMathQA PEFT method comparison was calling cuda.empty_cache() and gc.collect() after each step. However, these calls are not really needed and they slow down training considerably.

It turns out that gc.collect() is not needed at all when it comes to memory, so it has been removed, which results in a big runtime improvement. As for empty_cache(), not calling it at all leads to an increase in memory usage, but it does not have to be called every step either; it is now called only every 10th step, as sketched below.
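
Schematically, the change amounts to the following. This is a minimal sketch with a placeholder model and random data, not the actual benchmark script; only the cache-emptying interval reflects what this PR does:

```python
import torch
from torch import nn

# Placeholder model, optimizer, and data; the real script fine-tunes a
# PEFT model on MetaMathQA.
model = nn.Linear(512, 512).cuda()
optimizer = torch.optim.AdamW(model.parameters())
empty_cache_interval = 10  # the interval chosen in this PR

for step in range(250):
    batch = torch.randn(8, 512, device="cuda")
    loss = model(batch).pow(2).mean()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

    # gc.collect() used to run here on every step; it turned out to make
    # no difference to memory, so it is gone entirely.
    if step % empty_cache_interval == 0:
        # Occasionally returning cached blocks to the allocator keeps peak
        # reserved memory in check without paying the cost on every step.
        torch.cuda.empty_cache()
```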

Improvement (tested locally, 250 steps):

  • Removing gc.collect():
    • 108 sec => 65 sec
    • memory reserved max stays the same (19.3 GB)
    • memory reserved 99th percentile stays the same (18.0 GB)
    • memory reserved avg stays the same (12.0 GB)
  • Also calling empty_cache() only every 10th step:
    • 65 sec => 50 sec
    • memory reserved max stays the same (19.3 GB)
    • memory reserved 99th percentile: 18.0 GB => 19.3 GB
    • memory reserved avg: 12.0 GB => 14.5 GB
  • Not calling empty_cache() at all:
    • 65 sec => 45 sec
    • memory reserved max: 19.3 GB => 23.5 GB
    • memory reserved 99th percentile: 18.0 GB => 23.5 GB
    • memory reserved avg: 12.0 GB => 22.1 GB

Thus gc.collect() can be safely removed, but removing empty_cache() completely is not advisable. Calling empty_cache() only every 10th step does increase average memory usage, but the peak is unaffected, which is what's most important in this benchmark, so it is a worthwhile tradeoff for the 23% speed improvement we get.
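
For reference, reserved-memory statistics like the ones above can be collected along these lines. This is just a sketch; the benchmark's actual instrumentation may differ:

```python
import statistics

import torch

reserved_bytes = []  # sampled once per training step, e.g. after optimizer.step()

def record_reserved():
    reserved_bytes.append(torch.cuda.memory_reserved())

def summarize_reserved():
    gib = 1024**3
    print(f"max: {max(reserved_bytes) / gib:.1f} GB")
    # quantiles with n=100 yields 99 cut points; index 98 is the 99th percentile
    print(f"p99: {statistics.quantiles(reserved_bytes, n=100)[98] / gib:.1f} GB")
    print(f"avg: {statistics.mean(reserved_bytes) / gib:.1f} GB")
```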

I also tested whether manually deleting certain torch variables (batch, output, loss) after each step would make a difference, but could not observe any effect.

While working on this PR, I also removed an obsolete comment and an unused variable.

Note to maintainers: If this is merged, all MetaMathQA benchmarks should be re-run.

HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

A collaborator commented inline on this snippet:

    }
    print_verbose(json.dumps(log_dict))

    # # TODO is this needed?

I think the comment can be removed if we don't get any more insights :)

githubnemo (Collaborator) left a comment:

LGTM. Thanks for investigating, this is a really nice find.

If there are no new discoveries I think this can be merged.

BenjaminBossan merged commit 90d3fc0 into huggingface:main on Nov 13, 2025
23 of 25 checks passed
BenjaminBossan deleted the enh-improve-metamath-training-runtime branch on Nov 13, 2025
