ENH: Improve MetaMath training script runtime #2894
Merged
The training script of the MetaMathQA PEFT method comparison was calling `cuda.empty_cache()` and `gc.collect()` after each step. However, this is not really needed and it also slows down training considerably.

It turns out that `gc.collect()` is not needed at all when it comes to memory and thus it has been removed. This results in a big improvement in runtime. As for `empty_cache()`, not calling it at all leads to an increase in memory usage, but it's not necessary to call it every step. It is instead called every 10th step.

Improvement (tested locally, 250 steps):

- without `gc.collect()`
- `empty_cache()` only every 10 steps
- no `empty_cache()` at all

Thus `gc.collect()` can be safely removed, but removing `empty_cache()` completely is not advisable. Calling `empty_cache()` only every 10th step does increase average memory usage, but the peak is unaffected, which is what's most important in this benchmark, so it is a worthwhile tradeoff for the 23% speed improvement we get.

I also tested how manually deleting certain torch variables (batch, output, loss) would affect training but could not see any difference.
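For illustration, the "clear the cache only every 10th step" pattern described above could be sketched roughly as follows. This is not the actual training script: `run_steps`, `train_step`, and `empty_cache_every` are hypothetical names, and counting steps from 1 is an assumption. The cache-clearing function is passed in as a callable so the sketch runs without a GPU.

```python
from typing import Callable


def run_steps(
    num_steps: int,
    train_step: Callable[[int], None],
    empty_cache: Callable[[], None],
    empty_cache_every: int = 10,  # hypothetical parameter name
) -> int:
    """Run `train_step` once per step and call `empty_cache` on every
    `empty_cache_every`-th step. Returns how often the cache was cleared.

    Note that there is no gc.collect() call in the loop.
    """
    cache_clears = 0
    for step in range(1, num_steps + 1):  # assumption: steps counted from 1
        train_step(step)
        if step % empty_cache_every == 0:
            empty_cache()
            cache_clears += 1
    return cache_clears


# Stand-in callables; a real run would do forward/backward/optimizer work
# in train_step and pass torch.cuda.empty_cache as empty_cache.
num_clears = run_steps(250, train_step=lambda step: None, empty_cache=lambda: None)
```

In a real training loop, `torch.cuda.empty_cache` would be passed as `empty_cache`. With 250 steps and an interval of 10, the cache is cleared 25 times instead of 250.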
While working on this PR, I also removed an obsolete comment and an unused variable.
Note to maintainers: If this is merged, all MetaMathQA benchmarks should be re-run.