Commit 90d3fc0
ENH: Improve MetaMath training script runtime (#2894)
The training script of the MetaMathQA PEFT method comparison called
torch.cuda.empty_cache() and gc.collect() after every step. This is not
actually needed and slows down training considerably.
It turns out that gc.collect() is not needed at all, so it has been
removed, which yields a big improvement in runtime. As for
empty_cache(), not calling it at all leads to an increase in memory
usage, but it does not need to be called every step. It is now called
only every 10th step.
Improvement (tested locally, 250 steps):
- Removing gc.collect():
  - 108 sec => 65 sec
  - memory reserved max stays the same (19.3 GB)
  - memory reserved 99th percentile stays the same (18.0 GB)
  - memory reserved avg stays the same (12.0 GB)
- Additionally calling empty_cache() only every 10th step:
  - 65 sec => 50 sec
  - memory reserved max stays the same (19.3 GB)
  - memory reserved 99th percentile: 18.0 GB => 19.3 GB
  - memory reserved avg: 12.0 GB => 14.5 GB
Thus gc.collect() can be safely removed. And while calling empty_cache()
only every 10th step does increase average memory usage, the peak is
unaffected, which is what's most important in this benchmark, so it is a
worthwhile tradeoff for the 23% speed improvement we get.
Note to maintainers: If this is merged, all MetaMathQA benchmarks should
be re-run.

1 parent 3fc83e3
1 file changed (+6, -10 lines)