GPU memory usage going up every epoch #1196
-
I am trying to figure out why the GPU memory usage keeps going up after each epoch. What causes this to happen? I was under the impression that it should remain constant. These are my hyperparameters:
Replies: 1 comment
-
@AlejandroRigau With caching allocators (like the PyTorch one), that's not a reliable indicator of a leak; memory churn can increase the overall amount allocated without actually causing any issues (just more cached allocations that may get reused, and that are only reclaimed if needed). I've used these scripts for training runs on the order of months, so I'm fairly certain there are no issues. If you're only using 70-something% at the start, try pushing your batch size up so that you use 90-95%; going all the way to the limit can result in an OOM when you transition between train-eval-train.
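As a side note, here is a minimal sketch (assuming a standard PyTorch setup; the training-loop calls are hypothetical) of how to tell allocator caching apart from a real leak: `torch.cuda.memory_allocated()` reports bytes held by live tensors, while `torch.cuda.memory_reserved()` also includes the allocator's cache, so comparing the two across epochs shows whether the growth is just cached memory.

```python
import torch

def log_cuda_memory(tag: str) -> None:
    # memory_allocated(): bytes currently occupied by live tensors
    # memory_reserved(): bytes held by the caching allocator, including its cache
    allocated = torch.cuda.memory_allocated() / 1024**2
    reserved = torch.cuda.memory_reserved() / 1024**2
    print(f"[{tag}] allocated: {allocated:.1f} MiB | reserved: {reserved:.1f} MiB")

# Hypothetical usage inside a training loop:
# for epoch in range(num_epochs):
#     train_one_epoch(model, loader)
#     log_cuda_memory(f"epoch {epoch}")
#
# If 'allocated' stays roughly flat while 'reserved' grows, the increase is
# allocator caching rather than a leak.
```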