This repository was archived by the owner on Apr 23, 2025. It is now read-only.

Description
In testing PR #671, we noticed that the GPT-2 model now exhausts all available memory on 8 GB GPUs (for example, a GTX 1080) under both the eager-mode and X10 runtimes. It did not do this previously, so at some point this model's GPU memory usage increased to the point where it can no longer train on these GPUs.
We should investigate why this happened and determine whether memory usage for this model can be brought back down.