A demo of finetuning GPT2-large using the 8-bit optimizer from the bitsandbytes library.
- Download the notebook
- Upload it to your Kaggle account
- Pick the GPU accelerator (it should give you a Tesla P100 with 16 GB of VRAM)
- Run the notebook
TL;DR: the 8-bit optimizer reduces the memory footprint from 14 GB to 9.7 GB, which makes it possible, for example, to train the model in Google Colaboratory on a Tesla K80.
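For reference, the core change the notebook demonstrates amounts to swapping the standard optimizer for its 8-bit counterpart from bitsandbytes. A minimal sketch (the model name and learning rate here are illustrative assumptions, not values taken from the notebook):

```python
import bitsandbytes as bnb
from transformers import GPT2LMHeadModel

# Load GPT2-large and move it to the GPU (illustrative; the notebook's setup may differ)
model = GPT2LMHeadModel.from_pretrained("gpt2-large").cuda()

# Instead of torch.optim.AdamW(model.parameters(), lr=1e-5),
# use the 8-bit Adam optimizer, which stores optimizer states in 8 bits
optimizer = bnb.optim.Adam8bit(model.parameters(), lr=1e-5)

# The rest of the training loop is unchanged:
# loss.backward(); optimizer.step(); optimizer.zero_grad()
```

The memory saving comes from the optimizer states (the Adam moment estimates) being kept in 8-bit precision instead of 32-bit, while the model weights and gradients are untouched.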