This repo is an adaptation of Andrej Karpathy's minGPT project. It uses the @torchrun decorator together with @kubernetes on Metaflow for distributed training of a minGPT model.
Many of the files in this example have been sourced directly from the minGPT project with minimal or no adjustments: gpt2_train_cfg.yaml, char_dataset.py, model.py, trainer.py, and main.py. flow.py and flow_oss.py run minGPT's CLI script via Metaflow's @torchrun decorator.
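To make the "CLI script via torchrun" idea concrete, here is a minimal sketch of the kind of per-node command a torchrun-based launcher ends up running for main.py. It is not taken from flow.py or from the decorator's internals. The helper name build_torchrun_command, the address 10.0.0.1, the port 29500, and the node and process counts are all assumptions chosen for illustration. Only the torchrun flags themselves (--nnodes, --nproc_per_node, --node_rank, --master_addr, --master_port) are standard.

```python
import shlex
from typing import List


def build_torchrun_command(
    entrypoint: str,
    nnodes: int,
    nproc_per_node: int,
    node_rank: int,
    master_addr: str,
    master_port: int = 29500,  # assumed port; any free port on the master works
) -> List[str]:
    """Build the torchrun argument list for one node of a multi-node job.

    Hypothetical helper for illustration only; the real decorator may
    assemble its launch command differently.
    """
    if nnodes < 1 or nproc_per_node < 1:
        raise ValueError("nnodes and nproc_per_node must be positive")
    if not 0 <= node_rank < nnodes:
        raise ValueError(f"node_rank {node_rank} is outside [0, {nnodes})")
    return [
        "torchrun",
        f"--nnodes={nnodes}",
        f"--nproc_per_node={nproc_per_node}",
        f"--node_rank={node_rank}",
        f"--master_addr={master_addr}",
        f"--master_port={master_port}",
        entrypoint,
    ]


if __name__ == "__main__":
    # Two nodes with one process each; 10.0.0.1 is a placeholder address.
    for rank in range(2):
        cmd = build_torchrun_command("main.py", 2, 1, rank, "10.0.0.1")
        print(shlex.join(cmd))
```

Every node runs the same entrypoint and differs only in its node rank, which is why a decorator can handle the launch and leave main.py itself unchanged.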
python flow_oss.py run
python flow.py --environment=fast-bakery run