Distributed Training of GPT-2 with minGPT

This repo is an adaptation of Andrej Karpathy's minGPT project. It uses Metaflow's @torchrun decorator together with @kubernetes to run distributed training of a minGPT model.

Many of the files in this example come directly from the minGPT project with minimal or no changes: gpt2_train_cfg.yaml, char_dataset.py, model.py, trainer.py, and main.py. The flows flow.py and flow_oss.py run minGPT's CLI script through Metaflow's @torchrun decorator.
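For readers new to torchrun, the sketch below shows the kind of per-node command line that a multi-node launch comes down to. It is only an illustration, not the decorator's actual implementation: `build_torchrun_command` is a hypothetical helper, and the node count, process count, and master address are made-up values. The flags themselves (`--nnodes`, `--nproc_per_node`, `--node_rank`, `--master_addr`, `--master_port`) are standard torchrun options.

```python
import shlex


def build_torchrun_command(entrypoint, nnodes, nproc_per_node, node_rank,
                           master_addr, master_port=29500, script_args=None):
    """Assemble a torchrun command line for one node of a multi-node job.

    Hypothetical helper for illustration; the @torchrun decorator handles
    this kind of wiring for you, and its internals may differ.
    """
    if not 0 <= node_rank < nnodes:
        raise ValueError("node_rank must be in [0, nnodes)")
    cmd = [
        "torchrun",
        f"--nnodes={nnodes}",                  # total number of nodes
        f"--nproc_per_node={nproc_per_node}",  # worker processes per node
        f"--node_rank={node_rank}",            # index of this node
        f"--master_addr={master_addr}",        # rendezvous host (assumed value)
        f"--master_port={master_port}",        # rendezvous port
        entrypoint,                            # training script, e.g. main.py
    ]
    cmd.extend(script_args or [])              # extra arguments for the script
    return cmd


if __name__ == "__main__":
    # Print the command each of two nodes would run (illustrative values).
    for rank in range(2):
        cmd = build_torchrun_command(
            "main.py", nnodes=2, nproc_per_node=4,
            node_rank=rank, master_addr="10.0.0.1",
        )
        print(shlex.join(cmd))
```

Every node runs the same entrypoint and differs only in its `--node_rank`. That is why a single CLI script such as main.py can be reused unchanged across the whole cluster.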

Running with open-source Metaflow on Kubernetes

  • python flow_oss.py run

Running on the Outerbounds Platform

  • python flow.py --environment=fast-bakery run