Model training (#5)
* Colab

* Delete .vscode directory

* start local

* Starting ideas for training model

---------

Co-authored-by: seunguk <[email protected]>
SamShin and seunguk authored Jan 26, 2024
1 parent ac60f68 commit a2844de
Showing 6 changed files with 380 additions and 1 deletion.
2 changes: 1 addition & 1 deletion .gitignore
@@ -1,6 +1,6 @@
# Dependencies
/node_modules

/.vscode
# Production
/build

8 changes: 8 additions & 0 deletions docs/tech/ML/_category_.json
@@ -0,0 +1,8 @@
{
  "label": "Machine Learning",
  "position": 2,
  "link": {
    "type": "generated-index",
    "description": "The following documentation will cover topics relating to Machine Learning / AI."
  }
}
7 changes: 7 additions & 0 deletions docs/tech/ML/ml-intro.md
@@ -0,0 +1,7 @@
---
sidebar_position: 1
---

# Introduction

## About
145 changes: 145 additions & 0 deletions docs/tech/ML/training/Colab.md
@@ -0,0 +1,145 @@
---
sidebar_position: 2
---

# Google Colab

## Setup

For our training example on Google Colab, we will be using the [nanoGPT](https://github.com/karpathy/nanoGPT) model by [Andrej Karpathy](https://karpathy.ai/) to explain how it works at a basic level. You can follow a step-by-step video tutorial by none other than Andrej Karpathy [here](https://www.youtube.com/watch?v=kCc8FmEb1nY) for an in-depth understanding of how this all works, or you can read an article from the [New York Times](https://www.nytimes.com/interactive/2023/04/26/upshot/gpt-from-scratch.html) (also available as a PDF [here](https://courses.cs.washington.edu/courses/cse473/23sp/uwnetid/lit/Let_Us_GPT.pdf)). Since Google Colab is a cloud service, there is no setup required on your part other than having a Google account.

## Training

To start, open this [notebook](https://colab.research.google.com/drive/1JMLa53HDuA-i7ZBmqV7ZnA3c_fvtXnx-?usp=sharing#scrollTo=O6medjfRsLD9) in the browser of your choice. Google Colab is a cloud service that lets you run Jupyter Notebooks online, so a lot of this will feel familiar if you have previous experience with Jupyter Notebooks. After you open the notebook you might see __Cannot save changes__ near the top, next to __+ Code__ and __+ Text__. This is because Colab files are stored in Google Drive and you do not yet have your own copy to modify. Click __Cannot save changes__ and save a copy to your Drive. Once you have saved a copy to your Drive, take a look at the first cell, which contains the following code.

```bash
# We always start with a dataset to train on. Let's download the tiny shakespeare dataset
!wget https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt
```

You can click the play button to the left of the cell, or press [ctrl] + [enter] while the cell is focused. When you run a cell, Google allocates basic resources to run the Python code, which you can see in the top right where RAM and Disk usage are shown. Running this cell downloads the Shakespeare dataset from the web into the storage for the current session; you can check the storage by clicking the files icon on the left sidebar.

When training a model we want to use a GPU for faster training. You can request GPU resources from Colab by clicking the downward arrow by the resource tab with the RAM and Disk usage, selecting __Change runtime type__, choosing GPU, and saving. This will reset the current session, so you will have to download the file again by re-running the first cell. Once you are connected, you will have access to GPU resources in Colab.

At this point you can read through the cells along with the video provided, but for simplicity, scroll down to the last cell in the notebook. For our first run, make sure to lower the ```max_iters``` parameter from the given 5,000 down to 2,000 for faster results. One thing to note: Colab already has many of the most common Python packages installed, such as torch, which we are using in this cell. When you hit run, the process starts, and the model will output its result after running for a bit.
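
For example, the change amounts to editing the hyperparameter assignments near the top of that cell, roughly like this (a minimal sketch; the variable names follow the notebook's own code):

```py
# a few of the hyperparameters defined near the top of the training cell
max_iters = 2000   # lowered from the default 5,000 for a faster first run
device = 'cuda' if torch.cuda.is_available() else 'cpu'  # uses the GPU runtime when available
```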

## Saving Model

For models you want to train multiple times or cannot train in a single session, it is important to save your work so that you can load the model later and resume training. Since Colab doesn't automatically save your work, you need to connect to Google Drive and save your session's work there. Create a new cell and enter the following code.

```py
from google.colab import drive
drive.mount('/content/drive')
```

This will mount your Google Drive to this session of Colab so you can save your work there; you can see it under "drive" in the files tab. In PyTorch, you can save your current model in the following way.

```py
import torch

checkpoint = {
    'iter': iter,
    'model_state_dict': m.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'loss': loss,
    # ... other parameters you want to keep
}

checkpoint_path = 'drive/MyDrive/Backup/model_checkpoint.pth'

torch.save(checkpoint, checkpoint_path)
```

Of course, your ```checkpoint_path``` will be different, but make sure to save it inside your Drive. Loading the model is also straightforward:

```py
import torch

# Load the checkpoint and create an empty model
loaded_checkpoint = torch.load(checkpoint_path)
model = BigramLanguageModel()

# Basic setup
device = 'cuda' if torch.cuda.is_available() else 'cpu'
m = model.to(device)

# Load the model parameters from checkpoint
m.load_state_dict(loaded_checkpoint['model_state_dict'])

# Create basic optimizer and load from checkpoint
optimizer = torch.optim.AdamW(m.parameters())
optimizer.load_state_dict(loaded_checkpoint['optimizer_state_dict'])

# Get values
iter = loaded_checkpoint['iter']
loss = loaded_checkpoint['loss']

# Testing to see if model was correctly loaded
print(iter)
print(loss)

context = torch.zeros((1, 1), dtype=torch.long, device=device)
print(decode(m.generate(context, max_new_tokens=2000)[0].tolist()))
```

As you can see, saving and loading a model allows you to maintain progress and train bigger models over time.

Incorporating this into the nanoGPT training loop we are using would look something like this:

### Before

```py
...
for iter in range(max_iters):

    # every once in a while evaluate the loss on train and val sets
    if iter % eval_interval == 0 or iter == max_iters - 1:
        losses = estimate_loss()
        print(f"step {iter}: train loss {losses['train']:.4f}, val loss {losses['val']:.4f}")

    # sample a batch of data
    xb, yb = get_batch('train')

    # evaluate the loss
    logits, loss = model(xb, yb)
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    optimizer.step()

# generate from the model
context = torch.zeros((1, 1), dtype=torch.long, device=device)
print(decode(m.generate(context, max_new_tokens=2000)[0].tolist()))
```

### After

```py
...
checkpoint_path = 'drive/MyDrive/Backup/model_checkpoint.pth'

for iter in range(max_iters):

    # every once in a while evaluate the loss on train and val sets
    if iter % eval_interval == 0 or iter == max_iters - 1:
        losses = estimate_loss()
        print(f"step {iter}: train loss {losses['train']:.4f}, val loss {losses['val']:.4f}")

    # sample a batch of data
    xb, yb = get_batch('train')

    # evaluate the loss
    logits, loss = model(xb, yb)
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    optimizer.step()

    # save a checkpoint every 1,000 iterations or at the last iteration
    # (placed after the training step so that `loss` is always defined)
    if iter % 1000 == 0 or iter == max_iters - 1:
        checkpoint = {
            'iter': iter,
            'model_state_dict': m.state_dict(),
            'optimizer_state_dict': optimizer.state_dict(),
            'loss': loss
        }
        torch.save(checkpoint, checkpoint_path)

# generate from the model
context = torch.zeros((1, 1), dtype=torch.long, device=device)
print(decode(m.generate(context, max_new_tokens=2000)[0].tolist()))
```
211 changes: 211 additions & 0 deletions docs/tech/ML/training/Local.md
@@ -0,0 +1,211 @@
---
sidebar_position: 1
---

# Local

## Setup

For our training example on a local machine, we will be using the [nanoGPT](https://github.com/karpathy/nanoGPT) model by [Andrej Karpathy](https://karpathy.ai/) to explain how it works at a basic level. You can follow a step-by-step video tutorial by none other than Andrej Karpathy [here](https://www.youtube.com/watch?v=kCc8FmEb1nY) for an in-depth understanding of how this all works, or you can read an article from the [New York Times](https://www.nytimes.com/interactive/2023/04/26/upshot/gpt-from-scratch.html) (also available as a PDF [here](https://courses.cs.washington.edu/courses/cse473/23sp/uwnetid/lit/Let_Us_GPT.pdf)).

### Installing Python

First, we have to make sure Python is installed. While the latest Python version is Python 3.12, we need Python 3.10, because at the time of writing it is the newest Python version supported by [PyTorch](https://pytorch.org/), an important machine learning package we rely on. A slightly older version such as Python 3.9 or 3.8 should also be fine, but versions older than that will not work.

#### Windows

For most machines, download the 64-bit Windows installer for Python 3.10 [[here]](https://www.python.org/downloads/windows/)

#### macOS

For macOS, download the latest Python 3.10 release, which is 3.10.11, [[here]](https://www.python.org/downloads/macos/)

#### Linux

You have many options on Linux, as always. You can either install directly from your package manager, like so

```bash
sudo apt install python3.10
```

or build directly from source [[here]](https://www.python.org/downloads/source/).
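
Whichever route you take, you can confirm the installation from a terminal; depending on your platform the command name may be ```python```, ```python3```, or ```python3.10```:

```bash
python3 --version
# should print something like: Python 3.10.x
```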

### Copying Code

Now that we have Python set up, we need to get all the necessary code from [this](https://github.com/karpathy/ng-video-lecture/tree/master) GitHub repository by Andrej Karpathy.
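
If you have git installed, cloning the repository and moving into it looks like this (alternatively, you can download it as a ZIP from GitHub and extract it):

```bash
git clone https://github.com/karpathy/ng-video-lecture.git
cd ng-video-lecture
```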

### Setting up a virtual environment

After you have Python installed and the repository cloned, the next step is to make a virtual environment; we use virtual environments to make sure there are no conflicts between package versions. Open your IDE of choice inside the directory you downloaded. If this is the only Python version you have installed, you can simply run

```bash
python3 -m venv venv
```

or, if this doesn't work for some reason, try

```bash
python -m venv venv
```

If you have multiple Python versions installed, you will need to find the location of the executable for the specific version. In my case, for example, it would look something like this.

```bash
C:/Python310/python.exe -m venv venv
```

This will create a virtual Python environment called ```venv``` in the current directory.

To start the virtual environment, simply type ```activate``` in the terminal; you should be placed in the virtual environment as long as you are in the same directory as the venv folder. If this does not work, make sure your terminal is in the correct directory and that the venv folder is located there. You can also try

#### Windows

```bash
.\venv\Scripts\activate
```

#### macOS / Linux

```bash
source venv/bin/activate
```

To exit the virtual environment, simply type ```deactivate``` in the terminal; the ```activate``` script defines this command for you on every platform. On Windows you can also run

```bash
.\venv\Scripts\deactivate
```

Now that we have our environment set up, we need to install the required packages for this demonstration. This part will depend on whether you have a dedicated GPU that supports CUDA. Go to this [website](https://pytorch.org/), scroll down to the ```Install PyTorch``` section, select your options, and copy and paste the generated command into your virtual environment. This will install the PyTorch package and all the required dependencies.
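
As an illustration only, the generated command typically looks something like the following; always use the exact command the site gives you for your OS, package manager, and CUDA version:

```bash
pip3 install torch torchvision torchaudio
```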

## Training

To finally start training, open the ```bigram.py``` file and, for our first run, lower the ```max_iters``` parameter from the given 3,000 down to 2,000 for faster results. As you can see from the results, after training for a bit the model outputs its results. You can play with the parameters to get better results, and you can watch the loss go down as the iteration count increases.
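
With the virtual environment active, training is started from the repository root (assuming the layout of the cloned repository, which keeps ```bigram.py``` at the top level):

```bash
python bigram.py
```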

### Saving Model

For bigger models that you want to train multiple times or cannot train in a single session, it is important to save your work so that you can load the model later and resume training.

PyTorch streamlines the process of saving a model by using checkpoints.

```py
import torch

checkpoint = {
    'iter': iter,
    'model_state_dict': m.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'loss': loss,
    # ... other parameters you want to keep
}

checkpoint_path = 'model_checkpoint.pth'  # replace with the path where you want the checkpoint saved

torch.save(checkpoint, checkpoint_path)
```

Loading the model is also straightforward:

```py
import torch

# Load the checkpoint and create an empty model
loaded_checkpoint = torch.load(checkpoint_path)
model = BigramLanguageModel()

# Basic setup
device = 'cuda' if torch.cuda.is_available() else 'cpu'
m = model.to(device)

# Load the model parameters from checkpoint
m.load_state_dict(loaded_checkpoint['model_state_dict'])

# Create basic optimizer and load from checkpoint
optimizer = torch.optim.AdamW(m.parameters())
optimizer.load_state_dict(loaded_checkpoint['optimizer_state_dict'])

# Get values
iter = loaded_checkpoint['iter']
loss = loaded_checkpoint['loss']

# Testing to see if model was correctly loaded
print(iter)
print(loss)

context = torch.zeros((1, 1), dtype=torch.long, device=device)
print(decode(m.generate(context, max_new_tokens=2000)[0].tolist()))
```

As you can see, saving and loading a model allows you to maintain progress and train bigger models over time.

Incorporating this into the nanoGPT training loop we are using would look something like this:

#### Before

```py
...
for iter in range(max_iters):

    # every once in a while evaluate the loss on train and val sets
    if iter % eval_interval == 0 or iter == max_iters - 1:
        losses = estimate_loss()
        print(f"step {iter}: train loss {losses['train']:.4f}, val loss {losses['val']:.4f}")

    # sample a batch of data
    xb, yb = get_batch('train')

    # evaluate the loss
    logits, loss = model(xb, yb)
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    optimizer.step()

# generate from the model
context = torch.zeros((1, 1), dtype=torch.long, device=device)
print(decode(m.generate(context, max_new_tokens=2000)[0].tolist()))
```

#### After

```py
...
checkpoint_path = 'model_checkpoint.pth'  # replace with the path where you want the checkpoint saved

for iter in range(max_iters):

    # every once in a while evaluate the loss on train and val sets
    if iter % eval_interval == 0 or iter == max_iters - 1:
        losses = estimate_loss()
        print(f"step {iter}: train loss {losses['train']:.4f}, val loss {losses['val']:.4f}")

    # sample a batch of data
    xb, yb = get_batch('train')

    # evaluate the loss
    logits, loss = model(xb, yb)
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    optimizer.step()

    # save a checkpoint every 1,000 iterations or at the last iteration
    # (placed after the training step so that `loss` is always defined)
    if iter % 1000 == 0 or iter == max_iters - 1:
        checkpoint = {
            'iter': iter,
            'model_state_dict': m.state_dict(),
            'optimizer_state_dict': optimizer.state_dict(),
            'loss': loss
        }
        torch.save(checkpoint, checkpoint_path)

# generate from the model
context = torch.zeros((1, 1), dtype=torch.long, device=device)
print(decode(m.generate(context, max_new_tokens=2000)[0].tolist()))
```
8 changes: 8 additions & 0 deletions docs/tech/ML/training/_category_.json
@@ -0,0 +1,8 @@
{
  "label": "Training",
  "position": 1,
  "link": {
    "type": "generated-index",
    "description": "The following documentation will cover topics relating to model training."
  }
}
