GPT

A PyTorch implementation of a GPT-like language model with text preprocessing utilities.

Overview

This project implements a transformer-based language model similar to GPT, designed for character-level text generation. It includes utilities for vocabulary generation and dataset splitting.

In this example, I tested it on the fabulous book The Brothers Karamazov, downloaded from Project Gutenberg. Feel free to change the text file or even try training it on an established dataset (OpenWebText, for example), though on larger datasets vocab.py and split.py might not work properly.

Features

  • Character-level language modeling
  • Multi-head self-attention mechanism
  • Memory-efficient data loading using memory mapping (see the sketch after this list)
  • Text preprocessing utilities
  • Configurable model architecture
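
The memory-mapped loading could look roughly like the sketch below. This is an illustration only, assuming a character-to-index mapping stoi and a plain-text split file; the actual batching code lives in GPT.ipynb and may differ.

```python
import numpy as np
import torch

def get_batch(filename, stoi, block_size=128, batch_size=32):
    """Sample a random (inputs, targets) batch from a text file without
    loading the whole file into RAM. Sketch only: the file name, the stoi
    (char -> id) mapping and the padding are assumptions."""
    # np.memmap maps the file lazily, so only the bytes we slice are read
    data = np.memmap(filename, dtype=np.uint8, mode="r")
    starts = np.random.randint(0, len(data) - block_size - 1, size=batch_size)
    xs, ys = [], []
    for s in starts:
        chunk = data[s : s + block_size + 1].tobytes().decode("utf-8", errors="ignore")
        ids = [stoi.get(c, 0) for c in chunk][: block_size + 1]
        ids += [0] * (block_size + 1 - len(ids))   # pad if decoding dropped bytes
        xs.append(ids[:-1])                        # inputs
        ys.append(ids[1:])                         # next-character targets
    return torch.tensor(xs, dtype=torch.long), torch.tensor(ys, dtype=torch.long)
```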

Requirements

  • Python 3.9+
  • PyTorch
  • Jupyter Notebooks
  • CUDA (optional, for GPU acceleration on Windows)

Project Structure

  • vocab.py - Generates the character vocabulary from the input text
  • split.py - Splits the text data into training and validation sets (a rough sketch of both utilities follows this list)
  • GPT.ipynb - Main model implementation and training
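
As a rough illustration of what the two preprocessing utilities do (the file names used here, input.txt, vocab.txt, train_split.txt and val_split.txt, as well as the 90/10 split ratio, are assumptions):

```python
# Sketch of the preprocessing steps; file names and the split ratio are
# illustrative assumptions, not necessarily what vocab.py/split.py use.
with open("input.txt", "r", encoding="utf-8") as f:
    text = f.read()

# vocab.py: collect the set of unique characters in the corpus
chars = sorted(set(text))
with open("vocab.txt", "w", encoding="utf-8") as f:
    f.write("".join(chars))

# split.py: cut the corpus into training and validation portions
n = int(0.9 * len(text))
with open("train_split.txt", "w", encoding="utf-8") as f:
    f.write(text[:n])
with open("val_split.txt", "w", encoding="utf-8") as f:
    f.write(text[n:])
```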

Usage

1. Initialization

INITIALIZATION STEPS FOR macOS

Open a terminal in a directory of your choice.

Create a Python Virtual Environment and activate it:

python3 -m venv venv
source ./venv/bin/activate

Install the macOS requirements:

pip3 install -r requirements_macos.txt

INITIALIZATION STEPS FOR WINDOWS

Install Python on your system. If you have it already, skip this step.

Install Anaconda. Follow the steps from this link.

Once installed, open Anaconda Prompt in a directory of your choice.

Create a Python Virtual Environment and activate it:

python -m venv venv
venv\Scripts\activate

Install the Windows requirements:

pip install -r requirements_windows.txt

! Note: the two requirements files differ. On Windows, PyTorch is installed with CUDA support if a compatible GPU is available.

2. Prepare Your Data

First, add your text file to the project directory and generate the vocabulary from it:

python3 vocab.py

Then, split your data into training and validation sets:

python3 split.py

3. Train the Model

Register a new Jupyter kernel for the virtual environment:

python3 -m ipykernel install --user --name=venv --display-name "GPTKernel"

Run Jupyter Notebook:

jupyter notebook

Open GPT.ipynb.

Select GPTKernel and run the cells sequentially. The notebook contains:

  • Model architecture implementation
  • Training loop
  • Text generation functionality (a sampling sketch follows this list)
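
For orientation, the text generation step typically looks like the autoregressive sampling loop below. This is a sketch under the assumption that the model returns logits of shape (batch, time, vocab_size); the notebook's own generation code may differ.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def generate(model, idx, max_new_tokens, block_size=128):
    """Autoregressive sampling sketch; assumes model(idx) returns logits of
    shape (batch, time, vocab_size). Not the exact notebook code."""
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -block_size:]           # crop to the context window
        logits = model(idx_cond)                  # forward pass
        logits = logits[:, -1, :]                 # keep only the last position
        probs = F.softmax(logits, dim=-1)         # logits -> probabilities
        idx_next = torch.multinomial(probs, 1)    # sample one character id
        idx = torch.cat((idx, idx_next), dim=1)   # append and continue
    return idx
```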

Model Parameters

The default hyperparameters are:

  • Batch size: 32
  • Block size: 128
  • Maximum training iterations: 300
  • Learning rate: 2e-5
  • Evaluation interval: every 50 iterations
  • Embedding dimension: 300
  • Number of heads: 4
  • Number of layers: 4
  • Dropout: 0.2

These can be adjusted based on your hardware capabilities and requirements.
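
For reference, the same defaults expressed as Python constants (the variable names are illustrative, not necessarily the ones used in GPT.ipynb):

```python
# Default hyperparameters (names are illustrative)
batch_size = 32        # sequences per training batch
block_size = 128       # context length in characters
max_iters = 300        # total training iterations
learning_rate = 2e-5   # optimizer step size
eval_interval = 50     # evaluate on the validation split every N iterations
n_embd = 300           # embedding dimension
n_head = 4             # attention heads per block
n_layer = 4            # transformer blocks
dropout = 0.2          # dropout probability
```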

Model Architecture

The model implements a transformer architecture with the following components (a minimal block sketch follows the list):

  • Multi-head self-attention
  • Position embeddings
  • Layer normalization
  • Feed-forward networks
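
A minimal sketch of how these pieces combine into one transformer block, using the default hyperparameters above (it relies on torch.nn.MultiheadAttention for brevity; the notebook's implementation may build the attention heads by hand):

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """Pre-norm transformer block sketch; illustration only."""
    def __init__(self, n_embd=300, n_head=4, dropout=0.2):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = nn.MultiheadAttention(n_embd, n_head,
                                          dropout=dropout, batch_first=True)
        self.ln2 = nn.LayerNorm(n_embd)
        self.ffwd = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),
            nn.ReLU(),
            nn.Linear(4 * n_embd, n_embd),
            nn.Dropout(dropout),
        )

    def forward(self, x):
        # causal mask: each position may only attend to earlier positions
        T = x.size(1)
        mask = torch.triu(torch.ones(T, T, device=x.device), diagonal=1).bool()
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask)
        x = x + attn_out                 # residual connection around attention
        x = x + self.ffwd(self.ln2(x))   # residual connection around feed-forward
        return x
```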

License

MIT