Luckson-dev/text-generator

Text Generator

Text Generator is an educational project built from scratch using PyTorch.
It demonstrates the complete text generation pipeline, from data cleaning and tokenization to model training and text prediction.
The goal of this project is to deeply understand how text generation models work internally.


Features

  • Load and clean text data from CSV
  • Tokenization using a tokenizer from the transformers library
  • Custom PyTorch LSTM model for text generation
  • Training loop with configurable hyperparameters
  • Interactive text prediction via console
  • Saving and loading model weights
  • Configurable through config.yaml
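
As a rough illustration of the model and training features listed above, here is a minimal sketch of an LSTM language model in PyTorch. The class name LSTMTextGenerator, the layer layout, and all numeric values are assumptions made for this example; the repository's own model may be structured differently. Only the parameter names (vocab_size, embedding_dim, hidden_dim, num_layers, pad_idx) are taken from the configuration section.

```python
import torch
from torch import nn


class LSTMTextGenerator(nn.Module):
    """Hypothetical model: embedding -> LSTM -> linear projection onto the vocabulary."""

    def __init__(self, vocab_size, embedding_dim, hidden_dim, num_layers, pad_idx):
        super().__init__()
        # padding_idx keeps the pad token's embedding at zero and excludes it from updates.
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=pad_idx)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token_ids, state=None):
        # token_ids: (batch, seq_len) tensor of integer token ids
        embedded = self.embedding(token_ids)
        output, state = self.lstm(embedded, state)
        logits = self.fc(output)  # (batch, seq_len, vocab_size)
        return logits, state


# Illustrative values only; the real ones live in configs/config.yaml.
vocab_size, pad_idx = 100, 0
model = LSTMTextGenerator(vocab_size, embedding_dim=16, hidden_dim=32,
                          num_layers=2, pad_idx=pad_idx)
batch = torch.randint(1, vocab_size, (4, 10))  # random stand-in for tokenized text
logits, _ = model(batch)

# One training step: predict token t+1 from the tokens up to t, ignoring padding.
loss_fn = nn.CrossEntropyLoss(ignore_index=pad_idx)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss = loss_fn(logits[:, :-1].reshape(-1, vocab_size), batch[:, 1:].reshape(-1))
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

In a setup like this, weights could be saved with torch.save(model.state_dict(), path) and restored with model.load_state_dict(torch.load(path)), where path is whatever file name the project chooses.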

Installation

  1. Clone the repository:
git clone https://github.com/Luckson-dev/text-generator.git
cd text-generator
  2. Create a virtual environment and activate it:
python3 -m venv venv
On Linux/macOS:
source venv/bin/activate
On Windows:
venv\Scripts\activate

  3. Install dependencies:
pip install -r requirements.txt

Usage

Interactive prediction:

python run.py
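
The internals of run.py are not described here. As a hypothetical sketch of what text prediction involves, the loop below performs greedy decoding: it repeatedly asks a scoring function for next-token scores and appends the highest-scoring token. The names greedy_generate, next_token_scores, and toy_scores are illustrative and not part of this repository; a trained model would take the place of toy_scores.

```python
from typing import Callable, List, Sequence


def greedy_generate(
    next_token_scores: Callable[[Sequence[int]], Sequence[float]],
    prompt_ids: Sequence[int],
    max_new_tokens: int,
    eos_id: int,
) -> List[int]:
    """Append the highest-scoring next token until eos_id or the length limit."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        scores = next_token_scores(ids)
        # Index of the largest score is the chosen token id.
        next_id = max(range(len(scores)), key=scores.__getitem__)
        ids.append(next_id)
        if next_id == eos_id:
            break
    return ids


# Toy stand-in for a trained model: always prefers (last token + 1) modulo 5.
def toy_scores(ids: Sequence[int]) -> List[float]:
    preferred = (ids[-1] + 1) % 5
    return [1.0 if i == preferred else 0.0 for i in range(5)]


generated = greedy_generate(toy_scores, prompt_ids=[1], max_new_tokens=10, eos_id=4)
```

In an interactive console session, the prompt would typically come from input(), with the tokenizer converting text to ids before the loop and ids back to text afterwards.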

Configuration

All hyperparameters are stored in configs/config.yaml:

  • Model parameters: embedding_dim, hidden_dim, num_layers

  • Training parameters: batch_size, learning_rate, epochs

  • Others: pad_idx, vocab_size, etc.
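
One possible way to consume these settings in Python is to validate them into a typed object once the YAML file has been parsed into a dict. The sketch below assumes a flat key layout and uses made-up values. The Config class and its from_dict method are hypothetical, and the actual structure of configs/config.yaml may differ.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Config:
    # Model parameters
    embedding_dim: int
    hidden_dim: int
    num_layers: int
    # Training parameters
    batch_size: int
    learning_rate: float
    epochs: int
    # Others
    pad_idx: int
    vocab_size: int

    @classmethod
    def from_dict(cls, raw: dict) -> "Config":
        """Build a Config from a flat dict, failing loudly on missing keys."""
        names = [f.name for f in fields(cls)]
        missing = [name for name in names if name not in raw]
        if missing:
            raise KeyError(f"missing config keys: {missing}")
        # Extra keys are ignored so the file can carry settings this sketch does not know.
        return cls(**{name: raw[name] for name in names})


# Made-up example values; in practice `raw` would come from parsing configs/config.yaml.
raw = {
    "embedding_dim": 64,
    "hidden_dim": 128,
    "num_layers": 2,
    "batch_size": 32,
    "learning_rate": 0.001,
    "epochs": 10,
    "pad_idx": 0,
    "vocab_size": 5000,
}
config = Config.from_dict(raw)
```

Failing early on a missing key tends to be easier to debug than a KeyError raised partway through training.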

Contributing

This is an educational project. Feel free to experiment with:

  • Different datasets

  • Other tokenizers or models (GRU, Transformer)

  • Training strategies and hyperparameters

License

This project is released under the MIT License.

Limitations

  • The model is trained on a small dataset, so predictions may not be accurate in all cases.
  • Performance can be improved by adding more question-answer pairs to the data.csv file.
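
When extending data.csv, it helps to check that new rows survive loading and cleaning. The sketch below shows one way to read and normalize question-answer pairs with the standard library. The column names question and answer, the cleaning steps, and the function names are assumptions for illustration; the project's actual file layout and cleaning rules may differ.

```python
import csv
import io
import re


def clean_text(text: str) -> str:
    """Collapse runs of whitespace, trim the ends, and lowercase."""
    return re.sub(r"\s+", " ", text).strip().lower()


def load_pairs(csv_file) -> list:
    """Read (question, answer) pairs, skipping rows where either side is empty."""
    pairs = []
    for row in csv.DictReader(csv_file):
        question = clean_text(row.get("question") or "")
        answer = clean_text(row.get("answer") or "")
        if question and answer:
            pairs.append((question, answer))
    return pairs


# In-memory stand-in for data.csv; with a real file, pass
# open("data.csv", newline="", encoding="utf-8") instead.
sample = io.StringIO(
    "question,answer\n"
    "  What is   PyTorch? ,A deep learning library\n"
    "Empty answer,\n"
)
pairs = load_pairs(sample)
```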
