Author: Pritam Trilochan Gouda
Affiliation: CSA, IISc
Date: April 27, 2024
This repository contains the solutions and discussions for Assignment #2 of the course E0270: Machine Learning. The assignment covers three main topics: Text Generation with GPT-2, Low Rank Adaptation (LoRA), and Knowledge Distillation.
## Table of Contents

- Introduction
- Problem 0: Text Generation with GPT-2
- Problem 1: Low Rank Adaptation (LoRA)
- Problem 2: Knowledge Distillation
- Files Included
- Plots
- Usage
- Conclusion
- References
## Introduction

This assignment explores the application and analysis of advanced machine learning techniques, focusing on text generation, efficient model adaptation, and knowledge transfer between models.
## Problem 0: Text Generation with GPT-2

GPT-2's text-generation capabilities were explored by supplying a prompt and analyzing the model's ability to produce a coherent and creative continuation. The model generated a narrative based on the prompt, demonstrating its grasp of context and creative storytelling.
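For reference, a minimal generation sketch using the Hugging Face `transformers` library is shown below. This is an illustration only; the repository's own `model.py`/`run.py` define GPT-2 directly and may load weights and sample differently.

```python
# Minimal GPT-2 text-generation sketch (assumes the Hugging Face transformers library).
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "Once upon a time"  # placeholder prompt; the assignment's actual prompt may differ
inputs = tokenizer(prompt, return_tensors="pt")

# Sample a continuation; top-k and temperature trade off creativity vs. coherence.
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,
    top_k=50,
    temperature=0.9,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```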

## Problem 1: Low Rank Adaptation (LoRA)

LoRA is a Parameter-Efficient Fine-Tuning (PEFT) technique: the pretrained weights are kept frozen and only small low-rank update matrices are trained, reducing computational overhead while maintaining performance. LoRA was integrated into the GPT-2 model and evaluated on the CoLA dataset; the fine-tuned model achieved a good balance between computational efficiency and accuracy.
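To illustrate the idea (this is not the repository's exact implementation), a LoRA-wrapped linear layer in PyTorch might look like the following; the class name `LoRALinear`, the scaling factor, and the initialization details are assumptions.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: W x + (alpha/r) * B A x."""
    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                      # pretrained weights stay frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero-init: no change at start
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```

Only `A` and `B` (roughly `r * (in_features + out_features)` values per adapted layer) are trainable, which is what drives the parameter reductions reported below.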
- GPT-2 Variant Used: Medium
  - Total Number of Parameters: 356.40M
  - Number of Trainable Parameters: 1.68M
  - Reduction in Parameters: 99.53%
  - Maximum Accuracy on CoLA Validation Dataset: 82.73%
- GPT-2 Variant Used: Base
  - Total Number of Parameters: 125.03M
  - Number of Trainable Parameters: 0.63M
  - Reduction in Parameters: 99.50%
The GPT-2 model was fine-tuned with LoRA using the following hyperparameters (a brief training-setup sketch follows the list):
- Learning Rate: 1e-3
- Number of Epochs: 10
- Batch Size: 128
- Optimizer: Adam
- Loss Function: Cross-Entropy Loss
- LoRA Rank: 4
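The sketch below shows how such a setup typically translates to code. The `backbone`/`model` objects are toy stand-ins for the LoRA-adapted GPT-2 classifier: only the parameters left trainable (in the assignment, the LoRA matrices and, presumably, the classification head) are handed to the Adam optimizer, which is how the ~99.5% reduction in trainable parameters is realized.

```python
import torch
import torch.nn as nn

# Toy stand-in for the LoRA-adapted GPT-2 classifier: a frozen "backbone"
# plus a small trainable head. In the assignment, the trainable part is
# instead the rank-4 LoRA matrices injected into the attention projections.
backbone = nn.Linear(768, 768)
for p in backbone.parameters():
    p.requires_grad = False                              # frozen pretrained weights

model = nn.Sequential(backbone, nn.ReLU(), nn.Linear(768, 2))

total = sum(p.numel() for p in model.parameters())
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable: {trainable}/{total} ({100 * trainable / total:.2f}%)")

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3  # lr as listed above
)
loss_fn = nn.CrossEntropyLoss()  # CoLA labels (0/1) treated as class indices
```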
## Problem 2: Knowledge Distillation

Knowledge distillation aims to transfer knowledge from a larger teacher model to a smaller student model, enabling efficient deployment in resource-constrained environments. An RNN-based student was trained via knowledge distillation from the fine-tuned GPT-2 model and achieved comparable validation performance, confirming the effectiveness of the distillation process.

To distill knowledge from the fine-tuned GPT-2 model (teacher) to the DistilRNN model (student) on the CoLA classification dataset, the distillation loss combines a soft-target loss against the teacher's outputs with a true-label loss.
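A standard formulation of such a loss is sketched below, assuming PyTorch; the temperature `T` and mixing weight `alpha` are illustrative defaults and may differ from the values actually used in the report.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      T: float = 2.0,
                      alpha: float = 0.5) -> torch.Tensor:
    """Weighted sum of soft-target loss (teacher) and hard-label cross-entropy."""
    # Soft targets: KL divergence between temperature-scaled distributions.
    # teacher_logits are assumed to be computed under torch.no_grad().
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # True-label loss: ordinary cross-entropy on the CoLA labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```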
The DistilRNN student architecture (sketched below):
- Embedding layer mapping input tokens to dense vectors of size 768.
- Two-layer RNN with hidden size 768.
- ReLU activation function.
- Linear layer projecting the output to a 2-dimensional space for binary classification.

Training hyperparameters:
- Batch size: 128
- Learning rate: 1e-3
- Number of epochs: 5

Results:
- Maximum Accuracy on CoLA Validation Dataset (with KD): 71%
- Accuracy without KD: 68%
- Accuracy Improvement with KD: 3 percentage points
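A PyTorch sketch of the DistilRNN student described above is given below; the pooling choice (last time step), the ReLU placement, and the `vocab_size` handling are assumptions rather than the repository's exact `model.py` definition.

```python
import torch
import torch.nn as nn

class DistilRNN(nn.Module):
    """Student model: embedding -> 2-layer RNN (hidden 768) -> ReLU -> linear head."""
    def __init__(self, vocab_size: int, embed_dim: int = 768,
                 hidden_dim: int = 768, num_classes: int = 2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.RNN(embed_dim, hidden_dim, num_layers=2, batch_first=True)
        self.relu = nn.ReLU()
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        x = self.embedding(input_ids)          # (batch, seq_len, embed_dim)
        out, _ = self.rnn(x)                   # (batch, seq_len, hidden_dim)
        pooled = self.relu(out[:, -1, :])      # last time step as sentence summary
        return self.fc(pooled)                 # (batch, num_classes) logits
```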
## Files Included

Text Generation, LoRA, and Knowledge Distillation
├── plots
│ ├── Distillation_accuracy.png
│ ├── Distillation_loss.png
│ ├── LoRA_accuracy.png
│ ├── LoRA_loss.png
│ ├── rnn_accuracy.png
│ └── rnn_loss.png
├── tuning
│ ├── tuning.txt
│ ├── tuning2.txt
│ ├── tuning3.txt
│ └── tuning4.txt
├── Report_23754.pdf
├── model.py
├── run.py
├── train_utils.py
└── utils.py
- `model.py`: Full definition of a GPT language model, all in this single file.
- `Report_23754.pdf`: Detailed project report with explanations.
## Plots

The loss and accuracy curves are available in the `plots/` directory: `LoRA_loss.png`, `LoRA_accuracy.png`, `Distillation_loss.png`, `Distillation_accuracy.png`, `rnn_loss.png`, and `rnn_accuracy.png`.
## Usage

- Clone the repository:

      git clone https://github.com/yourusername/ml-techniques.git
      cd ml-techniques
## Conclusion

This assignment provides a comprehensive study of advanced ML techniques, demonstrating their practical applications and effectiveness in different scenarios.
## References

- Practical Tips for Finetuning LLMs Using LoRA
- Knowledge Distillation: Principles, Algorithms, Applications