A Vision Transformer (ViT) implementation for image classification on the CIFAR-10 dataset, leveraging HuggingFace's Trainer API for computational efficiency
This repository contains an implementation of the Vision Transformer (ViT) model, a novel architecture leveraging self-attention mechanisms for image classification tasks. Unlike traditional CNNs, ViT splits images into patches and processes them as sequences, enabling the model to capture global context effectively.
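For intuition, here is a minimal sketch of the patch-splitting step, assuming 4x4 patches on a 32x32 CIFAR-10 image (the patch size is illustrative and may differ from the one used in this repository):

```python
import torch

# One CIFAR-10 image: 3 channels, 32x32 pixels (batch of 1)
image = torch.randn(1, 3, 32, 32)

patch_size = 4  # illustrative patch size; the repository's value may differ
# Extract non-overlapping patches and flatten each one into a vector
patches = torch.nn.functional.unfold(image, kernel_size=patch_size, stride=patch_size)
patches = patches.transpose(1, 2)  # shape (1, 64, 48): a sequence of 64 patch tokens

print(patches.shape)  # torch.Size([1, 64, 48])
```

Each of the 64 patch vectors is then linearly projected (patch embedding) and combined with a positional encoding before entering the transformer encoder.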
- To explore the capabilities of the Vision Transformer on the CIFAR-10 dataset.
- To compare its performance with traditional CNN models.
- To implement and evaluate the model using HuggingFace's Trainer API for improved computational efficiency.
- Dataset: CIFAR-10 (60,000 32x32 images across 10 classes).
- Preprocessing: Data augmentation and patch embedding for input preparation (see the preprocessing sketch after this list).
- Model Architecture: Implementation of Vision Transformer with patch encoding and positional encoding.
- Training: Leveraged HuggingFace's Trainer API to streamline training and overcome computational limitations.
- Evaluation: Achieved high accuracy through transfer learning and efficient training.
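As a rough illustration of the preprocessing step above, the following sketch loads CIFAR-10 via HuggingFace Datasets and converts images to model inputs with a pretrained ViT image processor. The checkpoint name and the on-the-fly transform are assumptions and may not match the exact code in this repository:

```python
from datasets import load_dataset
from transformers import ViTImageProcessor

# Hypothetical checkpoint; the repository may fine-tune a different pretrained model
checkpoint = "google/vit-base-patch16-224-in21k"
processor = ViTImageProcessor.from_pretrained(checkpoint)

ds = load_dataset("cifar10")  # splits: "train" (50,000) and "test" (10,000)

def transform(example_batch):
    # Resize and normalize the 32x32 PIL images into the pixel_values tensors ViT expects
    inputs = processor([img for img in example_batch["img"]], return_tensors="pt")
    inputs["labels"] = example_batch["label"]
    return inputs

# Apply the transform lazily, at batch-access time
prepared_ds = ds.with_transform(transform)
```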
- Accuracy: Reached 98.6% validation accuracy by epoch 3.
- Efficiency: Demonstrated the use of pre-trained weights and transfer learning for computationally constrained setups.
The main challenge was limited computational resources; fine-tuning from pre-trained weights with HuggingFace's Trainer API reduced the training burden while maintaining accuracy.
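Along those lines, a minimal fine-tuning sketch with the Trainer API might look as follows. It reuses `checkpoint` and `prepared_ds` from the preprocessing sketch above, and the hyperparameters and output path are illustrative rather than the exact settings used here:

```python
import torch
from transformers import ViTForImageClassification, Trainer, TrainingArguments

# `checkpoint` and `prepared_ds` come from the preprocessing sketch above
model = ViTForImageClassification.from_pretrained(
    checkpoint,
    num_labels=10,                 # CIFAR-10 has 10 classes
    ignore_mismatched_sizes=True,  # replace the pretrained classification head
)

def collate_fn(batch):
    # Stack individual examples into a batch the model can consume
    return {
        "pixel_values": torch.stack([x["pixel_values"] for x in batch]),
        "labels": torch.tensor([x["labels"] for x in batch]),
    }

training_args = TrainingArguments(
    output_dir="./vit-cifar10",      # hypothetical output path
    per_device_train_batch_size=32,  # illustrative hyperparameters
    num_train_epochs=3,
    evaluation_strategy="epoch",
    fp16=True,                       # mixed precision to ease memory pressure
    remove_unused_columns=False,     # keep the "img" column for the on-the-fly transform
)

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=collate_fn,
    train_dataset=prepared_ds["train"],
    eval_dataset=prepared_ds["test"],
)

trainer.train()
```

Note that `remove_unused_columns=False` is needed so the Trainer does not drop the raw `img` column that the lazy transform reads when building `pixel_values`.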
- Clone the repository:

```bash
git clone https://github.com/KhushiRajurkar/Vision-Transformer-Image-Classification.git
```