
whichway: a demo CNN to correct image orientation

This is a demonstration CNN that highlights the impact of training data volume relative to model complexity.

Given a rotated image containing text:

[image: a rotated sample containing text]

Predict the required rotation degrees to correct it:

[image: the same sample after correction]

(26 degrees)

Getting started

Install dependencies with uv:

uv sync

Generate datasets

Create some sample datasets with generate.py -- this script is a bit hacky (vibe coded) and font paths might need tweaking.

Start with a small training dataset (2,000 samples):

uv run generate.py --output datasets/train-small --count 2000

...and a validation set (600 samples):

uv run generate.py --output datasets/val --count 600
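The core idea behind the generator is simple: render some text, rotate it by a known random angle, and record that angle in answersheet.json. Here is a minimal sketch of that approach (hypothetical code, not the actual script; the canvas size, angle range, text, and font path are all assumptions):

import json
import random
from pathlib import Path

from PIL import Image, ImageDraw, ImageFont

def generate(output_dir, count):
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Font path may need tweaking for your system (as noted above).
    font = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf", 24)
    answers = {}
    for i in range(count):
        canvas = Image.new("L", (256, 256), color=255)        # assumed canvas size
        ImageDraw.Draw(canvas).text((20, 110), "sample text", font=font, fill=0)
        angle = random.uniform(-45, 45)                       # assumed angle range
        name = f"sample-{i:04d}.png"
        canvas.rotate(angle, fillcolor=255).save(out / name)
        answers[name] = angle                                 # label: the rotation applied
    (out / "answersheet.json").write_text(json.dumps(answers))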

Train a model

Start with a single epoch, just to make sure everything works:

uv run train.py --training datasets/train-small/answersheet.json --validation datasets/val/answersheet.json --epochs 1
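Under the hood, training is a straightforward regression on the angle. Here is a hedged sketch of the kind of dataset and loop train.py might use (the class name, answersheet layout, and MSE loss are assumptions, not taken from the script):

import json
from pathlib import Path

import torch
from torch import nn
from torch.utils.data import DataLoader, Dataset
from torchvision.io import read_image

class AnswerSheetDataset(Dataset):
    # Hypothetical reader for an answersheet.json mapping filenames to angles.
    def __init__(self, answersheet):
        path = Path(answersheet)
        self.root = path.parent
        self.items = list(json.loads(path.read_text()).items())

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        name, angle = self.items[idx]
        image = read_image(str(self.root / name)).float() / 255.0  # 1 x H x W in [0, 1]
        return image, torch.tensor([angle], dtype=torch.float32)

def train_one_epoch(model, loader, optimizer):
    loss_fn = nn.MSELoss()                       # assumed: plain regression on degrees
    for images, angles in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(images), angles)
        loss.backward()
        optimizer.step()

# Usage (hypothetical):
# loader = DataLoader(AnswerSheetDataset("datasets/train-small/answersheet.json"),
#                     batch_size=32, shuffle=True)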

Correct an image

Use the model you just trained to correctly rotate an image:

uv run correct.py datasets/val/sample-0222.png
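Conceptually, correction just runs the model and rotates the image back by the predicted angle. A minimal sketch, assuming a PIL-based pipeline and a model that outputs degrees (the function name and sign convention are assumptions):

import torch
from PIL import Image
from torchvision.transforms.functional import to_tensor

def correct_image(model, image_path, output_path="corrected.png"):
    image = Image.open(image_path).convert("L")
    x = to_tensor(image).unsqueeze(0)            # 1 x 1 x H x W, scaled to [0, 1]
    with torch.no_grad():
        angle = model(x).item()                  # predicted correction, in degrees
    # Rotate back by the predicted angle; the sign convention is an assumption.
    image.rotate(angle, fillcolor=255).save(output_path)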

Experimenting with training data vs layers

It would be nice to make the model configurable in the number of layers it has, e.g. keep a list of convolution layers and use a subset based on a command-line parameter, but this causes problems for PyTorch's backprop magic: layers kept in a plain Python list aren't registered as submodules, so their parameters are invisible to the optimizer, and the flattened feature size feeding the fully connected layers changes with depth. To keep things simple, I've just commented out code and modified the for loop in the forward method.
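For reference, PyTorch's intended fix for the registration problem is nn.ModuleList. A minimal sketch of what a configurable variant could look like (an illustration, not the model in this repo; the pooling and activation choices are assumptions, and the fully connected layers are omitted):

import torch.nn.functional as F
from torch import nn

class ConfigurableNet(nn.Module):
    # Illustration only. nn.ModuleList (unlike a plain Python list) registers
    # each layer as a submodule, so its parameters show up in
    # model.parameters() and receive gradients.
    def __init__(self, num_conv_layers=5):
        super().__init__()
        channels = [1, 32, 64, 128, 128, 128][: num_conv_layers + 1]
        self.convs = nn.ModuleList(
            nn.Conv2d(channels[i], channels[i + 1], kernel_size=3)
            for i in range(num_conv_layers)
        )

    def forward(self, x):
        for conv in self.convs:
            x = F.max_pool2d(F.relu(conv(x)), 2)  # pooling here is an assumption
        return x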

Comparing results

To keep things simple, we'll keep some parameters constant:

  • We'll count a prediction by the model as correct when it is within 5 degrees of the actual angle (an accuracy sketch follows this list).
  • We'll only change the convolution layers; the model will always have two fully connected layers before its output.
  • We'll always run 20 epochs (loss plateaus well before this).
  • We'll use the same 600-sample validation set for all runs (these samples are never part of the training data).
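For clarity, here is a minimal sketch of how that tolerance-based accuracy could be computed (this helper is hypothetical, not taken from the repo):

import torch

def accuracy_within(preds, targets, tolerance=5.0):
    # Fraction of predictions within `tolerance` degrees of the true angle.
    return ((preds - targets).abs() <= tolerance).float().mean().item()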

At 5 convolution layers:

self.conv1 = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3)
self.conv2 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3)
self.conv3 = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3)
self.conv4 = nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3)
self.conv5 = nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3)

We get:

  • 87.5% correct, using 2,000 samples
  • 99.6% correct, using 10,000 samples

Dropping to 4 convolution layers:

self.conv1 = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3)
self.conv2 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3)
self.conv3 = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3)
self.conv4 = nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3)

We get:

  • 73.8% correct, using 2,000 samples
  • 98.6% correct, using 10,000 samples

Thinking about the results

  • Our "bigger" model was trained with 2,000 samples and got 87.5% accuracy
  • Our "smaller" model was trained with 10,000 samples and got 98.6% accuracy

This is a toy demonstration, but in a real-world production scenario a smaller model is cheaper to deploy, manage, and run. Investing in more (and better) training data can have a high ROI.
