
whichway: a demo CNN to correct image orientation

This is a demonstration CNN that highlights the impact of training data volume relative to model complexity.

Given a rotated image containing text:

[image: a rotated sample containing text]

Predict the required rotation degrees to correct it:

[image: the same sample after correction]

(26 degrees)

Getting started

Install dependencies with uv:

uv sync

Generate datasets

Create some sample datasets with generate.py -- this script is a bit hacky (vibe coded) and font paths might need tweaking.

Start with a small training dataset (2,000 samples):

uv run generate.py --output datasets/train-small --count 2000

...and a validation set (600 samples):

uv run generate.py --output datasets/val --count 600
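The core idea behind the generator is simple: render some text, rotate it by a known random angle, and record that angle in answersheet.json. Here is a minimal sketch of that approach (hypothetical code, not the actual script; the canvas size, angle range, text, and font path are all assumptions):

import json
import random
from pathlib import Path

from PIL import Image, ImageDraw, ImageFont

def generate(output_dir, count):
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Font path may need tweaking for your system (as noted above).
    font = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf", 24)
    answers = {}
    for i in range(count):
        canvas = Image.new("L", (256, 256), color=255)        # assumed canvas size
        ImageDraw.Draw(canvas).text((20, 110), "sample text", font=font, fill=0)
        angle = random.uniform(-45, 45)                       # assumed angle range
        name = f"sample-{i:04d}.png"
        canvas.rotate(angle, fillcolor=255).save(out / name)
        answers[name] = angle                                 # label: the rotation applied
    (out / "answersheet.json").write_text(json.dumps(answers))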

Train a model

Start with a single epoch, just to make sure everything works:

uv run train.py --training datasets/train-small/answersheet.json --validation datasets/val/answersheet.json --epochs 1
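Under the hood, training is a straightforward regression on the angle. Here is a hedged sketch of the kind of dataset and loop train.py might use (the class name, answersheet layout, and MSE loss are assumptions, not taken from the script):

import json
from pathlib import Path

import torch
from torch import nn
from torch.utils.data import DataLoader, Dataset
from torchvision.io import read_image

class AnswerSheetDataset(Dataset):
    # Hypothetical reader for an answersheet.json mapping filenames to angles.
    def __init__(self, answersheet):
        path = Path(answersheet)
        self.root = path.parent
        self.items = list(json.loads(path.read_text()).items())

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        name, angle = self.items[idx]
        image = read_image(str(self.root / name)).float() / 255.0  # 1 x H x W in [0, 1]
        return image, torch.tensor([angle], dtype=torch.float32)

def train_one_epoch(model, loader, optimizer):
    loss_fn = nn.MSELoss()                       # assumed: plain regression on degrees
    for images, angles in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(images), angles)
        loss.backward()
        optimizer.step()

# Usage (hypothetical):
# loader = DataLoader(AnswerSheetDataset("datasets/train-small/answersheet.json"),
#                     batch_size=32, shuffle=True)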

Correct an image

Use the model you just trained to correctly rotate an image:

uv run correct.py datasets/val/sample-0222.png
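Conceptually, correction just runs the model and rotates the image back by the predicted angle. A minimal sketch, assuming a PIL-based pipeline and a model that outputs degrees (the function name and sign convention are assumptions):

import torch
from PIL import Image
from torchvision.transforms.functional import to_tensor

def correct_image(model, image_path, output_path="corrected.png"):
    image = Image.open(image_path).convert("L")
    x = to_tensor(image).unsqueeze(0)            # 1 x 1 x H x W, scaled to [0, 1]
    with torch.no_grad():
        angle = model(x).item()                  # predicted correction, in degrees
    # Rotate back by the predicted angle; the sign convention is an assumption.
    image.rotate(angle, fillcolor=255).save(output_path)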

Experimenting with training data vs layers

It would be nice to make the model configurable in the number of layers it has, e.g. keep a list of convolution layers and use a subset based on a command-line parameter, but this causes problems for PyTorch's backprop magic: layers kept in a plain Python list aren't registered as submodules, so their parameters are invisible to the optimizer, and the flattened feature size feeding the fully connected layers changes with depth. To keep things simple, I've just commented out code and modified the for loop in the forward method.
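For reference, PyTorch's intended fix for the registration problem is nn.ModuleList. A minimal sketch of what a configurable variant could look like (an illustration, not the model in this repo; the pooling and activation choices are assumptions, and the fully connected layers are omitted):

import torch.nn.functional as F
from torch import nn

class ConfigurableNet(nn.Module):
    # Illustration only. nn.ModuleList (unlike a plain Python list) registers
    # each layer as a submodule, so its parameters show up in
    # model.parameters() and receive gradients.
    def __init__(self, num_conv_layers=5):
        super().__init__()
        channels = [1, 32, 64, 128, 128, 128][: num_conv_layers + 1]
        self.convs = nn.ModuleList(
            nn.Conv2d(channels[i], channels[i + 1], kernel_size=3)
            for i in range(num_conv_layers)
        )

    def forward(self, x):
        for conv in self.convs:
            x = F.max_pool2d(F.relu(conv(x)), 2)  # pooling here is an assumption
        return x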

Comparing results

To keep things simple, we'll keep some parameters constant:

  • We'll count a prediction by the model as correct when it is within 5 degrees of the actual angle (an accuracy sketch follows this list).
  • We'll only change the convolution layers; the model will always have two fully connected layers before its output.
  • We'll always run 20 epochs (loss plateaus well before this).
  • We'll use the same 600-sample validation set for all runs (these samples are never part of the training data).
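For clarity, here is a minimal sketch of how that tolerance-based accuracy could be computed (this helper is hypothetical, not taken from the repo):

import torch

def accuracy_within(preds, targets, tolerance=5.0):
    # Fraction of predictions within `tolerance` degrees of the true angle.
    return ((preds - targets).abs() <= tolerance).float().mean().item()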

At 5 convolution layers:

self.conv1 = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3)
self.conv2 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3)
self.conv3 = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3)
self.conv4 = nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3)
self.conv5 = nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3)

We get:

  • 87.5% correct, using 2,000 samples
  • 99.6% correct, using 10,000 samples

Dropping to 4 convolution layers:

self.conv1 = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3)
self.conv2 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3)
self.conv3 = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3)
self.conv4 = nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3)

We get:

  • 73.8% correct, using 2,000 samples
  • 98.6% correct, using 10,000 samples

Thinking about the results

  • Our "bigger" model was trained with 2,000 samples and got 87.5% accuracy
  • Our "smaller" model was trained with 10,000 samples and got 98.6% accuracy

This is a toy demonstration, but in a real-world production scenario a smaller model is cheaper to deploy, manage, and run. Investing in more (and better) training data can have a high ROI.
