This is a demonstration CNN model that highlights the impact of training data quantity versus model complexity.
Given a rotated image containing text, predict the rotation in degrees required to correct it (26 degrees for the example image).
Install dependencies with uv:
uv sync
Create some sample datasets with generate.py. Note that this script is a bit hacky (vibe coded), and the font paths might need tweaking for your system.
Start with a small (2,000 samples) training dataset:
uv run generate.py --output datasets/train-small --count 2000
...and a validation set (600 samples):
uv run generate.py --output datasets/val --count 600
Start with a single epoch, just to make sure everything works:
uv run train.py --training datasets/train-small/answersheet.json --validation datasets/val/answersheet.json --epochs 1
Use the model you just trained to correctly rotate an image:
uv run correct.py datasets/val/sample-0222.png
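Conceptually, the correction step predicts the angle and then rotates by its negative. Here's a minimal sketch of that idea; correct.py's actual internals may differ, and the sign convention depends on how generate.py rotated the samples:

import torch
from PIL import Image
from torchvision.transforms.functional import to_tensor

def correct_rotation(model: torch.nn.Module, path: str) -> Image.Image:
    # Predict the rotation, then undo it by rotating the opposite way.
    img = Image.open(path).convert("L")  # grayscale, matching in_channels=1
    with torch.no_grad():
        angle = model(to_tensor(img).unsqueeze(0)).item()  # predicted degrees
    return img.rotate(-angle, fillcolor=255)  # white fill for exposed corners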
It would be nice to make the model's depth configurable, e.g. keep the convolution layers in a list and use a subset chosen by a command-line parameter, but that runs into trouble with PyTorch's backprop magic. Instead, I've simply commented code in and out and adjusted the for loop in the forward method.
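For reference, a minimal sketch of that hand-edited pattern. The class name, pooling, and fully connected widths are my assumptions (the conv definitions mirror the blocks below). As an aside, a plain Python list of submodules isn't registered by nn.Module, so its parameters are invisible to the optimizer, which is likely the backprop problem mentioned above; nn.ModuleList avoids this, at the cost of some indirection:

import torch
import torch.nn as nn
import torch.nn.functional as F

class RotationNet(nn.Module):  # hypothetical name
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3)
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3)
        self.conv4 = nn.Conv2d(128, 128, kernel_size=3)
        # self.conv5 = nn.Conv2d(128, 128, kernel_size=3)  # comment in/out per run
        self.fc1 = nn.LazyLinear(256)  # lazy: avoids hard-coding the flattened size
        self.fc2 = nn.Linear(256, 64)
        self.out = nn.Linear(64, 1)    # single output: predicted angle in degrees

    def forward(self, x):
        # Edit this list to match the conv layers defined above.
        for conv in [self.conv1, self.conv2, self.conv3, self.conv4]:
            x = F.max_pool2d(F.relu(conv(x)), 2)
        x = torch.flatten(x, 1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.out(x)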
To keep things simple, we'll keep some parameters constant:
- We'll define a correct prediction by the model as an answer that is within 5 degrees of the actual angle (see the sketch after this list).
- We'll only change the convolution layers, our model will always have two fully connected layers before its output.
- We'll always run 20 epochs (loss plateaus well before this).
- We'll use the same 600 sample validation set for all runs (these samples are never part of the training data).
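As a concrete reading of the 5-degree tolerance, a minimal sketch (the function name is mine, not necessarily what train.py uses):

import torch

def accuracy(pred_degrees: torch.Tensor, true_degrees: torch.Tensor, tol: float = 5.0) -> float:
    # Fraction of predictions within `tol` degrees of the true rotation angle.
    return (pred_degrees - true_degrees).abs().le(tol).float().mean().item()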
With five convolution layers:

self.conv1 = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3)
self.conv2 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3)
self.conv3 = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3)
self.conv4 = nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3)
self.conv5 = nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3)

We get:
- 87.5% correct, using 2,000 samples
- 99.6% correct, using 10,000 samples
With four convolution layers:

self.conv1 = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3)
self.conv2 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3)
self.conv3 = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3)
self.conv4 = nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3)

We get:
- 73.8% correct, using 2,000 samples
- 98.6% correct, using 10,000 samples
- Our "bigger" model was trained with 2,000 samples and got 87.5% accuracy
- Our "smaller" model was trained with 10,000 samples and got 98.6% accuracy
This is a toy demonstration model, but in a real-world production scenario a smaller model is cheaper to deploy, manage and run. Working to get more (and better) training data can have high ROI.

