An interactive desktop application for handwritten digit recognition using ensemble machine learning models. Users can draw digits, get AI predictions, and contribute training data to improve the model's accuracy.
- Interactive Drawing Canvas: Draw digits using mouse or drawing tablet
- Real-time Preprocessing: See how your drawing is processed for the AI (28x28px, MNIST format)
- Ensemble Model Predictions: Uses multiple CNN models for improved accuracy
- Confidence Visualization: Bar chart showing prediction probabilities for all digits (0-9)
- Interactive Training: Add your drawings to help improve the AI
- Multi-language Support: English and Danish interface (some Danish translations are still missing)
- Model Management: Automatic ensemble creation from multiple trained models (not yet working reliably)
- Draw: Use mouse/tablet to draw a single digit (0-9) in the white canvas
- Predict: Click "Guess my digit!" to see the AI's prediction with confidence levels
- Teach: If the AI guesses wrong, select the correct digit and add your drawing to training data
- Improve: Click "Teach the AI with my drawings" to create a new model with your contributions
- Preprocessing: Raw drawings are converted to 28x28px grayscale images, centered and normalized to match MNIST format
- Prediction: An ensemble of CNN models makes predictions, and the results are averaged into final confidence scores
- Training: New models are trained on user-contributed data and automatically added to the ensemble
- Ensemble: Multiple models work together using prediction averaging for improved accuracy
- Python 3.7 or higher (we recommend 3.11)
- pip package manager
Install required packages (Tkinter is part of the Python standard library and is not installed through pip):
    pip install tensorflow opencv-python pillow matplotlib scikit-learn numpy tensorflow-datasets
- Clone this repository:
    git clone https://github.com/w1setown/CNN-Digit-Recognize
    cd CNN-Digit-Recognize
- Create the models directory:
    mkdir models
- (Optional) Create initial models:
    python create_models.py
Run the application:
    python run_gui.py
Alternatively, if you're in the src folder:
    python gui.py
- Drawing: Click and drag in the white canvas area
- Clear Canvas: Ctrl+Z or click "Clear Drawing"
- Make Prediction: Enter or Ctrl+X or click "Guess my digit!"
- Add Training Data: Ctrl+V or click "Add my drawing to help the AI learn"
- Train New Model: Ctrl+N or click "Teach the AI with my drawings"
- Language Toggle: Click the flag icons (🇬🇧/🇩🇰) to switch between English and Danish
For best results simulating handwriting, use a drawing tablet instead of a mouse.
CNN-DIGIT-RECOGNIZER/
├── assets/                          # Image assets and resources
│   ├── flag_dk.png                  # Danish flag image
│   ├── flag_uk.png                  # UK flag image
│   └── logo.png                     # Application logo
├── models/                          # Saved model files
│   └── model_mnist_0.keras          # Trained Keras model
├── src/                             # Main application source code
│   ├── __init__.py                  # Module initialization
│   ├── gui.py                       # Main application window and UI logic
│   ├── widgets.py                   # Custom UI components (canvas, charts, panels)
│   ├── model.py                     # CNN model architecture definition
│   ├── model_ensemble.py            # Ensemble model management and predictions
│   ├── data_utils.py                # Data loading and preprocessing utilities
│   ├── digit_preprocessing.py       # Image preprocessing functions
│   └── model_evaluation.py          # Model evaluation and metrics
├── tests/                           # Unit tests
│   ├── __init__.py
│   ├── test_model.py
│   ├── test_data_utils.py
│   └── test_digit_preprocessing.py
├── create_models.py                 # Script to create initial models
├── run_gui.py                       # Entry point to launch the GUI application
├── training.py                      # Model training utilities
├── requirements.txt                 # Project dependencies
└── README.md                        # This file
The application uses Convolutional Neural Networks (CNNs) with the following architecture:
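    from tensorflow.keras import Sequential
    from tensorflow.keras.layers import Conv2D, Dense, Dropout, Flatten, MaxPooling2D
    model = Sequential([
        # Two convolution + pooling stages extract stroke features
        Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(28, 28, 1)),
        MaxPooling2D(pool_size=(2, 2)),
        Conv2D(64, kernel_size=(3, 3), activation='relu'),
        MaxPooling2D(pool_size=(2, 2)),
        # Classifier head
        Flatten(),
        Dense(128, activation='relu'),
        Dropout(0.25),
        Dense(10, activation='softmax')  # 10 classes for digits 0-9
    ])
- Data Augmentation: Rotation, shifting, and zooming for robustness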
- Early Stopping: Prevents overfitting during training
- Learning Rate Reduction: Adaptive learning rate scheduling
- Ensemble Learning: Multiple models combined for better accuracy
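The early-stopping and learning-rate-reduction features listed above can be illustrated without any framework. The sketch below replays a list of validation losses and reports when a patience-based schedule would halve the learning rate and when it would stop. It is a simplified illustration, not the project's training code: the function name and all parameter values are assumptions, and in Keras the equivalent behavior would normally come from the `EarlyStopping` and `ReduceLROnPlateau` callbacks, whose exact rules differ in detail.

```python
def replay_schedule(val_losses, lr=1e-3, stop_patience=5, lr_patience=2, lr_factor=0.5):
    """Replay per-epoch validation losses through a patience-based schedule.

    Returns (epochs_run, final_lr). All defaults are illustrative assumptions.
    """
    best = float("inf")
    epochs_since_best = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            # Improvement: remember it and reset the patience counter.
            best = loss
            epochs_since_best = 0
            continue
        epochs_since_best += 1
        # Every `lr_patience` stalled epochs, shrink the learning rate.
        if epochs_since_best % lr_patience == 0:
            lr *= lr_factor
        # After `stop_patience` stalled epochs, stop training early.
        if epochs_since_best >= stop_patience:
            return epoch, lr
    return len(val_losses), lr


if __name__ == "__main__":
    losses = [1.0, 0.8, 0.9, 0.9, 0.9, 0.9, 0.9, 0.5]
    print(replay_schedule(losses))
```

In the example run, training stops before the last loss value is ever seen, which is the trade-off of early stopping: a short patience saves time but can miss a late improvement.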
The application uses an ensemble approach where:
- Multiple CNN models are trained on different data (MNIST, user contributions)
- Predictions from all models are averaged for final results
- New models are automatically added to the ensemble when trained
- Model files are stored in the models/ directory as .keras files
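Prediction averaging, as described in the list above, can be sketched with NumPy alone. The function name and the hand-written probability vectors below are hypothetical. In the application, the vectors would be the softmax outputs of the loaded `.keras` models.

```python
import numpy as np


def average_predictions(per_model_probs):
    """Average the softmax outputs of several models.

    per_model_probs: sequence of arrays, each of shape (10,).
    Returns (predicted_digit, averaged_probabilities).
    """
    stacked = np.stack([np.asarray(p, dtype=np.float32) for p in per_model_probs])
    mean_probs = stacked.mean(axis=0)  # one averaged score per digit
    return int(np.argmax(mean_probs)), mean_probs


if __name__ == "__main__":
    # Two made-up model outputs that disagree between 3 and 8.
    model_a = np.zeros(10)
    model_a[3], model_a[8] = 0.6, 0.4
    model_b = np.zeros(10)
    model_b[3], model_b[8] = 0.1, 0.9
    digit, probs = average_predictions([model_a, model_b])
    print(digit, probs.round(2))
```

Averaging probabilities rather than taking a majority vote lets a confident model outweigh an uncertain one, which is why the second model's strong vote for 8 wins here.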
- Canvas Drawing: Raw mouse/tablet input captured as PIL Image
- Binarization: Converted to a binary image using thresholding (threshold value 200)
- Contour Detection: Find digit boundaries for cropping
- Centering: Digit centered in bounding box with padding
- Resizing: Scaled to 20x20px, then padded to 28x28px
- Normalization: Pixel values normalized to 0-1 range
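The pipeline above can be sketched in plain NumPy. This is an approximation under stated assumptions, not the code in `digit_preprocessing.py`: the project lists OpenCV contour detection, whereas this sketch substitutes a simple bounding box over all inked pixels and a nearest-neighbour resize. The function name and the assumption that the canvas is white with dark strokes are also illustrative.

```python
import numpy as np


def to_mnist_format(canvas, threshold=200, inner=20, outer=28):
    """Convert a 2-D uint8 canvas (white background, dark strokes) into a
    (28, 28) float32 array in [0, 1]. Returns None for an empty canvas."""
    ink = canvas < threshold  # binarize: True where a stroke was drawn
    if not ink.any():
        return None

    # Bounding box of the ink (stand-in for contour detection).
    rows = np.flatnonzero(ink.any(axis=1))
    cols = np.flatnonzero(ink.any(axis=0))
    crop = ink[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]

    # Scale the longer side to `inner` pixels, keeping the aspect ratio.
    h, w = crop.shape
    scale = inner / max(h, w)
    new_h, new_w = max(1, round(h * scale)), max(1, round(w * scale))
    row_idx = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    col_idx = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    small = crop[np.ix_(row_idx, col_idx)]  # nearest-neighbour resize

    # Paste centered on a black background; ink becomes 1.0 (white).
    out = np.zeros((outer, outer), dtype=np.float32)
    top, left = (outer - new_h) // 2, (outer - new_w) // 2
    out[top:top + new_h, left:left + new_w] = small
    return out


if __name__ == "__main__":
    canvas = np.full((200, 200), 255, dtype=np.uint8)
    canvas[50:150, 90:110] = 0  # a thick vertical stroke, roughly a "1"
    digit = to_mnist_format(canvas)
    print(digit.shape, digit.dtype, digit.min(), digit.max())
```

Because the output here is already binary, normalization reduces to casting to float32. A version that keeps anti-aliased grayscale values would divide by 255 instead.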
All processing ensures compatibility with the MNIST dataset format:
- 28x28 pixel grayscale images
- White digits on black background
- Centered and size-normalized
- Float32 values in range [0, 1]
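A small check like the following can catch format mistakes before an image reaches a model. The function name is hypothetical, and the "white digit on black background" rule is approximated by requiring a mostly dark image, which is a heuristic rather than a guarantee. Centering is not checked.

```python
import numpy as np


def is_mnist_compatible(img):
    """Return True if `img` satisfies the format rules listed above."""
    img = np.asarray(img)
    return bool(
        img.shape == (28, 28)            # 28x28 pixels, single channel
        and img.dtype == np.float32      # float32 values
        and float(img.min()) >= 0.0      # ... in the range [0, 1]
        and float(img.max()) <= 1.0
        and float(img.mean()) < 0.5      # heuristic: mostly black background
    )


if __name__ == "__main__":
    good = np.zeros((28, 28), dtype=np.float32)
    good[4:24, 12:16] = 1.0
    print(is_mnist_compatible(good))        # white stroke on black
    print(is_mnist_compatible(1.0 - good))  # inverted colors
```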
The application supports:
- English: Default interface language
- Danish: Alternative interface with flag toggle
Language switching affects:
- All UI text and labels
- Button text and tooltips
- Status messages and dialogs
- Instructions and help text
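Since some Danish translations are still missing, one common way to handle language switching is a lookup table that falls back to English. The sketch below is a hypothetical illustration, not the application's implementation: the keys, the table layout, and the Danish string are assumptions, while the English strings are the button labels quoted in this README.

```python
TRANSLATIONS = {
    "en": {"predict": "Guess my digit!", "clear": "Clear Drawing"},
    "da": {"predict": "Gæt mit tal!"},  # "clear" is deliberately left untranslated
}


def translate(key, lang="en", default_lang="en"):
    """Look up a UI string, falling back to the default language,
    and finally to the key itself so the UI never shows an empty label."""
    for table in (TRANSLATIONS.get(lang, {}), TRANSLATIONS[default_lang]):
        if key in table:
            return table[key]
    return key


if __name__ == "__main__":
    print(translate("predict", "da"))  # translated string
    print(translate("clear", "da"))    # falls back to English
```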
- Users draw digits and specify the correct label
- Drawings are preprocessed to MNIST format
- Data is stored temporarily until model training
- User adds multiple labeled drawings
- Click "Teach the AI with my drawings"
- New CNN model is trained on user data
- Model is automatically added to ensemble
- Training data is cleared after successful training
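The collect, train, and clear cycle above can be modeled as a small in-memory buffer. The class below is a hypothetical sketch: its name and methods are assumptions, and the README does not say how the application actually stores pending drawings.

```python
import numpy as np


class TrainingBuffer:
    """Holds labeled 28x28 drawings in memory until the next training run."""

    def __init__(self):
        self._images = []
        self._labels = []

    def add(self, image, label):
        image = np.asarray(image, dtype=np.float32)
        if image.shape != (28, 28):
            raise ValueError("expected a 28x28 image")
        if label not in range(10):
            raise ValueError("label must be a digit 0-9")
        self._images.append(image)
        self._labels.append(int(label))

    def __len__(self):
        return len(self._labels)

    def as_arrays(self):
        """Return (x, y) with shapes (n, 28, 28, 1) and (n,), matching the
        CNN input shape. Requires at least one stored drawing."""
        x = np.stack(self._images)[..., np.newaxis]
        y = np.array(self._labels, dtype=np.int64)
        return x, y

    def clear(self):
        """Drop all stored drawings, e.g. after a successful training run."""
        self._images.clear()
        self._labels.clear()


if __name__ == "__main__":
    buffer = TrainingBuffer()
    buffer.add(np.zeros((28, 28)), 7)
    buffer.add(np.ones((28, 28)), 1)
    x, y = buffer.as_arrays()
    print(x.shape, y)
    buffer.clear()
    print(len(buffer))
```

Clearing only after training succeeds means a failed run does not discard the user's drawings.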
- Add multiple examples of each digit for better training
- Draw clearly and ensure digits are well-formed
- Use a drawing tablet for more natural handwriting
- Correct mislabeled predictions to improve accuracy
- TensorFlow: Deep learning framework and model training
- OpenCV: Image processing and preprocessing
- Tkinter: GUI framework (built into Python)
- PIL/Pillow: Image manipulation and display
- Matplotlib: Prediction confidence charts
- NumPy: Numerical computations
- scikit-learn: Data splitting utilities
- Models are loaded once at startup
- Startup is slow, so please be patient
- Predictions are near-instantaneous
- Training new models takes 1-3 minutes depending on data size
- When training new models, use a dataset that is as large and as varied as possible
- Memory usage scales with number of models in ensemble
"Models are still loading"
- Wait for the initial model loading to complete
- Check that TensorFlow is properly installed
"Please draw a digit first!"
- Ensure you've drawn something in the white canvas area
- Try drawing with thicker strokes
Training fails
- Ensure you have sufficient training examples
- Check available disk space for model files
- Verify TensorFlow installation
Poor prediction accuracy
- Use a drawing tablet instead of mouse for better input
- Draw digits clearly and centered
- Add more training examples for problematic digits
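Several of the items above come down to verifying the installation. A short script can report which dependencies Python cannot find, without importing heavy packages such as TensorFlow. The module list below is an assumption derived from the packages named in this README. Note that import names differ from pip names (`opencv-python` is imported as `cv2`, `pillow` as `PIL`, `scikit-learn` as `sklearn`).

```python
import importlib.util

# Import names for the dependencies listed in this README (assumed mapping).
REQUIRED_MODULES = [
    "tensorflow", "tensorflow_datasets", "cv2", "PIL",
    "matplotlib", "sklearn", "numpy", "tkinter",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the names that cannot be located, without importing them."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All dependencies found.")
```

A module that is found can still fail at import time (for example, a broken TensorFlow build), so this check rules out only the most common cause.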
- RAM: Minimum 4GB (8GB recommended for training)
- Storage: At least 1GB free space for models
- CPU: Multi-core processor recommended for training
- Input: Mouse or drawing tablet
MIT License
Copyright (c) 2025 Gabriel Visby Søgaard Ganderup & Jakob Lykke Lyngsøe
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
- Built with TensorFlow and Keras
- Uses MNIST and EMNIST datasets for initial training
- GUI built with Python Tkinter
- Image processing with OpenCV and PIL
- Shan Carter and Michael Nielsen for their work on augmenting human intelligence through AI.
- Adam Dhalla for his accessible and comprehensive teaching on the mathematics of back propagation in neural networks.
- Andrej Karpathy, Younes Bensouda, and Andrew Ng for their outstanding teaching through Coursera.
- Michael Nielsen, again, for his continuing contributions to both neural network education and philosophical reflections on AI and human progress.
