Directory Structure

├── Initializes the model package, defining paths to model weights and labels.
├── (Optional) A zipped archive of the image dataset. You will need to unzip this.
├── Downloads CAPTCHA images from a specified URL.
├── Contains the model definition and inference pipeline for CAPTCHA prediction.
├── Provides a Gradio interface for labeling CAPTCHA images.
├── labels.json: Stores the labels for the CAPTCHA images in JSON format.
├── Trains the CAPTCHA solver model.
├── __pycache__/: Python cache directory.
└── weights/:
    └── model_0.99.pth: Pre-trained weights for the CAPTCHA solver model.

Setup

  1. Install Dependencies:

    pip install torch torchvision pillow requests tqdm gradio
  2. (Optional) Extract Dataset: If the dataset archive exists, unzip it into the model/ directory:

    unzip model/ -d model/

Usage

1. Data Fetching

  • Run the script to download CAPTCHA images:

    python model/
    • The number of images downloaded can be configured within the script.
    • Images are saved to the model/dataset directory.
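The fetch step can be sketched as below. This is a minimal illustration under stated assumptions, not the repository's script: the function name download_captchas, the captcha_00000.png naming scheme and the PNG format are hypothetical, and the real URL and headers live in the original script. The session argument is any object with a requests-style get method, such as requests.Session().

```python
from pathlib import Path


def download_captchas(session, url, count, out_dir="model/dataset",
                      headers=None, timeout=10):
    """Fetch `count` CAPTCHA images from `url` and save them to `out_dir`.

    Hypothetical helper: `session` needs a requests-style
    get(url, headers=..., timeout=...) method; the file naming and the
    .png extension are assumptions, not the repository's choices.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    saved = []
    for i in range(count):
        response = session.get(url, headers=headers or {}, timeout=timeout)
        response.raise_for_status()  # stop on HTTP errors instead of saving an error page
        file_path = out_path / f"captcha_{i:05d}.png"
        file_path.write_bytes(response.content)  # raw image bytes as served
        saved.append(file_path)
    return saved
```

For example, download_captchas(requests.Session(), "https://example.com/captcha", 100) would save 100 files; the URL here is a placeholder. The default out_dir is resolved against the current working directory.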

2. Data Labeling

  • Start the Gradio labeling interface by running:

    python model/
  • Open the URL printed in the console in your browser.

  • Label the CAPTCHA images through the interface.

    • Important: Each label must be exactly 4 numeric characters (digits 0-9).
  • Labels are automatically saved to model/labels.json.
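The labeling rule and the save step can be sketched as follows. This is a hypothetical helper, not the interface's actual code: it assumes labels.json is a flat JSON object mapping each image path to its label string, and the names is_valid_label and save_label are illustrative.

```python
import json
import re
from pathlib import Path

# Exactly four ASCII digits; [0-9] is used instead of \d so that
# non-ASCII Unicode digits are rejected.
LABEL_PATTERN = re.compile(r"[0-9]{4}")


def is_valid_label(label: str) -> bool:
    """Return True if `label` is exactly four numeric characters."""
    return LABEL_PATTERN.fullmatch(label) is not None


def save_label(labels_file, image_path, label):
    """Validate `label` and store it under `image_path` in a JSON file.

    The flat {image_path: label} layout is an assumption about labels.json.
    """
    if not is_valid_label(label):
        raise ValueError(f"label must be 4 digits, got {label!r}")
    path = Path(labels_file)
    labels = json.loads(path.read_text()) if path.exists() else {}
    labels[str(image_path)] = label
    path.write_text(json.dumps(labels, indent=2))
    return labels
```

Rejecting bad labels at save time keeps malformed targets out of the training set, where they would otherwise only surface as a confusing loss value.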

3. Training

  • Ensure you have a labeled dataset in model/labels.json.

  • Train the model using the script:

    python model/
  • Training progress and validation accuracy are printed to the console.

  • Plots of training/validation loss and validation accuracy are generated and saved as train.png.

  • The best model weights are saved to the model/weights directory, with the validation accuracy included in the filename.
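The accuracy-in-filename convention can be illustrated with the sketch below. It is an assumption-labeled illustration rather than the training script's code: it treats validation accuracy as exact-match accuracy over whole 4-digit strings and formats the name like model_0.99.pth; sequence_accuracy, checkpoint_name and BestCheckpointTracker are hypothetical names.

```python
def sequence_accuracy(predictions, targets):
    """Fraction of CAPTCHAs whose full predicted string equals the label.

    Exact-match scoring is an assumption about how the script measures accuracy.
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)


def checkpoint_name(val_accuracy: float) -> str:
    """Build a weights filename embedding the accuracy, e.g. model_0.99.pth."""
    return f"model_{val_accuracy:.2f}.pth"


class BestCheckpointTracker:
    """Remember the best validation accuracy seen so far."""

    def __init__(self):
        self.best = float("-inf")

    def update(self, val_accuracy: float) -> bool:
        """Return True when `val_accuracy` beats the previous best (time to save)."""
        if val_accuracy > self.best:
            self.best = val_accuracy
            return True
        return False
```

Inside a PyTorch training loop this would typically be used as: if tracker.update(acc): torch.save(model.state_dict(), weights_dir / checkpoint_name(acc)).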

4. Inference

  • The script demonstrates how to load the pre-trained model and perform inference on a single image.

    • Before running it, set the image_path_to_predict variable in the if __name__ == "__main__": block to the path of the image you want to predict.
    python model/
  • The predicted CAPTCHA text will be printed to the console.

  • The create_inference_pipeline function in model/ can be used to create a reusable inference function for integration into other applications.
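The idea behind a reusable inference function (load the model once, then call a cheap predict function many times) can be sketched with a closure. This does not reproduce create_inference_pipeline's real signature, which is not documented here; make_predictor and its three callables are hypothetical placeholders.

```python
from typing import Any, Callable


def make_predictor(load_model: Callable[[], Any],
                   preprocess: Callable[[Any], Any],
                   decode: Callable[[Any], str]) -> Callable[[Any], str]:
    """Load the model once and return a reusable predict(image) function.

    Hypothetical placeholders: load_model might build the network and load
    weights, preprocess might turn an image path into a tensor, and decode
    might apply CTC decoding to the model output.
    """
    model = load_model()  # expensive step, performed a single time

    def predict(image):
        # Each call reuses the already-loaded model.
        return decode(model(preprocess(image)))

    return predict
```

Keeping model loading outside predict means a web handler or batch job pays the weight-loading cost only once.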

File Descriptions

  • model/ Defines paths to model weights and labels file for easy access.
  • model/ Downloads CAPTCHA images from a website. Requires internet access.
  • model/ Contains the CaptchaSolver model definition and the create_inference_pipeline function for creating a prediction pipeline. Includes a CTC decoding function.
  • model/ Provides a Gradio interface for labeling images. Uses the trained model to provide prediction hints.
  • model/labels.json: Stores the image paths and their corresponding labels in JSON format.
  • model/ Trains the CaptchaSolver model using the labeled dataset. Includes data loading, preprocessing, model definition, training loop, validation, and saving the best model weights. Also includes a modified CTC decoding function.
  • model/weights/model_0.99.pth: Pre-trained model weights.

Notes

  • The script uses a specific URL and headers to download CAPTCHA images. These may need to be updated if the source website changes.
  • The labels.json file should contain the path to each image relative to the script's location.
  • The script saves the model weights with the validation accuracy in the filename.
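  • The ctc_decode function, defined in both the model-definition script and the training script, is crucial for handling repeated characters in CAPTCHA labels: CTC decoding merges consecutive identical predictions, so a repeated digit survives only when the model emits a blank between the two occurrences.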
  • The model architecture and training parameters can be adjusted in the script to improve performance.
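The repeated-character behavior described in these notes can be seen in a standard greedy CTC decoder, sketched below. This is the textbook rule (merge consecutive duplicates, then drop blanks), not necessarily the repository's ctc_decode, whose training-script variant is described as modified. The index layout (0 = blank, 1..10 = digits 0..9) and the names encode_label and ctc_greedy_decode are assumptions.

```python
DIGITS = "0123456789"
BLANK = 0  # assumed blank index; digit d is assumed to map to class d + 1


def encode_label(label, alphabet=DIGITS):
    """Map a label string to CTC target indices (1-based; 0 is the blank)."""
    return [alphabet.index(ch) + 1 for ch in label]


def ctc_greedy_decode(best_path, blank=BLANK, alphabet=DIGITS):
    """Collapse a per-timestep argmax index sequence into a label string.

    Greedy CTC rule: merge consecutive duplicate indices, then drop blanks.
    """
    chars = []
    previous = None
    for index in best_path:
        if index != previous and index != blank:
            chars.append(alphabet[index - 1])
        previous = index
    return "".join(chars)
```

With a blank between them, both 1s survive: ctc_greedy_decode([2, 2, 0, 2, 3, 3, 0, 0, 4]) gives "1123", while [2, 2, 3, 4] (no blank) collapses to "123". This is why the model must learn to emit a blank between repeated digits.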