diff --git a/Multimodal/TrOCR.ipynb b/Multimodal/TrOCR.ipynb
new file mode 100644
index 0000000..f1f6770
--- /dev/null
+++ b/Multimodal/TrOCR.ipynb
@@ -0,0 +1,62 @@
+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": {
+      "provenance": [],
+      "gpuType": "T4"
+    },
+    "kernelspec": {
+      "name": "python3",
+      "display_name": "Python 3"
+    },
+    "language_info": {
+      "name": "python"
+    },
+    "accelerator": "GPU"
+  },
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## TrOCR\n",
+        "Text recognition is a long-standing research problem for document digitalization. Existing approaches for text recognition are usually built based on CNN for image understanding and RNN for char-level text generation. In addition, another language model is usually needed to improve the overall accuracy as a post-processing step. In this paper, we propose an end-to-end text recognition approach with pre-trained image Transformer and text Transformer models, namely TrOCR, which leverages the Transformer architecture for both image understanding and wordpiece-level text generation. The TrOCR model is simple but effective, and can be pre-trained with large-scale synthetic data and fine-tuned with human-labeled datasets. Experiments show that the TrOCR model outperforms the current state-of-the-art models on both printed and handwritten text recognition tasks.\n",
+        "\n"
+      ],
+      "metadata": {
+        "id": "2NXGTpw-OX_O"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "%%capture\n",
+        "!pip install -q transformers\n",
+        "!pip install -q sentencepiece\n",
+        "!pip install -q jiwer\n",
+        "!pip install -q datasets\n",
+        "!pip install -q evaluate\n",
+        "!pip install -q -U accelerate\n",
+        "\n",
+        "\n",
+        "!pip install -q matplotlib\n",
+        "!pip install -q protobuf==3.20.1\n",
+        "!pip install -q tensorboard"
+      ],
+      "metadata": {
+        "id": "rfeboSP6PEVv"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "cUgITgyiOJ0n"
+      },
+      "outputs": [],
+      "source": []
+    }
+  ]
+}
\ No newline at end of file