Add files via upload

andysingal · Jan 24, 2024 · 4a200a4
1 parent 8e5f91f
Showing 1 changed file with 62 additions and 0 deletions.
diff --git a/Multimodal/TrOCR.ipynb b/Multimodal/TrOCR.ipynb
@@ -0,0 +1,62 @@
+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": {
+      "provenance": [],
+      "gpuType": "T4"
+    },
+    "kernelspec": {
+      "name": "python3",
+      "display_name": "Python 3"
+    },
+    "language_info": {
+      "name": "python"
+    },
+    "accelerator": "GPU"
+  },
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## TrOCR\n",
+        "Text recognition is a long-standing research problem in document digitization. Existing approaches are usually built on a CNN for image understanding and an RNN for character-level text generation, and often need a separate language model as a post-processing step to improve accuracy. TrOCR, proposed by Li et al., is an end-to-end text recognition approach that uses pre-trained image Transformer and text Transformer models, applying the Transformer architecture to both image understanding and wordpiece-level text generation. The model is simple but effective: it can be pre-trained on large-scale synthetic data and fine-tuned on human-labeled datasets. The authors report that TrOCR outperformed the state-of-the-art models of the time on both printed and handwritten text recognition tasks.\n",
+        "\n"
+      ],
+      "metadata": {
+        "id": "2NXGTpw-OX_O"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "%%capture\n",
+        "!pip install -q transformers\n",
+        "!pip install -q sentencepiece\n",
+        "!pip install -q jiwer\n",
+        "!pip install -q datasets\n",
+        "!pip install -q evaluate\n",
+        "!pip install -q -U accelerate\n",
+        "\n",
+        "\n",
+        "!pip install -q matplotlib\n",
+        "!pip install -q protobuf==3.20.1\n",
+        "!pip install -q tensorboard"
+      ],
+      "metadata": {
+        "id": "rfeboSP6PEVv"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "cUgITgyiOJ0n"
+      },
+      "outputs": [],
+      "source": []
+    }
+  ]
+}