-
Notifications
You must be signed in to change notification settings - Fork 46
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
1 parent
8e5f91f
commit 4a200a4
Showing
1 changed file
with
62 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,62 @@ | ||
{ | ||
"nbformat": 4, | ||
"nbformat_minor": 0, | ||
"metadata": { | ||
"colab": { | ||
"provenance": [], | ||
"gpuType": "T4" | ||
}, | ||
"kernelspec": { | ||
"name": "python3", | ||
"display_name": "Python 3" | ||
}, | ||
"language_info": { | ||
"name": "python" | ||
}, | ||
"accelerator": "GPU" | ||
}, | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"source": [ | ||
"## TrOCR\n", | ||
"Text recognition is a long-standing research problem for document digitalization. Existing approaches for text recognition are usually built based on CNN for image understanding and RNN for char-level text generation. In addition, another language model is usually needed to improve the overall accuracy as a post-processing step. In this paper, we propose an end-to-end text recognition approach with pre-trained image Transformer and text Transformer models, namely TrOCR, which leverages the Transformer architecture for both image understanding and wordpiece-level text generation. The TrOCR model is simple but effective, and can be pre-trained with large-scale synthetic data and fine-tuned with human-labeled datasets. Experiments show that the TrOCR model outperforms the current state-of-the-art models on both printed and handwritten text recognition tasks.\n", | ||
"\n" | ||
], | ||
"metadata": { | ||
"id": "2NXGTpw-OX_O" | ||
} | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"source": [ | ||
"%%capture\n", | ||
"!pip install -q transformers\n", | ||
"!pip install -q sentencepiece\n", | ||
"!pip install -q jiwer\n", | ||
"!pip install -q datasets\n", | ||
"!pip install -q evaluate\n", | ||
"!pip install -q -U accelerate\n", | ||
"\n", | ||
"\n", | ||
"!pip install -q matplotlib\n", | ||
"!pip install -q protobuf==3.20.1\n", | ||
"!pip install -q tensorboard" | ||
], | ||
"metadata": { | ||
"id": "rfeboSP6PEVv" | ||
}, | ||
"execution_count": null, | ||
"outputs": [] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"id": "cUgITgyiOJ0n" | ||
}, | ||
"outputs": [], | ||
"source": [] | ||
} | ||
] | ||
} |