
Commit 3c80aea

Update the launcher and fix the setup
1 parent 885d032 commit 3c80aea

File tree: MANIFEST.in, README.md, aocr/__main__.py, requirements.txt, setup.py

5 files changed: +142 -123 lines

Diff for: MANIFEST.in

+2

@@ -0,0 +1,2 @@
+include *.md
+include *.txt

Diff for: README.md

+107 -108

@@ -1,176 +1,175 @@
-# Attention-OCR
-Authours: [Qi Guo](http://qiguo.ml) and [Yuntian Deng](https://github.com/da03)
+# Attention-based OCR

-Visual Attention based OCR. The model first runs a sliding CNN on the image (images are resized to height 32 while preserving aspect ratio). Then an LSTM is stacked on top of the CNN. Finally, an attention model is used as a decoder for producing the final outputs.
+Visual attention-based OCR model for image recognition with additional tools for creating TFRecords datasets and exporting the trained model with weights as a [SavedModel](https://www.tensorflow.org/api_docs/python/tf/saved_model) or a frozen graph.

-![example image 0](http://cs.cmu.edu/~yuntiand/OCR-2.jpg)
+## Acknowledgements

-# Prerequsites
-Most of our code is written based on Tensorflow, but we also use Keras for the convolution part of our model. Besides, we use python package distance to calculate edit distance for evaluation. (However, that is not mandatory, if distance is not installed, we will do exact match).
+This project is based on a model by [Qi Guo](http://qiguo.ml) and [Yuntian Deng](https://github.com/da03). You can find the original model in the [da03/Attention-OCR](https://github.com/da03/Attention-OCR) repository.

-### Tensorflow: [Installation Instructions](https://www.tensorflow.org/install/) (tested on 1.2.0)
+## The model

-### Distance (Optional):
+Authors: [Qi Guo](http://qiguo.ml) and [Yuntian Deng](https://github.com/da03).

-```
-wget http://www.cs.cmu.edu/~yuntiand/Distance-0.1.3.tar.gz
-```
+The model first runs a sliding CNN on the image (images are resized to height 32 while preserving aspect ratio). Then an LSTM is stacked on top of the CNN. Finally, an attention model is used as a decoder for producing the final outputs.

-```
-tar zxf Distance-0.1.3.tar.gz
-```
+![OCR example](http://cs.cmu.edu/~yuntiand/OCR-2.jpg)
+
+## Installation

 ```
-cd distance; sudo python setup.py install
+pip install aocr
 ```

-# Usage:
+Note: Tensorflow 1.2 and Numpy will be installed as dependencies. Additional dependencies are `PIL`/`Pillow`, `distance`, and `six`.

-Note: We assume that the working directory is `Attention-OCR`.
+## Usage

-## Train
-
-### Data Preparation
-We need a file (specified by parameter `data-path`) containing the path of images and the corresponding characters, e.g.:
+### Create a dataset

 ```
-path/to/image1 abc
-path/to/image2 def
+aocr dataset datasets/annotations-training.txt datasets/training.tfrecords
+aocr dataset datasets/annotations-testing.txt datasets/testing.tfrecords
 ```

-And we also need to specify a `data-base-dir` parameter such that we read the images from path `data-base-dir/path/to/image`. If `data-path` contains absolute path of images, then `data-base-dir` needs to be set to `/`.
-
-### A Toy Example
-
-For a toy example, we have prepared a training dataset of the specified format, which is a subset of [Synth 90k](http://www.robots.ox.ac.uk/~vgg/data/text/)
+Annotations are simple text files containing the image paths (either absolute or relative to your working dir) and their corresponding labels:

 ```
-wget http://www.cs.cmu.edu/~yuntiand/sample.tgz
+datasets/images/hello.jpg hello
+datasets/images/world.jpg world
 ```

+### Train
+
 ```
-tar zxf sample.tgz
+aocr train datasets/training.tfrecords
 ```

+A new model will be created, and the training will start. Note that it takes quite a long time to reach convergence, since we are training the CNN and attention model simultaneously.
+
+The `--steps-per-checkpoint` parameter determines how often the model checkpoints will be saved (the default output dir is `checkpoints/`).
+
+**Important:** there is a lot of available training options. See the CLI help or the `parameters` section of this README.
+
+### Test and visualize
+
 ```
-python src/launcher.py --phase=train --data-path=sample/sample.txt --data-base-dir=sample --log-path=log.txt --no-load-model
+aocr test datasets/testing.tfrecords
 ```

-After a while, you will see something like the following output in `log.txt`:
+Additionally, you can visualize the attention results during testing (saved to `results/` by default):

 ```
-...
-2016-06-08 20:47:22,335 root INFO Created model with fresh parameters.
-2016-06-08 20:47:52,852 root INFO current_step: 0
-2016-06-08 20:48:01,253 root INFO step_time: 8.400597, step perplexity: 38.998714
-2016-06-08 20:48:01,385 root INFO current_step: 1
-2016-06-08 20:48:07,166 root INFO step_time: 5.781749, step perplexity: 38.998445
-2016-06-08 20:48:07,337 root INFO current_step: 2
-2016-06-08 20:48:12,322 root INFO step_time: 4.984972, step perplexity: 39.006730
-2016-06-08 20:48:12,347 root INFO current_step: 3
-2016-06-08 20:48:16,821 root INFO step_time: 4.473902, step perplexity: 39.000267
-2016-06-08 20:48:16,859 root INFO current_step: 4
-2016-06-08 20:48:21,452 root INFO step_time: 4.593249, step perplexity: 39.009864
-2016-06-08 20:48:21,530 root INFO current_step: 5
-2016-06-08 20:48:25,878 root INFO step_time: 4.348195, step perplexity: 38.987707
-2016-06-08 20:48:26,016 root INFO current_step: 6
-2016-06-08 20:48:30,851 root INFO step_time: 4.835423, step perplexity: 39.022887
+aocr test --visualize datasets/testing.tfrecords
 ```

-Note that it takes quite a long time to reach convergence, since we are training the CNN and attention model simultaneously.
+Example output images in `results/correct`:

-## Test and visualize attention results
+Image 0 (j/j):

-The test data format shall be the same as training data format. We have also prepared a test dataset of the specified format, which includes ICDAR03, ICDAR13, IIIT5k and SVT.
+![example image 0](http://cs.cmu.edu/~yuntiand/2evaluation_data_icdar13_images_word_370.png/image_0.jpg)

-```
-wget http://www.cs.cmu.edu/~yuntiand/evaluation_data.tgz
-```
+Image 1 (u/u):

-```
-tar zxf evaluation_data.tgz
-```
+![example image 1](http://cs.cmu.edu/~yuntiand/2evaluation_data_icdar13_images_word_370.png/image_1.jpg)

-We also provide a trained model on Synth 90K:
+Image 2 (n/n):

-```
-wget http://www.cs.cmu.edu/~yuntiand/model.tgz
-```
+![example image 2](http://cs.cmu.edu/~yuntiand/2evaluation_data_icdar13_images_word_370.png/image_2.jpg)

-```
-tar zxf model.tgz
-```
+Image 3 (g/g):
+
+![example image 3](http://cs.cmu.edu/~yuntiand/2evaluation_data_icdar13_images_word_370.png/image_3.jpg)
+
+Image 4 (l/l):
+
+![example image 4](http://cs.cmu.edu/~yuntiand/2evaluation_data_icdar13_images_word_370.png/image_4.jpg)
+
+Image 5 (e/e):
+
+![example image 5](http://cs.cmu.edu/~yuntiand/2evaluation_data_icdar13_images_word_370.png/image_5.jpg)
+
+### Export

 ```
-python src/launcher.py --phase=test --visualize --data-path=evaluation_data/svt/test.txt --data-base-dir=evaluation_data/svt --log-path=log.txt --load-model --model-dir=model --output-dir=results
+aocr export exported-model
 ```

-After a while, you will see something like the following output in `log.txt`:
+Load weights from the latest checkpoints and export the model into the `./exported-model` directory.

-```
-2016-06-08 22:36:31,638 root INFO Reading model parameters from model/translate.ckpt-47200
-2016-06-08 22:36:40,529 root INFO Compare word based on edit distance.
-2016-06-08 22:36:41,652 root INFO step_time: 1.119277, step perplexity: 1.056626
-2016-06-08 22:36:41,660 root INFO 1.000000 out of 1 correct
-2016-06-08 22:36:42,358 root INFO step_time: 0.696687, step perplexity: 2.003350
-2016-06-08 22:36:42,363 root INFO 1.666667 out of 2 correct
-2016-06-08 22:36:42,831 root INFO step_time: 0.466550, step perplexity: 1.501963
-2016-06-08 22:36:42,835 root INFO 2.466667 out of 3 correct
-2016-06-08 22:36:43,402 root INFO step_time: 0.562091, step perplexity: 1.269991
-2016-06-08 22:36:43,418 root INFO 3.366667 out of 4 correct
-2016-06-08 22:36:43,897 root INFO step_time: 0.477545, step perplexity: 1.072437
-2016-06-08 22:36:43,905 root INFO 4.366667 out of 5 correct
-2016-06-08 22:36:44,107 root INFO step_time: 0.195361, step perplexity: 2.071796
-2016-06-08 22:36:44,127 root INFO 5.144444 out of 6 correct
+## Google Cloud ML Engine
+
+To train the model in the [Google Cloud Machine Learning Engine](https://cloud.google.com/ml-engine/), upload the training dataset into a Google Cloud Storage bucket and start a training job with the `gcloud` tool.
+
+1. Set the environment variables:

 ```
+# Prefix for the job name.
+export JOB_PREFIX="aocr"

-Example output images in `results/correct` (the output directory is set via parameter `output-dir` and the default is `results`): (Look closer to see it clearly.)
+# Region to launch the training job in.
+# Should be the same as the storage bucket region.
+export REGION="us-central1"

-Format: Image `index` (`predicted`/`ground truth`) `Image file`
+# Your storage bucket.
+export GS_BUCKET="gs://aocr-bucket"

-Image 0 (j/j): ![example image 0](http://cs.cmu.edu/~yuntiand/2evaluation_data_icdar13_images_word_370.png/image_0.jpg)
+# Path to store your training dataset in the bucket.
+export DATASET_UPLOAD_PATH="training.tfrecords"
+```

-Image 1 (u/u): ![example image 1](http://cs.cmu.edu/~yuntiand/2evaluation_data_icdar13_images_word_370.png/image_1.jpg)
+2. Upload the training dataset:

-Image 2 (n/n): ![example image 2](http://cs.cmu.edu/~yuntiand/2evaluation_data_icdar13_images_word_370.png/image_2.jpg)
+```
+gsutil cp datasets/training.tfrecords $GS_BUCKET/$DATASET_UPLOAD_PATH
+```

-Image 3 (g/g): ![example image 3](http://cs.cmu.edu/~yuntiand/2evaluation_data_icdar13_images_word_370.png/image_3.jpg)
+3. Launch the ML Engine job:

-Image 4 (l/l): ![example image 4](http://cs.cmu.edu/~yuntiand/2evaluation_data_icdar13_images_word_370.png/image_4.jpg)
+```
+export NOW=$(date +"%Y%m%d_%H%M%S")
+export JOB_NAME="$JOB_PREFIX$NOW"
+export JOB_DIR="$GS_BUCKET/$JOB_NAME"

-Image 5 (e/e): ![example image 5](http://cs.cmu.edu/~yuntiand/2evaluation_data_icdar13_images_word_370.png/image_5.jpg)
+gcloud ml-engine jobs submit training $JOB_NAME \
+  --job-dir $JOB_DIR \
+  --module-name=aocr \
+  --package-path=aocr \
+  --region=$REGION \
+  --scale-tier=BASIC_GPU \
+  --runtime-version 1.2 \
+  -- \
+  train $GS_BUCKET/$DATASET_UPLOAD_PATH \
+  --steps-per-checkpoint=3000
+```

+## Parameters

-# Parameters:
+### Global
+* `log-path`: Path for the log file.

-- Control
-* `phase`: Determine whether to train or test.
-* `visualize`: Valid if `phase` is set to test. Output the attention maps on the original image.
-* `load-model`: Load model from `model-dir` or not.
+### Testing
+* `visualize`: Output the attention maps on the original image.

-- Input and output
-* `data-base-dir`: The base directory of the image path in `data-path`. If the image path in `data-path` is absolute path, set it to `/`.
-* `data-path`: The path containing data file names and labels. Format per line: `image_path characters`.
-* `model-dir`: The directory for saving and loading model parameters (structure is not stored).
-* `log-path`: The path to put log.
-* `output-dir`: The path to put visualization results if `visualize` is set to True.
-* `steps-per-checkpoint`: Checkpointing (print perplexity, save model) per how many steps
+### Exporting
+* `format`: Format for the export (either `savedmodel` or `frozengraph`).

-- Optimization
+### Training
+* `steps-per-checkpoint`: Checkpointing (print perplexity, save model) per how many steps
 * `num-epoch`: The number of whole data passes.
-* `batch-size`: Batch size. Only valid if `phase` is set to train.
-* `initial-learning-rate`: Initial learning rate, note the we use AdaDelta, so the initial value doe not matter much.
-
-- Network
+* `batch-size`: Batch size.
+* `initial-learning-rate`: Initial learning rate, note the we use AdaDelta, so the initial value does not matter much.
 * `target-embedding-size`: Embedding dimension for each target.
 * `attn-use-lstm`: Whether or not use LSTM attention decoder cell.
 * `attn-num-hidden`: Number of hidden units in attention decoder cell.
 * `attn-num-layers`: Number of layers in attention decoder cell. (Encoder number of hidden units will be `attn-num-hidden`*`attn-num-layers`).
 * `target-vocab-size`: Target vocabulary size. Default is = 26+10+3 # 0: PADDING, 1: GO, 2: EOS, >2: 0-9, a-z
+* `no-resume`: Create new weights even if there are checkpoints present.
+* `max-gradient-norm`: Clip gradients to this norm.
+* `no-gradient-clipping`: Do not perform gradient clipping.
+* `gpu-id`: GPU to use.
+* `use-gru`: Use GRU cells.

-
-# References
+## References

 [Convert a formula to its LaTex source](https://github.com/harvardnlp/im2markup)

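The `aocr dataset` step in the updated README consumes a plain-text annotation file with one `image_path label` pair per line. A minimal sketch of how such a file could be generated (not part of this commit; the directory layout and the label-from-filename convention are assumptions):

```python
# Build an annotations file for `aocr dataset` from a folder of images whose
# filenames are their labels, e.g. datasets/images/hello.jpg -> label "hello".
import os

def write_annotations(image_dir, output_path):
    with open(output_path, 'w') as annotations:
        for filename in sorted(os.listdir(image_dir)):
            if not filename.lower().endswith(('.jpg', '.jpeg', '.png')):
                continue
            label = os.path.splitext(filename)[0]           # "hello.jpg" -> "hello"
            image_path = os.path.join(image_dir, filename)  # relative to the working dir
            annotations.write('{} {}\n'.format(image_path, label))

write_annotations('datasets/images', 'datasets/annotations-training.txt')
```

The resulting file can then be converted to TFRecords with the `aocr dataset` command shown in the diff above.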
Diff for: aocr/__main__.py

+16 -6

@@ -1,3 +1,11 @@
+# TODO: single op for prediction
+# TODO: test visualization
+# TODO: clean up
+# TODO: update the readme
+# TODO: better CLI descriptions/syntax
+# TODO: export
+# TODO: move all the training parameters inside the training parser
+
 import sys
 import argparse
 import logging
@@ -13,6 +21,8 @@
 
 def process_args(args, defaults):
     parser = argparse.ArgumentParser()
+    parser.prog = 'aocr'
+
     subparsers = parser.add_subparsers(help='Subcommands.')
 
     # Global arguments
@@ -60,11 +70,11 @@ def process_args(args, defaults):
                                    ', default=%s' % (defaults.VISUALIZE)))
 
     # Exporting
-    parser_export = subparsers.add_parser('export', help='Export the saved checkpoints for production.')
-    parser_test.set_defaults(phase='export')
-    parser_export.add_argument('export_path', metavar='path',
+    parser_export = subparsers.add_parser('export', help='Export the model with weights for production use.')
+    parser_export.set_defaults(phase='export')
+    parser_export.add_argument('export_path', nargs='?', metavar='dir',
                                type=str, default=defaults.EXPORT_PATH,
-                               help=('Path to export the model in the specified format,'
+                               help=('Directory to save the exported model to,'
                                      'default=%s'
                                      % (defaults.EXPORT_PATH)))
     parser_export.add_argument('--format', dest="format",
@@ -126,12 +136,12 @@ def process_args(args, defaults):
                         type=str, default=defaults.OUTPUT_DIR,
                         help=('Output directory, default=%s'
                               % (defaults.OUTPUT_DIR)))
-    parser.add_argument('--max_gradient_norm', dest="max_gradient_norm",
+    parser.add_argument('--max-gradient-norm', dest="max_gradient_norm",
                         type=int, default=defaults.MAX_GRADIENT_NORM,
                         help=('Clip gradients to this norm.'
                               ', default=%s'
                               % (defaults.MAX_GRADIENT_NORM)))
-    parser.add_argument('--no-gradient_clipping', dest='clip_gradients', action='store_false',
+    parser.add_argument('--no-gradient-clipping', dest='clip_gradients', action='store_false',
                         help=('Do not perform gradient clipping, default for clip_gradients is %s' %
                               (defaults.CLIP_GRADIENTS)))
     parser.set_defaults(clip_gradients=defaults.CLIP_GRADIENTS)

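The key fix in this hunk is moving `set_defaults(phase='export')` from `parser_test` to `parser_export`: argparse attaches defaults per subparser, so the old wiring set `phase='export'` on the `test` subcommand and left `export` without one. A stripped-down sketch of the pattern (hypothetical parser names, not the full aocr CLI):

```python
import argparse

# Each subparser carries its own defaults, so `phase` must be set on the
# parser that owns the subcommand being dispatched.
parser = argparse.ArgumentParser(prog='aocr')
subparsers = parser.add_subparsers(help='Subcommands.')

parser_test = subparsers.add_parser('test')
parser_test.set_defaults(phase='test')

parser_export = subparsers.add_parser('export')
parser_export.set_defaults(phase='export')  # previously set on parser_test by mistake
parser_export.add_argument('export_path', nargs='?', metavar='dir', default='exported-model')

args = parser.parse_args(['export', 'my-exported-model'])
print(args.phase, args.export_path)  # -> export my-exported-model
```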
Diff for: requirements.txt

+6

@@ -0,0 +1,6 @@
+setuptools==32.1.0
+six==1.10.0
+numpy==1.12.1
+tensorflow==1.2.1
+Pillow==4.2.1
+distance==0.1.3

Diff for: setup.py

+11 -9

@@ -2,26 +2,28 @@
 from setuptools import setup
 
 REQUIRED_PACKAGES = ['distance', 'tensorflow', 'numpy', 'six']
-
-
-def readme():
-    with open('README.md') as file:
-        return file.read()
+VERSION = '0.0.2'
+try:
+    import pypandoc
+    README = pypandoc.convert('README.md', 'rst')
+except(IOError, ImportError):
+    README = open('README.md').read()
 
 
 setup(
     name='aocr',
     url='https://github.com/emedvedev/attention-ocr',
+    download_url='https://github.com/emedvedev/attention-ocr/archive/{}.tar.gz'.format(VERSION),
     author='Ed Medvedev',
     author_email='[email protected]',
-    version='0.1',
+    version=VERSION,
     install_requires=REQUIRED_PACKAGES,
     packages=find_packages(),
     include_package_data=True,
     license='MIT',
-    description='''Optical character recognition model
-    for Tensorflow based on Visual Attention.''',
-    long_description=readme(),
+    description=('''Optical character recognition model '''
+                 '''for Tensorflow based on Visual Attention.'''),
+    long_description=README,
     entry_points={
         'console_scripts': ['aocr=aocr.__main__:main'],
     }

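For context on the `console_scripts` entry point kept above: installing the package generates an `aocr` executable that imports `aocr.__main__` and calls its `main()`. A rough equivalent of the generated wrapper (sketch only; the actual script produced by setuptools looks slightly different):

```python
# Approximate behaviour of the `aocr` console script created from the
# 'aocr=aocr.__main__:main' entry point in setup.py.
import sys

from aocr.__main__ import main

if __name__ == '__main__':
    sys.exit(main())
```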