Commit 01ae1ad (1 parent: a0f5666)
Showing 494 changed files with 68,189 additions and 1 deletion.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
.DS_Store | ||
.idea/ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
.DS_Store | ||
.idea | ||
|
||
checkpoint | ||
*.data-00000-of-00001 | ||
*.index | ||
*.meta | ||
|
||
__pycache__ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,84 @@ | ||
# Note: | ||
This repo aims to provide a **Ready-to-Go** TensorFlow setup for **Image Captioning Inference** using a pre-trained model. For training from scratch or fine-tuning, please refer to the [TensorFlow Models repo](https://github.com/tensorflow/models/tree/master/research/im2txt). | ||
|
||
|
||
# Contents | ||
* [Model Overview](#model-overview) | ||
* [Introduction](#introduction) | ||
* [Architecture](#architecture) | ||
* [Requirement](#requirement) | ||
* [Install](#install-required-packages) | ||
* [Get Pre-trained Model](#get-pre-trained-model) | ||
* [Generating Captions](#generating-captions) | ||
* [Issue](#encoutering-issue) | ||
|
||
## Model Overview | ||
|
||
### Introduction | ||
The *Show and Tell* model is a deep neural network that learns how to describe | ||
the content of images. For example: | ||
|
||
 | ||
|
||
*Show and Tell: A Neural Image Caption Generator* | ||
|
||
A TensorFlow implementation of the image-to-text model described in the paper: | ||
|
||
"Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning | ||
Challenge." | ||
|
||
Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan. | ||
|
||
*IEEE transactions on pattern analysis and machine intelligence (2016).* | ||
|
||
Full text available at: http://arxiv.org/abs/1609.06647 | ||
|
||
### Architecture | ||
Please refer to the original [Tensorflow Model Repo](https://github.com/tensorflow/models/tree/master/research/im2txt). | ||
|
||
## Requirement | ||
|
||
### Install Required Packages | ||
I strongly suggest that you run `pip install -r requirement.txt` in your CLI | ||
to get all packages needed. | ||
|
||
Alternatively, install the required packages below manually: | ||
|
||
* **TensorFlow** 1.0 or greater ([instructions](https://www.tensorflow.org/install/)) | ||
* **NumPy** ([instructions](http://www.scipy.org/install.html)) | ||
* **Natural Language Toolkit (NLTK)**: | ||
  * First install NLTK ([instructions](http://www.nltk.org/install.html)) | ||
  * Then install the NLTK data package "punkt" ([instructions](http://www.nltk.org/data.html)) | ||
|
||
### Get Pre-trained Model | ||
Download the [inceptionv3 finetuned parameters over 1M](https://drive.google.com/open?id=1r4-9FEIbOUyBSvA-fFVFgvhFpgee6sF5). You will get the 4 files listed below; put all of them in `im2txt/model/Hugh/train/`: | ||
* **newmodel.ckpt-2000000.data-00000-of-00001** | ||
* **newmodel.ckpt-2000000.index** | ||
* **newmodel.ckpt-2000000.meta** | ||
* **checkpoint** | ||
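
Before running inference, it can help to confirm that all four files landed in the right place. The following is a minimal sketch, not part of this repo: the helper name `missing_checkpoint_files` is hypothetical, while the file names and the directory are the ones listed above.

```python
import os

# File names copied from the list above.
EXPECTED_FILES = [
    "newmodel.ckpt-2000000.data-00000-of-00001",
    "newmodel.ckpt-2000000.index",
    "newmodel.ckpt-2000000.meta",
    "checkpoint",
]


def missing_checkpoint_files(model_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in model_dir."""
    return [
        name
        for name in expected
        if not os.path.isfile(os.path.join(model_dir, name))
    ]


if __name__ == "__main__":
    # Directory taken from the instructions above.
    missing = missing_checkpoint_files("im2txt/model/Hugh/train")
    print("missing:", missing if missing else "none")
```

An empty result means every expected file was found; anything else names the files that still need to be copied.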
|
||
## Generating Captions | ||
Your downloaded *Show and Tell* model can generate captions for any JPEG image! The | ||
following command line will generate captions for such an image. | ||
``` | ||
python im2txt/run_inference.py --checkpoint_path="im2txt/model/Hugh/train/newmodel.ckpt-2000000" --vocab_file="im2txt/data/Hugh/word_counts.txt" --input_files="im2txt/data/images/test.jpg" | ||
``` | ||
|
||
Example output: | ||
``` | ||
Captions for image test.jpg: | ||
0) a young boy wearing a hat and tie . (p=0.000195) | ||
1) a young boy wearing a blue shirt and tie . (p=0.000100) | ||
2) a young boy wearing a blue shirt and a tie . (p=0.000045) | ||
``` | ||
|
||
Note: you may get different results. Some variation between different models is | ||
expected. | ||
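
If you want to post-process the printed captions, the output format shown above is easy to parse. This is a rough sketch that assumes the output looks exactly like the example; the regular expression and the `parse_captions` helper are illustrative and not part of `run_inference.py`.

```python
import re

# Matches lines such as "  0) a young boy wearing a hat and tie . (p=0.000195)".
CAPTION_LINE = re.compile(r"^\s*(\d+)\)\s+(.*?)\s+\(p=([0-9.eE+-]+)\)\s*$")


def parse_captions(output_text):
    """Return a list of (rank, caption, probability) tuples found in output_text."""
    results = []
    for line in output_text.splitlines():
        match = CAPTION_LINE.match(line)
        if match:
            rank, caption, prob = match.groups()
            results.append((int(rank), caption, float(prob)))
    return results


# Sample text copied from the example output above.
sample = """Captions for image test.jpg:
  0) a young boy wearing a hat and tie . (p=0.000195)
  1) a young boy wearing a blue shirt and tie . (p=0.000100)
  2) a young boy wearing a blue shirt and a tie . (p=0.000045)
"""

# Pick the caption with the highest probability.
best = max(parse_captions(sample), key=lambda item: item[2])
print(best[1])
```

The header line ("Captions for image ...") does not match the pattern and is skipped, so only the ranked captions are returned.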
|
||
Here is the image: | ||
|
||
 | ||
|
||
## Encoutering Issue | ||
First, check this [thread](https://github.com/tensorflow/models/issues/466); you will likely find an answer there. Otherwise, open an issue and I will try to help you. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
workspace(name = "Im2txt") |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
conda create -n im2txt python=3.6 pip --yes | ||
source ~/anaconda3/etc/profile.d/conda.sh | ||
conda activate im2txt | ||
|
||
cd /path/to/Im2txt | ||
|
||
# Install python libraries | ||
pip install -r requirement.txt | ||
|
||
echo "Done" |
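
Once the script above has finished, a quick way to confirm the environment is usable is to check that the packages named in the README can be found. This is a minimal sketch under the assumption that `requirement.txt` installs TensorFlow, NumPy and NLTK; the module list and the `missing_modules` helper are hypothetical and should be adjusted to the actual requirements file.

```python
import importlib.util

# Import names of the packages the README lists (an assumption about requirement.txt).
REQUIRED_MODULES = ["tensorflow", "numpy", "nltk"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("missing:", missing_modules() or "none")
```

Run it inside the activated `im2txt` environment; the check only locates the modules and does not import them, so it stays fast even for TensorFlow.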
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,96 @@ | ||
package(default_visibility = [":internal"]) | ||
|
||
licenses(["notice"]) # Apache 2.0 | ||
|
||
exports_files(["LICENSE"]) | ||
|
||
package_group( | ||
name = "internal", | ||
packages = [ | ||
"//im2txt/...", | ||
], | ||
) | ||
|
||
py_binary( | ||
name = "build_mscoco_data", | ||
srcs = [ | ||
"data/build_mscoco_data.py", | ||
], | ||
) | ||
|
||
sh_binary( | ||
name = "download_and_preprocess_mscoco", | ||
srcs = ["data/download_and_preprocess_mscoco.sh"], | ||
data = [ | ||
":build_mscoco_data", | ||
], | ||
) | ||
|
||
py_library( | ||
name = "configuration", | ||
srcs = ["configuration.py"], | ||
srcs_version = "PY2AND3", | ||
) | ||
|
||
py_library( | ||
name = "show_and_tell_model", | ||
srcs = ["show_and_tell_model.py"], | ||
srcs_version = "PY2AND3", | ||
deps = [ | ||
"//im2txt/ops:image_embedding", | ||
"//im2txt/ops:image_processing", | ||
"//im2txt/ops:inputs", | ||
], | ||
) | ||
|
||
py_test( | ||
name = "show_and_tell_model_test", | ||
size = "large", | ||
srcs = ["show_and_tell_model_test.py"], | ||
deps = [ | ||
":configuration", | ||
":show_and_tell_model", | ||
], | ||
) | ||
|
||
py_library( | ||
name = "inference_wrapper", | ||
srcs = ["inference_wrapper.py"], | ||
srcs_version = "PY2AND3", | ||
deps = [ | ||
":show_and_tell_model", | ||
"//im2txt/inference_utils:inference_wrapper_base", | ||
], | ||
) | ||
|
||
py_binary( | ||
name = "train", | ||
srcs = ["train.py"], | ||
srcs_version = "PY2AND3", | ||
deps = [ | ||
":configuration", | ||
":show_and_tell_model", | ||
], | ||
) | ||
|
||
py_binary( | ||
name = "evaluate", | ||
srcs = ["evaluate.py"], | ||
srcs_version = "PY2AND3", | ||
deps = [ | ||
":configuration", | ||
":show_and_tell_model", | ||
], | ||
) | ||
|
||
py_binary( | ||
name = "run_inference", | ||
srcs = ["run_inference.py"], | ||
srcs_version = "PY2AND3", | ||
deps = [ | ||
":configuration", | ||
":inference_wrapper", | ||
"//im2txt/inference_utils:caption_generator", | ||
"//im2txt/inference_utils:vocabulary", | ||
], | ||
) |
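
The `deps` attributes in the rules above form a small dependency graph. The sketch below copies the direct dependencies into a Python dict (labels shortened by dropping the `//im2txt/` and `:` prefixes) and walks them to list everything `run_inference` pulls in. Targets defined in other packages, such as `ops:inputs`, are treated as leaves because their BUILD files are not shown here; the `transitive_deps` helper is illustrative only.

```python
# Direct deps copied from the py_library / py_binary rules above.
DEPS = {
    "configuration": [],
    "show_and_tell_model": [
        "ops:image_embedding",
        "ops:image_processing",
        "ops:inputs",
    ],
    "inference_wrapper": [
        "show_and_tell_model",
        "inference_utils:inference_wrapper_base",
    ],
    "train": ["configuration", "show_and_tell_model"],
    "evaluate": ["configuration", "show_and_tell_model"],
    "run_inference": [
        "configuration",
        "inference_wrapper",
        "inference_utils:caption_generator",
        "inference_utils:vocabulary",
    ],
}


def transitive_deps(target, deps=DEPS):
    """Return every label reachable from target, in sorted order."""
    seen = set()
    stack = list(deps.get(target, []))
    while stack:
        label = stack.pop()
        if label not in seen:
            seen.add(label)
            # Labels without an entry in deps are treated as leaves.
            stack.extend(deps.get(label, []))
    return sorted(seen)


print(transitive_deps("run_inference"))
```

Within the rules shown, inference needs `configuration`, `inference_wrapper` and, through it, `show_and_tell_model`, but not the `train` or `evaluate` binaries.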
Empty file.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,104 @@ | ||
# Copyright 2016 The TensorFlow Authors. All Rights Reserved. | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
# ============================================================================== | ||
|
||
"""Image-to-text model and training configurations.""" | ||
|
||
from __future__ import absolute_import | ||
from __future__ import division | ||
from __future__ import print_function | ||
|
||
|
||
class ModelConfig(object): | ||
  """Wrapper class for model hyperparameters.""" | ||
|
||
  def __init__(self): | ||
    """Sets the default model hyperparameters.""" | ||
    # File pattern of sharded TFRecord file containing SequenceExample protos. | ||
    # Must be provided in training and evaluation modes. | ||
    self.input_file_pattern = None | ||
|
||
    # Image format ("jpeg" or "png"). | ||
    self.image_format = "jpeg" | ||
|
||
    # Approximate number of values per input shard. Used to ensure sufficient | ||
    # mixing between shards in training. | ||
    self.values_per_input_shard = 2300 | ||
    # Minimum number of shards to keep in the input queue. | ||
    self.input_queue_capacity_factor = 2 | ||
    # Number of threads for prefetching SequenceExample protos. | ||
    self.num_input_reader_threads = 1 | ||
|
||
    # Name of the SequenceExample context feature containing image data. | ||
    self.image_feature_name = "image/data" | ||
    # Name of the SequenceExample feature list containing integer captions. | ||
    self.caption_feature_name = "image/caption_ids" | ||
|
||
    # Number of unique words in the vocab (plus 1, for <UNK>). | ||
    # The default value is larger than the expected actual vocab size to allow | ||
    # for differences between tokenizer versions used in preprocessing. There is | ||
    # no harm in using a value greater than the actual vocab size, but using a | ||
    # value less than the actual vocab size will result in an error. | ||
    self.vocab_size = 12000 | ||
|
||
    # Number of threads for image preprocessing. Should be a multiple of 2. | ||
    self.num_preprocess_threads = 4 | ||
|
||
    # Batch size. | ||
    self.batch_size = 32 | ||
|
||
    # File containing an Inception v3 checkpoint to initialize the variables | ||
    # of the Inception model. Must be provided when starting training for the | ||
    # first time. | ||
    self.inception_checkpoint_file = None | ||
|
||
    # Dimensions of Inception v3 input images. | ||
    self.image_height = 299 | ||
    self.image_width = 299 | ||
|
||
    # Scale used to initialize model variables. | ||
    self.initializer_scale = 0.08 | ||
|
||
    # LSTM input and output dimensionality, respectively. | ||
    self.embedding_size = 512 | ||
    self.num_lstm_units = 512 | ||
|
||
    # If < 1.0, the dropout keep probability applied to LSTM variables. | ||
    self.lstm_dropout_keep_prob = 0.7 | ||
|
||
|
||
class TrainingConfig(object): | ||
  """Wrapper class for training hyperparameters.""" | ||
|
||
  def __init__(self): | ||
    """Sets the default training hyperparameters.""" | ||
    # Number of examples per epoch of training data. | ||
    self.num_examples_per_epoch = 586363 | ||
|
||
    # Optimizer for training the model. | ||
    self.optimizer = "SGD" | ||
|
||
    # Learning rate for the initial phase of training. | ||
    self.initial_learning_rate = 2.0 | ||
    self.learning_rate_decay_factor = 0.5 | ||
    self.num_epochs_per_decay = 8.0 | ||
|
||
    # Learning rate when fine tuning the Inception v3 parameters. | ||
    self.train_inception_learning_rate = 0.0005 | ||
|
||
    # If not None, clip gradients to this value. | ||
    self.clip_gradients = 5.0 | ||
|
||
    # How many model checkpoints to keep. | ||
    self.max_checkpoints_to_keep = 5 |
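
To get a feel for what the training defaults above imply, the sketch below works out the learning-rate schedule in plain Python. It assumes a staircase exponential decay applied every `num_epochs_per_decay` epochs; the training script that consumes these values is not shown in this part of the diff, so treat the schedule shape as an assumption. The function names `decay_steps` and `learning_rate` are hypothetical; the numbers are the defaults from `ModelConfig` and `TrainingConfig`.

```python
def decay_steps(num_examples_per_epoch=586363, batch_size=32,
                num_epochs_per_decay=8.0):
    """Number of training steps between two learning-rate decays."""
    num_batches_per_epoch = num_examples_per_epoch / batch_size
    return int(num_batches_per_epoch * num_epochs_per_decay)


def learning_rate(step, initial_learning_rate=2.0,
                  learning_rate_decay_factor=0.5, steps_per_decay=None):
    """Staircase decay: multiply by the decay factor once per completed interval."""
    if steps_per_decay is None:
        steps_per_decay = decay_steps()
    return initial_learning_rate * learning_rate_decay_factor ** (step // steps_per_decay)


if __name__ == "__main__":
    interval = decay_steps()
    for step in (0, interval, 2 * interval):
        print(step, learning_rate(step))
```

With the defaults, one epoch is roughly 18,324 batches of 32 examples, so under this assumption the learning rate would halve about every 146,590 steps.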