first draft of the code

dinhanhx · Nov 17, 2020 · 01ae1ad
1 parent a0f5666
Showing 494 changed files with 68,189 additions and 1 deletion.
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,2 @@
+.DS_Store
+.idea/
diff --git a/Im2txt/.gitignore b/Im2txt/.gitignore
@@ -0,0 +1,9 @@
+.DS_Store
+.idea
+
+checkpoint
+*.data-00000-of-00001
+*.index
+*.meta
+
+__pycache__
diff --git a/Im2txt/README.md b/Im2txt/README.md
@@ -0,0 +1,84 @@
+# Note:
+This repo aims to provide a **Ready-to-Go** setup with a TensorFlow environment for **Image Captioning Inference** using a pre-trained model. For training from scratch or fine-tuning, please refer to the [TensorFlow Model Repo](https://github.com/tensorflow/models/tree/master/research/im2txt).
+
+
+# Contents
+* [Model Overview](#model-overview)
+    * [Introduction](#introduction)
+    * [Architecture](#architecture)
+* [Requirement](#requirement)
+    * [Install](#install-required-packages)
+    * [Get Pre-trained Model](#get-pre-trained-model)
+* [Generating Captions](#generating-captions)
+* [Issue](#encountering-issues)
+
+## Model Overview
+
+### Introduction
+The *Show and Tell* model is a deep neural network that learns how to describe
+the content of images. For example:
+
+![Example captions](g3doc/example_captions.jpg)
+
+*Show and Tell: A Neural Image Caption Generator*
+
+A TensorFlow implementation of the image-to-text model described in the paper:
+
+"Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning
+Challenge."
+
+Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan.
+
+*IEEE Transactions on Pattern Analysis and Machine Intelligence (2016).*
+
+Full text available at: http://arxiv.org/abs/1609.06647
+
+### Architecture
+Please refer to the original [TensorFlow Model Repo](https://github.com/tensorflow/models/tree/master/research/im2txt).
+
+## Requirement
+
+### Install Required Packages
+I strongly suggest running `pip install -r requirement.txt` from the command line
+to install all required packages.
+
+Alternatively, install the required packages manually:
+
+* **TensorFlow** 1.0 or greater ([instructions](https://www.tensorflow.org/install/))
+* **NumPy** ([instructions](http://www.scipy.org/install.html))
+* **Natural Language Toolkit (NLTK)**:
+    * First install NLTK ([instructions](http://www.nltk.org/install.html))
+    * Then install the NLTK data package "punkt" ([instructions](http://www.nltk.org/data.html))
+
+### Get Pre-trained Model
+Download the [Inception v3 fine-tuned parameters over 1M](https://drive.google.com/open?id=1r4-9FEIbOUyBSvA-fFVFgvhFpgee6sF5). You will get the 4 files listed below; put them all into `im2txt/model/Hugh/train/`:
+* **newmodel.ckpt-2000000.data-00000-of-00001**
+* **newmodel.ckpt-2000000.index**
+* **newmodel.ckpt-2000000.meta**
+* **checkpoint**
+
+## Generating Captions
+Your downloaded *Show and Tell* model can generate captions for any JPEG image! The
+following command generates captions for such an image.
+```
+python im2txt/run_inference.py --checkpoint_path="im2txt/model/Hugh/train/newmodel.ckpt-2000000" \
+  --vocab_file="im2txt/data/Hugh/word_counts.txt" --input_files="im2txt/data/images/test.jpg"
+```
+
+Example output:
+```
+Captions for image test.jpg:
+  0) a young boy wearing a hat and tie . (p=0.000195)
+  1) a young boy wearing a blue shirt and tie . (p=0.000100)
+  2) a young boy wearing a blue shirt and a tie . (p=0.000045)
+```
+
+Note: you may get different results. Some variation between different models is
+expected.
+
+Here is the image:
+
+![ME](im2txt/data/images/test.jpg)
+
+## Encountering Issues
+First, check this [thread](https://github.com/tensorflow/models/issues/466); you will likely find an answer there. Otherwise, open an issue and I will try to help.
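The "Get Pre-trained Model" step in the README above expects four downloaded files under `im2txt/model/Hugh/train/`. As a minimal sketch (not part of this commit), the check below reports which of those files are absent before inference is attempted. The helper name `missing_checkpoint_files` and the constant `EXPECTED_FILES` are hypothetical; the file names and the directory are taken from the README.

```python
from pathlib import Path

# File names as listed in the README's "Get Pre-trained Model" section.
EXPECTED_FILES = (
    "newmodel.ckpt-2000000.data-00000-of-00001",
    "newmodel.ckpt-2000000.index",
    "newmodel.ckpt-2000000.meta",
    "checkpoint",
)


def missing_checkpoint_files(model_dir):
    """Return the expected file names that are not present in model_dir.

    Hypothetical helper for illustration; it only checks that each file
    exists and does not validate the checkpoint contents.
    """
    model_dir = Path(model_dir)
    return [name for name in EXPECTED_FILES
            if not (model_dir / name).is_file()]


if __name__ == "__main__":
    # Directory named in the README; adjust if the files live elsewhere.
    missing = missing_checkpoint_files("im2txt/model/Hugh/train")
    if missing:
        print("Missing checkpoint files:")
        for name in missing:
            print("  " + name)
    else:
        print("All checkpoint files found.")
```

Run from the `Im2txt` directory, this prints the names of any files that still need to be downloaded or moved.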
diff --git a/Im2txt/WORKSPACE b/Im2txt/WORKSPACE
@@ -0,0 +1 @@
+workspace(name = "Im2txt")
diff --git a/Im2txt/conda/init_im2txt_ubuntu.sh b/Im2txt/conda/init_im2txt_ubuntu.sh
@@ -0,0 +1,10 @@
+conda create -n im2txt python=3.6 pip --yes
+source ~/anaconda3/etc/profile.d/conda.sh
+conda activate im2txt
+
+cd /path/to/Im2txt
+
+# Install python libraries
+pip install -r requirement.txt
+
+echo "Done"
diff --git a/Im2txt/g3doc/COCO_val2014_000000224477.jpg b/Im2txt/g3doc/COCO_val2014_000000224477.jpg
diff --git a/Im2txt/g3doc/example_captions.jpg b/Im2txt/g3doc/example_captions.jpg
diff --git a/Im2txt/g3doc/show_and_tell_architecture.png b/Im2txt/g3doc/show_and_tell_architecture.png
diff --git a/Im2txt/im2txt/BUILD b/Im2txt/im2txt/BUILD
@@ -0,0 +1,96 @@
+package(default_visibility = [":internal"])
+
+licenses(["notice"])  # Apache 2.0
+
+exports_files(["LICENSE"])
+
+package_group(
+    name = "internal",
+    packages = [
+        "//im2txt/...",
+    ],
+)
+
+py_binary(
+    name = "build_mscoco_data",
+    srcs = [
+        "data/build_mscoco_data.py",
+    ],
+)
+
+sh_binary(
+    name = "download_and_preprocess_mscoco",
+    srcs = ["data/download_and_preprocess_mscoco.sh"],
+    data = [
+        ":build_mscoco_data",
+    ],
+)
+
+py_library(
+    name = "configuration",
+    srcs = ["configuration.py"],
+    srcs_version = "PY2AND3",
+)
+
+py_library(
+    name = "show_and_tell_model",
+    srcs = ["show_and_tell_model.py"],
+    srcs_version = "PY2AND3",
+    deps = [
+        "//im2txt/ops:image_embedding",
+        "//im2txt/ops:image_processing",
+        "//im2txt/ops:inputs",
+    ],
+)
+
+py_test(
+    name = "show_and_tell_model_test",
+    size = "large",
+    srcs = ["show_and_tell_model_test.py"],
+    deps = [
+        ":configuration",
+        ":show_and_tell_model",
+    ],
+)
+
+py_library(
+    name = "inference_wrapper",
+    srcs = ["inference_wrapper.py"],
+    srcs_version = "PY2AND3",
+    deps = [
+        ":show_and_tell_model",
+        "//im2txt/inference_utils:inference_wrapper_base",
+    ],
+)
+
+py_binary(
+    name = "train",
+    srcs = ["train.py"],
+    srcs_version = "PY2AND3",
+    deps = [
+        ":configuration",
+        ":show_and_tell_model",
+    ],
+)
+
+py_binary(
+    name = "evaluate",
+    srcs = ["evaluate.py"],
+    srcs_version = "PY2AND3",
+    deps = [
+        ":configuration",
+        ":show_and_tell_model",
+    ],
+)
+
+py_binary(
+    name = "run_inference",
+    srcs = ["run_inference.py"],
+    srcs_version = "PY2AND3",
+    deps = [
+        ":configuration",
+        ":inference_wrapper",
+        "//im2txt/inference_utils:caption_generator",
+        "//im2txt/inference_utils:vocabulary",
+    ],
+)
diff --git a/Im2txt/im2txt/__init__.py b/Im2txt/im2txt/__init__.py
diff --git a/Im2txt/im2txt/__pycache__/configuration.cpython-36.pyc b/Im2txt/im2txt/__pycache__/configuration.cpython-36.pyc
diff --git a/Im2txt/im2txt/__pycache__/inference_wrapper.cpython-36.pyc b/Im2txt/im2txt/__pycache__/inference_wrapper.cpython-36.pyc
diff --git a/Im2txt/im2txt/__pycache__/show_and_tell_model.cpython-36.pyc b/Im2txt/im2txt/__pycache__/show_and_tell_model.cpython-36.pyc
diff --git a/Im2txt/im2txt/configuration.py b/Im2txt/im2txt/configuration.py
@@ -0,0 +1,104 @@
+# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+
+"""Image-to-text model and training configurations."""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+
+class ModelConfig(object):
+  """Wrapper class for model hyperparameters."""
+
+  def __init__(self):
+    """Sets the default model hyperparameters."""
+    # File pattern of sharded TFRecord file containing SequenceExample protos.
+    # Must be provided in training and evaluation modes.
+    self.input_file_pattern = None
+
+    # Image format ("jpeg" or "png").
+    self.image_format = "jpeg"
+
+    # Approximate number of values per input shard. Used to ensure sufficient
+    # mixing between shards in training.
+    self.values_per_input_shard = 2300
+    # Minimum number of shards to keep in the input queue.
+    self.input_queue_capacity_factor = 2
+    # Number of threads for prefetching SequenceExample protos.
+    self.num_input_reader_threads = 1
+
+    # Name of the SequenceExample context feature containing image data.
+    self.image_feature_name = "image/data"
+    # Name of the SequenceExample feature list containing integer captions.
+    self.caption_feature_name = "image/caption_ids"
+
+    # Number of unique words in the vocab (plus 1, for <UNK>).
+    # The default value is larger than the expected actual vocab size to allow
+    # for differences between tokenizer versions used in preprocessing. There is
+    # no harm in using a value greater than the actual vocab size, but using a
+    # value less than the actual vocab size will result in an error.
+    self.vocab_size = 12000
+
+    # Number of threads for image preprocessing. Should be a multiple of 2.
+    self.num_preprocess_threads = 4
+
+    # Batch size.
+    self.batch_size = 32
+
+    # File containing an Inception v3 checkpoint to initialize the variables
+    # of the Inception model. Must be provided when starting training for the
+    # first time.
+    self.inception_checkpoint_file = None
+
+    # Dimensions of Inception v3 input images.
+    self.image_height = 299
+    self.image_width = 299
+
+    # Scale used to initialize model variables.
+    self.initializer_scale = 0.08
+
+    # LSTM input and output dimensionality, respectively.
+    self.embedding_size = 512
+    self.num_lstm_units = 512
+
+    # If < 1.0, the dropout keep probability applied to LSTM variables.
+    self.lstm_dropout_keep_prob = 0.7
+
+
+class TrainingConfig(object):
+  """Wrapper class for training hyperparameters."""
+
+  def __init__(self):
+    """Sets the default training hyperparameters."""
+    # Number of examples per epoch of training data.
+    self.num_examples_per_epoch = 586363
+
+    # Optimizer for training the model.
+    self.optimizer = "SGD"
+
+    # Learning rate for the initial phase of training.
+    self.initial_learning_rate = 2.0
+    self.learning_rate_decay_factor = 0.5
+    self.num_epochs_per_decay = 8.0
+
+    # Learning rate when fine tuning the Inception v3 parameters.
+    self.train_inception_learning_rate = 0.0005
+
+    # If not None, clip gradients to this value.
+    self.clip_gradients = 5.0
+
+    # How many model checkpoints to keep.
+    self.max_checkpoints_to_keep = 5
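The learning-rate fields in `TrainingConfig` interact with `batch_size` from `ModelConfig`. The sketch below shows one plausible reading: a staircase exponential decay in which the rate is multiplied by `learning_rate_decay_factor` every `num_epochs_per_decay` epochs. The training script is not shown in this part of the diff, so the schedule itself is an assumption, and the function names `decay_steps` and `staircase_learning_rate` are hypothetical. Only the numeric defaults come from `configuration.py`.

```python
def decay_steps(num_examples_per_epoch, batch_size, num_epochs_per_decay):
    """Number of training steps between two learning-rate decays."""
    batches_per_epoch = num_examples_per_epoch / batch_size
    return int(batches_per_epoch * num_epochs_per_decay)


def staircase_learning_rate(step, initial_lr, decay_factor, steps_per_decay):
    """Learning rate at `step` under an assumed staircase exponential decay."""
    return initial_lr * decay_factor ** (step // steps_per_decay)


if __name__ == "__main__":
    # Defaults from ModelConfig and TrainingConfig in configuration.py.
    steps = decay_steps(num_examples_per_epoch=586363,
                        batch_size=32,
                        num_epochs_per_decay=8.0)
    for step in (0, steps, 2 * steps):
        lr = staircase_learning_rate(step, initial_lr=2.0,
                                     decay_factor=0.5,
                                     steps_per_decay=steps)
        print(step, lr)
```

Under this assumption and the default values, the rate halves every 146,590 steps.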