first draft of the code
vladsandulescu committed Nov 17, 2020
1 parent a0f5666 commit 01ae1ad
Showing 494 changed files with 68,189 additions and 1 deletion.
2 changes: 2 additions & 0 deletions .gitignore
@@ -0,0 +1,2 @@
.DS_Store
.idea/
9 changes: 9 additions & 0 deletions Im2txt/.gitignore
@@ -0,0 +1,9 @@
.DS_Store
.idea

checkpoint
*.data-00000-of-00001
*.index
*.meta

__pycache__/
84 changes: 84 additions & 0 deletions Im2txt/README.md
@@ -0,0 +1,84 @@
# Note:
This repo aims to provide a **Ready-to-Go** TensorFlow environment for **Image Captioning Inference** using a pre-trained model. For training from scratch or fine-tuning, please refer to the [TensorFlow Model Repo](https://github.com/tensorflow/models/tree/master/research/im2txt).


# Contents
* [Model Overview](#model-overview)
  * [Introduction](#introduction)
  * [Architecture](#architecture)
* [Requirements](#requirements)
  * [Install Required Packages](#install-required-packages)
  * [Get Pre-trained Model](#get-pre-trained-model)
* [Generating Captions](#generating-captions)
* [Issues](#encountering-issues)

## Model Overview

### Introduction
The *Show and Tell* model is a deep neural network that learns how to describe
the content of images. For example:

![Example captions](g3doc/example_captions.jpg)

*Show and Tell: A Neural Image Caption Generator*

A TensorFlow implementation of the image-to-text model described in the paper:

"Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning
Challenge."

Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan.

*IEEE Transactions on Pattern Analysis and Machine Intelligence (2016).*

Full text available at: http://arxiv.org/abs/1609.06647

### Architecture
Please refer to the original [Tensorflow Model Repo](https://github.com/tensorflow/models/tree/master/research/im2txt).

## Requirements

### Install Required Packages
I strongly suggest running `pip install -r requirement.txt` in your CLI
to install all the required packages.

Alternatively, you can install the required packages manually:

* **TensorFlow** 1.0 or greater ([instructions](https://www.tensorflow.org/install/))
* **NumPy** ([instructions](http://www.scipy.org/install.html))
* **Natural Language Toolkit (NLTK)**:
  * First install NLTK ([instructions](http://www.nltk.org/install.html))
  * Then install the NLTK data package "punkt" ([instructions](http://www.nltk.org/data.html)); see the snippet below
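
If NLTK is installed but the "punkt" data is missing, here is a minimal sketch of fetching it from Python (assuming a working Python environment):

```
# Download the "punkt" tokenizer data that the caption tokenizer relies on.
import nltk

nltk.download("punkt")
```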

### Get Pre-trained Model
Download the [fine-tuned Inception v3 parameters (over 1M training steps)](https://drive.google.com/open?id=1r4-9FEIbOUyBSvA-fFVFgvhFpgee6sF5). You will get the four files below; make sure to put them all into `im2txt/model/Hugh/train/` (a quick sanity check follows the list):
* **newmodel.ckpt-2000000.data-00000-of-00001**
* **newmodel.ckpt-2000000.index**
* **newmodel.ckpt-2000000.meta**
* **checkpoint**
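
As a quick sanity check (a minimal sketch; the directory and file names simply mirror the list above), you can verify that everything landed in the right place:

```
# Check that the four downloaded checkpoint files are where run_inference.py expects them.
import os

train_dir = "im2txt/model/Hugh/train"
expected = [
    "newmodel.ckpt-2000000.data-00000-of-00001",
    "newmodel.ckpt-2000000.index",
    "newmodel.ckpt-2000000.meta",
    "checkpoint",
]
missing = [f for f in expected if not os.path.isfile(os.path.join(train_dir, f))]
print("Missing files:", missing or "none")
```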

## Generating Captions
Your downloaded *Show and Tell* model can generate captions for any JPEG image! The
following command generates captions for an example image:
```
python im2txt/run_inference.py \
  --checkpoint_path="im2txt/model/Hugh/train/newmodel.ckpt-2000000" \
  --vocab_file="im2txt/data/Hugh/word_counts.txt" \
  --input_files="im2txt/data/images/test.jpg"
```

Example output:
```
Captions for image test.jpg:
0) a young boy wearing a hat and tie . (p=0.000195)
1) a young boy wearing a blue shirt and tie . (p=0.000100)
2) a young boy wearing a blue shirt and a tie . (p=0.000045)
```

Note: you may get different results. Some variation between different models is
expected.

Here is the image:

![ME](im2txt/data/images/test.jpg)

## Encountering Issues
First, check out this [thread](https://github.com/tensorflow/models/issues/466); it is likely that you will find an answer there. Otherwise, open an issue and I will try to help you.
1 change: 1 addition & 0 deletions Im2txt/WORKSPACE
@@ -0,0 +1 @@
workspace(name = "Im2txt")
10 changes: 10 additions & 0 deletions Im2txt/conda/init_im2txt_ubuntu.sh
@@ -0,0 +1,10 @@
conda create -n im2txt python=3.6 pip --yes
source ~/anaconda3/etc/profile.d/conda.sh
conda activate im2txt

cd /path/to/Im2txt

# Install python libraries
pip install -r requirement.txt

echo "Done"
Binary file added Im2txt/g3doc/COCO_val2014_000000224477.jpg
Binary file added Im2txt/g3doc/example_captions.jpg
Binary file added Im2txt/g3doc/show_and_tell_architecture.png
96 changes: 96 additions & 0 deletions Im2txt/im2txt/BUILD
@@ -0,0 +1,96 @@
package(default_visibility = [":internal"])

licenses(["notice"]) # Apache 2.0

exports_files(["LICENSE"])

package_group(
    name = "internal",
    packages = [
        "//im2txt/...",
    ],
)

py_binary(
    name = "build_mscoco_data",
    srcs = [
        "data/build_mscoco_data.py",
    ],
)

sh_binary(
    name = "download_and_preprocess_mscoco",
    srcs = ["data/download_and_preprocess_mscoco.sh"],
    data = [
        ":build_mscoco_data",
    ],
)

py_library(
    name = "configuration",
    srcs = ["configuration.py"],
    srcs_version = "PY2AND3",
)

py_library(
    name = "show_and_tell_model",
    srcs = ["show_and_tell_model.py"],
    srcs_version = "PY2AND3",
    deps = [
        "//im2txt/ops:image_embedding",
        "//im2txt/ops:image_processing",
        "//im2txt/ops:inputs",
    ],
)

py_test(
    name = "show_and_tell_model_test",
    size = "large",
    srcs = ["show_and_tell_model_test.py"],
    deps = [
        ":configuration",
        ":show_and_tell_model",
    ],
)

py_library(
    name = "inference_wrapper",
    srcs = ["inference_wrapper.py"],
    srcs_version = "PY2AND3",
    deps = [
        ":show_and_tell_model",
        "//im2txt/inference_utils:inference_wrapper_base",
    ],
)

py_binary(
    name = "train",
    srcs = ["train.py"],
    srcs_version = "PY2AND3",
    deps = [
        ":configuration",
        ":show_and_tell_model",
    ],
)

py_binary(
    name = "evaluate",
    srcs = ["evaluate.py"],
    srcs_version = "PY2AND3",
    deps = [
        ":configuration",
        ":show_and_tell_model",
    ],
)

py_binary(
    name = "run_inference",
    srcs = ["run_inference.py"],
    srcs_version = "PY2AND3",
    deps = [
        ":configuration",
        ":inference_wrapper",
        "//im2txt/inference_utils:caption_generator",
        "//im2txt/inference_utils:vocabulary",
    ],
)
Empty file added Im2txt/im2txt/__init__.py
104 changes: 104 additions & 0 deletions Im2txt/im2txt/configuration.py
@@ -0,0 +1,104 @@
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

"""Image-to-text model and training configurations."""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function


class ModelConfig(object):
  """Wrapper class for model hyperparameters."""

  def __init__(self):
    """Sets the default model hyperparameters."""
    # File pattern of sharded TFRecord file containing SequenceExample protos.
    # Must be provided in training and evaluation modes.
    self.input_file_pattern = None

    # Image format ("jpeg" or "png").
    self.image_format = "jpeg"

    # Approximate number of values per input shard. Used to ensure sufficient
    # mixing between shards in training.
    self.values_per_input_shard = 2300
    # Minimum number of shards to keep in the input queue.
    self.input_queue_capacity_factor = 2
    # Number of threads for prefetching SequenceExample protos.
    self.num_input_reader_threads = 1

    # Name of the SequenceExample context feature containing image data.
    self.image_feature_name = "image/data"
    # Name of the SequenceExample feature list containing integer captions.
    self.caption_feature_name = "image/caption_ids"

    # Number of unique words in the vocab (plus 1, for <UNK>).
    # The default value is larger than the expected actual vocab size to allow
    # for differences between tokenizer versions used in preprocessing. There is
    # no harm in using a value greater than the actual vocab size, but using a
    # value less than the actual vocab size will result in an error.
    self.vocab_size = 12000

    # Number of threads for image preprocessing. Should be a multiple of 2.
    self.num_preprocess_threads = 4

    # Batch size.
    self.batch_size = 32

    # File containing an Inception v3 checkpoint to initialize the variables
    # of the Inception model. Must be provided when starting training for the
    # first time.
    self.inception_checkpoint_file = None

    # Dimensions of Inception v3 input images.
    self.image_height = 299
    self.image_width = 299

    # Scale used to initialize model variables.
    self.initializer_scale = 0.08

    # LSTM input and output dimensionality, respectively.
    self.embedding_size = 512
    self.num_lstm_units = 512

    # If < 1.0, the dropout keep probability applied to LSTM variables.
    self.lstm_dropout_keep_prob = 0.7


class TrainingConfig(object):
  """Wrapper class for training hyperparameters."""

  def __init__(self):
    """Sets the default training hyperparameters."""
    # Number of examples per epoch of training data.
    self.num_examples_per_epoch = 586363

    # Optimizer for training the model.
    self.optimizer = "SGD"

    # Learning rate for the initial phase of training.
    self.initial_learning_rate = 2.0
    self.learning_rate_decay_factor = 0.5
    self.num_epochs_per_decay = 8.0

    # Learning rate when fine tuning the Inception v3 parameters.
    self.train_inception_learning_rate = 0.0005

    # If not None, clip gradients to this value.
    self.clip_gradients = 5.0

    # How many model checkpoints to keep.
    self.max_checkpoints_to_keep = 5
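
For context, a minimal sketch of how these config classes are typically consumed by the training and inference scripts (the file pattern override below is an illustrative placeholder, not part of this file):

```
# Illustrative usage of the config wrappers defined above.
from im2txt import configuration

model_config = configuration.ModelConfig()
# Placeholder pattern; in training this normally comes from a command-line flag.
model_config.input_file_pattern = "im2txt/data/mscoco/train-?????-of-00256"

training_config = configuration.TrainingConfig()
print(training_config.optimizer, training_config.initial_learning_rate)  # SGD 2.0
```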