# tf.Transform

**tf.Transform** is a library for doing data preprocessing with TensorFlow. It
allows users to combine various data processing frameworks (currently Apache
Beam is supported, but tf.Transform can be extended to support other
frameworks) with TensorFlow to transform data. Because tf.Transform is built on
TensorFlow, users can export a graph that re-creates the transformations they
applied to their data as a TensorFlow graph. This is important because the user
can then incorporate the exported TensorFlow graph into their serving model,
thus avoiding skew between the served model and the training data.

## tf.Transform Concepts

The most important concept in tf.Transform is the "preprocessing function": a
logical description of a transformation of a dataset. The dataset is
conceptualized as a dictionary of columns, and the preprocessing function is
defined by means of two kinds of functions:

1) A "transform": a function, defined using TensorFlow, that accepts and
returns tensors. Such a function is applied to some input columns and generates
transformed columns. Users define their own transforms by first defining a
function that operates on tensors, and then applying it to columns using the
`tf_transform.transform` function.

2) An "analyzer": a function that accepts columns and returns a "statistic". A
statistic is like a column except that it has only a single value. An example
of an analyzer is `tf_transform.min`, which computes the minimum of a column.
Currently tf.Transform provides a fixed set of analyzers.

By combining analyzers and transforms, users can create arbitrary pipelines for
transforming their data. In particular, users should define a "preprocessing
function" which accepts and returns columns.
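
The analyzer/transform split can be illustrated with a minimal, library-free
sketch. This is plain Python, not the tf.Transform API, and the function names
are hypothetical: an analyzer makes a full pass over a column to produce a
single statistic, and a transform then uses that statistic elementwise.

```python
# Minimal sketch of the analyzer/transform split (plain Python;
# not the actual tf.Transform API).

def min_analyzer(column):
    # Analyzer: full pass over the column, returning a single statistic.
    return min(column)

def scale_to_zero_min(column):
    # Preprocessing function: combines an analyzer with a transform.
    col_min = min_analyzer(column)        # statistic (one value)
    return [x - col_min for x in column]  # transform (elementwise)

dataset = {"x": [3.0, 5.0, 9.0]}
transformed = {"x_shifted": scale_to_zero_min(dataset["x"])}
print(transformed)  # {'x_shifted': [0.0, 2.0, 6.0]}
```

The key point is that the analyzer's output must be computed over the whole
column before the elementwise transform can run on any row.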

Columns are not themselves wrappers around data; rather, they are placeholders
used to construct a definition of the user's logical pipeline. To apply such a
pipeline to data, we rely on the implementation. The Apache Beam implementation
provides `PTransform`s that apply a user's preprocessing function to data. The
typical workflow of a tf.Transform user is to construct a preprocessing
function, and then incorporate it into a larger Beam pipeline, ultimately
materializing the data for training.
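
The idea of columns as placeholders can be sketched as a tiny deferred-execution
scheme. The classes and functions below are hypothetical illustrations, not the
tf.Transform implementation: a `Column` records how it is derived, and no data
flows until the pipeline is applied.

```python
# Sketch of "columns as placeholders": first build a deferred description
# of the pipeline, then materialize it against concrete data.
# Hypothetical classes; not the tf.Transform implementation.

class Column:
    def __init__(self, fn=None, parent=None):
        self.fn = fn          # function to apply, or None for a raw input
        self.parent = parent  # upstream column this one is derived from

    def apply(self, data):
        # Materialize this column against concrete data.
        if self.parent is None:
            return data
        return [self.fn(v) for v in self.parent.apply(data)]

def transform(fn, column):
    # Analogue of applying a value-in/value-out function to a column.
    return Column(fn=fn, parent=column)

raw = Column()                             # placeholder: no data attached
doubled = transform(lambda v: v * 2, raw)  # logical pipeline, still no data
print(doubled.apply([1, 2, 3]))            # [2, 4, 6]
```

In the real library this separation is what lets the same logical pipeline be
executed by different implementations, such as the Apache Beam `PTransform`s.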

## Background

While TensorFlow allows users to do arbitrary manipulations on a single
instance or batch of instances, some kinds of preprocessing require a full pass
over the dataset. Examples include normalizing an input value, computing a
vocabulary for a string input (and then mapping the string to an int with this
vocabulary), and bucketizing an input. While some of these operations can be
done with TensorFlow in a streaming manner (e.g. calculating a running mean for
normalization), in general it may be preferable or necessary to calculate them
with a full pass over the data.
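
The vocabulary example above shows why a full pass is needed: no string can be
mapped to an integer until every distinct value has been seen. A plain-Python
sketch (hypothetical helper names, not the tf.Transform API):

```python
# Sketch of a full-pass preprocessing step: compute a vocabulary over a
# string column, then map each string to its integer index.

def compute_vocabulary(column):
    # Full pass: every distinct value must be seen before any row is mapped.
    return {term: idx for idx, term in enumerate(sorted(set(column)))}

def apply_vocabulary(column, vocab):
    # Per-row mapping, possible only after the full-pass vocabulary exists.
    return [vocab[term] for term in column]

col = ["cat", "dog", "cat", "bird"]
vocab = compute_vocabulary(col)      # {'bird': 0, 'cat': 1, 'dog': 2}
print(apply_vocabulary(col, vocab))  # [1, 2, 1, 0]
```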

## Installation and Dependencies

The easiest way to install tf.Transform is with the PyPI package:

`pip install tensorflow_transform`

Currently tf.Transform requires that TensorFlow be installed, but does not have
an explicit dependency on the TensorFlow package. See the [TensorFlow
documentation](https://www.tensorflow.org/get_started/os_setup) for more
information on installing TensorFlow.

This package depends on the Google Cloud Dataflow distribution of Apache Beam,
the package used to run distributed pipelines. Apache Beam is able to run
pipelines in multiple ways, depending on the "runner" used. While Apache Beam
is an open source package, currently the only distribution on PyPI is the
Cloud Dataflow distribution, which can run Beam pipelines locally or on Google
Cloud Dataflow.

When a base package for Apache Beam (containing no runners) is available, the
tf.Transform package will depend only on this base package, and users will be
able to install their own runners. tf.Transform will attempt to be as
independent of the specific runner as possible.

## Getting Started

For instructions on using tf.Transform, see the [getting started
guide](./getting_started.md).