outerbounds · madhur-ob · Jan 12, 2025
diff --git a/README.md b/README.md
@@ -25,8 +25,8 @@ The examples in this repository are based on the [original TensorFlow Examples](
 
 | Directory | TensorFlow script description |
 | :--- | ---: |
-| [MirroredStrategy](examples/single-node/README.md) | Synchronous distributed training on multiple GPUs on one machine. |  
-| [MultiWorkerMirroredStrategy](examples/multi-node/README.md) | Synchronous distributed training across multiple workers, each with potentially multiple GPUs. |  
+| [MirroredStrategy](examples/single_node/README.md) | Synchronous distributed training on multiple GPUs on one machine. |
+| [MultiWorkerMirroredStrategy](examples/multi-node/README.md) | Synchronous distributed training across multiple workers, each with potentially multiple GPUs. |
 
 #### Parameter Server
 Not yet tested. Please reach out to the Outerbounds team if you need help.

diff --git a/examples/single-node/README.md b/examples/single-node/README.md
diff --git a/examples/single-node/mnist_mirrored_strategy.py b/examples/single-node/mnist_mirrored_strategy.py
diff --git a/examples/single_node/README.md b/examples/single_node/README.md
@@ -0,0 +1,14 @@
+# Introduction
+
+The following four files show how to use TensorFlow's `MirroredStrategy` with `@kubernetes` for synchronous distributed training across multiple GPUs on a single machine. Note that this example does not use the `@tensorflow` decorator.
+
+1. `gpu_profile.py` contains the `@gpu_profile` decorator, which is available [here](https://github.com/outerbounds/metaflow-gpu-profile). It is used in `flow.py`.
+
+2. `train_mnist.py` contains the core training code, which uses `MirroredStrategy` to train a model on the MNIST dataset.
+
+3. `flow.py` contains a flow that calls the training code from `train_mnist.py` and uses the Docker image `tensorflow/tensorflow:2.15.0-gpu` for GPU setup.
+
+- Run the flow with `python flow.py --environment=pypi run`.
+- If you are on the [Outerbounds](https://outerbounds.com/) platform, you can use `fast-bakery` for faster Docker image builds by running `python flow.py --environment=fast-bakery run`.
+
+4. `reload.ipynb` shows how to load the trained model for inference later on. Make sure `tensorflow==2.15.1` is installed locally so that the notebook runs correctly.
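
The README above describes `MirroredStrategy` as synchronous distributed training on multiple GPUs of one machine. As a rough illustration of that idea only, here is a conceptual sketch in plain Python. It is not the contents of `train_mnist.py` and does not call any TensorFlow API. The helper names (`mirrored_step`, `all_reduce_sum`, `single_device_step`) and the toy linear model are assumptions made for this example. Each simulated replica holds an identical copy of the weight, computes gradients on its own shard of the batch, and applies the same summed gradient, so the replicas stay in sync.

```python
# Conceptual sketch (hypothetical helpers, no TensorFlow): synchronous
# data-parallel training of a one-parameter model y = w * x.

def grad(w, shard):
    """Summed gradient of 0.5 * (w * x - y) ** 2 over the examples in a shard."""
    return sum((w * x - y) * x for x, y in shard)


def all_reduce_sum(values):
    """Every replica receives the sum of all per-replica values."""
    total = sum(values)
    return [total for _ in values]


def mirrored_step(weights, batch, lr):
    """One synchronous step: shard the batch, reduce gradients, update all copies."""
    n = len(weights)
    shards = [batch[i::n] for i in range(n)]           # one shard per replica
    local = [grad(w, s) for w, s in zip(weights, shards)]
    reduced = all_reduce_sum(local)                    # identical on every replica
    return [w - lr * g / len(batch) for w, g in zip(weights, reduced)]


def single_device_step(w, batch, lr):
    """Reference: the same update computed on the whole batch by one device."""
    return w - lr * grad(w, batch) / len(batch)


# Toy data generated from y = 3 * x, so training should drive w toward 3.
batch = [(float(x), 3.0 * x) for x in range(1, 9)]
replicas = [0.0, 0.0]   # two simulated GPUs, mirroring gpu=2 in flow.py
reference = 0.0

for _ in range(50):
    replicas = mirrored_step(replicas, batch, lr=0.01)
    reference = single_device_step(reference, batch, lr=0.01)

print(replicas, reference)
```

In this sketch the sharded update matches the single-device update up to floating-point rounding, which is the property that lets a global batch be split across GPUs without changing the training result.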
diff --git a/examples/single-node/flow.py → examples/single_node/flow.py b/examples/single-node/flow.py → examples/single_node/flow.py
@@ -1,6 +1,5 @@
-from metaflow import FlowSpec, step, batch, conda, environment
-
-N_GPU = 2
+from metaflow import FlowSpec, step, kubernetes, environment, pypi
+from gpu_profile import gpu_profile
 
 
 class SingleNodeTensorFlow(FlowSpec):
@@ -9,19 +8,27 @@ class SingleNodeTensorFlow(FlowSpec):
 
     @step
     def start(self):
-        self.next(self.foo)
+        self.next(self.train)
 
+    @gpu_profile(interval=1)
     @environment(vars={"TF_CPP_MIN_LOG_LEVEL": "2"})
-    @batch(gpu=N_GPU, image="tensorflow/tensorflow:latest-gpu")
+    @kubernetes(gpu=2, image="registry.hub.docker.com/tensorflow/tensorflow:2.15.0-gpu")
+    @pypi(
+        packages={
+            "tensorflow-datasets": "4.9.7",
+            "matplotlib": "3.10.0",
+        }
+    )
     @step
-    def foo(self):
-        from mnist_mirrored_strategy import main
+    def train(self):
+        from train_mnist import main
 
         main(
-            run=self,
             local_model_dir=self.local_model_dir,
             local_tar_name=self.local_tar_name,
+            run=self,
         )
+
         self.next(self.end)
 
     @step