
Commit a78bf5c

Revise Colab setup instructions for TPU: streamline steps for importing notebooks and connecting to local Jupyter Lab.
1 parent 6f47311 commit a78bf5c

File tree: 3 files changed (+57, -72 lines)

docs/guides.md

Lines changed: 1 addition & 0 deletions
@@ -23,4 +23,5 @@ guides/optimization.md
 guides/data_input_pipeline.md
 guides/checkpointing_solutions.md
 guides/monitoring_and_debugging.md
+guides/run_python_notebook.md
 ```

docs/tutorials/posttraining/how_to_run_colabs.md renamed to docs/guides/run_python_notebook.md

Lines changed: 55 additions & 70 deletions
@@ -1,13 +1,12 @@
-# Connect notebooks to TPUs
+# Run MaxText Python Notebooks on TPUs
 
-This guide provides comprehensive instructions for setting up Jupyter Lab on TPU and connecting it to Google Colab for running MaxText examples.
+This guide provides clear, step-by-step instructions for getting started with Python notebooks on the two most popular platforms: Google Colab and a local JupyterLab environment.
 
 ## 📑 Table of Contents
 
 - [Prerequisites](#prerequisites)
-- [Method 1: Google Colab with TPU (Recommended)](#method-1-google-colab-with-tpu-recommended)
+- [Method 1: Google Colab with TPU](#method-1-google-colab-with-tpu)
 - [Method 2: Local Jupyter Lab with TPU](#method-2-local-jupyter-lab-with-tpu)
-- [Method 3: Colab + Local Jupyter Lab Hybrid](#method-3-colab--local-jupyter-lab-hybrid)
 - [Available Examples](#available-examples)
 - [Common Pitfalls & Debugging](#common-pitfalls--debugging)
 - [Support & Resources](#support-and-resources)
@@ -17,77 +16,82 @@ This guide provides comprehensive instructions for setting up Jupyter Lab on TPU
 
 Before starting, make sure you have:
 
+- ✅ Basic familiarity with Jupyter, Python, and Git
+
+**For Method 2 (Local Jupyter Lab) only:**
 - ✅ A Google Cloud Platform (GCP) account with billing enabled
 - ✅ TPU quota available in your region (check under IAM & Admin → Quotas)
-- Basic familiarity with Jupyter, Python, and Git
-- ✅ gcloud CLI installed locally if you plan to use Method 2 or 3
+- `tpu.nodes.create` permission to create a TPU VM
+- ✅ gcloud CLI installed locally
 - ✅ Firewall rules open for port 8888 (Jupyter) if accessing directly
 
-## Method 1: Google Colab with TPU (Recommended)
+## Method 1: Google Colab with TPU
 
-This is the fastest way to run MaxText without managing infrastructure.
+This is the fastest way to run MaxText Python notebooks without managing infrastructure.
 
-### Step 1: Open Google Colab
+**⚠️ IMPORTANT NOTE ⚠️**
+The free tier of Google Colab provides access to `v5e-1 TPU`, but this access is not guaranteed and is subject to availability and usage limits.
 
-1. Go to [Google Colab](https://colab.research.google.com/)
-2. Sign in → New Notebook
+Before proceeding, please verify that the specific notebook you are running works reliably on the free-tier TPU resources. If you encounter frequent disconnections or resource limitations, you may need to:
 
-### Step 2: Enable TPU Runtime
+* Upgrade to a Colab Pro or Pro+ subscription for more stable and powerful TPU access.
 
-1. **Runtime** → **Change runtime type**
-2. Set **Hardware accelerator** → **TPU**
-3. Select TPU version:
-- **v5e-8** → recommended for most MaxText examples, but it's a paid option
-- **v5e-1** → free tier option (slower, but works for Qwen-0.6B demos)
-4. Click **Save**
+* Move to the local Jupyter Lab setup method with access to a powerful TPU machine.
 
-### Step 3: Upload & Prepare MaxText
+### Step 1: Choose an Example
+1.a. Visit the [MaxText examples directory](https://github.com/AI-Hypercomputer/maxtext/tree/main/src/MaxText/examples) on GitHub.
 
-Upload notebooks or mount your GitHub repo
+1.b. Find the notebook you want to run (e.g., `sft_qwen3_demo.ipynb`) and copy its URL.
 
-> **Note:** In Colab, the repo root will usually be `/content/maxtext`
+### Step 2: Import into Colab
+2.a. Go to [Google Colab](https://colab.research.google.com/) and sign in.
 
-**Example:**
-```bash
-!git clone https://github.com/AI-Hypercomputer/maxtext.git
-%cd maxtext
-```
+2.b. Select **File** -> **Open Notebook**.
+
+2.c. Select the **GitHub** tab.
 
-### Step 4: Run Examples
+2.d. Paste the target `.ipynb` link you copied in step 1.b and press Enter.
 
-1. Open `src/MaxText/examples/`
-2. Try:
-- `sft_qwen3_demo.ipynb`
-- `sft_llama3_demo.ipynb`
-- `rl_llama3_demo.ipynb` (GRPO/GSPO training)
+### Step 3: Enable TPU Runtime
 
+3.a. **Runtime** → **Change runtime type**
 
-> **Tip:** If Colab disconnects, re-enable TPU and re-run setup cells. Save checkpoints to GCS or Drive.
+3.b. Select your desired **TPU** under **Hardware accelerator**
 
-> **Tip:** If Colab asks to restart session - do it and continue to run cells
+3.c. Click **Save**
+
+### Step 4: Run the Notebook
+Follow the instructions within the notebook cells to install dependencies and run the training/inference.
 
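The new Step 4 defers setup to the notebook cells themselves. For orientation, here is a minimal sketch of what such setup cells often look like, modeled on the example the old version of this page carried; the `pip install -r requirements.txt` step is an assumption, so defer to whatever the notebook you imported actually runs.

```bash
# Illustrative Colab setup cells only; each example notebook ships its own setup cells.
!git clone https://github.com/AI-Hypercomputer/maxtext.git   # clone MaxText into /content/maxtext
%cd maxtext                                                   # work from the repo root
!pip install -r requirements.txt                              # assumed dependency step; follow the notebook's own install cell
```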
 ## Method 2: Local Jupyter Lab with TPU
 
-This method gives you more control and is better for long training runs.
+You can run Python notebooks on a local JupyterLab environment, giving you full control over your computing resources.
 
 ### Step 1: Set Up TPU VM
 
 In Google Cloud Console:
 
-1. **Compute Engine** → **TPU** → **Create TPU Node**
-2. Example config:
+1.a. **Compute Engine** → **TPU** → **Create TPU**
+
+1.b. Example config:
 - **Name:** `maxtext-tpu-node`
-- **TPU type:** `v5e-8` (or `v6p-8` for newer hardware)
-- **Runtime Version:** `tpu-ubuntu-alpha-*` (matches your VM image)
+- **TPU type:** Choose your desired TPU type
+- **Runtime Version:** `tpu-ubuntu2204-base` (or other compatible runtime)
 
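The console flow above can also be scripted. Here is a hedged CLI sketch of the same example config, assuming a `v5litepod-8` accelerator type and the runtime version named above; swap in your own zone, accelerator type, and version.

```bash
# Illustrative gcloud equivalent of the console "Create TPU" config above
# (the zone and accelerator type below are placeholders/assumptions).
gcloud compute tpus tpu-vm create maxtext-tpu-node \
  --zone=YOUR_ZONE \
  --accelerator-type=v5litepod-8 \
  --version=tpu-ubuntu2204-base
```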
-### Step 2: Connect to TPU VM
+### Step 2: Connect with Port Forwarding
+Run the following command on your local machine:
+> **Note**: The `--` separator before the `-L` flag is required. This tunnels the remote port 8888 to your local machine securely.
 
 ```bash
-gcloud compute tpus tpu-vm ssh maxtext-tpu-node --zone=YOUR_ZONE
+gcloud compute tpus tpu-vm ssh maxtext-tpu-node --zone=YOUR_ZONE -- -L 8888:localhost:8888
 ```
 
+> **Note**: If you get a "bind: Address already in use" error, it means port 8888 is busy on your local computer. Change the first number to a different port, e.g., -L 9999:localhost:8888. You will then access Jupyter at localhost:9999.
+
 ### Step 3: Install Dependencies
 
+Run the following commands on your TPU-VM:
+
 ```bash
 sudo apt update && sudo apt upgrade -y
 sudo apt install python3-pip python3-dev git -y
@@ -100,23 +104,15 @@ pip3 install jupyterlab
 jupyter lab --ip=0.0.0.0 --port=8888 --no-browser --allow-root
 ```
 
-Copy the URL with token from terminal
+### Step 5: Access the Notebook
+5.a. Look at the terminal output for a URL that looks like: `http://127.0.0.1:8888/lab?token=...`
 
-### Step 5: Secure Access
+5.b. Copy that URL.
 
-#### Option A: SSH Tunnel (Recommended)
+5.c. Paste it into your **local computer's browser**.
+* **Important:** If you changed the port in Step 2 (e.g., to `9999`), you must manually replace `8888` in the URL with `9999`.
+* *Example:* `http://127.0.0.1:9999/lab?token=...`
 
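Once the notebook is open in your browser, a quick sanity check that the kernel actually sees the TPU is the snippet below; it assumes JAX is already installed on the VM, as the MaxText dependencies require.

```bash
# Run on the TPU VM, or paste the quoted Python into a notebook cell.
# It should list TPU devices rather than only CPU.
python3 -c "import jax; print(jax.devices())"
```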
-```bash
-gcloud compute tpus tpu-vm ssh maxtext-tpu-node --zone=YOUR_ZONE -- -L 8888:localhost:8888
-```
-
-Then open → `http://localhost:8888`
-
-
-## Method 3: Colab + Local Jupyter Lab Hybrid
-
-Set up Jupyter Lab as in step 2.
-Use the link for Jupyter Lab as a link for "Connect to a local runtime" in Collab - at the dropdown where you select the runtime.
 
 ## Available Examples
 
@@ -127,18 +123,7 @@ Use the link for Jupyter Lab as a link for "Connect to a local runtime" in Colla
 
 ### Reinforcement Learning (GRPO/GSPO) Training
 
-- **`rl_llama3_demo.ipynb`** → GRPO/GSPO training on math dataset (Colab/notebook)
-
-#### GRPO/GSPO Colab Usage
-
-For interactive GRPO or GSPO training in Google Colab or Jupyter:
-
-1. **Open** `src/MaxText/examples/rl_llama3_demo.ipynb`
-2. **Enable TPU runtime** (Runtime → Change runtime type → TPU)
-3. **Set `LOSS_ALGO`** to `"grpo"` for GRPO or `"gspo-token"` for GSPO
-4. **Run cells** to train Llama3.1-8B with GRPO or GSPO on GSM8K dataset
-
-> **Note:** GRPO (Group Relative Policy Optimization) optimizes each token, while GSPO (Group Sequence Policy Optimization) optimizes the whole sequence. The difference is controlled by the `loss_algo` parameter.
+- **`rl_llama3_demo.ipynb`** → GRPO/GSPO training on [OpenAI's GSM8K dataset](https://huggingface.co/datasets/openai/gsm8k)
 
 #### GRPO/GSPO Python Script Usage - local runs
 
@@ -176,22 +161,22 @@ python3 -m src.MaxText.rl.train_rl src/MaxText/configs/rl.yml \
 
 #### GRPO/GSPO Python Script Usage - cluster runs
 
-For running on clusters, please refer to `maxtext/docs/tutorials/grpo_with_pathways.md`
+For running on clusters, please refer to [Reinforcement Learning on multi-host TPUs](https://maxtext.readthedocs.io/en/latest/tutorials/posttraining/rl_on_multi_host.html)
 
 
 ## Common Pitfalls & Debugging
 
 | Issue | Solution |
 |-------|----------|
-| ❌ TPU runtime mismatch | Check TPU runtime version matches VM image (`tpu-ubuntu-alpha-*`) |
+| ❌ TPU runtime mismatch | Check TPU runtime version matches VM image |
 | ❌ Colab disconnects | Save checkpoints to GCS or Drive regularly |
 | ❌ "RESOURCE_EXHAUSTED" errors | Use smaller batch size or v5e-8 instead of v5e-1 |
 | ❌ Firewall blocked | Ensure port 8888 open, or always use SSH tunneling |
 | ❌ Path confusion | In Colab use `/content/maxtext`; in TPU VM use `~/maxtext` |
 
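For the runtime-mismatch row above, one way to see what a TPU VM is actually running is to describe it from the CLI; a sketch reusing the node name from this guide:

```bash
# Shows the TPU VM's accelerator type, state, and runtime version as reported by the API.
gcloud compute tpus tpu-vm describe maxtext-tpu-node --zone=YOUR_ZONE
```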
 ## Support and Resources
 
-- 📘 [MaxText Documentation](https://github.com/AI-Hypercomputer/maxtext)
+- 📘 [MaxText Documentation](https://maxtext.readthedocs.io/)
 - 💻 [Google Colab](https://colab.research.google.com)
 - ✅ [Cloud TPU Docs](https://cloud.google.com/tpu/docs)
 - 🧩 [Jupyter Lab](https://jupyterlab.readthedocs.io)

docs/tutorials/post_training_index.md

Lines changed: 1 addition & 2 deletions
@@ -49,14 +49,13 @@ Pathways supercharges RL with:
 
 ## Getting started
 
-Start your Post-Training journey through quick experimentation with our [Google Colabs](https://maxtext.readthedocs.io/en/latest/tutorials/posttraining/how_to_run_colabs.html) or our Production level tutorials for [SFT](https://maxtext.readthedocs.io/en/latest/tutorials/posttraining/sft_on_multi_host.html) and [RL](https://maxtext.readthedocs.io/en/latest/tutorials/posttraining/rl_on_multi_host.html).
+Start your Post-Training journey through quick experimentation with [Python Notebooks](https://maxtext.readthedocs.io/en/latest/guides/run_python_notebook.html) or our Production level tutorials for [SFT](https://maxtext.readthedocs.io/en/latest/tutorials/posttraining/sft_on_multi_host.html) and [RL](https://maxtext.readthedocs.io/en/latest/tutorials/posttraining/rl_on_multi_host.html).
 
 ## More tutorials
 
 ```{toctree}
 :maxdepth: 1
 
-posttraining/how_to_run_colabs.md
 posttraining/sft.md
 posttraining/sft_on_multi_host.md
 posttraining/rl.md
