docs/guides/run_python_notebook.md (55 additions, 70 deletions)
@@ -1,13 +1,12 @@
1
-
# Connect notebooks to TPUs
1
+
# Run MaxText Python Notebooks on TPUs
2
2
3
-
This guide provides comprehensive instructions for setting up Jupyter Lab on TPU and connecting it to Google Colab for running MaxText examples.
3
+
This guide provides step-by-step instructions for getting started with MaxText Python notebooks on two popular platforms: Google Colab and a local JupyterLab environment.
4
4
5
5
## 📑 Table of Contents
6
6
7
7
- [Prerequisites](#prerequisites)
8
-
- [Method 1: Google Colab with TPU (Recommended)](#method-1-google-colab-with-tpu-recommended)
8
+
- [Method 1: Google Colab with TPU](#method-1-google-colab-with-tpu)
9
9
- [Method 2: Local Jupyter Lab with TPU](#method-2-local-jupyter-lab-with-tpu)
10
-
- [Method 3: Colab + Local Jupyter Lab Hybrid](#method-3-colab--local-jupyter-lab-hybrid)
@@ -17,77 +16,82 @@ This guide provides comprehensive instructions for setting up Jupyter Lab on TPU
17
16
18
17
Before starting, make sure you have:
19
18
19
+
- ✅ Basic familiarity with Jupyter, Python, and Git
20
+
21
+
**For Method 2 (Local Jupyter Lab) only:**
20
22
- ✅ A Google Cloud Platform (GCP) account with billing enabled
21
23
- ✅ TPU quota available in your region (check under IAM & Admin → Quotas)
22
-
- ✅ Basic familiarity with Jupyter, Python, and Git
23
-
- ✅ gcloud CLI installed locally if you plan to use Method 2 or 3
24
+
- ✅ `tpu.nodes.create` permission to create a TPU VM
25
+
- ✅ gcloud CLI installed locally
24
26
- ✅ Firewall rules open for port 8888 (Jupyter) if accessing directly
25
27
26
-
## Method 1: Google Colab with TPU (Recommended)
28
+
## Method 1: Google Colab with TPU
27
29
28
-
This is the fastest way to run MaxText without managing infrastructure.
30
+
This is the fastest way to run MaxText Python notebooks without managing infrastructure.
29
31
30
-
### Step 1: Open Google Colab
32
+
**⚠️ IMPORTANT NOTE ⚠️**
33
+
The free tier of Google Colab provides access to a `v5e-1` TPU, but this access is not guaranteed and is subject to availability and usage limits.
31
34
32
-
1. Go to [Google Colab](https://colab.research.google.com/)
33
-
2. Sign in → New Notebook
35
+
Before proceeding, please verify that the specific notebook you are running works reliably on the free-tier TPU resources. If you encounter frequent disconnections or resource limitations, you may need to:
34
36
35
-
### Step 2: Enable TPU Runtime
37
+
* Upgrade to a Colab Pro or Pro+ subscription for more stable and powerful TPU access.
36
38
37
-
1. **Runtime** → **Change runtime type**
38
-
2. Set **Hardware accelerator** → **TPU**
39
-
3. Select TPU version:
40
-
   - **v5e-8** → recommended for most MaxText examples, but it's a paid option
41
-
   - **v5e-1** → free tier option (slower, but works for Qwen-0.6B demos)
42
-
4. Click **Save**
39
+
* Switch to the local Jupyter Lab setup (Method 2) on a more powerful TPU machine.
43
40
44
-
### Step 3: Upload & Prepare MaxText
41
+
### Step 1: Choose an Example
42
+
1.a. Visit the [MaxText examples directory](https://github.com/AI-Hypercomputer/maxtext/tree/main/src/MaxText/examples) on GitHub.
45
43
46
-
Upload notebooks or mount your GitHub repo
44
+
1.b. Find the notebook you want to run (e.g., `sft_qwen3_demo.ipynb`) and copy its URL.
47
45
48
-
> **Note:** In Colab, the repo root will usually be `/content/maxtext`
46
+
### Step 2: Import into Colab
47
+
2.a. Go to [Google Colab](https://colab.research.google.com/) and sign in.
> **Note**: If you get a `bind: Address already in use` error, port 8888 is busy on your local computer. Change the first number in the forwarding option to a different port, e.g., `-L 9999:localhost:8888`. You will then access Jupyter at `localhost:9999`.
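
Before opening the tunnel, you can check which local port is free. The following is a minimal sketch using only the Python standard library. The helper names `port_is_free` and `first_free_port` and the candidate port list are illustrative assumptions, not part of MaxText or of any SSH tooling.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing is currently bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "Address already in use".
            return False
        return True


def first_free_port(candidates=(8888, 9999, 8889)):
    """Return the first free port among the candidates, or None if all are busy."""
    for port in candidates:
        if port_is_free(port):
            return port
    return None


if __name__ == "__main__":
    local_port = first_free_port()
    if local_port is None:
        print("No candidate port is free; try another range.")
    else:
        # 8888 is the Jupyter port on the remote side, as in the note above.
        print(f"Use: -L {local_port}:localhost:8888")
```

The check is only a snapshot: another process could take the port between the check and the moment the tunnel starts, in which case the same error reappears and you can rerun the check.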
2. **Enable TPU runtime** (Runtime → Change runtime type → TPU)
138
-
3. **Set `LOSS_ALGO`** to `"grpo"` for GRPO or `"gspo-token"` for GSPO
139
-
4. **Run cells** to train Llama3.1-8B with GRPO or GSPO on GSM8K dataset
140
-
141
-
> **Note:** GRPO (Group Relative Policy Optimization) optimizes each token, while GSPO (Group Sequence Policy Optimization) optimizes the whole sequence. The difference is controlled by the `loss_algo` parameter.
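
To make the token-versus-sequence distinction concrete, here is a small sketch of the importance ratios as they are usually described for the two methods: one ratio per token for GRPO, and a single length-normalized ratio per sequence for GSPO. This is an illustration under those assumptions, not MaxText's implementation. The function names and the example log-probabilities are hypothetical.

```python
import math


def token_ratios(new_logprobs, old_logprobs):
    """Token-level view: one importance ratio per generated token."""
    return [math.exp(n - o) for n, o in zip(new_logprobs, old_logprobs)]


def sequence_ratio(new_logprobs, old_logprobs):
    """Sequence-level view: a single length-normalized ratio for the whole
    response, i.e. the geometric mean of the per-token ratios."""
    diffs = [n - o for n, o in zip(new_logprobs, old_logprobs)]
    return math.exp(sum(diffs) / len(diffs))


if __name__ == "__main__":
    # Hypothetical per-token log-probabilities under the new and old policies.
    new_lp = [-1.0, -2.0]
    old_lp = [-1.0, -2.5]
    print(token_ratios(new_lp, old_lp))    # one value per token
    print(sequence_ratio(new_lp, old_lp))  # one value per sequence
```

In the token-level view, each token can be clipped and weighted on its own. In the sequence-level view, every token of a response shares the same ratio.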
126
+
- **`rl_llama3_demo.ipynb`** → GRPO/GSPO training on [OpenAI's GSM8K dataset](https://huggingface.co/datasets/openai/gsm8k)
For running on clusters, please refer to `maxtext/docs/tutorials/grpo_with_pathways.md`
164
+
For running on clusters, please refer to [Reinforcement Learning on multi-host TPUs](https://maxtext.readthedocs.io/en/latest/tutorials/posttraining/rl_on_multi_host.html)
180
165
181
166
182
167
## Common Pitfalls & Debugging
183
168
184
169
| Issue | Solution |
185
170
|-------|----------|
186
-
| ❌ TPU runtime mismatch | Check TPU runtime version matches VM image (`tpu-ubuntu-alpha-*`) |
171
+
| ❌ TPU runtime mismatch | Check TPU runtime version matches VM image |
187
172
| ❌ Colab disconnects | Save checkpoints to GCS or Drive regularly |
188
173
| ❌ "RESOURCE_EXHAUSTED" errors | Use smaller batch size or v5e-8 instead of v5e-1 |
189
174
| ❌ Firewall blocked | Ensure port 8888 open, or always use SSH tunneling |
190
175
| ❌ Path confusion | In Colab use `/content/maxtext`; in TPU VM use `~/maxtext` |
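
For the path-confusion row above, a notebook cell can pick the repository root automatically. This is a minimal sketch: the helper name `maxtext_root` is hypothetical, and checking for the `google.colab` module in `sys.modules` is a common heuristic for detecting Colab rather than an official API guarantee.

```python
import os
import sys


def maxtext_root(in_colab=None):
    """Return the expected MaxText repo root for the current environment.

    If in_colab is None, guess by checking whether the google.colab module
    has been loaded (a heuristic, assumed here for illustration).
    """
    if in_colab is None:
        in_colab = "google.colab" in sys.modules
    if in_colab:
        return "/content/maxtext"
    # On a TPU VM the table above suggests the repo lives in the home directory.
    return os.path.expanduser("~/maxtext")


if __name__ == "__main__":
    root = maxtext_root()
    print(f"Using MaxText root: {root}")
    if not os.path.isdir(root):
        print("Directory not found; clone the repo or adjust the path.")
```

Passing `in_colab=True` or `in_colab=False` explicitly overrides the guess, which helps if the heuristic misfires in your environment.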
Start your Post-Training journey through quick experimentation with our [Google Colabs](https://maxtext.readthedocs.io/en/latest/tutorials/posttraining/how_to_run_colabs.html) or our Production level tutorials for [SFT](https://maxtext.readthedocs.io/en/latest/tutorials/posttraining/sft_on_multi_host.html) and [RL](https://maxtext.readthedocs.io/en/latest/tutorials/posttraining/rl_on_multi_host.html).
52
+
Start your Post-Training journey through quick experimentation with [Python Notebooks](https://maxtext.readthedocs.io/en/latest/guides/run_python_notebook.html) or our production-level tutorials for [SFT](https://maxtext.readthedocs.io/en/latest/tutorials/posttraining/sft_on_multi_host.html) and [RL](https://maxtext.readthedocs.io/en/latest/tutorials/posttraining/rl_on_multi_host.html).