# Gemma

Gemma is a family of lightweight, state-of-the-art open models built from the research and technology used to create the Gemini models.

Follow the instructions on Kaggle to download the Gemma model weights. You will need to consent to the Gemma license with your Kaggle account and use your Kaggle API credentials for the download.
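As a minimal sketch, the Kaggle CLI and API look for a `kaggle.json` credentials file (downloadable from your Kaggle account settings) in `~/.kaggle`. The username and key below are placeholders:

```shell
# Place Kaggle API credentials where the Kaggle CLI/API expects them.
mkdir -p ~/.kaggle
# Placeholder credentials -- substitute the kaggle.json from your account settings.
printf '{"username":"YOUR_USER","key":"YOUR_KEY"}\n' > ~/.kaggle/kaggle.json
# The Kaggle client refuses world-readable credential files.
chmod 600 ~/.kaggle/kaggle.json
```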

After downloading the weights, run convert_gemma_chkpt.py, which converts the checkpoint to a MaxText-compatible format and uploads it to a GCS bucket. You can then run decoding and fine-tuning following the instructions in the test scripts under end_to_end/tpu/gemma.
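As a rough sketch of the conversion step (the script path and flag names below are assumptions, not the confirmed interface; check the script's `--help` and the end_to_end/tpu/gemma test scripts for the real invocation), the shell below only assembles and prints the command for review rather than running it:

```shell
# Hypothetical paths -- substitute your local download and your own GCS bucket.
CKPT_PATH="$HOME/gemma-2b"            # local Kaggle checkpoint download
GCS_BUCKET="gs://my-bucket/gemma"     # hypothetical destination bucket

# Print the candidate command; flag names are illustrative assumptions.
echo python3 MaxText/convert_gemma_chkpt.py \
  --base_model_path "${CKPT_PATH}" \
  --maxtext_model_path "${GCS_BUCKET}" \
  --model_size 2b
```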

MaxText supports pretraining and fine-tuning of Gemma with high performance.

Model FLOP utilization (MFU) for training on v5e and v5p TPUs:

| Model    | v5e-256 (bf16) | v5p-128 (bf16) | v5e-256 (int8) | v5p-128 (int8) |
|----------|----------------|----------------|----------------|----------------|
| Gemma-2b | 58%            | 55%            | 64%            | 68%            |
| Gemma-7b | 58%            | 60%            | 70%            | 70%            |
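To make the table concrete, MFU is simply the model FLOPs per second a training run sustains divided by the hardware's peak FLOPs per second. A small sketch (the achieved-throughput figure is hypothetical; 197 TFLOP/s is the published bf16 peak per TPU v5e chip):

```python
def mfu(achieved_tflops_per_chip: float, peak_tflops_per_chip: float) -> float:
    """Model FLOP utilization: achieved model FLOP/s over hardware peak FLOP/s."""
    return achieved_tflops_per_chip / peak_tflops_per_chip

# Hypothetical example: sustaining ~114 model TFLOP/s per chip against the
# v5e bf16 peak of 197 TFLOP/s gives roughly 58% MFU.
print(round(mfu(114.3, 197.0), 2))  # → 0.58
```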