Resolve lint issues in T5X.
Contains no functional changes.

PiperOrigin-RevId: 595429601
texasmichelle authored and t5-copybara committed Jan 3, 2024
1 parent dbc4b6f commit 18f74b7
Showing 71 changed files with 5,158 additions and 3,470 deletions.
2 changes: 2 additions & 0 deletions docs/conf.py
@@ -18,6 +18,7 @@
list see the documentation:
https://www.sphinx-doc.org/en/master/usage/configuration.html
"""

# pylint:disable=all
# -- Path setup --------------------------------------------------------------

@@ -27,6 +28,7 @@
#
import os
import sys
+
sys.path.insert(0, os.path.abspath('..'))

# patch sphinx
1 change: 1 addition & 0 deletions docs/conf_sphinx_patch.py
@@ -29,6 +29,7 @@
import sphinx.ext.autodoc
import sphinx.ext.autosummary.generate as ag

+
# pylint:disable=all
def generate_autosummary_content(
name: str,
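Aside: the file above patches Sphinx by redefining `generate_autosummary_content` from `sphinx.ext.autosummary.generate`. A minimal sketch of that monkey-patching pattern follows; only the module and function names come from the diff, and the delegation to the upstream implementation is an assumption for illustration.

```python
# Sketch of the monkey-patch pattern suggested by conf_sphinx_patch.py.
import sphinx.ext.autosummary.generate as ag

_original = ag.generate_autosummary_content  # keep the upstream implementation


def generate_autosummary_content(name: str, *args, **kwargs):
    """Patched generator: adjust inputs/outputs, then defer to upstream."""
    # ...custom behavior would go here...
    return _original(name, *args, **kwargs)


ag.generate_autosummary_content = generate_autosummary_content  # install patch
```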
69 changes: 34 additions & 35 deletions docs/models.md
@@ -131,23 +131,26 @@ mT5 XXL | [mt5/xxl.gin](https://github.com/google-research/t5x/blob/main/t5x/e

#### mT5 LM-Adapted Checkpoints

-These are the checkpoints released as part of the [zero-shot cross-lingual generation (XGen) paper](https://arxiv.org/abs/2205.12647).
+These are the checkpoints released as part of the
+[zero-shot cross-lingual generation (XGen) paper](https://arxiv.org/abs/2205.12647).

These "LM-adapted" models are initialized from mT5 (above) and trained for an
-additional 100K steps on the LM objective discussed in the [T5 paper](https://arxiv.org/abs/1910.10683).
+additional 100K steps on the LM objective discussed in the
+[T5 paper](https://arxiv.org/abs/1910.10683).

-This adaptation improves the ability of the model to be used for [prompt tuning](https://arxiv.org/abs/2104.08691).
+This adaptation improves the ability of the model to be used for
+[prompt tuning](https://arxiv.org/abs/2104.08691).

**Vocabulary:**
[mc4.250000.100extra](https://console.cloud.google.com/storage/browser/t5-data/vocabs/mc4.250000.100extra)

-Model | Gin File Location | Checkpoint Location
--------------------- | ------------------------------------------------------------------------------------------------------------------- | -------------------
+Model | Gin File Location | Checkpoint Location
+-------------------- | ---------------------------------------------------------------------------- | -------------------
mT5 LM-Adapted Small | [mt5/small.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/mt5/small.gin) | [mt5_lm_adapted/small/checkpoint_1100000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/mt5_lm_adapted/small/checkpoint_1100000)
-mT5 LM-Adapted Base | [mt5/base.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/mt5/base.gin) | [mt5_lm_adapted/base/checkpoint_1100000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/mt5_lm_adapted/base/checkpoint_1100000)
+mT5 LM-Adapted Base | [mt5/base.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/mt5/base.gin) | [mt5_lm_adapted/base/checkpoint_1100000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/mt5_lm_adapted/base/checkpoint_1100000)
mT5 LM-Adapted Large | [mt5/large.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/mt5/large.gin) | [mt5_lm_adapted/large/checkpoint_1100000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/mt5_lm_adapted/large/checkpoint_1100000)
-mT5 LM-Adapted XL | [mt5/xl.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/mt5/xl.gin) | [mt5_lm_adapted/xl/checkpoint_1100000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/mt5_lm_adapted/xl/checkpoint_1100000)
-mT5 LM-Adapted XXL | [mt5/xxl.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/mt5/xxl.gin) | [mt5_lm_adapted/xxl/checkpoint_1100000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/mt5_lm_adapted/xxl/checkpoint_1100000)
+mT5 LM-Adapted XL | [mt5/xl.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/mt5/xl.gin) | [mt5_lm_adapted/xl/checkpoint_1100000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/mt5_lm_adapted/xl/checkpoint_1100000)
+mT5 LM-Adapted XXL | [mt5/xxl.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/mt5/xxl.gin) | [mt5_lm_adapted/xxl/checkpoint_1100000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/mt5_lm_adapted/xxl/checkpoint_1100000)

#### umT5 Checkpoints

@@ -159,20 +162,19 @@ to 13B parameters, trained on the mC4 corpus using a denoising objective. umT5
is trained on a fresher version of the mC4 corpus (3.1.0), and with a more
uniform language balancing strategy.

-**Vocabulary:**
-[umt5.256000](https://console.cloud.google.com/storage/browser/t5-data/vocabs/umt5.256000)
+**Vocabulary:** [umt5.256000](https://console.cloud.google.com/storage/browser/t5-data/vocabs/umt5.256000)

-Model | Gin File Location | Checkpoint Location
--------------------- | ------------------------------------------------------------------------------------------------------------------- | -------------------
+Model | Gin File Location | Checkpoint Location
+---------- | --------------------------------------------------------------------------------------------------------- | -------------------
umT5 Small | [umt5/pretrain_small.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/scalable_t5/umt5/pretrain_small.gin) | [umt5/small/checkpoint_1000000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/umt5/small/checkpoint_1000000)
-umT5 Base | [umt5/pretrain_base.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/scalable_t5/umt5/pretrain_base.gin) | [umt5/base/checkpoint_1000000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/umt5/base/checkpoint_1000000)
-umT5 XL | [umt5/pretrain_xl.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/scalable_t5/umt5/pretrain_xl.gin) | [umt5/xl/checkpoint_1000000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/umt5/xl/checkpoint_1000000)
-umT5 XXL | [umt5/pretrain_xxl.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/scalable_t5/umt5/pretrain_xxl.gin) | [umt5/xxl/checkpoint_1000000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/umt5/xxl/checkpoint_1000000)
+umT5 Base | [umt5/pretrain_base.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/scalable_t5/umt5/pretrain_base.gin) | [umt5/base/checkpoint_1000000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/umt5/base/checkpoint_1000000)
+umT5 XL | [umt5/pretrain_xl.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/scalable_t5/umt5/pretrain_xl.gin) | [umt5/xl/checkpoint_1000000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/umt5/xl/checkpoint_1000000)
+umT5 XXL | [umt5/pretrain_xxl.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/scalable_t5/umt5/pretrain_xxl.gin) | [umt5/xxl/checkpoint_1000000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/umt5/xxl/checkpoint_1000000)

#### ByT5 Checkpoints

-These are the checkpoints used in the paper [ByT5: Towards a Token-Free Future
-with Pre-trained Byte-to-Byte Models](https://aclanthology.org/2022.tacl-1.17/).
+These are the checkpoints used in the paper
+[ByT5: Towards a Token-Free Future with Pre-trained Byte-to-Byte Models](https://aclanthology.org/2022.tacl-1.17/).
They are similar to mT5 (above), but are "token-free", processing text as raw
UTF-8 bytes, as opposed to using a pretrained subword vocabulary. These models
are more robust to character-level noise, and outperform parameter-matched mT5
@@ -182,8 +184,8 @@ pronunciation, or morphology. However inference is significantly slower, up to

**Vocabulary:** None

-Model | Gin File Location | Checkpoint Location
---------- | ---------------------------------------------------------------------------- | -------------------
+Model | Gin File Location | Checkpoint Location
+---------- | ------------------------------------------------------------------------------ | -------------------
ByT5 Small | [byt5/small.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/byt5/small.gin) | [gs://t5-data/pretrained_models/t5x/byt5_small/checkpoint_1000000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/byt5_small)
ByT5 Base | [byt5/base.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/byt5/base.gin) | [gs://t5-data/pretrained_models/t5x/byt5_base/checkpoint_1000000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/byt5_base)
ByT5 Large | [byt5/large.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/byt5/large.gin) | [gs://t5-data/pretrained_models/t5x/byt5_large/checkpoint_1000000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/byt5_large)
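To make the "token-free" byte processing described above concrete, here is a minimal sketch of ByT5-style tokenization. The offset of 3 for special tokens (pad, eos, unk) is an assumption about ByT5's convention, not something this diff specifies.

```python
# ByT5-style "tokenization": text becomes raw UTF-8 byte values, shifted to
# reserve the lowest ids for special tokens. Offset 3 is assumed
# (pad=0, eos=1, unk=2), matching the ByT5 design rather than this repo.
def byte_ids(text: str, offset: int = 3) -> list[int]:
    return [b + offset for b in text.encode("utf-8")]

print(byte_ids("résumé"))  # accented characters expand to multiple byte ids
```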
@@ -229,7 +231,6 @@ LongT5 Base | [longt5/models/longt5_1_1_transient_base.gin](https://github.com/
LongT5 Large | [longt5/models/longt5_1_1_transient_large.gin](https://github.com/google/flaxformer/tree/main/flaxformer/t5x/configs/longt5/models/longt5_1_1_transient_global_large.gin) | [gs://t5-data/pretrained_models/t5x/longt5/tglobal_large/checkpoint_1000000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/longt5/tglobal_large)
LongT5 XL | [longt5/models/longt5_1_1_transient_xl.gin](https://github.com/google/flaxformer/tree/main/flaxformer/t5x/configs/longt5/models/longt5_1_1_transient_global_xl.gin) | [gs://t5-data/pretrained_models/t5x/longt5/tglobal_xl/checkpoint_1000000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/longt5/tglobal_xl)


#### Mixture of Experts (MoE) Checkpoints

These MoE checkpoints need to be used with T5X MoE overrides -- specifically,
Expand All @@ -239,12 +240,12 @@ the MoeTrainer and the MoePjitPartitioner. For example, for fine-tuning, use the

##### Converted Mesh Tensorflow checkpoints

These are the checkpoints from the
[Switch Transformer model](https://arxiv.org/abs/2101.03961).

**Vocabulary:**
[cc_all.32000.100extra](https://console.cloud.google.com/storage/browser/t5-data/vocabs/cc_all.32000.100extra)


Model | Gin File Location | Checkpoint Location
---------------------------------------- | ------------------------------------------------------------------------------------------------------------ | -------------------
Switch Transformer Base 8 Experts | [switch_base.gin](https://github.com/google/flaxformer/tree/main/flaxformer/t5x/configs/moe/models/switch_base.gin) | [gs://t5-data/pretrained_models/t5x/moe/switch_classic/base/e8/checkpoint_500100](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/moe/switch_classic/base/e8)
@@ -258,33 +259,31 @@ Switch Transformer XXL 128 Experts | [switch_xxl.gin](https://github.com/g
Switch Transformer C 2048 Experts (1.6T) | [switch_c.gin](https://github.com/google/flaxformer/tree/main/flaxformer/t5x/configs/moe/models/switch_c.gin) | [gs://t5-data/pretrained_models/t5x/moe/switch_classic/c/e2048/checkpoint_611800](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/moe/switch_classic/c/e2048)

#### Flan-T5 Checkpoints

-These are the checkpoints released as part of the paper [Scaling
-Instruction-Finetuned Language Models](https://arxiv.org/abs/2210.11416). They were
-initialized from the [T5 1.1 LM-Adapted](#t5-11-lm-adapted-checkpoints) and instruction-finetuned.
+These are the checkpoints released as part of the paper
+[Scaling Instruction-Finetuned Language Models](https://arxiv.org/abs/2210.11416).
+They were initialized from the
+[T5 1.1 LM-Adapted](#t5-11-lm-adapted-checkpoints) and instruction-finetuned.

They significantly outperform the LM-adapted checkpoints. For example,
Flan-T5-XXL outperforms T5-LM-XXL by 26.6% absolute on the normalized average
-score. It even outperforms a much larger PaLM 62B model on [BigBench
-Hard](https://arxiv.org/abs/2210.09261) a
-set of challenging BigBench benchmark.
+score. It even outperforms a much larger PaLM 62B model on
+[BigBench Hard](https://arxiv.org/abs/2210.09261) a set of challenging BigBench
+benchmark.

-Unlike the vanilla T5 checkpoints, these can be directly used for
-few-shot prompting as well as standard finetuning. See [Chung et al. 2022](https://arxiv.org/abs/2210.11416) for details.
+Unlike the vanilla T5 checkpoints, these can be directly used for few-shot
+prompting as well as standard finetuning. See
+[Chung et al. 2022](https://arxiv.org/abs/2210.11416) for details.

-Model | Gin File Location | Checkpoint Location
--------------------- | ------------------------------------------------------------------------------------------------------------------- | -------------------
+Model | Gin File Location | Checkpoint Location
+------------- | ---------------------------------------------------------------------------------- | -------------------
Flan-T5 Small | [t5_1_1/small.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_1/small.gin) | [gs://t5-data/pretrained_models/t5x/flan_t5_small/checkpoint_1198000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/flan_t5_small/checkpoint_1198000)
Flan-T5 Base | [t5_1_1/base.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_1/base.gin) | [gs://t5-data/pretrained_models/t5x/flan_t5_base/checkpoint_1184000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/flan_t5_base/checkpoint_1184000)
Flan-T5 Large | [t5_1_1_large.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_1/large.gin) | [gs://t5-data/pretrained_models/t5x/flan_t5_large/checkpoint_1164000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/flan_t5_large/checkpoint_1164000)
Flan-T5 XL | [t5_1_1_xl.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_1/xl.gin) | [gs://t5-data/pretrained_models/t5x/flan_t5_xl/checkpoint_1138000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/flan_t5_xl/checkpoint_1138000)
Flan-T5 XXL | [t5_1_1_xxl.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_1/xxl.gin) | [gs://t5-data/pretrained_models/t5x/flan_t5_xxl/checkpoint_1114000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/flan_t5_xxl/checkpoint_1114000)


#### UL2 Checkpoints

Checkpoints for 20B pretrained and FLAN-based instruction-tuned models using the
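Since the Flan-T5 section above notes that these checkpoints can be used directly for few-shot prompting, here is a hypothetical example. It uses the Hugging Face Transformers mirrors of the checkpoints, which is an assumption; the table above only lists the T5X/GCS copies.

```python
# Few-shot prompting a Flan-T5 checkpoint via Hugging Face Transformers.
# "google/flan-t5-small" is assumed to mirror the T5X checkpoint listed above.
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")
model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-small")

# One in-context example, then the query to complete.
prompt = "Translate English to German. cheese -> Käse. house ->"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```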
1 change: 1 addition & 0 deletions docs/tutorials.md
@@ -1,5 +1,6 @@
# T5X Introductory Tutorial Series

+
## Overview

This series of guides is a self-contained introduction to T5X, a modular,
4 changes: 2 additions & 2 deletions docs/usage/auxiliary.md
@@ -184,8 +184,8 @@ We outline a few common error patterns that we have encountered.
infer_eval job, it may be that even train_eval is too slow. In these
situations, we suggest adding the metrics from train_eval into the
`metrics_fn` argument of the SeqIO task and have them be computed in the
-auxiliary job as well. To do this with teacher forcing, you will have to
-use `train.py` instead of `eval.py`.
+auxiliary job as well. To do this with teacher forcing, you will have to use
+`train.py` instead of `eval.py`.

+ **Using `CHECKPOINT_PATH` rather `INITIAL_CHECKPOINT_PATH`.** For legacy
reasons, the auxiliary job uses the macro `INITIAL_CHECKPOINT_PATH` rather
4 changes: 2 additions & 2 deletions docs/usage/eval.md
@@ -8,8 +8,8 @@ defined with [SeqIO](https://github.com/google/seqio/blob/main/README.md).

Refer to this tutorial when you have an existing model that you want to
evaluate. If you would like to fine-tune your model before evaluation, please
-refer to the [fine-tuning](finetune.md) tutorial. You can run evals as part of your
-fine-tuning run as well.
+refer to the [fine-tuning](finetune.md) tutorial. You can run evals as part of
+your fine-tuning run as well.

## Overview

4 changes: 2 additions & 2 deletions docs/usage/gin.md
@@ -299,8 +299,8 @@ two ways: gin files and override flags.
comma-separate list of search prefixes via `--gin_search_paths` and then
only specify the relative path to the `--gin_file` flags. However, we
strongly recommend against using `--gin_search_paths`. Using absolute paths
-via the `--gin_file` flags will reduce sources of ambiguity and improve
-the consistency of your scripts.
+via the `--gin_file` flags will reduce sources of ambiguity and improve the
+consistency of your scripts.

1. **Override Flags** Gin flags allow for more fine-grained overrides of any
configurable aspect of your run. These flags follow the single-line binding
10 changes: 5 additions & 5 deletions docs/usage/metrics.md
@@ -4,9 +4,9 @@
## Introduction

T5X provides a flexible and customizable library for managing metrics. Metrics
-in T5X rely on [CLU](https://github.com/google/CommonLoopUtils/blob/main/README.md), which broadly provides utilities for writing
-training loops but specifically provides metric libraries that are extended by
-T5X.
+in T5X rely on [CLU](https://github.com/google/CommonLoopUtils/blob/main/README.md), which broadly provides utilities for
+writing training loops but specifically provides metric libraries that are
+extended by T5X.


NOTE: This document currently only applies to train and 'train_eval' metrics,
@@ -16,8 +16,8 @@ these three in the future.
## Metrics and Writers

CLU provides `Metric` and `MetricWriter` classes. Full details are provided in
-[go/clu-metrics](https://github.com/google/CommonLoopUtils/blob/main/README.md-metrics), but a simplified summary will suffice for our
-purposes.
+[go/clu-metrics](https://github.com/google/CommonLoopUtils/blob/main/README.md-metrics), but a simplified summary will suffice
+for our purposes.

[`clu.metrics.Metric`](https://github.com/google/CommonLoopUtils/tree/main/clu/metrics.py?q=symbol:%5CbMetric%5Cb)
provides an abstract interface for metrics. The interface can be simply
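For readers of the metrics doc being reflowed above: the `clu.metrics.Metric` interface boils down to three methods. A minimal sketch, modeled on CLU's own `Average` metric; the field names and the exact `from_model_output` keyword are assumptions for illustration.

```python
import flax
import jax.numpy as jnp
from clu import metrics as clu_metrics


@flax.struct.dataclass
class AverageLoss(clu_metrics.Metric):
  """Running average of a batch loss, in the style of clu.metrics.Average."""

  total: jnp.ndarray
  count: jnp.ndarray

  @classmethod
  def from_model_output(cls, loss: jnp.ndarray, **_) -> "AverageLoss":
    # Build a metric instance from one step's model output.
    return cls(total=jnp.sum(loss), count=jnp.asarray(loss.size))

  def merge(self, other: "AverageLoss") -> "AverageLoss":
    # Accumulate across steps/hosts; called by the training loop.
    return type(self)(total=self.total + other.total,
                      count=self.count + other.count)

  def compute(self) -> jnp.ndarray:
    return self.total / self.count
```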
4 changes: 2 additions & 2 deletions docs/usage/pretrain.md
@@ -190,8 +190,8 @@ brief description of these configs:
A training run may consist of various randomized operations, e.g. dataset
shuffling, dropout, etc. However, it is often useful to have deterministic
training, meaning that the random operations are reproducible and robust to
-preemption/restarts. To make your pretraining deterministic, in addition to
-the params configured in `pretrain.gin`, you need to add the following configs:
+preemption/restarts. To make your pretraining deterministic, in addition to the
+params configured in `pretrain.gin`, you need to add the following configs:

+ sets the dataset seed to a fixed value: `train/utils.DatasetConfig.seed =
42`.
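The deterministic-training advice above is gin configuration, but the underlying idea is just fixed seeds for the data pipeline and for the model's RNG. An illustrative sketch, not T5X code; the seed value mirrors the `train/utils.DatasetConfig.seed = 42` example in the doc.

```python
import jax
import tensorflow as tf

DATASET_SEED = 42  # analogous to train/utils.DatasetConfig.seed = 42

# Seeded shuffle: the same data order on every run and after restarts.
ds = tf.data.Dataset.range(100).shuffle(buffer_size=100, seed=DATASET_SEED)

# Fixed model seed: reproducible parameter init and dropout in a JAX loop.
rng = jax.random.PRNGKey(0)
init_rng, dropout_rng = jax.random.split(rng)
```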
