
Commit

Add airio multiprocessing options to t5x dataset config.
PiperOrigin-RevId: 603539973
gauravmishra authored and t5-copybara committed Feb 2, 2024
1 parent 4ff4291 commit cd94b76
Showing 1 changed file with 10 additions and 1 deletion.
11 changes: 10 additions & 1 deletion t5x/utils.py
@@ -541,9 +541,18 @@ class DatasetConfig:
   use_memory_cache: bool = True
   # Whether to trim output features from tasks.
   trim_output_features: bool = True
-  # AirIO-only: a list of runtime preprocessors to pass to airio. Generally used
+  ### AirIO-only ###
+  # A list of runtime preprocessors to pass to airio. Generally used
   # to configure feature converters and packing. Ignored for non-airio configs.
   runtime_preprocessors: Sequence[Any] | None = None
+  # The number of threads reading from the data source in parallel. Passing None
+  # or 0 will use the default number of threads.
+  num_prefetch_threads: int | None = None
+  # Number of Python worker processes. More processes can speed up
+  # the pipeline if it's compute bound and bottlenecked on the CPython's GIL.
+  # 0 means no Python multiprocessing. All data loading and transformation
+  # will run in the main Python process.
+  num_workers: int | None = 0


 def _hashed_index(x) -> int:
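
The comments on the two new fields describe how their values are meant to be read: None or 0 for num_prefetch_threads selects the default thread count, and 0 for num_workers keeps all loading in the main process. A minimal sketch of that reading follows. AirioOptionsSketch and describe_loading are hypothetical names used only for illustration; they are not part of t5x or airio, and treating num_workers=None the same as 0 is an assumption, since the commit does not say how None is handled.

```python
import dataclasses
from typing import Any, Dict, Optional, Sequence


@dataclasses.dataclass
class AirioOptionsSketch:
    """Hypothetical stand-in mirroring the AirIO-only fields; not the t5x class."""

    runtime_preprocessors: Optional[Sequence[Any]] = None
    num_prefetch_threads: Optional[int] = None
    num_workers: Optional[int] = 0


def describe_loading(cfg: AirioOptionsSketch) -> Dict[str, str]:
    """Summarizes how the field comments say each value is interpreted."""
    # Per the field comment, None and 0 both mean "use the default thread count".
    if not cfg.num_prefetch_threads:
        threads = "default"
    else:
        threads = str(cfg.num_prefetch_threads)

    # Per the field comment, 0 means no Python multiprocessing.
    # Assumption: None is treated like 0 here; the commit does not specify this.
    if not cfg.num_workers:
        workers = "main process only"
    else:
        workers = f"{cfg.num_workers} worker processes"

    return {"prefetch_threads": threads, "workers": workers}


if __name__ == "__main__":
    print(describe_loading(AirioOptionsSketch()))
    print(describe_loading(AirioOptionsSketch(num_prefetch_threads=8, num_workers=4)))
```

With the defaults, both settings fall back to the single-process, default-thread behavior, so existing configs that do not set these fields should be unaffected under this reading.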
