Questions about Google Cloud Storage Usage in bits_and_tpu Branch #1297

Answered by rwightman
zw615 asked this question in Q&A
@zeyuwang615 you can specify a different batch size for validation, and the bits_and_tpu train script automatically lowers the number of workers for the validation dataset, which reduces the issues that higher parallelism causes with a small val set. You should only run into problems if your validation set is very small.

You cannot use samplers with iterable datasets. That's a limitation of the approach: each dataloader worker is completely independent for iterable datasets. So yes, repeat aug doesn't work. I tried doing repeat aug at a local (per-worker) level, but it didn't have the same impact (repeating within the same batch (each worker generates its own batches) vs repeating …
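
To make the sampler limitation concrete, here is a minimal sketch using plain PyTorch (not the bits_and_tpu code itself). `RangeStream` and `sampler_is_rejected` are hypothetical names made up for illustration. It shows that `DataLoader` refuses a `sampler` for an `IterableDataset`, and that each worker has to shard the stream itself via `get_worker_info()`.

```python
from torch.utils.data import (
    DataLoader,
    IterableDataset,
    SequentialSampler,
    get_worker_info,
)


class RangeStream(IterableDataset):
    """Hypothetical iterable dataset that yields the integers 0..n-1."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        info = get_worker_info()
        if info is None:
            # Single-process loading: this iterator sees the whole stream.
            start, step = 0, 1
        else:
            # Every worker holds its own copy of the dataset and iterates it
            # independently, so the dataset must split the work itself.
            start, step = info.id, info.num_workers
        return iter(range(start, self.n, step))


def sampler_is_rejected(dataset):
    """Return True if DataLoader raises ValueError when given a sampler."""
    try:
        DataLoader(dataset, sampler=SequentialSampler(range(dataset.n)))
    except ValueError:
        return True
    return False


if __name__ == "__main__":
    ds = RangeStream(8)
    print("sampler rejected:", sampler_is_rejected(ds))
    # num_workers=0 keeps the demo in a single process.
    for batch in DataLoader(ds, batch_size=4):
        print(batch.tolist())
```

Because no index sampler sits in front of the workers, anything that reorders or repeats samples has to live inside the dataset's own `__iter__`, which is why a sampler-based repeat aug cannot be applied here.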

Replies: 3 comments 2 replies

Answer selected by zw615
Category
Q&A
Labels
None yet
3 participants