Replies: 1 comment
-
@mayukh18 have you tried taking the EMA out? And are you sure it's not just the delayed EMA? Also, are you sure you haven't forced bfloat16 mode? That doesn't work well... otherwise the hparams look reasonable.
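For what it's worth, a minimal sketch of that suggestion against the command quoted in the question below: drop the EMA flags and make sure PyTorch/XLA isn't being forced into bfloat16 via its `XLA_USE_BF16` environment variable (all other flags are copied unchanged from the original command):

```
# Make sure bfloat16 is not being forced; PyTorch/XLA maps float32 to
# bfloat16 when XLA_USE_BF16=1 is set in the environment.
unset XLA_USE_BF16

# Same hyperparameters as in the question below, minus --model-ema /
# --model-ema-decay, to rule the EMA out entirely.
python3 launch_xla.py --num-devices 8 train.py /imagenet/path/ \
  --model vit_base_patch16_224 --opt adamw --opt-eps 1e-6 --clip-grad 1.0 \
  --drop-path 0.2 --mixup 0.8 --cutmix 1.0 --aa rand-m6-n4-mstd1.0-inc1 \
  --weight-decay .08 --sched cosine -j 4 --warmup-lr 1e-6 \
  --warmup-epochs 10 --epochs 100 --lr 5e-4 -b 128
```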
-
I have been trying to train ViT Base on ImageNet-1k on a TPU v3-8. Somehow my model's top-1 accuracy falls to ~0.1 after 20 epochs and doesn't improve anymore. It reaches a peak of 0.2 to 0.3 in those early epochs, then falls off and stays mostly constant. I am not sure if this is some kind of overfitting, or whether there is something I am doing wrong.
I have closely followed the README in the `bits_and_tpu` branch and also tried different variations of the hyperparameters. Below is roughly the median of the hyperparameters I tried:

```
python3 launch_xla.py --num-devices 8 train.py /imagenet/path/ --model vit_base_patch16_224 --opt adamw --opt-eps 1e-6 --clip-grad 1.0 --drop-path 0.2 --mixup 0.8 --cutmix 1.0 --aa rand-m6-n4-mstd1.0-inc1 --weight-decay .08 --model-ema --model-ema-decay 0.999 --sched cosine -j 4 --warmup-lr 1e-6 --warmup-epochs 10 --epochs 100 --lr 5e-4 -b 128
```

It did seem that with more `warmup_epochs` the falloff was delayed further; I have tried the 5-20 range. I should also mention that I am using persistent disks, though I don't think that makes a difference.
Any help is appreciated. Thanks.
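In case it helps with debugging: one way to check whether it is the EMA weights that are collapsing (per the reply above) is to validate a saved checkpoint both with and without its EMA weights using timm's `validate.py`; the checkpoint path below is a hypothetical placeholder:

```
# Evaluate the raw training weights from a saved checkpoint
python3 validate.py /imagenet/path/ --model vit_base_patch16_224 \
  --checkpoint output/train/<run>/last.pth.tar

# Evaluate the EMA weights stored in the same checkpoint (--use-ema
# loads the EMA copy if it is present in the checkpoint file)
python3 validate.py /imagenet/path/ --model vit_base_patch16_224 \
  --checkpoint output/train/<run>/last.pth.tar --use-ema
```

If only the `--use-ema` run shows the low accuracy, the drop is in the delayed EMA copy rather than in the underlying training.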