What makes timm vit_base_patch_16 achieve 86.006 vs 84.15 in original vit paper on Imagenet #847

jojo23333 · 2021-09-04T23:49:01Z

jojo23333
Sep 4, 2021

Hi, I just came across the ViT checkpoint used in this repo. Its results are listed here:
https://github.com/rwightman/pytorch-image-models/blob/master/results/results-imagenet.csv

vit_base_patch16_384 achieves 86.006, versus the 84.15 reported in Table 5 of the paper.

There is a large gap between these two reported scores. I understand that some of the improvement can come from a longer training schedule and better hyperparameter choices, but a boost of nearly 2 points is large, so I wonder what causes it. Was some special technique applied? Where can I find the corresponding config file, checkpoint, and training details for the scores reported in results-imagenet.csv?

Many thanks!

Answered by rwightman

Sep 5, 2021

rwightman · 2021-09-05T23:25:48Z

rwightman
Sep 5, 2021
Maintainer

@jojo23333 the checkpoints were updated with the best options from a paper that I was a part of, "How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers": https://arxiv.org/abs/2106.10270

More augmentation and regularization were used w/ the 21k pretraining, and a search over both those and the transfer hparams was performed.
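
For anyone who wants to put the two numbers from this thread side by side, here is a minimal sketch of looking up a model's top-1 score in a results CSV and comparing it with the paper's figure. It is only an illustration: the `model` and `top1` column names are assumptions about the layout of results-imagenet.csv, the inline sample holds just the one row quoted above, and `top1_for` is a hypothetical helper, not part of timm.

```python
import csv
import io

# Hypothetical excerpt: only the row quoted in this thread, with assumed
# column names `model` and `top1`. The real file may have more columns.
SAMPLE_CSV = """model,top1
vit_base_patch16_384,86.006
"""

# Top-1 accuracy from Table 5 of the ViT paper, as quoted in the question.
PAPER_TOP1 = 84.15


def top1_for(model_name, csv_text):
    """Return the top-1 accuracy listed for model_name, or None if absent."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["model"] == model_name:
            return float(row["top1"])
    return None


timm_top1 = top1_for("vit_base_patch16_384", SAMPLE_CSV)
gap = round(timm_top1 - PAPER_TOP1, 3)
print(f"timm: {timm_top1}, paper: {PAPER_TOP1}, gap: {gap:+.3f}")
```

To run this against the full file, read its text from a local copy and pass it in place of `SAMPLE_CSV`, after checking that the column names match.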
