
Question about ViT models and their organization #524

Answered by rwightman
sayakpaul asked this question in Q&A

@sayakpaul

Models that...

  • start with vit_ and end with _in21k were trained on ImageNet-21k and not fine-tuned. Their classification heads were zeroed by the Google researchers, so they don't work for 21k classification out of the box, but they can be fine-tuned for other tasks (they have pre-logits weights that the other models don't). They are always 224x224.
  • start with vit_ and have weights whose filenames begin with jx_ were also trained by Google; these are the ones pretrained on ImageNet-21k and then fine-tuned on ImageNet-1k. They are 224x224 or 384x384.
  • start with deit_ are the Facebook-trained models, trained on ImageNet-1k with and without distillation (indicated by the model name); see the sketch after this list for querying these naming groups.
  • there is one model (my small variant) that was …
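
A minimal sketch of querying these naming conventions through timm's model registry. It assumes a timm version from around the time of this discussion in which names such as vit_base_patch16_224_in21k are registered; the num_classes value is an illustrative placeholder, not part of the original answer.

```python
import timm

# ImageNet-21k pretrained ViTs (zeroed heads, always 224x224) end in _in21k.
print(timm.list_models('vit_*in21k', pretrained=True))

# DeiT models: FB-trained on ImageNet-1k, with and without distillation.
print(timm.list_models('deit_*', pretrained=True))

# An _in21k checkpoint is intended as a fine-tuning starting point:
# passing num_classes replaces the zeroed 21k head with a freshly
# initialized classifier sized for the downstream task.
model = timm.create_model(
    'vit_base_patch16_224_in21k',  # assumed registered in this timm version
    pretrained=True,
    num_classes=10,  # illustrative: class count of your downstream dataset
)
```

timm.list_models accepts fnmatch-style wildcards, so the naming patterns described above map directly onto registry filters.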
