Releases: huggingface/pytorch-image-models

Release v1.0.19

24 Jul 03:06

Patch release for Python 3.9 compat break in 1.0.18

July 23, 2025

  • Add set_input_size() method to EVA models, used by OpenCLIP 3.0.0 to allow resizing of timm-based encoder models (see the sketch below).
  • Release 1.0.18, needed for the PE-Core S & T models in OpenCLIP 3.0.0.
  • Fix a small typing issue that broke Python 3.9 compat; released as the 1.0.19 patch.
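
A minimal sketch of the new resizing hook (the model name and target size below are illustrative, and the keyword usage assumes set_input_size() mirrors the existing timm ViT method of the same name):

```python
import timm

# create an EVA02 image encoder at its pretrained resolution
model = timm.create_model('eva02_base_patch14_224.mim_in22k', pretrained=False)

# resize the patch grid / position embeddings for a new input size
model.set_input_size(img_size=(336, 336))
```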

July 21, 2025

  • ROPE support added to NaFlexViT. All models covered by the EVA base (eva.py), including EVA, EVA02, Meta PE ViT, timm SBB ViT w/ ROPE, and Naver ROPE-ViT, can now be loaded in NaFlexViT when use_naflex=True is passed at model creation time (see the example below).
  • More Meta PE ViT encoders added, including small/tiny variants, lang variants w/ tiling, and more spatial variants.
  • PatchDropout fixed for NaFlexViT and EVA models (a regression introduced when adding Naver ROPE-ViT).
  • Fix XY order with grid_indexing='xy'; this impacted non-square image use in 'xy' mode (only ROPE-ViT and PE models affected).
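
A short example of routing an existing EVA-family checkpoint through NaFlexViT (the model name is one of the EVA02 weights already in timm):

```python
import timm

# load a classic EVA02 checkpoint into the NaFlexViT implementation
model = timm.create_model(
    'eva02_base_patch14_224.mim_in22k',
    pretrained=True,
    use_naflex=True,  # route creation through NaFlexViT
)
```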

What's Changed

  • Add ROPE support to NaFlexVit (axial and mixed), and support most (all?) EVA based vit models & weights by @rwightman in #2552
  • Support set_input_size() in EVA models by @rwightman in #2554

Full Changelog: v1.0.17...v1.0.18

Release v1.0.18

23 Jul 20:03

July 23, 2025

  • Add set_input_size() method to EVA models, used by OpenCLIP 3.0.0 to allow resizing of timm-based encoder models.
  • Release 1.0.18, needed for the PE-Core S & T models in OpenCLIP 3.0.0.

July 21, 2025

  • ROPE support added to NaFlexViT. All models covered by the EVA base (eva.py), including EVA, EVA02, Meta PE ViT, timm SBB ViT w/ ROPE, and Naver ROPE-ViT, can now be loaded in NaFlexViT when use_naflex=True is passed at model creation time.
  • More Meta PE ViT encoders added, including small/tiny variants, lang variants w/ tiling, and more spatial variants.
  • PatchDropout fixed for NaFlexViT and EVA models (a regression introduced when adding Naver ROPE-ViT).
  • Fix XY order with grid_indexing='xy'; this impacted non-square image use in 'xy' mode (only ROPE-ViT and PE models affected).

What's Changed

  • Add ROPE support to NaFlexVit (axial and mixed), and support most (all?) EVA based vit models & weights by @rwightman in #2552
  • Support set_input_size() in EVA models by @rwightman in #2554

Full Changelog: v1.0.17...v1.0.18

Release v1.0.17

10 Jul 16:04

July 7, 2025

  • MobileNet-v5 backbone tweaks for improved Google Gemma 3n behaviour (to pair with updated official weights)
    • Add stem bias (zeroed in updated weights; a compat break with old weights)
    • GELU -> GELU (tanh approximation), a minor change to better match JAX
  • Add two arguments to layer-decay support: a min scale clamp and a 'no optimization' scale threshold
  • Add 'Fp32' LayerNorm, RMSNorm, and SimpleNorm variants that can be enabled to force computation of the norm in float32 (a sketch of the idea follows this list)
  • Some typing and argument cleanup for norm and norm+act layers, done alongside the above
  • Support Naver ROPE-ViT (https://github.com/naver-ai/rope-vit) in eva.py, add RotaryEmbeddingMixed module for mixed mode, weights on the HuggingFace Hub:
| model | img_size | top1 | top5 | param_count |
|:---|---:|---:|---:|---:|
| vit_large_patch16_rope_mixed_ape_224.naver_in1k | 224 | 84.84 | 97.122 | 304.4 |
| vit_large_patch16_rope_mixed_224.naver_in1k | 224 | 84.828 | 97.116 | 304.2 |
| vit_large_patch16_rope_ape_224.naver_in1k | 224 | 84.65 | 97.154 | 304.37 |
| vit_large_patch16_rope_224.naver_in1k | 224 | 84.648 | 97.122 | 304.17 |
| vit_base_patch16_rope_mixed_ape_224.naver_in1k | 224 | 83.894 | 96.754 | 86.59 |
| vit_base_patch16_rope_mixed_224.naver_in1k | 224 | 83.804 | 96.712 | 86.44 |
| vit_base_patch16_rope_ape_224.naver_in1k | 224 | 83.782 | 96.61 | 86.59 |
| vit_base_patch16_rope_224.naver_in1k | 224 | 83.718 | 96.672 | 86.43 |
| vit_small_patch16_rope_224.naver_in1k | 224 | 81.23 | 95.022 | 21.98 |
| vit_small_patch16_rope_mixed_224.naver_in1k | 224 | 81.216 | 95.022 | 21.99 |
| vit_small_patch16_rope_ape_224.naver_in1k | 224 | 81.004 | 95.016 | 22.06 |
| vit_small_patch16_rope_mixed_ape_224.naver_in1k | 224 | 80.986 | 94.976 | 22.06 |
  • Some cleanup of ROPE modules, helpers, and FX tracing leaf registration
  • Preparing version 1.0.17 release
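
The float32 norm idea, as a minimal sketch (not timm's exact implementation; the class name here is illustrative): compute the normalization in float32, then cast back to the input dtype, which avoids instability when running in bfloat16/float16.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LayerNormFp32(nn.LayerNorm):
    """LayerNorm that always computes in float32, then casts back (illustrative sketch)."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = F.layer_norm(
            x.float(),  # upcast activations to float32
            self.normalized_shape,
            self.weight.float() if self.weight is not None else None,
            self.bias.float() if self.bias is not None else None,
            self.eps,
        )
        return out.to(x.dtype)  # restore the original dtype
```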

What's Changed

  • Adding Naver rope-vit compatibility to EVA ViT by @rwightman in #2529
  • Update no_grad usage to inference_mode if possible by @GuillaumeErhard in #2534
  • Add a min layer-decay scale clamp, and no optimization threshold to exclude groups from optimization by @rwightman in #2537
  • Add stem_bias option to MNV5. Resolve the norm layer so can pass string. by @rwightman in #2538
  • Add flag to enable float32 computation for normalization (norm + affine) by @rwightman in #2536
  • fix: mnv5 conv_stem bias and GELU with approximate=tanh by @RyanMullins in #2533
  • Fixup casting issues for weights/bias in fp32 norm layers by @rwightman in #2539
  • Fix H, W ordering for xy indexing in ROPE by @rwightman in #2541
  • Fix 3 typos in README.md by @robin-ede in #2544

Full Changelog: v1.0.16...v1.0.17

Release v1.0.16

26 Jun 18:44

June 26, 2025

  • MobileNetV5 backbone (w/ encoder-only variant) for the Gemma 3n image encoder
  • Version 1.0.16 released

June 23, 2025

  • Add F.grid_sample based 2D and factorized pos embed resizing to NaFlexViT. Faster when handling many different sizes (based on an example by https://github.com/stas-sl); a sketch of the approach follows this list.
  • Further speed up patch embed resampling by replacing vmap with matmul (based on a snippet by https://github.com/stas-sl).
  • Add 3 initial native-aspect NaFlexViT checkpoints created while testing, trained on ImageNet-1k w/ 3 different pos embed configs and otherwise identical hparams:
| Model | Top-1 Acc | Top-5 Acc | Params (M) | Eval Seq Len |
|:---|---:|---:|---:|---:|
| naflexvit_base_patch16_par_gap.e300_s576_in1k | 83.67 | 96.45 | 86.63 | 576 |
| naflexvit_base_patch16_parfac_gap.e300_s576_in1k | 83.63 | 96.41 | 86.46 | 576 |
| naflexvit_base_patch16_gap.e300_s576_in1k | 83.50 | 96.46 | 86.63 | 576 |
  • Support gradient checkpointing for forward_intermediates and fix some checkpointing bugs. Thanks https://github.com/brianhou0208
  • Add 'corrected weight decay' (https://arxiv.org/abs/2506.02285) as an option to the AdamW (legacy), Adopt, Kron, Adafactor (BV), Lamb, LaProp, Lion, NadamW, RmsPropTF, and SGDW optimizers
  • Switch PE (Perception Encoder) ViT models to use native timm weights instead of remapping on the fly
  • Fix a CUDA stream bug in the prefetch loader
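
A minimal sketch of resizing a 2D position embedding with F.grid_sample (illustrative only; timm's actual helper differs in details such as antialiasing and the factorized variants):

```python
import torch
import torch.nn.functional as F

def resize_pos_embed_2d(pos_embed: torch.Tensor, new_hw: tuple) -> torch.Tensor:
    """Resample a (1, H, W, C) position embedding to (1, H', W', C) via grid_sample."""
    _, H, W, C = pos_embed.shape
    new_h, new_w = new_hw
    # build a normalized sampling grid covering the target size
    ys = torch.linspace(-1, 1, new_h)
    xs = torch.linspace(-1, 1, new_w)
    grid = torch.stack(torch.meshgrid(ys, xs, indexing='ij'), dim=-1)  # (H', W', 2) as (y, x)
    grid = grid.flip(-1).unsqueeze(0)  # grid_sample expects (x, y) order
    src = pos_embed.permute(0, 3, 1, 2)  # (1, C, H, W)
    out = F.grid_sample(src, grid, mode='bilinear', align_corners=True)
    return out.permute(0, 2, 3, 1)  # back to (1, H', W', C)
```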

June 5, 2025

  • Initial NaFlexVit model code. NaFlexVit is a Vision Transformer with:
    1. Encapsulated embedding and position encoding in a single module
    2. Support for nn.Linear patch embedding on pre-patchified (dictionary) inputs
    3. Support for NaFlex variable aspect, variable resolution (SigLip-2: https://arxiv.org/abs/2502.14786)
    4. Support for FlexiViT variable patch size (https://arxiv.org/abs/2212.08013)
    5. Support for NaViT fractional/factorized position embedding (https://arxiv.org/abs/2307.06304)
  • Existing ViT models in vision_transformer.py can be loaded into the NaFlexVit model by adding the use_naflex=True flag to create_model
    • Some native weights coming soon
  • A full NaFlex data pipeline is available that allows training / fine-tuning / evaluating with variable aspect / size images
    • To enable in train.py and validate.py, add the --naflex-loader arg; it must be used w/ a NaFlexVit model
  • To evaluate an existing (classic) ViT loaded into the NaFlexVit model w/ the NaFlex data pipeline:
    • python validate.py /imagenet --amp -j 8 --model vit_base_patch16_224 --model-kwargs use_naflex=True --naflex-loader --naflex-max-seq-len 256
  • Training has some extra args worth noting:
    • The --naflex-train-seq-lens argument specifies which sequence lengths to randomly pick from per batch during training
    • The --naflex-max-seq-len argument sets the target sequence length for validation
    • Adding --model-kwargs enable_patch_interpolator=True --naflex-patch-sizes 12 16 24 will enable random patch size selection per batch w/ interpolation
    • The --naflex-loss-scale arg changes the loss scaling mode per batch relative to the batch size; timm NaFlex loading changes the batch size for each seq len

Full Changelog: v1.0.15...v1.0.16

Release v1.0.15

23 Feb 05:07

Feb 21, 2025

  • SigLIP 2 ViT image encoders added (https://huggingface.co/collections/timm/siglip-2-67b8e72ba08b09dd97aecaf9)
    • Variable resolution / aspect NaFlex versions are a WIP
  • Add 'SO150M2' ViT weights trained w/ SBB recipes; great results, better for ImageNet than the previous attempt despite less training (see the example after this list).
    • vit_so150m2_patch16_reg1_gap_448.sbb_e200_in12k_ft_in1k - 88.1% top-1
    • vit_so150m2_patch16_reg1_gap_384.sbb_e200_in12k_ft_in1k - 87.9% top-1
    • vit_so150m2_patch16_reg1_gap_256.sbb_e200_in12k_ft_in1k - 87.3% top-1
    • vit_so150m2_patch16_reg4_gap_256.sbb_e200_in12k
  • Updated InternViT-300M '2.5' weights
  • Release 1.0.15
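
For instance, one of the new SO150M2 weights can be loaded by name via the usual factory (standard timm usage; assumes the pretrained download is available):

```python
import timm

# load the 256px ImageNet-12k -> ImageNet-1k fine-tuned variant by name
model = timm.create_model(
    'vit_so150m2_patch16_reg1_gap_256.sbb_e200_in12k_ft_in1k',
    pretrained=True,
)
```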

Feb 1, 2025

  • FYI PyTorch 2.6 & Python 3.13 are tested and working w/ the current main and released versions of timm

Full Changelog: v1.0.14...v1.0.15

Release v1.0.14

19 Jan 23:05

Jan 19, 2025

  • Fix loading of LeViT safetensors weights; remove conversion code that should have been deactivated
  • Add 'SO150M' ViT weights trained w/ SBB recipes; decent results, but not an optimal shape for ImageNet-12k/1k pretrain/fine-tune
    • vit_so150m_patch16_reg4_gap_256.sbb_e250_in12k_ft_in1k - 86.7% top-1
    • vit_so150m_patch16_reg4_gap_384.sbb_e250_in12k_ft_in1k - 87.4% top-1
    • vit_so150m_patch16_reg4_gap_256.sbb_e250_in12k
  • Misc typing, typo, etc. cleanup
  • 1.0.14 release to get the above LeViT fix out

Full Changelog: v1.0.13...v1.0.14

Release v1.0.13

09 Jan 18:49

Jan 9, 2025

  • Add support to train and validate in pure bfloat16 or float16 (see the sketch below)
  • wandb project name arg added by https://github.com/caojiaolong; use args.experiment for the run name
  • Fix an old issue w/ checkpoint saving not working on filesystems w/o hard-link support (e.g. FUSE fs mounts)
  • 1.0.13 release
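
A minimal sketch of pure-bfloat16 evaluation (generic PyTorch rather than the train.py flags; casting the model and inputs wholesale, with no autocast, is the core idea):

```python
import torch
import timm

model = timm.create_model('resnet50', pretrained=False, num_classes=1000)
model = model.to(dtype=torch.bfloat16).eval()  # weights held in bf16

x = torch.randn(2, 3, 224, 224, dtype=torch.bfloat16)
with torch.no_grad():
    logits = model(x)  # activations stay in bf16 end to end
print(logits.dtype)  # torch.bfloat16
```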

Jan 6, 2025

  • Add a torch.utils.checkpoint.checkpoint() wrapper in timm.models that defaults use_reentrant=False, unless TIMM_REENTRANT_CKPT=1 is set in the env (a sketch follows).
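
A minimal sketch of such a wrapper (illustrative; timm's actual helper may differ in where and when it reads the env var):

```python
import os
import torch.utils.checkpoint

def checkpoint(function, *args, **kwargs):
    """Gradient checkpointing that defaults to the non-reentrant implementation."""
    # opt back into the legacy reentrant path only if the env var is set
    use_reentrant = os.environ.get('TIMM_REENTRANT_CKPT', '0') == '1'
    return torch.utils.checkpoint.checkpoint(
        function, *args, use_reentrant=use_reentrant, **kwargs)
```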

Full Changelog: v1.0.12...v1.0.13

Release v1.0.12

03 Dec 19:05

Nov 12, 2024

  • Optimizer factory refactor
    • The new factory works by registering optimizers using an OptimInfo dataclass w/ some key traits
    • Add list_optimizers, get_optimizer_class, get_optimizer_info alongside the reworked create_optimizer_v2 fn to explore optimizers and fetch info or the class (see the example after this list)
    • Deprecate optim.optim_factory; move fns to optim/_optim_factory.py and optim/_param_groups.py and encourage import via timm.optim
  • Add the Adopt (https://github.com/iShohei220/adopt) optimizer
  • Add a 'Big Vision' variant of the Adafactor (https://github.com/google-research/big_vision/blob/main/big_vision/optax.py) optimizer
  • Fix the original Adafactor to pick better factorization dims for convolutions
  • Tweak the LAMB optimizer, leveraging improvements in torch.where functionality since the original; refactor clipping a bit
  • Dynamic img size support in vit, deit, eva improved to support resizing from non-square patch grids, thanks https://github.com/wojtke
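
A short exploration of the reworked factory (function names are from the bullets above; the 'adopt' lookup assumes that optimizer is registered under that name):

```python
import timm
from timm.optim import create_optimizer_v2, get_optimizer_class, list_optimizers

print(list_optimizers())                # names of all registered optimizers
opt_cls = get_optimizer_class('adopt')  # look up an optimizer class by name

model = timm.create_model('resnet18', pretrained=False)
optimizer = create_optimizer_v2(model, opt='adamw', lr=1e-3, weight_decay=0.05)
```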

Oct 31, 2024

Add a set of new very well trained ResNet & ResNet-V2 18/34 (basic block) weights. See https://huggingface.co/blog/rwightman/resnet-trick-or-treat

Oct 19, 2024

  • Clean up torch amp usage to avoid CUDA-specific calls; merge support for Ascend (NPU) devices from MengqingCao, which should now work in PyTorch 2.5 w/ the new device extension autoloading feature. Intel Arc (XPU) was tested in PyTorch 2.5 too and (mostly) worked. See the device-agnostic sketch below.
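
The device-agnostic pattern, roughly (a generic PyTorch sketch, not timm's exact code):

```python
import torch

# pick whatever accelerator is present; NPU/XPU builds register their own device types
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
amp_dtype = torch.float16 if device.type == 'cuda' else torch.bfloat16

x = torch.randn(8, 128, 128, device=device)
# torch.amp.autocast keys off a device_type string instead of the CUDA-only torch.cuda.amp API
with torch.amp.autocast(device_type=device.type, dtype=amp_dtype):
    y = x @ x  # runs in reduced precision where the device supports it
```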

Full Changelog: v1.0.11...v1.0.12

v1.0.11 Release

16 Oct 21:19

Quick turnaround from 1.0.10 to fix an error impacting 3rd-party packages that still import through a deprecated, untested path.

Oct 14, 2024

  • Pre-activation (ResNetV2) versions of 18/18d/34/34d ResNet model defs added by request (weights pending)
  • Release 1.0.10

Oct 11, 2024

  • MambaOut (https://github.com/yuweihao/MambaOut) model & weights added. A cheeky take on SSM vision models w/o the SSM (essentially ConvNeXt w/ gating). A mix of original weights plus custom variations & weights (see the example after the table).
| model | img_size | top1 | top5 | param_count |
|:---|---:|---:|---:|---:|
| mambaout_base_plus_rw.sw_e150_r384_in12k_ft_in1k | 384 | 87.506 | 98.428 | 101.66 |
| mambaout_base_plus_rw.sw_e150_in12k_ft_in1k | 288 | 86.912 | 98.236 | 101.66 |
| mambaout_base_plus_rw.sw_e150_in12k_ft_in1k | 224 | 86.632 | 98.156 | 101.66 |
| mambaout_base_tall_rw.sw_e500_in1k | 288 | 84.974 | 97.332 | 86.48 |
| mambaout_base_wide_rw.sw_e500_in1k | 288 | 84.962 | 97.208 | 94.45 |
| mambaout_base_short_rw.sw_e500_in1k | 288 | 84.832 | 97.27 | 88.83 |
| mambaout_base.in1k | 288 | 84.72 | 96.93 | 84.81 |
| mambaout_small_rw.sw_e450_in1k | 288 | 84.598 | 97.098 | 48.5 |
| mambaout_small.in1k | 288 | 84.5 | 96.974 | 48.49 |
| mambaout_base_wide_rw.sw_e500_in1k | 224 | 84.454 | 96.864 | 94.45 |
| mambaout_base_tall_rw.sw_e500_in1k | 224 | 84.434 | 96.958 | 86.48 |
| mambaout_base_short_rw.sw_e500_in1k | 224 | 84.362 | 96.952 | 88.83 |
| mambaout_base.in1k | 224 | 84.168 | 96.68 | 84.81 |
| mambaout_small.in1k | 224 | 84.086 | 96.63 | 48.49 |
| mambaout_small_rw.sw_e450_in1k | 224 | 84.024 | 96.752 | 48.5 |
| mambaout_tiny.in1k | 288 | 83.448 | 96.538 | 26.55 |
| mambaout_tiny.in1k | 224 | 82.736 | 96.1 | 26.55 |
| mambaout_kobe.in1k | 288 | 81.054 | 95.718 | 9.14 |
| mambaout_kobe.in1k | 224 | 79.986 | 94.986 | 9.14 |
| mambaout_femto.in1k | 288 | 79.848 | 95.14 | 7.3 |
| mambaout_femto.in1k | 224 | 78.87 | 94.408 | 7.3 |
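
Any of the checkpoints above load by name through the usual factory, e.g.:

```python
import timm

model = timm.create_model('mambaout_base_plus_rw.sw_e150_in12k_ft_in1k', pretrained=True)
```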

Release v1.0.10

15 Oct 04:44

Oct 14, 2024

  • Pre-activation (ResNetV2) versions of 18/18d/34/34d ResNet model defs added by request (weights pending)
  • Release 1.0.10

Oct 11, 2024

  • MambaOut (https://github.com/yuweihao/MambaOut) model & weights added. A cheeky take on SSM vision models w/o the SSM (essentially ConvNeXt w/ gating). A mix of original weights plus custom variations & weights.
| model | img_size | top1 | top5 | param_count |
|:---|---:|---:|---:|---:|
| mambaout_base_plus_rw.sw_e150_r384_in12k_ft_in1k | 384 | 87.506 | 98.428 | 101.66 |
| mambaout_base_plus_rw.sw_e150_in12k_ft_in1k | 288 | 86.912 | 98.236 | 101.66 |
| mambaout_base_plus_rw.sw_e150_in12k_ft_in1k | 224 | 86.632 | 98.156 | 101.66 |
| mambaout_base_tall_rw.sw_e500_in1k | 288 | 84.974 | 97.332 | 86.48 |
| mambaout_base_wide_rw.sw_e500_in1k | 288 | 84.962 | 97.208 | 94.45 |
| mambaout_base_short_rw.sw_e500_in1k | 288 | 84.832 | 97.27 | 88.83 |
| mambaout_base.in1k | 288 | 84.72 | 96.93 | 84.81 |
| mambaout_small_rw.sw_e450_in1k | 288 | 84.598 | 97.098 | 48.5 |
| mambaout_small.in1k | 288 | 84.5 | 96.974 | 48.49 |
| mambaout_base_wide_rw.sw_e500_in1k | 224 | 84.454 | 96.864 | 94.45 |
| mambaout_base_tall_rw.sw_e500_in1k | 224 | 84.434 | 96.958 | 86.48 |
| mambaout_base_short_rw.sw_e500_in1k | 224 | 84.362 | 96.952 | 88.83 |
| mambaout_base.in1k | 224 | 84.168 | 96.68 | 84.81 |
| mambaout_small.in1k | 224 | 84.086 | 96.63 | 48.49 |
| mambaout_small_rw.sw_e450_in1k | 224 | 84.024 | 96.752 | 48.5 |
| mambaout_tiny.in1k | 288 | 83.448 | 96.538 | 26.55 |
| mambaout_tiny.in1k | 224 | 82.736 | 96.1 | 26.55 |
| mambaout_kobe.in1k | 288 | 81.054 | 95.718 | 9.14 |
| mambaout_kobe.in1k | 224 | 79.986 | 94.986 | 9.14 |
| mambaout_femto.in1k | 288 | 79.848 | 95.14 | 7.3 |
| mambaout_femto.in1k | 224 | 78.87 | 94.408 | 7.3 |
