Swin-T training on ImageNet-1k does not converge #2468
Unanswered · RuslanGreenhead asked this question in Q&A
Replies: 1 comment 1 reply
Hello everyone!
I am trying to reproduce the original Swin Transformer paper's results on ImageNet-1k classification, using the training configuration stated in the paper:
- batch_size = 1024 (in my case, 2 GPUs * 256 samples each * 2 accumulation steps)
- AdamW, initial_lr = 1e-3, weight_decay = 0.05, grad_clip_norm = 1.0
- 300 epochs: linear warmup for the first 20, then cosine decay
- drop_path = 0.2, all other dropouts disabled
(A minimal sketch of this setup is shown below.)
But the model plateaus at about 35% val top-1 accuracy and does not converge further (the train loss does not come down either). My augmentations are the same as the authors'.
What can cause such a problem, and how can I fix it? I would be grateful for any advice!
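For concreteness, here is a minimal sketch of the configuration above in plain PyTorch. It is illustrative only, not the asker's actual code: `train_loader` is a placeholder for the real ImageNet pipeline, the backbone is assumed to come from `timm.create_model`, the `eta_min` floor is an assumed small value, and the scheduler is stepped per epoch for brevity (the paper schedules per iteration).

```python
import timm
import torch
from torch.optim.lr_scheduler import LinearLR, CosineAnnealingLR, SequentialLR

# Assumed timm backbone; drop_path set as in the question
model = timm.create_model("swin_tiny_patch4_window7_224",
                          num_classes=1000, drop_path_rate=0.2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.05)
criterion = torch.nn.CrossEntropyLoss()  # stand-in; mixup/soft targets omitted

# 20 epochs of linear warmup, then cosine decay over the remaining 280 epochs
warmup = LinearLR(optimizer, start_factor=1e-3, total_iters=20)
cosine = CosineAnnealingLR(optimizer, T_max=280, eta_min=1e-5)  # floor assumed
scheduler = SequentialLR(optimizer, schedulers=[warmup, cosine], milestones=[20])

for epoch in range(300):
    for images, targets in train_loader:  # train_loader: placeholder pipeline
        loss = criterion(model(images), targets)
        loss.backward()
        # clip the global grad norm to 1.0 before each optimizer step
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()
        optimizer.zero_grad()
    scheduler.step()  # stepped once per epoch in this sketch
```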
-
@RuslanGreenhead try turning off accumulation (not that well tested and probably not worthwhile here) and set the lr to 5e-4.
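Following up on the accumulation point, here is a hedged sketch (hypothetical loop, reusing the names from the sketch above, not the actual training code) of what a correct 2-step accumulation loop has to get right: the loss averaged over micro-batches, and clipping and stepping only on the accumulation boundary. A slip in either quietly changes the effective optimization. Note also that without accumulation the global batch becomes 2 * 256 = 512, and the common linear-scaling heuristic gives 1e-3 * 512 / 1024 = 5e-4, which matches the suggested lr.

```python
accum_steps = 2  # hypothetical 2-step accumulation, as described in the question

optimizer.zero_grad()
for i, (images, targets) in enumerate(train_loader):
    # average the loss so accumulated gradients match one large batch
    # of accum_steps * per_step_batch samples
    loss = criterion(model(images), targets) / accum_steps
    loss.backward()
    if (i + 1) % accum_steps == 0:
        # clip and step ONLY on the accumulation boundary; clipping or
        # stepping every micro-batch gives a different optimization
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()
        optimizer.zero_grad()
```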