Swin-T training on ImageNet-1k does not converge #2468
Unanswered · RuslanGreenhead asked this question in Q&A
Replies: 1 comment 1 reply
Hello everyone!
I am trying to reproduce the original Swin Transformer paper's results on ImageNet-1k classification, using the training configuration stated in the paper:
- batch_size = 1024 (in my case, 2 GPUs * 256 samples each * 2 accumulation steps)
- AdamW, initial_lr = 1e-3, weight_decay = 0.05, grad_clip_norm = 1.0
- 300 epochs: linear warmup for the first 20, then cosine decay
- drop_path = 0.2, all other dropouts disabled
(A minimal sketch of this setup is shown below.)
But the model plateaus at about 35% val top-1 accuracy and does not converge further (the train loss does not come down either). My augmentations are the same as the authors'.
What can cause such a problem, and how can I fix it? I would be grateful for any advice!
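For concreteness, here is a minimal sketch of the configuration above in plain PyTorch. It is illustrative only, not the asker's actual code: `train_loader` is a placeholder for the real ImageNet pipeline, the backbone is assumed to come from `timm.create_model`, the `eta_min` floor is an assumed small value, and the scheduler is stepped per epoch for brevity (the paper schedules per iteration).

```python
import timm
import torch
from torch.optim.lr_scheduler import LinearLR, CosineAnnealingLR, SequentialLR

# Assumed timm backbone; drop_path set as in the question
model = timm.create_model("swin_tiny_patch4_window7_224",
                          num_classes=1000, drop_path_rate=0.2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.05)
criterion = torch.nn.CrossEntropyLoss()  # stand-in; mixup/soft targets omitted

# 20 epochs of linear warmup, then cosine decay over the remaining 280 epochs
warmup = LinearLR(optimizer, start_factor=1e-3, total_iters=20)
cosine = CosineAnnealingLR(optimizer, T_max=280, eta_min=1e-5)  # floor assumed
scheduler = SequentialLR(optimizer, schedulers=[warmup, cosine], milestones=[20])

for epoch in range(300):
    for images, targets in train_loader:  # train_loader: placeholder pipeline
        loss = criterion(model(images), targets)
        loss.backward()
        # clip the global grad norm to 1.0 before each optimizer step
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()
        optimizer.zero_grad()
    scheduler.step()  # stepped once per epoch in this sketch
```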
-
@RuslanGreenhead try turning off accumulation (not that well tested and probably not worthwhile here) and set the lr to 5e-4.
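Following up on the accumulation point, here is a hedged sketch (hypothetical loop, reusing the names from the sketch above, not the actual training code) of what a correct 2-step accumulation loop has to get right: the loss averaged over micro-batches, and clipping and stepping only on the accumulation boundary. A slip in either quietly changes the effective optimization. Note also that without accumulation the global batch becomes 2 * 256 = 512, and the common linear-scaling heuristic gives 1e-3 * 512 / 1024 = 5e-4, which matches the suggested lr.

```python
accum_steps = 2  # hypothetical 2-step accumulation, as described in the question

optimizer.zero_grad()
for i, (images, targets) in enumerate(train_loader):
    # average the loss so accumulated gradients match one large batch
    # of accum_steps * per_step_batch samples
    loss = criterion(model(images), targets) / accum_steps
    loss.backward()
    if (i + 1) % accum_steps == 0:
        # clip and step ONLY on the accumulation boundary; clipping or
        # stepping every micro-batch gives a different optimization
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()
        optimizer.zero_grad()
```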