ResNet-a2
#1409
2 comments · 4 replies
-
When I run this command on a DGX server, GPU utilization is below 100% most of the time; the GPUs are waiting on the CPU (data loading). So it is much slower than training on my desktop (2× 1080 Ti). Does anybody know why? @rwightman
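If the GPUs are being starved by the input pipeline, one way to confirm it is to time the loader on its own. Below is a minimal sketch using timm's `create_dataset`/`create_loader`; the dataset root, batch size, and worker count are assumptions mirroring the training command in the question below, so adjust them for your setup:

```python
# Sketch: measure input-pipeline throughput in isolation (assumed paths/params).
import time
from timm.data import create_dataset, create_loader

dataset = create_dataset('', root='imagenet/', split='train', is_training=True)
loader = create_loader(
    dataset,
    input_size=(3, 224, 224),
    batch_size=512,     # per-process batch, as in the training command below
    is_training=True,
    use_prefetcher=True,  # default; moves batches to GPU, requires CUDA
    num_workers=16,     # try raising this if the GPUs sit idle
    pin_memory=True,
)

n_batches = 50
start = time.time()
for i, _ in enumerate(loader):
    if i + 1 == n_batches:
        break
elapsed = time.time() - start
print(f'loader alone: {n_batches * 512 / elapsed:.0f} images/sec')
```

If this rate is lower than the images/sec the model trains at, the CPU-side pipeline is the bottleneck; raising `-j`, moving the dataset to local SSD/NVMe, or decoding from a faster dataset format are the usual remedies.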
4 replies
-
FYI, the best cloud setup I've found for training is Lambda Labs GPU Cloud; their 4-GPU A100 or A6000 instances have a decent number of CPUs and fast local SSD that's good enough for a standard ImageNet (files-and-folders) dataset.
-
Hi @rwightman,
I'm currently reproducing ResNet50 with the A2 procedure (ResNet strikes back: An improved training procedure in timm). Can the following command reproduce the A2 training of ResNet50 faithfully?
I'm training on 4 Tesla V100s:
./distributed_train.sh 4 imagenet/ --model resnet50 \
  --aa rand-m7-mstd0.5-inc1 --mixup .1 --cutmix 1.0 --aug-repeats 3 \
  --remode pixel --reprob 0.0 --crop-pct 0.95 --drop-path .05 \
  --smoothing 0.0 --bce-loss --bce-target-thresh 0.2 \
  --opt lamb --weight-decay .02 --sched cosine --epochs 300 \
  --warmup-epochs 5 --lr 5e-3 --warmup-lr 1e-4 \
  -b 512 -j 16 --amp --channels-last --seed 42
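For what it's worth, the global batch size here works out to 4 × 512 = 2048, which is the reference batch size the A2 recipe pairs with `--lr 5e-3`. If the GPU count or per-GPU batch changes, square-root scaling of the learning rate is the usual rule of thumb for LAMB (as discussed around the RSB recipes; worth double-checking for your setup). A quick sanity check:

```python
# Sketch: sanity-check the effective batch size and lr scaling (sqrt rule for
# LAMB is a rule of thumb here, not a guarantee from the recipe).
import math

gpus, per_gpu_batch = 4, 512
global_batch = gpus * per_gpu_batch   # 4 * 512 = 2048, the A2 reference batch
ref_batch, ref_lr = 2048, 5e-3        # batch/lr pairing from the A2 recipe

lr = ref_lr * math.sqrt(global_batch / ref_batch)
print(global_batch, lr)               # 2048 0.005 -> --lr 5e-3 holds as-is
```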