NFNet training throughput #500
-
@aljoschaleonhardt They are slower in PyTorch, especially with the official config for the dm variants; XLA optimizes away some aspects of the model that eager-mode PyTorch doesn't handle well.

For my nfnet_f0s on an RTX 3090, trying to replicate their latencies, I get 306.75 samples/sec, 3.17 ms/sample, 101.347 ms/step. That is at batch size 32 (they report 32 per device for their latencies) and the F0 train-time resolution of 192x192. I can hit 566 img/s training at batch size 256 and 192x192 res. The dm_nfnet_f0 is slower at 245.59 samples/sec, 3.96 ms/sample, 126.800 ms/step. I ran those on the PyTorch NGC 20.12 container.

One thing I noticed is that the perf of these NFNet models, both the dm and my SiLU variants, varies significantly across recent PyTorch/NGC versions and hardware. Definitely do not enable channels-last (if you were doing so). See a recent comparison I did that includes the f0s and my l0c light variant of NFNet: https://gist.github.com/rwightman/bb59f9e245162cee0e38bd66bd8cd77f
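For reference, a minimal sketch of how numbers like these can be measured (an illustrative stand-in, not the actual script behind the gist above): time full forward/backward/optimizer steps with native AMP on synthetic data at batch size 32 and 192x192, the settings quoted above, and report samples/sec, ms/sample, and ms/step. Channels-last is deliberately left off.

```python
# Hypothetical throughput measurement sketch -- not timm's benchmark script.
import time
import torch
import timm


def train_throughput(model_name='nfnet_f0s', batch_size=32, img_size=192,
                     steps=50, warmup=10):
    model = timm.create_model(model_name, pretrained=False).cuda().train()
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    scaler = torch.cuda.amp.GradScaler()

    # Synthetic data so only model + optimizer cost is measured.
    x = torch.randn(batch_size, 3, img_size, img_size, device='cuda')
    y = torch.randint(0, 1000, (batch_size,), device='cuda')

    for i in range(warmup + steps):
        if i == warmup:
            torch.cuda.synchronize()
            start = time.perf_counter()
        opt.zero_grad(set_to_none=True)
        with torch.cuda.amp.autocast():
            loss = torch.nn.functional.cross_entropy(model(x), y)
        scaler.scale(loss).backward()
        scaler.step(opt)
        scaler.update()

    torch.cuda.synchronize()
    elapsed = time.perf_counter() - start
    step_ms = elapsed / steps * 1000
    print(f'{model_name}: {batch_size * steps / elapsed:.2f} samples/sec, '
          f'{step_ms / batch_size:.2f} ms/sample, {step_ms:.3f} ms/step')


if __name__ == '__main__':
    train_throughput()
```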
-
@rwightman,
-
First off, I'm super impressed by how quickly the NFNet-F* implementations and weights landed in `timm`. Absolutely fantastic work @rwightman :)

I've been tinkering with NFNet-F0 on some typical workloads but can't reproduce anything close to the latency values described in the paper (Brock et al., 2021). Running a really bare-bones setup in Lightning, I can push about 110 images per second through the `timm` SiLU version (`nfnet_f0s`) and a little under 100 through `dm_nfnet_f0`. The `efficientnet_b5` in `timm` (which should roughly match the ImageNet performance of NFNet-F0) gives me around 180 images/sec under the exact same conditions (V100, native PyTorch AMP, synthetic data).

According to the paper, EN-B5 should be ~8x slower (measured by time per training step) than F0. @rwightman In your benchmarks, are you able to reproduce their JAX numbers, at least approximately? Any idea where/if I'm on the completely wrong track here?
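For concreteness, here is a stripped-down sketch of what I mean by "the exact same conditions" (a simplified stand-in, not my actual Lightning code): one timed forward/backward step per model on synthetic data at each model's default train resolution under native AMP, with Lightning removed so only raw model throughput is compared. The batch size is an arbitrary assumption and may need lowering for `efficientnet_b5` at its larger resolution.

```python
# Simplified comparison harness (assumed setup, not the original Lightning one).
import time
import torch
import timm


def step_time_ms(model_name, batch_size=32, steps=30, warmup=10):
    model = timm.create_model(model_name, pretrained=False).cuda().train()
    # Use each model's own default train resolution from its timm config.
    img_size = model.default_cfg['input_size'][-1]
    x = torch.randn(batch_size, 3, img_size, img_size, device='cuda')
    y = torch.randint(0, 1000, (batch_size,), device='cuda')

    for i in range(warmup + steps):
        if i == warmup:
            torch.cuda.synchronize()
            start = time.perf_counter()
        with torch.cuda.amp.autocast():
            loss = torch.nn.functional.cross_entropy(model(x), y)
        loss.backward()  # optimizer step omitted for brevity
        model.zero_grad(set_to_none=True)

    torch.cuda.synchronize()
    return (time.perf_counter() - start) / steps * 1000


for name in ('nfnet_f0s', 'dm_nfnet_f0', 'efficientnet_b5'):
    print(f'{name}: {step_time_ms(name):.1f} ms/step')
```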
Thanks in advance for everyone's opinion!