Hi again, I've been looking into compressing some of the models in the repo. For this, one usually needs the augmentation hyper-parameters used for training or fine-tuning the model (so the input pipeline matches), and the LR schedule (since fine-tuning usually starts at a slightly higher LR). I was wondering if these could be shared somewhere.
Many thanks,
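For context, the closest thing I've found so far is the eval-time data config that timm stores alongside each pretrained model. A minimal sketch (the model name is just an example); this only covers input size / interpolation / normalization, not the training-time augmentation or LR schedule I'm asking about:

```python
# Sketch: recover the eval-time data config timm keeps with a pretrained model.
import timm
from timm.data import resolve_data_config, create_transform

model = timm.create_model('resnet50', pretrained=True)  # example model only
config = resolve_data_config({}, model=model)            # input_size, mean, std, crop_pct, ...
transform = create_transform(**config)                   # matching eval transform
print(config)
```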
I'm not sure when/if I'll end up reporting all pretraining hparams consistently. It's extra overhead, and more importantly there will likely be a big compat break at some point when I change the config system for the future timm bits code. There will be a dump of quite a few recent hparam sets, covering several different strategies and model-specific procedures involving a few different optimizer + LR schedule combos. This will be timed with an upcoming paper.
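For illustration only, a minimal sketch of what one such optimizer + LR schedule combo looks like using timm's factories; the values here are placeholders, not the hparams used for any released weights:

```python
# Sketch: one optimizer + LR schedule combo built with timm's factory helpers.
import timm
from timm.optim import create_optimizer_v2
from timm.scheduler import CosineLRScheduler

model = timm.create_model('resnet50')  # example model only
optimizer = create_optimizer_v2(model, opt='adamw', lr=1e-3, weight_decay=0.05)
scheduler = CosineLRScheduler(
    optimizer,
    t_initial=300,       # total epochs (placeholder)
    lr_min=1e-5,
    warmup_t=5,          # warmup epochs (placeholder)
    warmup_lr_init=1e-6,
)

for epoch in range(300):
    # ... train one epoch ...
    scheduler.step(epoch + 1)  # advance the schedule at the end of each epoch
```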
There is research suggesting a relationship between (pre)training hyper-params, augmentations, etc. and how well the weights transfer to various tasks. Usually the fine-tuning aug + reg are held constant in these analyses though. One of the most extensive sets of experiments here was for the ViT architectures in the "How to train your ViT?" paper that I was involved with. There is a big spreadsheet with ~50k transfer weights (https://console.cloud.google.com/storage/browser/_details/vit_models/augreg/index.csv) with the hparams for each pre-training weight listed, in1k vs in21k for pretraining, and an LR sweep for each transferred weight (but aug was low and fixed for transfer). I'd say it's a fairly…
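For anyone who wants to poke at that spreadsheet, a rough sketch follows. The direct storage.googleapis.com URL is inferred from the console link above, and the column names in the commented-out filter are assumptions, so inspect df.columns first:

```python
# Sketch: browse the AugReg index CSV with pandas.
import pandas as pd

url = 'https://storage.googleapis.com/vit_models/augreg/index.csv'  # assumed public URL for the bucket above
df = pd.read_csv(url)
print(df.columns)  # check the real column names before filtering
# Hypothetical filter, column names are guesses:
# subset = df[(df['name'] == 'B/16') & (df['ds'] == 'i21k')]
```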