Is it just me, or are the learning rate schedulers backwards? #918
Replies: 3 comments 1 reply
-
There is no error here. This is the correct mode of operation for warmup (red line).
-
I've always thought the same thing too: mash the data in with a higher learning rate and apply the finishing touches with a lower one. Something about DeepFaceLab and playing with that years ago... cosine_annealing will get you the desired effect, starting at a higher learning rate and declining to the specified final target rate, i.e. the inverse of that graph. I've been training faces with cosine annealing and the results are pretty good. I do think I've had better results by running sequential LRs on the same model, with something like: 4e-6 for 10 epochs, 2e-6 for 40 epochs, 1e-6 for 70 epochs, and finishing with 5e-7 for 30 epochs. (That's running a BS of 1 on a 768 model too.)
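(For anyone who wants to reproduce that staged schedule, here's a rough sketch using PyTorch's `LambdaLR`. The stage boundaries are just the epoch counts from the post above; the model and optimizer are placeholders, and this is not how the webui itself implements its schedulers.)

```python
import torch

# Placeholder model/optimizer; the base LR is the first stage's 4e-6.
model = torch.nn.Linear(4, 4)
optimizer = torch.optim.AdamW(model.parameters(), lr=4e-6)

# (end_epoch, multiplier of the base LR) for each stage:
# 4e-6 for 10 epochs, 2e-6 for 40, 1e-6 for 70, 5e-7 for the last 30.
stages = [(10, 1.0), (50, 0.5), (120, 0.25), (150, 0.125)]

def staged_lr(epoch):
    for end_epoch, factor in stages:
        if epoch < end_epoch:
            return factor
    return stages[-1][1]

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=staged_lr)

for epoch in range(150):
    # ... one epoch of training (BS 1) would go here ...
    optimizer.step()
    scheduler.step()
```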
-
Warmup is there to prevent early overfitting; it's actually a good thing.
-
I set constant with warmup to go over 200 steps since, as the tooltip says, "learning rate will start at 0 and increase over this many steps",
but surely you want to start with a more aggressive learning rate and then, over time, use a finer and finer "brush", as it were.
I set an LR of 2e-5 and a warmup of 200 steps, and it starts at 1e-7 and works its way up to 2e-5.
This seems counter-intuitive to me, as when doing TI training you start with a higher learning rate and then lower it over time as you pursue finer and finer details.
I'm going to do some testing with manually changing the learning rate on resumes to see what happens, but I think the schedulers are backwards.
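For reference, the numbers reported above are consistent with a plain linear warmup into a constant rate. A minimal sketch of that curve (my own approximation of the described behaviour, not the extension's actual code):

```python
def constant_with_warmup(step, base_lr=2e-5, warmup_steps=200):
    """Ramp linearly from near 0 up to base_lr over warmup_steps, then hold constant."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr

print(constant_with_warmup(0))     # 1e-07 -- the low starting value observed above
print(constant_with_warmup(199))   # 2e-05 -- full rate reached at the end of warmup
print(constant_with_warmup(1000))  # 2e-05 -- constant from then on
```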