Is it just me, or are the learning rate schedulers backwards? #918
Replies: 3 comments 1 reply
-
There is no error here. This is the correct mode of operation for warmup (red line).
-
I've always thought the same thing too: mash the data in with a higher learning rate and apply the finishing touches with a lower one. Something about DeepFaceLab and playing with that years ago... cosine_annealing will get you the desired effect, starting at a higher learning rate and declining to the specified final target rate, i.e. the inverse of that graph. I've been training faces with cosine annealing and the results are pretty good. I do think I've had better results by running sequential LRs on the same model, with something like: 4e-6 for 10 epochs, 2e-6 for 40 epochs, 1e-6 for 70 epochs, and finishing with 5e-7 for 30 epochs. (That's running a BS of 1 on a 768 model too.)
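(For anyone who wants to reproduce that staged schedule, here's a rough sketch using PyTorch's `LambdaLR`. The stage boundaries are just the epoch counts from the post above; the model and optimizer are placeholders, and this is not how the webui itself implements its schedulers.)

```python
import torch

# Placeholder model/optimizer; the base LR is the first stage's 4e-6.
model = torch.nn.Linear(4, 4)
optimizer = torch.optim.AdamW(model.parameters(), lr=4e-6)

# (end_epoch, multiplier of the base LR) for each stage:
# 4e-6 for 10 epochs, 2e-6 for 40, 1e-6 for 70, 5e-7 for the last 30.
stages = [(10, 1.0), (50, 0.5), (120, 0.25), (150, 0.125)]

def staged_lr(epoch):
    for end_epoch, factor in stages:
        if epoch < end_epoch:
            return factor
    return stages[-1][1]

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=staged_lr)

for epoch in range(150):
    # ... one epoch of training (BS 1) would go here ...
    optimizer.step()
    scheduler.step()
```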
-
Warmup is there to prevent early overfitting; it's actually a good thing.
-
I set constant with warmup to go over 200 steps since, as the tooltip says, "learning rate will start at 0 and increase over this many steps",
but surely you want to start with a more aggressive learning rate and then, over time, use a finer and finer "brush", as it were.
I set an LR of 2e-5 and a warmup of 200 steps, and it starts at 1e-7 and works its way up to 2e-5.
This seems counter-intuitive to me, as when doing TI training you start with a higher learning rate and then lower it over time as you pursue finer and finer details.
I'm going to do some testing with manually changing the learning rate on resumes to see what happens, but I think the schedulers are backwards.
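For reference, the numbers reported above are consistent with a plain linear warmup into a constant rate. A minimal sketch of that curve (my own approximation of the described behaviour, not the extension's actual code):

```python
def constant_with_warmup(step, base_lr=2e-5, warmup_steps=200):
    """Ramp linearly from near 0 up to base_lr over warmup_steps, then hold constant."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr

print(constant_with_warmup(0))     # 1e-07 -- the low starting value observed above
print(constant_with_warmup(199))   # 2e-05 -- full rate reached at the end of warmup
print(constant_with_warmup(1000))  # 2e-05 -- constant from then on
```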