Is anybody successfully training 2.1 768 based models on a 12GB card? #1161
-
Yes, but only with full fp16 training in the kohya_ss implementation of Dreambooth. I don't know how much worse the image quality is compared to mixed-precision training, though.
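For anyone looking for concrete flags, here is a minimal sketch of a full-fp16 DreamBooth run for a 2.1 768-v model using kohya-ss sd-scripts conventions. The paths, learning rate, and step count are placeholders, and flag names can differ between sd-scripts versions, so treat it as a starting point rather than a known-good recipe.

```
REM Hypothetical sketch -- paths, LR and step count are placeholders,
REM and kohya's usual dataset folder-naming convention is assumed.
REM --v2 and --v_parameterization are needed for SD 2.1 768-v models;
REM --full_fp16 keeps gradients in fp16 to squeeze into 12 GB.
accelerate launch train_db.py ^
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-2-1" ^
  --v2 --v_parameterization ^
  --resolution="768,768" ^
  --train_data_dir="C:\train\img" ^
  --output_dir="C:\train\out" ^
  --train_batch_size=1 ^
  --learning_rate=1e-6 ^
  --max_train_steps=3000 ^
  --mixed_precision="fp16" --full_fp16 --save_precision="fp16" ^
  --gradient_checkpointing --cache_latents --xformers
```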
-
OK, just tried it, and it looks like the training started on an RTX 3060 12GB. But as I mentioned, this GPU is not connected to a monitor, as my system has several GPUs. These are the settings I tried: [settings screenshot, ending at "Preparing Dataset (With Caching)"] Usage is 11,735 MB, so it looks like I still have a little bit of reserve: C:\Users\user>nvidia-smi [output not shown]
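For reference, a lightweight way to keep an eye on that headroom while a run is in progress (standard nvidia-smi options, nothing kohya-specific):

```
REM Print GPU name and VRAM used/total every 5 seconds
nvidia-smi --query-gpu=name,memory.used,memory.total --format=csv -l 5
```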
-
BTW, the training is extremely long, not only in wall-clock time but also in epochs. I'm training a cartoon character, and at 200 epochs it is still a total mess; at about 500 it is decent but still quite different from the original. So it looks like it will take up to 1000 epochs to train, which is hours of GPU time.
-
I can get training working with LoRA, although I haven't had great results with the different settings I've tried. Ideally I'd prefer to train a full model properly, but I've turned on every memory-saving feature I can find, including disabling text encoder training and even disabling previews, and it still runs out of memory after about 26 steps no matter what I do.
Perhaps I'm missing something. Does anybody have advice for training a 2.1 768-based model on a 12GB card, or recommendations for LoRA settings? Since I wanted to get as close to training a full model as possible, I increased my LoRA rank to the maximum and saved the result as a checkpoint when training was done. While there is some resemblance to my samples, the results overall are still nowhere near as good as what I get when I train a 1.5-based model without LoRA. I've tried higher and lower learning rates: the default LoRA rates seemed to overtrain very quickly, even with a polynomial schedule with a high power, while the lower rates (closer to those I use without LoRA) worked to a point but didn't really refine smaller details very well. Maybe something in between would work better, but it seems like that would take a lot of trial and error to figure out, so recommendations would be welcome.
Ideally I'd prefer to be able to train without Lora if it's possible, but maybe 12GB just isn't enough for that.
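Not an authoritative fix for the out-of-memory issue, but for comparing notes against the settings described above: a hedged sketch of a LoRA run with the memory savers and the polynomial schedule mentioned, again using kohya-ss sd-scripts conventions. The rank, alpha, learning rate, schedule power, and paths are made-up placeholders, not recommendations, and exact flag names may vary by version.

```
REM Hypothetical LoRA sketch -- rank/alpha/LR/power are placeholders.
REM --network_train_unet_only disables text encoder training;
REM gradient checkpointing, latent caching and xformers are the main VRAM savers.
accelerate launch train_network.py ^
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-2-1" ^
  --v2 --v_parameterization ^
  --resolution="768,768" ^
  --train_data_dir="C:\train\img" ^
  --output_dir="C:\train\out" ^
  --network_module=networks.lora ^
  --network_dim=128 --network_alpha=128 ^
  --network_train_unet_only ^
  --unet_lr=1e-4 ^
  --lr_scheduler="polynomial" --lr_scheduler_power=4 ^
  --train_batch_size=1 ^
  --max_train_steps=3000 ^
  --mixed_precision="fp16" --gradient_checkpointing --cache_latents --xformers
```

The intent of the sketch is only to make it easy to diff against your own command line when comparing which memory-saving options are actually active.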