Questions about training and training stability #11
Comments
If I can get VAE training code that works reliably and isn't a mess to read, I'll definitely release it; still experimenting with various approaches for that, though… For the TAESD weights in the repo, my optimizer was …
BTW, it looks like the MosaicML folks are working on a cleaned-up version of the Latent Diffusion VAE training code here: mosaicml/diffusion#79
Adding some more info here.

**Changes in 1.2**

For TAESD 1.2, I removed the LPIPS and other icky hand-coded losses (now just using adversarial + very faint low-res MSE). I also added adversarial loss to the encoder training (though I'm not sure it made a difference).
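As a rough sketch, a decoder loss of that shape might look like the following (the discriminator call, the non-saturating adversarial formulation, the 8× pooling factor, and the loss weight are all illustrative assumptions on my part, not the released training recipe):

```python
import torch.nn.functional as F

def decoder_loss(decoded, target, disc, lowres_mse_weight=0.1):
    """Hypothetical "adversarial + very faint low-res MSE" decoder loss.
    The weight and pooling factor are illustrative guesses, not the
    actual TAESD 1.2 values."""
    # Non-saturating adversarial term: push discriminator scores on
    # decoded images toward "real".
    adv = F.softplus(-disc(decoded)).mean()
    # Very faint MSE on heavily downsampled images, to anchor global
    # color and structure without over-smoothing fine detail.
    lowres_mse = F.mse_loss(
        F.avg_pool2d(decoded, 8), F.avg_pool2d(target, 8)
    )
    return adv + lowres_mse_weight * lowres_mse
```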
**Various figures**

**Color augmentation**

Color augmentation (occasional hue / saturation shifting of input images) helped improve reproduction of saturated colors, which are otherwise rare in aesthetic datasets (figure omitted; see the sketch after these figures).

**Downsides of different losses**

MSE/MAE can make everything very smooth (figure omitted; top is GT, bottom is a simple MSE-only decoder).

LPIPS can cause recognizable artifacts on faces & eyes (figure omitted; top is from a run with LPIPS, bottom is a run without it).

Adversarial loss can cause divergence if not handled properly (figure omitted).

**Blue eyes**

I don't remember what caused this (figure omitted).
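Illustratively, the occasional color augmentation described above could be implemented like this (the application probability and jitter strengths are made-up values for the sketch, not the ones used for TAESD):

```python
import random
import torchvision.transforms as T

# Hypothetical occasional hue/saturation augmentation; the probability
# and jitter ranges here are illustrative, not the TAESD training values.
color_jitter = T.ColorJitter(saturation=0.5, hue=0.3)

def maybe_color_augment(image, p=0.2):
    """Shift hue/saturation on a fraction of training images, so that
    saturated colors (rare in aesthetic datasets) are better covered."""
    return color_jitter(image) if random.random() < p else image
```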
Hello Ollin, I'm a machine learning beginner and want to train a TAESD (maybe a modified version) from scratch.
@bcw222 I posted https://github.com/madebyollin/seraena/blob/main/TAESDXL_Training_Example.ipynb, which should work as a starting point (and it does most of the complicated adversarial-loss part). You can try adding additional pooled-MSE / LPIPS losses to speed up convergence.
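If you do add those auxiliary losses, a minimal sketch could look like this (the use of the `lpips` package, the pooling factor, and the weights are my own assumptions, not taken from the notebook):

```python
import torch.nn.functional as F
import lpips  # pip install lpips

# Perceptual distance network; expects inputs scaled to [-1, 1].
lpips_loss = lpips.LPIPS(net="vgg")

def aux_losses(decoded, target, pool=4, w_pooled=1.0, w_lpips=0.1):
    """Optional pooled-MSE + LPIPS terms to speed up convergence.
    Pooling factor and weights are illustrative guesses."""
    pooled_mse = F.mse_loss(
        F.avg_pool2d(decoded, pool), F.avg_pool2d(target, pool)
    )
    perceptual = lpips_loss(decoded, target).mean()
    return w_pooled * pooled_mse + w_lpips * perceptual
```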
Hello,
Thank you so much for all your awesome work. It is really great stuff!
I have some questions about training (not answered by #2). First, if possible, would you be able to release your training code? It would be super helpful.
I'm asking because I'm looking to train autoencoders that combine multiple types of image information (e.g. RGB, depth, segmentation). I have a training script at the moment (based on diffusers here), but I'm finding that training is very unstable; I'm constantly getting NaNs. Did you find that any tricks were necessary to finetune the VAE without getting NaNs? What learning rate / batch size did you use?
Also, if you have any more experiments that you want to run, I have access to some good resources (8 A100s), so let me know. I feel like these autoencoders are quite under-explored and a lot more (semantic and geometric) information could be integrated into them.
Best,
greeneggsandyaml
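For reference, the stabilization tricks that commonly come up for this kind of NaN problem look something like the following (a generic sketch of standard practice, not the recipe used for TAESD; `model.training_loss` is a hypothetical method):

```python
import torch

def stable_step(model, optimizer, batch, max_grad_norm=1.0):
    """Common VAE-finetuning stabilization tricks (a sketch of standard
    practice): skip non-finite losses and clip gradients per step."""
    loss = model.training_loss(batch)  # hypothetical loss method
    if not torch.isfinite(loss):
        optimizer.zero_grad(set_to_none=True)
        return None  # skip batches that produce NaN/Inf losses
    loss.backward()
    # Clip the global gradient norm before stepping the optimizer.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
    optimizer.step()
    optimizer.zero_grad(set_to_none=True)
    return loss.item()
```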