Questions about training and training stability #11
Comments
If I can get VAE training code that works reliably and isn't a mess to read, I'll definitely release it; still experimenting with various approaches for that, though… For the TAESD weights in the repo, my optimizer was …
BTW, it looks like the MosaicML folks are working on a cleaned-up version of the Latent Diffusion VAE training code here: mosaicml/diffusion#79
Adding some more info here.

**Changes in 1.2**

For TAESD 1.2, I removed the LPIPS and other icky hand-coded losses (now just using adversarial + very faint low-res MSE). I also added adversarial loss to the encoder training (though I'm not sure it made a difference).
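As a rough sketch, a decoder loss of that shape might look like the following (the discriminator call, the non-saturating adversarial formulation, the 8× pooling factor, and the loss weight are all illustrative assumptions on my part, not the released training recipe):

```python
import torch.nn.functional as F

def decoder_loss(decoded, target, disc, lowres_mse_weight=0.1):
    """Hypothetical "adversarial + very faint low-res MSE" decoder loss.
    The weight and pooling factor are illustrative guesses, not the
    actual TAESD 1.2 values."""
    # Non-saturating adversarial term: push discriminator scores on
    # decoded images toward "real".
    adv = F.softplus(-disc(decoded)).mean()
    # Very faint MSE on heavily downsampled images, to anchor global
    # color and structure without over-smoothing fine detail.
    lowres_mse = F.mse_loss(
        F.avg_pool2d(decoded, 8), F.avg_pool2d(target, 8)
    )
    return adv + lowres_mse_weight * lowres_mse
```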
**Various figures**

**Color augmentation**

Color augmentation (occasional hue / saturation shifting of input images) helped improve reproduction of saturated colors, which are otherwise rare in aesthetic datasets (figure omitted; see the sketch after these figures).

**Downsides of different losses**

MSE/MAE can make everything very smooth (figure omitted; top is GT, bottom is a simple MSE-only decoder).

LPIPS can cause recognizable artifacts on faces & eyes (figure omitted; top is from a run with LPIPS, bottom is a run without it).

Adversarial loss can cause divergence if not handled properly (figure omitted).

**Blue eyes**

I don't remember what caused this (figure omitted).
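Illustratively, the occasional color augmentation described above could be implemented like this (the application probability and jitter strengths are made-up values for the sketch, not the ones used for TAESD):

```python
import random
import torchvision.transforms as T

# Hypothetical occasional hue/saturation augmentation; the probability
# and jitter ranges here are illustrative, not the TAESD training values.
color_jitter = T.ColorJitter(saturation=0.5, hue=0.3)

def maybe_color_augment(image, p=0.2):
    """Shift hue/saturation on a fraction of training images, so that
    saturated colors (rare in aesthetic datasets) are better covered."""
    return color_jitter(image) if random.random() < p else image
```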
Hello Ollin, I'm a machine learning beginner and want to train a TAESD (maybe a modified version) from scratch.
@bcw222 I posted https://github.com/madebyollin/seraena/blob/main/TAESDXL_Training_Example.ipynb, which should work as a starting point (and it does most of the complicated adversarial-loss part). You can try adding additional pooled-MSE / LPIPS losses to speed up convergence.
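If you do add those auxiliary losses, a minimal sketch could look like this (the use of the `lpips` package, the pooling factor, and the weights are my own assumptions, not taken from the notebook):

```python
import torch.nn.functional as F
import lpips  # pip install lpips

# Perceptual distance network; expects inputs scaled to [-1, 1].
lpips_loss = lpips.LPIPS(net="vgg")

def aux_losses(decoded, target, pool=4, w_pooled=1.0, w_lpips=0.1):
    """Optional pooled-MSE + LPIPS terms to speed up convergence.
    Pooling factor and weights are illustrative guesses."""
    pooled_mse = F.mse_loss(
        F.avg_pool2d(decoded, pool), F.avg_pool2d(target, pool)
    )
    perceptual = lpips_loss(decoded, target).mean()
    return w_pooled * pooled_mse + w_lpips * perceptual
```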
Hello,
Thank you so much for all your awesome work. It is really great stuff!
I have some questions about training (not answered by #2). First, if possible, would you be able to release your training code? It would be super helpful.
I'm asking because I'm looking to train autoencoders that combine multiple types of image information (e.g. RGB, depth, segmentation). I have a training script at the moment (based on diffusers here), but I'm finding that training is very unstable; I'm constantly getting NaNs. Did you find that any tricks were necessary to finetune the VAE without getting NaNs? What learning rate / batch size did you use?
Also, if you have any more experiments that you want to run, I have access to some good resources (8 A100s), so let me know. I feel like these autoencoders are quite under-explored and a lot more (semantic and geometric) information could be integrated into them.
Best,
greeneggsandyaml
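For reference, the stabilization tricks that commonly come up for this kind of NaN problem look something like the following (a generic sketch of standard practice, not the recipe used for TAESD; `model.training_loss` is a hypothetical method):

```python
import torch

def stable_step(model, optimizer, batch, max_grad_norm=1.0):
    """Common VAE-finetuning stabilization tricks (a sketch of standard
    practice): skip non-finite losses and clip gradients per step."""
    loss = model.training_loss(batch)  # hypothetical loss method
    if not torch.isfinite(loss):
        optimizer.zero_grad(set_to_none=True)
        return None  # skip batches that produce NaN/Inf losses
    loss.backward()
    # Clip the global gradient norm before stepping the optimizer.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
    optimizer.step()
    optimizer.zero_grad(set_to_none=True)
    return loss.item()
```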