Does diffusers have any automated testing to ensure the same inputs give the same outputs between versions? #11057
-
Hi, yes, we have tests that ensure that. You didn't say which model you were training, but you can read the tests yourself; for SDXL, for example, there is this test among others. All the pipelines have similar tests. Also, there was a change in the expected slices because transformers added FA2 (I think), which affected the text encoders. That sounds similar to what you're describing, but if I understand correctly, you're saying that recent diffusers versions give worse results?
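As a rough illustration of how those pipeline "slice" tests work: the pipeline is run with a fixed seed, a small slice of the output array is taken, and it is compared against hardcoded values recorded from a known-good version. This is a minimal sketch of the comparison step only; `check_output_slice` is a hypothetical helper, not diffusers' actual test code.

```python
import numpy as np

def check_output_slice(image: np.ndarray, expected_slice: np.ndarray,
                       atol: float = 1e-3) -> bool:
    """Compare the last few values of a generated image array against a
    stored reference slice, within an absolute tolerance."""
    actual_slice = image.flatten()[-len(expected_slice):]
    return bool(np.allclose(actual_slice, expected_slice, atol=atol))

# Usage: `expected` would be recorded from a known-good library version,
# then re-checked after every upgrade with the same seed and inputs.
expected = np.array([0.1, 0.2, 0.3])
image = np.zeros((2, 2, 3))
image.flat[-3:] = [0.1001, 0.2, 0.2999]
print(check_output_slice(image, expected))  # True: within tolerance
```

If a dependency change alters the numerics (e.g. a different attention backend), this check fails even when the images still look plausible to the eye.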
-
Heya, thanks, that's good to confirm, and in retrospect it seems pretty likely; it should help me debug this issue without rewriting those same tests. The training was for SD1.5, using diffusers @ a9d3f6 and transformers==4.46.3. I've reverted all my dependencies to much older versions, with requirements initially copied from OneTrainer (github.com/kashif/diffusers.git@a3dc213 and transformers==4.36.2), and have started training again. The issue finally seems to be resolved, but any number of dependencies could have been the cause, assuming it isn't just blind luck. It might also have been an incorrect use of the VAE on my part, since I added this step for SD3 compatibility, and it perhaps should never have been called for SD1.5, which has no shift factor: `if vae.config.shift_factor is not None:` Some other changes were: (old) (new) And many other dependency changes which might have caused it.
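For reference, the latent-scaling convention really does differ between the two model families, so a guard like the one quoted matters. Below is a hedged sketch of the distinction; the `SimpleNamespace` configs are stand-ins for the VAE config, and `encode_latents` is a hypothetical helper (the field names `scaling_factor` and `shift_factor` mirror diffusers' `AutoencoderKL` config).

```python
from types import SimpleNamespace

def encode_latents(raw_latents: float, vae_config) -> float:
    """Apply the latent scaling convention appropriate to the model family."""
    shift = getattr(vae_config, "shift_factor", None)
    if shift is not None:
        # SD3-style VAE: subtract the shift before scaling.
        return (raw_latents - shift) * vae_config.scaling_factor
    # SD1.5-style VAE: no shift factor exists; only scale.
    return raw_latents * vae_config.scaling_factor

# Stand-in configs; the numeric values are the commonly published ones
# for each family, but treat them as illustrative.
sd15_cfg = SimpleNamespace(scaling_factor=0.18215, shift_factor=None)
sd3_cfg = SimpleNamespace(scaling_factor=1.5305, shift_factor=0.0609)

print(encode_latents(1.0, sd15_cfg))  # 0.18215
print(encode_latents(1.0, sd3_cfg))
```

The key point is that applying the shift branch to an SD1.5 VAE (where `shift_factor` is `None` or absent) would silently distort every training latent, which could plausibly cause the kind of degradation described.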
-
I am trying to debug a problem which has appeared in my Stable Diffusion training script: people now frequently generate images with extra limbs or corrupted faces in trained models, and these issues seem perhaps even present in the pre-training previews generated to confirm the model is working, though it is very hard to say. It doesn't seem explainable by any changes in the dataset, and even after filtering to train on only my highest-quality data and excluding any complex poses, the problem has persisted for over a month.
While looking for possible explanations, I noticed that I updated my requirements.txt versions for most packages on the date this problem seems to have started, so I have reverted to a much older set and am trying training again; so far the previews look much higher quality. If this is the cause, there are a dozen possible explanations among torch version changes, optimizer upgrades, etc., though part of me wonders if it might somehow be related to Diffusers or Transformers. (I mostly suspect the issue lies somewhere in the text encoding, as when I tried using an older finetuned text encoder the problem seemed greatly reduced and more in line with previous results.)
If my results do improve, I'm going to have to try upgrading requirements bit by bit and re-training, which is costly in terms of time. So I'm wondering if the core packages such as diffusers and transformers already have tests confirming that their outputs stay consistent between versions, in which case it's probably not worth stressing about those too much.
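One cheap way to narrow down the upgrade-and-retrain loop, sketched below under the assumption that you save reference outputs (e.g. text-encoder embeddings for a few fixed prompts) from the known-good environment: after each dependency bump, regenerate the same embeddings and compare before committing to a full training run. `compare_embeddings` is a hypothetical helper for illustration.

```python
import numpy as np

def compare_embeddings(reference: np.ndarray, candidate: np.ndarray,
                       atol: float = 1e-4):
    """Return (max absolute difference, whether it is within tolerance)."""
    diff = float(np.max(np.abs(reference - candidate)))
    return diff, diff <= atol

# Usage sketch: in the old environment, np.save("ref_embeds.npy", embeds);
# after each upgrade, reload and compare before starting a long run.
ref = np.array([0.5, -1.2, 0.33])
cand = np.array([0.5, -1.2, 0.33005])
diff, ok = compare_embeddings(ref, cand)
print(ok)  # True: drift is below tolerance
```

A failing comparison at a particular upgrade step would point directly at the responsible package without a full retrain, and a fixed-seed image slice check (as the pipelines' own tests do) extends the same idea to the full pipeline.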