Refactor CLIP and update SD3. #2316

Open: wants to merge 8 commits into base: master

james77777778
Collaborator

@james77777778 james77777778 commented Jun 28, 2025

Description of the change

Hi team,
I tried to add a numeric check for SD3 but was unsuccessful. It seems that the output latent differs slightly from the diffusers impl.

Here’s what I tried:

  • Replaced LayerNormalization with a custom layer that matches diffusers' impl.
  • Updated the dtype of the text encoders to float16.
  • Updated the SD3 scheduler to be consistent with diffusers' impl.
  • Fixed the input latent to the same values for both the diffusers pipeline and the keras_hub pipeline.
  • Used diffusers.StableDiffusion3Pipeline.encode_prompt to get the text embeddings and fed them to keras_hub.models.StableDiffusion3TextToImage.
  • Ensured the inputs of the diffusion model (timesteps, sigma, text embeddings and input latent) are consistent with diffusers' impl.
  • Manually ran the denoise_step to verify the latent numerics.

The latent numerics still failed the check with atol=1e-1.
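For readers who want to reproduce this kind of check, here is a minimal sketch of comparing two latents with an absolute tolerance. The helper name `compare_latents`, the latent shape, and the synthetic arrays are assumptions for illustration; they are not taken from the PR or its notebook.

```python
import numpy as np


def compare_latents(reference, candidate, atol=1e-1):
    """Summarize how far `candidate` is from `reference` (hypothetical helper)."""
    reference = np.asarray(reference, dtype=np.float32)
    candidate = np.asarray(candidate, dtype=np.float32)
    abs_diff = np.abs(reference - candidate)
    return {
        "max_abs_diff": float(abs_diff.max()),
        "mean_abs_diff": float(abs_diff.mean()),
        # rtol=0.0 so that only the absolute tolerance decides the outcome.
        "passed": bool(np.allclose(reference, candidate, atol=atol, rtol=0.0)),
    }


# Synthetic stand-ins for the two pipelines' latents (shape is an assumption).
rng = np.random.default_rng(0)
reference_latent = rng.standard_normal((1, 16, 16, 16)).astype(np.float32)
close_latent = reference_latent + 1e-3   # small drift, inside atol
drifted_latent = reference_latent + 0.5  # large drift, outside atol

close_report = compare_latents(reference_latent, close_latent)
drifted_report = compare_latents(reference_latent, drifted_latent)
print(close_report)
print(drifted_report)
```

Reporting the maximum and mean difference alongside the pass/fail flag makes it easier to tell a few outlier elements from a uniform drift.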

However, the generated images look good.

| Model | Generated Image |
| --- | --- |
| stable_diffusion_3.5_medium | [image: stable_diffusion_3.5_medium] |
| stable_diffusion_3.5_large | TBD |
| stable_diffusion_3.5_large_turbo | TBD |

cc @abheesht17

I think this PR still helps narrow the gap between keras_hub's impl and diffusers' impl.
New presets will need to be uploaded.

Reference

Colab Notebook

Checklist

  • I have added all the necessary unit tests for my change.
  • I have verified that my change does not break existing code and works with all backends (TensorFlow, JAX, and PyTorch).
  • My PR is based on the latest changes of the main branch (if unsure, rebase the code).
  • I have followed the Keras Hub Model contribution guidelines in making these changes.
  • I have followed the Keras Hub API design guidelines in making these changes.
  • I have signed the Contributor License Agreement.

# TODO: Deprecate this in favor of
# `keras.layers.LayerNormalization(rms_scaling=True)` once we require Keras
# 3.10 or later.
class RMSNormalization(layers.Layer):

Collaborator

Instead of this, can you try using this layer: https://github.com/keras-team/keras/blob/master/keras/src/layers/normalization/rms_normalization.py#L7?

keras.layers.LayerNormalization(rms_scaling=True) was incorrectly implemented, but keras.layers.RMSNormalization is correct. It should work here.
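The comment does not say what exactly was wrong with rms_scaling=True. For orientation only, here is a NumPy sketch of the textbook RMS normalization next to standard layer normalization; the function names and the epsilon value are assumptions, and this is not the Keras implementation.

```python
import numpy as np


def rms_normalize(x, scale, epsilon=1e-6):
    """Textbook RMS norm: divide by the root mean square, no mean subtraction."""
    x = np.asarray(x, dtype=np.float32)
    mean_square = np.mean(np.square(x), axis=-1, keepdims=True)
    return x / np.sqrt(mean_square + epsilon) * scale


def layer_normalize(x, scale, epsilon=1e-6):
    """Standard layer norm (without bias): subtract the mean, divide by the std."""
    x = np.asarray(x, dtype=np.float32)
    mean = np.mean(x, axis=-1, keepdims=True)
    variance = np.var(x, axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(variance + epsilon) * scale


x = np.array([[1.0, 2.0, 3.0, 4.0]], dtype=np.float32)
scale = np.ones(4, dtype=np.float32)

rms_out = rms_normalize(x, scale)
ln_out = layer_normalize(x, scale)
print(rms_out)
print(ln_out)
```

The two outputs differ whenever the input has a non-zero mean, which is why swapping one for the other shows up in a numeric check.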

Collaborator Author

Unfortunately, keras.layers.RMSNormalization requires keras >= 3.9.

I have added an identical custom layer as a fallback, used when keras.layers doesn't include RMSNormalization.
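A fallback of this kind can be sketched with a plain attribute lookup. Everything below is illustrative: `CustomRMSNormalization`, `BuiltinRMSNormalization`, and `resolve_rms_norm` are hypothetical names, and `SimpleNamespace` objects stand in for the `keras.layers` module so the snippet runs without Keras installed.

```python
from types import SimpleNamespace


class CustomRMSNormalization:
    """Placeholder for the custom fallback layer (hypothetical name)."""


class BuiltinRMSNormalization:
    """Stand-in for the built-in layer shipped by newer Keras versions."""


def resolve_rms_norm(layers_module, fallback=CustomRMSNormalization):
    """Prefer `layers_module.RMSNormalization`; use `fallback` if it is missing."""
    return getattr(layers_module, "RMSNormalization", fallback)


# Stand-ins for `keras.layers` on an older and a newer Keras release.
old_layers = SimpleNamespace()
new_layers = SimpleNamespace(RMSNormalization=BuiltinRMSNormalization)

chosen_on_old = resolve_rms_norm(old_layers)
chosen_on_new = resolve_rms_norm(new_layers)
print(chosen_on_old.__name__, chosen_on_new.__name__)
```

In real code the argument would be the actual `keras.layers` module; checking for the attribute avoids parsing version strings.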

@james77777778
Collaborator Author

james77777778 commented Jun 29, 2025

@abheesht17
I have added the numeric checks for the scheduler and the text encoders. I also fixed the float16 + CPU issue on the torch backend by setting dtype="float32" in CLIP's LayerNormalization.
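The comment does not describe what failed with float16 on CPU. The snippet below only illustrates the general idea of running a normalization in float32 while the surrounding activations stay in float16; it is a NumPy sketch under that assumption, with hypothetical names, not the CLIP code from the PR.

```python
import numpy as np


def layer_norm_float32(x, gamma, beta, epsilon=1e-5):
    """Layer norm computed in float32 regardless of the input dtype."""
    x32 = np.asarray(x, dtype=np.float32)
    mean = x32.mean(axis=-1, keepdims=True)
    variance = x32.var(axis=-1, keepdims=True)
    normed = (x32 - mean) / np.sqrt(variance + epsilon)
    return normed * gamma.astype(np.float32) + beta.astype(np.float32)


rng = np.random.default_rng(0)
hidden_fp16 = rng.standard_normal((2, 8)).astype(np.float16)
gamma = np.ones(8, dtype=np.float16)
beta = np.zeros(8, dtype=np.float16)

# The normalization itself runs in float32 ...
normed = layer_norm_float32(hidden_fp16, gamma, beta)
# ... and the result is cast back for the next float16 layer.
next_layer_input = normed.astype(np.float16)
print(normed.dtype, next_layer_input.dtype)
```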

Using JAX + CUDA:

| Model | Numeric Checks |
| --- | --- |
| stable_diffusion_3.5_medium | 🔶 Scheduler difference: 3.569126e-05; 🔶 Text embeddings difference: 0.0018353278 |

EDITED:
Just spotted some upstream Keras issues. Will try to fix them later. Plz don't merge!

@divyashreepathihalli divyashreepathihalli added the kokoro:force-run Runs Tests on GPU label Jul 10, 2025
@kokoro-team kokoro-team removed the kokoro:force-run Runs Tests on GPU label Jul 10, 2025
@divyashreepathihalli divyashreepathihalli added the kokoro:force-run Runs Tests on GPU label Jul 10, 2025
@kokoro-team kokoro-team removed the kokoro:force-run Runs Tests on GPU label Jul 10, 2025
@divyashreepathihalli
Collaborator

@james77777778 we are seeing a tolerance error in the TF GPU tests, can you please check?

@divyashreepathihalli
Collaborator

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request aims to align the Keras implementation of Stable Diffusion 3 with the diffusers library by updating the layer normalization, scheduler, and text encoder data types. The changes are extensive, touching model definitions, backbones, and the checkpoint conversion script. The introduction of a custom RMSNormalization layer for compatibility and the refactoring of the scheduler logic are significant improvements. The conversion script is also updated to perform numerical validation against diffusers, which is a great step towards ensuring correctness.

My review focuses on improving code clarity and maintainability. I've identified a few opportunities to refactor repetitive code blocks and clarify stateful logic in a custom layer. These changes should make the code easier to read and manage in the future. Overall, this is a solid contribution toward improving the model's fidelity with the reference implementation.

@keras-team keras-team deleted a comment from gemini-code-assist bot Jul 11, 2025
@james77777778 james77777778 force-pushed the update-sd3 branch 2 times, most recently from d48544e to 49743e9 Compare July 12, 2025 16:13
@james77777778 james77777778 added the kokoro:force-run Runs Tests on GPU label Jul 12, 2025
@kokoro-team kokoro-team removed the kokoro:force-run Runs Tests on GPU label Jul 12, 2025
@james77777778 james77777778 added the kokoro:force-run Runs Tests on GPU label Jul 13, 2025
@kokoro-team kokoro-team removed the kokoro:force-run Runs Tests on GPU label Jul 13, 2025
@james77777778
Collaborator Author

james77777778 commented Jul 13, 2025

In this PR:

  • CLIP has been refactored to be consistent with SigLIP.
    • CLIPPreprocessor now accepts prompts and images instead of only images.
    • The CLIP dir has been reorganized to follow the layout used by other tasks.
  • The dtype of the text encoders in SD3 now defaults to float16.
  • The numeric output of FlowMatchEulerDiscreteScheduler is now consistent with diffusers' impl.
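As background for the scheduler item, here is a simplified NumPy sketch of a flow-matching Euler update with a shifted sigma schedule. It is an approximation for illustration: the helper names, the `shift=3.0` default, and the evenly spaced base grid are assumptions, and the exact timestep grid used by diffusers or keras_hub may differ.

```python
import numpy as np


def shifted_sigmas(num_steps, shift=3.0):
    """Sigmas from 1.0 down to 0.0, warped by a time shift (assumed value)."""
    base = np.linspace(1.0, 0.0, num_steps + 1)
    return shift * base / (1.0 + (shift - 1.0) * base)


def euler_step(latent, model_output, sigma, sigma_next):
    """One Euler step along the predicted velocity."""
    return latent + (sigma_next - sigma) * model_output


rng = np.random.default_rng(0)
clean = rng.standard_normal((4, 4))
noise = rng.standard_normal((4, 4))

sigmas = shifted_sigmas(num_steps=8)
latent = noise.copy()  # sigma = 1.0 corresponds to pure noise
for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
    # An ideal model would predict the constant velocity (noise - clean).
    velocity = noise - clean
    latent = euler_step(latent, velocity, sigma, sigma_next)

print(np.abs(latent - clean).max())
```

With an ideal velocity prediction the loop recovers the clean sample, so any remaining gap between two implementations points to the sigma grid or the step arithmetic rather than the model.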

Caution: We need to upload the updated presets for CLIP and SD3 after the merge.

@james77777778 james77777778 changed the title from "Update SD3's LayerNormalization, scheduler and the dtype of the text encoders." to "Refactor CLIP and update SD3." Jul 13, 2025