Refactor `CLIP` and update SD3. #2316

james77777778 · 2025-06-28T15:20:07Z

Description of the change

Hi team,
I tried to add a numeric check for SD3 but was unsuccessful. It seems that the output latent differs slightly from the diffusers impl.

Here’s what I tried:

- Replaced LayerNormalization with a custom layer that matches the diffusers implementation.
- Updated the dtype of the text encoders to float16.
- Updated the SD3 scheduler to be consistent with the diffusers implementation.
- Used the same fixed input latent for both the diffusers pipeline and the keras_hub pipeline.
- Used diffusers.StableDiffusion3Pipeline.encode_prompt to get the text embeddings and fed them to keras_hub.models.StableDiffusion3TextToImage.
- Ensured the inputs of the diffusion model (timesteps, sigma, text embeddings, and input latent) are consistent with the diffusers implementation.
- Manually ran denoise_step to verify the latent numerics.

The latent numerics failed to pass the check with atol=1e-1.

However, the generated images look good.
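For readers who want to reproduce this kind of check, here is a minimal sketch of an absolute-tolerance comparison like the one described above. It is not the code from the PR or the Colab notebook: `report_difference` is a hypothetical helper, and the two arrays are synthetic stand-ins for the diffusers and keras_hub latents.

```python
import numpy as np


def report_difference(reference, candidate, atol=1e-1):
    """Return (max_abs_diff, passed) for two arrays of the same shape."""
    reference = np.asarray(reference, dtype="float32")
    candidate = np.asarray(candidate, dtype="float32")
    max_abs_diff = float(np.max(np.abs(reference - candidate)))
    # rtol=0 so that only the absolute tolerance decides the outcome.
    passed = bool(np.allclose(reference, candidate, rtol=0.0, atol=atol))
    return max_abs_diff, passed


# Synthetic stand-ins for the two latents; shape and values are illustrative.
rng = np.random.default_rng(0)
diffusers_latent = rng.standard_normal((1, 16, 8, 8)).astype("float32")
keras_hub_latent = diffusers_latent + 0.05  # offset below atol
keras_hub_latent_far = diffusers_latent + 0.5  # offset above atol

print(report_difference(diffusers_latent, keras_hub_latent))
print(report_difference(diffusers_latent, keras_hub_latent_far))
```

Reporting the maximum absolute difference alongside the pass/fail flag makes it easier to see how far a failing check is from the tolerance.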

| Model | Generated Image |
| --- | --- |
| `stable_diffusion_3.5_medium` | (image) |
| `stable_diffusion_3.5_large` | TBD |
| `stable_diffusion_3.5_large_turbo` | TBD |

cc @abheesht17

I think this PR still helps narrow the gap between the keras_hub and diffusers implementations.
New presets will need to be uploaded.

Reference

Colab Notebook

Checklist

I have added all the necessary unit tests for my change.
I have verified that my change does not break existing code and works with all backends (TensorFlow, JAX, and PyTorch).
My PR is based on the latest changes of the main branch (if unsure, rebase the code).
I have followed the Keras Hub Model contribution guidelines in making these changes.
I have followed the Keras Hub API design guidelines in making these changes.
I have signed the Contributor License Agreement.

abheesht17 · 2025-06-28T15:36:45Z

keras_hub/src/models/stable_diffusion_3/mmdit.py

+# TODO: Deprecate this in favor of
+# `keras.layers.LayerNormalization(rms_scaling=True)` once we require Keras
+# 3.10 or later.
+class RMSNormalization(layers.Layer):


Instead of this, can you try using this layer: https://github.com/keras-team/keras/blob/master/keras/src/layers/normalization/rms_normalization.py#L7?

keras.layers.LayerNormalization(rms_scaling=True) was incorrectly implemented, but keras.layers.RMSNormalization is correct. It should work here.

james77777778 · 2025-06-28

Unfortunately, keras.layers.RMSNormalization requires keras >= 3.9.

I have added an identical custom layer as a fallback for when keras.layers doesn't include it.
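The fallback idea in this thread can be sketched roughly as follows. This is an illustration only, not the code in mmdit.py: `rms_normalize` is a NumPy reference for the RMS normalization math, while `resolve_layer` and `FallbackRMSNormalization` are hypothetical names. The `SimpleNamespace` objects stand in for `keras.layers` on an older and a newer Keras release, so the snippet runs without Keras installed.

```python
from types import SimpleNamespace

import numpy as np


def rms_normalize(x, scale, epsilon=1e-6):
    """Reference RMS normalization over the last axis, computed in float32."""
    x = np.asarray(x, dtype="float32")
    mean_square = np.mean(np.square(x), axis=-1, keepdims=True)
    return x / np.sqrt(mean_square + epsilon) * scale


def resolve_layer(namespace, name, fallback):
    """Return `namespace.<name>` when it exists, otherwise the fallback."""
    return getattr(namespace, name, fallback)


class FallbackRMSNormalization:
    """Hypothetical stand-in for the custom layer used on older Keras."""


# Stand-ins for `keras.layers` before and after RMSNormalization was added.
old_layers = SimpleNamespace()
new_layers = SimpleNamespace(RMSNormalization=object)

print(resolve_layer(old_layers, "RMSNormalization", FallbackRMSNormalization))
print(resolve_layer(new_layers, "RMSNormalization", FallbackRMSNormalization))

x = np.array([[3.0, 4.0]])
y = rms_normalize(x, scale=np.ones(2), epsilon=0.0)
print(y)
```

Resolving the class once at import time keeps the rest of the model code identical regardless of which Keras version is installed.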

james77777778 · 2025-06-29T07:18:52Z

@abheesht17
I have added the numeric checks for the scheduler and the text encoders. Also fixed the float16 + cpu issue in torch by setting dtype="float32" in CLIP's LayerNormalization.

Using JAX + CUDA:

| Model | Numeric Checks |
| --- | --- |
| `stable_diffusion_3.5_medium` | 🔶 Scheduler difference: 3.569126e-05<br>🔶 Text embeddings difference: 0.0018353278 |
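As a rough illustration of the float16 fix mentioned above, the sketch below normalizes float16 activations in float32. It is a NumPy approximation of the idea, not the Keras or CLIP code: `layer_norm_float32` is a hypothetical helper, and casting the result back to the input dtype is a choice made here for illustration.

```python
import numpy as np


def layer_norm_float32(x, gamma, beta, epsilon=1e-5):
    """Layer-normalize over the last axis in float32, then return x's dtype."""
    x32 = x.astype("float32")
    mean = np.mean(x32, axis=-1, keepdims=True)
    variance = np.var(x32, axis=-1, keepdims=True)
    normed = (x32 - mean) / np.sqrt(variance + epsilon)
    out = normed * gamma.astype("float32") + beta.astype("float32")
    return out.astype(x.dtype)


# Synthetic float16 activations; the hidden size of 768 is illustrative.
rng = np.random.default_rng(0)
hidden = rng.standard_normal((2, 768)).astype("float16")
gamma = np.ones(768, dtype="float16")
beta = np.zeros(768, dtype="float16")

out = layer_norm_float32(hidden, gamma, beta)
print(out.dtype, out.shape)
```

Doing the reduction in float32 keeps the mean and variance accurate even when the surrounding model runs in float16.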

EDITED:
I just spotted some upstream Keras issues and will try to fix them later. Please don't merge yet!

divyashreepathihalli · 2025-07-10T23:54:25Z

@james77777778 we are seeing a numeric tolerance error on TF GPU. Can you please check?

divyashreepathihalli · 2025-07-10T23:54:41Z

/gemini review

gemini-code-assist

Code Review

This pull request aims to align the Keras implementation of Stable Diffusion 3 with the diffusers library by updating the layer normalization, scheduler, and text encoder data types. The changes are extensive, touching model definitions, backbones, and the checkpoint conversion script. The introduction of a custom RMSNormalization layer for compatibility and the refactoring of the scheduler logic are significant improvements. The conversion script is also updated to perform numerical validation against diffusers, which is a great step towards ensuring correctness.

My review focuses on improving code clarity and maintainability. I've identified a few opportunities to refactor repetitive code blocks and clarify stateful logic in a custom layer. These changes should make the code easier to read and manage in the future. Overall, this is a solid contribution toward improving the model's fidelity with the reference implementation.

keras_hub/src/models/stable_diffusion_3/mmdit.py

keras_hub/src/models/stable_diffusion_3/stable_diffusion_3_backbone.py

james77777778 · 2025-07-13T05:58:14Z

In this PR:

- CLIP has been refactored to be consistent with SigLIP.
  - CLIPPreprocessor now accepts prompts and images instead of only images.
  - The CLIP directory has been reorganized to follow other tasks.
- The dtype of the text encoders in SD3 now defaults to float16.
- The numeric output of FlowMatchEulerDiscreteScheduler is now consistent with the diffusers implementation.
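For context on what the scheduler computes, here is a simplified sketch of a flow-matching Euler schedule. It is not the keras_hub or diffusers code: the evenly spaced sigma grid, the `shift=3.0` default, and the helper names `shifted_sigmas` and `euler_step` are assumptions made for illustration. The loop uses the exact velocity of a straight-line path between noise and data, so the Euler steps should land back on the data.

```python
import numpy as np


def shifted_sigmas(num_steps, shift=3.0):
    """Sigmas from 1 down to 0 with a time shift applied (shift is assumed)."""
    sigmas = np.linspace(1.0, 0.0, num_steps + 1)
    return shift * sigmas / (1.0 + (shift - 1.0) * sigmas)


def euler_step(latent, velocity, sigma, sigma_next):
    """One Euler update along the predicted velocity."""
    return latent + (sigma_next - sigma) * velocity


# Synthetic data and noise; a real pipeline would predict the velocity
# with the diffusion model instead of computing it from known endpoints.
rng = np.random.default_rng(0)
data = rng.standard_normal((4, 4))
noise = rng.standard_normal((4, 4))

sigmas = shifted_sigmas(num_steps=8)
latent = noise.copy()
for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
    velocity = noise - data  # exact velocity of the straight-line path
    latent = euler_step(latent, velocity, sigma, sigma_next)

print(np.max(np.abs(latent - data)))
```

Because the sigma differences telescope from 1 to 0, any mismatch in the sigma values between two implementations shows up directly in the latents, which is why the scheduler output is worth checking on its own.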

Caution: We need to upload the updated presets for CLIP and SD3 after the merge.

abheesht17 reviewed Jun 28, 2025

james77777778 mentioned this pull request Jul 3, 2025

Fix rms_normalization and layer_normalization to ensure consistent precision in computations. keras-team/keras#21438

Merged

divyashreepathihalli approved these changes Jul 10, 2025

divyashreepathihalli added the kokoro:force-run Runs Tests on GPU label Jul 10, 2025

kokoro-team removed the kokoro:force-run Runs Tests on GPU label Jul 10, 2025

divyashreepathihalli added the kokoro:force-run Runs Tests on GPU label Jul 10, 2025

kokoro-team removed the kokoro:force-run Runs Tests on GPU label Jul 10, 2025

gemini-code-assist bot reviewed Jul 10, 2025

keras_hub/src/models/stable_diffusion_3/mmdit.py (resolved)

keras_hub/src/models/stable_diffusion_3/stable_diffusion_3_backbone.py (outdated, resolved)

keras-team deleted a comment from gemini-code-assist bot Jul 11, 2025

james77777778 added 5 commits July 12, 2025 23:13

Update SD3 scheduler and the dtype of the text encoders.

f9f256a

Fix the test.

da215e3

Fix torch float16 issues and jax take issue. Add numeric checks for SD3 scheduler and text encoders.

dbb192f

Fix CLIP test.

c5bcfa0

Refactor CLIP models.

a8d8be9

james77777778 force-pushed the update-sd3 branch 2 times, most recently from d48544e to 49743e9, on July 12, 2025 16:13

Update CLIP conversion script.

df3f364

james77777778 force-pushed the update-sd3 branch from 49743e9 to df3f364 on July 12, 2025 16:18

Update from_config.

2e8482c

james77777778 added the kokoro:force-run Runs Tests on GPU label Jul 12, 2025

kokoro-team removed the kokoro:force-run Runs Tests on GPU label Jul 12, 2025

Fix tests.

1353b57

james77777778 added the kokoro:force-run Runs Tests on GPU label Jul 13, 2025

kokoro-team removed the kokoro:force-run Runs Tests on GPU label Jul 13, 2025

james77777778 requested a review from divyashreepathihalli July 13, 2025 06:02

james77777778 changed the title ~~Update SD3's LayerNormalization, scheduler and the dtype of the text encoders.~~ Refactor CLIP and update SD3. Jul 13, 2025
