CogView4 (supports different length c and uc) #10649
base: main
Conversation
Implement the basic CogView4 pipeline structure with the following changes:
- Add CogView4 pipeline implementation
- Implement DDIM scheduler for CogView4
- Add CogView3Plus transformer architecture
- Update embedding models

Current limitations:
- CFG implementation uses padding for sequence length alignment
- Need to verify transformer inference alignment with Megatron

TODO:
- Consider separate forward passes for condition/uncondition instead of the padding approach
…n CogView4 pipeline

Split the forward pass for conditional and unconditional predictions in the CogView4 pipeline to match the original implementation. The noise prediction is now done separately for each case before combining them for guidance. However, the results still need improvement; this is a work in progress, as the generated images do not yet match the expected quality.
@zRzRzRzRzRzRzR @OleehyO Thanks for the PR! I'm going to take a look soon and try to help with debugging Megatron -> Diffusers. I see some additional changes made to files that are not relevant to CogView (currently more than 200 files have been changed 😅). Could you revert those changes by doing something like …
The PR looks good to merge from the modeling and pipeline perspective. For the scheduler review, I will defer to @yiyixuxu. From a quick look, I think we don't need a specific scheduler for CogView4, but YiYi has a better understanding of it.
The outputs don't match the values (numerically; they still match visually) from before I took over with the latest changes. The differences are because:
- RoPE is computed in float32.
- Latents are always kept in float32 and cast to the transformer dtype before the forward pass (sketched below). Previously, it was bfloat16 (latent creation + transformer forward) -> float32 (scheduler step) -> bfloat16 again (transformer forward).
- Changed how `multi_modulate` and `multi_gate` work. This should practically make no difference in outputs, but there seems to be a difference on the order of 1e-4 due to not performing the layer norm on the concatenated `encoder_hidden_states` and `hidden_states`.
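A minimal sketch of the dtype flow described in the second point, assuming a generic flow-matching denoising loop (the function and variable names here are illustrative, not the PR's exact code):

```python
import torch

def denoise(transformer, scheduler, timesteps, latents: torch.Tensor, cond_kwargs: dict) -> torch.Tensor:
    # Latents stay in float32 across the whole loop; only the transformer
    # forward runs in the model dtype (e.g. bfloat16).
    latents = latents.float()
    for t in timesteps:
        noise_pred = transformer(latents.to(transformer.dtype), timestep=t, **cond_kwargs)
        # Scheduler math happens in float32.
        latents = scheduler.step(noise_pred.float(), t, latents).prev_sample
    return latents
```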
The reason for two forward passes in the pipeline is that the conditional and unconditional embeddings can have different sequence lengths.
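For illustration, a hedged sketch of what that looks like in a single denoising step (argument names are assumed, not the exact pipeline code):

```python
def cfg_step(transformer, latents, t, prompt_embeds, negative_prompt_embeds, guidance_scale):
    # Two separate forward passes, because prompt_embeds and
    # negative_prompt_embeds may have different sequence lengths and
    # therefore cannot simply be concatenated into one batch.
    noise_pred_cond = transformer(latents, encoder_hidden_states=prompt_embeds, timestep=t)
    noise_pred_uncond = transformer(latents, encoder_hidden_states=negative_prompt_embeds, timestep=t)
    return noise_pred_uncond + guidance_scale * (noise_pred_cond - noise_pred_uncond)
```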
The model for testing is available here (private): https://huggingface.co/ZP2HF/CogView4-6B-0125. It has the weights in the same format as the model that will be released next week. BUT, these are not the final weights; based on my conversation with Yuxuan, an improved version will be released.
looks good to me! I think we still need to see if we can use an existing scheduler though
```python
from .scheduling_utils import SchedulerMixin


class CogViewScheduler(SchedulerMixin, ConfigMixin):
```
this is still not resolved, no?
@yiyixuxu I think we should be good to merge. Could you review the scheduler-related changes? I've mentioned the relevant parts.
With these changes, we match the implementation from before I took over with the latest updates. Outputs are identical.
```diff
@@ -98,13 +98,17 @@ def __init__(
         use_karras_sigmas: Optional[bool] = False,
         use_exponential_sigmas: Optional[bool] = False,
         use_beta_sigmas: Optional[bool] = False,
+        time_shift_type: str = "exponential",
```
We need this new parameter because CogView4 performs timestep shifting without first exponentiating `mu`.
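For context, a minimal sketch of the two dynamic-shift variants (the exact helper names in the scheduler may differ; `mu` is the resolution-dependent shift, `sigma` is an exponent that defaults to 1.0, and `t` is the sigma value in (0, 1]):

```python
import math

def time_shift_exponential(mu: float, sigma: float, t: float) -> float:
    # Flux-style shifting: mu is exponentiated before shifting.
    return math.exp(mu) / (math.exp(mu) + (1 / t - 1) ** sigma)

def time_shift_linear(mu: float, sigma: float, t: float) -> float:
    # CogView4-style shifting: mu is used directly, without exponentiation.
    return mu / (mu + (1 / t - 1) ** sigma)
```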
cc @hlky here too maybe if you think there's a better way to handle this
This is ok IMO, will let @yiyixuxu comment as well. We'll have 3 types of shifting (non-dynamic, dynamic with exponential shift, dynamic with linear shift) after this, so it can be looked at with the scheduler refactor.
looks good to me!
```python
timesteps = (
    np.linspace(self.scheduler.config.num_train_timesteps, 1.0, num_inference_steps)
    if timesteps is None
    else np.array(timesteps)
)
timesteps = timesteps.astype(np.int64)
```
This is a bit different from what we usually do. We don't use the `self.scheduler.timesteps` returned from the call to `retrieve_timesteps`, because those timesteps are from after resolution-based timestep shifting is applied. For CogView4, it seems like we need to use the timesteps from before applying shifting, but sigmas from after applying shifting.
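In pipeline terms, that would look roughly like the following, assuming a configured `FlowMatchEulerDiscreteScheduler` as `scheduler` and a resolution-dependent `mu` (a sketch, not the PR's exact code):

```python
import numpy as np

num_train_timesteps, num_inference_steps = 1000, 50

# Sigmas are set with the resolution-based shift applied inside the scheduler ...
sigmas = np.linspace(1.0, 1.0 / num_train_timesteps, num_inference_steps)
scheduler.set_timesteps(sigmas=sigmas.tolist(), mu=mu, device=device)

# ... while the timesteps passed to the transformer come from *before* shifting.
timesteps = np.linspace(num_train_timesteps, 1.0, num_inference_steps).astype(np.int64)
```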
that's really weird - but if it's on purpose (not an oversight or something), we need to support it from `retrieve_timesteps`, and also the `set_timesteps` method of the scheduler needs to accept both custom timesteps and custom sigmas. Either that, or maybe add an option to calculate timesteps based on the pre-shifting sigmas.
otherwise I don't think it would function correctly for img2img or training, where you do not start from the first timestep and need to search against `self.scheduler.timesteps` - e.g. this function won't work:
```python
def index_for_timestep(self, timestep, schedule_timesteps=None):
```
We don't have access to the original codebase yet, so it will be hard to check whether it's an oversight. It is weird that we have to do it this way, but if we don't do it (that is, use sigmas corresponding to the timesteps), the final outputs come out with some residual noise.
Also, it seems like in my latest update I made a mistake doing `timesteps.astype(np.float32)`, from some local testing. Basically, we want integer timesteps here first (to round down the float values from `linspace`), but then need float32 timesteps for our scheduler to not raise an error:
diffusers/src/diffusers/schedulers/scheduling_flow_match_euler_discrete.py, lines 366 to 372 at `a0c2299`:

```python
raise ValueError(
    (
        "Passing integer indices (e.g. from `enumerate(timesteps)`) as timesteps to"
        " `EulerDiscreteScheduler.step()` is not supported. Make sure to pass"
        " one of the `scheduler.timesteps` as a timestep."
    ),
)
```
So, it will have to be something like `timesteps.astype(np.int64).astype(np.float32)` to be consistent with the behaviour from when we started updating the PR, and to not error out in our scheduler.
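Concretely, a hedged sketch of that double cast (illustrative values, not the final pipeline code):

```python
import numpy as np

num_train_timesteps, num_inference_steps = 1000, 50
timesteps = np.linspace(num_train_timesteps, 1.0, num_inference_steps)
# Round the float values down to integers first, then cast back to float32 so
# that the scheduler's step() does not reject integer timesteps.
timesteps = timesteps.astype(np.int64).astype(np.float32)
```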
We could return timesteps without shifting, then apply shifting on the fly in `scheduler.step`?
> it will have to be something like `timesteps.astype(np.int64).astype(np.float32)`

ok, it seems like custom timesteps might be the way to go, because this logic here is just really custom (even if we calculate the timesteps without shifting, we also need to do this rounding first)
basically, you need to:
- remove the ValueError about passing sigmas and timesteps at the same time
- add `timesteps` to `set_timesteps`:

```python
timesteps: Optional[List[int]] = None,
```
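A hedged sketch of what `set_timesteps` could look like after those two changes (parameter names follow the diffusers convention, but this is not the merged signature):

```python
from typing import List, Optional, Union

import torch

def set_timesteps(
    self,
    num_inference_steps: Optional[int] = None,
    device: Union[str, torch.device, None] = None,
    sigmas: Optional[List[float]] = None,
    timesteps: Optional[List[float]] = None,  # new: custom timesteps, allowed together with custom sigmas
    mu: Optional[float] = None,
):
    # The previous guard that raised a ValueError when both `sigmas` and
    # `timesteps` were passed would be removed, since CogView4 needs
    # shifted sigmas paired with unshifted timesteps.
    ...
```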
I'll do some more testing with a fresh mind in the morning to verify whether we need differing timesteps vs. sigmas here (to recheck whether it is a possible oversight or not). I do think I did everything correctly when I tried earlier today, and as a result we might need to do what you mentioned, but it wouldn't hurt to delay a little longer and verify again, since it might save us a bunch of changes.
@zRzRzRzRzRzRzR If it would be possible to share just the scheduler-implementation-related files with us, it would really help us understand whether changes are required. No problem if not :) We can wait for the official release from THUDM and update our implementation.
```python
scheduler = FlowMatchEulerDiscreteScheduler(
    base_shift=0.25, max_shift=0.75, base_image_seq_len=256, use_dynamic_shifting=True, time_shift_type="linear"
)
```
@zRzRzRzRzRzRzR The converted checkpoints will also have to update the scheduler config (since we're no longer using the `CogView4DDIMScheduler`).
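For example, a hedged sketch of updating a converted checkpoint, assuming the pipeline class lands as `CogView4Pipeline` and using the private test repo mentioned above (the output path is hypothetical):

```python
from diffusers import CogView4Pipeline, FlowMatchEulerDiscreteScheduler

pipe = CogView4Pipeline.from_pretrained("ZP2HF/CogView4-6B-0125")
# Swap in the flow-match scheduler with the config values from this PR.
pipe.scheduler = FlowMatchEulerDiscreteScheduler(
    base_shift=0.25, max_shift=0.75, base_image_seq_len=256,
    use_dynamic_shifting=True, time_shift_type="linear",
)
pipe.save_pretrained("CogView4-6B-converted")  # writes the updated scheduler config
```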
I have checked, and the modification in this part makes sense.
```python
scheduler = FlowMatchEulerDiscreteScheduler(
    base_shift=0.25, max_shift=0.75, base_image_seq_len=256, use_dynamic_shifting=True, time_shift_type="linear"
)
```
@zRzRzRzRzRzRzR Same comment as above
Based on the current algorithm comparison and the images produced in practice, this change seems to be functioning properly.
Co-authored-by: YiYi Xu <[email protected]>
What does this PR do?
This PR aims to add support for CogView4 in diffusers, where the conditional (c) and unconditional (uc) embeddings can have different lengths.
We have reproduced the algorithm implementation, but this PR still requires further refinement. Currently, the output is pure green noise, so this PR remains in draft status and requires help from @a-r-r-o-w and @yiyixuxu.