
CogView4 (supports different length c and uc) #10649

Open · wants to merge 82 commits into base: main

Conversation

zRzRzRzRzRzRzR (Contributor):

What does this PR do?

This PR aims to add support for CogView4 in diffusers, where the conditional (c) and unconditional (uc) embeddings can have different sequence lengths.

We have reproduced the algorithm implementation, but this PR still requires further refinement. Currently, the output is pure green noise, so this PR remains in draft status and requires help from @a-r-r-o-w and @yiyixuxu.

zRzRzRzRzRzRzR and others added 29 commits January 14, 2025 20:27
Implement the basic CogView4 pipeline structure with the following changes:
- Add CogView4 pipeline implementation
- Implement DDIM scheduler for CogView4
- Add CogView3Plus transformer architecture
- Update embedding models

Current limitations:
- CFG implementation uses padding for sequence length alignment
- Need to verify transformer inference alignment with Megatron

TODO:
- Consider separate forward passes for condition/uncondition
  instead of padding approach
…n CogView4 pipeline

Split the forward pass for conditional and unconditional predictions in the CogView4 pipeline to match the original implementation. The noise prediction is now done separately for each case before combining them for guidance. However, the results still need improvement.
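A minimal sketch of this split (function and argument names are illustrative, not the pipeline's exact code):

def cfg_noise_pred(transformer, latents, prompt_embeds, negative_prompt_embeds, t, guidance_scale):
    # The conditional and unconditional embeds may have different sequence
    # lengths, so they cannot be concatenated into a single batched call.
    noise_pred_cond = transformer(
        hidden_states=latents, encoder_hidden_states=prompt_embeds, timestep=t
    )
    noise_pred_uncond = transformer(
        hidden_states=latents, encoder_hidden_states=negative_prompt_embeds, timestep=t
    )
    # Standard classifier-free guidance combination.
    return noise_pred_uncond + guidance_scale * (noise_pred_cond - noise_pred_uncond)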

This is a work in progress as the generated images are not yet matching expected quality.
a-r-r-o-w (Member):

@zRzRzRzRzRzRzR @OleehyO Thanks for the PR! I'm going to take a look soon and try to help with debugging Megatron -> Diffusers. I see some additional changes made to files that are not relevant to CogView (currently more than 200 files have been changed 😅). Could you revert those changes by doing something like `git restore -s main examples/ scripts/`?

zRzRzRzRzRzRzR changed the title Cogview support with c and uc different length (Not work now) → Cogview support with c and uc different length Feb 6, 2025
zRzRzRzRzRzRzR (Contributor, Author):

Now it is working:
[image]

a-r-r-o-w (Member) left a comment:

The PR looks good to merge from the modeling and pipeline perspective. For the scheduler review, I'll defer to @yiyixuxu. From a quick look, I don't think we need a specific scheduler for CogView4, but YiYi has a better understanding of it.

The outputs don't match the values from before I took over with the latest changes (numerically; they still match visually). The differences are because:

  • RoPE is computed in float32
  • latents are always kept in float32 and cast to the transformer dtype before the forward pass. Previously, it was bfloat16 (latent creation + transformer forward) -> float32 (scheduler step) -> bfloat16 again (transformer forward); see the sketch after this list
  • Changed how multi_modulate and multi_gate work. This should practically make no difference in outputs, but there seems to be a difference on the order of 1e-4 due to not performing the layer norm on concatenated encoder_hidden_states and hidden_states
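A minimal sketch of the dtype flow in the second bullet, assuming generic transformer and scheduler objects (not the pipeline's exact code):

def denoise_in_float32(transformer, scheduler, latents, timesteps):
    # Keep latents in float32 for the scheduler math; cast only at the
    # transformer boundary instead of round-tripping bfloat16 -> float32 -> bfloat16.
    latents = latents.float()
    for t in timesteps:
        noise_pred = transformer(latents.to(transformer.dtype), timestep=t)
        latents = scheduler.step(noise_pred.float(), t, latents).prev_sample
    return latents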

The reason for two forward passes in the pipeline is that the conditioning embeds and unconditioning embeds can have different sequence lengths.

The model for testing is available here (private): https://huggingface.co/ZP2HF/CogView4-6B-0125. It has the weights in the same format as the model that will be released next week. BUT these are not the final weights; based on my conversation with Yuxuan, an improved version will be released.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

yiyixuxu (Collaborator) left a comment:

looks good to me! I think we still need to see if we can use an existing scheduler, though

from .scheduling_utils import SchedulerMixin


class CogViewScheduler(SchedulerMixin, ConfigMixin):
Collaborator:

this is still not resolved, no?

a-r-r-o-w (Member) left a comment:

@yiyixuxu I think we should be good to merge. Could you review the scheduler-related changes? I've mentioned the relevant parts.

With these changes, we match the implementation from before I took over with the latest updates. Outputs are identical.

@@ -98,13 +98,17 @@ def __init__(
        use_karras_sigmas: Optional[bool] = False,
        use_exponential_sigmas: Optional[bool] = False,
        use_beta_sigmas: Optional[bool] = False,
        time_shift_type: str = "exponential",
a-r-r-o-w (Member) commented Feb 13, 2025:

We need this new parameter because CogView4 performs timestep shifting without first exponentiating mu.
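For reference, a sketch of the two shifting variants as standalone functions (the scheduler's internal helpers may be named differently):

import math
import torch

def time_shift_exponential(mu: float, sigma: float, t: torch.Tensor) -> torch.Tensor:
    # Existing behaviour (e.g. Flux): mu is exponentiated before shifting.
    return math.exp(mu) / (math.exp(mu) + (1 / t - 1) ** sigma)

def time_shift_linear(mu: float, sigma: float, t: torch.Tensor) -> torch.Tensor:
    # CogView4-style shifting: mu is used directly, without exponentiation.
    return mu / (mu + (1 / t - 1) ** sigma)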

Member:

cc @hlky here too, maybe, if you think there's a better way to handle this

Collaborator:

This is OK IMO; will let @yiyixuxu comment as well. After this we'll have three types of shifting (non-dynamic, dynamic with exponential shift, dynamic with linear shift), so it can be looked at with the scheduler refactor.

Collaborator:

looks good to me!

Comment on lines 573 to 578
timesteps = (
    np.linspace(self.scheduler.config.num_train_timesteps, 1.0, num_inference_steps)
    if timesteps is None
    else np.array(timesteps)
)
timesteps = timesteps.astype(np.int64)
Member:

This is a bit different from what we usually do. We don't use self.scheduler.timesteps returned from the call to retrieve_timesteps, because those timesteps are from after resolution-based timestep shifting is applied. For CogView4, it seems we need to use the timesteps from before applying shifting, but the sigmas from after applying shifting.
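A small sketch of that mismatch, assuming the linear shift variant and an illustrative mu (the real value is derived from the image sequence length):

import numpy as np

num_train_timesteps, num_inference_steps = 1000, 50
# Timesteps passed to the transformer come from BEFORE shifting:
timesteps = np.linspace(num_train_timesteps, 1.0, num_inference_steps).astype(np.int64)
# ...while the sigmas the scheduler steps with are shifted:
sigmas = timesteps.astype(np.float64) / num_train_timesteps
mu = 0.5  # illustrative placeholder; computed from resolution in practice
shifted_sigmas = mu / (mu + (1 / sigmas - 1))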

yiyixuxu (Collaborator) commented Feb 13, 2025:

that's really weird - but if it's on purpose (not an oversight or something), we need to support it in retrieve_timesteps, and the set_timesteps method of the scheduler also needs to accept both custom timesteps and custom sigmas. Either that, or maybe add an option to calculate timesteps based on the pre-shifting sigmas

otherwise I don't think it would function correctly for img2img or training, where you do not start from the first timestep and need to search against self.scheduler.timesteps; e.g., this function won't work:

def index_for_timestep(self, timestep, schedule_timesteps=None):

a-r-r-o-w (Member) commented Feb 13, 2025:

We don't have access to the original codebase yet, so it will be hard to check whether it's an oversight. It is weird that we have to do it this way, but if we don't (that is, if we use sigmas that correspond to the timesteps), the final outputs come out with some residual noise.

Also, it seems like in my latest update I made a mistake doing timesteps.astype(np.float32), based on some local testing. Basically, we want integer timesteps here first (to round down the float values from linspace), but then need float32 timesteps so that our scheduler does not raise an error:

raise ValueError(
    (
        "Passing integer indices (e.g. from `enumerate(timesteps)`) as timesteps to"
        " `EulerDiscreteScheduler.step()` is not supported. Make sure to pass"
        " one of the `scheduler.timesteps` as a timestep."
    ),
)

So it will have to be something like `timesteps.astype(np.int64).astype(np.float32)` to be consistent with the behaviour from when we started updating the PR, and to not error out in our scheduler.
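In other words (a tiny sketch of the dtype round trip; the linspace endpoints are illustrative):

import numpy as np

timesteps = np.linspace(1000, 1.0, 50)
# Round down to integer timesteps first (matching the original behaviour),
# then cast to float32 so the scheduler's integer-timestep check doesn't trip.
timesteps = timesteps.astype(np.int64).astype(np.float32)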

Collaborator:

We could return timesteps without shifting, then apply shifting on the fly in scheduler.step?

Collaborator:

> it will have to be something like `timesteps.astype(np.int64).astype(np.float32)`

ok, it seems like custom timesteps might be the way to go, because this logic here is just really custom (even if we calculate the timesteps without shifting, we also need to do this rounding first)
basically, you need to:

  1. remove the ValueError about passing sigmas and timesteps at the same time
  2. add timesteps to set_timesteps, e.g. timesteps: Optional[List[int]] = None, (see the sketch below)
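A rough sketch of what that change could look like, assuming the signature otherwise mirrors FlowMatchEulerDiscreteScheduler.set_timesteps (simplified; this is not the real class):

from typing import List, Optional

class FlowMatchEulerDiscreteSchedulerSketch:
    def set_timesteps(
        self,
        num_inference_steps: Optional[int] = None,
        device: Optional[str] = None,
        sigmas: Optional[List[float]] = None,
        timesteps: Optional[List[float]] = None,  # new: custom timesteps allowed
        mu: Optional[float] = None,
    ):
        # The old ValueError for passing sigmas and timesteps together is gone;
        # a pipeline may now pair pre-shift timesteps with post-shift sigmas.
        if sigmas is not None and timesteps is not None and len(sigmas) != len(timesteps):
            raise ValueError("`sigmas` and `timesteps` must have the same length")
        ...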

Member:

I'll do some more testing with a fresh mind in the morning to verify whether we need differing timesteps vs. sigmas here (to recheck whether it is a possible oversight or not). I do think I did everything correctly when I tried earlier today, and as a result we might need to do what you mentioned, but it wouldn't hurt to delay a little longer and verify again; it might save us a bunch of changes.

@zRzRzRzRzRzRzR If it would be possible to share just the scheduler-implementation-related files with us, it would really help us understand whether changes are required. No problem if not :) We can wait for the official release from THUDM and update our implementation.

Comment on lines +343 to +345
scheduler = FlowMatchEulerDiscreteScheduler(
    base_shift=0.25, max_shift=0.75, base_image_seq_len=256, use_dynamic_shifting=True, time_shift_type="linear"
)
a-r-r-o-w (Member) commented Feb 13, 2025:

@zRzRzRzRzRzRzR The converted checkpoints will also have to update the scheduler config (since we're no longer using the CogView4DDIMScheduler).

zRzRzRzRzRzRzR (Contributor, Author):

I have checked; the modification in this part makes sense.

Comment on lines +225 to +227
scheduler = FlowMatchEulerDiscreteScheduler(
    base_shift=0.25, max_shift=0.75, base_image_seq_len=256, use_dynamic_shifting=True, time_shift_type="linear"
)
Member:

@zRzRzRzRzRzRzR Same comment as above

zRzRzRzRzRzRzR (Contributor, Author):

Based on the current algorithm comparison and the images produced in practice, this change seems to be functioning properly.

a-r-r-o-w requested a review from yiyixuxu February 13, 2025 12:29
a-r-r-o-w changed the title Cogview support with c and uc different length → CogView4 (supports different length c and uc) Feb 13, 2025
a-r-r-o-w added the close-to-merge and roadmap (Add to current release roadmap) labels Feb 13, 2025
Labels: close-to-merge, roadmap (Add to current release roadmap)
Project status: In Progress
6 participants