I am fine-tuning the T2V model and want to understand why the frame count is required to be of the form 4x+1.
I see that the DownSample3D module in the VAE will split the first frame off, and only interpolate the remaining frames.
Reference: CogVideo/sat/vae_modules/cp_enc_dec.py, line 574 in 8f1829f
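To make the behavior described above concrete, here is a minimal sketch in plain Python. It is not the code from cp_enc_dec.py: the function name `temporal_downsample` is hypothetical, frames are stand-in floats rather than tensors, and pairwise averaging is assumed as the way the remaining frames are combined.

```python
def temporal_downsample(frames):
    """Halve the temporal length of a clip.

    Assumption: when the frame count is odd, the first frame is split off
    and kept as-is, and only the remaining frames are averaged in pairs.
    """
    if len(frames) % 2 == 1:
        first, rest = frames[:1], frames[1:]
    else:
        first, rest = [], frames
    # Average each non-overlapping pair of the remaining frames.
    pooled = [(rest[i] + rest[i + 1]) / 2 for i in range(0, len(rest), 2)]
    return first + pooled


frames = [float(i) for i in range(9)]  # 9 = 4*2 + 1 frames
once = temporal_downsample(frames)     # 5 frames, first one untouched
twice = temporal_downsample(once)      # 3 frames, first one still untouched
```

Under these assumptions, two halving steps take a clip of 4x+1 frames to x+1 frames, and the first frame never mixes with the others.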
Why not set the frame count to 48? Why do we need a frame that is not interpolated with the others?
We follow MAGVIT-v2 (https://arxiv.org/html/2310.05737v2). The 4x+1 frame count enables joint training on images and videos.
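One way to read this reply is that a single image can be treated as a one-frame video (x = 0), so images and videos can share the same encoder. The sketch below illustrates the frame-count arithmetic only. It assumes a temporal compression factor of 4, inferred from the 4x+1 rule, and `latent_frame_count` is a hypothetical helper, not part of the CogVideo code base.

```python
def latent_frame_count(num_frames, factor=4):
    """Number of latent frames for an input of the form factor*x + 1.

    Assumption: the first frame is encoded on its own and every following
    group of `factor` frames is compressed into one latent frame.
    """
    if num_frames < 1 or (num_frames - 1) % factor != 0:
        raise ValueError(f"{num_frames} is not of the form {factor}x+1")
    return (num_frames - 1) // factor + 1


image_latents = latent_frame_count(1)   # a single image: x = 0
video_latents = latent_frame_count(49)  # a 49-frame clip: x = 12
```

With these assumptions an image maps to exactly one latent frame and a 49-frame clip to 13, while a 48-frame clip does not fit the pattern and is rejected.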
If I'm only finetuning with videos, would it be better to just train without the extra 1?
Same question.