Reason for 49 frames (extra split for interpolation) #659

karan-dalal · 2025-01-12T04:25:09Z

I am fine-tuning the T2V model and want to understand why the frame count is required to be of the form 4x+1.

I see that the DownSample3D module in the VAE splits off the first frame and only interpolates the remaining frames.

Line 574 in 8f1829f

if get_context_parallel_rank() == 0 and fake_cp:

Why not set the frame count to 48? Why do we need a frame that doesn't interpolate with the others?
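
To make the question concrete, here is a minimal sketch of the behaviour described above, in plain Python rather than the repository's code. It assumes each temporal downsampling stage keeps the first frame untouched when the frame count is odd and averages the remaining frames in non-overlapping pairs, and that two such stages are applied. `downsample_time` is a hypothetical helper, and each frame is a stand-in float rather than a tensor.

```python
def downsample_time(frames):
    """Hypothetical 2x temporal downsample (an assumption, not the repo's code)."""
    if len(frames) % 2 == 1:
        # Odd count: split the first frame off and leave it unpooled.
        first, rest = frames[:1], frames[1:]
    else:
        # Even count: every frame takes part in pooling.
        first, rest = [], frames
    # Average non-overlapping pairs of the remaining frames.
    pooled = [(rest[i] + rest[i + 1]) / 2 for i in range(0, len(rest) - 1, 2)]
    return first + pooled


frames_49 = [float(i) for i in range(49)]
stage1 = downsample_time(frames_49)   # 1 + 24 frames
stage2 = downsample_time(stage1)      # 1 + 12 frames

frames_48 = [float(i) for i in range(48)]
no_extra = downsample_time(downsample_time(frames_48))  # 12 frames, none kept separate
```

Under these assumptions, 49 frames reduce to 13 with the first frame passed through unchanged, while 48 frames reduce to 12 with every output mixing several inputs.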

yzy-thu · 2025-01-14T05:08:07Z

We follow MAGVIT-v2 (https://arxiv.org/html/2310.05737v2). Using 4x+1 frames enables joint training on images and videos.
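
The arithmetic behind the 4x+1 rule can be sketched as follows. This assumes a temporal compression factor of 4 with the first frame encoded on its own; `latent_frame_count` is a hypothetical helper for illustration, not a function from the repository.

```python
def latent_frame_count(num_frames, temporal_factor=4):
    """Hypothetical helper: latent frames for a clip of `num_frames` frames.

    Assumes the first frame maps to its own latent frame and every
    following group of `temporal_factor` frames maps to one more.
    """
    if num_frames < 1 or (num_frames - 1) % temporal_factor != 0:
        raise ValueError(
            f"num_frames must be of the form {temporal_factor}*x + 1, got {num_frames}"
        )
    return 1 + (num_frames - 1) // temporal_factor


image_latents = latent_frame_count(1)    # a single image is the x = 0 case
video_latents = latent_frame_count(49)   # x = 12
```

Seen this way, a single image is just a video with x = 0, so the same encoder path can handle both, whereas a 48-frame clip does not fit the pattern and is rejected by this sketch.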

karan-dalal · 2025-01-14T05:10:08Z

If I'm only fine-tuning on videos, would it be better to just train without the extra frame?

zhuochen02 · 2025-01-14T13:40:12Z

same question

Wang-pengfei · 2025-01-24T08:42:31Z

same question

zRzRzRzRzRzRzR assigned zRzRzRzRzRzRzR and tengjiayan20 Jan 12, 2025

zRzRzRzRzRzRzR assigned yzy-thu Jan 24, 2025