A problem with Stable Video Diffusion: inference yields black images #7399
DoloresChong started this conversation in General
Replies: 2 comments · 6 replies
-
I suspect that there is/are operation(s) requiring full float32 precision. Could you try loading the pipeline in torch.float32 with CPU offloading?

```python
pipeline = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid", torch_dtype=torch.float32
)
pipeline.enable_model_cpu_offload()  # or: pipeline.enable_sequential_cpu_offload()

img = Image.open(r"genner_img\2024-03-11\23-09-10.jpg")
frames = pipeline(img, decode_chunk_size=1, generator=generator, output_type="np", height=256, width=256).frames[0]
export_to_video(frames, "generated.mp4", fps=7)
```

Sequential offload might take longer, but I wonder if it works this way.
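To check whether half precision is actually producing NaNs (which get written out as black pixels) before re-running the whole pipeline, here is a minimal inspection of the decoded frames, assuming the output_type="np" array from the snippet above:

```python
import numpy as np

# frames has shape (num_frames, H, W, 3) when output_type="np".
arr = np.asarray(frames)
print("NaNs present:", np.isnan(arr).any())
print("pixel range:", float(np.nanmin(arr)), "to", float(np.nanmax(arr)))
# fp16 overflow in the UNet or VAE typically surfaces as NaNs here,
# which the video exporter then renders as black frames.
```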
-
It could also be because SVD is not known to generate videos at such a low resolution: the model was trained to produce 576×1024 frames, so 256×256 is far outside its training distribution.
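One way to test that hypothesis is to resize the conditioning image to SVD's native 1024×576 and drop the height/width overrides. A sketch, assuming the same pipeline, image path, and generator as the original post:

```python
# Run at SVD's native 1024x576 (width x height) instead of 256x256.
# decode_chunk_size=1 keeps the VAE decode within a small VRAM budget.
img = Image.open(r"genner_img\2024-03-11\23-09-10.jpg").resize((1024, 576))
frames = pipeline(img, decode_chunk_size=1, generator=generator, output_type="np").frames[0]
export_to_video(frames, "generated_native.mp4", fps=7)
```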
-
I encountered an issue with inference in Stable Video Diffusion. When I ran the sample code, it reported insufficient VRAM, so I reduced the width and height of the output. Inference then finished quickly, but every frame I obtained was black, and I'm unsure why.
Sample code:
```python
import torch
from PIL import Image
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import export_to_video

pipeline = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid", torch_dtype=torch.float16, variant="fp16"
)
pipeline.enable_model_cpu_offload()

generator = torch.manual_seed(42)  # seed is arbitrary; the original snippet assumed a predefined generator
img = Image.open(r"genner_img\2024-03-11\23-09-10.jpg")
frames = pipeline(img, decode_chunk_size=1, generator=generator, output_type="np", height=256, width=256).frames[0]
export_to_video(frames, "generated.mp4", fps=7)
```
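If the resolution was lowered only to fit into VRAM, it may help to keep the native resolution and trade speed for memory instead. A sketch based on the memory-saving options in the diffusers SVD documentation, reusing the pipeline, image, and generator above:

```python
# Reduce peak VRAM without shrinking the output resolution.
pipeline.enable_model_cpu_offload()      # move sub-models to the GPU only when needed
pipeline.unet.enable_forward_chunking()  # run the temporal feed-forward layers in chunks
frames = pipeline(img, decode_chunk_size=1, generator=generator, output_type="np").frames[0]
```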