Commit

Merge pull request #678 from THUDM/CogVideoX_dev
docs: clarify frame number requirements for CogVideoX models
zRzRzRzRzRzRzR authored Jan 22, 2025
2 parents ea994c7 + d9e2a41 commit bbe909d
Showing 4 changed files with 18 additions and 3 deletions.
5 changes: 5 additions & 0 deletions README.md
@@ -199,6 +199,11 @@ models we currently offer, along with their foundational information.
<td colspan="1" style="text-align: center;"> Min(W, H) = 768 <br> 768 ≤ Max(W, H) ≤ 1360 <br> Max(W, H) % 16 = 0 </td>
<td colspan="3" style="text-align: center;">720 * 480</td>
</tr>
+<tr>
+<td style="text-align: center;">Number of Frames</td>
+<td colspan="2" style="text-align: center;">Should be <b>16N + 1</b> where N <= 10 (default 81)</td>
+<td colspan="3" style="text-align: center;">Should be <b>8N + 1</b> where N <= 6 (default 49)</td>
+</tr>
<tr>
<td style="text-align: center;">Inference Precision</td>
<td colspan="2" style="text-align: center;"><b>BF16 (Recommended)</b>, FP16, FP32, FP8*, INT8, Not supported: INT4</td>
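The frame-count rule added to the README above can be checked mechanically. The following is a minimal sketch, not code from the repository: the function names are hypothetical, and the lower bound N >= 1 is an assumption, since the README only states the upper bound on N.

```python
def valid_frame_counts(step: int, max_n: int) -> list:
    """List every frame count of the form step * N + 1 for 1 <= N <= max_n.

    Assumption: N starts at 1; the README only gives the upper bound.
    """
    return [step * n + 1 for n in range(1, max_n + 1)]


def is_valid_num_frames(num_frames: int, step: int, max_n: int) -> bool:
    """Check whether num_frames can be written as step * N + 1 with 1 <= N <= max_n."""
    n, remainder = divmod(num_frames - 1, step)
    return remainder == 0 and 1 <= n <= max_n


# The two rules from the table row: (step, max N).
RULE_16N = (16, 10)  # "16N + 1 where N <= 10", default 81
RULE_8N = (8, 6)     # "8N + 1 where N <= 6", default 49

if __name__ == "__main__":
    print(valid_frame_counts(*RULE_8N))
    print(is_valid_num_frames(81, *RULE_16N))
```

Both documented defaults satisfy their rule: 81 = 16 * 5 + 1 and 49 = 8 * 6 + 1.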
5 changes: 5 additions & 0 deletions README_ja.md
@@ -191,6 +191,11 @@ CogVideoXは、[清影](https://chatglm.cn/video?fr=osm_cogvideox) と同源の
<td colspan="1" style="text-align: center;"> Min(W, H) = 768 <br> 768 ≤ Max(W, H) ≤ 1360 <br> Max(W, H) % 16 = 0 </td>
<td colspan="3" style="text-align: center;">720 * 480</td>
</tr>
+<tr>
+<td style="text-align: center;">フレーム数</td>
+<td colspan="2" style="text-align: center;"><b>16N + 1</b> (N <= 10) である必要があります (デフォルト 81)</td>
+<td colspan="3" style="text-align: center;"><b>8N + 1</b> (N <= 6) である必要があります (デフォルト 49)</td>
+</tr>
<tr>
<td style="text-align: center;">推論精度</td>
<td colspan="2" style="text-align: center;"><b>BF16(推奨)</b>, FP16, FP32,FP8*,INT8,INT4非対応</td>
7 changes: 6 additions & 1 deletion README_zh.md
@@ -180,7 +180,12 @@ CogVideoX是 [清影](https://chatglm.cn/video?fr=osm_cogvideox) 同源的开源
<td colspan="1" style="text-align: center;">1360 * 768</td>
<td colspan="1" style="text-align: center;"> Min(W, H) = 768 <br> 768 ≤ Max(W, H) ≤ 1360 <br> Max(W, H) % 16 = 0 </td>
<td colspan="3" style="text-align: center;">720 * 480</td>
-</tr>
+</tr>
+<tr>
+<td style="text-align: center;">帧数</td>
+<td colspan="2" style="text-align: center;">必须为 <b>16N + 1</b> 其中 N <= 10 (默认 81)</td>
+<td colspan="3" style="text-align: center;">必须为 <b>8N + 1</b> 其中 N <= 6 (默认 49)</td>
+</tr>
<tr>
<td style="text-align: center;">推理精度</td>
<td colspan="2" style="text-align: center;"><b>BF16(推荐)</b>, FP16, FP32,FP8*,INT8,不支持INT4</td>
4 changes: 2 additions & 2 deletions inference/cli_demo.py
@@ -100,7 +100,7 @@ def generate_video(
     if width is None or height is None:
         height, width = desired_resolution
         logging.info(f"\033[1mUsing default resolution {desired_resolution} for {model_name}\033[0m")
-    elif (width, height) != desired_resolution:
+    elif (height, width) != desired_resolution:
         if generate_type == "i2v":
             # For i2v models, use user-defined width and height
             logging.warning(
@@ -111,7 +111,7 @@ def generate_video(
             logging.warning(
                 f"\033[1;31m{model_name} is not supported for custom resolution. Setting back to default resolution {desired_resolution}.\033[0m"
             )
-            width, height = desired_resolution
+            height, width = desired_resolution

     if generate_type == "i2v":
         pipe = CogVideoXImageToVideoPipeline.from_pretrained(model_path, torch_dtype=dtype)
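The two changed lines in `inference/cli_demo.py` both correct the tuple order used with `desired_resolution`, which the unchanged context line unpacks as `(height, width)`. The sketch below is illustrative only and is not code from the repository: the helper names are hypothetical, and the value `(480, 720)` assumes that "720 * 480" in the README table means width * height.

```python
# Assumed default, stored as (height, width) to match the hunk's context line.
DESIRED_RESOLUTION = (480, 720)


def matches_before_fix(width, height, desired=DESIRED_RESOLUTION):
    """Pre-fix comparison: builds the tuple as (width, height)."""
    return (width, height) == desired


def matches_after_fix(width, height, desired=DESIRED_RESOLUTION):
    """Post-fix comparison: builds the tuple as (height, width)."""
    return (height, width) == desired


def reset_before_fix(desired=DESIRED_RESOLUTION):
    """Pre-fix fallback: unpacks (height, width) into width, height, swapping them."""
    width, height = desired
    return {"width": width, "height": height}


def reset_after_fix(desired=DESIRED_RESOLUTION):
    """Post-fix fallback: unpacks in the stored (height, width) order."""
    height, width = desired
    return {"width": width, "height": height}


if __name__ == "__main__":
    # A user passing the default size explicitly (width=720, height=480).
    print(matches_before_fix(720, 480), matches_after_fix(720, 480))
    print(reset_before_fix(), reset_after_fix())
```

Under these assumptions, the pre-fix comparison treats a request for the default size as a mismatch, and the pre-fix fallback then assigns the two dimensions the wrong way round. For a square resolution both versions would behave the same.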
