Test environment: Ubuntu 22.04, 128 GB RAM, Python 3.11, CUDA 12.4, Torch 2.6.0, single RTX 4090 with 24 GB VRAM.
Test model: Ming-Lite-Omni-1.5, quantized with a combination of NF4 and bf16.
I first analyzed the model's structure, weights, and precision, and concluded that quantization should make it possible to run the project within a single 24 GB GPU, although I was not sure how much quantization would affect output quality.
During testing, apart from an OOM on video understanding, VRAM usage with all models loaded now sits at roughly 20-23 GB.
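For reference, a minimal sketch of the kind of NF4 + bf16 loading setup described above, assuming the bitsandbytes integration in transformers. The model id and `AutoModelForCausalLM` class are illustrative placeholders only, since the repo ships its own custom model classes:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# NF4 4-bit weights with bf16 compute: roughly the mixed scheme used in
# this test. Non-quantized modules stay in bf16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,  # second quantization pass, saves a bit more VRAM
)

model = AutoModelForCausalLM.from_pretrained(
    "inclusionAI/Ming-Lite-Omni-1.5",  # placeholder repo id
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)
```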
During testing I also noticed that much of the code in the repo still appears to date from v1.0; it does not seem to have been fully updated to match the v1.5 upgrade yet.
Right after the model finishes loading, VRAM usage is about 20 GB, as shown below:
Basic text chat works:
Image understanding chat works:
Video understanding chat hits OOM:
Speech recognition chat works:
Text-to-image generation works:
Text input with speech output, as well as voice chat, fails with an error that appears to be a problem loading the talker model:
Error log:
history: [(('/home/tkadm/Ming/temp/e8b1237d57a5922949fe61c6bca802ae2ddd7d63c4159763d08acaa7aaef4683/speechQA_sample.wav',), None)]
[{'role': 'HUMAN', 'content': [{'type': 'audio', 'audio': '/home/tkadm/Ming/temp/e8b1237d57a5922949fe61c6bca802ae2ddd7d63c4159763d08acaa7aaef4683/speechQA_sample.wav'}]}]
Traceback (most recent call last):
File "/home/tkadm/miniconda3/envs/video/lib/python3.11/site-packages/gradio/queueing.py", line 715, in process_events
response = await route_utils.call_process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tkadm/miniconda3/envs/video/lib/python3.11/site-packages/gradio/route_utils.py", line 322, in call_process_api
output = await app.get_blocks().process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tkadm/miniconda3/envs/video/lib/python3.11/site-packages/gradio/blocks.py", line 2220, in process_api
result = await self.call_function(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tkadm/miniconda3/envs/video/lib/python3.11/site-packages/gradio/blocks.py", line 1743, in call_function
prediction = await utils.async_iteration(iterator)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tkadm/miniconda3/envs/video/lib/python3.11/site-packages/gradio/utils.py", line 739, in async_iteration
return await anext(iterator)
^^^^^^^^^^^^^^^^^^^^^
File "/home/tkadm/miniconda3/envs/video/lib/python3.11/site-packages/gradio/utils.py", line 733, in __anext__
return await anyio.to_thread.run_sync(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tkadm/miniconda3/envs/video/lib/python3.11/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tkadm/miniconda3/envs/video/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2470, in run_sync_in_worker_thread
return await future
^^^^^^^^^^^^
File "/home/tkadm/miniconda3/envs/video/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 967, in run
result = context.run(func, *args)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tkadm/miniconda3/envs/video/lib/python3.11/site-packages/gradio/utils.py", line 716, in run_sync_iterator_async
return next(iterator)
^^^^^^^^^^^^^^
File "/home/tkadm/miniconda3/envs/video/lib/python3.11/site-packages/gradio/utils.py", line 877, in gen_wrapper
response = next(iterator)
^^^^^^^^^^^^^^
File "/home/tkadm/Ming/gradio_demo_me-old.py", line 344, in chat_predict
text, audio_path, image_path = generate(model, processor, messages, state, use_audio_response=use_audio_response)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tkadm/Ming/gradio_demo_me-old.py", line 252, in generate
audio_path = text_to_speach(model, text, outputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tkadm/Ming/gradio_demo_me-old.py", line 151, in text_to_speach
audio_detokenizer = AudioDetokenizer(
^^^^^^^^^^^^^^^^^
File "/home/tkadm/Ming/modeling_bailing_talker.py", line 661, in __init__
self.model.load(flow_model_path, hifigan_model_path)
File "/home/tkadm/Ming/audio_detokenizer/cli/flow_stream_model.py", line 39, in load
self.flow.load_state_dict(torch.load(flow_model, map_location=self.device), strict=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tkadm/miniconda3/envs/video/lib/python3.11/site-packages/torch/serialization.py", line 1470, in load
raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
(1) In PyTorch 2.6, we changed the default value of the weights_only argument in torch.load from False to True. Re-running torch.load with weights_only set to False will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
(2) Alternatively, to load with weights_only=True please check the recommended steps in the following error message.
WeightsUnpickler error: Unsupported global: GLOBAL pathlib.PosixPath was not an allowed global by default. Please use torch.serialization.add_safe_globals([PosixPath]) or the torch.serialization.safe_globals([PosixPath]) context manager to allowlist this global if you trust this class/function.
Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
The image editing feature fails with an error, apparently because an image-size keyword argument is passed to generate() twice:
The error log is as follows:
history: [(('/home/tkadm/Ming/temp/16da8d87c405652ce67ca4fe9eb661562eb47afb9ff1b0a3cd872bd3a9a5a1d0/cake.jpg',), None), ('Add a candle on top of the cake', None)]
[{'role': 'HUMAN', 'content': [{'type': 'image', 'image': '/home/tkadm/Ming/temp/16da8d87c405652ce67ca4fe9eb661562eb47afb9ff1b0a3cd872bd3a9a5a1d0/cake.jpg'}, {'type': 'text', 'text': 'Add a candle on top of the cake'}]}]
Traceback (most recent call last):
File "/home/tkadm/miniconda3/envs/video/lib/python3.11/site-packages/gradio/queueing.py", line 715, in process_events
response = await route_utils.call_process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tkadm/miniconda3/envs/video/lib/python3.11/site-packages/gradio/route_utils.py", line 322, in call_process_api
output = await app.get_blocks().process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tkadm/miniconda3/envs/video/lib/python3.11/site-packages/gradio/blocks.py", line 2220, in process_api
result = await self.call_function(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tkadm/miniconda3/envs/video/lib/python3.11/site-packages/gradio/blocks.py", line 1743, in call_function
prediction = await utils.async_iteration(iterator)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tkadm/miniconda3/envs/video/lib/python3.11/site-packages/gradio/utils.py", line 739, in async_iteration
return await anext(iterator)
^^^^^^^^^^^^^^^^^^^^^
File "/home/tkadm/miniconda3/envs/video/lib/python3.11/site-packages/gradio/utils.py", line 733, in __anext__
return await anyio.to_thread.run_sync(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tkadm/miniconda3/envs/video/lib/python3.11/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tkadm/miniconda3/envs/video/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2470, in run_sync_in_worker_thread
return await future
^^^^^^^^^^^^
File "/home/tkadm/miniconda3/envs/video/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 967, in run
result = context.run(func, *args)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tkadm/miniconda3/envs/video/lib/python3.11/site-packages/gradio/utils.py", line 716, in run_sync_iterator_async
return next(iterator)
^^^^^^^^^^^^^^
File "/home/tkadm/miniconda3/envs/video/lib/python3.11/site-packages/gradio/utils.py", line 877, in gen_wrapper
response = next(iterator)
^^^^^^^^^^^^^^
File "/home/tkadm/Ming/gradio_demo_me-old.py", line 344, in chat_predict
text, audio_path, image_path = generate(model, processor, messages, state, use_audio_response=use_audio_response)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tkadm/Ming/gradio_demo_me-old.py", line 256, in generate
image_path = generate_image(model, processor, messages, has_audio=has_audio)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tkadm/Ming/gradio_demo_me-old.py", line 133, in generate_image
image = model.generate(
^^^^^^^^^^^^^^^
TypeError: modeling_bailingmm.BailingMMNativeForConditionalGeneration.generate() got multiple values for keyword argument 'image_gen_width'
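This TypeError means `image_gen_width` reaches `generate()` twice: once as an explicit keyword and once inside a forwarded kwargs dict. A minimal reproduction with a stand-in function (the signature is hypothetical, for illustration only) and the usual fix of dropping the duplicated keys before the call:

```python
# Stand-in for BailingMMNativeForConditionalGeneration.generate();
# the signature here is hypothetical, for illustration only.
def generate(inputs, image_gen_width=512, image_gen_height=512, **kwargs):
    return image_gen_width, image_gen_height

# Extra generation kwargs as the demo might assemble them.
gen_kwargs = {"image_gen_width": 512, "image_gen_height": 512}

# generate("x", image_gen_width=512, **gen_kwargs)
#   -> TypeError: generate() got multiple values for keyword argument 'image_gen_width'

# Fix: remove keys that are also passed explicitly before forwarding.
for key in ("image_gen_width", "image_gen_height"):
    gen_kwargs.pop(key, None)

width, height = generate("x", image_gen_width=512, **gen_kwargs)
```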
Could the project team please help analyze the cause of these errors? Thank you!
Due to time constraints I have not yet read through the code in detail, so my understanding may well be incomplete; I would appreciate the team's guidance.
Once all the code passes testing, I will upload the quantized model to my Hugging Face profile at https://huggingface.co/wikeeyang to share.