
Implemented Qwen3.5 #61

Draft
alcoftTAO wants to merge 3 commits into JamePeng:main from TAO71-AI:main

Conversation

@alcoftTAO

This PR implements Qwen3.5 models. Not tested yet due to lack of compute power on my end.

This PR is going to be a draft for now until I can test it with smaller models and also check and fix the chat template and parameters of the Qwen35ChatHandler class.

I still need to decide which parameters are useful.
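
For reference, a minimal usage sketch, assuming the Qwen35ChatHandler added by this PR follows the same interface as the existing multimodal chat handlers (a clip_model_path pointing at the mmproj file); the file paths below are placeholders:

from llama_cpp import Llama
from llama_cpp.llama_chat_format import Qwen35ChatHandler

# Hypothetical model files; the handler name comes from this PR, and the
# constructor arguments are assumed to match the other multimodal handlers
# (e.g. Llava15ChatHandler's clip_model_path).
chat_handler = Qwen35ChatHandler(clip_model_path="./qwen3.5-mmproj-f16.gguf")
llm = Llama(
    model_path="./qwen3.5-27b-iq2_m.gguf",
    chat_handler=chat_handler,
    n_ctx=4096,
)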

@JamePeng force-pushed the main branch 5 times, most recently from 76d8272 to 68eacae on February 19, 2026 at 14:03
@JamePeng
Owner

Detailed adaptation work can be done after Qwen3.5-9B-Instruct and Qwen3.5-35B-A3B-Instruct are released.
Indeed, the current open-source Qwen3.5 model is too large.

@yamikumo-DSD

yamikumo-DSD commented Feb 23, 2026

I've tested a pruned version of the Qwen3.5 model.
https://huggingface.co/infinityai/Qwen3.5-397B-REAP-50-IQ3_M/tree/main
This model also suffers from the memory_seq_rm failure problem (issue).
So I guess that even after the ChatHandler is implemented, multi-turn conversation won't currently work.

@alcoftTAO
Author

I'm updating and testing this with Qwen3.5-27B right now.

@alcoftTAO reopened this Feb 26, 2026
@alcoftTAO
Author

Closed the PR temporarily to update to the latest commit.
I will continue to test this.

@JamePeng
Owner

I'm currently refactoring some logic locally, but the hybrid structure of qwen3-next and qwen3.5 is basically running. You can continue testing after I finish the initial implementation.

@alcoftTAO
Author

alcoftTAO commented Feb 26, 2026

I'm having some trouble testing this.

Traceback (most recent call last):
  ...
  File ".../.env/lib/python3.11/site-packages/llama_cpp/__init__.py", line 1, in <module>
    from .llama_cpp import *
  File ".../.env/lib/python3.11/site-packages/llama_cpp/llama_cpp.py", line 1391, in <module>
    @ctypes_function(
     ^^^^^^^^^^^^^^^^
  File ".../.env/lib/python3.11/site-packages/llama_cpp/_ctypes_extensions.py", line 160, in decorator
    func = getattr(lib, name)
           ^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/ctypes/__init__.py", line 389, in __getattr__
    func = self.__getitem__(name)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/ctypes/__init__.py", line 394, in __getitem__
    func = self._FuncPtr((name_or_ordinal, self))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: /usr/lib/libllama.so.0: undefined symbol: llama_params_fit

I'm not sure if it's related to my OS, Python version, or this project. Last time I compiled it (about two days ago) it worked fine.

I'm currently refactoring some logic locally, but the hybrid structure of qwen3-next and qwen3.5 is basically running. You can continue testing after I finish the initial implementation.

Does it crash because of this?

@JamePeng
Owner

No, it's because your compiled library is incompatible or the library file is missing.
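
A quick way to confirm this (a sketch, using the library path from the traceback above): load the shared object directly with ctypes and check whether it exports the missing symbol.

import ctypes

# Path taken from the traceback above; adjust if your bindings load a different libllama.
lib = ctypes.CDLL("/usr/lib/libllama.so.0")

# False means the installed libllama is older than the Python bindings and needs rebuilding.
print(hasattr(lib, "llama_params_fit"))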

@alcoftTAO
Author

Okay, thanks! I'll try to fix it as soon as possible to continue testing.

@JamePeng
Owner

JamePeng commented Feb 26, 2026

The initial implementation is now live. You can try out qwen3.5 and see how it performs. Looking forward to your feedback.
Also, many hybrid-structure models are multimodal. For example, LFM2-VL can now run normally without the previous hack code, and LFM2.5-VL should also work without problems.

@alcoftTAO
Author

I've tested the chat template with and without images, thinking mode, etc.

I've only used Qwen3.5-27B (quantized to IQ2_M, with the mmproj in F16) and a ctx of 4096 tokens, but it should work fine with any other Qwen3.5-series model and quantization.


The model seems to work fine without images, but hallucinates a lot when using images (can't describe them properly, etc.).

Since the model is so big and I'm just asking for a simple description of the image, the quantization I'm using should not be a problem.

Note

With thinking mode activated, the reasoning content shows that the model does not see the image. I'll check the chat template to make sure it's not causing this problem.

The user wants me to describe an image. However, looking at the "Picture 1" content provided, it appears to be empty or contains only exclamation marks (which might be a placeholder or error). Since I cannot see any actual image content in the prompt, I need to inform the user that I cannot see the image.
...

I'll continue to work on this.
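
For context, this is roughly the message layout used for the image tests, as a sketch using llama-cpp-python's OpenAI-style chat API; the image URL is a placeholder and llm is the Llama instance with the Qwen3.5 chat handler attached:

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {"type": "image_url", "image_url": {"url": "file:///path/to/image.png"}},
            ],
        },
    ],
)
print(response["choices"][0]["message"]["content"])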

@alcoftTAO
Author

Looking at the console output, I find this:

find_slot: non-consecutive token position 513 after 512 for sequence 0 with 512 new tokens
find_slot: non-consecutive token position 513 after 512 for sequence 0 with 512 new tokens
find_slot: non-consecutive token position 513 after 513 for sequence 0 with 512 new tokens
find_slot: non-consecutive token position 513 after 513 for sequence 0 with 512 new tokens
find_slot: non-consecutive token position 559 after 513 for sequence 0 with 15 new tokens
find_slot: non-consecutive token position 559 after 513 for sequence 0 with 15 new tokens
find_slot: non-consecutive token position 513 after 512 for sequence 0 with 512 new tokens
find_slot: non-consecutive token position 513 after 512 for sequence 0 with 512 new tokens
find_slot: non-consecutive token position 513 after 513 for sequence 0 with 512 new tokens
find_slot: non-consecutive token position 513 after 513 for sequence 0 with 512 new tokens
find_slot: non-consecutive token position 557 after 513 for sequence 0 with 13 new tokens
find_slot: non-consecutive token position 557 after 513 for sequence 0 with 13 new tokens

I have no idea if it's related to this bug, but it is only printed when an image is added to the prompt.


Comparing the Qwen35ChatHandler's template with the Qwen3VLChatHandler's template, the code to load an image is similar:

Qwen3VLChatHandler:

...
                    {%- if 'image_url' in content -%}
                        {%- set image_count.value = image_count.value + 1 -%}
                        {%- if add_vision_id -%}
                            {{- 'Picture ' -}}
                            {{- image_count.value | string -}}
                            {{- ': ' -}}
                        {%- endif -%}
                        {{- '<|vision_start|>' -}}
                        {%- if content.image_url is string -%}
                            {{- content.image_url -}}
                        {%- else -%}
                            {{- content.image_url.url -}}
                        {%- endif -%}
                        {{- '<|vision_end|>' -}}
                    {%- endif -%}
...

Qwen35ChatHandler:

...
                    {%- if 'image' in item or 'image_url' in item -%}
                        {%- if is_system_content -%}
                            {{- raise_exception('System message cannot contain images.') -}}
                        {%- endif -%}
                        {%- if do_vision_count -%}
                            {%- set image_count.value = image_count.value + 1 -%}
                        {%- endif -%}
                        {%- if add_vision_id -%}
                            {{- 'Picture ' ~ image_count.value ~ ': ' -}}
                        {%- endif -%}
                        {{- '<|vision_start|>' -}}
                        {%- if 'image' in item -%}
                            {%- if item.image is string -%}
                                {{- item.image -}}
                            {%- else -%}
                                {{- item.image.url -}}
                            {%- endif -%}
                        {%- elif 'image_url' in item -%}
                            {%- if item.image_url is string -%}
                                {{- item.image_url -}}
                            {%- else -%}
                                {{- item.image_url.url -}}
                            {%- endif -%}
                        {%- endif -%}
                        {{- '<|vision_end|>' -}}
                    {%- elif 'video' in item -%}
...

@JamePeng
Owner

JamePeng commented Feb 27, 2026

        # Clear state for multiple runs
        llama.reset()
        llama._ctx.memory_clear(True)
        llama.n_tokens = 0

These all need to be removed; the generation and eval processes should now manage them.

@JamePeng
Owner

The model seems to work fine without images, but hallucinates a lot when using images (can't describe them properly, etc.).

Since the model is so big and I'm just asking for a simple description of the image, the quantization I'm using should not be a problem.

Note

With thinking mode activated, the reasoning content shows that the model does not see the image. I'll check the chat template to make sure it's not causing this problem.

You can check how the final prompt it constructs from the template is assembled, and whether any <media> elements are actually mounted in it.
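
One way to inspect that (a sketch, assuming the new handler exposes its Jinja template as a CHAT_FORMAT string like the other handlers in llama_cpp.llama_chat_format; the image URL is a placeholder):

import jinja2
from llama_cpp.llama_chat_format import Qwen35ChatHandler

env = jinja2.Environment(trim_blocks=True, lstrip_blocks=True)

# The template calls raise_exception() in some branches, so provide it.
def _raise(msg):
    raise ValueError(msg)
env.globals["raise_exception"] = _raise

template = env.from_string(Qwen35ChatHandler.CHAT_FORMAT)
prompt = template.render(
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {"type": "image_url", "image_url": {"url": "file:///path/to/image.png"}},
            ],
        }
    ],
    add_generation_prompt=True,
)

# Check that exactly one <|vision_start|>...<|vision_end|> block appears per image.
print(prompt)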

@yamikumo-DSD

        # Clear state for multiple runs
        llama.reset()
        llama._ctx.memory_clear(True)
        llama.n_tokens = 0

These all need to be removed; the generation and eval processes should now manage them.

I guess that even after you remove this part, identical code will still be called by Llava15ChatHandler because of return super().__call__(**kwargs). Is that okay?
