
Implemented Qwen3.5 #61

Draft
alcoftTAO wants to merge 3 commits into JamePeng:main from TAO71-AI:main

Conversation

@alcoftTAO

This PR implements Qwen3.5 models. Not tested yet due to lack of compute power on my end.

This PR is going to be a draft for now until I can test it with smaller models and also check and fix the chat template and parameters of the Qwen35ChatHandler class.

I still need to decide which parameters are useful.
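
For reference, a minimal usage sketch, assuming the Qwen35ChatHandler added by this PR follows the same interface as the existing multimodal chat handlers (a clip_model_path pointing at the mmproj file); the file paths below are placeholders:

from llama_cpp import Llama
from llama_cpp.llama_chat_format import Qwen35ChatHandler

# Hypothetical model files; the handler name comes from this PR, and the
# constructor arguments are assumed to match the other multimodal handlers
# (e.g. Llava15ChatHandler's clip_model_path).
chat_handler = Qwen35ChatHandler(clip_model_path="./qwen3.5-mmproj-f16.gguf")
llm = Llama(
    model_path="./qwen3.5-27b-iq2_m.gguf",
    chat_handler=chat_handler,
    n_ctx=4096,
)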

@JamePeng force-pushed the main branch 5 times, most recently from 76d8272 to 68eacae on February 19, 2026 at 14:03
@JamePeng
Owner

Detailed adaptation work can be done after Qwen3.5-9B-Instruct and Qwen3.5-35B-A3B-Instruct are released.
Indeed, the current open-source Qwen3.5 model is too large.

@yamikumo-DSD

yamikumo-DSD commented Feb 23, 2026

I've tested a pruned version of the Qwen3.5 model.
https://huggingface.co/infinityai/Qwen3.5-397B-REAP-50-IQ3_M/tree/main
This model also suffers from the memory_seq_rm failure problem (issue).
So I guess that even after the ChatHandler is implemented, multi-turn conversation won't currently work.

@alcoftTAO
Author

I'm updating and testing this with Qwen3.5-27B right now.

@alcoftTAO reopened this Feb 26, 2026
@alcoftTAO
Author

Closed the PR temporarily to update to the latest commit.
I will continue to test this.

@JamePeng
Owner

I'm currently refactoring some logic locally, but the hybrid structure of qwen3-next and qwen3.5 is basically running. You can continue testing after I finish the initial implementation.

@alcoftTAO
Author

alcoftTAO commented Feb 26, 2026

I'm having some trouble testing this.

Traceback (most recent call last):
  ...
  File ".../.env/lib/python3.11/site-packages/llama_cpp/__init__.py", line 1, in <module>
    from .llama_cpp import *
  File ".../.env/lib/python3.11/site-packages/llama_cpp/llama_cpp.py", line 1391, in <module>
    @ctypes_function(
     ^^^^^^^^^^^^^^^^
  File ".../.env/lib/python3.11/site-packages/llama_cpp/_ctypes_extensions.py", line 160, in decorator
    func = getattr(lib, name)
           ^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/ctypes/__init__.py", line 389, in __getattr__
    func = self.__getitem__(name)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/ctypes/__init__.py", line 394, in __getitem__
    func = self._FuncPtr((name_or_ordinal, self))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: /usr/lib/libllama.so.0: undefined symbol: llama_params_fit

I'm not sure if it's related to my OS, Python version, or this project. Last time I compiled it (about two days ago) it worked fine.

I'm currently refactoring some logic locally, but the hybrid structure of qwen3-next and qwen3.5 is basically running. You can continue testing after I finish the initial implementation.

Does it crash because of this?

@JamePeng
Owner

No, it's because your compiled library is incompatible or the library file is missing.
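
A quick way to confirm this (a sketch, using the library path from the traceback above): load the shared object directly with ctypes and check whether it exports the missing symbol.

import ctypes

# Path taken from the traceback above; adjust if your bindings load a different libllama.
lib = ctypes.CDLL("/usr/lib/libllama.so.0")

# False means the installed libllama is older than the Python bindings and needs rebuilding.
print(hasattr(lib, "llama_params_fit"))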

@alcoftTAO
Author

Okay, thanks! I'll try to fix it as soon as possible to continue testing.

@JamePeng
Owner

JamePeng commented Feb 26, 2026

The initial implementation is now live. You can try out qwen3.5 and see how it performs. Looking forward to your feedback.
Also, many hybrid-structure models are multimodal. For example, LFM2-VL can now run normally without the previous hack code, and LFM2.5-VL should also work without problems.

@alcoftTAO
Author

I've tested the chat template with and without images, thinking mode, etc.

I've only used Qwen3.5-27B (quantized to IQ2_M, with the mmproj in F16) and a ctx of 4096 tokens, but it should work fine with any other Qwen3.5-series model and quantization.


The model seems to work fine without images, but hallucinates a lot when using images (can't describe them properly, etc.).

Since the model is so big and I'm just asking for a simple description of the image, the quantization I'm using should not be a problem.

Note

With thinking mode activated, the reasoning content shows that the model does not see the image. I'll check the chat template to make sure it's not causing this problem.

The user wants me to describe an image. However, looking at the "Picture 1" content provided, it appears to be empty or contains only exclamation marks (which might be a placeholder or error). Since I cannot see any actual image content in the prompt, I need to inform the user that I cannot see the image.
...

I'll continue to work on this.
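
For context, this is roughly the message layout used for the image tests, as a sketch using llama-cpp-python's OpenAI-style chat API; the image URL is a placeholder and llm is the Llama instance with the Qwen3.5 chat handler attached:

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {"type": "image_url", "image_url": {"url": "file:///path/to/image.png"}},
            ],
        },
    ],
)
print(response["choices"][0]["message"]["content"])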

@alcoftTAO
Author

Looking at the console output, I find this:

find_slot: non-consecutive token position 513 after 512 for sequence 0 with 512 new tokens
find_slot: non-consecutive token position 513 after 512 for sequence 0 with 512 new tokens
find_slot: non-consecutive token position 513 after 513 for sequence 0 with 512 new tokens
find_slot: non-consecutive token position 513 after 513 for sequence 0 with 512 new tokens
find_slot: non-consecutive token position 559 after 513 for sequence 0 with 15 new tokens
find_slot: non-consecutive token position 559 after 513 for sequence 0 with 15 new tokens
find_slot: non-consecutive token position 513 after 512 for sequence 0 with 512 new tokens
find_slot: non-consecutive token position 513 after 512 for sequence 0 with 512 new tokens
find_slot: non-consecutive token position 513 after 513 for sequence 0 with 512 new tokens
find_slot: non-consecutive token position 513 after 513 for sequence 0 with 512 new tokens
find_slot: non-consecutive token position 557 after 513 for sequence 0 with 13 new tokens
find_slot: non-consecutive token position 557 after 513 for sequence 0 with 13 new tokens

I have no idea if it's related to this bug, but it is only printed when an image is added to the prompt.


Comparing the Qwen35ChatHandler's template with the Qwen3VLChatHandler's template, the code to load an image is similar:

Qwen3VLChatHandler:

...
                    {%- if 'image_url' in content -%}
                        {%- set image_count.value = image_count.value + 1 -%}
                        {%- if add_vision_id -%}
                            {{- 'Picture ' -}}
                            {{- image_count.value | string -}}
                            {{- ': ' -}}
                        {%- endif -%}
                        {{- '<|vision_start|>' -}}
                        {%- if content.image_url is string -%}
                            {{- content.image_url -}}
                        {%- else -%}
                            {{- content.image_url.url -}}
                        {%- endif -%}
                        {{- '<|vision_end|>' -}}
                    {%- endif -%}
...

Qwen35ChatHandler:

...
                    {%- if 'image' in item or 'image_url' in item -%}
                        {%- if is_system_content -%}
                            {{- raise_exception('System message cannot contain images.') -}}
                        {%- endif -%}
                        {%- if do_vision_count -%}
                            {%- set image_count.value = image_count.value + 1 -%}
                        {%- endif -%}
                        {%- if add_vision_id -%}
                            {{- 'Picture ' ~ image_count.value ~ ': ' -}}
                        {%- endif -%}
                        {{- '<|vision_start|>' -}}
                        {%- if 'image' in item -%}
                            {%- if item.image is string -%}
                                {{- item.image -}}
                            {%- else -%}
                                {{- item.image.url -}}
                            {%- endif -%}
                        {%- elif 'image_url' in item -%}
                            {%- if item.image_url is string -%}
                                {{- item.image_url -}}
                            {%- else -%}
                                {{- item.image_url.url -}}
                            {%- endif -%}
                        {%- endif -%}
                        {{- '<|vision_end|>' -}}
                    {%- elif 'video' in item -%}
...

@JamePeng
Owner

JamePeng commented Feb 27, 2026

        # Clear state for multiple runs
        llama.reset()
        llama._ctx.memory_clear(True)
        llama.n_tokens = 0

These all need to be removed; the generation and eval processes should now manage them.

@JamePeng
Owner

The model seems to work fine without images, but hallucinates a lot when using images (can't describe them properly, etc.).

Since the model is so big and I'm just asking for a simple description of the image, the quantization I'm using should not be a problem.

Note

With thinking mode activated, the reasoning content shows that the model does not see the image. I'll check the chat template to make sure it's not causing this problem.

You can check how the final prompt it constructs from the template is assembled, and whether any <media> elements are actually mounted in it.
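
One way to inspect that (a sketch, assuming the new handler exposes its Jinja template as a CHAT_FORMAT string like the other handlers in llama_cpp.llama_chat_format; the image URL is a placeholder):

import jinja2
from llama_cpp.llama_chat_format import Qwen35ChatHandler

env = jinja2.Environment(trim_blocks=True, lstrip_blocks=True)

# The template calls raise_exception() in some branches, so provide it.
def _raise(msg):
    raise ValueError(msg)
env.globals["raise_exception"] = _raise

template = env.from_string(Qwen35ChatHandler.CHAT_FORMAT)
prompt = template.render(
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {"type": "image_url", "image_url": {"url": "file:///path/to/image.png"}},
            ],
        }
    ],
    add_generation_prompt=True,
)

# Check that exactly one <|vision_start|>...<|vision_end|> block appears per image.
print(prompt)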

@yamikumo-DSD

        # Clear state for multiple runs
        llama.reset()
        llama._ctx.memory_clear(True)
        llama.n_tokens = 0

These all need to be removed; the generation and eval processes should now manage them.

I guess that even after you remove this part, identical code will still be called by Llava15ChatHandler because of return super().__call__(**kwargs). Is that okay?
