
Add support for GLM-Edge and GLM-Edge-V series models #10573

Open · wants to merge 24 commits into base: master

Conversation

@piDack (Contributor) commented Nov 29, 2024

This pull request adds support for the GLM-Edge-Chat (1.5B & 4B) and GLM-Edge-V (2B & 5B) series of models in llama.cpp.

Note: converting the pretrained model to GGUF currently requires transformers version 4.47.0.dev0.

The github-actions bot added the labels testing, examples, and python on Nov 29, 2024.
@arch-btw (Contributor) commented:
Works great.

./llama-llava-cli -m ggml-model-Q4_K_M.gguf --mmproj mmproj-model-f16.gguf --temp 0.1 --image bee.jpg -p "<|system|>\n You are a helpful AI assistant. <image><|user|>\n What is in the image? <|assistant|>\n"

[screenshot of the model's response]

@piDack (Contributor, Author) commented Dec 19, 2024

Is there anyone available to review the code?

@arch-btw (Contributor) commented:
@piDack As this would be adding support for GlmForCausalLM for these vision models, I'm curious to know if we could create a more modular or generic implementation that could also be used for the other GlmForCausalLM model(s)?

I'm asking because glm-4-9b-chat-hf is currently broken with the new transformers-only implementation:

python convert_hf_to_gguf.py /home/Models/glm-4-9b-chat-hf --outtype f32
INFO:hf-to-gguf:Loading model: glm-4-9b-chat-hf
ERROR:hf-to-gguf:Model GlmForCausalLM is not supported

The version with the custom python files still works but if we're moving away from that (related discussion), it might be best to support GlmForCausalLM in general.

Are there any parts of this PR that could be refactored or generalized for broader applicability so that we can support both and maybe upcoming models? Thank you.

@piDack (Contributor, Author) commented Jan 7, 2025

> [quoting @arch-btw's comment above about generalizing GlmForCausalLM support]

I will try to do it.

@piDack (Contributor, Author) commented Jan 26, 2025

> [quoting @arch-btw's comment above about generalizing GlmForCausalLM support]

Done.

[Screenshot 2025-01-26 15:59:45]

@arch-btw (Contributor) commented:
Perfect! Thank you @piDack
@ngxson could you please take a look?

@@ -114,7 +115,7 @@ llm_chat_template llm_chat_detect_template(const std::string & tmpl) {
        }
    } else if (tmpl_contains("<|assistant|>") && tmpl_contains("<|end|>")) {
        return LLM_CHAT_TEMPLATE_PHI_3;
-   } else if (tmpl_contains("<|assistant|>") && tmpl_contains("<|user|>")) {
+   } else if (tmpl_contains("\n<|assistant|>") && tmpl_contains("<|user|>")) {
Collaborator commented:
This can be a dangerous change because it can break existing chat templates (the test is not exhaustive, unfortunately)

Would be nice to add support for LLM_CHAT_TEMPLATE_GLMEDGE while not touching existing code
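
As a rough sketch of that idea (an assumption, not necessarily the code this PR ends up with): a new, more specific branch could be added for the GLM-Edge marker while the existing line stays byte-for-byte untouched. The "\n<|assistant|>" marker is taken from the diff above and LLM_CHAT_TEMPLATE_GLMEDGE would be a new enum value.

    // sketch: insert this branch *before* the existing
    //   tmpl_contains("<|assistant|>") && tmpl_contains("<|user|>")
    // branch, which is left exactly as it is today
    } else if (tmpl_contains("\n<|assistant|>") && tmpl_contains("<|user|>")) {
        // GLM-Edge templates put a newline before <|assistant|>
        return LLM_CHAT_TEMPLATE_GLMEDGE; // new enum value added for this template
    }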


layer.wqkv = create_tensor(tn(LLM_TENSOR_ATTN_QKV, "weight", i), {n_embd, n_embd + 2*n_embd_gqa}, 0);
layer.bqkv = create_tensor(tn(LLM_TENSOR_ATTN_QKV, "bias", i), {n_embd + 2*n_embd_gqa}, 0);
if(layer.wqkv == nullptr){
Collaborator commented:
Suggested change:
- if(layer.wqkv == nullptr){
+ if(layer.wqkv == nullptr) {

Small style fix (please also apply it to other places)

cb(Qcur, "Qcur", il);
cb(Kcur, "Kcur", il);
cb(Vcur, "Vcur", il);
if(model.type == LLM_TYPE_1_5B|| model.type == LLM_TYPE_4B || model.type == LLM_TYPE_9B){
Collaborator commented:
Suggested change:
- if(model.type == LLM_TYPE_1_5B|| model.type == LLM_TYPE_4B || model.type == LLM_TYPE_9B){
+ if(model.type == LLM_TYPE_1_5B|| model.type == LLM_TYPE_4B || model.type == LLM_TYPE_9B) {

same here

cur = ggml_add(ctx0, cur, model.layers[il].bqkv);
cb(cur, "bqkv", il);
}
Qcur = ggml_cont(ctx0, ggml_view_2d(ctx0, cur, n_embd, n_tokens, cur->nb[1], 0*sizeof(float)*(n_embd)));
Collaborator commented:
0*sizeof(float)*(n_embd) is always equal to 0; is this expected?
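
For context, this is presumably a readability pattern rather than a computation: the zero offset is written out so the Q view lines up with the K and V views, which use non-zero multiples of the same expression. A sketch of how the three views typically look in comparable llama.cpp graph-build code (an assumption about intent, not a quote from this PR):

    // Q starts at byte offset 0, K after n_embd floats, V after n_embd + n_embd_gqa floats
    Qcur = ggml_cont(ctx0, ggml_view_2d(ctx0, cur, n_embd,     n_tokens, cur->nb[1], 0*sizeof(float)*(n_embd)));
    Kcur = ggml_cont(ctx0, ggml_view_2d(ctx0, cur, n_embd_gqa, n_tokens, cur->nb[1], 1*sizeof(float)*(n_embd)));
    Vcur = ggml_cont(ctx0, ggml_view_2d(ctx0, cur, n_embd_gqa, n_tokens, cur->nb[1], 1*sizeof(float)*(n_embd + n_embd_gqa)));

If that is the only reason, a plain 0 (or a short comment) would make the intent clearer.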

@@ -2475,6 +2553,12 @@ bool clip_image_batch_encode(clip_ctx * ctx, const int n_threads, const clip_ima
if (ctx->has_minicpmv_projector) {
GGML_ASSERT(batch_size == 1);
}
if(ctx->has_glm_projector){
GGML_ASSERT(batch_size == 1);
ggml_tensor * boi = ctx->vision_model.boi_w;
Collaborator commented:
I don't quite get how these boi and eoi are used. I guess they are 1-D arrays holding the begin-of-image and end-of-image token embeddings, right?

If that guess is incorrect, could you explain this a bit further? Thanks.
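
For reference, one plausible reading of that guess, as a minimal sketch with hypothetical names (boi_embd, eoi_embd, image_embd, out, n_patches are illustrative, not taken from the PR): the two weights hold learned begin-/end-of-image embeddings, and the encoder output handed to the language model is laid out as [boi][image patch embeddings][eoi].

    // Hypothetical illustration only; names and sizes are assumptions, not the PR's code.
    //   boi_embd, eoi_embd : n_embd floats each, the begin-/end-of-image embeddings
    //   image_embd         : n_patches * n_embd floats, projected image patch embeddings
    //   out                : (n_patches + 2) * n_embd floats, passed on to the LLM
    memcpy(out,                                     boi_embd,   n_embd * sizeof(float));
    memcpy(out + n_embd,                            image_embd, (size_t) n_patches * n_embd * sizeof(float));
    memcpy(out + (size_t) (1 + n_patches) * n_embd, eoi_embd,   n_embd * sizeof(float));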
