Add support for GLM-Edge and GLM-Edge-V series models #10573
base: master
Conversation
Is there anyone available to review the code?
@piDack As this would be adding support for GlmForCausalLM for these vision models, I'm curious to know whether we could create a more modular or generic implementation that could also be used for the other GlmForCausalLM model(s). I'm asking because glm-4-9b-chat-hf is currently broken with the new transformers-only implementation.
The version with the custom Python files still works, but if we're moving away from that (related discussion), it might be best to support GlmForCausalLM in general. Are there any parts of this PR that could be refactored or generalized for broader applicability, so that we can support both these and possibly upcoming models? Thank you.
I will try to do it.
Done.
@@ -114,7 +115,7 @@ llm_chat_template llm_chat_detect_template(const std::string & tmpl) {
        }
    } else if (tmpl_contains("<|assistant|>") && tmpl_contains("<|end|>")) {
        return LLM_CHAT_TEMPLATE_PHI_3;
-   } else if (tmpl_contains("<|assistant|>") && tmpl_contains("<|user|>")) {
+   } else if (tmpl_contains("\n<|assistant|>") && tmpl_contains("<|user|>")) {
This can be a dangerous change because it can break existing chat templates (the test is not exhaustive, unfortunately). It would be nice to add support for LLM_CHAT_TEMPLATE_GLMEDGE without touching the existing code.
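For illustration only, here is a minimal, self-contained sketch of what a separate and more specific GLM-Edge branch could look like, so that the existing Phi-3 check stays untouched. The marker strings and the !contains("<|end|>") guard are assumptions about the GLM-Edge chat template, not the PR's final code:

```cpp
#include <string>

// Hypothetical, simplified mirror of the detection logic discussed above.
enum llm_chat_template_sketch {
    TEMPLATE_GLMEDGE,
    TEMPLATE_PHI_3,
    TEMPLATE_UNKNOWN,
};

static llm_chat_template_sketch detect_template_sketch(const std::string & tmpl) {
    auto contains = [&](const char * s) { return tmpl.find(s) != std::string::npos; };

    // New, narrower branch checked first: it only fires when the template lacks
    // the Phi-3 <|end|> marker (an assumption), so the existing Phi-3 branch
    // below is never shadowed and does not need to be edited.
    if (contains("<|user|>") && contains("\n<|assistant|>") && !contains("<|end|>")) {
        return TEMPLATE_GLMEDGE;
    }
    // Existing behaviour, unchanged.
    if (contains("<|assistant|>") && contains("<|end|>")) {
        return TEMPLATE_PHI_3;
    }
    return TEMPLATE_UNKNOWN;
}
```

The point is just that a new enum value plus a narrower condition can be added alongside the existing branches instead of editing the current tmpl_contains check.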
layer.wqkv = create_tensor(tn(LLM_TENSOR_ATTN_QKV, "weight", i), {n_embd, n_embd + 2*n_embd_gqa}, 0);
layer.bqkv = create_tensor(tn(LLM_TENSOR_ATTN_QKV, "bias", i), {n_embd + 2*n_embd_gqa}, 0);
if(layer.wqkv == nullptr){
-    if(layer.wqkv == nullptr){
+    if(layer.wqkv == nullptr) {

Small style fix (please also apply it to other places).
cb(Qcur, "Qcur", il);
cb(Kcur, "Kcur", il);
cb(Vcur, "Vcur", il);
if(model.type == LLM_TYPE_1_5B|| model.type == LLM_TYPE_4B || model.type == LLM_TYPE_9B){
-    if(model.type == LLM_TYPE_1_5B|| model.type == LLM_TYPE_4B || model.type == LLM_TYPE_9B){
+    if(model.type == LLM_TYPE_1_5B|| model.type == LLM_TYPE_4B || model.type == LLM_TYPE_9B) {

Same here.
cur = ggml_add(ctx0, cur, model.layers[il].bqkv);
cb(cur, "bqkv", il);
}
Qcur = ggml_cont(ctx0, ggml_view_2d(ctx0, cur, n_embd, n_tokens, cur->nb[1], 0*sizeof(float)*(n_embd)));
0*sizeof(float)*(n_embd) is always equal to 0; is this expected?
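For context on that offset, here is a small sketch of the byte-offset arithmetic for slicing a packed QKV activation, following the usual llama.cpp pattern for fused QKV projections. The n_embd and n_embd_gqa values below are made-up placeholders, and writing Q's offset as 0*sizeof(float)*(n_embd) only keeps the three expressions symmetric:

```cpp
#include <cstddef>
#include <cstdio>

int main() {
    // Placeholder sizes, not taken from any real GLM-Edge config.
    const std::size_t n_embd     = 2048; // hidden size
    const std::size_t n_embd_gqa = 256;  // per-token K/V width with grouped-query attention

    // Byte offsets of the Q, K and V slices inside one packed QKV row of
    // n_embd + 2*n_embd_gqa floats (assuming f32 activations).
    const std::size_t q_off = 0 * sizeof(float) *  n_embd;               // always 0
    const std::size_t k_off = 1 * sizeof(float) *  n_embd;               // right after Q
    const std::size_t v_off = 1 * sizeof(float) * (n_embd + n_embd_gqa); // right after K

    std::printf("Q offset: %zu\nK offset: %zu\nV offset: %zu\n", q_off, k_off, v_off);
    return 0;
}
```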
@@ -2475,6 +2553,12 @@ bool clip_image_batch_encode(clip_ctx * ctx, const int n_threads, const clip_ima
    if (ctx->has_minicpmv_projector) {
        GGML_ASSERT(batch_size == 1);
    }
    if(ctx->has_glm_projector){
        GGML_ASSERT(batch_size == 1);
        ggml_tensor * boi = ctx->vision_model.boi_w;
I don't quite get how these boi and eoi are used. I guess that they are 1D arrays of the begin-of-image and end-of-image token embeddings, right? If that's incorrect, could you explain this a bit further? Thanks.
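If that guess is right, the usage would presumably be to bracket the projected image patch embeddings with those two vectors before they are handed to the language model. A purely conceptual, std::vector-based sketch (hypothetical helper, not the PR's actual ggml code):

```cpp
#include <vector>

using Embedding = std::vector<float>;

// Hypothetical illustration: prepend the begin-of-image embedding and append
// the end-of-image embedding around the projected image patch embeddings.
static std::vector<Embedding> wrap_image_embeddings(
        const Embedding & boi,
        const Embedding & eoi,
        const std::vector<Embedding> & patches) {
    std::vector<Embedding> out;
    out.reserve(patches.size() + 2);
    out.push_back(boi);                                     // begin-of-image marker
    out.insert(out.end(), patches.begin(), patches.end());  // image patch embeddings
    out.push_back(eoi);                                     // end-of-image marker
    return out;
}
```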
This pull request adds support for the GLM-Edge-Chat 1.5B & 4B and GLM-Edge-V 2B & 5B series of models within llama.cpp.
Note: the current model pretrain -> GGUF conversion only supports transformers version 4.47.0.dev0.