ggml-zdnn: fix #15414, activate FP16 and BF16 acceleration and incorrect zTensor free #15839
base: master
Conversation
Signed-off-by: Aaron Teo <[email protected]>
…al if guard" This reverts commit 6e780a4. Signed-off-by: Aaron Teo <[email protected]>
This reverts commit 0da4b6a. Signed-off-by: Aaron Teo <[email protected]>
Are you sure that you are looking at a weight? It might be part of the attention computation.

Sorry I missed this. Yep I can confirm that I am looking at a weight tensor, unless my debugging code is wrong.

Debug Patch:

```diff
diff --git a/ggml/src/ggml-zdnn/ggml-zdnn.cpp b/ggml/src/ggml-zdnn/ggml-zdnn.cpp
index 7947aab87..bd04beb2d 100644
--- a/ggml/src/ggml-zdnn/ggml-zdnn.cpp
+++ b/ggml/src/ggml-zdnn/ggml-zdnn.cpp
@@ -130,7 +130,11 @@ static void ggml_zdnn_mul_mat_op(ggml_backend_zdnn_context * ctx, const ggml_ten
     // TODO: Weights are somehow not going through `ggml_backend_zdnn_buffer_set_tensor` during model loading.
     //       So we need to load the weights here. Remove this when the issue is fixed.
     //       Problem might be residing in `ggml_backend_zdnn_device_supports_buft`.
-    if (weights_extra->ztensor.is_transformed == false) ggml_zdnn_load_tensor(weights_extra->ztensor, weights->data);
+    if (weights_extra->ztensor.is_transformed == false) {
+        GGML_LOG_INFO("%s: tensor->name = %s | tensor->buffer->usage = %d\n", __func__, weights->name, weights->buffer->usage);
+        ggml_zdnn_load_tensor(weights_extra->ztensor, weights->data);
+        std::raise(SIGINT);
+    }
     // GGML_LOG_INFO("%s: tensor '%s' tensor dimensions: [%ld, %ld, %ld, %ld] pre_tfm_desc dimensions: [%ld, %ld, %ld, %ld]\n",
     //               __func__, weights_extra->name,
```

And as logged, the buffer usage is
I did some digging as well and found out that setting

That's expected, of course you cannot enable user mapped buffers if you need to modify the tensor data.

Got it. Will create a separate PR by tomorrow to fix it. Do let me know if I need to make any changes to this PR.
```diff
@@ -593,27 +603,6 @@ static ggml_guid_t ggml_backend_zdnn_guid(void) {
     return reinterpret_cast<ggml_guid_t>((void *)guid_str);
 }

-// TODO: remove in the future
-ggml_backend_t ggml_backend_zdnn_init(void) {
```
This function is still in the header.
Good catch. Fixed in latest push.
fixes #15414

Not sure if my `.supports_buft` is implemented inaccurately, but the weight tensors are not going through the `.set_tensor` function, and thus we have to re-initialise the weight zTensors on-the-fly during matmul. Not ideal though.

Activates the following data types: FP16 and BF16.

Fixes:
- `LLAMA_SET_ROWS=1` causing the inference to be incorrect (see: Eval bug: zDNN backend not inferencing correctly after LLAMA_SET_ROWS enablement #15414)
- incorrect zTensor free when `llama-bench` was used with more than 1 model
- `init_tensor` for performance improvements

Performance
Note
Tests were conducted on an IBM z17 Mainframe with 40 IFLs (cores) and 128 GB Memory on a shared R&D LPAR.
test-backend-ops