taronaeo (Collaborator) commented Sep 6, 2025

fixes #15414

Not sure if my .supports_buft is implemented incorrectly, but the weight tensors are not going through the .set_tensor function, so the weight zTensors have to be re-initialised on the fly during matmul. Not ideal.

Activates the following data types:

  1. FP16
  2. BF16

Fixes:

  1. LLAMA_SET_ROWS=1 causing incorrect inference (see Eval bug: zDNN backend not inferencing correctly after LLAMA_SET_ROWS enablement #15414)
  2. zTensors not being freed correctly, which would exhaust all available memory when llama-bench was used with more than one model
  3. Moved bias zTensor creation to .init_tensor for better performance

Performance

| model | size | params | threads | test | t/s (master) | t/s (PR) | speedup |
| --- | --- | --- | --- | --- | --- | --- | --- |
| granite 3B all F32 | 9.44 GiB | 2.53 B | 1 | pp512 | 52.14 | 51.92 | 1.00 |
| granite 3B all F32 | 9.44 GiB | 2.53 B | 1 | tg128 | 3.92 | 3.86 | 0.98 |
| granite 3B all F32 | 9.44 GiB | 2.53 B | 2 | pp512 | 92.60 | 81.92 | 0.88 |
| granite 3B all F32 | 9.44 GiB | 2.53 B | 2 | tg128 | 4.44 | 4.48 | 1.01 |
| granite 3B all F32 | 9.44 GiB | 2.53 B | 4 | pp512 | 141.14 | 144.85 | 1.03 |
| granite 3B all F32 | 9.44 GiB | 2.53 B | 4 | tg128 | 4.83 | 4.86 | 1.01 |
| granite 3B all F32 | 9.44 GiB | 2.53 B | 8 | pp512 | 216.55 | 215.82 | 1.00 |
| granite 3B all F32 | 9.44 GiB | 2.53 B | 8 | tg128 | 4.97 | 4.95 | 1.00 |
| granite 3B F16 | 4.72 GiB | 2.53 B | 1 | pp512 | 10.42 | 51.68 | 4.96 |
| granite 3B F16 | 4.72 GiB | 2.53 B | 1 | tg128 | 0.45 | 3.43 | 7.62 |
| granite 3B F16 | 4.72 GiB | 2.53 B | 2 | pp512 | 19.61 | 81.78 | 4.17 |
| granite 3B F16 | 4.72 GiB | 2.53 B | 2 | tg128 | 0.89 | 4.17 | 4.69 |
| granite 3B F16 | 4.72 GiB | 2.53 B | 4 | pp512 | 38.99 | 138.58 | 3.55 |
| granite 3B F16 | 4.72 GiB | 2.53 B | 4 | tg128 | 1.73 | 4.67 | 2.70 |
| granite 3B F16 | 4.72 GiB | 2.53 B | 8 | pp512 | 74.60 | 213.83 | 2.87 |
| granite 3B F16 | 4.72 GiB | 2.53 B | 8 | tg128 | 3.17 | 4.9 | 1.55 |
| granite 3B BF16 | 4.72 GiB | 2.53 B | 1 | pp512 | 11.30 | 51.6 | 4.57 |
| granite 3B BF16 | 4.72 GiB | 2.53 B | 1 | tg128 | 0.31 | 3.08 | 9.94 |
| granite 3B BF16 | 4.72 GiB | 2.53 B | 2 | pp512 | 21.40 | 82.45 | 3.85 |
| granite 3B BF16 | 4.72 GiB | 2.53 B | 2 | tg128 | 0.61 | 3.88 | 6.36 |
| granite 3B BF16 | 4.72 GiB | 2.53 B | 4 | pp512 | 42.28 | 142.97 | 3.38 |
| granite 3B BF16 | 4.72 GiB | 2.53 B | 4 | tg128 | 1.22 | 4.41 | 3.61 |
| granite 3B BF16 | 4.72 GiB | 2.53 B | 8 | pp512 | 80.90 | 213.85 | 2.64 |
| granite 3B BF16 | 4.72 GiB | 2.53 B | 8 | tg128 | 2.40 | 4.79 | 2.00 |

Note

Tests were conducted on an IBM z17 mainframe with 40 IFLs (cores) and 128 GB of memory on a shared R&D LPAR.

test-backend-ops

build/bin/test-backend-ops -b zDNN | grep -v "not supported"
ggml_zdnn_init: allocating
ggml_zdnn_init: found 1 device
ggml_zdnn_init: picking default device: zDNN
ggml_zdnn_init: NNPA name: zDNN
ggml_zdnn_init: NNPA_PARMBLKFORMAT_0 = true
ggml_zdnn_init: NNPA_PARMBLKFORMAT_1 = true
Testing 3 devices

Backend 1/3: zDNN
  Device description: IBM Z Neural Network Processing Assist (NNPA)
  Device memory: 0 MB (0 MB free)

  MUL_MAT(type_a=f32,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=2,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=3,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=4,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=5,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=6,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=7,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=8,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=9,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=2,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=3,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=4,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=5,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=6,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=7,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=8,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=9,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=bf16,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=bf16,type_b=f32,m=16,n=2,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=bf16,type_b=f32,m=16,n=3,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=bf16,type_b=f32,m=16,n=4,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=bf16,type_b=f32,m=16,n=5,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=bf16,type_b=f32,m=16,n=6,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=bf16,type_b=f32,m=16,n=7,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=bf16,type_b=f32,m=16,n=8,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=bf16,type_b=f32,m=16,n=9,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=bf16,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=bf16,type_b=f32,m=16,n=2,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=bf16,type_b=f32,m=16,n=3,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=bf16,type_b=f32,m=16,n=4,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=bf16,type_b=f32,m=16,n=5,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=bf16,type_b=f32,m=16,n=6,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=bf16,type_b=f32,m=16,n=7,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=bf16,type_b=f32,m=16,n=8,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=bf16,type_b=f32,m=16,n=9,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=16,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=1,k=4,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=16,k=4,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=16,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=1,k=4,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=16,k=4,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=f16,type_b=f16,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=f16,type_b=f16,m=16,n=16,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=f16,type_b=f16,m=16,n=1,k=4,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=f16,type_b=f16,m=16,n=16,k=4,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=bf16,type_b=f32,m=16,n=1,k=1,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=bf16,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
ggml_zdnn_free: deallocating
  12353/12353 tests passed
  Backend zDNN: OK
Backend 2/3: BLAS
  Skipping
Backend 3/3: CPU
  Skipping
3/3 backends passed
OK

@github-actions bot added the `ggml` (changes relating to the ggml tensor library for machine learning) and `IBM zDNN` (issues specific to IBM zDNN Accelerator) labels, Sep 6, 2025
@taronaeo taronaeo requested a review from slaren September 6, 2025 18:25
@github-actions bot added the `documentation` (Improvements or additions to documentation) label, Sep 7, 2025
slaren (Member) commented Sep 8, 2025

> Not sure if my .supports_buft is implemented incorrectly, but the weight tensors are not going through the .set_tensor function, so the weight zTensors have to be re-initialised on the fly during matmul. Not ideal.

Are you sure that you are looking at a weight? It might be part of the attention computation.

taronaeo (Collaborator, Author) commented Sep 9, 2025

Sorry I missed this. Yep I can confirm that I am looking at a weight tensor, unless my debugging code is wrong.

Debug Patch

diff --git a/ggml/src/ggml-zdnn/ggml-zdnn.cpp b/ggml/src/ggml-zdnn/ggml-zdnn.cpp
index 7947aab87..bd04beb2d 100644
--- a/ggml/src/ggml-zdnn/ggml-zdnn.cpp
+++ b/ggml/src/ggml-zdnn/ggml-zdnn.cpp
@@ -130,7 +130,11 @@ static void ggml_zdnn_mul_mat_op(ggml_backend_zdnn_context * ctx, const ggml_ten
     // TODO: Weights are somehow not going through `ggml_backend_zdnn_buffer_set_tensor` during model loading.
     //       So we need to load the weights here. Remove this when the issue is fixed.
     //       Problem might be residing in `ggml_backend_zdnn_device_supports_buft`.
-    if (weights_extra->ztensor.is_transformed == false) ggml_zdnn_load_tensor(weights_extra->ztensor, weights->data);
+    if (weights_extra->ztensor.is_transformed == false) {
+       GGML_LOG_INFO("%s: tensor->name = %s | tensor->buffer->usage = %d\n", __func__, weights->name, weights->buffer->usage);
+       ggml_zdnn_load_tensor(weights_extra->ztensor, weights->data);
+       std::raise(SIGINT);
+    }
 
     // GGML_LOG_INFO("%s: tensor '%s' tensor dimensions: [%ld, %ld, %ld, %ld] pre_tfm_desc dimensions: [%ld, %ld, %ld, %ld]\n",
     //               __func__, weights_extra->name,

And as logged, the buffer usage is 1, which equates to GGML_BACKEND_BUFFER_USAGE_WEIGHTS.

$ gdb --args build/bin/llama-cli -m hf_models/granite-3.3-2b-instruct-be.F32.gguf -t 8 -n 25 -p "Write me a dog walking business idea 1. " -no-cnv -ngl -1 --seed 1568795874

ggml_zdnn_mul_mat_op: tensor->name = blk.0.attn_q.weight | tensor->buffer->usage = 1

Thread 1 "llama-cli" received signal SIGINT, Interrupt.
0x000003fff6b98c26 in __pthread_kill_implementation () from /lib64/libc.so.6
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.34-168.el9_6.23.s390x

taronaeo (Collaborator, Author) commented Sep 9, 2025

I did some digging and found that setting .buffer_from_host_ptr = false allows the weight tensors to go through .set_tensor, whereas before only the compute tensors were going through.

.buffer_from_host_ptr = false

diff --git a/ggml/src/ggml-zdnn/ggml-zdnn.cpp b/ggml/src/ggml-zdnn/ggml-zdnn.cpp
index 7947aab87..d6d1d06c8 100644
--- a/ggml/src/ggml-zdnn/ggml-zdnn.cpp
+++ b/ggml/src/ggml-zdnn/ggml-zdnn.cpp
@@ -432,9 +432,14 @@ static void ggml_backend_zdnn_buffer_set_tensor(ggml_backend_buffer_t buffer, gg
     memcpy((char *)tensor->data + offset, data, size);
 
     ggml_backend_zdnn_buffer * extra = (ggml_backend_zdnn_buffer *)tensor->extra;
+    GGML_LOG_INFO("%s: tensor->name = %s | tensor->buffer->usage = %d | tensor->extra->ztensor.is_transformed = %d\n", __func__, tensor->name, tensor->buffer->usage, extra->ztensor.is_transformed);
+
     if (extra->ztensor.is_transformed) zdnn_reset_ztensor(&extra->ztensor);
     ggml_zdnn_load_tensor(extra->ztensor, tensor->data);
 
+    GGML_LOG_INFO("%s: tensor->name = %s | tensor->buffer->usage = %d | tensor->extra->ztensor.is_transformed = %d\n", __func__, tensor->name, tensor->buffer->usage, extra->ztensor.is_transformed);
+    std::raise(SIGINT);
+
     GGML_UNUSED(buffer);
 }
 
@@ -647,7 +652,7 @@ static void ggml_backend_zdnn_device_get_props(ggml_backend_dev_t dev, ggml_back
     props->caps = (ggml_backend_dev_caps) {
         /* .async                = */ false,
         /* .host_buffer          = */ false,
-        /* .buffer_from_host_ptr = */ true,
+        /* .buffer_from_host_ptr = */ false,
         /* .events               = */ false
     };
 }

First tensor to call .set_tensor

ggml_backend_zdnn_buffer_set_tensor: tensor->name = blk.0.attn_q.weight | tensor->buffer->usage = 1 | tensor->extra->ztensor.is_transformed = 0
ggml_backend_zdnn_buffer_set_tensor: tensor->name = blk.0.attn_q.weight | tensor->buffer->usage = 1 | tensor->extra->ztensor.is_transformed = 1

.buffer_from_host_ptr = true (Current PR)

First tensor to call .set_tensor

ggml_backend_zdnn_buffer_set_tensor: tensor->name = zDNN#attn_norm-0#0 | tensor->buffer->usage = 2 | tensor->extra->ztensor.is_transformed = 0
ggml_backend_zdnn_buffer_set_tensor: tensor->name = zDNN#attn_norm-0#0 | tensor->buffer->usage = 2 | tensor->extra->ztensor.is_transformed = 1

Do let me know if this looks odd. I intend to fix the weight tensor problem in another PR; this PR mainly fixes the issues that have been preventing zDNN from inferencing correctly with the latest upstream code.

slaren (Member) commented Sep 9, 2025

That's expected: you cannot enable user-mapped buffers if you need to modify the tensor data.

taronaeo (Collaborator, Author) commented Sep 9, 2025

Got it. I will create a separate PR by tomorrow to fix it. Do let me know if I need to make any changes to this PR.

@@ -593,27 +603,6 @@ static ggml_guid_t ggml_backend_zdnn_guid(void) {
return reinterpret_cast<ggml_guid_t>((void *)guid_str);
}

// TODO: remove in the future
ggml_backend_t ggml_backend_zdnn_init(void) {
slaren (Member):

This function is still in the header.

taronaeo (Collaborator, Author):

Good catch. Fixed in latest push.

@taronaeo taronaeo requested a review from slaren September 11, 2025 15:24