opencl: tiled mul_mat with local memory for f16 and f32 #14809

lhez · 2025-07-22T07:08:26Z

This PR adds another variant of tiled matmul for f32 and f16. The new kernels also use local memory for tiling and largely follow the standard tiled-matmul pattern. The main difference from #14535 is that the tiles from both src0 and src1 are transposed.
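For readers unfamiliar with the pattern, here is a minimal CPU-side sketch of tiled matmul in which both input tiles are staged in transposed form. It is not the kernel code from this PR: the function names, the `tile` parameter, and the list-of-lists layout are all hypothetical, and plain Python lists stand in for OpenCL local memory. It assumes both inputs keep the reduction dimension K contiguous within a row, so that `c[n][m] = sum_k a[m][k] * b[n][k]`.

```python
def tiled_matmul_transposed(a, b, tile=4):
    """Compute c[n][m] = sum_k a[m][k] * b[n][k], one output tile at a time.

    a is M x K and b is N x K (hypothetical layout: K is the row length).
    """
    M, K, N = len(a), len(a[0]), len(b)
    c = [[0.0] * M for _ in range(N)]
    for m0 in range(0, M, tile):          # each (m0, n0) pair plays the role
        for n0 in range(0, N, tile):      # of one work-group
            acc = [[0.0] * tile for _ in range(tile)]
            for k0 in range(0, K, tile):
                # Stand-ins for local memory. Both tiles are stored transposed,
                # i.e. indexed [k][row], and zero-padded past the matrix edges.
                a_t = [[a[m0 + i][k0 + k] if m0 + i < M and k0 + k < K else 0.0
                        for i in range(tile)] for k in range(tile)]
                b_t = [[b[n0 + j][k0 + k] if n0 + j < N and k0 + k < K else 0.0
                        for j in range(tile)] for k in range(tile)]
                # On a GPU a barrier would separate the loads from the compute.
                for k in range(tile):
                    row_a, row_b = a_t[k], b_t[k]   # one contiguous row per k
                    for j in range(tile):
                        for i in range(tile):
                            acc[j][i] += row_a[i] * row_b[j]
            # Write back only the in-bounds part of the accumulated tile.
            for j in range(tile):
                for i in range(tile):
                    if n0 + j < N and m0 + i < M:
                        c[n0 + j][m0 + i] = acc[j][i]
    return c


def naive_matmul(a, b):
    """Reference result with the same c[n][m] convention."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_a in a]
            for row_b in b]
```

With the tiles indexed `[k][row]`, every step of the inner `k` loop reads one contiguous row from each tile, which is the usual motivation for staging tiles in transposed form. Whether the kernels in this PR index their tiles in exactly this way is not stated above.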

On Adreno 830,

master

| model | size | params | backend | ngl | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| qwen2 1.5B F16 | 2.88 GiB | 1.54 B | OpenCL | 99 | pp512 | 145.45 ± 2.56 |
| qwen2 1.5B F16 | 2.88 GiB | 1.54 B | OpenCL | 99 | tg128 | 17.68 ± 0.14 |

this PR

| model | size | params | backend | ngl | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| qwen2 1.5B F16 | 2.88 GiB | 1.54 B | OpenCL | 99 | pp512 | 174.30 ± 10.32 |
| qwen2 1.5B F16 | 2.88 GiB | 1.54 B | OpenCL | 99 | tg128 | 17.73 ± 0.12 |

opencl: add mul_mat_f32_f32_l4_lm and mul_mat_f16_f32_l4_lm

5666ed9

github-actions bot added the ggml (changes relating to the ggml tensor library for machine learning) and OpenCL (issues specific to the OpenCL backend) labels on Jul 22, 2025
