vulkan: add environment variable to avoid VRAM allocation #11592

wbruna · 2025-02-02T12:27:24Z

With Vulkan on my PC (Ryzen 5 3400G APU, DDR4-3000, Debian 12), I noticed big performance drops (roughly 2x to 3x slower) associated with buffer allocations in VRAM.

It's easier to test with stable-diffusion.cpp: the VAE step on a 512x512 sd1.5 generation usually takes around 40 seconds with the default 2 GB of dedicated VRAM. But if I restrict VRAM to a very small value (64-80 MB), that time drops to around 13 seconds.

I noticed a similar performance drop on LLMs, but it's harder to pinpoint. For instance, prompt processing on smaller models runs nearly twice as slow as on larger ones, performance changes right after a koboldcpp restart, and results are inconsistent between benchmarks and generation.

Checking with GGML_VULKAN_MEMORY_DEBUG, the slower behavior always seems to be associated with allocations in device memory, so I added this env var (GGML_VK_PREFER_HOST_MEMORY) to confirm. Forcing host memory allocations does seem to fix the performance drop.
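To illustrate the idea, here is a minimal sketch (not the actual patch) of how an environment variable could flip the order in which memory types are tried for a buffer allocation. The enum is a local stand-in that reuses the bit values of `VkMemoryPropertyFlagBits` so the snippet compiles without the Vulkan headers. `allocation_preference` is a hypothetical helper, and treating the variable as enabled whenever it is set, regardless of value, is an assumption.

```cpp
#include <cstdint>
#include <cstdlib>
#include <vector>

// Local stand-ins for the Vulkan memory property bits (same numeric values as
// VkMemoryPropertyFlagBits), so the sketch builds without <vulkan/vulkan.h>.
enum MemoryProperty : std::uint32_t {
    DEVICE_LOCAL  = 0x1,
    HOST_VISIBLE  = 0x2,
    HOST_COHERENT = 0x4,
};

// Assumption: the variable only needs to be present; its value is ignored.
inline bool prefer_host_memory_from_env() {
    return std::getenv("GGML_VK_PREFER_HOST_MEMORY") != nullptr;
}

// Hypothetical helper: returns the property masks to try, in order, when
// allocating a buffer. The second entry is the fallback if the first fails.
inline std::vector<std::uint32_t> allocation_preference(bool prefer_host) {
    const std::uint32_t host   = HOST_VISIBLE | HOST_COHERENT;
    const std::uint32_t device = DEVICE_LOCAL;
    if (prefer_host) {
        return {host, device};   // try host memory first, VRAM as fallback
    }
    return {device, host};       // default: try VRAM first, host as fallback
}
```

A caller would evaluate `allocation_preference(prefer_host_memory_from_env())` once at device setup and walk the list until an allocation succeeds. Keeping the other memory type as a fallback means the variable only changes the preference and does not make allocations fail outright.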

OTOH, I don't see the original performance issue on a 4500U laptop (Ubuntu 24.04, DDR4-3200), so this would benefit from testing on different iGPU+OS combinations.

vulkan: add environment variable GGML_VK_PREFER_HOST_MEMORY to avoid …

27df617

…VRAM allocation

github-actions bot added the labels Vulkan (issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) on Feb 2, 2025

0cc4m self-requested a review February 3, 2025 09:04
