Name and Version
This appears to be the same bug as noted in this issue: #7575
We are trying to do inference from multiple threads, with some contexts having LoRAs loaded and others not (so batched inference isn't going to work). If I may ask, has there been any progress on this issue? We are currently using a build from mid-September 2024.
Operating systems
Windows
GGML backends
Vulkan
Hardware
2x Nvidia RTX 3090s.
Models
Meta Llama 3.2 3B, 8-bit quant.
Problem description & steps to reproduce
When we run llama_decode on different contexts from different threads, we get a crash. The only way around this we have found is to strictly serialize access to llama_decode and LoRA loading with a mutex.
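For completeness, here is a stripped-down sketch of the pattern we use, not our actual code: the model/adapter paths are placeholders, and the llama_lora_adapter_* / llama_batch_get_one calls follow the mid-2024 C API, which has been renamed or changed in newer llama.cpp builds. With the mutex in place the program runs; if the lock is removed so that the two decodes can overlap, we get the vkQueueSubmit crash.

```cpp
// Simplified sketch of our setup and the mutex workaround.
// Assumptions: llama.cpp C API from a mid-2024 build (llama_lora_adapter_*,
// 4-argument llama_batch_get_one); names/signatures differ in newer builds,
// and the model/adapter paths are placeholders.
#include <mutex>
#include <thread>
#include <vector>

#include "llama.h"

// Process-wide lock: every llama_decode call and every LoRA activation goes
// through it. Without this lock, concurrent decodes on different contexts
// crash inside vkQueueSubmit on the Vulkan backend.
static std::mutex g_llama_mutex;

static void worker(llama_model * model, llama_lora_adapter * adapter) {
    llama_context_params cparams = llama_context_default_params();
    llama_context * ctx = llama_new_context_with_model(model, cparams);

    llama_token bos = llama_token_bos(model);
    llama_batch batch = llama_batch_get_one(&bos, 1, /*pos_0=*/0, /*seq_id=*/0);

    {
        std::lock_guard<std::mutex> lock(g_llama_mutex);

        // Some contexts run with a LoRA adapter, others with the base model,
        // so batching everything into one context is not an option for us.
        if (adapter) {
            llama_lora_adapter_set(ctx, adapter, 1.0f);
        } else {
            llama_lora_adapter_clear(ctx);
        }

        llama_decode(ctx, batch); // serialized -> stable; unlocked -> crash
    }

    llama_free(ctx);
}

int main() {
    llama_backend_init();

    llama_model_params mparams = llama_model_default_params();
    llama_model * model = llama_load_model_from_file("llama-3.2-3b-q8_0.gguf", mparams);
    llama_lora_adapter * adapter = llama_lora_adapter_init(model, "adapter.gguf");

    std::vector<std::thread> threads;
    threads.emplace_back(worker, model, adapter);  // context with LoRA
    threads.emplace_back(worker, model, nullptr);  // context without LoRA
    for (auto & t : threads) t.join();

    llama_lora_adapter_free(adapter);
    llama_free_model(model);
    llama_backend_free();
}
```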
First Bad Commit
No response
Relevant log output
It appears to be an error in vkQueueSubmit, line 1101.