Name and Version
This appears to be the same bug as noted in this issue: #7575
We are trying to do inference from multiple threads, with some contexts having LoRAs loaded and others not (so batched inference isn't going to work). If I may ask, has there been any progress on this issue? We are currently using a build from mid-September 2024.
Operating systems
Windows
GGML backends
Vulkan
Hardware
2x Nvidia RTX 3090s.
Models
Meta Llama 3.2 3B, 8-bit quant.
Problem description & steps to reproduce
When we run llama_decode on different contexts from different threads, we get a crash. The only way around this we have found is to strictly serialize access to llama_decode and LoRA loading with a mutex.
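For completeness, here is a stripped-down sketch of the pattern we use, not our actual code: the model/adapter paths are placeholders, and the llama_lora_adapter_* / llama_batch_get_one calls follow the mid-2024 C API, which has been renamed or changed in newer llama.cpp builds. With the mutex in place the program runs; if the lock is removed so that the two decodes can overlap, we get the vkQueueSubmit crash.

```cpp
// Simplified sketch of our setup and the mutex workaround.
// Assumptions: llama.cpp C API from a mid-2024 build (llama_lora_adapter_*,
// 4-argument llama_batch_get_one); names/signatures differ in newer builds,
// and the model/adapter paths are placeholders.
#include <mutex>
#include <thread>
#include <vector>

#include "llama.h"

// Process-wide lock: every llama_decode call and every LoRA activation goes
// through it. Without this lock, concurrent decodes on different contexts
// crash inside vkQueueSubmit on the Vulkan backend.
static std::mutex g_llama_mutex;

static void worker(llama_model * model, llama_lora_adapter * adapter) {
    llama_context_params cparams = llama_context_default_params();
    llama_context * ctx = llama_new_context_with_model(model, cparams);

    llama_token bos = llama_token_bos(model);
    llama_batch batch = llama_batch_get_one(&bos, 1, /*pos_0=*/0, /*seq_id=*/0);

    {
        std::lock_guard<std::mutex> lock(g_llama_mutex);

        // Some contexts run with a LoRA adapter, others with the base model,
        // so batching everything into one context is not an option for us.
        if (adapter) {
            llama_lora_adapter_set(ctx, adapter, 1.0f);
        } else {
            llama_lora_adapter_clear(ctx);
        }

        llama_decode(ctx, batch); // serialized -> stable; unlocked -> crash
    }

    llama_free(ctx);
}

int main() {
    llama_backend_init();

    llama_model_params mparams = llama_model_default_params();
    llama_model * model = llama_load_model_from_file("llama-3.2-3b-q8_0.gguf", mparams);
    llama_lora_adapter * adapter = llama_lora_adapter_init(model, "adapter.gguf");

    std::vector<std::thread> threads;
    threads.emplace_back(worker, model, adapter);  // context with LoRA
    threads.emplace_back(worker, model, nullptr);  // context without LoRA
    for (auto & t : threads) t.join();

    llama_lora_adapter_free(adapter);
    llama_free_model(model);
    llama_backend_free();
}
```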
First Bad Commit
No response
Relevant log output
It appears to be an error in vkQueueSubmit, line 1101.