Eval bug: Error running multiple contexts from multiple threads at the same time with Vulkan #11371

Open
charlesrwest opened this issue on Jan 23, 2025

Name and Version

This appears to be the same bug as reported in #7575.

We are trying to run inference from multiple threads, with some contexts having LoRA adapters loaded and others not (so batched inference is not an option). If I may ask, has there been any progress on this issue? We are currently using a build from mid-September 2024.
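
To make the setup concrete, here is a minimal sketch of what we are doing (not our actual code; the model and adapter paths are placeholders, and the LoRA function names follow the September 2024 C API we build against, which may have been renamed in newer revisions):

```cpp
// Sketch: one shared model, two contexts, only one of which has a LoRA
// adapter applied -- which is why the contexts cannot share a batch.
#include "llama.h"

int main() {
    llama_backend_init();

    llama_model * model = llama_load_model_from_file(
        "llama-3.2-3b-q8_0.gguf",            // placeholder path
        llama_model_default_params());

    llama_context_params cparams = llama_context_default_params();
    llama_context * ctx_plain = llama_new_context_with_model(model, cparams);
    llama_context * ctx_lora  = llama_new_context_with_model(model, cparams);

    // only the second context gets the adapter
    llama_lora_adapter * adapter =
        llama_lora_adapter_init(model, "adapter.gguf"); // placeholder path
    llama_lora_adapter_set(ctx_lora, adapter, 1.0f);

    // ... each context is then driven by its own worker thread ...

    llama_free(ctx_lora);
    llama_free(ctx_plain);
    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```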

Operating systems

Windows

GGML backends

Vulkan

Hardware

2× NVIDIA RTX 3090.

Models

Meta Llama 3.2 3B, 8-bit quantization.

Problem description & steps to reproduce

When we run llama_decode with different contexts from different threads, we get a crash. The only workaround we have found is to strictly serialize access to llama_decode and LoRA loading with a mutex.
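
The workaround we are currently using looks roughly like this (a sketch, not our exact code): a single process-wide mutex wrapped around every llama_decode call, and the same mutex taken around LoRA loading:

```cpp
// Sketch of the workaround: one global mutex serializes all llama_decode
// calls (and LoRA loading), so only one thread enters the backend at a time.
#include <mutex>
#include "llama.h"

static std::mutex g_llama_mutex;

// Called concurrently from several worker threads, each with its own context.
int32_t guarded_decode(llama_context * ctx, llama_batch batch) {
    std::lock_guard<std::mutex> lock(g_llama_mutex);
    return llama_decode(ctx, batch);
}
```

This keeps the process stable, but at the cost of serializing all inference, so the two GPUs are never decoding at the same time.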

First Bad Commit

No response

Relevant log output

It appears to be an error in vkQueueSubmit, line 1101.