webgpu : fix build on emscripten #15826
base: master
@@ -148,3 +148,6 @@ poetry.toml
 /run-vim.sh
 /run-chat.sh
 .ccache/
+
+# emscripten
+a.out.*
Why not just a.out*?
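To make the difference between the two patterns concrete, here is a minimal sketch using POSIX fnmatch. It assumes that gitignore matching behaves like plain shell globbing for these simple patterns, which is an approximation and not git's actual matcher. The helper name glob_matches is hypothetical. The file names a.out.js and a.out.wasm are emcc's default outputs; a.outline is a made-up name for illustration.

```c
#include <fnmatch.h>
#include <stdbool.h>

/* Hypothetical helper: returns true when `name` matches the shell-style
   glob `pattern`. fnmatch() returns 0 on a match. */
bool glob_matches(const char *pattern, const char *name) {
    return fnmatch(pattern, name, 0) == 0;
}
```

Under this approximation, a.out.* matches a.out.js and a.out.wasm but not a bare a.out, while a.out* also matches a.out and any unrelated file whose name merely starts with a.out (for example a.outline). Which behavior is preferable depends on whether a native a.out should be ignored as well.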
@ggerganov @slaren Quick question, I'm building So I'm wondering, is there any way to completely disable the threadpool? Edit: I'm referring to this code: llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c Lines 3124 to 3130 in c4df49a
I think a threadpool is currently required; I don't think there is an easy workaround. @max-krasnyansky any thoughts?
Hmm ok, that means both wllama and whisper.cpp single-thread wasm builds are currently broken. Having single-thread support would be nice, but it's not urgent.
Yes, we should support launching a single-thread compute without invoking synchronization primitives or spawning threads, so that thread-less WASM works. It shouldn't be hard to implement. Looking at the implementation, I think almost everything is in place for that. Where does the single-thread WASM fail when you call ggml compute with
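The idea of a single-thread compute that never touches thread creation can be sketched roughly as follows. This is not ggml's code: every name here (sketch_task_fn, sketch_spawn_fn, sketch_compute, sketch_count_task) is hypothetical, and the spawn hook stands in for whatever platform thread API a real build would use. A thread-less build is assumed to pass NULL for it.

```c
#include <stddef.h>

/* Hypothetical task signature: ith = worker index, nth = worker count. */
typedef void (*sketch_task_fn)(void *data, int ith, int nth);

/* Hypothetical platform hook that starts one worker thread.
   Returns 0 on success. A thread-less build would pass NULL. */
typedef int (*sketch_spawn_fn)(sketch_task_fn task, void *data, int ith, int nth);

/* Runs `task` with n_threads workers. Returns the number of threads
   spawned, or -1 on failure. Joining the workers is omitted here. */
int sketch_compute(sketch_task_fn task, void *data, int n_threads,
                   sketch_spawn_fn spawn) {
    if (n_threads <= 1) {
        /* Fast path: run inline on the caller, with no thread creation
           and no synchronization primitives. */
        task(data, 0, 1);
        return 0;
    }
    if (spawn == NULL) {
        return -1; /* threads requested but unavailable in this build */
    }
    int spawned = 0;
    for (int j = 1; j < n_threads; j++) {
        if (spawn(task, data, j, n_threads) != 0) {
            return -1;
        }
        spawned++;
    }
    task(data, 0, n_threads); /* the caller acts as worker 0 */
    return spawned;
}

/* Example task for illustration: counts how often it was invoked. */
void sketch_count_task(void *data, int ith, int nth) {
    (void) ith;
    (void) nth;
    *(int *) data += 1;
}
```

With n_threads set to 1 the task runs exactly once on the calling thread and the spawn hook is never consulted, which is the property a thread-less WASM build would rely on.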
It currently fails at this line: llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c Line 3091 in c4df49a
I don't have the stack trace due to some difficulty debugging in-browser, but it's very likely invoked by this line: llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c Line 3129 in c4df49a
Where we try to create a threadpool with a single thread.
Also just want to note that atomic ops like
To enter that loop, it would mean that
    for (int j = 1; j < tpp->n_threads; j++) {
        ggml_thread_cpumask_next(tpp->cpumask, workers[j].cpumask, tpp->strict_cpu, &cpumask_iter);
        int32_t rc = ggml_thread_create(&workers[j].thrd, NULL, ggml_graph_compute_secondary_thread, &workers[j]);
        GGML_ASSERT(rc == 0);
    }
Could the calling program be using
Yeah you're right, the
Ref original webgpu PR: #14978
Example command:
# install emscripten: brew install emscripten
emcmake cmake -B build-wasm -DGGML_WEBGPU=ON -DLLAMA_CURL=OFF -DGGML_WEBGPU_DEBUG=ON
cmake --build build-wasm --target test-backend-ops