@l3utterfly
Contributor

@l3utterfly l3utterfly commented Nov 4, 2025

Older SoCs such as the SD8 Gen2 do not seem to support obtaining session URIs via FASTRPC_GET_URI. They also do not support the newer rpcmem_alloc2 and should use rpcmem_alloc instead.

This PR adds graceful fallback for older SoCs:

  1. Load rpcmem_alloc2 dynamically instead of linking it statically, and fall back to rpcmem_alloc if rpcmem_alloc2 cannot be found.
  2. Add a fallback for older SoCs that do not support FASTRPC_GET_URI: use a single session URI.
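
As a rough illustration of the first fallback, here is a minimal sketch of looking up a newer allocator at runtime and falling back to a legacy one. It is not the code from this PR: `legacy_alloc`, `lookup_alloc2` and `alloc_with_fallback` are hypothetical names, `legacy_alloc` is a `malloc`-backed stand-in for rpcmem_alloc so the sketch runs on any POSIX system, and the rpcmem_alloc2 signature (size passed as `size_t`) is an assumption.

```c
#include <assert.h>
#include <dlfcn.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed signature of rpcmem_alloc2: like rpcmem_alloc, but with a size_t size. */
typedef void *(*rpcmem_alloc2_fn)(int heapid, uint32_t flags, size_t size);

/* Hypothetical stand-in for the legacy rpcmem_alloc (int size). */
static void *legacy_alloc(int heapid, uint32_t flags, int size) {
    (void) heapid;
    (void) flags;
    return malloc((size_t) size);
}

/* Look up `sym` in `lib`. Returns NULL if the library or the symbol is missing.
 * The handle is deliberately left open so the returned pointer stays valid. */
static rpcmem_alloc2_fn lookup_alloc2(const char *lib, const char *sym) {
    void *handle = dlopen(lib, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        return NULL;
    }
    return (rpcmem_alloc2_fn) dlsym(handle, sym);
}

/* Prefer the newer allocator when it was found, otherwise use the legacy one. */
static void *alloc_with_fallback(rpcmem_alloc2_fn alloc2, int heapid, uint32_t flags, size_t size) {
    if (alloc2 != NULL) {
        return alloc2(heapid, flags, size);
    }
    assert(size <= INT_MAX); /* the legacy entry point only takes an int size */
    return legacy_alloc(heapid, flags, (int) size);
}
```

On a device, the lookup would be attempted once at backend start-up with the real library and symbol names, and the resulting pointer cached for later allocations.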

Previously, ggml-hexagon quit when running on an SD8 Gen2, which supports neither rpcmem_alloc2 nor multiple sessions.

This PR has been tested on consumer S23 Ultra (SD8 Gen2) and S25+ Ultra (SD8 Gen4) devices.

@max-krasnyansky I would appreciate your direction on whether this PR is the right way to provide a graceful fallback for older SoCs. Some open questions:

  1. The libcdsprpc.so library name is hard-coded. Is this a problem?
  2. Results are undefined if you enable multiple devices (i.e. HTP0, HTP1) on Gen2: all the opened session URIs are the same, but ggml-hexagon still thinks multiple sessions are open. Should it be the end user's responsibility not to open multiple sessions on devices that do not support them?
  3. The function pointer type for rpcmem_alloc[2] is hard-coded and declared static. dlopen and dlclose happen in ggml_hexagon_registry, which, from my understanding, is guaranteed to initialise only once.
  4. I do not have a Windows Snapdragon device to test on, but in theory porting this to Windows should be as simple as conditionally compiling the dlopen and dlclose part to use the .dll instead.

EDIT: thanks to @chraac's suggestion here: #16911 (comment), I updated the PR to use weak symbols instead of dynamically loading libcdsprpc.so, which is a much more elegant solution.
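
For readers unfamiliar with the weak-symbol technique, a minimal sketch follows. It assumes GCC or Clang on an ELF platform (Linux/Android), where an undefined weak function resolves to a null address at link time. All names (`demo_rpcmem_alloc2`, `demo_legacy_alloc`, `demo_alloc`) are hypothetical, and the legacy allocator is a `malloc`-backed stand-in, so this is not the code from the PR.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Weak declaration under a hypothetical name. If no linked library defines
 * this symbol, its address is NULL instead of the link failing. */
__attribute__((weak)) void *demo_rpcmem_alloc2(int heapid, uint32_t flags, size_t size);

/* Hypothetical stand-in for the legacy allocator (int size). */
static void *demo_legacy_alloc(int heapid, uint32_t flags, int size) {
    (void) heapid;
    (void) flags;
    return malloc((size_t) size);
}

/* Returns 1 when the newer allocator is present in the linked libraries. */
static int demo_have_alloc2(void) {
    return demo_rpcmem_alloc2 != NULL;
}

/* Call the newer allocator when it exists, otherwise the legacy one. */
static void *demo_alloc(int heapid, uint32_t flags, size_t size) {
    if (demo_rpcmem_alloc2 != NULL) {
        return demo_rpcmem_alloc2(heapid, flags, size);
    }
    assert(size <= INT_MAX); /* the legacy entry point only takes an int size */
    return demo_legacy_alloc(heapid, flags, (int) size);
}
```

Compared with an explicit dlopen, this needs no library path and no handle management, but it relies on the dynamic linker finding the library on its own.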

@max-krasnyansky
Collaborator

@l3utterfly @chraac
Sorry for the delayed reply. Thanks for the updates!

Let's move that weak symbol define into htp-utils.h. There are already some other weak syms defined in there.

I was hoping to avoid having to deal with dlopen. The weak symbol version is probably better for now.
However, on Windows cdsprpc.dll is not in the system path and is kind of buried in a driver-specific path.
We're going to need to explicitly dlopen it after all. There are some existing wrappers for dlopen and LoadSharedLibrary calls in ggml/src/ggml-backend.c that we can reuse.
Also, we're going to need to look up a bunch of symbols explicitly (i.e. all the other rpcmem and dspqueue calls).
Anyway, no need to worry about that now. This is just a heads-up :)

Regarding the multi-session support: it'd be better not to create multiple GGML devices on systems that do not support multi-session. We can add some simple logic to htp-utils to detect that and force ndev = 1. I believe we can assume that v75 and above on Android/Linux support multi-session; on Windows it'd be v73 and above.
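
A minimal sketch of the check described above, using the thresholds stated in this comment (v75 on Android/Linux, v73 on Windows). The enum and function names are hypothetical and the arch value is assumed to be available as a plain integer such as 73 or 75. The actual logic in htp-utils may look different.

```c
#include <assert.h>

/* Hypothetical platform selector for the sketch. */
enum demo_platform { DEMO_ANDROID_LINUX, DEMO_WINDOWS };

/* Lowest Hexagon arch version assumed to support multiple sessions. */
static int demo_min_multi_session_arch(enum demo_platform platform) {
    return platform == DEMO_WINDOWS ? 73 : 75;
}

/* Clamp the requested device count: anything below the threshold gets one device. */
static int demo_clamp_ndev(int requested, int arch, enum demo_platform platform) {
    assert(arch > 0);
    if (requested < 1) {
        requested = 1;
    }
    return arch >= demo_min_multi_session_arch(platform) ? requested : 1;
}
```

With this, a request for two devices on a v73 Android device yields one device, while the same request on v75 is left unchanged.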

@chraac
Contributor

chraac commented Nov 5, 2025

> Regarding the multi-session support: it'd be better not to create multiple GGML devices on systems that do not support multi-session. We can add some simple logic to htp-utils to detect that and force ndev = 1. I believe we can assume that v75 and above on Android/Linux support multi-session; on Windows it'd be v73 and above.

Sounds like we could create a dev caps array that stores whether multi-session is supported, and then use opt_arch as the index.
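
One way such a capability table could look, as a minimal sketch: the struct, the table entries and the function name are hypothetical, and the entries only mirror the Android/Linux assumption from this thread (v75 and above). It looks the arch up by value rather than using it as a direct array index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capability record for one Hexagon arch version. */
struct demo_dev_caps {
    int  arch;
    bool multi_session;
};

/* Illustrative entries only (Android/Linux assumption from this thread). */
static const struct demo_dev_caps demo_caps[] = {
    { 73, false },
    { 75, true  },
    { 79, true  },
};

/* Look up multi-session support; unknown versions are treated conservatively. */
static bool demo_supports_multi_session(int arch) {
    assert(arch > 0);
    for (size_t i = 0; i < sizeof(demo_caps) / sizeof(demo_caps[0]); i++) {
        if (demo_caps[i].arch == arch) {
            return demo_caps[i].multi_session;
        }
    }
    return false;
}
```

The trade-off is that every new arch version needs a new table entry, otherwise it falls through to the conservative default.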

@max-krasnyansky
Collaborator

> > Regarding the multi-session support: it'd be better not to create multiple GGML devices on systems that do not support multi-session. We can add some simple logic to htp-utils to detect that and force ndev = 1. I believe we can assume that v75 and above on Android/Linux support multi-session; on Windows it'd be v73 and above.
>
> Sounds like we could create a dev caps array that stores whether multi-session is supported, and then use opt_arch as the index.

Probably overkill, i.e. we'd need to keep adding versions as they come out.
A simple '>= v75' check (on Android/Linux) should be sufficient.

Removed weak declaration for rpcmem_alloc2.
Force ndev to 1 for SoC architectures lower than v75.
@l3utterfly
Contributor Author

@max-krasnyansky updated PR according to your suggestions!

@l3utterfly l3utterfly marked this pull request as ready for review November 6, 2025 03:05
@max-krasnyansky max-krasnyansky merged commit 6db3d1f into ggml-org:master Nov 6, 2025
67 of 71 checks passed