Qualcomm AI Engine Direct - gpu support part1 #12165

Open · wants to merge 1 commit into main

Conversation

haowhsu-quic
Collaborator

Summary

  • rename folders in backends/qualcomm/runtime/backends
  • add gpu infra

Test plan

python backends/qualcomm/tests/test_qnn_delegate.py TestQNNFloatingPointOperator.test_qnn_backend_conv2d -b build-android/ -m SM8750 -s 5f396958 --online_prepare --backend gpu


pytorch-bot bot commented Jul 2, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/12165

Note: Links to docs will display an error until the docs builds have been completed.

❌ 3 New Failures, 1 Cancelled Job

As of commit f21b2b8 with merge base 929ec94:

NEW FAILURES - The following jobs have failed:

CANCELLED JOB - The following job was cancelled. Please retry:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jul 2, 2025
@haowhsu-quic
Copy link
Collaborator Author

@pytorchbot label "release notes: qualcomm"

@pytorch-bot pytorch-bot bot added the release notes: qualcomm Changes to the Qualcomm backend delegate label Jul 2, 2025
#include <executorch/backends/qualcomm/runtime/backends/htpbackend/HtpContext.h>
#include <executorch/backends/qualcomm/runtime/backends/htpbackend/HtpDevice.h>
#include <executorch/backends/qualcomm/runtime/backends/htpbackend/HtpGraph.h>
#include <executorch/backends/qualcomm/runtime/backends/gpu/GpuBackend.h>
Contributor

I'm slightly worried about the runtime size increase, since small binary size is usually a production requirement. Do we know how much the size increases with this PR? And if I have a model that runs on HTP only, can the runtime include only the HTP backend?

Collaborator Author

The libqnn_executorch_backend.so grows from 630984 to 652672 bytes. We'll deprecate a few files in the next PR, which should hopefully reduce the size further.
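For reference, the delta implied by the two figures above works out as follows (pure arithmetic on the quoted sizes; the actual library path would depend on your build tree):

```shell
# Size delta implied by the figures quoted above.
old=630984
new=652672
delta=$((new - old))
echo "delta: ${delta} bytes"   # prints: delta: 21688 bytes (~21 KiB)
```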

Contributor

What files will be deprecated in the next PR?

Collaborator Author

I think it will be aot/ir and runtime/backend/CustomProtocol*. Now that we've switched to the QNN IR backend (DLC) for the online-prepare path, the qcir format and the legacy code for multi-method compilation can be fully deprecated.
However, that would break backward compatibility, since we used to wrap preprocess results with the custom protocol. I'll probably let you decide when the right time is to apply the change.

Collaborator Author

Hi, I was wrong about the impact of deprecating those files. We still need to keep the custom protocol implementation for the multi-graph path to work.
The change is now in #12583 and will guarantee backward compatibility.

@cccclai
Contributor

cccclai commented Jul 11, 2025

Sorry, I need to spend a bit more time on this, because we don't have CI to test the pllm model and I'm worried this will cause breakage.

@haowhsu-quic
Collaborator Author

Sorry, I need to spend a bit more time on this, because we don't have CI to test the pllm model and I'm worried this will cause breakage.

No worries, I think the GA decoder models are way more important than this. This PR is mainly a proof of concept showing that we can extend the capability of the QNN backend.

@cccclai
Contributor

cccclai commented Jul 18, 2025

Can we prioritize running stories.pte as part of CI to prevent BC breakage? Otherwise it's hard to catch failures.

Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. release notes: qualcomm Changes to the Qualcomm backend delegate

3 participants