CUDA: add set #14980
base: master
Part of #14909
Hi @JohannesGaessler, could you please review the changes when you have a chance? Thank you in advance!
One of my current goals is to consolidate and deduplicate the code around copying data in the CUDA backend. For that reason, I think it would be better to reuse the existing code here rather than add new kernels. If the operation is not inplace, you can use cudaMemcpyAsync to fill dst with the contents of src0. Afterwards, you can use ggml_cpy_flt_cuda in cpy.cu to do the copy. That kernel does not take an offset argument, but it doesn't need one: you can simply apply the offset in host code.
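The C++ sketch below illustrates the suggested approach. It runs on the CPU only and is not the actual CUDA code. `std::memcpy` stands in for `cudaMemcpyAsync`. The hypothetical `strided_copy_f32` stands in for a generic strided copy kernel such as `ggml_cpy_flt_cuda`, whose real signature is not reproduced here. The sketch assumes F32 data, a contiguous src1, and ggml-style byte strides `nb1`..`nb3` plus a byte `offset` describing the view of dst. The key point is that the offset is added to the destination pointer on the host, so the copy routine never needs an offset argument.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a generic strided F32 copy (the role the review
// assigns to ggml_cpy_flt_cuda). Note that it has no offset parameter.
static void strided_copy_f32(const float *src, char *dst_base,
                             const int64_t ne[4],     // elements per dim of src (contiguous)
                             const size_t dst_nb[4]) { // byte strides of the dst view
    for (int64_t i3 = 0; i3 < ne[3]; ++i3)
    for (int64_t i2 = 0; i2 < ne[2]; ++i2)
    for (int64_t i1 = 0; i1 < ne[1]; ++i1)
    for (int64_t i0 = 0; i0 < ne[0]; ++i0) {
        const float v = src[((i3 * ne[2] + i2) * ne[1] + i1) * ne[0] + i0];
        char *p = dst_base + i0 * dst_nb[0] + i1 * dst_nb[1]
                           + i2 * dst_nb[2] + i3 * dst_nb[3];
        std::memcpy(p, &v, sizeof(float));
    }
}

// Hypothetical host-side SET: dst = src0, then the view of dst starting at
// byte `offset` with strides nb1..nb3 is overwritten with src1.
static void set_f32(const float *src0, size_t src0_n, const float *src1,
                    const int64_t ne_src1[4], size_t nb1, size_t nb2, size_t nb3,
                    size_t offset, bool inplace, float *dst) {
    assert(offset % sizeof(float) == 0);
    if (!inplace) {
        // On the GPU this step would be a device-to-device cudaMemcpyAsync.
        std::memcpy(dst, src0, src0_n * sizeof(float));
    }
    const size_t dst_nb[4] = {sizeof(float), nb1, nb2, nb3};
    // Apply the offset to the base pointer here, in host code.
    strided_copy_f32(src1, reinterpret_cast<char *>(dst) + offset, ne_src1, dst_nb);
}
```

For example, writing a 2x2 block of ones into a 4x3 zero matrix at row 1, column 1 uses `nb1 = 4 * sizeof(float) = 16` and `offset = (1 * 4 + 1) * sizeof(float) = 20` bytes, which sets elements 5, 6, 9 and 10 of dst.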
I have already used
If I’m wrong, please correct me. Thanks for your help!
Hi @am17an, thanks again for your previous review. Since @JohannesGaessler hasn’t had a chance to respond in a few weeks, would it be possible to ask another maintainer or contributor to review this as well? I’d really appreciate any additional feedback to help move this forward.
Sorry, I forgot about this PR. The code in
Set both types to float.
Use the same shape twice.