Add ahead-of-time compilation support to cuda builder #186
Comments
This post from NVIDIA provides more context. Today, CudaBuilder uses the nvvm backend to compile crates to PTX. The host then loads the PTX and JIT-compiles it through the driver API. Either the backend or CudaBuilder could pass the generated PTX to |
I think it would be more idiomatic to treat this as a different target or a feature of the target like |
Personally, I think it's just one stage of the Rust -> PTX -> fatbin pipeline. Maybe we should support a more complete pipeline and let the user decide how far along it the builder should build? |
I agree that this is more of a build pipeline option. Our current pipeline is disjointed, and users have to glue the PTX into their host binaries themselves. We should aim to get fatbins embedded into the final host binary, to match what nvcc does. That's different from what's being asked here, but it does get us a step closer. |
Sure, but I'm thinking about future integration in rustc... I actually think this maps pretty closely to |
Possibly useful techniques: https://github.com/calebzulawski/multiversion |
For large language model optimizations, a lot of kernels are specialized for a specific NVIDIA card, with the CPU selecting among them based on the user input and the card in use.
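To make the host-side selection concrete, here is a minimal sketch in plain Rust with no CUDA calls. Everything in it is an assumption for illustration: `KernelVariant`, `select_variant`, and the kernel names are hypothetical and not part of cuda_builder or any existing crate, and the compute capability is passed in as a plain `(major, minor)` pair rather than queried from the driver. The ordering rule is also a simplification; real compatibility rules between precompiled binaries and devices are stricter than "highest variant not exceeding the device".

```rust
/// Hypothetical description of one precompiled kernel variant.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct KernelVariant {
    /// Minimum compute capability (major, minor) the variant was built for.
    min_cc: (u32, u32),
    /// Illustrative name only; not a real kernel.
    name: &'static str,
}

/// Pick the variant with the highest `min_cc` that does not exceed the
/// device's compute capability. Tuples compare lexicographically, so
/// (8, 0) <= (8, 6) < (9, 0). Returns `None` when no variant qualifies.
fn select_variant(
    variants: &[KernelVariant],
    device_cc: (u32, u32),
) -> Option<KernelVariant> {
    variants
        .iter()
        .copied()
        .filter(|v| v.min_cc <= device_cc)
        .max_by_key(|v| v.min_cc)
}
```

With variants built for (7, 0), (8, 0), and (9, 0), a device reporting (8, 6) would get the (8, 0) variant, and a device reporting (6, 1) would get `None`, at which point a caller could fall back to JIT-compiling PTX.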
Right, that is why I said "techniques" rather than saying it is useful on its own 😁. I think it is most idiomatic to use |
As far as I understand, cuda_builder only supports JIT compilation. Supporting AOT compilation would help user adoption and improve performance for larger kernels. I'm not sure whether anything beyond changes to cuda_builder would be necessary. Thanks @jorge-ortega for helping me flesh out this idea a little more.