Scalarizer-checker #49
Comments
Looks like SPIR-V does have attributes for this; see the Uniform and UniformId decorations in the spec.
Bigger context for other folks reading this: This is a long-term language design goal. In particular, it may involve adding new features to the Rust language, and serious design work to figure out what exactly we want to do here. It likely fits into level 5 or level 6 of the roadmap: a feature that isn't critical for getting Rust on the GPU, but rather something that will make Rust shine on the GPU as a better language than GLSL and friends.
Restating the problem with my own understanding, as it might be helpful for others: You can classify GPU variables into two kinds: one that is uniform across all threads in a {device, workgroup, subgroup}, and one that is different for every thread. Telling the compiler which kind a variable is boosts performance, as some hardware has special registers for each kind. So, we need some way for the user to annotate the variables, check validity, etc.
I believe this is a specific case of that, specifically for the case of variables that are uniform over a single warp/wave.
If we use lingo similar to how CPU SIMD works (which is also how AMD GCN+ literature talks about this), we have a distinction between scalar (i.e. single value) and vector (i.e. SIMD, multiple value). Scalar values are consistent over a whole warp, whereas vector values change per warp "thread". CPU side this is like a regular f32 vs a SIMD f32x4, and these are stored in distinct register types and operated on by distinct ALUs... GPUs do much the same thing.
Scalar values can be used to efficiently compute uniform values, but importantly they are very useful in determining control flow for a warp. A conditional on a vector value will necessarily need to execute both sides of the conditional for all vector lanes/threads, which is often very wasteful. However, we can use a scalar register/comparison to change control flow for the entire warp and only have to execute one side of the conditional, as you'd expect.
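To make the cost difference concrete, here is a rough CPU-side model of a warp in plain Rust. It is only a sketch: `LANES`, `simt_branch` and the "passes" counter are made-up names for illustration, and the model assumes the warp issues a branch side whenever at least one lane needs it. Real hardware and compilers are more subtle than this.

```rust
/// Number of lanes in the simulated warp (an assumption; real warps/waves
/// are typically 32 or 64 lanes wide).
const LANES: usize = 8;

/// Simulates `if cond { then_fn(x) } else { else_fn(x) }` across a warp.
/// Returns the per-lane results and how many branch sides had to be issued.
fn simt_branch(
    cond: [bool; LANES],
    input: [i32; LANES],
    then_fn: fn(i32) -> i32,
    else_fn: fn(i32) -> i32,
) -> ([i32; LANES], usize) {
    let mut out = input;
    let mut passes = 0;

    // The "then" side is issued if at least one lane takes it;
    // lanes that do not take it are masked off.
    if cond.iter().any(|&c| c) {
        passes += 1;
        for lane in 0..LANES {
            if cond[lane] {
                out[lane] = then_fn(input[lane]);
            }
        }
    }

    // Same for the "else" side.
    if cond.iter().any(|&c| !c) {
        passes += 1;
        for lane in 0..LANES {
            if !cond[lane] {
                out[lane] = else_fn(input[lane]);
            }
        }
    }

    (out, passes)
}
```

With a condition that is the same in every lane (the scalar case), `passes` is 1. As soon as a single lane disagrees, the warp pays for both sides and `passes` is 2.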
As a bit of additional input here, since I forgot to put it on before, there are optimisations which some vendors can take advantage of when knowing values are uniform at other scopes too, e.g. uniform across the device, across a workgroup, across a triangle... SGPR/VGPR distinctions as on AMD hardware are but the tip of the iceberg. Being able to know/validate the scope of uniformity at compile time (and validate at runtime when necessary) would be incredibly useful. The analogy to the Rust borrow checker is quite apt, I think. I suspect this falls into a similar design space as, say, value range checking, so it might run into some of the same design issues discussed on this long-running issue: rust-lang/rfcs#671, albeit applied differently.
This sort of compiler tech is way too far in the future (at least a year, probably two, if we're actively working on chugging through compiler issues) and not on our roadmap of use cases, so I don't think it's useful to track this work right now.
Much like the borrow checker checks lifetimes, I think it might be important for a modern, safe GPU language to be able to check and verify access to scalar registers and to validate scalar control flow.
Right now the concept of scalars isn't exposed in SPIR-V as a first-class citizen (and it might never be); however, experiments in ISPC and discussions with IHVs have shown that this is indeed an issue. Doing this properly can mean both a performance win (more scalarized code is typically better) and a safety win (e.g. preventing writes to vector registers from scalar branches would be a simple example).
https://ispc.github.io/ispc.html#uniform-and-varying-qualifiers
https://ispc.github.io/ispc.html#uniform-control-flow
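As a rough illustration of what such a check could look like at the type level, here is a sketch in plain Rust, loosely inspired by ISPC's uniform/varying qualifiers. Everything in it (`Uniform`, `Varying`, `broadcast`, `any`, `all`, `branch_uniform`) is hypothetical and not an existing rust-gpu or SPIR-V API. It also assumes lanes can be modelled as a fixed-size array, which is only a stand-in for real per-lane registers.

```rust
/// A value known to be identical in every lane of a warp (SGPR-like).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Uniform<T>(T);

/// A value that may differ per lane (VGPR-like), modelled as one slot per lane.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Varying<T, const N: usize>([T; N]);

impl<T: Copy> Uniform<T> {
    /// Going from scalar to vector is always safe: copy the value to each lane.
    fn broadcast<const N: usize>(self) -> Varying<T, N> {
        Varying([self.0; N])
    }
}

impl<const N: usize> Varying<bool, N> {
    /// A cross-lane reduction is the only way back to a uniform value.
    fn any(self) -> Uniform<bool> {
        Uniform(self.0.iter().any(|&b| b))
    }

    /// True only if every lane is true.
    fn all(self) -> Uniform<bool> {
        Uniform(self.0.iter().all(|&b| b))
    }
}

/// Scalar control flow: only a `Uniform<bool>` is accepted, so passing a
/// `Varying` condition is rejected by the type checker.
fn branch_uniform<R>(
    cond: Uniform<bool>,
    then_fn: impl FnOnce() -> R,
    else_fn: impl FnOnce() -> R,
) -> R {
    if cond.0 {
        then_fn()
    } else {
        else_fn()
    }
}
```

In this sketch, `branch_uniform(mask, ...)` with `mask: Varying<bool, 4>` fails to compile, while `branch_uniform(mask.all(), ...)` is fine. A real checker would presumably also need to track the scope of uniformity and what is written inside each branch, which wrapper types alone do not capture.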
To be clear, this isn't about the distinction between float and Vec4; this is about SGPR vs VGPR.
Examples include GCN/RDNA's s_ namespaced registers and instructions (as opposed to the v_ ones: https://developer.amd.com/wp-content/resources/RDNA_Shader_ISA.pdf), and NVIDIA's URX vs RX registers in Turing: https://docs.nvidia.com/cuda/cuda-binary-utilities/index.html