Scalarizer-checker #49

Jasper-Bekkers · 2020-10-12T13:18:28Z

Much like the borrow checker checks lifetimes, I think it might be important for a modern, safe GPU language to be able to check and verify access to scalar registers and to validate scalar control flow.

Right now the concept of scalars isn't exposed in SPIR-V as a first-class citizen (and it might never be); however, experiments in ISPC and discussions with IHVs have shown that this is indeed an issue. Doing this properly can mean both a performance win (more scalarized code is typically better) and a safety win (a simple example would be preventing writes to vector registers from scalar branches).

https://ispc.github.io/ispc.html#uniform-and-varying-qualifiers
https://ispc.github.io/ispc.html#uniform-control-flow

To be clear, this isn't about the distinction between float and Vec4; this is about sgpr vs vgpr.

Examples include GCN/RDNA's s_ namespaced registers and instructions (as opposed to the v_ ones; see https://developer.amd.com/wp-content/resources/RDNA_Shader_ISA.pdf) and NVIDIA's URX vs RX registers in Turing: https://docs.nvidia.com/cuda/cuda-binary-utilities/index.html

khyperia · 2020-10-15T07:33:53Z

Looks like SPIR-V does have attributes for this; see the Uniform and UniformId decorations in the spec.

khyperia · 2020-10-16T11:18:35Z

Bigger context for other folks reading this: This is a long-term language design goal. In particular, it may involve adding new features to the rust language, and serious design work to figure out what exactly we want to do here. It likely fits into level 5 or level 6 of the roadmap - a feature that isn't critical for getting rust on the GPU, but rather something that will make rust shine on the GPU as a better language than glsl and friends.

Restating the problem with my own understanding, as it might be helpful for others:

You can classify GPU variables into two kinds: one that is uniform across all threads in a {device, workgroup, subgroup}, and one that is different for every thread. Telling the compiler which one a variable is boosts performance, as some hardware has special registers for each kind. So, we need some way for the user to annotate the variables, check validity, etc.
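To make the two kinds concrete, here is a minimal CPU-side sketch of what such annotations could look like as types. Everything in it is hypothetical: `Uniform`, `Varying`, `LANES`, and the method names are made up for illustration and are not part of rust-gpu or SPIR-V. It only models a single subgroup of four lanes.

```rust
/// Number of lanes in the simulated subgroup (assumption; real hardware
/// typically uses 32 or 64).
pub const LANES: usize = 4;

/// One value shared by every lane (would live in a scalar register).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Uniform<T>(T);

/// One value per lane (would live in a vector register).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Varying<T>([T; LANES]);

impl<T: Copy> Uniform<T> {
    pub fn new(value: T) -> Self {
        Uniform(value)
    }

    pub fn get(self) -> T {
        self.0
    }

    /// Widening is always valid: copy the single value into every lane.
    pub fn broadcast(self) -> Varying<T> {
        Varying([self.0; LANES])
    }
}

impl<T: Copy + PartialEq> Varying<T> {
    pub fn from_lanes(lanes: [T; LANES]) -> Self {
        Varying(lanes)
    }

    pub fn lanes(self) -> [T; LANES] {
        self.0
    }

    /// Narrowing needs a check: it succeeds only if all lanes agree.
    pub fn try_uniform(self) -> Option<Uniform<T>> {
        let first = self.0[0];
        if self.0.iter().all(|&lane| lane == first) {
            Some(Uniform(first))
        } else {
            None
        }
    }
}
```

In this sketch, going from `Uniform` to `Varying` is free and always allowed, while the other direction is a runtime check. A real checker would presumably try to prove that direction at compile time instead.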

fu5ha · 2020-10-20T23:24:26Z

> You can classify GPU variables into two kinds: one that is uniform across all threads in a {device, workgroup, subgroup}, and one that is different for every thread. Telling the compiler which one a variable is boosts performance, as some hardware has special registers for each kind. So, we need some way for the user to annotate the variables, check validity, etc.

I believe this is a specific case of that, namely variables that are uniform over a single warp/wave. If we use lingo similar to how CPU SIMD works (which is also how AMD GCN+ literature talks about this), we have a distinction between scalar (i.e. single value) and vector (i.e. SIMD, multiple value). Scalar values are consistent over a whole warp, whereas vector values change per warp "thread". On the CPU side this is like a regular f32 vs a SIMD f32x4, which are stored in distinct register types and operated on by distinct ALUs... GPUs do much the same thing. Scalar values can be used to efficiently compute uniform values, but importantly they are very useful in determining control flow for a warp. A conditional on a vector value will generally need to execute both sides of the conditional (with lanes masked off) whenever the lanes disagree, which is often very wasteful. However, we can use a scalar register/comparison to change control flow for the entire warp and only have to execute one side of the conditional, as you'd expect.
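The control flow cost can be illustrated with a small CPU-side simulation. This is only a sketch under a simple masked-execution model; `LANES`, `branch_varying`, and `branch_uniform` are hypothetical names, and real hardware behaviour differs per vendor. Each function returns the per-lane results together with how many branch bodies the whole warp had to step through.

```rust
/// Number of lanes in the simulated warp (assumption for illustration).
const LANES: usize = 4;

/// Branch on a per-lane (vector) condition. Each side runs under a lane
/// mask, so when the lanes disagree the warp pays for both bodies.
fn branch_varying(cond: [bool; LANES], x: [i32; LANES]) -> ([i32; LANES], u32) {
    let mut out = [0; LANES];
    let mut bodies_run = 0;

    // "then" side: executed if at least one lane takes it.
    if cond.iter().any(|&c| c) {
        bodies_run += 1;
        for i in 0..LANES {
            if cond[i] {
                out[i] = x[i] * 2;
            }
        }
    }

    // "else" side: executed if at least one lane takes it.
    if cond.iter().any(|&c| !c) {
        bodies_run += 1;
        for i in 0..LANES {
            if !cond[i] {
                out[i] = x[i] + 100;
            }
        }
    }

    (out, bodies_run)
}

/// Branch on a warp-wide (scalar) condition. The whole warp jumps
/// together, so exactly one body is executed.
fn branch_uniform(cond: bool, x: [i32; LANES]) -> ([i32; LANES], u32) {
    let mut out = [0; LANES];
    for i in 0..LANES {
        out[i] = if cond { x[i] * 2 } else { x[i] + 100 };
    }
    (out, 1)
}
```

With a divergent condition such as `[true, false, true, false]`, `branch_varying` reports two bodies executed, while `branch_uniform` always reports one.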

Tobski · 2020-10-22T15:22:00Z

As a bit of additional input here, since I forgot to put it on before, there are optimisations which some vendors can take advantage of when knowing values are uniform at other scopes too - e.g. uniform across the device, across a workgroup, across a triangle...

SGPR/VGPR distinctions as on AMD hardware are but the tip of the iceberg. Being able to know/validate the scope of uniformity at compile time (and validate at runtime when necessary) would be incredibly useful. The analogy to the rust borrow checker is quite apt I think.
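One way to picture "scope of uniformity" is as an ordered set of scopes, where combining two values gives a result that is only uniform at the narrower of the two. The following is a rough sketch, not a proposed design: the `Scope` enum, `combine`, and `usable_as` are hypothetical, and scopes such as "per triangle" are left out because they do not fit a simple linear order.

```rust
/// Scopes ordered from "uniform over the most threads" to "differs per thread".
/// The derived `Ord` follows declaration order, so `Device < Invocation`.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Scope {
    Device,
    Workgroup,
    Subgroup,
    Invocation,
}

/// The result of an operation on two inputs is only as uniform as its
/// least uniform input, i.e. the larger of the two scopes in this order.
fn combine(a: Scope, b: Scope) -> Scope {
    a.max(b)
}

/// A value uniform at `actual` may be used where uniformity at `required`
/// is demanded, as long as `actual` is at least as wide.
fn usable_as(actual: Scope, required: Scope) -> bool {
    actual <= required
}
```

For example, adding a device-uniform constant to a subgroup-uniform value yields something that is only subgroup-uniform, and a per-invocation value can never satisfy a subgroup-uniform requirement.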

I suspect this falls into a similar design space as say value range checking, so might run into some of the same design issues discussed on this long running issue: rust-lang/rfcs#671 - albeit applied differently.

khyperia · 2021-04-01T13:15:00Z

This sort of compiler tech is way too far in the future (at least a year, probably two, if we're actively working on chugging through compiler issues) and not on our roadmap of use cases, so I don't think it's useful to track this work right now.

Jasper-Bekkers self-assigned this Oct 14, 2020

khyperia added the t: design Design of our rust-gpu language and std label Oct 16, 2020

khyperia mentioned this issue Oct 20, 2020

LDS access proposal #29

Closed

khyperia closed this as completed Apr 1, 2021

Jasper-Bekkers mentioned this issue Dec 11, 2021

NonUniform decoration support #756

Open
