[SYCL][matrix] Update the query interface with the latest joint matrix approved syntax #11004

Merged 13 commits on Sep 21, 2023
@@ -741,12 +741,12 @@ descriptors that can be queried using `get_info` API.
[frame="none",options="header"]
|======================
| Device descriptors | Return type| Description
|`ext::oneapi::experimental::info::device::matrix::combinations` |
|`ext::oneapi::experimental::info::device::matrix_combinations` |
`std::vector<combination>`| tells the set of supported matrix sizes
and types on this device
|======================

The runtime query returns a vector of `combinations` of `combination`
The runtime query returns a vector of `matrix_combinations` of `combination`
type. Each combination includes the sizes and the types for the
matrices A, B, C, and D. Note that for each matrix hardware,
the query returns `max_msize, max_nsize, max_ksize` or `msize, nsize,
@@ -790,7 +790,7 @@ struct combination {
} // namespace sycl::ext::oneapi::experimental::matrix
```

Each combination of the `combinations` vector composes the types and
Each combination of the `matrix_combinations` vector composes the types and
sizes of A, B, C, and D matrices supported by the device
implementation. The table below provides a description of each member
of the `combination` struct.
@@ -832,7 +832,7 @@ the `T` template parameter as follows: +
```c++
// Ta, Tb, Tc, and Td are the types used in applications
std::vector<combination> combinations =
device.get_info<info::device::matrix::combinations>();
device.get_info<info::device::matrix_combinations>();
for (size_t i = 0; i < combinations.size(); i++) {
if (Ta == combinations[i].atype &&
Tb == combinations[i].btype &&
@@ -849,7 +849,7 @@ for (int i = 0; sizeof(combinations); i++) {
The table below provides a list of the combinations that
`joint_matrix` implementations support on each of Intel AMX and Intel
XMX hardware. Note that these can be returned using
`ext::oneapi::experimental::info::device::matrix::combinations`.
`ext::oneapi::experimental::info::device::matrix_combinations`.
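
As a rough illustration of how the returned vector might be searched, the sketch below scans a list of combinations for a matching set of element types. It is deliberately independent of SYCL: `matrix_type` and `combination` here are local stand-ins (assumptions) carrying only the members named in this text, and `find_combination` is a hypothetical helper, not part of the extension. In a real program the vector would come from `device.get_info<info::device::matrix_combinations>()`.

```cpp
#include <cstddef>
#include <vector>

// Local stand-ins for illustration only; the real types live in
// sycl::ext::oneapi::experimental::matrix and may have more enumerators.
enum class matrix_type { bf16, fp16, fp32, sint8, sint32, uint8 };

struct combination {
  std::size_t max_msize, max_nsize, max_ksize;
  std::size_t msize, nsize, ksize;
  matrix_type atype, btype, ctype, dtype;
};

// Hypothetical helper: returns a pointer to the first entry whose A, B, C,
// and D types all match, or nullptr when the list has no such entry.
const combination* find_combination(
    const std::vector<combination>& combinations,
    matrix_type ta, matrix_type tb, matrix_type tc, matrix_type td) {
  for (std::size_t i = 0; i < combinations.size(); ++i) {
    const combination& c = combinations[i];
    if (c.atype == ta && c.btype == tb && c.ctype == tc && c.dtype == td)
      return &c;
  }
  return nullptr;
}
```

The returned pointer refers into the vector passed in, so the vector has to outlive any use of the result. Once a match is found, its size members can be compared against the shape the application intends to use.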

==== Intel AMX Supported Combinations
This is currently available in devices with the architecture
@@ -863,44 +863,52 @@ table below.
| A type | B type | C and D type | M | N | K
| `matrix_type::uint8` | `matrix_type::uint8` |
`matrix_type::sint32` | +<=+ 16 | +<=+ 16 | +<=+ 64
| `matrix_type::uint8` | `matrix_type::int8` |
| `matrix_type::uint8` | `matrix_type::sint8` |
`matrix_type::sint32` | +<=+ 16 | +<=+ 16 | +<=+ 64
| `matrix_type::int8` | `matrix_type::uint8` |
| `matrix_type::sint8` | `matrix_type::uint8` |
`matrix_type::sint32` | +<=+ 16 | +<=+ 16 | +<=+ 64
| `matrix_type::int8` | `matrix_type::int8` |
| `matrix_type::sint8` | `matrix_type::sint8` |
`matrix_type::sint32` | +<=+ 16 | +<=+ 16 | +<=+ 64
| `matrix_type::bf16` | `matrix_type::bf16` |
`matrix_type::fp32` | +<=+ 16 | +<=+ 16 | +<=+ 32
|======================

==== Intel XMX Supported Combinations
This is currently available in devices with the architecture
`architecture::intel_gpu_pvc` and `architecture::intel_gpu_dg2`. In
these architectures' implementation, the type of the C matrix must be
the same as the type of the D matrix. Therefore, that common type is
shown in a single column in the table below.
`architecture::intel_gpu_pvc`, `architecture::intel_gpu_dg2_g10`,
`architecture::intel_gpu_dg2_g11`, and
`architecture::intel_gpu_dg2_g12`. In these architectures'
implementation, the type of the C matrix must be the same as the type
of the D matrix. Therefore, that common type is shown in a single
column in the table below.

[frame="none",options="header"]
|======================
| A type | B type | C and D type | M | N | K | device
| `matrix_type::uint8` | `matrix_type::uint8` |
`matrix_type::int32` | +<=+ 8 | 16 | 32 | architecture::intel_gpu_pvc
| | | | |8||architecture::intel_gpu_dg2
| `matrix_type::uint8` | `matrix_type::int8` |
`matrix_type::int32` | +<=+ 8 | 16 | 32 | architecture::intel_gpu_pvc
| | | | |8||architecture::intel_gpu_dg2
| `matrix_type::int8` | `matrix_type::uint8` |
`matrix_type::int32` | +<=+ 8 | 16 | 32 | architecture::intel_gpu_pvc
| | | | |8||architecture::intel_gpu_dg2
| `matrix_type::int8` | `matrix_type::int8` |
`matrix_type::int32` | +<=+ 8 | 16 | 32 | architecture::intel_gpu_pvc
| | | | |8||architecture::intel_gpu_dg2
| `matrix_type::fp16` | `matrix_type::fp16` |
`matrix_type::fp32` | +<=+ 8 | 16 | 16 | architecture::intel_gpu_pvc
| | | | |8|| architecture::intel_gpu_dg2
| `matrix_type::bf16` | `matrix_type::bf16` |
`matrix_type::fp32` | +<=+ 8 | 16 | 16 | architecture::intel_gpu_pvc
| | | | |8|| architecture::intel_gpu_dg2
.2+| `matrix_type::uint8` .2+| `matrix_type::uint8` .2+|
`matrix_type::sint32` .2+| +<=+ 8 | 16 .2+| 32
|`architecture::intel_gpu_pvc`|8|`architecture::intel_gpu_dg2_g10,
architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`
.2+| `matrix_type::uint8` .2+| `matrix_type::sint8` .2+|
`matrix_type::sint32` .2+| +<=+ 8 | 16 .2+| 32 |
`architecture::intel_gpu_pvc`|8|`architecture::intel_gpu_dg2_g10,
architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`
.2+| `matrix_type::sint8` .2+| `matrix_type::uint8` .2+|
`matrix_type::sint32` .2+| +<=+ 8 | 16 .2+| 32 |
`architecture::intel_gpu_pvc`|8|`architecture::intel_gpu_dg2_g10,
architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`
.2+| `matrix_type::sint8` .2+| `matrix_type::sint8` .2+|
`matrix_type::sint32` .2+| +<=+ 8 | 16 .2+| 32 |
`architecture::intel_gpu_pvc`|8|`architecture::intel_gpu_dg2_g10,
architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`
.2+|`matrix_type::fp16` .2+| `matrix_type::fp16` .2+|
`matrix_type::fp32` .2+| +<=+ 8 | 16 .2+| 16 |
`architecture::intel_gpu_pvc`|8| `architecture::intel_gpu_dg2_g10,
architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`
.2+| `matrix_type::bf16` .2+| `matrix_type::bf16` .2+|
`matrix_type::fp32` .2+| +<=+ 8 | 16 .2+| 16 |
`architecture::intel_gpu_pvc` |8| `architecture::intel_gpu_dg2_g10,
architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`
|======================
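
To make the table concrete, the sketch below encodes only the 8-bit integer rows as they appear to read here (M at most 8, K equal to 32, N equal to 16 on `intel_gpu_pvc` and 8 on the `intel_gpu_dg2_*` architectures) and checks a candidate shape against them. The `xmx_arch` enum and the `xmx_int8_shape_ok` function are hypothetical names used for illustration; portable code would presumably rely on the runtime query rather than hard-coded limits.

```cpp
#include <cstddef>

// Hypothetical tag for the two XMX families listed in the table.
enum class xmx_arch { pvc, dg2 };

// Checks an M x N x K shape against the 8-bit integer rows of the table:
// M <= 8 and K == 32 on both families; N == 16 on PVC, N == 8 on DG2.
bool xmx_int8_shape_ok(xmx_arch arch, std::size_t m, std::size_t n,
                       std::size_t k) {
  const std::size_t required_n = (arch == xmx_arch::pvc) ? 16 : 8;
  return m >= 1 && m <= 8 && n == required_n && k == 32;
}
```

The fp16 and bf16 rows would follow the same pattern with K equal to 16.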

==== Nvidia Tensor Cores Supported Combinations
@@ -932,11 +940,11 @@ supported parameter combination is specified in the following table.
|16 |16 |16
|8 |32 |16
|32 |8 |16
.3+| `matrix_type::int8` .3+| `matrix_type::int32`
.3+| `matrix_type::sint8` .3+| `matrix_type::sint32`
|16 |16 |16 .6+| sm_72
|8 |32 |16
|32 |8 |16
.3+|`matrix_type::uint8` .3+|`matrix_type::int32`
.3+|`matrix_type::uint8` .3+|`matrix_type::sint32`
|16 |16 |16
|8 |32 |16
|32 |8 |16
@@ -99,6 +99,7 @@ namespace sycl::ext::oneapi::experimental {

enum class architecture : /* unspecified */ {
x86_64,
intel_cpu_spr,
intel_gpu_bdw,
intel_gpu_skl,
intel_gpu_kbl,
@@ -195,6 +196,12 @@ of these enumerators, and it provides a brief description of their meanings.
|-
|Any CPU device with the x86_64 instruction set.

|`intel_cpu_spr`
|-
|Intel Xeon processor codenamed Sapphire Rapids. The utility of this
enumeration is currently limited. See the section "Limitations with
the experimental version" for details.

|`intel_gpu_bdw`
|-
|Broadwell Intel graphics architecture.
@@ -246,7 +253,7 @@ of these enumerators, and it provides a brief description of their meanings.
|`intel_gpu_adl_s` +
`intel_gpu_rpl_s`
|-
|Alder Lake S Intel graphics architecture or Raptor Lake Intel graphics
|Alder Lake S Intel graphics architecture or Raptor Lake Intel graphics
architecture.

|`intel_gpu_adl_p`
@@ -589,6 +596,15 @@ feature, the application must be compiled in ahead-of-time (AOT) mode using
description of the `-fsycl-targets` option. These are the target names of the
form "intel_gpu_*", "nvidia_gpu_*", or "amd_gpu_*".

The architecture enumeration `intel_cpu_spr` does not currently work
with any of the APIs described in this extension. It cannot be used
with the `if_architecture_is` function, the
`device::ext_oneapi_architecture_is` function, or the
`info::device::architecture` query descriptor. It currently exists
only for use with the
link:sycl_ext_matrix/sycl_ext_oneapi_matrix.asciidoc[sycl_ext_oneapi_matrix]
extension.

== Future direction

This experimental extension is still evolving. We expect that future versions
@@ -14,6 +14,7 @@ namespace ext::oneapi::experimental {

enum class architecture {
x86_64,
intel_cpu_spr,
intel_gpu_bdw,
intel_gpu_skl,
intel_gpu_kbl,