Commit 56de25d

[SYCL][matrix] Update the query interface with the latest joint matrix approved syntax (#11004)
1 parent f0d0f0c commit 56de25d

8 files changed: +473 -534 lines changed


sycl/doc/extensions/experimental/sycl_ext_matrix/sycl_ext_oneapi_matrix.asciidoc

Lines changed: 40 additions & 32 deletions
@@ -742,12 +742,12 @@ descriptors that can be queried using `get_info` API.
 [frame="none",options="header"]
 |======================
 | Device descriptors | Return type| Description
-|`ext::oneapi::experimental::info::device::matrix::combinations` |
+|`ext::oneapi::experimental::info::device::matrix_combinations` |
 `std::vector<combination>`| tells the set of supported matrix sizes
 and types on this device
 |======================
 
-The runtime query returns a vector of `combinations` of `combination`
+The runtime query returns a vector of `matrix_combinations` of `combination`
 type. Each combination includes the sizes and the types for the
 matrices A, B, C, and D. Note that for each matrix hardware,
 the query returns `max_msize, max_nsize, max_ksize` or `msize, nsize,
@@ -791,7 +791,7 @@ struct combination {
 } // namespace sycl::ext::oneapi::experimental::matrix
 ```
 
-Each combination of the `combinations` vector composes the types and
+Each combination of the `matrix_combinations` vector composes the types and
 sizes of A, B, C, and D matrices supported by the device
 implementation. The table below provides a description of each member
 of the `combination` struct.
@@ -833,7 +833,7 @@ the `T` template parameter as follows: +
 ```c++
 // Ta, Tb, Tc, and Td are the types used in applications
 std::vector<combination> combinations =
-    device.get_info<info::device::matrix::combinations>();
+    device.get_info<info::device::matrix_combinations>();
 for (int i = 0; sizeof(combinations); i++) {
   if (Ta == combinations[i].atype &&
       Tb == combinations[i].btype &&
@@ -850,7 +850,7 @@ for (int i = 0; sizeof(combinations); i++) {
 The table below provides a list of the combinations that
 `joint_matrix` implementations support on each of Intel AMX and Intel
 XMX hardware. Note that these can be returned using
-`ext::oneapi::experimental::info::device::matrix::combinations`.
+`ext::oneapi::experimental::info::device::matrix_combinations`.
 
 ==== Intel AMX Supported Combinations
 This is currently available in devices with the architecture
@@ -864,44 +864,52 @@ table below.
 | A type | B type | C and D type | M | N | K
 | `matrix_type::uint8` | `matrix_type::uint8` |
 `matrix_type::sint32` | +<=+ 16 | +<=+ 16 | +<=+ 64
-| `matrix_type::uint8` | `matrix_type::int8` |
+| `matrix_type::uint8` | `matrix_type::sint8` |
 `matrix_type::sint32` | +<=+ 16 | +<=+ 16 | +<=+ 64
-| `matrix_type::int8` | `matrix_type::uint8` |
+| `matrix_type::sint8` | `matrix_type::uint8` |
 `matrix_type::sint32` | +<=+ 16 | +<=+ 16 | +<=+ 64
-| `matrix_type::int8` | `matrix_type::int8` |
+| `matrix_type::sint8` | `matrix_type::sint8` |
 `matrix_type::sint32` | +<=+ 16 | +<=+ 16 | +<=+ 64
 | `matrix_type::bf16` | `matrix_type::bf16` |
 `matrix_type::fp32` | +<=+ 16 | +<=+ 16 | +<=+ 32
 |======================
 
 ==== Intel XMX Supported Combinations
 This is currently available in devices with the architecture
-`architecture::intel_gpu_pvc` and `architecture::intel_gpu_dg2`. In
-these architectures' implementation, the type of the C matrix must be
-the same as the type of the D matrix. Therefore, that common type is
-shown in a single column in the table below.
+`architecture::intel_gpu_pvc`, `architecture::intel_gpu_dg2_g10`,
+`architecture::intel_gpu_dg2_g11`, and
+`architecture::intel_gpu_dg2_g12`. In these architectures'
+implementation, the type of the C matrix must be the same as the type
+of the D matrix. Therefore, that common type is shown in a single
+column in the table below.
 
 [frame="none",options="header"]
 |======================
 | A type | B type | C and D type | M | N | K | device
-| `matrix_type::uint8` | `matrix_type::uint8` |
-`matrix_type::int32` | +<=+ 8 | 16 | 32 | architecture::intel_gpu_pvc
-| | | | |8||architecture::intel_gpu_dg2
-| `matrix_type::uint8` | `matrix_type::int8` |
-`matrix_type::int32` | +<=+ 8 | 16 | 32 | architecture::intel_gpu_pvc
-| | | | |8||architecture::intel_gpu_dg2
-| `matrix_type::int8` | `matrix_type::uint8` |
-`matrix_type::int32` | +<=+ 8 | 16 | 32 | architecture::intel_gpu_pvc
-| | | | |8||architecture::intel_gpu_dg2
-| `matrix_type::int8` | `matrix_type::int8` |
-`matrix_type::int32` | +<=+ 8 | 16 | 32 | architecture::intel_gpu_pvc
-| | | | |8||architecture::intel_gpu_dg2
-| `matrix_type::fp16` | `matrix_type::fp16` |
-`matrix_type::fp32` | +<=+ 8 | 16 | 16 | architecture::intel_gpu_pvc
-| | | | |8|| architecture::intel_gpu_dg2
-| `matrix_type::bf16` | `matrix_type::bf16` |
-`matrix_type::fp32` | +<=+ 8 | 16 | 16 | architecture::intel_gpu_pvc
-| | | | |8|| architecture::intel_gpu_dg2
+.2+| `matrix_type::uint8` .2+| `matrix_type::uint8` .2+|
+`matrix_type::sint32` .2+| +<=+ 8 | 16 .2+| 32
+|`architecture::intel_gpu_pvc`|8|`architecture::intel_gpu_dg2_g10,
+architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`
+.2+| `matrix_type::uint8` .2+| `matrix_type::sint8` .2+|
+`matrix_type::sint32` .2+| +<=+ 8 | 16 .2+| 32 |
+`architecture::intel_gpu_pvc`|8|`architecture::intel_gpu_dg2_g10,
+architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`
+.2+| `matrix_type::sint8` .2+| `matrix_type::uint8` .2+|
+`matrix_type::sint32` .2+| +<=+ 8 | 16 .2+| 32 |
+`architecture::intel_gpu_pvc`|8|`architecture::intel_gpu_dg2_g10,
+architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`
+.2+| `matrix_type::sint8` .2+| `matrix_type::sint8` .2+|
+`matrix_type::sint32` .2+| +<=+ 8 | 16 .2+| 32 |
+`architecture::intel_gpu_pvc`|8|`architecture::intel_gpu_dg2_g10,
+architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`
+.2+|`matrix_type::fp16` .2+| `matrix_type::fp16` .2+|
+`matrix_type::fp32` .2+| +<=+ 8 | 16 .2+| 16 |
+`architecture::intel_gpu_pvc`|8| `architecture::intel_gpu_dg2_g10,
+architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`
+.2+| `matrix_type::bf16` .2+| `matrix_type::bf16` .2+|
+`matrix_type::fp32` .2+| +<=+ 8 | 16 .2+| 16 |
+`architecture::intel_gpu_pvc` |8| `architecture::intel_gpu_dg2_g10,
+architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`
 |======================
 
 ==== Nvidia Tensor Cores Supported Combinations
907915
==== Nvidia Tensor Cores Supported Combinations
@@ -933,11 +941,11 @@ supported parameter combination is specified in the following table.
 |16 |16 |16
 |8 |32 |16
 |32 |8 |16
-.3+| `matrix_type::int8` .3+| `matrix_type::int32`
+.3+| `matrix_type::sint8` .3+| `matrix_type::sint32`
 |16 |16 |16 .6+| sm_72
 |8 |32 |16
 |32 |8 |16
-.3+|`matrix_type::uint8` .3+|`matrix_type::int32`
+.3+|`matrix_type::uint8` .3+|`matrix_type::sint32`
 |16 |16 |16
 |8 |32 |16
 |32 |8 |16
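
Reviewer note: the query loop shown as context in the hunk at line 833 uses `sizeof(combinations)` as its loop condition, which is a non-zero constant and therefore never terminates the loop. The following is a minimal sketch of a bounded search, not the extension's own code. It compiles without a SYCL toolchain because `matrix_type` and `combination` are local stand-ins; the field list and the convention that unused size fields are zero are assumptions, and `find_combination` and `sample_amx_combinations` are hypothetical helpers.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Local stand-in for the extension's matrix_type enum; only the
// enumerators named in the tables of this diff are listed.
enum class matrix_type { bf16, fp16, fp32, sint8, sint32, uint8 };

// Simplified stand-in for the extension's `combination` struct.
// Assumption: size fields that do not apply are reported as zero.
struct combination {
  std::size_t max_msize, max_nsize, max_ksize;
  std::size_t msize, nsize, ksize;
  matrix_type atype, btype, ctype, dtype;
};

// Returns the index of the first combination whose four element types
// match the request, or std::nullopt when the device reports none.
std::optional<std::size_t> find_combination(
    const std::vector<combination>& combinations, matrix_type ta,
    matrix_type tb, matrix_type tc, matrix_type td) {
  // Bound the loop by the vector's size rather than sizeof(...).
  for (std::size_t i = 0; i < combinations.size(); ++i) {
    const combination& c = combinations[i];
    if (c.atype == ta && c.btype == tb && c.ctype == tc && c.dtype == td)
      return i;
  }
  return std::nullopt;
}

// Hypothetical sample data mirroring two Intel AMX rows of the table.
std::vector<combination> sample_amx_combinations() {
  return {
      {16, 16, 64, 0, 0, 0, matrix_type::uint8, matrix_type::sint8,
       matrix_type::sint32, matrix_type::sint32},
      {16, 16, 32, 0, 0, 0, matrix_type::bf16, matrix_type::bf16,
       matrix_type::fp32, matrix_type::fp32},
  };
}
```

In real code the vector would come from `device.get_info<info::device::matrix_combinations>()`, as in the hunk; the sample data here only exercises the search.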

sycl/doc/extensions/experimental/sycl_ext_oneapi_device_architecture.asciidoc

Lines changed: 17 additions & 1 deletion
@@ -99,6 +99,7 @@ namespace sycl::ext::oneapi::experimental {
 
 enum class architecture : /* unspecified */ {
   x86_64,
+  intel_cpu_spr,
   intel_gpu_bdw,
   intel_gpu_skl,
   intel_gpu_kbl,
@@ -195,6 +196,12 @@ of these enumerators, and it provides a brief description of their meanings.
 |-
 |Any CPU device with the x86_64 instruction set.
 
+|`intel_cpu_spr`
+|-
+|Intel Xeon processor codenamed Sapphire Rapids. The utility of this
+enumeration is currently limited. See the section "Limitations with
+the experimental version" for details.
+
 |`intel_gpu_bdw`
 |-
 |Broadwell Intel graphics architecture.
@@ -246,7 +253,7 @@ of these enumerators, and it provides a brief description of their meanings.
 |`intel_gpu_adl_s` +
 `intel_gpu_rpl_s`
 |-
-|Alder Lake S Intel graphics architecture or Raptor Lake Intel graphics
+|Alder Lake S Intel graphics architecture or Raptor Lake Intel graphics
 architecture.
 
 |`intel_gpu_adl_p`
@@ -589,6 +596,15 @@ feature, the application must be compiled in ahead-of-time (AOT) mode using
 description of the `-fsycl-targets` option. These are the target names of the
 form "intel_gpu_*", "nvidia_gpu_*", or "amd_gpu_*".
 
+The architecture enumeration `intel_cpu_spr` does not currently work
+with any of the APIs described in this extension. It cannot be used
+with the `if_architecture_is` function, the
+`device::ext_oneapi_architecture_is` function, or the
+`info::device::architecture` query descriptor. It currently exists
+only for use with the
+link:sycl_ext_matrix/sycl_ext_oneapi_matrix.asciidoc[sycl_ext_oneapi_matrix]
+extension.
+
 == Future direction
 
 This experimental extension is still evolving. We expect that future versions

sycl/include/sycl/ext/oneapi/experimental/device_architecture.hpp

Lines changed: 1 addition & 0 deletions
@@ -14,6 +14,7 @@ namespace ext::oneapi::experimental {
 
 enum class architecture {
   x86_64,
+  intel_cpu_spr,
   intel_gpu_bdw,
   intel_gpu_skl,
   intel_gpu_kbl,
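
Reviewer note: the matrix documentation changed in this commit keys its Intel XMX table on these `architecture` enumerators. As a rough sketch of how that table could be consulted in code, the lookup below maps an architecture to the N dimension listed for it (16 for `intel_gpu_pvc`, 8 for the three `intel_gpu_dg2_*` variants). The enum is a local stand-in that lists only the enumerators the sketch needs, and `xmx_n_size` is a hypothetical helper, not part of the header.

```cpp
#include <optional>

// Local stand-in for sycl::ext::oneapi::experimental::architecture.
// Only the enumerators this sketch needs are listed; the real enum
// has many more and its underlying values are unspecified.
enum class architecture {
  x86_64,
  intel_cpu_spr,
  intel_gpu_pvc,
  intel_gpu_dg2_g10,
  intel_gpu_dg2_g11,
  intel_gpu_dg2_g12,
};

// Returns the N dimension from the Intel XMX table in this diff, or
// std::nullopt for architectures that table does not cover.
inline std::optional<int> xmx_n_size(architecture arch) {
  switch (arch) {
    case architecture::intel_gpu_pvc:
      return 16;
    case architecture::intel_gpu_dg2_g10:
    case architecture::intel_gpu_dg2_g11:
    case architecture::intel_gpu_dg2_g12:
      return 8;
    default:
      // Covers x86_64 and intel_cpu_spr: Intel AMX sizes are listed in
      // a separate table and expressed as upper bounds, not fixed values.
      return std::nullopt;
  }
}
```

A caller would typically prefer the runtime `matrix_combinations` query over a hard-coded table like this one; the sketch only illustrates how the documented rows relate to the enumerators.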
