@@ -7,6 +7,8 @@ Microbenchmark to achieve peak performance on x86_64 CPUs and NVIDIA GPUs.
 <!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->

 - [1. Support](#1-support)
+  - [1.1 Software support](#11-software-support)
+  - [1.2 Hardware support](#12-hardware-support)
 - [2. Instalation](#2-instalation)
   - [2.1 Building from source](#21-building-from-source)
   - [2.2 Enabling and disabling support for CPU/GPU](#22-enabling-and-disabling-support-for-cpugpu)
@@ -25,13 +27,33 @@ Microbenchmark to achieve peak performance on x86_64 CPUs and NVIDIA GPUs.
 - [5. Evaluation](#5-evaluation)
   - [Intel](#intel)
   - [AMD](#amd)
+  - [NVIDIA](#nvidia)
 - [6. Microarchitecture table](#6-microarchitecture-table)
+  - [6.1 CPU](#61-cpu)
+  - [6.2 GPU](#62-gpu)

 <!-- END doctoc generated TOC please keep comment here to allow auto update -->

 # 1. Support
+
+## 1.1 Software support
 peakperf only works properly on Linux. It has not been tested on Windows or macOS, so performance there may not be optimal. A Windows port may be implemented in the future (see [Issue #1](https://github.com/Dr-Noob/peakperf/issues/1)).

+## 1.2 Hardware support
+Supported microarchitectures are:
+
+- **CPU (x86_64)**: AVX support is mandatory.
+  - Intel: Sandy Bridge and newer.
+  - AMD: Zen and newer.
+- **GPU**:
+  - NVIDIA: Compute Capability >= 2.0.
+
+For a complete list of supported microarchitectures, see section [5](#5-evaluation).
+
+NOTES:
+- _Only GPUs whose frequency can be read in real time (using freq.sh) can actually be evaluated._
+- _Other microarchitectures not mentioned here may also work._
+
 # 2. Instalation
 There is a peakperf package available in Arch Linux ([peakperf-git](https://aur.archlinux.org/packages/peakperf-git)).

@@ -220,27 +242,30 @@ This tables shows the performance of peakperf for each of the microarchitecture
 | Haswell | i7-4790K | `3.997 GHz` | `511.61` | `511.43 +- 0.01` | `0.03%` |
 | Broadwell | 2x Xeon E5-2698 v4 | `2.599 GHz` | `3326.72` | `3269.87 +- 14.42` | `1.73%` |
 | Skylake | i5-6400 | `3.099 GHz` | `396.67` | `396.61 +- 0.01` | `0.06` |
+| Knights Landing | Xeon Phi 7250 | `1.499 GHz` | `5991.69` | `5390.84 +- 7.83` | `3.72%` |
 | Kaby Lake | i5-8250U | `2.700 GHz` | `345.60` | `343.57 +- 1.38` | `0.59%` |
 | Coffee Lake | i9-9900K | `3.600 GHz` | `921.60` | `918.72 +- 1.13` | `0.31%` |
-| Comet Lake | i5-10400 | `3.999 GHz` | `768.80` | `766.97 +- 0.25` | `0.23%` |
-| Cascade Lake | 2x Xeon Gold 6238 | `2.099 GHz` | `5910.78` | `5851.60 +- 2.69` | `1.01%` |
+| Comet Lake | i9-10900KF | `4.100 GHz` | `1312.00` | `1308.24 +- 0.30` | `0.30%` |
+| Cascade Lake | 2x Xeon Gold 6238 | `2.099 GHz` | `5910.78` | `5851.60 +- 2.69` | `1.01%` |
 | Ice Lake | i5-1035G1 | `2.990 GHz` | `382.72` | `382.22 +- 0.18` | `0.13%` |
-| Knights Landing | Xeon Phi 7250 | `1.499 GHz` | `5991.69` | `5390.84 +- 7.83` | `3.72%` |
-
+| Tiger Lake | - | - | - | - | - |
+| Rocket Lake | - | - | - | - | - |

 ## AMD
 | uarch | CPU | AVX Clock | PP (Formula) | PP (Experimental) | Loss |
 | :---: | :---: | :---: | :---: | :---: | :---: |
 | Zen | - | - | - | - | - |
 | Zen+ | AMD Ryzen 5 2600 | `3.724 GHz` | `357.50` | `357.08 +- 0.03` | `0.11%` |
 | Zen 2 | - | - | - | - | - |
+| Zen 3 | - | - | - | - | - |

 ## NVIDIA
 | C.C | uarch | GPU | Clock | PP (Formula) | PP (Experimental) | Loss |
 | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
 | 5.2 | Maxwell | GTX 970 | `1.341 GHz` | `4462.84` | `4333.92 +- 0.90` | `2.97%` |
 | 6.1 | Pascal | GTX 1080 | `1.860 GHz` | `9523.20` | `9397.97 +- 0.10` | `1.33%` |
 | 7.5 | Turing | RTX 2080 Ti | `1.905 GHz` | `16581.12` | `16373.28 +- 16.07` | `1.26%` |
+| 8.6 | Ampere | - | - | - | - | - |

 _NOTE 1_: Performance is measured in single precision and reported in GFLOP/s (giga floating-point operations per second).

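As a rough cross-check of the PP (Formula) column, the following Python sketch multiplies cores, clock, FPUs per core, single-precision lanes per FPU and 2 FLOPs per FMA. The function name `peak_gflops` and the core counts (4 cores for the i7-4790K, 2560 CUDA cores for the GTX 1080) are assumptions that do not appear in this README, and the formula itself is inferred from the table values, so it may not be exactly what peakperf computes.

```python
def peak_gflops(cores, clock_ghz, fpus, lanes, flops_per_op=2):
    """Theoretical single-precision peak in GFLOP/s.

    cores        -- physical cores (or CUDA cores on a GPU)
    clock_ghz    -- sustained clock while running the kernel, in GHz
    fpus         -- FMA units per core
    lanes        -- 32-bit floats processed per FPU instruction
    flops_per_op -- 2 for FMA (one multiply plus one add)
    """
    return cores * clock_ghz * fpus * lanes * flops_per_op


# Haswell i7-4790K: assumed 4 cores, 2 FMA units, 8 lanes (256-bit AVX2).
haswell = peak_gflops(cores=4, clock_ghz=3.997, fpus=2, lanes=8)

# GTX 1080: assumed 2560 CUDA cores, each modelled as one scalar FMA lane.
gtx1080 = peak_gflops(cores=2560, clock_ghz=1.860, fpus=1, lanes=1)

print(f"Haswell: {haswell:.2f} GFLOP/s")
print(f"GTX 1080: {gtx1080:.2f} GFLOP/s")
```

Under these assumptions both results land within 0.01 GFLOP/s of the corresponding PP (Formula) entries above.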
@@ -252,23 +277,25 @@ _NOTE 4_: Sandy Bridge and Ivy Bridge have ADD and MUL VPUs that can be used in

 # 6. Microarchitecture table

-The following table acts as a summary of all supported microarchitectures with their characteristics:
-
-| uArch | AVX | FMA | AVX512 | Slots | FPUs | Latency | Tested | Refs |
-| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
-| Sandy Bridge | :heavy_check_mark: | :x: | :x: | 6 | 2 (ADD+MUL AVX) | 3 (ADD) 5 (MUL) | :heavy_check_mark: | [1] |
-| Ivy Bridge | :heavy_check_mark: | :x: | :x: | 6 | 2 (ADD+MUL AVX) | 3 (ADD) 5 (MUL) | :heavy_check_mark: | [2] |
-| Haswell | :heavy_check_mark: | :heavy_check_mark: | :x: | 10 | 2 (FMA AVX2) | 5 (FMA) | :heavy_check_mark: | [3] |
-| Broadwell | :heavy_check_mark: | :heavy_check_mark: | :x: | 8 | 2 (FMA AVX2) | 4 (FMA) | :heavy_check_mark: | [3] |
-| Skylake | :heavy_check_mark: | :heavy_check_mark: | :x: | 8 | 2 (FMA AVX2) | 4 (FMA) | :heavy_check_mark: | [3] |
-| Kaby Lake | :heavy_check_mark: | :heavy_check_mark: | :x: | 8 | 2 (FMA AVX2) | 4 (FMA) | :heavy_check_mark: | [4] |
-| Coffee Lake | :heavy_check_mark: | :heavy_check_mark: | :x: | 8 | 2 (FMA AVX2) | 4 (FMA) | :heavy_check_mark: | [5] |
-| Comet Lake | :heavy_check_mark: | :heavy_check_mark: | :x: | 8 | 2 (FMA AVX2) | 4 (FMA) | :heavy_check_mark: | [10] |
-| Ice Lake | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | 8 | 2 (FMA AVX2) | 4 (FMA) | :heavy_check_mark: | [3] |
-| Knights Landing | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | 12 | 2 (FMA AVX512) | 6 (FMA) | :heavy_check_mark: | [6] |
-| Ryzen ZEN | :heavy_check_mark: | :heavy_check_mark: | :x: | 5 | 1 (FMA AVX2) | 5 (FMA) | :x: | [7] |
-| Ryzen ZEN+ | :heavy_check_mark: | :heavy_check_mark: | :x: | 5 | 1 (FMA AVX2) | 5 (FMA) | :heavy_check_mark: | [8] |
-| Ryzen ZEN 2 | :heavy_check_mark: | :heavy_check_mark: | :x: | 10 | 2 (FMA AVX2) | 5 (FMA) | :x: | [9] |
+The following tables act as a summary of all supported microarchitectures with their characteristics.
+
+## 6.1 CPU
+| uarch | FMA | AVX512 | Slots | FPUs | Latency | Tested | Refs |
+| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+| Sandy Bridge | :x: | :x: | 6 | 2 (ADD+MUL AVX) | 3 (ADD) 5 (MUL) | :heavy_check_mark: | [1] |
+| Ivy Bridge | :x: | :x: | 6 | 2 (ADD+MUL AVX) | 3 (ADD) 5 (MUL) | :heavy_check_mark: | [2] |
+| Haswell | :heavy_check_mark: | :x: | 10 | 2 (FMA AVX2) | 5 (FMA) | :heavy_check_mark: | [3] |
+| Broadwell | :heavy_check_mark: | :x: | 8 | 2 (FMA AVX2) | 4 (FMA) | :heavy_check_mark: | [3] |
+| Skylake | :heavy_check_mark: | :x: | 8 | 2 (FMA AVX2) | 4 (FMA) | :heavy_check_mark: | [3] |
+| Kaby Lake | :heavy_check_mark: | :x: | 8 | 2 (FMA AVX2) | 4 (FMA) | :heavy_check_mark: | [4] |
+| Coffee Lake | :heavy_check_mark: | :x: | 8 | 2 (FMA AVX2) | 4 (FMA) | :heavy_check_mark: | [5] |
+| Comet Lake | :heavy_check_mark: | :x: | 8 | 2 (FMA AVX2) | 4 (FMA) | :heavy_check_mark: | [10] |
+| Ice Lake | :heavy_check_mark: | :heavy_check_mark: | 8 | 2 (FMA AVX2) | 4 (FMA) | :heavy_check_mark: | [3] |
+| Knights Landing | :heavy_check_mark: | :heavy_check_mark: | 12 | 2 (FMA AVX512) | 6 (FMA) | :heavy_check_mark: | [6] |
+| Ryzen ZEN | :heavy_check_mark: | :x: | 5 | 1 (FMA AVX2) | 5 (FMA) | :x: | [7] |
+| Ryzen ZEN+ | :heavy_check_mark: | :x: | 5 | 1 (FMA AVX2) | 5 (FMA) | :heavy_check_mark: | [8] |
+| Ryzen ZEN 2 | :heavy_check_mark: | :x: | 10 | 2 (FMA AVX2) | 5 (FMA) | :x: | [9] |
+| Ryzen ZEN 3 | :heavy_check_mark: | :x: | 8 | 2 (FMA AVX2) | 4 (FMA) | :x: | [11] |

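For the FMA rows of the CPU table, the Slots column is consistent with the number of FPUs multiplied by the FMA latency, that is, the number of independent instructions that must be in flight to keep every unit busy on every cycle. This reading is an inference from the table values, not something the README states, and the helper name `slots` is hypothetical. Sandy Bridge and Ivy Bridge are left out because they list separate ADD and MUL latencies. A minimal sketch of the check:

```python
def slots(fpus, latency_cycles):
    """Independent instructions needed in flight to saturate all FPUs."""
    return fpus * latency_cycles


# (FPUs, FMA latency in cycles, Slots value listed in the table)
fma_rows = {
    "Haswell": (2, 5, 10),
    "Broadwell": (2, 4, 8),
    "Knights Landing": (2, 6, 12),
    "Ryzen ZEN": (1, 5, 5),
    "Ryzen ZEN 2": (2, 5, 10),
    "Ryzen ZEN 3": (2, 4, 8),
}

# True for every row where FPUs x latency equals the listed Slots value.
matches = {
    name: slots(fpus, latency) == listed
    for name, (fpus, latency, listed) in fma_rows.items()
}
print(matches)
```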
 References:
 - [1] [Agner Fog Instruction Tables (Page 199, VADDPS)](https://www.agner.org/optimize/instruction_tables.pdf)
@@ -281,6 +308,15 @@ References:
 - [8] [Wikichip](https://en.wikichip.org/wiki/amd/microarchitectures/zen%2B#Pipeline)
 - [9] [Agner Fog Instruction Tables (Page 111, VFMADD132PS)](https://www.agner.org/optimize/instruction_tables.pdf)
 - [10] [Wikichip](https://en.wikichip.org/wiki/intel/microarchitectures/comet_lake)
+- [11] [Agner Fog Instruction Tables (Page 124, VFMADD132PS)](https://www.agner.org/optimize/instruction_tables.pdf)
+
+## 6.2 GPU
+| uarch | Latency | Tested | Refs |
+| :---: | :---: | :---: | :---: |
+| Maxwell | 6 | :heavy_check_mark: | [ ] |
+| Pascal | 6 | :heavy_check_mark: | [ ] |
+| Turing | 4 | :heavy_check_mark: | [ ] |
+| Ampere | ? | :x: | [ ] |

 _NOTES:_
 - Older microarchitectures may be added in the future. I have not added them yet because I cannot test peakperf on them: I do not have access to that hardware.