
Commit b885cb8

[v1.14] Update README and BENCHMARKS
1 parent 3687478 commit b885cb8


2 files changed: +59 -22 lines


BENCHMARKS.md

+2-1
@@ -8,10 +8,11 @@
 | Intel i5-6400 | Skylake | `3.099 GHz` | `396.67` | `396.61 +- 0.01 ` | `0.06%` |
 | Intel i5-8250U | Kaby Lake | `2.700 GHz` | `345.60` | `343.57 +- 1.38` | `0.59%` |
 | Intel i5-7400 | Kaby Lake | `3.299 GHz` | `422.27` | `420.62 +- 0.40` | `0.39%` |
-| Intel i5-9400 | Coffee Lake | `3.899 GHz` | `748.60` | `747.52 +- 0.09` | `0.14%` |
 | Intel i7-8700 | Coffee Lake | `4.300 GHz` | `825.60` | `823.83 +- 0.01` | `0.21%` |
+| Intel i5-9400 | Coffee Lake | `3.899 GHz` | `748.60` | `747.52 +- 0.09` | `0.14%` |
 | Intel i9-9900K | Coffee Lake | `3.600 GHz` | `921.60` | `918.72 +- 1.13` | `0.31%` |
 | Intel i5-10400 | Comet Lake | `3.999 GHz` | `768.80` | `766.97 +- 0.25` | `0.23%` |
+| Intel i9-10900KF | Comet Lake | `4.100 GHz` | `1312.00` | `1308.24 +- 0.30` | `0.30%` |
 | Intel i5-1035G1 | Ice Lake | `2.990 GHz` | `382.72` | `382.22 +- 0.18` | `0.13%` |
 | AMD Ryzen 5 2600 | Zen+ | `3.724 GHz` | `357.50` | `357.08 +- 0.03` | `0.11%` |

README.md

+57-21
@@ -7,6 +7,8 @@ Microbenchmark to achieve peak performance on x86_64 CPUs and NVIDIA GPUs.
 <!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->

 - [1. Support](#1-support)
+- [1.1 Software support](#11-software-support)
+- [1.2 Hardware support](#12-hardware-support)
 - [2. Instalation](#2-instalation)
 - [2.1 Building from source](#21-building-from-source)
 - [2.2 Enabling and disabling support for CPU/GPU](#22-enabling-and-disabling-support-for-cpugpu)
@@ -25,13 +27,33 @@ Microbenchmark to achieve peak performance on x86_64 CPUs and NVIDIA GPUs.
 - [5. Evaluation](#5-evaluation)
 - [Intel](#intel)
 - [AMD](#amd)
+- [NVIDIA](#nvidia)
 - [6. Microarchitecture table](#6-microarchitecture-table)
+- [6.1 CPU](#61-cpu)
+- [6.2 GPU](#62-gpu)

 <!-- END doctoc generated TOC please keep comment here to allow auto update -->

 # 1. Support
+
+## 1.1 Software support
 peakperf only works properly on Linux. It has not been tested on Windows or macOS, so performance there may not be optimal. A Windows port may be implemented in the future (see [Issue #1](https://github.com/Dr-Noob/peakperf/issues/1)).

+## 1.2 Hardware support
+Supported microarchitectures are:
+
+- **CPU (x86_64)**: AVX support is mandatory.
+  - Intel: Sandy Bridge and newer.
+  - AMD: Zen and newer.
+- **GPU**:
+  - NVIDIA: Compute Capability >= 2.0.
+
+For a complete list of supported microarchitectures, see section [5](#5-evaluation).
+
+NOTES:
+- _Only GPUs whose frequency can be read in real time (using freq.sh) can actually be evaluated._
+- _Other microarchitectures not mentioned here may also work._
+
 # 2. Instalation
 There is a peakperf package available in Arch Linux ([peakperf-git](https://aur.archlinux.org/packages/peakperf-git)).

@@ -220,27 +242,30 @@ This tables shows the performance of peakperf for each of the microarchitecture
 | Haswell | i7-4790K | `3.997 GHz` | `511.61` | `511.43 +- 0.01` | `0.03%` |
 | Broadwell | 2x Xeon E5-2698 v4 | `2.599 GHz` | `3326.72` | `3269.87 +- 14.42` | `1.73%` |
 | Skylake | i5-6400 | `3.099 GHz` | `396.67` | `396.61 +- 0.01 ` | `0.06%` |
+| Knights Landing | Xeon Phi 7250 | `1.499 GHz` | `5991.69` | `5390.84 +- 7.83` | `3.72%` |
 | Kaby Lake | i5-8250U | `2.700 GHz` | `345.60` | `343.57 +- 1.38` | `0.59%` |
 | Coffee Lake | i9-9900K | `3.600 GHz` | `921.60` | `918.72 +- 1.13` | `0.31%` |
-| Comet Lake | i5-10400 | `3.999 GHz` | `768.80` | `766.97 +- 0.25` | `0.23%` |
-| Cascade Lake | 2x Xeon Gold 6238 | `2.099 GHz` | `5910.78` | `5851.60 +- 2.69` | `1.01%` |
+| Comet Lake | i9-10900KF | `4.100 GHz` | `1312.00` | `1308.24 +- 0.30` | `0.30%` |
+| Cascade Lake | 2x Xeon Gold 6238 | `2.099 GHz` | `5910.78` | `5851.60 +- 2.69` | `1.01%` |
 | Ice Lake | i5-1035G1 | `2.990 GHz` | `382.72` | `382.22 +- 0.18` | `0.13%` |
-| Knights Landing | Xeon Phi 7250 | `1.499 GHz` | `5991.69` | `5390.84 +- 7.83` | `3.72%` |
-
+| Tiger Lake | - | - | - | - | - |
+| Rocket Lake | - | - | - | - | - |

 ## AMD
 | uarch | CPU | AVX Clock | PP (Formula) | PP (Experimental) | Loss |
 |:-----:|:----------------:|:------------:|:------------:|:------------------:|:-------:|
 | Zen | - | - | - | - | - |
 | Zen+ | AMD Ryzen 5 2600 | `3.724 GHz` | `357.50` | `357.08 +- 0.03` | `0.11%` |
 | Zen 2 | - | - | - | - | - |
+| Zen 3 | - | - | - | - | - |

 ## NVIDIA
 | C.C | uarch | GPU | Clock | PP (Formula) | PP (Experimental) | Loss |
 |:---:|:-------:|:-----------:|:------------:|:------------:|:-------------------:|:-------:|
 | 5.2 | Maxwell | GTX 970 | `1.341 GHz` | `4462.84` | `4333.92 +- 0.90` | `2.97%` |
 | 6.1 | Pascal | GTX 1080 | `1.860 GHz` | `9523.20` | `9397.97 +- 0.10` | `1.33%` |
 | 7.5 | Turing | RTX 2080 Ti | `1.905 GHz` | `16581.12` | `16373.28 +- 16.07` | `1.26%` |
+| 8.6 | Ampere | - | - | - | - | - |

 _NOTE 1_: Performance is measured in single precision, in GFLOP/s (giga floating-point operations per second).

@@ -252,23 +277,25 @@ _NOTE 4_: Sandy Bridge and Ivy Bridge have ADD and MUL VPUs that can be used in

 # 6. Microarchitecture table

-The following table acts as a summary of all supported microarchitectures with their characteristics:
-
-| uArch | AVX | FMA | AVX512 | Slots | FPUs | Latency | Tested | Refs |
-|:---------------:|:----------------:|:----------------:|:------------------:|:-----:|:---------------:|:---------------:|:----------------:|:----:|
-| Sandy Bridge |:heavy_check_mark:| :x: | :x: | 6 | 2 (ADD+MUL AVX) | 3 (ADD) 5 (MUL) |:heavy_check_mark:| [1] |
-| Ivy Bridge |:heavy_check_mark:| :x: | :x: | 6 | 2 (ADD+MUL AVX) | 3 (ADD) 5 (MUL) |:heavy_check_mark:| [2] |
-| Haswell |:heavy_check_mark:|:heavy_check_mark:| :x: | 10 | 2 (FMA AVX2) | 5 (FMA) |:heavy_check_mark:| [3] |
-| Broadwell |:heavy_check_mark:|:heavy_check_mark:| :x: | 8 | 2 (FMA AVX2) | 4 (FMA) |:heavy_check_mark:| [3] |
-| Skylake |:heavy_check_mark:|:heavy_check_mark:| :x: | 8 | 2 (FMA AVX2) | 4 (FMA) |:heavy_check_mark:| [3] |
-| Kaby Lake |:heavy_check_mark:|:heavy_check_mark:| :x: | 8 | 2 (FMA AVX2) | 4 (FMA) |:heavy_check_mark:| [4] |
-| Coffee Lake |:heavy_check_mark:|:heavy_check_mark:| :x: | 8 | 2 (FMA AVX2) | 4 (FMA) |:heavy_check_mark:| [5] |
-| Comet Lake |:heavy_check_mark:|:heavy_check_mark:| :x: | 8 | 2 (FMA AVX2) | 4 (FMA) |:heavy_check_mark:| [10] |
-| Ice Lake |:heavy_check_mark:|:heavy_check_mark:| :heavy_check_mark: | 8 | 2 (FMA AVX2) | 4 (FMA) |:heavy_check_mark:| [3] |
-| Knights Landing |:heavy_check_mark:|:heavy_check_mark:| :heavy_check_mark: | 12 | 2 (FMA AVX512) | 6 (FMA) |:heavy_check_mark:| [6] |
-| Ryzen ZEN |:heavy_check_mark:|:heavy_check_mark:| :x: | 5 | 1 (FMA AVX2) | 5 (FMA) |:x: | [7] |
-| Ryzen ZEN+ |:heavy_check_mark:|:heavy_check_mark:| :x: | 5 | 1 (FMA AVX2) | 5 (FMA) |:heavy_check_mark:| [8] |
-| Ryzen ZEN 2 |:heavy_check_mark:|:heavy_check_mark:| :x: | 10 | 2 (FMA AVX2) | 5 (FMA) |:x: | [9] |
+The following tables act as a summary of all supported microarchitectures with their characteristics.
+
+## 6.1 CPU
+| uarch | FMA | AVX512 | Slots | FPUs | Latency | Tested | Refs |
+|:---------------:|:----------------:|:------------------:|:-----:|:---------------:|:---------------:|:----------------:|:----:|
+| Sandy Bridge | :x: | :x: | 6 | 2 (ADD+MUL AVX) | 3 (ADD) 5 (MUL) |:heavy_check_mark:| [1] |
+| Ivy Bridge | :x: | :x: | 6 | 2 (ADD+MUL AVX) | 3 (ADD) 5 (MUL) |:heavy_check_mark:| [2] |
+| Haswell |:heavy_check_mark:| :x: | 10 | 2 (FMA AVX2) | 5 (FMA) |:heavy_check_mark:| [3] |
+| Broadwell |:heavy_check_mark:| :x: | 8 | 2 (FMA AVX2) | 4 (FMA) |:heavy_check_mark:| [3] |
+| Skylake |:heavy_check_mark:| :x: | 8 | 2 (FMA AVX2) | 4 (FMA) |:heavy_check_mark:| [3] |
+| Kaby Lake |:heavy_check_mark:| :x: | 8 | 2 (FMA AVX2) | 4 (FMA) |:heavy_check_mark:| [4] |
+| Coffee Lake |:heavy_check_mark:| :x: | 8 | 2 (FMA AVX2) | 4 (FMA) |:heavy_check_mark:| [5] |
+| Comet Lake |:heavy_check_mark:| :x: | 8 | 2 (FMA AVX2) | 4 (FMA) |:heavy_check_mark:| [10] |
+| Ice Lake |:heavy_check_mark:| :heavy_check_mark: | 8 | 2 (FMA AVX2) | 4 (FMA) |:heavy_check_mark:| [3] |
+| Knights Landing |:heavy_check_mark:| :heavy_check_mark: | 12 | 2 (FMA AVX512) | 6 (FMA) |:heavy_check_mark:| [6] |
+| Ryzen ZEN |:heavy_check_mark:| :x: | 5 | 1 (FMA AVX2) | 5 (FMA) |:x: | [7] |
+| Ryzen ZEN+ |:heavy_check_mark:| :x: | 5 | 1 (FMA AVX2) | 5 (FMA) |:heavy_check_mark:| [8] |
+| Ryzen ZEN 2 |:heavy_check_mark:| :x: | 10 | 2 (FMA AVX2) | 5 (FMA) |:x: | [9] |
+| Ryzen ZEN 3 |:heavy_check_mark:| :x: | 8 | 2 (FMA AVX2) | 4 (FMA) |:x: | [11] |

 References:
 - [1] [Agner Fog Instruction Tables (Page 199, VADDPS)](https://www.agner.org/optimize/instruction_tables.pdf)
@@ -281,6 +308,15 @@ References:
 - [8] [Wikichip](https://en.wikichip.org/wiki/amd/microarchitectures/zen%2B#Pipeline)
 - [9] [Agner Fog Instruction Tables (Page 111, VFMADD132PS)](https://www.agner.org/optimize/instruction_tables.pdf)
 - [10] [Wikichip](https://en.wikichip.org/wiki/intel/microarchitectures/comet_lake)
+- [11] [Agner Fog Instruction Tables (Page 124, VFMADD132PS)](https://www.agner.org/optimize/instruction_tables.pdf)
+
+## 6.2 GPU
+| uarch | Latency | Tested | Refs |
+|:-------:|:--------:|:----------------:|:----:|
+| Maxwell | 6 |:heavy_check_mark:| [] |
+| Pascal | 6 |:heavy_check_mark:| [] |
+| Turing | 4 |:heavy_check_mark:| [] |
+| Ampere | ? |:x: | [] |

 _NOTES:_
 - Older microarchitectures may be added in the future. If an older microarchitecture is not listed, it is because I do not have access to that hardware and therefore cannot test peakperf on it.
