Commit 58fa4dd

[v1.17] Add GPU formula in README
1 parent ae06bd7 commit 58fa4dd

1 file changed, +20 -10 lines changed

README.md

@@ -6,6 +6,7 @@ Microbenchmark to achieve peak performance on x86_64 CPUs and NVIDIA GPUs.
 <!-- START doctoc generated TOC please keep comment here to allow auto update -->
 <!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->

+
 - [1. Support](#1-support)
 - [1.1 Software support](#11-software-support)
 - [1.2 Hardware support](#12-hardware-support)
@@ -21,9 +22,10 @@ Microbenchmark to achieve peak performance on x86_64 CPUs and NVIDIA GPUs.
 - [3.4. Options](#34-options)
 - [4. Understanding the microbenchmark](#4-understanding-the-microbenchmark)
 - [4.1 What is "peak performance" anyway?](#41-what-is-peak-performance-anyway)
-- [4.2 The formula](#42-the-formula)
-- [4.3 About the frequency to use in the formula](#43-about-the-frequency-to-use-in-the-formula)
-- [4.4 What can I do if I do not get the expected results?](#44-what-can-i-do-if-i-do-not-get-the-expected-results)
+- [4.2 The formula (CPU)](#42-the-formula-cpu)
+- [4.3 The formula (GPU)](#43-the-formula-gpu)
+- [4.4 About the frequency to use in the formula](#44-about-the-frequency-to-use-in-the-formula)
+- [4.5 What can I do if I do not get the expected results?](#45-what-can-i-do-if-i-do-not-get-the-expected-results)
 - [5. Evaluation](#5-evaluation)
 - [Intel](#intel)
 - [AMD](#amd)
@@ -183,14 +185,14 @@ _NOTE_: Some options are available only on CPU or GPU
 ## 4.1 What is "peak performance" anyway?
 Peak performance refers to the maximum performance that a chip (a CPU) can achieve. The more powerful the CPU, the higher its peak performance. This performance is a theoretical limit, computed using a formula (see next section) and measured in floating-point operations per second (FLOP/s, or GFLOP/s for gigaFLOP/s). This value establishes a performance limit that the CPU cannot exceed. However, actually reaching peak performance (the maximum performance for a given CPU) is a very hard (but also interesting) task. To do so, the software must take advantage of the full power of the CPU. peakperf is a microbenchmark that achieves peak performance on many different x86_64 microarchitectures.

-## 4.2 The formula
+## 4.2 The formula (CPU)

 ```
 N_CORES * FREQUENCY * FMA * UNITS * (SIZE_OF_VECTOR/32)
 ```

 - N_CORES: The number of physical cores. In our example, it is **4**.
-- FREQUENCY: The frequency of the CPU measured in GHz. Measuring this frequency is a bit tricky; see the next section for more details. In our example, it is **3.997**.
+- FREQUENCY: The frequency of the CPU measured in GHz. Measuring this frequency is a bit tricky; see section 4.4 for more details. In our example, it is **3.997** (section 4.4 explains where this value comes from).
 - FMA: If the CPU supports FMA, the peak performance is multiplied by 2. If not, it is multiplied by 1. In our example, it is **2**.
 - UNITS: CPUs can provide 1 or 2 functional units per core. Modern Intel CPUs usually provide 2, while AMD CPUs usually provide 1. In our example, it is **2**.
 - SIZE_OF_VECTOR: If the CPU supports AVX, the size is 256 (because AVX vectors are 256 bits wide). If the CPU supports AVX512, the size is 512. In our example, the size is **256**.
@@ -201,15 +203,23 @@ For the example of a i7-4790K, we have:
 4 * 3.997 * 10^9 * 2 * 2 * (256/32) = 511.61 GFLOP/s
 ```

-And, as you can see in the previous test, we got 511.43 GFLOP/s, which tells us that peakperf is working properly and our CPU is behaving exactly as we expected. But why did I choose 3.997 GHz as the frequency?
+And, as you can see in the previous test, we got 511.43 GFLOP/s, which tells us that peakperf is working properly and our CPU is behaving exactly as we expected.
+
+## 4.3 The formula (GPU)
+
+```
+N_CORES * FREQUENCY * FMA
+```
+
+The GPU formula is simpler. `N_CORES` in this case is simply the number of CUDA cores (in the case of NVIDIA GPUs). Modern GPUs usually support FMA.

-## 4.3 About the frequency to use in the formula
+## 4.4 About the frequency to use in the formula

 While running this microbenchmark, your CPU will be executing AVX code, so the frequency of your CPU running this code is neither your base nor your turbo frequency. Please have a look at [this document](http://www.dolbeau.name/dolbeau/publications/peak.pdf) (section IV.B) for more information.

-The AVX frequency for a specific CPU is sometimes available online. The most effective way I know to get this frequency is to actually measure your CPU frequency in real time while running AVX code. You can use the script I crafted for this task, [freq.sh](https://github.com/Dr-Noob/peakperf/freq.sh), to achieve this:
+The AVX frequency for a specific CPU is sometimes available online. The most effective way I know to get this frequency is to actually measure your CPU frequency in real time while running AVX code. You can use the script [freq.sh](https://github.com/Dr-Noob/peakperf/freq.sh) to achieve this:
 1. Run the microbenchmark in background (`./peakperf -r 4 -w 0 > /dev/null &`)
-2. Run the script (`./freq.sh`), which will fetch your CPU frequency in real time. In my case, I get:
+2. Run the script (`./freq.sh`), which will fetch your CPU frequency in real time (use `./freq.sh gpu` to measure the GPU). In my case, I get:

 ```
 Every 0,2s: grep 'MHz' /proc/cpuinfo
@@ -228,7 +238,7 @@ As you can see, i7-4790K's frequency while running AVX code is ~3997.630 MHz, wh
 1. The microbenchmark is not working correctly. Please create an [issue on GitHub](https://github.com/Dr-Noob/peakperf/issues).
 2. Your CPU is not able to keep a stable frequency. This often happens if it is too hot, so the CPU is forced to lower its frequency to avoid overheating.

-## 4.4 What can I do if I do not get the expected results?
+## 4.5 What can I do if I do not get the expected results?
 Please create an [issue on GitHub](https://github.com/Dr-Noob/peakperf/issues), posting the output of peakperf.

 # 5. Evaluation
