--- a/README.md
+++ b/README.md
@@ -6,6 +6,7 @@ Microbenchmark to achieve peak performance on x86_64 CPUs and NVIDIA GPUs.
 <!-- START doctoc generated TOC please keep comment here to allow auto update -->
 <!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
 
+
 - [1. Support](#1-support)
 - [1.1 Software support](#11-software-support)
 - [1.2 Hardware support](#12-hardware-support)
@@ -21,9 +22,10 @@ Microbenchmark to achieve peak performance on x86_64 CPUs and NVIDIA GPUs.
 - [3.4. Options](#34-options)
 - [4. Understanding the microbenchmark](#4-understanding-the-microbenchmark)
 - [4.1 What is "peak performance" anyway?](#41-what-is-peak-performance-anyway)
-- [4.2 The formula](#42-the-formula)
-- [4.3 About the frequency to use in the formula](#43-about-the-frequency-to-use-in-the-formula)
-- [4.4 What can I do if I do not get the expected results?](#44-what-can-i-do-if-i-do-not-get-the-expected-results)
+- [4.2 The formula (CPU)](#42-the-formula-cpu)
+- [4.3 The formula (GPU)](#43-the-formula-gpu)
+- [4.4 About the frequency to use in the formula](#44-about-the-frequency-to-use-in-the-formula)
+- [4.5 What can I do if I do not get the expected results?](#45-what-can-i-do-if-i-do-not-get-the-expected-results)
 - [5. Evaluation](#5-evaluation)
 - [Intel](#intel)
 - [AMD](#amd)
@@ -183,14 +185,14 @@ _NOTE_: Some options are available only on CPU or GPU
 ## 4.1 What is "peak performance" anyway?
 Peak performance refers to the maximum performance that a chip (a CPU) can achieve. The more powerful the CPU is, the higher its peak performance. This performance is a theoretical limit, computed using a formula (see next section) and measured in floating-point operations per second (FLOP/s, or GFLOP/s for billions of operations per second). This value establishes a performance limit that the CPU cannot exceed. However, achieving the peak performance (the maximum performance for a given CPU) is a very hard (but also interesting) task. To do so, the software must take advantage of the full power of the CPU. peakperf is a microbenchmark that achieves peak performance on many different x86_64 microarchitectures.
 
-## 4.2 The formula
+## 4.2 The formula (CPU)
 
 ```
 N_CORES * FREQUENCY * FMA * UNITS * (SIZE_OF_VECTOR/32)
 ```
 
 - N_CORES: The number of physical cores. In our example, it is **4**.
-- FREQUENCY: The frequency of the CPU, measured in GHz. Measuring this frequency is a bit tricky; see the next section for more details. In our example, it is **3.997**.
+- FREQUENCY: The frequency of the CPU, measured in GHz. Measuring this frequency is a bit tricky; see section 4.4 for more details, including where the example value comes from. In our example, it is **3.997**.
 - FMA: If the CPU supports FMA, the peak performance is multiplied by 2. If not, it is multiplied by 1. In our example, it is **2**.
 - UNITS: CPUs can provide 1 or 2 functional units per core. Modern Intel CPUs usually provide 2, while AMD CPUs usually provide 1. In our example, it is **2**.
 - SIZE_OF_VECTOR: If the CPU supports AVX, the size is 256 (because AVX registers are 256 bits wide). If the CPU supports AVX512, the size is 512. In our example, the size is **256**.
@@ -201,15 +203,23 @@ For the example of a i7-4790K, we have:
-And, as you can see in the previous test, we got 511.43 GFLOP/s, which tells us that peakperf is working properly and our CPU is behaving exactly as we expected. But why did I choose 3.997 GHz as the frequency?
+And, as you can see in the previous test, we got 511.43 GFLOP/s, which tells us that peakperf is working properly and our CPU is behaving exactly as we expected.
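As a cross-check of the arithmetic, the CPU formula from section 4.2 can be sketched in Python. This is only an illustration and not part of peakperf: the function name `cpu_peak_gflops` and its parameters are made up for this example, and the division by 32 is assumed to count 32-bit (single-precision) values per vector. The input values are the i7-4790K figures quoted in the text.

```python
def cpu_peak_gflops(n_cores, freq_ghz, has_fma, units, vector_bits):
    """Theoretical peak in GFLOP/s, following the CPU formula from section 4.2.

    Hypothetical helper for illustration; it is not part of peakperf.
    """
    fma = 2 if has_fma else 1    # FMA counts as two operations per instruction
    lanes = vector_bits // 32    # assumed: 32-bit values per vector register
    return n_cores * freq_ghz * fma * units * lanes


# i7-4790K values from the text: 4 cores, 3.997 GHz, FMA, 2 units, AVX (256 bits)
cpu_peak = cpu_peak_gflops(4, 3.997, True, 2, 256)
measured = 511.43  # GFLOP/s reported by peakperf in the text

print(f"theoretical peak: {cpu_peak:.3f} GFLOP/s")
print(f"measured / peak:  {measured / cpu_peak:.4f}")
```

With these inputs the theoretical value is roughly 511.6 GFLOP/s, so the measured 511.43 GFLOP/s is within 0.1% of the peak.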
+
+## 4.3 The formula (GPU)
+
+```
+N_CORES * FREQUENCY * FMA
+```
+
+The GPU formula is simpler. Here, `N_CORES` is the number of CUDA cores (in the case of NVIDIA GPUs). Modern GPUs usually support FMA, so `FMA` is usually 2.
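The GPU formula can be sketched the same way. The function name `gpu_peak_gflops` is made up for this example, and the core count and frequency below are hypothetical values, not measurements of any real GPU.

```python
def gpu_peak_gflops(n_cuda_cores, freq_ghz, has_fma=True):
    """Theoretical peak in GFLOP/s, following the GPU formula from section 4.3.

    Hypothetical helper for illustration; it is not part of peakperf.
    """
    fma = 2 if has_fma else 1  # FMA counts as two operations per instruction
    return n_cuda_cores * freq_ghz * fma


# Hypothetical GPU: 2560 CUDA cores running at 1.5 GHz
gpu_peak = gpu_peak_gflops(2560, 1.5)
print(f"theoretical peak: {gpu_peak:.1f} GFLOP/s")
```

As with the CPU, the frequency should be the one the GPU actually sustains while running the benchmark, not necessarily the advertised clock.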
 
-## 4.3 About the frequency to use in the formula
+## 4.4 About the frequency to use in the formula
 
 While running this microbenchmark, your CPU will be executing AVX code, so the frequency of your CPU running this code is neither your base nor your turbo frequency. Please have a look at [this document](http://www.dolbeau.name/dolbeau/publications/peak.pdf) (section IV.B) for more information.
 
-The AVX frequency for a specific CPU is sometimes available online. The most effective way I know to get this frequency is to actually measure your CPU frequency in real time while running AVX code. You can use the script I crafted for this task, [freq.sh](https://github.com/Dr-Noob/peakperf/freq.sh), to achieve this:
+The AVX frequency for a specific CPU is sometimes available online. The most effective way I know to get this frequency is to actually measure your CPU frequency in real time while running AVX code. You can use the script [freq.sh](https://github.com/Dr-Noob/peakperf/freq.sh) to achieve this:
 1. Run the microbenchmark in background (`./peakperf -r 4 -w 0 > /dev/null &`)
-2. Run the script (`./freq.sh`), which will fetch your CPU frequency in real time. In my case, I get:
+2. Run the script (`./freq.sh`), which will fetch your CPU frequency in real time (use `./freq.sh gpu` for measuring the GPU). In my case, I get:
 
 ```
 Every 0,2s: grep 'MHz' /proc/cpuinfo
@@ -228,7 +238,7 @@ As you can see, i7-4790K's frequency while running AVX code is ~3997.630 MHz, wh
 1. The microbenchmark is not working correctly. Please create an [issue on GitHub](https://github.com/Dr-Noob/peakperf/issues)
 2. Your CPU is not able to keep a stable frequency. This often happens if it is too hot, so the CPU is forced to lower its frequency to avoid overheating.
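One rough way to check the second cause is to look at how much the per-core frequencies differ while the benchmark runs. The sketch below is not the author's `freq.sh`; it only parses text in the `cpu MHz` format that Linux exposes in `/proc/cpuinfo` on x86_64. The helper names and the sample snapshot are hypothetical (only the 3997.630 MHz figure comes from the text).

```python
import re


def parse_mhz(cpuinfo_text):
    """Return the per-core frequencies (MHz) found in /proc/cpuinfo-style text."""
    pattern = r"^cpu MHz\s*:\s*([0-9.]+)"
    return [float(v) for v in re.findall(pattern, cpuinfo_text, flags=re.MULTILINE)]


def frequency_spread(samples_mhz):
    """Relative spread of the samples: (max - min) / mean."""
    mean = sum(samples_mhz) / len(samples_mhz)
    return (max(samples_mhz) - min(samples_mhz)) / mean


# Hypothetical snapshot for a 4-core CPU, in the Linux x86_64 format
SAMPLE = """\
cpu MHz\t\t: 3997.630
cpu MHz\t\t: 3997.629
cpu MHz\t\t: 3997.630
cpu MHz\t\t: 3997.631
"""

freqs = parse_mhz(SAMPLE)
print(f"mean frequency: {sum(freqs) / len(freqs):.3f} MHz")
print(f"relative spread: {frequency_spread(freqs):.2e}")
```

On a Linux machine, passing the contents of `/proc/cpuinfo` (for example `open("/proc/cpuinfo").read()`) instead of `SAMPLE` gives live values. A spread well above a fraction of a percent across repeated samples would suggest the frequency is not stable.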
 
-## 4.4 What can I do if I do not get the expected results?
+## 4.5 What can I do if I do not get the expected results?
 Please create an [issue on GitHub](https://github.com/Dr-Noob/peakperf/issues), posting the output of peakperf.