# Proposal: Go Benchmark Data Format

Authors: Russ Cox, Austin Clements

Last updated: February 2016

Discussion at [golang.org/issue/14313](https://golang.org/issue/14313).

## Abstract

We propose to make the current output of `go test -bench` the defined format for recording all Go benchmark data.
Having a defined format allows benchmark measurement programs
and benchmark analysis programs to interoperate while
evolving independently.

## Background

### Benchmark data formats

We are unaware of any standard formats for recording raw benchmark data,
and we've been unable to find any using web searches.
One might expect that a standard benchmark suite such as SPEC CPU2006 would have
defined a format for raw results, but that appears not to be the case.
The [collection of published results](https://www.spec.org/cpu2006/results/)
includes only analyzed data ([example](https://www.spec.org/cpu2006/results/res2011q3/cpu2006-20110620-17230.txt)), not raw data.

Go has a de facto standard format for benchmark data:
the lines generated by the testing package when using `go test -bench`.
For example, running compress/flate's benchmarks produces this output:

```
BenchmarkDecodeDigitsSpeed1e4-8 100 154125 ns/op 64.88 MB/s 40418 B/op 7 allocs/op
BenchmarkDecodeDigitsSpeed1e5-8 10 1367632 ns/op 73.12 MB/s 41356 B/op 14 allocs/op
BenchmarkDecodeDigitsSpeed1e6-8 1 13879794 ns/op 72.05 MB/s 52056 B/op 94 allocs/op
BenchmarkDecodeDigitsDefault1e4-8 100 147551 ns/op 67.77 MB/s 40418 B/op 8 allocs/op
BenchmarkDecodeDigitsDefault1e5-8 10 1197672 ns/op 83.50 MB/s 41508 B/op 13 allocs/op
BenchmarkDecodeDigitsDefault1e6-8 1 11808775 ns/op 84.68 MB/s 53800 B/op 80 allocs/op
BenchmarkDecodeDigitsCompress1e4-8 100 143348 ns/op 69.76 MB/s 40417 B/op 8 allocs/op
BenchmarkDecodeDigitsCompress1e5-8 10 1185527 ns/op 84.35 MB/s 41508 B/op 13 allocs/op
BenchmarkDecodeDigitsCompress1e6-8 1 11740304 ns/op 85.18 MB/s 53800 B/op 80 allocs/op
BenchmarkDecodeTwainSpeed1e4-8 100 143665 ns/op 69.61 MB/s 40849 B/op 15 allocs/op
BenchmarkDecodeTwainSpeed1e5-8 10 1390359 ns/op 71.92 MB/s 45700 B/op 31 allocs/op
BenchmarkDecodeTwainSpeed1e6-8 1 12128469 ns/op 82.45 MB/s 89336 B/op 221 allocs/op
BenchmarkDecodeTwainDefault1e4-8 100 141916 ns/op 70.46 MB/s 40849 B/op 15 allocs/op
BenchmarkDecodeTwainDefault1e5-8 10 1076669 ns/op 92.88 MB/s 43820 B/op 28 allocs/op
BenchmarkDecodeTwainDefault1e6-8 1 10106485 ns/op 98.95 MB/s 71096 B/op 172 allocs/op
BenchmarkDecodeTwainCompress1e4-8 100 138516 ns/op 72.19 MB/s 40849 B/op 15 allocs/op
BenchmarkDecodeTwainCompress1e5-8 10 1227964 ns/op 81.44 MB/s 43316 B/op 25 allocs/op
BenchmarkDecodeTwainCompress1e6-8 1 10040347 ns/op 99.60 MB/s 72120 B/op 173 allocs/op
BenchmarkEncodeDigitsSpeed1e4-8 30 482808 ns/op 20.71 MB/s
BenchmarkEncodeDigitsSpeed1e5-8 5 2685455 ns/op 37.24 MB/s
BenchmarkEncodeDigitsSpeed1e6-8 1 24966055 ns/op 40.05 MB/s
BenchmarkEncodeDigitsDefault1e4-8 20 655592 ns/op 15.25 MB/s
BenchmarkEncodeDigitsDefault1e5-8 1 13000839 ns/op 7.69 MB/s
BenchmarkEncodeDigitsDefault1e6-8 1 136341747 ns/op 7.33 MB/s
BenchmarkEncodeDigitsCompress1e4-8 20 668083 ns/op 14.97 MB/s
BenchmarkEncodeDigitsCompress1e5-8 1 12301511 ns/op 8.13 MB/s
BenchmarkEncodeDigitsCompress1e6-8 1 137962041 ns/op 7.25 MB/s
```

The testing package always reports ns/op; each benchmark can additionally request MB/s (throughput) as well as B/op and allocs/op (allocation rates).

### Benchmark processors

Multiple tools have been written that process this format,
most notably [benchcmp](https://godoc.org/golang.org/x/tools/cmd/benchcmp)
and its more statistically valid successor [benchstat](https://godoc.org/rsc.io/benchstat).
There is also [benchmany](https://godoc.org/github.com/aclements/go-misc/benchmany)'s plot subcommand
and likely more unpublished programs.

### Benchmark runners

Multiple tools have also been written that generate this format.
In addition to the standard Go testing package,
[compilebench](https://godoc.org/rsc.io/compilebench)
generates this data format based on runs of the Go compiler,
and Austin's unpublished shellbench generates this data format
after running an arbitrary shell command.

The [golang.org/x/benchmarks/bench](https://golang.org/x/benchmarks/bench) benchmarks
are notable for _not_ generating this format,
which has made all analysis of those results
more complex than we believe it should be.
We intend to update those benchmarks to generate the standard format,
once a standard format is defined.
Part of the motivation for the proposal is to avoid
the need to process custom output formats in future benchmarks.

## Proposal

A Go benchmark data file is a text file consisting of a sequence of lines.
Configuration lines and benchmark result lines, described below,
have semantic meaning in the reporting of benchmark results.

All other lines in the data file, including but not limited to
blank lines and lines beginning with a # character, are ignored.
For example, the testing package prints test results above benchmark data,
usually the text `PASS`. That line is neither a configuration line nor a benchmark
result line, so it is ignored.

### Configuration Lines

A configuration line is a key-value pair of the form

```
key: value
```

where key contains no space characters (as defined by `unicode.IsSpace`)
nor upper case characters (as defined by `unicode.IsUpper`),
and space characters separate “key:” from “value.”
Conventionally, multiword keys are written with the words
separated by hyphens, as in `cpu-count`.
There are no restrictions on value, except that it cannot contain a newline character.
Value can be omitted entirely but the colon must still be present.

The interpretation of a key/value pair is up to tooling, but the key/value pair
is considered to describe all benchmark results that follow,
until overwritten by a configuration line with the same key.

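The following minimal Go sketch makes the configuration-line rules concrete. It is not part of the proposal. It recognizes a configuration line and tracks which configuration applies to each later result line. The names `parseConfigLine`, `Result`, and `readResults` are hypothetical. For brevity, `readResults` treats any line beginning with `Benchmark` as a result; a real reader would apply the full result-line rules from the next section.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// parseConfigLine reports whether line is a configuration line of the form
// "key: value". The key must be non-empty and contain no space or upper-case
// characters; a non-empty value must be separated from "key:" by spaces.
func parseConfigLine(line string) (key, value string, ok bool) {
	i := strings.IndexByte(line, ':')
	if i <= 0 {
		return "", "", false
	}
	key = line[:i]
	for _, r := range key {
		if unicode.IsSpace(r) || unicode.IsUpper(r) {
			return "", "", false
		}
	}
	rest := line[i+1:]
	value = strings.TrimLeftFunc(rest, unicode.IsSpace)
	if rest != "" && value == rest {
		// Text follows the colon directly, as in "text:digits".
		return "", "", false
	}
	return key, value, true
}

// Result pairs a result line with the configuration in effect when it was read.
type Result struct {
	Line   string
	Config map[string]string
}

// readResults walks the lines of a data file, updating the current
// configuration as configuration lines appear and attaching a copy of it to
// each result line. For brevity, any line starting with "Benchmark" counts as
// a result here; all other lines are ignored.
func readResults(data string) []Result {
	config := map[string]string{}
	var results []Result
	for _, line := range strings.Split(data, "\n") {
		if k, v, ok := parseConfigLine(line); ok {
			config[k] = v // overwrites any earlier value for the same key
			continue
		}
		if strings.HasPrefix(line, "Benchmark") {
			snapshot := make(map[string]string, len(config))
			for k, v := range config {
				snapshot[k] = v
			}
			results = append(results, Result{Line: line, Config: snapshot})
		}
	}
	return results
}

func main() {
	// The second commit and its timing are made up for illustration.
	data := `commit: 7cd9055
goos: darwin
BenchmarkDecode/text:digits/level:speed/size:1e4/gomaxprocs:8 100 154125 ns/op
PASS
commit: abc1234
BenchmarkDecode/text:digits/level:speed/size:1e4/gomaxprocs:8 100 150000 ns/op`
	for _, r := range readResults(data) {
		fmt.Println(r.Config["commit"], r.Config["goos"], r.Line)
	}
}
```

The upper-case restriction on keys keeps a sub-benchmark result line such as `BenchmarkDecode/text:digits/...` from being mistaken for a configuration line, even though it contains colons.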
### Benchmark Results

A benchmark result line has the general form

```
<name> <iterations> <value> <unit> [<value> <unit>...]
```

The fields are separated by runs of space characters (as defined by `unicode.IsSpace`),
so the line can be parsed with `strings.Fields`.
The line must have an even number of fields, and at least four.

The first field is the benchmark name, which must begin with `Benchmark`
and is typically followed by a capital letter, as in `BenchmarkReverseString`.
Tools displaying benchmark data conventionally omit the `Benchmark` prefix.
The same benchmark name can appear on multiple result lines,
indicating that the benchmark was run multiple times.

The second field gives the number of iterations run.
For most processing this number can be ignored, although
it may give some indication of the expected accuracy
of the measurements that follow.

The remaining fields report value/unit pairs in which the value
is a float64 that can be parsed by `strconv.ParseFloat`
and the unit explains the value, as in “64.88 MB/s”.
The units reported are typically normalized so that they can be
interpreted without reference to the number of iterations.
In the example, the CPU cost is reported per-operation and the
throughput is reported per-second; neither is a total that
depends on the number of iterations.

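Here is a minimal sketch of parsing a result line along these lines. It uses `strings.Fields` and `strconv.ParseFloat`, as the text suggests. The `Result` and `Value` types and the `parseResult` function are hypothetical. Parsing the iteration count as an `int` with `strconv.Atoi` is an assumption, since the text does not specify its type.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Value is one value/unit pair from a benchmark result line.
type Value struct {
	Value float64
	Unit  string
}

// Result is a parsed benchmark result line.
type Result struct {
	Name       string
	Iterations int
	Values     []Value
}

// parseResult parses a line of the form
//
//	<name> <iterations> <value> <unit> [<value> <unit>...]
//
// and reports ok=false for any line that does not follow that form.
func parseResult(line string) (r Result, ok bool) {
	f := strings.Fields(line)
	if len(f) < 4 || len(f)%2 != 0 || !strings.HasPrefix(f[0], "Benchmark") {
		return Result{}, false
	}
	n, err := strconv.Atoi(f[1])
	if err != nil {
		return Result{}, false
	}
	r = Result{Name: f[0], Iterations: n}
	for i := 2; i < len(f); i += 2 {
		v, err := strconv.ParseFloat(f[i], 64)
		if err != nil {
			return Result{}, false
		}
		r.Values = append(r.Values, Value{Value: v, Unit: f[i+1]})
	}
	return r, true
}

func main() {
	line := "BenchmarkDecodeDigitsSpeed1e4-8 100 154125 ns/op 64.88 MB/s 40418 B/op 7 allocs/op"
	if r, ok := parseResult(line); ok {
		for _, v := range r.Values {
			fmt.Printf("%s %g %s\n", r.Name, v.Value, v.Unit)
		}
	}
}
```

Returning `ok=false` instead of an error lets a caller simply skip lines such as `PASS`, which the proposal says are ignored.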
### Value Units

A value's unit string is expected to specify not only the measurement unit
but also, as needed, a description of what is being measured.
For example, a benchmark might report its overall execution time
as well as cache miss times with three units “ns/op,” “L1-miss-ns/op,” and “L2-miss-ns/op.”

Tooling can expect that the unit strings are identical for all runs to be compared;
for example, a result reporting “ns/op” need not be considered comparable
to one reporting “µs/op.”

However, tooling may assume that the measurement unit is the last
of the hyphen-separated words in the unit string and may recognize
and rescale known measurement units.
For example, consistently large “ns/op” or “L1-miss-ns/op”
might be rescaled to “ms/op” or “L1-miss-ms/op” for display.

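The hyphen convention can be sketched as follows. The code splits off the last hyphen-separated word as the measurement unit, then rescales a whole set of ns/op values together for display. `splitUnit` and `rescale` are hypothetical helpers. Picking µs/op or ms/op from the smallest value in the set is an illustrative assumption, not part of the proposal.

```go
package main

import (
	"fmt"
	"strings"
)

// splitUnit separates a unit such as "L1-miss-ns/op" into its description
// ("L1-miss") and its measurement unit ("ns/op"), taking the measurement unit
// to be the last hyphen-separated word.
func splitUnit(unit string) (desc, measure string) {
	if i := strings.LastIndexByte(unit, '-'); i >= 0 {
		return unit[:i], unit[i+1:]
	}
	return "", unit
}

// rescale picks one display scale for a set of values sharing a unit. Only
// the measurement unit "ns/op" is recognized; anything else is returned
// unchanged. The scale is chosen from the smallest value so that it applies
// only when the values are consistently large.
func rescale(values []float64, unit string) ([]float64, string) {
	desc, measure := splitUnit(unit)
	if measure != "ns/op" || len(values) == 0 {
		return values, unit
	}
	smallest := values[0]
	for _, v := range values[1:] {
		if v < smallest {
			smallest = v
		}
	}
	div, scaled := 1.0, "ns/op"
	switch {
	case smallest >= 1e6:
		div, scaled = 1e6, "ms/op"
	case smallest >= 1e3:
		div, scaled = 1e3, "µs/op"
	}
	out := make([]float64, len(values))
	for i, v := range values {
		out[i] = v / div
	}
	if desc != "" {
		scaled = desc + "-" + scaled
	}
	return out, scaled
}

func main() {
	decode := []float64{13879794, 11808775, 11740304} // ns/op values from the example output
	vs, unit := rescale(decode, "ns/op")
	for _, v := range vs {
		fmt.Printf("%.2f %s\n", v, unit)
	}
}
```

Using one scale for the whole set, rather than one per value, keeps the displayed numbers directly comparable.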
### Benchmark Name Configuration

In the current testing package, benchmark names correspond to Go identifiers:
each benchmark must be written as a different Go function.
[Work targeted for Go 1.7](https://github.com/golang/proposal/blob/master/design/12166-subtests.md) will allow tests and benchmarks
to define sub-tests and sub-benchmarks programmatically,
in particular to vary interesting parameters both when
testing and when benchmarking.
That work uses a slash to separate the name of a benchmark
collection from the description of a sub-benchmark.

We propose that sub-benchmarks adopt the convention of
choosing names that are key:value pairs;
that slash-prefixed key:value pairs in the benchmark name are
treated by benchmark data processors as per-benchmark
configuration values;
and that for sub-benchmarks the -N suffix describing the
GOMAXPROCS value is expanded to /gomaxprocs:N.

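A benchmark data processor that follows this convention might split a name into its base and its per-benchmark configuration roughly as follows. `parseName` is hypothetical. It treats every slash-separated element that contains a colon as a key:value pair. It maps a trailing numeric `-N` suffix, as printed by the testing package, to `gomaxprocs`, assuming names do not otherwise end in a hyphen followed by digits. It uses `strings.Cut`, which requires Go 1.18 or later.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseName splits a benchmark name into its base name and the per-benchmark
// configuration carried by slash-prefixed key:value elements. A trailing
// numeric "-N" suffix, as the testing package prints for GOMAXPROCS, is
// recorded as gomaxprocs:N. Names are assumed not to end in "-<digits>"
// for any other reason.
func parseName(full string) (base string, config map[string]string) {
	config = map[string]string{}
	if i := strings.LastIndexByte(full, '-'); i >= 0 {
		if _, err := strconv.Atoi(full[i+1:]); err == nil {
			config["gomaxprocs"] = full[i+1:]
			full = full[:i]
		}
	}
	parts := strings.Split(full, "/")
	base = parts[0]
	for _, p := range parts[1:] {
		if k, v, ok := strings.Cut(p, ":"); ok {
			config[k] = v
		} else {
			// Not a key:value element; keep it as part of the name.
			base += "/" + p
		}
	}
	return base, config
}

func main() {
	for _, name := range []string{
		"BenchmarkDecode/text:digits/level:speed/size:1e4/gomaxprocs:8",
		"BenchmarkDecodeDigitsSpeed1e4-8",
	} {
		base, config := parseName(name)
		fmt.Println(base, config)
	}
}
```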
### Example

The benchmark output given in the background section above
is already in the format proposed here.
That is a key feature of the proposal.

However, a future run of the benchmark might add configuration lines,
and the benchmark might be rewritten to use sub-benchmarks,
producing this output:

```
commit: 7cd9055
commit-time: 2016-02-11T13:25:45-0500
goos: darwin
goarch: amd64
cpu: Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
cpu-count: 8
cpu-physical-count: 4
os: Mac OS X 10.11.3
mem: 16 GB

BenchmarkDecode/text:digits/level:speed/size:1e4/gomaxprocs:8 100 154125 ns/op 64.88 MB/s 40418 B/op 7 allocs/op
BenchmarkDecode/text:digits/level:speed/size:1e5/gomaxprocs:8 10 1367632 ns/op 73.12 MB/s 41356 B/op 14 allocs/op
BenchmarkDecode/text:digits/level:speed/size:1e6/gomaxprocs:8 1 13879794 ns/op 72.05 MB/s 52056 B/op 94 allocs/op
BenchmarkDecode/text:digits/level:default/size:1e4/gomaxprocs:8 100 147551 ns/op 67.77 MB/s 40418 B/op 8 allocs/op
BenchmarkDecode/text:digits/level:default/size:1e5/gomaxprocs:8 10 1197672 ns/op 83.50 MB/s 41508 B/op 13 allocs/op
BenchmarkDecode/text:digits/level:default/size:1e6/gomaxprocs:8 1 11808775 ns/op 84.68 MB/s 53800 B/op 80 allocs/op
BenchmarkDecode/text:digits/level:best/size:1e4/gomaxprocs:8 100 143348 ns/op 69.76 MB/s 40417 B/op 8 allocs/op
BenchmarkDecode/text:digits/level:best/size:1e5/gomaxprocs:8 10 1185527 ns/op 84.35 MB/s 41508 B/op 13 allocs/op
BenchmarkDecode/text:digits/level:best/size:1e6/gomaxprocs:8 1 11740304 ns/op 85.18 MB/s 53800 B/op 80 allocs/op
BenchmarkDecode/text:twain/level:speed/size:1e4/gomaxprocs:8 100 143665 ns/op 69.61 MB/s 40849 B/op 15 allocs/op
BenchmarkDecode/text:twain/level:speed/size:1e5/gomaxprocs:8 10 1390359 ns/op 71.92 MB/s 45700 B/op 31 allocs/op
BenchmarkDecode/text:twain/level:speed/size:1e6/gomaxprocs:8 1 12128469 ns/op 82.45 MB/s 89336 B/op 221 allocs/op
BenchmarkDecode/text:twain/level:default/size:1e4/gomaxprocs:8 100 141916 ns/op 70.46 MB/s 40849 B/op 15 allocs/op
BenchmarkDecode/text:twain/level:default/size:1e5/gomaxprocs:8 10 1076669 ns/op 92.88 MB/s 43820 B/op 28 allocs/op
BenchmarkDecode/text:twain/level:default/size:1e6/gomaxprocs:8 1 10106485 ns/op 98.95 MB/s 71096 B/op 172 allocs/op
BenchmarkDecode/text:twain/level:best/size:1e4/gomaxprocs:8 100 138516 ns/op 72.19 MB/s 40849 B/op 15 allocs/op
BenchmarkDecode/text:twain/level:best/size:1e5/gomaxprocs:8 10 1227964 ns/op 81.44 MB/s 43316 B/op 25 allocs/op
BenchmarkDecode/text:twain/level:best/size:1e6/gomaxprocs:8 1 10040347 ns/op 99.60 MB/s 72120 B/op 173 allocs/op
BenchmarkEncode/text:digits/level:speed/size:1e4/gomaxprocs:8 30 482808 ns/op 20.71 MB/s
BenchmarkEncode/text:digits/level:speed/size:1e5/gomaxprocs:8 5 2685455 ns/op 37.24 MB/s
BenchmarkEncode/text:digits/level:speed/size:1e6/gomaxprocs:8 1 24966055 ns/op 40.05 MB/s
BenchmarkEncode/text:digits/level:default/size:1e4/gomaxprocs:8 20 655592 ns/op 15.25 MB/s
BenchmarkEncode/text:digits/level:default/size:1e5/gomaxprocs:8 1 13000839 ns/op 7.69 MB/s
BenchmarkEncode/text:digits/level:default/size:1e6/gomaxprocs:8 1 136341747 ns/op 7.33 MB/s
BenchmarkEncode/text:digits/level:best/size:1e4/gomaxprocs:8 20 668083 ns/op 14.97 MB/s
BenchmarkEncode/text:digits/level:best/size:1e5/gomaxprocs:8 1 12301511 ns/op 8.13 MB/s
BenchmarkEncode/text:digits/level:best/size:1e6/gomaxprocs:8 1 137962041 ns/op 7.25 MB/s
```

Using sub-benchmarks has benefits beyond this proposal, namely that it would
avoid the current repetitive code:

```go
func BenchmarkDecodeDigitsSpeed1e4(b *testing.B) { benchmarkDecode(b, digits, speed, 1e4) }
func BenchmarkDecodeDigitsSpeed1e5(b *testing.B) { benchmarkDecode(b, digits, speed, 1e5) }
func BenchmarkDecodeDigitsSpeed1e6(b *testing.B) { benchmarkDecode(b, digits, speed, 1e6) }
func BenchmarkDecodeDigitsDefault1e4(b *testing.B) { benchmarkDecode(b, digits, default_, 1e4) }
func BenchmarkDecodeDigitsDefault1e5(b *testing.B) { benchmarkDecode(b, digits, default_, 1e5) }
func BenchmarkDecodeDigitsDefault1e6(b *testing.B) { benchmarkDecode(b, digits, default_, 1e6) }
func BenchmarkDecodeDigitsCompress1e4(b *testing.B) { benchmarkDecode(b, digits, compress, 1e4) }
func BenchmarkDecodeDigitsCompress1e5(b *testing.B) { benchmarkDecode(b, digits, compress, 1e5) }
func BenchmarkDecodeDigitsCompress1e6(b *testing.B) { benchmarkDecode(b, digits, compress, 1e6) }
func BenchmarkDecodeTwainSpeed1e4(b *testing.B) { benchmarkDecode(b, twain, speed, 1e4) }
func BenchmarkDecodeTwainSpeed1e5(b *testing.B) { benchmarkDecode(b, twain, speed, 1e5) }
func BenchmarkDecodeTwainSpeed1e6(b *testing.B) { benchmarkDecode(b, twain, speed, 1e6) }
func BenchmarkDecodeTwainDefault1e4(b *testing.B) { benchmarkDecode(b, twain, default_, 1e4) }
func BenchmarkDecodeTwainDefault1e5(b *testing.B) { benchmarkDecode(b, twain, default_, 1e5) }
func BenchmarkDecodeTwainDefault1e6(b *testing.B) { benchmarkDecode(b, twain, default_, 1e6) }
func BenchmarkDecodeTwainCompress1e4(b *testing.B) { benchmarkDecode(b, twain, compress, 1e4) }
func BenchmarkDecodeTwainCompress1e5(b *testing.B) { benchmarkDecode(b, twain, compress, 1e5) }
func BenchmarkDecodeTwainCompress1e6(b *testing.B) { benchmarkDecode(b, twain, compress, 1e6) }
```

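For comparison, here is a rough sketch of a sub-benchmark version. It is written against the `(*testing.B).Run` method that the Go 1.7 sub-benchmark work added. `decodeCase` is a hypothetical placeholder for the existing `benchmarkDecode` helper. In practice `BenchmarkDecode` would live in a `_test.go` file; the `main` function here only lists the names that the nested `Run` calls produce.

```go
package main

import (
	"fmt"
	"testing"
)

// Comparison axes for the decode benchmarks. "best" follows the level name
// used in the sub-benchmark output above.
var (
	texts  = []string{"digits", "twain"}
	levels = []string{"speed", "default", "best"}
	sizes  = []string{"1e4", "1e5", "1e6"}
)

// decodeCase is a hypothetical placeholder for the existing benchmarkDecode
// helper; a real version would decode the chosen input b.N times.
func decodeCase(b *testing.B, text, level, size string) {
	for i := 0; i < b.N; i++ {
		_ = text + level + size
	}
}

// BenchmarkDecode replaces the hand-written per-combination functions with
// nested sub-benchmarks whose name elements are key:value pairs.
func BenchmarkDecode(b *testing.B) {
	for _, text := range texts {
		b.Run("text:"+text, func(b *testing.B) {
			for _, level := range levels {
				b.Run("level:"+level, func(b *testing.B) {
					for _, size := range sizes {
						b.Run("size:"+size, func(b *testing.B) {
							decodeCase(b, text, level, size)
						})
					}
				})
			}
		})
	}
}

func main() {
	// Outside `go test` there is nothing to run, so list the names the nested
	// Run calls would produce, before the testing package adds its -N suffix.
	for _, text := range texts {
		for _, level := range levels {
			for _, size := range sizes {
				fmt.Printf("BenchmarkDecode/text:%s/level:%s/size:%s\n", text, level, size)
			}
		}
	}
}
```

The testing package appends its own `-N` GOMAXPROCS suffix to the full name, as in `BenchmarkDecode/text:digits/level:speed/size:1e4-8`. The `/gomaxprocs:N` form in the example output is the expansion this proposal suggests.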
More importantly for this proposal, using sub-benchmarks also makes the possible
comparison axes clear: digits vs twain, speed vs default vs best, size 1e4 vs 1e5 vs 1e6.

## Rationale

As discussed in the background section,
we have already developed a number of analysis programs
that assume this proposal's format,
as well as a number of programs that generate this format.
Standardizing the format should encourage additional work
on both kinds of programs.

[Issue 12826](https://golang.org/issue/12826) suggests a different approach,
namely the addition of a new `go test` option `-benchformat`, to control
the format of benchmark output. In fact it gives the lack of standardization
as the main justification for a new option:

> Currently `go test -bench .` prints out benchmark results in a
> certain format, but there is no guarantee that this format will not
> change. Thus a tool that parses go test output may break if an
> incompatible change to the output format is made.

Our approach is instead to guarantee that the format will not change,
or rather that it will only change in ways allowed by this design.
An analysis tool that parses the output specified here will not break
in future versions of Go,
and a tool that generates the output specified here will work
with all such analysis tools.
Having one agreed-upon format enables broad interoperation;
the ability for one tool to generate arbitrarily many different formats
does not achieve the same result.

The proposed format also seems to be extensible enough to accommodate
anticipated future work on benchmark reporting.

The main known issue with the current `go test -bench` is that
we'd like to emit finer-grained detail about runs, for linearity testing
and more robust statistics.
This proposal allows that by simply printing more result lines.

Another known issue is that we may want to add custom outputs
such as garbage collector statistics to certain benchmark runs.
This proposal allows that by adding more value-unit pairs.

## Compatibility

Tools consuming the existing benchmark format may need trivial changes
to ignore non-benchmark result lines or to cope with additional value-unit pairs
in benchmark results.

## Implementation

The benchmark format described here is already generated by `go test -bench`
and expected by tools like `benchcmp` and `benchstat`.

The format is trivial to generate, and it is
straightforward but not quite trivial to parse.

We anticipate that the [new x/perf subrepo](https://github.com/golang/go/issues/14304) will include a library for loading
benchmark data from files, although the format is also simple enough that
tools that want a different in-memory representation might reasonably
write separate parsers.