# Proposal: Go Benchmark Data Format

Authors: Russ Cox, Austin Clements

Last updated: February 2016

Discussion at [golang.org/issue/14313](https://golang.org/issue/14313).

## Abstract

We propose to make the current output of `go test -bench` the defined format for recording all Go benchmark data.
Having a defined format allows benchmark measurement programs
and benchmark analysis programs to interoperate while
evolving independently.

## Background

### Benchmark data formats

We are unaware of any standard formats for recording raw benchmark data,
and we've been unable to find any using web searches.
One might expect that a standard benchmark suite such as SPEC CPU2006 would have
defined a format for raw results, but that appears not to be the case.
The [collection of published results](https://www.spec.org/cpu2006/results/)
includes only analyzed data ([example](https://www.spec.org/cpu2006/results/res2011q3/cpu2006-20110620-17230.txt)), not raw data.

Go has a de facto standard format for benchmark data:
the lines generated by the testing package when using `go test -bench`.
For example, running compress/flate's benchmarks produces this output:

    BenchmarkDecodeDigitsSpeed1e4-8        100     154125 ns/op    64.88 MB/s    40418 B/op     7 allocs/op
    BenchmarkDecodeDigitsSpeed1e5-8         10    1367632 ns/op    73.12 MB/s    41356 B/op    14 allocs/op
    BenchmarkDecodeDigitsSpeed1e6-8          1   13879794 ns/op    72.05 MB/s    52056 B/op    94 allocs/op
    BenchmarkDecodeDigitsDefault1e4-8      100     147551 ns/op    67.77 MB/s    40418 B/op     8 allocs/op
    BenchmarkDecodeDigitsDefault1e5-8       10    1197672 ns/op    83.50 MB/s    41508 B/op    13 allocs/op
    BenchmarkDecodeDigitsDefault1e6-8        1   11808775 ns/op    84.68 MB/s    53800 B/op    80 allocs/op
    BenchmarkDecodeDigitsCompress1e4-8     100     143348 ns/op    69.76 MB/s    40417 B/op     8 allocs/op
    BenchmarkDecodeDigitsCompress1e5-8      10    1185527 ns/op    84.35 MB/s    41508 B/op    13 allocs/op
    BenchmarkDecodeDigitsCompress1e6-8       1   11740304 ns/op    85.18 MB/s    53800 B/op    80 allocs/op
    BenchmarkDecodeTwainSpeed1e4-8         100     143665 ns/op    69.61 MB/s    40849 B/op    15 allocs/op
    BenchmarkDecodeTwainSpeed1e5-8          10    1390359 ns/op    71.92 MB/s    45700 B/op    31 allocs/op
    BenchmarkDecodeTwainSpeed1e6-8           1   12128469 ns/op    82.45 MB/s    89336 B/op   221 allocs/op
    BenchmarkDecodeTwainDefault1e4-8       100     141916 ns/op    70.46 MB/s    40849 B/op    15 allocs/op
    BenchmarkDecodeTwainDefault1e5-8        10    1076669 ns/op    92.88 MB/s    43820 B/op    28 allocs/op
    BenchmarkDecodeTwainDefault1e6-8         1   10106485 ns/op    98.95 MB/s    71096 B/op   172 allocs/op
    BenchmarkDecodeTwainCompress1e4-8      100     138516 ns/op    72.19 MB/s    40849 B/op    15 allocs/op
    BenchmarkDecodeTwainCompress1e5-8       10    1227964 ns/op    81.44 MB/s    43316 B/op    25 allocs/op
    BenchmarkDecodeTwainCompress1e6-8        1   10040347 ns/op    99.60 MB/s    72120 B/op   173 allocs/op
    BenchmarkEncodeDigitsSpeed1e4-8         30     482808 ns/op    20.71 MB/s
    BenchmarkEncodeDigitsSpeed1e5-8          5    2685455 ns/op    37.24 MB/s
    BenchmarkEncodeDigitsSpeed1e6-8          1   24966055 ns/op    40.05 MB/s
    BenchmarkEncodeDigitsDefault1e4-8       20     655592 ns/op    15.25 MB/s
    BenchmarkEncodeDigitsDefault1e5-8        1   13000839 ns/op     7.69 MB/s
    BenchmarkEncodeDigitsDefault1e6-8        1  136341747 ns/op     7.33 MB/s
    BenchmarkEncodeDigitsCompress1e4-8      20     668083 ns/op    14.97 MB/s
    BenchmarkEncodeDigitsCompress1e5-8       1   12301511 ns/op     8.13 MB/s
    BenchmarkEncodeDigitsCompress1e6-8       1  137962041 ns/op     7.25 MB/s

The testing package always reports ns/op, and each benchmark can request the addition of MB/s (throughput) and also B/op and allocs/op (allocation rates).

### Benchmark processors

Multiple tools have been written that process this format,
most notably [benchcmp](https://godoc.org/golang.org/x/tools/cmd/benchcmp)
and its more statistically valid successor [benchstat](https://godoc.org/rsc.io/benchstat).
There is also [benchmany](https://godoc.org/github.com/aclements/go-misc/benchmany)'s plot subcommand
and likely more unpublished programs.

### Benchmark runners

Multiple tools have also been written that generate this format.
In addition to the standard Go testing package,
[compilebench](https://godoc.org/rsc.io/compilebench)
generates this data format based on runs of the Go compiler,
and Austin's unpublished shellbench generates this data format
after running an arbitrary shell command.

The [golang.org/x/benchmarks/bench](https://golang.org/x/benchmarks/bench) benchmarks
are notable for _not_ generating this format,
which has made all analysis of those results
more complex than we believe it should be.
We intend to update those benchmarks to generate the standard format,
once a standard format is defined.
Part of the motivation for this proposal is to avoid
the need to process custom output formats in future benchmarks.

## Proposal

A Go benchmark data file is a textual file consisting of a sequence of lines.
Configuration lines and benchmark result lines, described below,
have semantic meaning in the reporting of benchmark results.

All other lines in the data file, including but not limited to
blank lines and lines beginning with a # character, are ignored.
For example, the testing package prints test results above benchmark data,
usually the text `PASS`. That line is neither a configuration line
nor a benchmark result line, so it is ignored.

### Configuration Lines

A configuration line is a key-value pair of the form

    key: value

where key contains no space characters (as defined by `unicode.IsSpace`)
nor upper case characters (as defined by `unicode.IsUpper`),
and space characters separate “key:” from “value.”
Conventionally, multiword keys are written with the words
separated by hyphens, as in `cpu-physical-count` below.
There are no restrictions on value, except that it cannot contain a newline character.
Value can be omitted entirely, but the colon must still be present.

The interpretation of a key/value pair is up to tooling, but the key/value pair
is considered to describe all benchmark results that follow,
until overwritten by a configuration line with the same key.
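
For illustration, a reader might recognize configuration lines with a few lines of Go; `parseConfigLine` is a hypothetical helper written for this sketch, not part of any released tool:

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// parseConfigLine reports whether line is a configuration line and,
// if so, returns its key and value. The key must be nonempty and
// contain no space or upper case characters.
func parseConfigLine(line string) (key, value string, ok bool) {
	i := strings.Index(line, ":")
	if i < 0 {
		return "", "", false
	}
	key = line[:i]
	if key == "" || strings.IndexFunc(key, func(r rune) bool {
		return unicode.IsSpace(r) || unicode.IsUpper(r)
	}) >= 0 {
		return "", "", false
	}
	// Space characters separate "key:" from "value"; value may be empty.
	return key, strings.TrimLeft(line[i+1:], " \t"), true
}

func main() {
	fmt.Println(parseConfigLine("commit: 7cd9055")) // prints "commit 7cd9055 true"
}
```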

### Benchmark Results

A benchmark result line has the general form

    <name> <iterations> <value> <unit> [<value> <unit>...]

The fields are separated by runs of space characters (as defined by `unicode.IsSpace`),
so the line can be parsed with `strings.Fields`.
The line must have an even number of fields, and at least four.

The first field is the benchmark name, which must begin with `Benchmark`
and is typically followed by a capital letter, as in `BenchmarkReverseString`.
Tools displaying benchmark data conventionally omit the `Benchmark` prefix.
The same benchmark name can appear on multiple result lines,
indicating that the benchmark was run multiple times.

The second field gives the number of iterations run.
For most processing this number can be ignored, although
it may give some indication of the expected accuracy
of the measurements that follow.

The remaining fields report value/unit pairs in which the value
is a float64 that can be parsed by `strconv.ParseFloat`
and the unit explains the value, as in “64.88 MB/s”.
The units reported are typically normalized so that they can be
interpreted without reference to the number of iterations.
In the example, the CPU cost is reported per-operation and the
throughput is reported per-second; neither is a total that
depends on the number of iterations.
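
A sketch of how a processor might apply these rules, using `strings.Fields` and `strconv.ParseFloat` as described above; the `parseResultLine` function and `Measurement` type are illustrative names, not an existing API:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Measurement is one value/unit pair from a benchmark result line.
type Measurement struct {
	Value float64
	Unit  string
}

// parseResultLine parses one benchmark result line into its name,
// iteration count, and measurements.
func parseResultLine(line string) (name string, iters int, ms []Measurement, err error) {
	f := strings.Fields(line)
	if len(f) < 4 || len(f)%2 != 0 || !strings.HasPrefix(f[0], "Benchmark") {
		return "", 0, nil, fmt.Errorf("not a benchmark result line: %q", line)
	}
	name = f[0]
	iters, err = strconv.Atoi(f[1])
	if err != nil {
		return "", 0, nil, err
	}
	for i := 2; i < len(f); i += 2 {
		v, err := strconv.ParseFloat(f[i], 64)
		if err != nil {
			return "", 0, nil, err
		}
		ms = append(ms, Measurement{v, f[i+1]})
	}
	return name, iters, ms, nil
}

func main() {
	fmt.Println(parseResultLine("BenchmarkDecodeDigitsSpeed1e4-8 100 154125 ns/op 64.88 MB/s"))
}
```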

### Value Units

A value's unit string is expected to specify not only the measurement unit
but also, as needed, a description of what is being measured.
For example, a benchmark might report its overall execution time
as well as cache miss times with three units “ns/op,” “L1-miss-ns/op,” and “L2-miss-ns/op.”

Tooling can expect that the unit strings are identical for all runs to be compared;
for example, a result reporting “ns/op” need not be considered comparable
to one reporting “µs/op.”

However, tooling may assume that the measurement unit is the final
hyphen-separated word in the unit string and may recognize
and rescale known measurement units.
For example, consistently large “ns/op” or “L1-miss-ns/op”
might be rescaled to “ms/op” or “L1-miss-ms/op” for display.
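
A display tool might implement that rescaling along these lines; the helper and its single scaling rule are hypothetical, showing only the final-word convention:

```go
package main

import (
	"fmt"
	"strings"
)

// rescaleUnit rescales a value reported in a known measurement unit,
// taking the measurement unit to be the final hyphen-separated word.
// The ns/op-to-ms/op rule is an example; a real tool would know more units.
func rescaleUnit(value float64, unit string) (float64, string) {
	i := strings.LastIndex(unit, "-")
	prefix, word := unit[:i+1], unit[i+1:] // i+1 == 0 when there is no hyphen
	if word == "ns/op" && value >= 1e6 {
		return value / 1e6, prefix + "ms/op"
	}
	return value, unit
}

func main() {
	fmt.Println(rescaleUnit(13879794, "L1-miss-ns/op"))
}
```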

### Benchmark Name Configuration

In the current testing package, benchmark names correspond to Go identifiers:
each benchmark must be written as a different Go function.
[Work targeted for Go 1.7](https://github.com/golang/proposal/blob/master/design/12166-subtests.md) will allow tests and benchmarks
to define sub-tests and sub-benchmarks programmatically,
in particular to vary interesting parameters both when
testing and when benchmarking.
That work uses a slash to separate the name of a benchmark
collection from the description of a sub-benchmark.

We propose that sub-benchmarks adopt the convention of
choosing names that are key:value pairs;
that slash-prefixed key:value pairs in the benchmark name are
treated by benchmark data processors as per-benchmark
configuration values;
and that for sub-benchmarks the -N suffix describing the
GOMAXPROCS value is expanded to /gomaxprocs:N.
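
A processor might extract that per-benchmark configuration like this; `nameConfig` is an illustrative helper, not an existing API:

```go
package main

import (
	"fmt"
	"strings"
)

// nameConfig splits a sub-benchmark name such as
// "BenchmarkDecode/text:digits/size:1e4/gomaxprocs:8" into the base
// name and its per-benchmark configuration pairs.
func nameConfig(name string) (base string, config map[string]string) {
	parts := strings.Split(name, "/")
	base = parts[0]
	config = map[string]string{}
	for _, p := range parts[1:] {
		if i := strings.Index(p, ":"); i >= 0 {
			config[p[:i]] = p[i+1:]
		}
	}
	return base, config
}

func main() {
	base, cfg := nameConfig("BenchmarkDecode/text:digits/level:speed/size:1e4/gomaxprocs:8")
	fmt.Println(base, cfg["text"], cfg["gomaxprocs"]) // prints "BenchmarkDecode digits 8"
}
```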

### Example

The benchmark output given in the background section above
is already in the format proposed here.
That is a key feature of the proposal.

However, a future run of the benchmark might add configuration lines,
and the benchmark might be rewritten to use sub-benchmarks,
producing this output:

    commit: 7cd9055
    commit-time: 2016-02-11T13:25:45-0500
    goos: darwin
    goarch: amd64
    cpu: Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
    cpu-count: 8
    cpu-physical-count: 4
    os: Mac OS X 10.11.3
    mem: 16 GB

    BenchmarkDecode/text:digits/level:speed/size:1e4/gomaxprocs:8      100     154125 ns/op    64.88 MB/s    40418 B/op     7 allocs/op
    BenchmarkDecode/text:digits/level:speed/size:1e5/gomaxprocs:8       10    1367632 ns/op    73.12 MB/s    41356 B/op    14 allocs/op
    BenchmarkDecode/text:digits/level:speed/size:1e6/gomaxprocs:8        1   13879794 ns/op    72.05 MB/s    52056 B/op    94 allocs/op
    BenchmarkDecode/text:digits/level:default/size:1e4/gomaxprocs:8    100     147551 ns/op    67.77 MB/s    40418 B/op     8 allocs/op
    BenchmarkDecode/text:digits/level:default/size:1e5/gomaxprocs:8     10    1197672 ns/op    83.50 MB/s    41508 B/op    13 allocs/op
    BenchmarkDecode/text:digits/level:default/size:1e6/gomaxprocs:8      1   11808775 ns/op    84.68 MB/s    53800 B/op    80 allocs/op
    BenchmarkDecode/text:digits/level:best/size:1e4/gomaxprocs:8       100     143348 ns/op    69.76 MB/s    40417 B/op     8 allocs/op
    BenchmarkDecode/text:digits/level:best/size:1e5/gomaxprocs:8        10    1185527 ns/op    84.35 MB/s    41508 B/op    13 allocs/op
    BenchmarkDecode/text:digits/level:best/size:1e6/gomaxprocs:8         1   11740304 ns/op    85.18 MB/s    53800 B/op    80 allocs/op
    BenchmarkDecode/text:twain/level:speed/size:1e4/gomaxprocs:8       100     143665 ns/op    69.61 MB/s    40849 B/op    15 allocs/op
    BenchmarkDecode/text:twain/level:speed/size:1e5/gomaxprocs:8        10    1390359 ns/op    71.92 MB/s    45700 B/op    31 allocs/op
    BenchmarkDecode/text:twain/level:speed/size:1e6/gomaxprocs:8         1   12128469 ns/op    82.45 MB/s    89336 B/op   221 allocs/op
    BenchmarkDecode/text:twain/level:default/size:1e4/gomaxprocs:8     100     141916 ns/op    70.46 MB/s    40849 B/op    15 allocs/op
    BenchmarkDecode/text:twain/level:default/size:1e5/gomaxprocs:8      10    1076669 ns/op    92.88 MB/s    43820 B/op    28 allocs/op
    BenchmarkDecode/text:twain/level:default/size:1e6/gomaxprocs:8       1   10106485 ns/op    98.95 MB/s    71096 B/op   172 allocs/op
    BenchmarkDecode/text:twain/level:best/size:1e4/gomaxprocs:8        100     138516 ns/op    72.19 MB/s    40849 B/op    15 allocs/op
    BenchmarkDecode/text:twain/level:best/size:1e5/gomaxprocs:8         10    1227964 ns/op    81.44 MB/s    43316 B/op    25 allocs/op
    BenchmarkDecode/text:twain/level:best/size:1e6/gomaxprocs:8          1   10040347 ns/op    99.60 MB/s    72120 B/op   173 allocs/op
    BenchmarkEncode/text:digits/level:speed/size:1e4/gomaxprocs:8       30     482808 ns/op    20.71 MB/s
    BenchmarkEncode/text:digits/level:speed/size:1e5/gomaxprocs:8        5    2685455 ns/op    37.24 MB/s
    BenchmarkEncode/text:digits/level:speed/size:1e6/gomaxprocs:8        1   24966055 ns/op    40.05 MB/s
    BenchmarkEncode/text:digits/level:default/size:1e4/gomaxprocs:8     20     655592 ns/op    15.25 MB/s
    BenchmarkEncode/text:digits/level:default/size:1e5/gomaxprocs:8      1   13000839 ns/op     7.69 MB/s
    BenchmarkEncode/text:digits/level:default/size:1e6/gomaxprocs:8      1  136341747 ns/op     7.33 MB/s
    BenchmarkEncode/text:digits/level:best/size:1e4/gomaxprocs:8        20     668083 ns/op    14.97 MB/s
    BenchmarkEncode/text:digits/level:best/size:1e5/gomaxprocs:8         1   12301511 ns/op     8.13 MB/s
    BenchmarkEncode/text:digits/level:best/size:1e6/gomaxprocs:8         1  137962041 ns/op     7.25 MB/s

Using sub-benchmarks has benefits beyond this proposal, namely that it would
avoid the current repetitive code:

    func BenchmarkDecodeDigitsSpeed1e4(b *testing.B)    { benchmarkDecode(b, digits, speed, 1e4) }
    func BenchmarkDecodeDigitsSpeed1e5(b *testing.B)    { benchmarkDecode(b, digits, speed, 1e5) }
    func BenchmarkDecodeDigitsSpeed1e6(b *testing.B)    { benchmarkDecode(b, digits, speed, 1e6) }
    func BenchmarkDecodeDigitsDefault1e4(b *testing.B)  { benchmarkDecode(b, digits, default_, 1e4) }
    func BenchmarkDecodeDigitsDefault1e5(b *testing.B)  { benchmarkDecode(b, digits, default_, 1e5) }
    func BenchmarkDecodeDigitsDefault1e6(b *testing.B)  { benchmarkDecode(b, digits, default_, 1e6) }
    func BenchmarkDecodeDigitsCompress1e4(b *testing.B) { benchmarkDecode(b, digits, compress, 1e4) }
    func BenchmarkDecodeDigitsCompress1e5(b *testing.B) { benchmarkDecode(b, digits, compress, 1e5) }
    func BenchmarkDecodeDigitsCompress1e6(b *testing.B) { benchmarkDecode(b, digits, compress, 1e6) }
    func BenchmarkDecodeTwainSpeed1e4(b *testing.B)     { benchmarkDecode(b, twain, speed, 1e4) }
    func BenchmarkDecodeTwainSpeed1e5(b *testing.B)     { benchmarkDecode(b, twain, speed, 1e5) }
    func BenchmarkDecodeTwainSpeed1e6(b *testing.B)     { benchmarkDecode(b, twain, speed, 1e6) }
    func BenchmarkDecodeTwainDefault1e4(b *testing.B)   { benchmarkDecode(b, twain, default_, 1e4) }
    func BenchmarkDecodeTwainDefault1e5(b *testing.B)   { benchmarkDecode(b, twain, default_, 1e5) }
    func BenchmarkDecodeTwainDefault1e6(b *testing.B)   { benchmarkDecode(b, twain, default_, 1e6) }
    func BenchmarkDecodeTwainCompress1e4(b *testing.B)  { benchmarkDecode(b, twain, compress, 1e4) }
    func BenchmarkDecodeTwainCompress1e5(b *testing.B)  { benchmarkDecode(b, twain, compress, 1e5) }
    func BenchmarkDecodeTwainCompress1e6(b *testing.B)  { benchmarkDecode(b, twain, compress, 1e6) }

More importantly for this proposal, using sub-benchmarks also makes the possible
comparison axes clear: digits vs twain, speed vs default vs best, size 1e4 vs 1e5 vs 1e6.
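
Under the sub-benchmark API proposed for Go 1.7, the table of functions above might collapse into a single benchmark whose `b.Run` names carry the key:value configuration; this is a sketch, with `benchmarkDecode` and `subName` standing in for the real helpers:

```go
package main

import (
	"fmt"
	"testing"
)

// benchmarkDecode is a hypothetical stand-in for the shared benchmark
// body that the hand-written functions all call.
func benchmarkDecode(b *testing.B, text, level, size string) {
	for i := 0; i < b.N; i++ {
		// decoding of the chosen input would happen here
	}
}

// subName builds the key:value sub-benchmark name for one combination.
func subName(text, level, size string) string {
	return fmt.Sprintf("text:%s/level:%s/size:%s", text, level, size)
}

// BenchmarkDecode varies each axis programmatically; the testing
// package reports each run under "BenchmarkDecode/" plus the sub-name.
func BenchmarkDecode(b *testing.B) {
	for _, text := range []string{"digits", "twain"} {
		for _, level := range []string{"speed", "default", "best"} {
			for _, size := range []string{"1e4", "1e5", "1e6"} {
				text, level, size := text, level, size // pin loop variables for the closure
				b.Run(subName(text, level, size), func(b *testing.B) {
					benchmarkDecode(b, text, level, size)
				})
			}
		}
	}
}

func main() {
	fmt.Println(subName("digits", "speed", "1e4")) // prints "text:digits/level:speed/size:1e4"
}
```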

## Rationale

As discussed in the background section,
we have already developed a number of analysis programs
that assume this proposal's format,
as well as a number of programs that generate this format.
Standardizing the format should encourage additional work
on both kinds of programs.

[Issue 12826](https://golang.org/issue/12826) suggests a different approach,
namely the addition of a new `go test` option `-benchformat`, to control
the format of benchmark output. In fact it gives the lack of standardization
as the main justification for a new option:

> Currently `go test -bench .` prints out benchmark results in a
> certain format, but there is no guarantee that this format will not
> change. Thus a tool that parses go test output may break if an
> incompatible change to the output format is made.

Our approach is instead to guarantee that the format will not change,
or rather that it will only change in ways allowed by this design.
An analysis tool that parses the output specified here will not break
in future versions of Go,
and a tool that generates the output specified here will work
with all such analysis tools.
Having one agreed-upon format enables broad interoperation;
the ability for one tool to generate arbitrarily many different formats
does not achieve the same result.

The proposed format also seems to be extensible enough to accommodate
anticipated future work on benchmark reporting.

The main known issue with the current `go test -bench` output is that
we'd like to emit finer-grained detail about runs, for linearity testing
and more robust statistics.
This proposal allows that by simply printing more result lines.

Another known issue is that we may want to add custom outputs
such as garbage collector statistics to certain benchmark runs.
This proposal allows that by adding more value-unit pairs.

## Compatibility

Tools consuming the existing benchmark format may need trivial changes
to ignore non-benchmark result lines or to cope with additional value-unit pairs
in benchmark results.

## Implementation

The benchmark format described here is already generated by `go test -bench`
and expected by tools like `benchcmp` and `benchstat`.

The format is trivial to generate, and it is
straightforward but not quite trivial to parse.

We anticipate that the [new x/perf subrepo](https://github.com/golang/go/issues/14304) will include a library for loading
benchmark data from files, although the format is also simple enough that
tools that want a different in-memory representation might reasonably
write separate parsers.