Benchmarking
How to write benchmarks in Go.
The Go `testing` package supports benchmarks for measuring the performance of code.
Test functions look like this:
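```go
func TestXxx(t *testing.T) {
	// ...
}
```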
And benchmark functions look like this:
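```go
func BenchmarkXxx(b *testing.B) {
	for i := 0; i < b.N; i++ {
		// ...
	}
}
```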
How it works
A benchmark has some “code under test”:
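A minimal sketch, assuming the package contains a `fib` function:

```go
func BenchmarkFib(b *testing.B) {
	for i := 0; i < b.N; i++ {
		fib(30) // the code under test
	}
}
```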
The code under test in the benchmark will be executed `b.N` times, where Go automatically adjusts the value of `b.N` until the benchmark lasts long enough to be timed reliably.
Running benchmarks
Like test functions, benchmark functions live in `*_test.go` files and are run via the `go test` command. But to run benchmarks, the `-bench` test flag must be provided.
For example:
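```
go test -bench .
```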
By default, tests also run when running benchmarks. To prevent this, use the `-run ^$` test flag to run “no tests”.
For example:
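```
go test -run '^$' -bench .
```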
About Go test modes
Go tests can be run in two different “modes”:
- Directory mode is in effect when `go test` is run without package arguments (e.g. `go test`). Here Go compiles the source and test files in the current directory.
- Package list mode is in effect when `go test` is run with package arguments (e.g. `go test ./...`). Here Go compiles the source and test files for the listed packages.

Only in package list mode will Go cache successful package test results, to avoid re-running tests unnecessarily. When tests are cached, `go test` prints `(cached)` instead of the elapsed time in the summary. To disable caching, use the `-count=1` flag.
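For example:

```
go test -bench .          # directory mode: current directory only
go test -bench . ./...    # package list mode: all listed packages
```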
Controlling count
By default a benchmark runs once, but this can be controlled with the `-count` test flag. It can be useful to run a benchmark multiple times to verify that it produces consistent results.
For example:
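```
go test -run '^$' -bench . -count 10
```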
Alternatively, the exact number of iterations can be controlled by passing the syntax `Nx` to the `-benchtime` test flag. For example, this runs the benchmark for exactly 100 iterations:
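```
go test -run '^$' -bench . -benchtime 100x
```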
Controlling duration
By default, a benchmark’s `b.N` iterations run for a total duration of 1 second. But this may not be enough to produce a large enough sample. To increase the benchmark duration, use the `-benchtime` test flag. It guarantees that a benchmark will run for at least that amount of time.
For example:
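```
go test -run '^$' -bench . -benchtime 5s
```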
How to read benchmark results
Benchmark results have the following format:
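```
BenchmarkName-GOMAXPROCS    iterations    value unit
```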
For example:
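```
BenchmarkFib-8        202           5895044 ns/op
```

(The `-8` suffix denotes the value of `GOMAXPROCS` the benchmark ran with; the numbers here are illustrative.)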
- Column 1 shows the benchmark name, which always begins with `Benchmark`.
- Column 2 shows the total number of iterations run during the benchmark.
- Column 3 shows the measured value. For example, `ns/op` indicates the average amount of time in nanoseconds it took one iteration to complete.
Comparing benchmark results
The `benchstat` command can be used to compare multiple benchmark results.
Important to keep in mind
- Each benchmark should be run at least 10 times to gather a statistically significant sample of results.
- Pick a number of benchmark runs (at least 10, ideally 20) and stick to it.
- Reducing noise and/or increasing the number of benchmark runs makes `benchstat` see smaller changes as “statistically significant”.
- To reduce noise, run benchmarks on an idle machine (i.e. close apps) and connect to a power source.
First install the command with:
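```
go install golang.org/x/perf/cmd/benchstat@latest
```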
Then save benchmark results to a text file. For example:
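```
go test -run '^$' -bench . -count 10 > old.txt
```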
And compare them:
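```
benchstat old.txt new.txt
```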
How to read comparison results
This example output:
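```
         │  old.txt   │              new.txt               │
         │   sec/op   │   sec/op     vs base               │
Fib-8      5.028m ± 2%   4.163m ± 1%  -17.20% (p=0.000 n=10)
geomean    5.028m        4.163m       -17.20%
```

(illustrative numbers; the exact layout depends on the `benchstat` version)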
Can be interpreted as follows:
- The `±` percentage indicates “variation”. The lower the better: a high variation means unreliable samples, and that the benchmark needs to be re-run.
- A negative percentage (`-17.20%`) means a benchmark was faster. A positive percentage means slower.
- The `p=` value measures how likely it is that the differences were due to random chance.
- A `~` means there was no statistically significant difference between the two inputs.
- `geomean` shows the geometric mean of each column.
Profiling benchmarks
Memory allocations can be printed in the results by providing the `-benchmem` test flag. For example:
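```
go test -run '^$' -bench . -benchmem
```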
But it’s also possible to produce `pprof`-compatible profiles. For example:
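```
go test -run '^$' -bench . -cpuprofile cpu.prof -memprofile mem.prof
```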
The output `.prof` file can then be used to generate a report with `go tool pprof`.
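For example, to print the functions consuming the most CPU time:

```
go tool pprof -top cpu.prof
```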
Tips
Control the timer when doing setup
By default the entire run time of a benchmark function is measured: Go executes the benchmark many times and divides the total execution time by `b.N`. This means that any (expensive) setup done inside the benchmark function can skew the results.
To prevent misleading benchmark results, the timer can be controlled with the following functions:
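- `b.ResetTimer()` zeroes the elapsed benchmark time and memory allocation counters.
- `b.StopTimer()` stops timing the benchmark.
- `b.StartTimer()` resumes timing.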
For example:
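A minimal sketch, assuming hypothetical `loadTestData` and `process` helpers:

```go
func BenchmarkProcess(b *testing.B) {
	data := loadTestData() // expensive setup (hypothetical helper)

	b.ResetTimer() // exclude the setup above from the measurement
	for i := 0; i < b.N; i++ {
		process(data) // hypothetical code under test
	}
}
```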
Benchmark with multiple inputs
As with regular test functions, you can use table-driven benchmarks and sub-benchmarks by invoking `b.Run(name, f)`. Each `b.Run` call creates and runs a separate benchmark.
For example:
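A sketch, again assuming a `fib` function:

```go
func BenchmarkFib(b *testing.B) {
	benchmarks := []struct {
		name string
		n    int
	}{
		{"Fib10", 10},
		{"Fib20", 20},
		{"Fib30", 30},
	}
	for _, bm := range benchmarks {
		b.Run(bm.name, func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				fib(bm.n)
			}
		})
	}
}
```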
To run only certain sub-benchmarks, provide a `/`-separated list of benchmark and sub-benchmark names to the `-bench` test flag.
For example:
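```
go test -run '^$' -bench BenchmarkFib/Fib10
```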
Gotchas
1. Compiler optimizations
It may happen that the compiler optimizes away the code under test in a benchmark. When this happens, the benchmark will seem faster than it really is.
This can occur with non-changing (constant) function inputs and/or unused results.
For example:
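A sketch, assuming a small `fib` function that the compiler can inline:

```go
func BenchmarkFib(b *testing.B) {
	for i := 0; i < b.N; i++ {
		fib(10) // result is never used: the call may be optimized away
	}
}
```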
Ways to mitigate this are by:
- Using `runtime.KeepAlive()`.
- Assigning the result to an exported global variable.
For example:
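```go
import (
	"runtime"
	"testing"
)

func BenchmarkFib(b *testing.B) {
	var result int
	for i := 0; i < b.N; i++ {
		result = fib(10)
	}
	runtime.KeepAlive(result) // mark the result as used
}
```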
Or:
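```go
// Result is an exported package-level variable that serves as a "sink".
var Result int

func BenchmarkFib(b *testing.B) {
	var r int
	for i := 0; i < b.N; i++ {
		r = fib(10)
	}
	Result = r // assigning to the exported global marks the work as used
}
```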