In short, the enclosed binary:

- In some circumstances¹, provisions machines.
- Places scripts in `scripts/gen` onto remote machines, and executes them.
- Downloads the scripts' output to `results`.
- Converts the output into CSV.
- In some circumstances², posts the results to Google Sheets.

¹: When running on a `roachprod`-compatible platform, you can have `roachprod` create and manage the VMs for you.

²: When running with the on-prem flag, results are not stored in Google Sheets.
To add a new benchmark:

- Place a `.sh` script in `scripts/gen` that executes the benchmark you want and stores its output (through output redirection, i.e. `>`). Whatever you name this file will be the "artifact" you end up pulling down and parsing when the benchmark completes.
- Create a script that parses the raw output of the benchmark into a complete CSV file, i.e. one with a header, placing the run's UUID as the first element of each row. Add this script to `scripts/parse`, and then add a call to it in `scripts/parse/parse-dir.sh`. For inspiration, check any of the existing scripts in `scripts/parse`. The initial scripts rely heavily on `pcregrep`--sorry.
- Add a new sheet to the tracking spreadsheet and identify the range where the values should be posted. Add this range to `filenameToSSRange` in `googleSheets.go`, matching the format found there.
- Add a new `benchmark` to the global `benchmarks` struct in `main.go` that references this file as a `benchmarkRoutine`. Identify the output in `benchmark.artifact`.
- If the benchmark requires an argument, check `argConstArr` in `main.go` to see if it already exists. If not, please add the argument you need as a const with the other args (e.g. `argCloudName`), and add it to `argConstArr`. You can specify arguments within each `benchmarkRoutine`.
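The gen/parse pair in the steps above can be sketched as follows. Everything here is hypothetical: `my-bench` stands in for whatever benchmark command you actually run (its output is faked so the sketch is self-contained), and `parse` just mirrors the header-plus-UUID CSV convention the real `scripts/parse` scripts follow.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical gen script (would live in scripts/gen): run the benchmark
# and store its raw output via redirection, roughly `my-bench > my-bench.log`.
# The output is faked here so the sketch runs on its own.
gen() {
  printf 'throughput 1250\nlatency_ms 3.2\n' > my-bench.log
}

# Hypothetical parse script (would live in scripts/parse and be called
# from scripts/parse/parse-dir.sh): emit a complete CSV with a header
# row, putting the run's UUID as the first element of every data row.
parse() {
  local run_uuid=$1 artifact=$2
  echo "uuid,metric,value"
  while read -r metric value; do
    echo "${run_uuid},${metric},${value}"
  done < "$artifact"
}

gen
parse "run-0001" my-bench.log
```

Running this prints a three-line CSV: the header, then one `run-0001,...` row per metric in the artifact.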
## How does this binary work?

It heavily relies on shelling out to bash scripts.

## What's up with passing an `os.File` to every function?

I wanted to parallelize the execution of all of the `roachprod`-compatible runs, but I also wanted to keep the high-fidelity logging I had for individual runs. To handle this, everything logs to an individual file that gets instantiated in one of the `<platform>Run()` functions. I'm sure there's a better way to do this, but it was the path of least resistance at the time.
## Why can't I successfully build n2-class machines in GCP?

See "roachprod: Add provisioning of GCP n2-class machines w/ local SSDs". Once that's merged, your `roachprod` builds need to include that commit.
Directory contents:

- `README.md`: High-level overview.
- `cloudDetails`: Contains `.json` files that describe the cloud providers and VMs to run the benchmarks on. Is unmarshalled into `[]CloudDetails`.
- `deployment-steps.md`: Instructions for provisioning VMs, or a high-level overview of what we expect to do via `roachprod`.
- `googleSheets.go`: Contains all of the code to actually post results to Google Sheets.
- `init.sh`: An initialization script placed on machines to prep them for the benchmarks. Because it contains instructions to unzip the `scripts` dir, it should be left outside the `scripts` dir.
- `logs`: Initialization and run logs of the VMs. Automatically populated by the binary.
  - Structured as `logs/<cloud>/<machine type>/<YYYYMMDD>/<run|init>/<run ID>`.
- `main.go`: Primary implementation of the binary.
- `reproduction-steps.md`: Directions for using the binary to generate results.
- `results`: The results of the benchmark runs collected from the machines.
  - Structured as `results/<cloud>/<machine type>/<YYYYMMDD>/<run ID>`.
  - `results/aggregate` contains a concatenation of all CSVs in the `results/<cloud>` dirs.
- `run-azure.sh`: A kludgy script to parallelize running on Azure. Generates a bunch of `.txt` files to track the PIDs of the running processes.
- `scripts`:
  - `aggregate`: `aggregate_csv.sh` takes all of the CSVs in the `results/<cloud>` dirs, aggregates them, and places them in `results/aggregate`.
  - `azure`: Contains scripts to run on Azure machines to collect metadata about the VMs. This will be obviated by running Azure in `roachprod`, but provides a useful template for any platform we want to use that is ever outside of `roachprod`.
  - `gen`: Contains scripts that actually run the benchmarks and generate output.
  - `parse`: Contains scripts that convert raw output from the benchmarks into CSVs.
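As a sketch of what the aggregation step does: concatenate every per-run CSV under `results/<cloud>` into `results/aggregate`, keeping a single header row. This is an illustrative approximation, not the actual `aggregate_csv.sh`; the `kv95.csv` and `all.csv` file names are made up, though the directory layout follows the structure described above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Build a fake results tree matching results/<cloud>/<machine type>/<YYYYMMDD>/<run ID>.
run_dir="results/gcp/n1-standard-4/20200101/run-0001"
mkdir -p "$run_dir" results/aggregate
printf 'uuid,metric,value\nrun-0001,throughput,1250\n' > "$run_dir/kv95.csv"

# Concatenate every per-run CSV into results/aggregate, writing the
# header once and skipping each source file's own header line.
out="results/aggregate/all.csv"
echo "uuid,metric,value" > "$out"
find results -path results/aggregate -prune -o -name '*.csv' -print |
while read -r csv; do
  tail -n +2 "$csv" >> "$out"   # drop this file's header row
done
```

The `-prune` keeps `find` from re-reading `results/aggregate` itself, so the output file is never fed back into the concatenation.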
Note: For the binary to work, you should also have `credentials.json` and `token.json` files in the root directory of this project; both are part of setting up Google Sheets access.