Description
Many HPC performance tests feature a rich set of parameters. Consider, for example, tests such as OSU, HPL, or NCCL, which can be executed at different message sizes or problem sizes.
To gain insight, it is useful to run parameter sweeps and analyze the multidimensional result space, for example by plotting performance vs. size or by grouping values by parameter.
ReFrame's parameterization system allows for clean abstraction of test variants.
However, in large parameter sweeps the current workflow submits a separate job for each parameter combination.
This is inefficient when the tests are executed frequently and the parameter space is large, because the scheduler overhead is often larger than the runtime of the test itself. It would therefore be beneficial to have a mechanism to execute parameterized tests sequentially within a single job allocation.
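A minimal sketch of the idea, assuming the bundled parameter values are simply iterated inside one batch script rather than submitted as separate jobs. The function `bundled_job_script`, the `./dgemm` command and its flags are hypothetical illustrations, not part of ReFrame's API; each iteration echoes a marker line so the output of each combination can later be told apart.

```python
import itertools
import shlex


def bundled_job_script(cmd_template, params):
    """Build one batch-script body that runs every parameter combination in turn.

    cmd_template: command string with str.format placeholders, one per parameter
    params: mapping of parameter name -> list of values (hypothetical input format)
    """
    names = list(params)
    lines = ['#!/bin/bash']
    # Cartesian product of all parameter values, executed sequentially in one job
    for values in itertools.product(*(params[n] for n in names)):
        combo = dict(zip(names, values))
        # Marker line tags the following output with its parameter combination
        tag = ' '.join(f'{n}={v}' for n, v in combo.items())
        lines.append(f'echo {shlex.quote("### " + tag)}')
        lines.append(cmd_template.format(**combo))
    return '\n'.join(lines)


script = bundled_job_script(
    './dgemm --size {size} --precision {precision}',  # hypothetical benchmark binary
    {'size': [128, 256, 512], 'precision': ['Single', 'Double', 'Half']},
)
print(script)
```

The scheduler then sees a single allocation, and its overhead is paid once for all nine runs instead of once per run.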
Example use cases:
- run a linear algebra test (e.g. DGEMM, HPL, or DGEMM with cuBLAS) at different problem sizes and plot performance vs. input size
- repeat a GPU test at different precisions for comparison
- run OSU or NCCL bandwidth tests and analyze bandwidth vs. message size
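When all combinations share one job, they also share one output file, so results such as bandwidth vs. message size must be attributed back to their parameters before they can be plotted or grouped. A minimal sketch, assuming each iteration prints a marker line of the form `### name=value ...` before its own output (an assumed convention, not something ReFrame currently provides); `split_by_combination` is a hypothetical helper.

```python
import re
from collections import defaultdict

# Assumed marker format: "### name1=value1 name2=value2 ..."
MARKER = re.compile(r'^### (.*)$')


def split_by_combination(output):
    """Group output lines under the parameter combination announced by the preceding marker.

    Lines that appear before the first marker are ignored.
    Keys are tuples of (name, value) string pairs, e.g. (('size', '128'),).
    """
    sections = defaultdict(list)
    current = None
    for line in output.splitlines():
        m = MARKER.match(line)
        if m:
            current = tuple(tuple(kv.split('=', 1)) for kv in m.group(1).split())
            continue
        if current is not None:
            sections[current].append(line)
    return dict(sections)
```

Each section could then be fed to the same performance-extraction logic that a single unbundled run would use.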
A possible enhancement would be to allow test developers to indicate which parameters are safe to bundle together into a single scheduler allocation.
For example, the declarations
```python
message_sizes = parameter([128, 256, 512], bundle=True)
precision = parameter(['Single', 'Double', 'Half'], bundle=True)
```
could generate a single job with a two-dimensional loop over all 3 × 3 combinations instead of 9 independent jobs.
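The intended semantics can be sketched in plain Python, assuming each parameter carries its values plus a `bundle` flag mirroring the proposed keyword. The `Param` class and `plan_jobs` function are hypothetical and do not reflect ReFrame's implementation: non-bundled parameters still fan out into separate jobs, while bundled parameters become the inner loop of each job.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Param:
    name: str
    values: list
    bundle: bool = False  # mirrors the proposed `bundle=True` keyword (hypothetical)


def plan_jobs(params):
    """Return one (job-level values, in-job combinations) entry per scheduler job."""
    outer = [p for p in params if not p.bundle]
    inner = [p for p in params if p.bundle]
    # All combinations of bundled parameters, run sequentially inside each job
    inner_combos = [dict(zip((p.name for p in inner), vals))
                    for vals in itertools.product(*(p.values for p in inner))]
    jobs = []
    # Every combination of non-bundled parameters still gets its own job
    for vals in itertools.product(*(p.values for p in outer)):
        jobs.append((dict(zip((p.name for p in outer), vals)), inner_combos))
    return jobs


jobs = plan_jobs([
    Param('message_sizes', [128, 256, 512], bundle=True),
    Param('precision', ['Single', 'Double', 'Half'], bundle=True),
])
print(len(jobs), len(jobs[0][1]))  # one job containing nine runs
```

Adding a non-bundled parameter, such as a hypothetical `num_nodes = [1, 2]`, would yield two jobs with nine runs each, which keeps the developer in control of what is safe to share an allocation.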
We’d appreciate your consideration and look forward to your thoughts.