v12: Use RESTART_BY_OSERVER=YES for MPT small jobs #778
Tests at NAS show that single-node jobs with MPT are very slow when writing restarts (sometimes 20 minutes at c24!).
The issue is that MPT has trouble with many MPI_GatherV calls in a row on a single node. One solution is to add MPI_Barrier calls, but that means going deep into MAPL.
A "simpler" solution (no code change needed) is to just use our "write restart by oserver" functionality which doesn't use MPI_GatherV.
So this PR says: "If you are a low-res job running with MPT, just set WRITE_RESTART_BY_OSERVER: YES." It's not perfect, but it's faster than fixing up MAPL for now.

NOTE: @sshakoor1 this will probably need to be added to the python setup.
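For the python setup, the rule above might look roughly like the sketch below. It is only an illustration under stated assumptions: the function name `apply_mpt_restart_workaround`, the `mpi_stack` string, and the `is_low_res` flag are hypothetical, and the resource file is assumed to be a list of `KEY: VALUE` lines. Deciding what counts as "low-res" is left to the caller.

```python
from typing import List

# Setting named in this PR; everything else here is a hypothetical illustration.
KEY = "WRITE_RESTART_BY_OSERVER"


def apply_mpt_restart_workaround(rc_lines: List[str], mpi_stack: str, is_low_res: bool) -> List[str]:
    """Hypothetical setup helper: for low-res MPT runs, force restart writes
    through the oserver. Other configurations are returned unchanged."""
    if mpi_stack.strip().lower() != "mpt" or not is_low_res:
        return list(rc_lines)
    # Drop any existing (uncommented) setting for the key so it is not duplicated.
    kept = [line for line in rc_lines if line.split(":", 1)[0].strip() != KEY]
    kept.append(f"{KEY}: YES")
    return kept
```

Replacing an existing entry rather than blindly appending keeps the helper safe to call more than once on the same resource file.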