v12: Use RESTART_BY_OSERVER=YES for MPT small jobs #778

mathomp4 · 2025-10-31T17:38:24Z

Tests at NAS show that single-node jobs with MPT are very slow at writing restarts (sometimes 20 minutes at c24!).

The issue is that MPT has trouble with many consecutive MPI_Gatherv calls on a single node. One solution is to add MPI_Barrier calls, but that means going deep into MAPL.

A "simpler" solution (no code change needed) is to just use our "write restart by oserver" functionality, which doesn't use MPI_Gatherv.

So this PR says: "if you are a low-res job and running with MPT, just set WRITE_RESTART_BY_OSERVER: YES". It's not perfect, but it's faster than fixing up MAPL for now.

NOTE: @sshakoor1 this will probably need to be added to the python setup.
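For the python setup, the rule above could be sketched roughly as follows. This is only an illustration, not the code in this PR: the function name `oserver_restart_line`, its arguments, and the use of "one node" as the stand-in for "low-res" are all assumptions. Only the `WRITE_RESTART_BY_OSERVER: YES` setting itself comes from the description above.

```python
from typing import Optional


def oserver_restart_line(mpi_stack: str, num_nodes: int) -> Optional[str]:
    """Return the resource line to add, or None to leave the config untouched.

    Hypothetical helper: the name, the arguments, and the single-node test
    are assumptions for illustration, not taken from the actual setup script.
    """
    # The slow case described above is MPT running on a single node.
    is_mpt = mpi_stack.strip().lower() == "mpt"
    is_single_node = num_nodes == 1

    if is_mpt and is_single_node:
        # Route restart writes through the oserver, avoiding the gather path.
        return "WRITE_RESTART_BY_OSERVER: YES"

    # Any other combination keeps whatever the config already specifies.
    return None
```

Returning `None` rather than an explicit `NO` keeps the sketch from overriding a value a user may have already set for other MPI stacks or larger jobs.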

v12: Use RESTART_BY_OSERVER=YES for MPT small jobs

8941cae

mathomp4 self-assigned this Oct 31, 2025

mathomp4 requested a review from a team as a code owner October 31, 2025 17:38

mathomp4 added the "0 diff" label (the changes in this pull request have been verified to be zero-diff with the target branch) Oct 31, 2025

Merge branch 'feature/sdrabenh/gcm_v12' into feature/v12-mpt-single-node

9cbf4fc

sdrabenh approved these changes Nov 24, 2025

sdrabenh merged commit 5ca3876 into feature/sdrabenh/gcm_v12 Nov 24, 2025
11 of 13 checks passed

sdrabenh deleted the feature/v12-mpt-single-node branch November 24, 2025 17:51
