recompile MOM6 with latest Intel compilers and dependencies? #26

aekiss · 2023-02-22T03:24:31Z

From NCI's announcement today re. mid-cycle upgrade to Gadi included (emphasis mine):

The new system will contain 74,880 cores from Intel's newest fourth-generation Sapphire Rapids processors. A total of 720 nodes, each containing two 52-core CPUs, make up this latest upgrade. ... Users are recommended to use the latest versions of their software to maximise compatibility with the new hardware. Recompiling using the latest versions of the Intel compilers is also recommended to get the most out of this new CPU architecture.

As far as I can see /g/data/ik11/inputs/mom6/bin/symmetric_FMS2-e7d09b7 was compiled with intel-compiler/2019.3.199, which is actually the oldest one on NCI - it's now up to intel-compiler-llvm/2022.2.0 (or intel-compiler/2021.7.0 if we don't want llvm).

Should we recompile for the 1/40° run?

And while we're at it, should we upgrade the dependencies? E.g. we're using openmpi/4.1.2 and netcdf/4.7.4p, but there are modules for openmpi/4.1.4 and netcdf/4.9.0p available.


PaulSpence · 2023-03-27T01:28:21Z

Rui gave an awesome presentation today (March 27) on profiling: 3-month, 15-day or 5-day tests of global and panan on Cascade Lake and Sapphire Rapids nodes with different layouts and different compilers. He has a big spreadsheet of numbers/data. Post it somewhere please?

MPI wait time is the big time waster in MOM6. A lot of wait time = load imbalance. Different layouts don't seem to improve the load imbalance. I/O needs to be looked at with a different tool. Sea ice is in there as well, so each rank has both ocean and sea ice, which complicates the diagnostics.
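To illustrate why wait time signals load imbalance, here is a minimal sketch (not taken from Rui's analysis). It assumes every rank must wait at a synchronisation point for the slowest rank; the function name and the per-rank timings are hypothetical.

```python
def load_imbalance(compute_times):
    """Return per-rank wait times and the fraction of total rank-time spent waiting.

    Assumes all ranks synchronise once, so each rank idles until the
    slowest rank has finished its compute phase.
    """
    slowest = max(compute_times)
    waits = [slowest - t for t in compute_times]
    wasted_fraction = sum(waits) / (slowest * len(compute_times))
    return waits, wasted_fraction


# Hypothetical per-rank compute times in seconds (illustrative only)
times = [8.0, 10.0, 6.0, 8.0]
waits, wasted_fraction = load_imbalance(times)
print(waits, wasted_fraction)
```

In this toy case the rank that takes 10 s sets the pace, and the other ranks idle for the difference. Changing the layout only helps if it evens out the per-rank work.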

This analysis includes initialisation time for the model. Maybe we should just evaluate the main loop for MPI imbalance. But 15-day vs 5-day runs don't dramatically change the MPI wait times, so initialisation is not a big deal.

Found the best compiler flags 🙂 but not a big difference.

Better performance on the new Sapphire Rapids nodes: 8% better. MPI wait time problems are the same though, e.g. 50% MPI time in total, and 30% of that time is wasted.
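A quick back-of-the-envelope reading of those numbers, assuming "50%" is the share of total runtime spent in MPI and "30%" is the share of that MPI time spent waiting (the function name is hypothetical):

```python
def wasted_runtime_fraction(mpi_fraction, wasted_share_of_mpi):
    """Fraction of total runtime lost to MPI waiting."""
    return mpi_fraction * wasted_share_of_mpi


# 50% of runtime in MPI, 30% of that wasted (figures quoted above)
wasted = wasted_runtime_fraction(0.50, 0.30)
print(f"{wasted:.0%} of total runtime")
```

Under that reading, roughly 15% of total runtime is lost to waiting, which is larger than the 8% gain from the new nodes.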

aekiss · 2023-03-27T01:32:14Z

Rui also showed AVX256 is fastest; older non-LLVM compiler generates faster code; Sapphire Rapids is faster, probably due to larger cache memory and DDR5

micaeljtoliveira · 2023-04-03T02:50:14Z

I believe in the end the conclusion is that there's no need to recompile the code. So maybe we can close this issue? Note that there's another issue (#20) specifically for the scaling and optimization.
