recompile MOM6 with latest Intel compilers and dependencies? #26

Open
aekiss opened this issue Feb 22, 2023 · 3 comments

@aekiss

aekiss commented Feb 22, 2023

NCI's announcement today regarding the mid-cycle upgrade to Gadi included the following (emphasis mine):

The new system will contain 74,880 cores from Intel's newest fourth-generation Sapphire Rapids processors. A total of 720 nodes, each containing two 52-core CPUs, make up this latest upgrade. ... Users are recommended to use the latest versions of their software to maximise compatibility with the new hardware. Recompiling using the latest versions of the Intel compilers is also recommended to get the most out of this new CPU architecture.
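As a quick sanity check, the quoted core count follows directly from the node and CPU figures in the announcement:

```latex
720\ \text{nodes} \times 2\ \frac{\text{CPUs}}{\text{node}} \times 52\ \frac{\text{cores}}{\text{CPU}} = 74{,}880\ \text{cores}
```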

As far as I can see, /g/data/ik11/inputs/mom6/bin/symmetric_FMS2-e7d09b7 was compiled with intel-compiler/2019.3.199, which is actually the oldest version available on NCI. The latest is now intel-compiler-llvm/2022.2.0 (or intel-compiler/2021.7.0 if we don't want LLVM).

Should we recompile for the 1/40° run?

And while we're at it, should we upgrade the dependencies? E.g. we're using openmpi/4.1.2 and netcdf/4.7.4p, but modules for openmpi/4.1.4 and netcdf/4.9.0p are available.

@PaulSpence

Rui gave an awesome presentation today (March 27) on profiling: 3-month, 15-day and 5-day tests of the global and panan configurations on Cascade Lake and Sapphire Rapids nodes, with different layouts and different compilers. He has a big spreadsheet of numbers. Could he post it somewhere, please?

MPI wait time is a big waste in MOM6. A lot of wait time means load imbalance, and different layouts don't seem to improve the imbalance. I/O needs to be looked at with a different tool. Sea ice is in there as well: each rank runs both ocean and sea ice, which complicates the diagnostics.
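One common way to quantify the load imbalance behind MPI wait time is to compare the slowest rank's compute time with the mean across ranks: ranks that finish early sit idle in MPI calls until the slowest one catches up. The sketch below uses the percent-imbalance metric (max - mean) / max on hypothetical per-rank timings (not Rui's data); the function names are illustrative and this is not the tool used in the profiling.

```python
def load_imbalance(compute_times: list[float]) -> float:
    """Percent imbalance: (max - mean) / max of per-rank compute times."""
    if not compute_times:
        raise ValueError("need at least one rank")
    t_max = max(compute_times)
    t_mean = sum(compute_times) / len(compute_times)
    return (t_max - t_mean) / t_max


def mean_wait(compute_times: list[float]) -> float:
    """Average idle time per rank, assuming every rank waits for the
    slowest one at a synchronising MPI call each step."""
    t_max = max(compute_times)
    return sum(t_max - t for t in compute_times) / len(compute_times)


if __name__ == "__main__":
    # Hypothetical per-rank compute times (seconds) for one timestep.
    times = [8.0, 10.0, 6.0, 8.0]
    print(f"imbalance = {load_imbalance(times):.0%}")  # imbalance = 20%
    print(f"mean wait = {mean_wait(times):.2f} s")     # mean wait = 2.00 s
```

Because no layout change alters the total work, a layout only helps if it narrows the gap between the slowest and average ranks; if per-rank costs stay uneven (e.g. from differing ocean and sea-ice workloads), wait time persists.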

This analysis includes the model's initialisation time, so maybe we should evaluate just the main loop for MPI imbalance. But 15-day vs 5-day runs don't dramatically change the MPI wait times, so initialisation is not a big deal.

He found the best compiler flags 🙂, but they don't make a big difference.

Performance is 8% better on the new Sapphire Rapids nodes. The MPI wait time problems are the same, though: e.g. MPI accounts for 50% of total time, and 30% of that time is wasted.
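Reading those figures as "MPI takes 50% of total runtime, and 30% of that MPI time is spent waiting", the wasted share of the whole run is their product. A minimal sketch under that interpretation (the function name is hypothetical):

```python
def wasted_fraction(mpi_share: float, wasted_share_of_mpi: float) -> float:
    """Fraction of total wall time lost, given the share of runtime spent
    in MPI and the share of that MPI time spent idle/waiting."""
    for x in (mpi_share, wasted_share_of_mpi):
        if not 0.0 <= x <= 1.0:
            raise ValueError("shares must be between 0 and 1")
    return mpi_share * wasted_share_of_mpi


if __name__ == "__main__":
    lost = wasted_fraction(0.50, 0.30)
    print(f"{lost:.0%} of total runtime")  # 15% of total runtime
```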

@aekiss
Author

aekiss commented Mar 27, 2023

Rui also showed that AVX256 is fastest; the older non-LLVM compiler generates faster code; and Sapphire Rapids is faster, probably due to its larger cache and DDR5 memory.

@micaeljtoliveira

I believe the conclusion in the end was that there's no need to recompile the code, so maybe we can close this issue? Note that there's another issue (#20) specifically for scaling and optimization.
