Build with default UM7 optimisation levels set at the top level #170
base: main
|
The model version in the |
|
🚀 Attempted to deploy 🖥️
|
!redeploy |
|
🚀 Attempted to deploy 🖥️
|
|
The last build with |
|
🚀 Attempted to deploy 🖥️
|
…th qopt-zmm-high and always vectorise
|
🚀 Attempted to deploy 🖥️
|
|
Interestingly, this last build (with these changes to UM7 SPR) crashed with this error:
gc_abort (Processor 192): over-writing due to dim_e_out size
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 192 in communicator MPI_COMM_WORLD
Proc: [[62655,1],192]
Errorcode: 9
<snip>
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
libpthread-2.28.s 000015336D23D990 Unknown Unknown Unknown
libopen-pal.so.80 000015336A5B9494 opal_progress Unknown Unknown
libmpi.so.40.40.7 000015336A69FEC2 Unknown Unknown Unknown
libmpi.so.40.40.7 000015336A704481 Unknown Unknown Unknown
libmpi.so.40.40.7 000015336A7463EA Unknown Unknown Unknown
libmpi.so.40.40.7 000015336A6BD824 PMPI_Bcast Unknown Unknown
libmpi_mpifh_Inte 000015336DE19765 PMPI_BCAST Unknown Unknown
um_hg3.exe 0000000001202FFC mpl_bcast 59 mpl_bcast.F90
um_hg3.exe 0000000001202E4B gcg_rvecsumr 218 gcg_rvecsumr.F90

Since the source file only contains 128 lines, and the traceback shows a line number of 218, I made an educated guess that the offending line is this one. (The traceback shows the call to |
…se, and add unroll
|
🚀 Attempted to deploy 🖥️
|
|
Judging by the size of the output file, build 18 has gone past the stage where build 17 crashed. |
|
All three runs finished with deployment 18, but the performance is worse and the results are not bitwise identical. |
|
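For anyone wanting to repeat the bitwise comparison, here is a minimal sketch of one way to do it, by hashing the raw bytes of two output files. The function names and any file paths passed in are hypothetical and not taken from this PR. A whole-file hash can also differ for reasons unrelated to the model fields (for example, a timestamp stored in a file header), so a mismatch here is only a first indication.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def bitwise_identical(path_a, path_b):
    """True when both files contain exactly the same bytes (hypothetical helper)."""
    return sha256_of(path_a) == sha256_of(path_b)
```

Usage would be along the lines of `bitwise_identical(reference_output, candidate_output)`, where both arguments are paths to the same output file from the two runs being compared.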
🚀 Attempted to deploy 🖥️
|
|
Somewhat surprisingly, deployment 19 is by far the slowest of all the working deployments (throughput ~20.1 years/day). Going back to deployment 12, which builds with the compiler pragma updates to UM7, and then adding in the source update for disabling bitwise-repro in GCOM4 (this branch which sets |
|
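For reference, a throughput in years/day is normally the number of simulated years divided by the wall-clock time in days. The sketch below shows that arithmetic only. The function name and the example numbers are hypothetical, and it makes no claim about how the figures in this thread were measured.

```python
SECONDS_PER_DAY = 86_400


def years_per_day(simulated_years, wallclock_seconds):
    """Simulated years completed per wall-clock day (hypothetical helper)."""
    if wallclock_seconds <= 0:
        raise ValueError("wallclock_seconds must be positive")
    return simulated_years * SECONDS_PER_DAY / wallclock_seconds


# Illustrative numbers only: one simulated year finishing in one wall-clock hour.
print(years_per_day(1, 3600))
```

Comparing deployments this way is only meaningful if each run covers the same simulated period on the same node and core count.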
🚀 Attempted to deploy 🖥️
|
Testing for compiler flag updates to SPRs - do not merge (at least yet)
🚀 The latest prerelease access-esm1p6/pr170-22 at f32e5e8 is here: #170 (comment) 🚀