Document running a simple example or test manually #24
I think this can be shown through the tutorial we already have in the GIS4WRF plugin. I can create a new one based on this and have it under the tutorials section in GIS4WRF to avoid creating an additional website for WRF-CMake. How does this sound @zbeekman, @letmaik? |
A brief description with a link to more detailed instructions/documentation is fine, IMO, since the main point of the WRF-CMake project is to provide a new build system for the upstream project. You're certainly welcome to add your own details, but, for someone who has never used WRF before it would be good to point them in the right direction. You're welcome to add more detailed instructions/examples but pointing a user to existing WRF documentation upstream or in other projects would be suitable (for me at least) provided that it is sufficiently clear and up-to-date with this project/work. |
@zbeekman I would normally agree that linking to official WRF docs is the right thing to do. Unfortunately they are huge and, in my opinion, hard to follow. There's no simple end-to-end example as far as I can see. That's why @dmey was suggesting taking the existing GIS4WRF tutorial and adapting it into a non-GUI tutorial. As a side effect we can also point users to the GIS/visualization capabilities of GIS4WRF, which is very helpful for such tutorials. It's a tiny bit of extra work but I think it's worth it. |
Sounds good to me. If there are no easy to follow examples, then it may make sense to provide your own. |
@zbeekman and @letmaik I have updated the README.md with some cleanup and additional information on docs and example usage, as requested. See https://github.com/WRF-CMake/WRF/blob/dmey/docs/README.md. With regards to the tutorial using WRF-CMake on the GIS4WRF website, which I link from the README.md, I am keeping it on a branch at https://github.com/GIS4WRF/gis4wrf.github.io/tree/dmey/tutorials just in case there are additional changes to make; all changes from this branch are incorporated into the live website. @letmaik are you happy with the revised installation section in README.md? I think it is easier and clearer to follow now, especially with the additional brew section, as most macOS and Linux end-users will probably want to install using Homebrew/Linuxbrew since it makes things very straightforward. @letmaik, @zbeekman raised a good point about testing (see 2.). I don't think we have addressed it. Let me know, because this may be a bit tricky to handle if we have to support testing for end users... |
@dmey There is no separate wps formula. This gets installed together with wrf. See the formula for details why. Also, I would split this into two lines and not use Regarding unit testing, well, WRF doesn't really have tests. Running a simulation and looking at the results is the best you will get. |
@zbeekman in the online tutorial I use the |
@letmaik now moved to two lines. Re the other issue I don't think he was referring to unit testing but to what we do already in CI:
|
From the tutorial:
The qgis formula is very complex. I had to Anyway, that's not your problem. However, in the tutorial, if I had known ahead of time just how many packages and duplicate packages were needed for qgis3 via Homebrew, then I probably would have just done Despite this, your plugin and its ability to integrate with and drive WRF looks powerful. I'm hoping these 100 prerequisite packages for qgis will finish installing sometime tonight so that I can finish the tutorial and checkout the plugin. |
My bad! I should have suggested to just use |
# Note: The following involves downloading 1 GB of reference data and running simulations for 10-30min.
git clone https://github.com/WRF-CMake/wats.git
# Install Python packages, either via conda:
conda env create -n wats -f wats/environment.yml
conda activate wats
# Or via pip:
pip install -r wats/requirements.txt
# Run test cases
# E.g. for brew: --wrf-dir $(brew --cellar wrf-cmake)/4.1.0/wrf --wps-dir $(brew --cellar wrf-cmake)/4.1.0/wps
python wats/wats/main.py run --mode wrf --mpi --wrf-dir /path/to/wrf --wps-dir /path/to/wps
# Note: replace Linux with macOS/Windows as appropriate
mv wats/work/output wats_Linux_CMake_Release_dmpar
# Download reference data to compare against
# 1. Go to https://dev.azure.com/WRF-CMake/WRF/_build?definitionId=5
# 2. Select a successful build from Branch "wrf-cmake"
# 3. Click on Summary
# 4. Download wats_Linux_Make_Debug_serial build artifact (~1 GB)
# 5. Extract archive to current folder
# Plots
python wats/wats/plots.py compute wats_Linux_Make_Debug_serial wats_Linux_CMake_Release_dmpar
python wats/wats/plots.py plot --skip-detailed
ls wats/plots
# Compare magnitudes in nrmse.png and ext_boxplot.png with plots in JOSS paper.
@zbeekman This replicates what the CI does. I realize it could be a bit more automated. Also, having to download 1 GB of reference data is not ideal. Still, do you think it is sufficient for the time being? Anything different would probably mean creating new test cases etc., which I'd like to avoid. |
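For readers unsure what the magnitudes in nrmse.png represent, here is a minimal sketch of a normalized root-mean-square error between a reference run and a test run. It is an illustration only: the function name `nrmse`, the choice to normalize by the range of the reference values, and the sample numbers are all assumptions made for this example, and WATS may define and normalize the metric differently.

```python
import math


def nrmse(reference, test):
    """RMSE between two equally long sequences, divided by the reference range.

    Normalizing by the reference range is an assumption for this sketch;
    it is not taken from the WATS source.
    """
    if len(reference) != len(test) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean of the squared differences, then square root gives the RMSE.
    mse = sum((t - r) ** 2 for r, t in zip(reference, test)) / len(reference)
    rmse = math.sqrt(mse)
    value_range = max(reference) - min(reference)
    if value_range == 0.0:
        raise ValueError("reference has zero range; cannot normalize")
    return rmse / value_range


# Made-up temperature-like values, purely for illustration.
reference_values = [280.0, 285.0, 290.0, 295.0]
test_values = [280.1, 284.9, 290.1, 294.9]
print(f"NRMSE: {nrmse(reference_values, test_values):.4f}")
```

A small value means the two runs agree closely relative to the spread of the reference data, which is the sense in which the plots are compared against those in the JOSS paper.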
Yes, this is fine. Sorry I've had to attend to some pressing work stuff the past few days. To finish my review, I think the only remaining tasks are:
I'll do my best to wrap this up tomorrow |
I've documented this now in bb0eb92. |
Hi @zbeekman, have you been able to look at these #24 (comment)? Thanks. Let me know if there's something not clear or that looks like it would take a lot of your time and I can simplify the process! 😃. Thanks |
Sorry for being out of touch, I got inundated. I'm testing on SGI/HPE (Intel) and Cray/Cray right now. The builds seem to get a bit bogged down, I'm hoping I don't have to go get my own node to do the builds, but we'll see. I'm not sure if you're using CMake's standard FindNetCDF or a custom one, but the SGI/HPE machine would populate Also, it looks like WRF doesn't like being compiled against Cray's libhugetlbfs, FWIW.
I've also verified the pre-built binaries seem to work well. I like what you did with packaging the dylibs. That's pretty clever. All that remains is to wait for the SGI/HPE and Cray builds to finish 🤞 and finish going through the example problem (which I'll do while I wait for the builds) |
No problems! We recently changed the way FindNetCDF finds the library as we had a few issues on some systems where NetCDF-C and NetCDF-Fortran used different directories. Are you using:
We updated "Note for HPC users relying on the Modules package" recently... Let me test this on the Cray at my end to see if I also get a few issues; it has been a few weeks since I last tried on Cray. With regards to the search using |
Yes, and that works completely fine. On the cray, those are both the same directories and are set to the output of This is really not an issue at all, merely a suggestion since it is nice to find things to save the user typing and hunting down paths etc. I'm not sure what's going on on my Cray build right now. It's been stuck right after building and linking fftpack (44%). I cloned from master though, so maybe I should go back and grab a tag. Eventually the Cray compiler spit out the following cryptic and confusing message:
But the process doesn't appear to have been killed yet, so I'm gonna let it sit there a little longer. It is a Cray XC40/50 with Intel Xeon E5-2699v4 Broadwell CPUs on standard compute and head nodes, running SLES with 8GB available to users on the head nodes. Now that I've looked up the available memory, I'm guessing I'm spilling into swap or otherwise running out of memory during the build so I'll grab a batch/compute node and try the build there. |
@dmey: I'd give the cray build another shot; for me, it still seems to stall out (albeit, I am on a batch node, now, not a compute node, but I still have more memory here than the login node... I may try to kill this and restart it on the batch node proper...) Here are the modules I have loaded:
And I'm trying to build with:
I'm going to try launching the build on the compute node itself via |
@letmaik just to confirm that the results I'm seeing make sense: Is the % relative error in w high because w is close to zero? (Is w the radial/out-of-plane/vertical direction?) |
Also, unless anyone disagrees, let's close this, I'm satisfied with the tutorial and |
@zbeekman Regarding interpretation of results, w (vertical component of wind velocity) is around 0.01, so yes, but I don't remember exactly what the reason could be. The take away is that you see the same errors with the existing Makefile-based build, and we're not trying to solve that. |
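To illustrate the point about w being close to zero: the same absolute difference gives a much larger relative error when the reference value is small. The sketch below uses made-up numbers. Only the magnitude of w (around 0.01) comes from this thread; the horizontal wind value, the absolute difference, and the function name `relative_error_percent` are assumptions chosen for illustration.

```python
def relative_error_percent(reference, test):
    """Relative error of `test` with respect to `reference`, in percent."""
    if reference == 0.0:
        raise ValueError("relative error is undefined for a zero reference")
    return abs(test - reference) / abs(reference) * 100.0


# Hypothetical absolute difference between two builds (assumed value).
absolute_difference = 0.005

# Hypothetical horizontal wind component (assumed value).
u_reference = 5.0
# Vertical wind component; the thread quotes a magnitude of around 0.01.
w_reference = 0.01

u_error = relative_error_percent(u_reference, u_reference + absolute_difference)
w_error = relative_error_percent(w_reference, w_reference + absolute_difference)

print(f"u: {u_error:.2f} % relative error")
print(f"w: {w_error:.2f} % relative error")
```

With these numbers the relative error for u is about 0.1 % and for w about 50 %, even though the absolute difference is identical. This only shows the arithmetic; it does not explain why the differences arise in the first place.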
Gah, I always forget that CMake doesn't maintain one... Also, newer versions of NetCDF ship a CMake build system which should be capable of installing a CMake Package Config file to export the installed targets, but, IIRC, last time I tried the NetCDF CMake build I hit some errors. In an ideal world, people would have the CMake build generate pkg-config files for NetCDF AND CMake package config files, then stop installing it with auto-tools. I'm not sure that will ever happen though. |
I just read the latest draft of the JOSS paper which reminded me of that fact shortly after I typed that, but thanks! This makes sense to me; most of the time there isn't much vertical convection and the atmosphere is often (usually) vertically stabilized due to the negative lapse rate. (If my memory from the one geophysical fluid dynamics course I took in grad school is accurate.) If the magnitude of I wonder why there is such a big difference between OSes, though. Maybe GCC needs to be compiled differently on macOS? Or maybe it's generating different/strange machine code on macOS?
|
Speculatively, I would attribute the large errors we see in w to convection. I have not looked into this, but given the amount of parameterization involved I am not surprised to see larger errors in the vertical than in the horizontal components. The take-home message is still that after t0 the results deviate from each other, but more due to a change in platform than a change in build system. |
FYI, the build seems to be progressing (VERY SLOWLY) on a compute node on the Cray. I wonder if the Cray compiler is just slower? Or maybe it's doing some very aggressive link-time optimization? I think Crays link statically by default most of the time, so perhaps that paired with link-time optimization makes the linker very slow. Also, I updated the JOSS issue to indicate I'm finished with my review, and recommend (enthusiastically) publication pending merging the PR with contributing guidelines etc. Hopefully this will help spur the other reviewer into action. |
Thanks @zbeekman you are too fast! 😄
Great question! |
With regards to Cray! I have tried to compile
please let me know if you also get the same. In Release, it's just too slow! Waited for 1 hour and still at 44 % so not sure what is going on there. Using Cray Fortran Version 8.5.8. @letmaik I believe that this may actually be an issue with the latest branch as when we tested this a while ago on Cray there were no issues. I do not actually think many use WRF on Cray... |
@dmey Possibly, I wouldn't be surprised if WRF 4.1 introduced new issues with Cray as they don't seem to regularly test on that. wrf/phys/module_mp_jensen_ishmael.F Lines 4515 to 4516 in 795c293
Yep, the comma in 4516 is too much... gfortran is probably more forgiving. |
At one point I saw a similar error, but I'm not 100% sure it was the same. That may actually be an out of memory issue or similar, because when I switched to the compute node, it seems to have gone away. I've been compiling for ~3 hours on a release build. Yeah, 4516 is not valid Fortran syntax. |
I fixed the syntax error. If anyone wants to retry, feel free. |
Takes too long... haha. If upstream isn't testing regularly, I'll assume the CMake build works approximately as well or better (for Cray) based on the evidence I've seen so far. |
Removal of comma in 4516 fixes the issue. I have been able to successfully build |
The JOSS review asks reviewers to verify:
While the primary purpose of this work is to extend the build system and enable and validate portability and build system simplification, which it certainly has done, it would be good to include on the readme additional things, even though they may be more the responsibility of the upstream project. (Of course you can quote with attribution, where appropriate.)
I think at a minimum, there are two pieces of information that need to be included in---or linked to from---the readme:
It's great that the CI pipeline does some assessment & acceptance testing automatically, but it would be beneficial for any human who wishes to verify that their installation is working to run some automated tests themselves, and setup and run a very simple example. Currently, there are no instructions on how to do this.