EAMxx: Option to run SHOC with no SGS variability (1.5 closure) #7188
Conversation
…r 1p5 TKE closure
…nteractively and should be kept for output purposes
Minor formatting issues, otherwise looks good. Preliminary approval to get the CI to run.
Looks good to me. I had a question about one of the functions.
SHOC unit tests failing on both GPU and CPU. CIME cases seem fine.
Total bummer summer. It looks like (from what I can tell) the SHOC property tests are passing, but the b4b tests are failing for the routines I modified. For example, they are all failing with the same error.
Probably related to me adding the shoc_1p5tke bool to the data structure for these tests? Note that this was the part I was least confident about in my changes, so if someone could double check this for me...
@tcclevenger would you be able to check on the test failure that Peter B. posts about above?
@bogensch @AaronDonahue Yeah, it's unit test related, should be a simple fix. I'll take a look.
```cpp
DiagSecondMomentsData(36, 72, 73, 2),
DiagSecondMomentsData(72, 72, 73, 2),
DiagSecondMomentsData(128, 72, 73, 2),
DiagSecondMomentsData(256, 72, 73, 2),
```
@bogensch Isn't this a bool? Why is it being set to 2?
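For context (an editorial aside, not from the thread): C++ accepts this because any nonzero integer implicitly converts to `true` when passed as a `bool`, which is why `2` compiles but obscures the intent at the call site. A minimal illustration, with a hypothetical `takes_flag` function:

```cpp
#include <iostream>

// Illustration only: any nonzero int implicitly converts to true,
// so passing 2 where a bool is expected compiles but reads confusingly.
void takes_flag(bool shoc_1p5tke) {
  std::cout << std::boolalpha << shoc_1p5tke << "\n";
}

int main() {
  takes_flag(2);     // prints "true" -- works, but unclear
  takes_flag(true);  // clearer
}
```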
Thanks @tcclevenger for your help. This issue should be addressed now.
So I have triggered the fails on my machine, and they are purely due to adding a data input in the SHOC unit tests: loading baselines complains that the size of the file data doesn't match the test data struct (which is expected). We can then merge and bless baselines and we are good.

The issue is that this PR could be non-BFB, but because we fail before comparing baselines we would miss it, and we would bless non-BFB changes. The rigorous thing to do would be to remove the new input, verify against baselines, and only then add it back. That's way overkill in my opinion because all the CIME cases are passing, so I vote that I merge and bless. I'll do this if I can get @AaronDonahue or @bartgol to agree. (Or they can point out a flaw in my logic)
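To make the failure mode concrete (a toy sketch with hypothetical layouts, not the actual SHOC test structs): the baseline file was written against the old layout, so a binary read sized for the new layout comes up short and the `nread == sz` check fires before any values are compared.

```cpp
#include <cstdio>

// Hypothetical old layout: what the baseline file was written with.
struct OldData { double a, b, c; };
// Hypothetical new layout: one extra scalar, as in this PR.
struct NewData { double a, b, c; double shoc_1p5tke_flag; };

int main() {
  // Write a baseline with the old layout...
  OldData old_data{1, 2, 3};
  FILE* f = std::fopen("baseline.bin", "wb");
  std::fwrite(&old_data, sizeof old_data, 1, f);
  std::fclose(f);

  // ...then read it back with the new layout: fread returns 0 complete
  // records, which is exactly what the EKAT size check catches.
  NewData new_data{};
  f = std::fopen("baseline.bin", "rb");
  size_t nread = std::fread(&new_data, sizeof new_data, 1, f);
  std::fclose(f);
  std::printf("nread = %zu (expected 1)\n", nread);  // prints nread = 0
}
```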
I'd vote against merging without fixing the underlying issue, and I'd vote to change the cmp so that it only compares content that's supposed to be compared. I ran into similarly annoying stuff in the p3 unit tests, but there the size of the data struct is a parameter in one of the hpp files. This will obviously come up again and again, right?
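One way to read that suggestion (a sketch with a hypothetical struct, not EKAT's actual comparison code): compare named fields within a tolerance rather than the raw byte stream, so adding a new field doesn't invalidate baselines for the fields that are actually under test.

```cpp
#include <cmath>

// Hypothetical test-data struct; the new flag is deliberately
// excluded from the baseline comparison.
struct TestData { double tke, wthl; bool shoc_1p5tke; };

bool same_baseline_fields(const TestData& a, const TestData& b, double tol) {
  // Only compare the fields that existed when the baseline was generated.
  return std::abs(a.tke - b.tke) <= tol &&
         std::abs(a.wthl - b.wthl) <= tol;
}
```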
I now see the offending code is here: https://github.com/E3SM-Project/EKAT/blob/78bdbc996838363b97adcb0427af21603d15b5e1/src/ekat/util/ekat_file_utils.hpp#L21-L25

I would entirely remove the check after fread:

```diff
 template<typename T>
 void read (T* v, size_t sz, const FILEPtr& fid) {
   size_t nread = fread(v, sizeof(T), sz, fid.get());
-  EKAT_REQUIRE_MSG(nread == sz, "read: nread = " << nread << " sz = " << sz);
 }
```

I think ... but since this is an EKAT problem, I don't know how to proceed...
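If deleting the check outright feels too blunt, one alternative (a sketch mirroring the signature above, not an existing EKAT function) would be to return the count and let callers decide whether a short read is fatal:

```cpp
// Sketch only: same shape as the read() above, but reports the number
// of items actually read instead of aborting on a short read.
template<typename T>
size_t read_count (T* v, size_t sz, const FILEPtr& fid) {
  return fread(v, sizeof(T), sz, fid.get());
}
```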
Simply commenting out the check for the file sizes being equal doesn't allow me to test. Next week I plan to manually set this new param outside of the BFB tests so that tests will pass. Later we can think about passing it to the test data structures if we want to test both true and false.
are you sure? it passes locally for me (pm-gpu opt)
@tcclevenger, check this out: https://github.com/E3SM-Project/E3SM/actions/runs/15059769481 (testing this single commit 8cbcc38 on top of @bogensch's branch).

Results: you see actual DIFFs in the comparison, and not just the fread stuff, and I am not sure these are expected (examples below):

example 1

example 2
The example I was testing looked more like nonsense, so I didn't know if the file comparison was misaligned because of the different sizes, but yeah, these look like we are just setting to 0. I'll manually add the new param in one of the tests you posted to see what happens.
@tcclevenger if you are sure that the failure is due to the new param (that is not found in the baselines file), then I would just merge and bless.
I am not sure I understand this comment though. What happens when you comment out the check?
I'm not sure. I would have guessed that it was due to the new param, but it's not clear to me that is the case, so I want to do at least a quick check. When I comment out the check I get a baseline diff which contains wildly different values. This doesn't make sense to me since the CIME cases are BFB. Not sure if it's unit test specific, or if the mismatched file sizes mean that we aren't comparing the correct values.
@mahf708 Manually setting the new param to false in the baseline tests gives matching results.

If we want to go a more rigorous route, I could make a quick PR that only adds the new param to the testing data, merge that and bless baselines, then have this PR actually use the new param.
Yeah, tested the examples and same thing; I'm confident this is BFB. Merging and blessing.
This PR adds an option to run SHOC with no SGS variability. This essentially reduces SHOC to a 1.5 TKE closure in the vertical. If this option is activated (shoc_1p5tke = true):

- The scalar variances and covariances are set to zero.
- The third moment of vertical velocity is set to zero.
- The above two assumptions reduce the assumed PDF to an all-or-nothing closure.
- Since an all-or-nothing closure is used, SHOC's buoyancy flux parameterization becomes ill-posed because the liquid water flux (w'ql') is always zero. The buoyancy flux is needed to close the buoyant production term in the TKE budget. In this case we close the buoyant production of TKE using the local moist Brunt–Väisälä frequency, which follows many 1.5 TKE closures.
- Modifies the definition of the eddy diffusivities to align better with formulations used in a 1.5 scheme.
- Modifies the length scale definition to be exactly that of SAM when a 1.5 TKE closure is activated.

I have verified in several cases, and in one ne30 simulation, that if this option is activated the cloud fraction will always be either 0 or 1 using instantaneous output. Since this scheme is set to false by default, this should be a b4b PR.

[BFB]
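As a rough sketch of the branching described above (variable and function names are illustrative, not the actual EAMxx/SHOC source):

```cpp
// Hypothetical sketch of the shoc_1p5tke option; not the real SHOC code.
void apply_1p5tke_assumptions(bool shoc_1p5tke,
                              double& thl_sec,    // variance of theta_l
                              double& qw_sec,     // variance of q_w
                              double& qwthl_sec,  // covariance of q_w, theta_l
                              double& w3) {       // third moment of w
  if (shoc_1p5tke) {
    // All-or-nothing closure: no SGS variability.
    thl_sec = qw_sec = qwthl_sec = 0.0;
    w3 = 0.0;
    // With w'ql' identically zero, buoyant production of TKE is instead
    // closed with the local moist Brunt-Vaisala frequency, as in many
    // 1.5 TKE closures (schematically: B ~ -K_h * N_moist^2).
  }
  // Otherwise the standard SHOC assumed-PDF pathway is used.
}
```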
@tcclevenger Yeah, the check against baselines is quite "raw": we parse the file in binary form, and compare against expected values. Since the branch has an extra scalar, it probably gobbled down the scalars "shifted by one", causing wildly different values.
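A toy illustration of that shift (hypothetical layout; the real baselines are read field by field): if the branch expects an extra leading scalar, every value after it is read one slot off.

```cpp
#include <cstdio>

int main() {
  // Baseline file written by master as three scalars: {a, b, c}.
  const double written[3] = {1.0, 2.0, 3.0};
  FILE* f = std::fopen("raw.bin", "wb");
  std::fwrite(written, sizeof(double), 3, f);
  std::fclose(f);

  // The branch expects four scalars {flag, a, b, c}, so everything after
  // the new one is read "shifted by one" and the last slot stays zero.
  double shifted[4] = {};
  f = std::fopen("raw.bin", "rb");
  size_t nread = std::fread(shifted, sizeof(double), 4, f);  // nread == 3
  std::fclose(f);
  std::printf("nread=%zu a=%g b=%g c=%g\n",
              nread, shifted[1], shifted[2], shifted[3]);
  // prints: nread=3 a=2 b=3 c=0 -- wildly different from a=1 b=2 c=3
}
```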
Thanks @tcclevenger 👍
Looks like this PR caused some FPE errors: https://my.cdash.org/tests/273678509
Investigating.
Fixes an FPE introduced in #7188, caused by temporary values being computed which attempt to take sqrt(some negative value). These values weren't being used, so this was not an actual bug. [BFB]
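The usual shape of such a fix (a generic sketch, not the actual commit): clamp the argument before the square root so an unused temporary can't trip floating-point exception checks.

```cpp
#include <algorithm>
#include <cmath>

// Generic FPE-avoidance pattern (not the actual commit): when FPE trapping
// is enabled, sqrt of a negative intermediate aborts the run even if the
// result is never used, so clamp the argument first.
inline double safe_sqrt(double x) {
  return std::sqrt(std::max(x, 0.0));
}
```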