-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow wildcard searches when specifying fx variables in preprocessor #1082
Conversation
Hi @thomascrocker, thanks for your contribution! I had a brief look at this and found that our tests for Sorry for any inconvenience! |
Thanks, I saw that some of the tests had failed under the commit test, but didn't understand enough about them to be honest. Happy for you to add whatever you feel is necessary. |
nice work @thomascrocker 👍 |
Ok, I've added a short explanation to the documentation. I realised this functionality affects more than just land / sea masking, (i.e. anywhere that FX files are used by the pre processor) so I've added the same paragraph in all the places I think it is relevant. It could be that I may have missed some though. |
cheers @thomascrocker 👍 Indeed it does but it looks OK to me. Still I'd be happy if @schlunma could have one last look at the tests, just to be 100% sure things work the way they should, one extra pair of technical eyes is always good 🍺 |
frenly ping @schlunma for a final review (tests changed) and @thomascrocker fix the conflict pls 🍺 |
In a perfect world, we wouldn't need this as every model would have it's fx data in the correct place and we wouldn't need wildcards. More pragmatically, I've been using this code for at least 6 months and it has been massively labour saving. I have no doubt that others would find it just as helpful as I have in the short term. In the longer term, while I don't fully understand what @zklaus and @bouweandela are asking for instead of this, it sounds like it may require a significant re-write of good parts of this code. (And may be superseded by up-coming changes in iris?) I think that so long as it's passing tests, and works as described, merging this now might be helpful for a lot of people. It's a fantastic feature that is very powerful and very user friendly. |
The problem with this is that recipes that use this feature are not reproducible and you will never notice. Let me try to get the fx file specification in the recipe right for version v2.6 and if I don't manage by then you can merge this. Does that sound like a compromise? |
As I'm still in the process of using this code, (it's really the only tool for the job), can you please help with the conflicts in _data_finder at some point? There's something weird going on there that's preventing me from running. |
I can have a look at the conflicts, but probably won't have the time to test everything. |
I solved the conflicts now, let me know if the branch works again @ledm |
You're a hero. Thanks @schlunma |
I used this branch without the most recent patch (5c455ed) and with several additional fixes but no significant changes, to produce an analysis over the weekend on jasmin. The preprocessor ran to completion, and everything appears fine in the output text, however all netcdfs in preproc are fully masked and contain no data. It's really weird and I don't get it. Any ideas? Edit: This looks like a different issue. nevermind. |
Hi, was this functionality added to 2.6? I'm still very happy with @schlunma's code and using it a lot. |
@bouweandela, I think @ledm's comment refers to your earlier comment
So... what do you think? Did we make progress? Is there an issue or perhaps even already a PR to track here? |
Only a branch so far, the deadline for the release seemed so far away ;-) I'll try to make some progress today. |
Just fixed the merge conflict and turned the branch into a draft pull request: #1609 |
It seems unlikely that I'll get #1609 finished in time for v2.6, it turns out to be quite a bit of work, but I'm off to a good start so can I ask that you please wait with this for one more release? |
So we haven't merged this yet? I still need to use wildcards for FX files every time that I run ESMValTool, and probably will need to keep doing so for the foreseeable future. What is the suggested best alternative practice instead of using this method? #1609 is still a draft and this #1082 conflicts with main now. |
Yes please, I'm incredibly frustrated with this. FX files are crucial, and yet they have never worked properly or had a flexible enough interface. It's been four years that I've been raising this as an issue and it's still broken and undocumented. The solutions that I wrote for this work in 2021 and earlier no longer work, so now I'm about to submit two different papers on the basis of this PR, and neither of them can be reproduced right now (or re-run!) This is the exact reason why you initially declined this PR - reproducibility of recipes in publications. However, for obvious reasons, I'm extremely keen to not have to re-write everything at the last minute right before submission. |
Also, there's no changes to documentation in #1609 yet, so if you want me to use it, you can expect a lot of emails/whatsapp messages! (Just ask @valeriupredoi about that, lol!) |
I added an example recipe to the description of the pull request. I'm still working on it so expect things to break, especially commits called WIP (Work In Progress), but the example runs if you check out commit 98f052f and set |
Raising an issue is a good start, but in the end, someone needs to do the actual work. Note that the current functionality is actually documented, it just doesn't work very well because it assumes a certain organization of files that just isn't the reality on ESGF. |
Closing this but keeping the branch alive; this has been properly implemented in #1609. |
Description
Closes #1081
Finding data for fx variables is a bit of a pain depending on MIP project.
For CMIP5, fx variables are usually stored under a new ensemble (r0i0p0) although there are a small number of exceptions, see for example,
https://esgf-index1.ceda.ac.uk/esg-search/search/?offset=0&limit=10&type=Dataset&replica=false&latest=true&project=CMIP5&cmor_table=fx&ensemble=r0i0p1%2Cr0i0p141%2Cr0i0p142%2Cr0i0p150%2Cr0i0p151%2Cr0i0p2%2Cr0i0p3%2Cr0i0p4%2Cr0i0p5%2Cr0i0p6%2Cr1i1p1&facets=project%2Cproduct%2Cinstitute%2Cmodel%2Cexperiment%2Cexperiment_family%2Ctime_frequency%2Crealm%2Ccmor_table%2Censemble%2Cvariable%2Cvariable_long_name%2Ccf_standard_name%2Cdata_node&format=application%2Fsolr%2Bjson
For CORDEX: It appears to vary arbitrarily depending on modelling centre. "Invariant fields (frequency=fx) may have the value r0i0p0 or that of the corresponding GCMEnsembleMember attribute" http://is-enes-data.github.io/cordex_archive_specifications.pdf
For CMIP6: I can't find any specifications, but it seems fx files are mostly under individual ensemble memebers, but not sure if this is the case for all models and fx fields.
This fix essentially detects if finding fx files has failed. If it has, it will automatically search again under r0i0p0. This way, the user can still specify explicitly the fx ensemble to use (and ESMValTool will prefer the files stored under the same ensemble as the variable if the user doesn't specify anything), and CMIP6 will work correctly. For CORDEX and CMIP5 the software will happily find the correct files under r0i0p0 and writes a line to the debug logger to record as such.This fix solves the issue by allowing the user to specify FX variables in the recipe using glob style wildcards, this can then be used to allow the pre-processor to find FX data that is stored in different locations.
Before you get started
Checklist
It is the responsibility of the author to make sure the pull request is ready to review. The icons indicate whether the item will be subject to the 🛠 Technical or 🧪 Scientific review.
To help with the number pull requests: