-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support wildcards in the recipe and improve support for ancillary variables and dataset versioning #1609
Conversation
- Use configuration from experimental Output file is now a Path fx file facets are now correctly set when parsing the recipe, no need to do it in data finder anymore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Bouwe again for this impressive PR, all these new features and the refactoring of ancient code is really awesome 🚀
I ran all my tests again, and everything works as expected. 🎉 All remaining problems are well documented, so this PR is ready to be merged from my side 🍻
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing the comments @bouweandela, this PR should definitely be on the highlights for this release!!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for all the work and addressing all the comments Bouwe! That's really great to have this in time for v2.8 👍
Apart from the many new features that this PR offers, there are also nice examples of how to document backward incompatible changes and deprecated features in the description 👍 I have some final tests based on these explanations and all looks good to me.
Can I ask for 2 final quick things?
- removing the branch from
.github/workflows/run-tests.yml
- using the option name
use_legacy_supplementaries
everywhere in the description. Theuse-legacy-ancillaries
is not used and the options with-
are not working if specified in the config file (only_
works).
If no final comments, I'd would propose to merge this PR today at 15:00 CET (14:00 GMT) so that we can have this in main
before the weekend and see how the testing machinery reacts. This would also allow those working on the final v2.8 PRs depending on this one to merge the main
branch and push their PRs to the finish line 👍
It would be interesting to discuss in a wider circle including the @ESMValGroup/scientific-lead-development-team (maybe at a workshop?) where we want to go with these 2 new recipe formats to have a common understanding of which formats should be used for recipes in the main branch. Once this PR is merged (and v2.8 released), two new recipe formats will be allowed:
I understand these two new formats would be extremely useful to compose recipes and to ease reproducibility by recording dataset versions as stated in the docs with this PR. But I wonder about the readability of the new format, particularly the second one. For example, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
many a test was done by me too, and I find this monster to be a good monster, like Godzilla - mega well done @bouweandela and all the other peeps who tested and suggested 🍻 x 10
Thanks to everyone who helped with testing and reviewing this! 🎉 |
Description
Add a dataset class that simplifies dealing with datasets as defined in the recipe.
Goals of this pull request are to:
run
directory, e.g.run/recipe_example_filled.yml
when running a recipe calledrecipe_example.yml
.esmvalcore.dataset.Dataset
class added in Addesmvalcore.dataset
module #1877Example recipe demonstrating the new wildcard features
this will expand to all CMIP6 datasets and attach the
areacella
variable that is most similar to the main variable.Documentation
Temporary
use_legacy_supplementaries
command-line/config-user optionWhen the
fx_variables
keyword is used in a preprocessor function, the tool will automatically run with--use_legacy_supplementaries=True
and use these definitions to define the supplementary datasets in the recipe. To try out the new behaviour, run the tool with--use_legacy_supplementaries=False
.--use_legacy_supplementaries=False
is the default when nofx_variables
are defined in the recipe. This option will be removed inv2.10.0
. To upgrade your recipes, remove all entriesfx_variables
and if necessary, define the ancillaries as described inBackward incompatible changes
There is a new and much smarter system for automatically defining the supplementary variables (ancillary variables and cell measures) needed by preprocessor functions in the recipe. In v2.8 it will be used by default when there are no
fx_variables
defined as preprocessor function arguments in the recipe, in v2.9 it will be used by default in all cases, and in v2.10 it will be the only way. This new system automatically defines supplementary variables using the new wildcards feature, so it will find many more supplementary variables. It will also add only a single supplementary variable, e.g. when a preprocessor function requires cell area, it will addareacello
if the main variable is frommodeling_realm
ocean
,seaIce
, orocnBgchem
, andareacella
in other cases. Previously, both variables were added and downloaded, but then only the one with the matching shape was used. This had the downside that unnecessary data was downloaded and errors in the supplementary variable were silently ignored.In some cases, the new way of adding supplementary variables may not work (yet), specifically when the facet values cannot be read from the path to the file or from ESGF. In that case, the supplementary variables can be specified in the recipe. Here are some examples of where things may not work and how to solve them. If you prefer to postpone upgrading to v2.10, there is also the option to run the tool with
--use-legacy-supplementaries=True
, which will use the previous behaviour.Example with OBS data
When using preprocessor function
weighting_landsea_fraction
with variablecSoil
fromLmon
and a dataset from theOBS
orOBS6
projects with the defaultdrs
setting. This is expected to start working only in v2.9 (see #1609 (comment) for details).the tool will automatically try to use
but that does not work yet because the tool cannot yet read the version from the filename, only from the directory name and it is not used in the directory name with the default
drs
#1943. Therefore, the user needs to specify the supplementary variable like this:Example with a variable that is on an unusual grid
When using the preprocessor function
area_statistics
with variablefgco2
fromOmon
and no supplementary variables defined in the recipe. The tool will automatically add theareacello
variable becausefgco2
is from an ocean realm, so it will replacewith
but that does not work because unusually, the
fgco2
variable ofMIROC-ESM
is not on an ocean grid. To make the recipe work, the user needs to specify the supplementary variable in the recipe for this dataset and tell the tool to not addareacello
automatically using theskip
flag:see Specifying supplementary variables in the recipe and Preprocessor functions that use ancillary variables and cell measures for more information.
Deprecations
fx_variables
for various preprocessor functions that use ancillary variables or cell measures is deprecated and will be removed in v2.10.0. The recommended upgrade procedure is to remove thefx_variables
section. If automatically defining the required supplementary variables does not work, define them in the variable or (additional_)datasets section as described in the documentation.use_legacy_supplementaries
on the command line/config-user.yml keeps support for the old ways of specifying ancillary variables and cell measures in the recipe, i.e. automatically adding some and customizing those with the preprocessor argumentfx_variables
. It is currently enabled only iffx_variables
is used in the recipe, but will be disabled by default in v2.9 and is scheduled for removal in v2.10.add_fx_variables
andremove_fx_variables
are deprecated and will be removed in v2.10. Use the new preprocessor functionsadd_supplementary_variables
andremove_supplementary_variables
instead.callback
argument of the functionesmvalcore.preprocessor.load
is deprecated and will be removed in v2.10 Proposal to deprecate thecallback
argument fromesmvalcore.preprocessor.load
in v2.8 and remove it in v2.10. #1800Please comment if you think this schedule is too fast.
Fixed issues
Closes #56
Closes #377
Closes #589
Closes #1138
Closes #1185
Closes #1297
Closes #1454
Closes #1896
Potential follow ups: #1760, #1891
Before you get started
Checklist
It is the responsibility of the author to make sure the pull request is ready to review. The icons indicate whether the item will be subject to the 🛠 Technical or 🧪 Scientific review.
To help with the number pull requests: