Intake-ESM Integration based on #1218 #2690
base: main
…er.yml, skeleton of intake-esm inclusion following #1218
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #2690 +/- ##
==========================================
- Coverage 95.11% 94.92% -0.20%
==========================================
Files 255 257 +2
Lines 14999 15029 +30
==========================================
- Hits 14267 14266 -1
- Misses 732 763 +31
Great to see progress on this @charles-turner-1!
esmvalcore/config-developer.yml
Outdated
@@ -38,6 +38,34 @@ CMIP6:
SYNDA: '{activity}/{institute}/{dataset}/{exp}/{ensemble}/{mip}/{short_name}/{grid}/{version}'
NCI: '{activity}/{institute}/{dataset}/{exp}/{ensemble}/{mip}/{short_name}/{grid}/{version}'
input_file: '{short_name}_{mip}_{dataset}_{exp}_{ensemble}_{grid}*.nc'
catalogs:
The plan was not to extend config-developer any further, but rather to move this to the new configuration that lives in ~/.config/esmvaltool. See #2371 for an example of what we thought the configuration should look like.
esmvalcore/config-developer.yml
Outdated
- /g/data/oi10/catalog/v2/esm/catalog.json
facets:
# mapping from recipe facets to intake-esm catalog facets
# TODO: Fix these when Gadi is back up
You could also test on DKRZ Levante; the intake catalogs are located at /pool/data/Catalogs/dkrz_cmip6_disk.json
esmvalcore/intake/_dataset.py
Outdated
return ([_CACHE[cat_url] for cat_url in catalog_urls], facet_list)


class IntakeDataset(Dataset):
I'm having some reservations about subclassing the Dataset class for this purpose:
- A typical use case for many of our users will be that they have most data available from a central catalog that is managed by a central administrator, but want to augment that with the ability to download some files themselves. In that case, it is really useful to have the ability to deduplicate (e.g. pick the latest version of a file). I'm not sure if this can be achieved by subclassing the Dataset object.
- We will likely want to add support for other catalogs as well, e.g. intake-esgf, xcube, and STAC. If we need a new Dataset class for each of these, it may become confusing to users.
- How will this work from the recipe?
As an alternative, would it be an option to load the available data sources from the configuration / Dataset.session and then make the Dataset.files method loop over the available sources and deduplicate input files?
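To make the alternative concrete, here is a minimal sketch of looping over several data sources and keeping only the latest version of each file. None of this is ESMValCore API: `FoundFile`, `DataSource`, `StaticSource`, and `find_deduplicated` are hypothetical names, and the sketch assumes that version strings such as `v20200101` sort chronologically as plain text.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Protocol


@dataclass(frozen=True)
class FoundFile:
    """A file located by some data source (hypothetical type)."""

    name: str  # file name used as the deduplication key
    version: str  # dataset version, e.g. "v20200101"
    source: str  # label of the data source that found the file


class DataSource(Protocol):
    """Anything that can look up files for a set of facets (hypothetical)."""

    def find_files(self, **facets: str) -> Iterable[FoundFile]: ...


class StaticSource:
    """Stand-in source that returns a fixed list, ignoring the facets."""

    def __init__(self, files: list[FoundFile]) -> None:
        self._files = list(files)

    def find_files(self, **facets: str) -> Iterable[FoundFile]:
        return list(self._files)


def find_deduplicated(
    sources: Iterable[DataSource], **facets: str
) -> list[FoundFile]:
    """Query every source and keep the latest version of each file name."""
    latest: dict[str, FoundFile] = {}
    for source in sources:
        for found in source.find_files(**facets):
            current = latest.get(found.name)
            # Strictly newer versions replace older ones; on a tie the
            # source listed first wins.
            if current is None or found.version > current.version:
                latest[found.name] = found
    return sorted(latest.values(), key=lambda f: f.name)
```

In a design like this, an intake-esm catalog, a local directory search, and a download client would each be one `DataSource`, so supporting a new catalog type would not require a new Dataset subclass.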
ESMValCore is quite flexible with what facets it accepts. We have a translation between some of 'our' facets and the official ones in the
If these are completely determined by the other facets, you can add them automatically using the extra facets facility
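As an illustration of the facet translation discussed here and in the `facets:` mapping of the diff, a recipe-to-catalog lookup could be sketched as below. The mapping is an assumption based on the CMIP6 controlled-vocabulary column names; the columns of a given catalog may differ, and `CMIP6_FACET_MAP` and `to_catalog_query` are hypothetical names, not part of ESMValCore or intake-esm.

```python
# Assumed mapping from recipe facets to CMIP6 catalog column names.
# A real catalog may use different columns; check its JSON description.
CMIP6_FACET_MAP = {
    "activity": "activity_id",
    "institute": "institution_id",
    "dataset": "source_id",
    "exp": "experiment_id",
    "ensemble": "member_id",
    "mip": "table_id",
    "short_name": "variable_id",
    "grid": "grid_label",
    "version": "version",
}


def to_catalog_query(recipe_facets, facet_map=CMIP6_FACET_MAP):
    """Translate recipe facets into catalog search terms.

    Facets without an entry in ``facet_map`` are skipped rather than
    passed through, so the query only contains known catalog columns.
    """
    return {
        facet_map[key]: value
        for key, value in recipe_facets.items()
        if key in facet_map
    }
```

The resulting dictionary could then be passed as keyword arguments to the catalog's search call, assuming the catalog exposes those columns.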
How about adding a new module called e.g.
Thanks for the review Bouwe, super helpful! I've only had a skim so far, but I'll get those suggestions incorporated next week
Description
config-developer.yml to include intake datasets.
TODO:
- tests/unit/test_dataset.py, or is it preferable to add a new test module? I'll hold off writing these until I work out the facets issue.
- intake submodule, but I could move it into dataset if that's preferable? Also affects previous point.

Have requested a review, but obviously this is nowhere near ready to go on the infrastructure side wrt. tests, etc. A couple pointers in the right direction and that stuff should fly along.
Closes #31
Link to documentation:
Before you get started
Checklist
It is the responsibility of the author to make sure the pull request is ready to review. The icons indicate whether the item will be subject to the 🛠 Technical or 🧪 Scientific review.
To help with the number of pull requests: