Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Moved extraction of reference datasets for (horizontal and vertical) regridding into preprocessor functions #1455

Closed
wants to merge 4 commits into from

Conversation

schlunma
Copy link
Contributor

@schlunma schlunma commented Feb 2, 2022

Description

This PR moves the extraction of reference datasets used for horizontal and vertical regridding to the dedicated preprocessor functions. This allows us to specify reference datasets that need to be (automatically) downloaded first.

In addition, it adds fixes prior to the loading step of the reference dataset for horizontal regridding, similar to what is already done for the target levels for vertical regridding.

Closes #1454
Closes #56

Link to documentation:


Before you get started

Checklist

It is the responsibility of the author to make sure the pull request is ready to review. The icons indicate whether the item will be subject to the 🛠 Technical or 🧪 Scientific review.


To help with the number pull requests:

@schlunma schlunma added the bug Something isn't working label Feb 2, 2022
@schlunma schlunma added this to the v2.5.0 milestone Feb 2, 2022
@schlunma schlunma self-assigned this Feb 2, 2022
@schlunma
Copy link
Contributor Author

schlunma commented Feb 2, 2022

@valeriupredoi @zklaus Before I adapt the tests here, can I briefly get your feedback on this implementation? Does it make sense in your opinion?

@valeriupredoi
Copy link
Contributor

thanks a bunch @schlunma - looking at it now, sorry it escaped my wits yesterday!

Copy link
Contributor

@valeriupredoi valeriupredoi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

draft request changes to a draft PR, not gonna be anal before it's actually RfR 😁

esmvalcore/preprocessor/_regrid.py Outdated Show resolved Hide resolved
----
If ``levels`` is a ``dict`` and it does not contain the key ``filename``,
it is automatically assumed that you specified the target grid with
``start_longitude``, ``end_longitude``, etc. If a valid reference dataset
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait what? A grid is specified by an MxN specification, not by defining a box, am confusado here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah nevermind, it's the _spec_to_latlonvals() stuff - can you add a pointer to that func here maybe, that confused me, prob gonna confuse somebody else too

@valeriupredoi
Copy link
Contributor

so here's the thing - as I understand it, your are now passing all the basic ingredients to the regrid preprocessor, rather than have some bits done first in recipe and then others (like the reference level extraction) be done via the regridder preprocessor - I like it! I like that stuff gets done by the module that needs to do it, and not outside it. What happens if there's an issue with the reference dataset, other than a CMOR issue, that might not even be picked up if the user chooses to run with all CMOR shields down? Will that surface only when the regridder is called? That might be after a lot of stuff has already been computed

@schlunma
Copy link
Contributor Author

schlunma commented Feb 3, 2022

What happens if there's an issue with the reference dataset, other than a CMOR issue, that might not even be picked up if the user chooses to run with all CMOR shields down? Will that surface only when the regridder is called? That might be after a lot of stuff has already been computed

It will surface once the reference dataset has entered the preprocessor chain, so probably a little bit earlier than the actual regridding. I agree that this is not ideal because it also increases the computation times by duplicating the loading for every dataset, but I don't see another way right now.

Something that might be possible is to add an additional step between the check for data availability (and the downloading) and the actual start of the preprocessing, but this is something I can't do for v2.5.

I'm not sure how to proceed. Should I try to get this in or leave it to v2.6 and come up with a cleaner solution?

@valeriupredoi
Copy link
Contributor

thanks for the note, Manu! The question is what's the immediate benefit of this vs the cost it introduces and vs a longer mulling over and possible implementation of a download me/check me/use me function for datasets before the preprocessor starts? I'd also want to hear what @zklaus and @bouweandela think about this 🍺

@schlunma
Copy link
Contributor Author

schlunma commented Feb 3, 2022

The short-term benefit is that the bug mentioned in #1454 is fixed.

But after giving this some more thoughts I think this should be implemented properly, e.g., by adding a function _load_reference_datasets() between the download and run here:

def run(self):
"""Run all tasks in the recipe."""
self.write_filled_recipe()
if not self.tasks:
raise RecipeError('No tasks to run!')
# Download required data
if not self._cfg['offline']:
esgf.download(self._download_files, self._cfg['download_dir'])
self.tasks.run(max_parallel_tasks=self._cfg['max_parallel_tasks'])
self.write_html_summary()

Maybe I can address this for v2.5, but if not we can do a bugfix or even postpone this to v2.6, I don't think it's that urgent.

@valeriupredoi
Copy link
Contributor

thanks, Manu! I am of the personal opinion that this should not be hurried up, but rather done in a take it easy, test proper manner - up to you and the others, am not gonna block progress 🇨🇳 😁

@schlunma schlunma modified the milestones: v2.5.0, v2.6.0 Feb 4, 2022
@schlunma
Copy link
Contributor Author

I cannot finish this PR in time for v2.6, moving this to v2.7.

@schlunma schlunma modified the milestones: v2.6.0, v2.7.0 May 19, 2022
@bouweandela
Copy link
Member

This would also be addressed (differently) in #1609.

@schlunma schlunma closed this Aug 30, 2022
@schlunma schlunma deleted the fix_target_levels_before_using branch March 1, 2023 16:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
3 participants