Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updating ESACCI-WATERVAPOUR cmorizer #3282

Merged
merged 9 commits into from
Mar 8, 2024
Merged

Conversation

malininae
Copy link
Contributor

Hello ESMValTool-ers,

As promised in #3268, here's a pull request to update ESACCI-WATERVAPOUR CMORizer with the officially released CDR2 data as well as addition of daily part to the CMORizer.

Updates include:

  • Update to esmvaltool/cmorizers/data/datasets.yml to reflect the official downloading instructions.
  • Change in the filenames for the esmvaltool/cmorizers/data/cmor_config/ESACCI-WATERVAPOUR.yml to reflect new filenames as well as adding the daily part.
  • Moving the dataset to OBS6 to be able to add prw in Eday table.
  • Minor changes change to esmvaltool/cmorizers/data/formatters/datasets/esacci_watervapour.py to account for the above mentioned updates.

I added @katjaweigel as a reviewer, since she was the original author. Let me know if some other reviewer should be added.

@katjaweigel
Copy link
Contributor

katjaweigel commented Jul 7, 2023

Dear @malininae thanks for adding this! I'll have a detailed look later but here are already two things I saw when looking at it:

  1. For the monthly data there is already version 3.2 now, would it be possible to change to this already (instead of the final version of 3.1) or are there too large differences?
  2. The data is used in some recipes, which are in the ESMValTool main branch:
  • esmvaltool/recipes/recipe_cmug_h2o.yml
  • esmvaltool/recipes/examples/recipe_check_obs.yml
  • several recipes under esmvaltool/recipes/clouds/*.yml

For the first one I'd prefer to change to the more recent data version, since the old data set was preliminary.
For esmvaltool/recipes/clouds/*.yml I'm not sure, therefore I also tag @axel-lauer who is the author of these recipes?
If the old version should be kept for these recipes, the cmorizer should also be able to deal with the old version but have the input for the old version commented out as in esmvaltool/cmorizers/data/cmor_config/WOA.yml, - esmvaltool/recipes/examples/recipe_check_obs.yml should in this case contain both versions.

@malininae
Copy link
Contributor Author

malininae commented Jul 7, 2023

Dear @malininae thanks for adding this! I'll have a detailed look later but here are already two things I saw when looking at it:

  1. For the monthly data there is already version 3.2 now, would it be possible to change to this already (instead of the final version of 3.1) or are there too large differences?
  2. The data is used in some recipes, which are in the ESMValTool main branch:
  • esmvaltool/recipes/recipe_cmug_h2o.yml
  • esmvaltool/recipes/examples/recipe_check_obs.yml
  • several recipes under esmvaltool/recipes/clouds/*.yml

For the first one I'd prefer to change to the more recent data version, since the old data set was preliminary. For esmvaltool/recipes/clouds/*.yml I'm not sure, therefore I also tag @axel-lauer who is the author of these recipes? If the old version should be kept for these recipes, the cmorizer should also be able to deal with the old version but have the input for the old version commented out as in esmvaltool/cmorizers/data/cmor_config/WOA.yml, - esmvaltool/recipes/examples/recipe_check_obs.yml should in this case contain both versions.

@katjaweigel All good points. Could you please refer me to the v3.2 information? At the CDR2 website it says it's v3.1, and in the files attributes the version numbers are 3.1 for both daily and monthly data. As far as I understand, the CDR1 data has v3.2 data (see https://data.ceda.ac.uk/neodc/esacci/water_vapour/data/TCWV-land/L3/v3.2), but CDR2 (global) has only 3.1 for now.
I double checked the recipes you mentioned, they all seem to be using CDR2, so I guess they all still will be v3.1. @axel-lauer, would it be possible for you to re-run your recipe with this updated CCI WV product and check if things have changed? I don't have any computing time neither on DKRZ, nor on JASMIN, and I am too lazy to download the data on our local computers.

Also an unusual for the OBS question, is how to deal with the different CDRs: CDR1 is technically Tier2, CDR2 is Tier3. Should I create a downloader for the CDR1? And like add the commented section for the CDR1 that it is Tier2?

@katjaweigel
Copy link
Contributor

Dear @malininae thanks for your answer! Sorry, I did not see that is was only the land data, where V3.2 is available. I got confused with all these versions.
Sorry, that I did not manage to look at it earlier, I'm on holiday the next week but will continue afterwards.

@valeriupredoi
Copy link
Contributor

Hi @malininae - thnks a lot for your work here 🍺 Looks like you are getting nice feedback from @katjaweigel - cheers, Katja! I can give this a nice technical review and @remi-kazeroni can do the data stuffs and archreview - but in the meantime, please shoten those two lines flake is complaining about, se I can see the tests all green 😁 🟢

@malininae
Copy link
Contributor Author

Hi @malininae - thnks a lot for your work here 🍺 Looks like you are getting nice feedback from @katjaweigel - cheers, Katja! I can give this a nice technical review and @remi-kazeroni can do the data stuffs and archreview - but in the meantime, please shoten those two lines flake is complaining about, se I can see the tests all green 😁 🟢

@valeriupredoi sorry, was on leave! Done. It was the path to the CDR data download, now split into two :-)

@remi-kazeroni
Copy link
Contributor

Sorry, I won't have time to review this myself. I just wanted to suggest to move this dataset from Tier 3 to Tier 2 as the raw data are now publicly available. If that is done, please don't forget to update recipes using this dataset accordingly. @katjaweigel and @axel-lauer, would you agree with making this dataset Tier 2?

@remi-kazeroni remi-kazeroni removed their request for review August 25, 2023 08:50
@katjaweigel
Copy link
Contributor

Dear @remi-kazeroni,
thanks for your message! I hope I can continue the scientific review soon. With Tier 2 there is the issue, mentioned already by @malininae, that the data set is currently partly Tier2 (CDR1, over land) and partly Tier3 (CDR2, global), I'm not sure, how to handle that best? I'd say if it is possible, data should be moved to Tier2.

@axel-lauer
Copy link
Contributor

@katjaweigel and @axel-lauer, would you agree with making this dataset Tier 2?

Yes, chaning from tier 3 to tier 2 would be a good idea. Btw, a more general strategy how to implement such changes of the tier for obs datasets is (probably) one of the agenda items of the upcoming "ESMValTool proto steering group" meeting. Hopefully, we will have some recommendations how to do this after the meeting.

@LisaBock
Copy link
Contributor

Hi @malininae !
As I need to use daily ESACCI-WATERVAPOUR CDR2 data right now, I would like to know how far you proceed with this PR. If helpful I would be able to add the usage of daily data to the cmorizer.
And it seems that in the existing recipe only CDR2 data is used. Are you also interested in CDR1 data or should we maybe skip that?

@malininae
Copy link
Contributor Author

Hi @malininae ! As I need to use daily ESACCI-WATERVAPOUR CDR2 data right now, I would like to know how far you proceed with this PR. If helpful I would be able to add the usage of daily data to the cmorizer. And it seems that in the existing recipe only CDR2 data is used. Are you also interested in CDR1 data or should we maybe skip that?

@LisaBock, thanks for flagging it! I was working on other stuff and it was off my radar for a bit. As far as I understood, there is still need of scientific review from @katjaweigel, she mentioned at this monthly meeting that it is still on her to-do list.

As I said before, the changes to the formatter were not very big. I used the data for in my analysis, they seemed to look good in comparison to ERA, but we haven't published the results yet, for both of us it is a side project. Furthermore, based on the above comments I didn't get if I need to move the CDR2 data to the Tier2, although the data is not publicly available? Should one maybe leave a comment/warning somewhere? @axel-lauer did you reach a consensus on that part at the meeting above?

Answering your question on CDR1, I personally don't need the data, since there is a global product in the CDR2, I think it has more applications, however, the CDR1 part was in the original cmorizer, so I decided to leave it in. At the same time, if others think that removing CDR1 one makes sense, it would make my life so much easier and question of Tiers will be eliminated. Let me know what you think.

@LisaBock
Copy link
Contributor

@LisaBock, thanks for flagging it! I was working on other stuff and it was off my radar for a bit. As far as I understood, there is still need of scientific review from @katjaweigel, she mentioned at this monthly meeting that it is still on her to-do list.

I will do a scientific review now as I will use the data. @katjaweigel is free to do one as well or she could cross it off her to-do list 😃

@LisaBock
Copy link
Contributor

As I said before, the changes to the formatter were not very big. I used the data for in my analysis, they seemed to look good in comparison to ERA, but we haven't published the results yet, for both of us it is a side project. Furthermore, based on the above comments I didn't get if I need to move the CDR2 data to the Tier2, although the data is not publicly available? Should one maybe leave a comment/warning somewhere? @axel-lauer did you reach a consensus on that part at the meeting above?

Regarding the issue if the CDR2 data is Tier2 or Tier3 I found in the Product User Manual:

8.1 Copyright
All intellectual property rights of the CM SAF products belong to EUMETSAT. The use of these
products is granted to every interested user, free of charge. If you wish to use these products
in publications, presentations, web pages etc., EUMETSAT’s copyright credit must be
shown by displaying the words “copyright (year) EUMETSAT” on each of the products
used

and

8.3 Re-distribution of CM SAF data
Please do not re-distribute CM SAF data to 3rd parties. The use of the CM SAF products is
granted free of charge to every interested user, but we have an essential interest to know how
many and what users the CM SAF has. This helps to ensure of the CM SAF operational
services as well as its evolution according to users needs and requirements. Each new user
shall register at CM SAF in order to retrieve the data.

So we need to keep it as Tier3 data for now.

@katjaweigel
Copy link
Contributor

@LisaBock Thanks a lot, I really don't manage to look at it before mid February, so better you speed it up.

@malininae
Copy link
Contributor Author

As I said before, the changes to the formatter were not very big. I used the data for in my analysis, they seemed to look good in comparison to ERA, but we haven't published the results yet, for both of us it is a side project. Furthermore, based on the above comments I didn't get if I need to move the CDR2 data to the Tier2, although the data is not publicly available? Should one maybe leave a comment/warning somewhere? @axel-lauer did you reach a consensus on that part at the meeting above?

Regarding the issue if the CDR2 data is Tier2 or Tier3 I found in the Product User Manual:

8.1 Copyright
All intellectual property rights of the CM SAF products belong to EUMETSAT. The use of these
products is granted to every interested user, free of charge. If you wish to use these products
in publications, presentations, web pages etc., EUMETSAT’s copyright credit must be
shown by displaying the words “copyright (year) EUMETSAT” on each of the products
used

and

8.3 Re-distribution of CM SAF data
Please do not re-distribute CM SAF data to 3rd parties. The use of the CM SAF products is
granted free of charge to every interested user, but we have an essential interest to know how
many and what users the CM SAF has. This helps to ensure of the CM SAF operational
services as well as its evolution according to users needs and requirements. Each new user
shall register at CM SAF in order to retrieve the data.

So we need to keep it as Tier3 data for now.

Thoughts on CDR1? Keeping or ditching? As said earlier, I don't have strong feelings about it.

@LisaBock
Copy link
Contributor

LisaBock commented Feb 7, 2024

Thoughts on CDR1? Keeping or ditching? As said earlier, I don't have strong feelings about it.

As it makes it easier for now and nobody is using CDR1 data I suggest to keep only the CDR2 dataset.

Copy link
Contributor

@LisaBock LisaBock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @malininae for updating and enhancing the cmorizer!

After making a small adjustment regarding the _get_time_bounds funtion it works totally fine for me.

I have some comments to the PR which you could find below or next to the code.

  • As mentioned in the previous comment I would suggest to skip the CDR1 dataset for now. If you agree please delete the lines regarding CDR1.
  • The official citation for the CDR2 dataset is: Schröder, Marc; Danne, Olaf; Falk, Ulrike; Niedorf, Anja; Preusker, Rene; Trent, Tim; Brockmann, Carsten; Fischer, Jürgen; Hegglin, Michaela; Hollmann, Rainer; Pinnock, Simon (2023): A combined high resolution global TCWV product from microwave and near infrared imagers - COMBI, Satellite Application Facility on Climate Monitoring, DOI:10.5676/EUM_SAF_CM/COMBI/V001, https://doi.org/10.5676/EUM_SAF_CM/COMBI/V001. Please update the file references/esacci-watervapour.bibtex.

@malininae malininae requested a review from a team as a code owner February 22, 2024 20:41
@malininae
Copy link
Contributor Author

malininae commented Feb 22, 2024

@LisaBock , should I update the recipes that use ESACCI-WATERVAPOUR? The only issue I won't be able to re-run them.

@LisaBock
Copy link
Contributor

@LisaBock , should I update the recipes that use ESACCI-WATERVAPOUR? The only issue I won't be able to re-run them.

@malininae Yes, I think we should update them. They are mainly from @katjaweigel, so @katjaweigel do you agree that we put the new ESACCI-WATERVAPOUR dataset in your recipes?

I could test them afterwards.

@katjaweigel
Copy link
Contributor

@LisaBock , should I update the recipes that use ESACCI-WATERVAPOUR? The only issue I won't be able to re-run them.

@malininae Yes, I think we should update them. They are mainly from @katjaweigel, so @katjaweigel do you agree that we put the new ESACCI-WATERVAPOUR dataset in your recipes?

I could test them afterwards.

Yes, all of my recipes you can update, thanks! Tell me, if any issues occur.

Copy link
Contributor

@LisaBock LisaBock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@malininae
Copy link
Contributor Author

Thanks @malininae ! Looks all fine now.

I checked it now and the following recipes need to be updated before we could merge:

Shall we do this directly in this PR?

@axel-lauer Are you also okay with us updating your recipes with the official dataset version of ESACCI-WATERVAPOUR?

Sorry it took a while. @LisaBock, would you mind to rerun recipes?

Copy link
Contributor

@LisaBock LisaBock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@malininae I tested the modified parts of the recipes and the results look like before.

@axel-lauer
Copy link
Contributor

@axel-lauer Are you also okay with us updating your recipes with the official dataset version of ESACCI-WATERVAPOUR?

Yes, I am fine changing this in my recipes. Thanks!

@axel-lauer
Copy link
Contributor

Removing the underscores in the version facet of the filename is great! Thanks for addressing this. Once merged, it would be nice to mention this change also in this open issue so ESACCI-WATERVAPOUR can be removed from that list: #3051

@axel-lauer
Copy link
Contributor

Just a thought, but shouldn't we use "tier 2" instead of "tier 3" now that we are using the official and publicly available version of this dataset?

@LisaBock
Copy link
Contributor

LisaBock commented Mar 8, 2024

Just a thought, but shouldn't we use "tier 2" instead of "tier 3" now that we are using the official and publicly available version of this dataset?

@axel-lauer As we discussed earlier in this PR (see <v) we decided to keep Tier3. In the Product User Manual (see below) they explicitly mention that they don't want us to re-distribute the data. Each user should register.

Regarding the issue if the CDR2 data is Tier2 or Tier3 I found in the Product User Manual:

8.1 Copyright
All intellectual property rights of the CM SAF products belong to EUMETSAT. The use of these
products is granted to every interested user, free of charge. If you wish to use these products
in publications, presentations, web pages etc., EUMETSAT’s copyright credit must be
shown by displaying the words “copyright (year) EUMETSAT” on each of the products
used

and

8.3 Re-distribution of CM SAF data
Please do not re-distribute CM SAF data to 3rd parties. The use of the CM SAF products is
granted free of charge to every interested user, but we have an essential interest to know how
many and what users the CM SAF has. This helps to ensure of the CM SAF operational
services as well as its evolution according to users needs and requirements. Each new user
shall register at CM SAF in order to retrieve the data.

@axel-lauer axel-lauer merged commit ec42e89 into main Mar 8, 2024
5 of 6 checks passed
@axel-lauer axel-lauer deleted the update_esacci_watervapour branch March 8, 2024 13:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants