New CH4 lifetime diagnostic #3507
base: main
Conversation
Thanks for this contribution @FranziskaWinterstein! The code looks very good already, I just have a few comments.
Before this can be merged, please
- add some documentation. Your files basically include all relevant information in their docstrings already, so I think you can re-use that by adding a file for your diagnostic to the API doc (see example here), which you can reference from the recipe doc (see example here).
- make sure to delete all unused code (right now, there are lots of commented lines)
- fix all the flake8 issues. This should also fix most of the Codacy issues.
```python
# from sklearn.metrics import r2_score
from scipy.constants import N_A, R, G

from esmvalcore.cmor._fixes.shared import add_model_level
```
I can't find this function in the current ESMValCore version. Also, please do not import private objects (i.e., those that start with a `_`) in your code, as those might change without any prior notice.
Yes, I added this function to my developer version of ESMValCore (branch `cmor-4-lifetime`). Maybe that is not an appropriate place for this function. I need the number of the model level and would like to add it via the preprocessor. However, it gets lost during the diagnostic. I am unsure if I should leave it in the ESMValCore (at some other point) or put it into the lifetime diagnostic itself. What do you think? I thought it might be useful for other applications as well.
Ah, I see, that makes sense. If you use the function in ESMValCore, it's fine to leave it there. You can then just add a public version of it there, which you can import via

```python
from esmvalcore.cmor.fixes import add_model_level
```

If you don't use it in ESMValCore, it's better to just put it in the diagnostic.
Very nice PR @FranziskaWinterstein! I just have a couple of suggestions on how to make the code more efficient.
In addition to these specific comments, you can try to consistently use `Cube.core_data()` instead of `Cube.data` everywhere (similarly, `Coord.core_points()` instead of `Coord.points` for multidimensional coordinates if necessary), and replace `np.` with `da.` (with `import dask.array as da`). This will then hopefully decrease the memory usage of your diagnostic dramatically.
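To illustrate the pattern (with a made-up toy cube, not code from this PR):

```python
import dask.array as da
import iris.cube

# Toy cube backed by a lazy dask array (purely for illustration).
cube = iris.cube.Cube(da.ones((4, 5), chunks=(2, 5)), units='K')

lazy = cube.core_data()    # returns the dask array; nothing is loaded yet
result = da.deg2rad(lazy)  # dask ufunc instead of np.deg2rad, still lazy

print(result.compute())   # values are only computed here
# Accessing cube.data instead would realize everything in memory as numpy.
```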
```python
area = calculate_area(hus.coord('latitude'),
                      hus.coord('longitude'))
```
You can probably replace this with

```python
lat_lon_dims = sorted(
    tuple(set(hus.coord_dims('latitude') + hus.coord_dims('longitude')))
)
lat_lon_slice = next(hus.slices(['latitude', 'longitude'], ordered=False))
area_2d = iris.analysis.cartography.area_weights(lat_lon_slice)
area = broadcast_to_shape(
    da.array(area_2d),
    hus.shape,
    lat_lon_dims,
    chunks=hus.lazy_data().chunksize,
)
```
And remove the custom `calculate_area` code. Note that `area` is now an array, not a cube (but that shouldn't be a problem, I think). In theory you should be able to simply do `area = iris.analysis.cartography.area_weights(hus)`, but that's not fully lazy yet and will return a numpy array instead of a dask array.
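For illustration, a self-contained sketch of the `broadcast_to_shape` pattern the suggestion relies on (the stand-in area array and shapes are made up):

```python
import dask.array as da
import numpy as np
from iris.util import broadcast_to_shape

area_2d = np.ones((3, 4))  # stand-in for area_weights(lat_lon_slice)
target_shape = (10, 3, 4)  # e.g. the shape of a (time, lat, lon) cube

# Dim map (1, 2): axis 0 of area_2d maps to axis 1 of the target shape,
# axis 1 maps to axis 2; the leading (time) dimension is broadcast.
area = broadcast_to_shape(da.from_array(area_2d), target_shape, (1, 2))
print(area.shape)  # (10, 3, 4), still a lazy dask array
```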
```python
press = create_press(temp)

rho = N_A / 10.**6
rho = rho * iris.analysis.maths.divide(press, R * temp)
```
Just out of curiosity: have you tried `rho = rho * press / (R * temp)`?
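(For reference, plain operators dispatch to cube maths; a toy check with made-up cubes:)

```python
import numpy as np
import iris.cube

press = iris.cube.Cube(np.full((2, 3), 1.0e5), units='Pa')
temp = iris.cube.Cube(np.full((2, 3), 288.0), units='K')
R = 8.314

rho = press / (R * temp)  # same result as iris.analysis.maths.divide(...)
print(rho.data[0, 0])
```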
```python
def guess_interfaces(coordinate):
    """
    Calculate the interfaces of a coordinate given its midpoints.

    Parameters
    ----------
    coordinate : iris.cube
        The coordinate array.

    Returns
    -------
    numpy.array
        An array with the length of the coordinate plus one.
    """
    interfaces = 0.5 * (coordinate[0:-1] + coordinate[1:])
    first = 0.5 * (3 * coordinate[0] - coordinate[1])
    last = 0.5 * (3 * coordinate[-1] - coordinate[-2])

    # Check limits
    # if coordinate.name.lower() in ['lat', 'latitude']:
    #     first = np.sign(first) * min([abs(first), 90.])
    #     last = np.sign(last) * min([abs(last), 90.])

    interfaces = np.insert(interfaces, 0, first)
    interfaces = np.append(interfaces, last)

    return interfaces


def calculate_area(latitude, longitude):
    """
    Calculate the area of each grid cell on a rectangular grid.

    Parameters
    ----------
    latitude : iris.cube.Coord
        The latitude coordinate of the grid.

    longitude : iris.cube.Coord
        The longitude coordinate of the grid.

    Returns
    -------
    iris.cube
        An array with the area (in m2) and the input latitude and longitude as
        coordinates.
    """
    r_earth = 6378100.0  # from astropy.constants

    lat_i = np.deg2rad(guess_interfaces(latitude.points))
    lon_i = np.deg2rad(guess_interfaces(longitude.points))

    delta_x = abs(lon_i[1:] - lon_i[:-1])
    delta_y = abs(np.sin(lat_i[1:]) - np.sin(lat_i[:-1]))

    output = np.outer(delta_y, delta_x) * r_earth**2
    output = output.astype('float32')

    result = iris.cube.Cube(output,
                            standard_name='cell_area',
                            long_name='cell_area',
                            var_name='cell_area',
                            units='m2',
                            dim_coords_and_dims=[(latitude, 0),
                                                 (longitude, 1)])

    return result
```
Not necessary anymore (see above)
```python
def dpres_plevel_1d(plev, pmin, pmax):
    """Calculate delta pressure levels.

    The delta pressure levels are based
    on the given pressure level coordinate
    as numpy array (one-dimensional).

    pmax can be set as float or as a multidimensional
    cube. The output of dpres_plevel will have the
    same dimensionality.
    """
    increasing = (np.diff(plev) >= 0).all()
    decreasing = (np.diff(plev) <= 0).all()

    if isinstance(pmax, float):

        dplev = plev.copy()

        for i, lev in enumerate(plev):
            if increasing:
                if lev == min(plev):
                    dplev[i] = (plev[i + 1] - lev) / 2. + (lev - pmin)
                elif lev == max(plev):
                    dplev[i] = (pmax - lev) + (lev - plev[i - 1]) / 2.
                else:
                    dplev[i] = ((plev[i + 1] - lev) / 2.
                                + (lev - plev[i - 1]) / 2.)
            elif decreasing:
                if lev == min(plev):
                    dplev[i] = (lev - pmin) + (plev[i - 1] - lev) / 2.
                elif lev == max(plev):
                    dplev[i] = (lev - plev[i + 1]) / 2. + (pmax - lev)
                else:
                    dplev[i] = ((lev - plev[i + 1]) / 2.
                                + (plev[i - 1] - lev) / 2.)

    elif isinstance(pmax, iris.cube.Cube):

        cubelist = [pmax.copy() for lev in plev]

        for i, lev in enumerate(plev):
            if increasing:
                if lev == min(plev):
                    cubelist[i].data = (lev - pmin) + (plev[i + 1] - lev) / 2.
                    cubelist[i] = (((lev - pmin)
                                    + (plev[i + 1] - lev) / 2.)
                                   * cubelist[i] / cubelist[i])
                elif lev == max(plev):
                    cubelist[i] = (pmax - lev) + (lev - plev[i - 1]) / 2.
                else:
                    cubelist[i] = (((plev[i + 1] - lev) / 2.
                                    + (lev - plev[i - 1]) / 2.)
                                   * cubelist[i] / cubelist[i])
            elif decreasing:
                if lev == min(plev):
                    cubelist[i] = (((lev - pmin)
                                    + (plev[i - 1] - lev) / 2.)
                                   * cubelist[i] / cubelist[i])
                elif lev == max(plev):
                    cubelist[i] = (lev - plev[i + 1]) / 2. + (pmax - lev)
                else:
                    cubelist[i] = (((lev - plev[i + 1]) / 2.
                                    + (plev[i - 1] - lev) / 2.)
                                   * cubelist[i] / cubelist[i])
            cubelist[i].units = 'Pa'
            cubelist[i].var_name = 'air_pressure'
            cubelist[i].add_aux_coord(iris.coords.AuxCoord(lev))
        # merge to single cube
        dplev = iris.cube.CubeList(cubelist).merge_cube()

    else:
        raise NotImplementedError("Function not implemented"
                                  f" for type {type(pmax)}")

    return dplev
```
This is not used anywhere in the code as far as I can tell
```python
cubelist_dplev = [plev_slice.copy()
                  for plev_slice in plev.slices(['time',
                                                 'latitude',
                                                 'longitude'],
                                                ordered=True)]
cubelist_plev = [plev_slice.copy()
                 for plev_slice in plev.slices(['time',
                                                'latitude',
                                                'longitude'],
                                               ordered=True)]

increasing = (plev.coord(z_coord,
                         dim_coords=True).attributes['positive'] == 'down')
last = plev.coords(z_coord)[0].shape[0] - 1

for i, lev in enumerate(cubelist_plev):
    if increasing:
        increment = [i + 1, i - 1]
    else:
        increment = [i - 1, i + 1]

    if i == 0:
        cube = (cubelist_plev[increment[0]] - lev) / 2. + (lev - pmin)
        cubelist_dplev[i].data = cube.data
    elif i == last:
        cube = (pmax - lev) + (lev - cubelist_plev[increment[1]]) / 2.
        cubelist_dplev[i].data = cube.data
    else:
        cube = ((lev - cubelist_plev[increment[0]]) / 2.
                + (cubelist_plev[increment[1]] - lev) / 2.)
        cubelist_dplev[i].data = cube.data

    cubelist_dplev[i].add_aux_coord(iris.coords.AuxCoord(
        lev.coords(z_coord)[0].points[0]))

dplev = iris.cube.CubeList(cubelist_dplev).merge_cube()
dplev.transpose(new_order=[1, 0, 2, 3])
dplev = iris.util.reverse(dplev, 1)
```
Creating these copies of the cubes and then iterating over them is inefficient and expensive. I think it should be possible to directly operate on the arrays here. We have some code here that tries to do that, but unfortunately that uses numpy arrays instead of dask. It basically boils down to creating a copy of the array with indices shifted by one (using `dask.array.roll`) and then adding this to the original array.
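A rough sketch of that idea (the axis number, the stand-in array, and the boundary handling are assumptions, not code from this PR):

```python
import dask.array as da
import numpy as np

# Stand-in for plev.lazy_data(): a (time, lev, lat, lon) pressure array.
plev_data = da.from_array(np.random.rand(2, 4, 3, 3), chunks=-1)

upper = da.roll(plev_data, -1, axis=1)  # neighbouring level on one side
lower = da.roll(plev_data, 1, axis=1)   # neighbouring level on the other

# Interior levels: half the distance to each neighbour, computed lazily in
# one vectorized expression instead of a Python loop over level slices.
dplev = (upper - lower) / 2.0

# da.roll wraps around, so the first and last level along axis 1 still need
# to be overwritten with the pmin/pmax boundary expressions from the loop.
print(dplev.compute().shape)  # (2, 4, 3, 3)
```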
```python
tropopause = tp_clim

z_4d = broadcast_to_shape(
    var.coord(use_z_coord).points,
```
```diff
-    var.coord(use_z_coord).points,
+    var.coord(use_z_coord).lazy_points(),
```

This will always give a dask array.
" have a latitude cooridnate") | ||
|
||
tpp = (300. - 215. * ( | ||
np.cos(np.deg2rad(cube.coord('latitude').points)) ** 2)) * 100. |
```diff
-    np.cos(np.deg2rad(cube.coord('latitude').points)) ** 2)) * 100.
+    da.cos(da.deg2rad(cube.coord('latitude').lazy_points())) ** 2)) * 100.
```
```python
    @staticmethod
    def get_provenance_record(ancestor_files, **kwargs):
        """Create provenance record for the diagnostic data and plots."""
        record = {
            'authors': [
                'vegas-regidor_javier',
            ],
            'references': [
                'acknow_project',
            ],
            'ancestors': ancestor_files,
            **kwargs
        }
        return record

    def get_plot_path(self, plot_type, var_info, add_ext=True):
        """Get plot full path from variable info.

        Parameters
        ----------
        plot_type: str
            Name of the plot
        var_info: dict
            Variable information from ESMValTool
        add_ext: bool, optional (default: True)
            Add filename extension from configuration file.

        """
        return os.path.join(
            self.get_plot_folder(var_info),
            self.get_plot_name(plot_type, var_info, add_ext=add_ext),
        )

    def get_plot_folder(self, var_info):
        """Get plot storage folder from variable info.

        Parameters
        ----------
        var_info: dict
            Variable information from ESMValTool

        """
        info = {
            'real_name': self._real_name(var_info['variable_group']),
            **var_info
        }
        folder = os.path.expandvars(
            os.path.expanduser(
                list(_replace_tags(self.plot_folder, info))[0]
            )
        )
        if self.plot_folder.startswith('/'):
            folder = '/' + folder
        if not os.path.isdir(folder):
            os.makedirs(folder, exist_ok=True)
        return folder

    def get_plot_name(self, plot_type, var_info, add_ext=True):
        """Get plot filename from variable info.

        Parameters
        ----------
        plot_type: str
            Name of the plot
        var_info: dict
            Variable information from ESMValTool
        add_ext: bool, optional (default: True)
            Add filename extension from configuration file.

        """
        info = {
            "plot_type": plot_type,
            'real_name': self._real_name(var_info['variable_group']),
            **var_info
        }
        file_name = list(_replace_tags(self.plot_filename, info))[0]
        if add_ext:
            file_name = self._add_file_extension(file_name)
        return file_name

    @staticmethod
    def _set_rasterized(axes=None):
        """Rasterize all artists and collection of axes if desired."""
        if axes is None:
            axes = plt.gca()
        if not isinstance(axes, list):
            axes = [axes]
        for single_axes in axes:
            for artist in single_axes.artists:
                artist.set_rasterized(True)
            for collection in single_axes.collections:
                collection.set_rasterized(True)

    @staticmethod
    def _real_name(variable_group):
        for subfix in ('Ymean', 'Ysum', 'mean', 'sum'):
            if variable_group.endswith(subfix):
                variable_group = variable_group.replace(subfix, '')
        return variable_group
```
Please check if you really need all these class methods; it seems to me that at least some of them are not used.
…etime calculated separated from the LifetimeBase class
Description
This diagnostic calculates the CH4 lifetime and plots it in several different plot types.
This is work in progress:
- CH4 lifetime is calculated in the troposphere only
- lifetime is plotted as a time series only
- model data on model levels and on pressure levels can be used, but comparability is not fully tested
- calculates a climatological tropopause pressure
- relies on grmass for calculating the CH4 mass; the other option needs debugging
Link to documentation:
Checklist
It is the responsibility of the author to make sure the pull request is ready to review. The icons indicate whether the item will be subject to the 🛠 Technical or 🧪 Scientific review.
New or updated recipe/diagnostic