
New CH4 lifetime diagnostic #3507

Draft · wants to merge 56 commits into main
Conversation

@FranziskaWinterstein (Contributor) commented Jan 23, 2024

Description

This diagnostic calculates the CH4 lifetime and plots it in several different plot types (a rough sketch of the underlying burden-over-loss calculation follows the list below).

This is still work in progress:

  • CH4 lifetime is calculated in the troposphere only

  • the lifetime is plotted as a time series only

  • model data on model levels and on pressure levels can both be used, but their comparability is not fully tested

  • a climatological tropopause pressure is calculated

  • relies on grmass for calculating the CH4 mass; the other option still needs debugging

  • Link to documentation:
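
For orientation, here is a minimal sketch of the usual burden-over-loss definition of the tropospheric CH4 lifetime with respect to OH. The array names, the dummy values, and the Arrhenius rate constant for CH4 + OH are illustrative assumptions, not necessarily what lifetime.py implements:

import numpy as np

# Illustrative values for three tropospheric grid cells (assumed names):
m_ch4 = np.array([1.0e6, 2.0e6, 1.5e6])   # CH4 mass per cell [kg]
n_oh = np.array([1.0e6, 1.2e6, 0.8e6])    # OH number density [molecules cm-3]
temp = np.array([250.0, 270.0, 290.0])    # temperature [K]

k_oh = 2.45e-12 * np.exp(-1775.0 / temp)  # CH4 + OH rate constant [cm3 s-1]
loss = k_oh * n_oh * m_ch4                # CH4 mass loss rate per cell [kg s-1]
tau_years = m_ch4.sum() / loss.sum() / (365.25 * 24.0 * 3600.0)
print(f"Tropospheric CH4 lifetime: {tau_years:.1f} years")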


Checklist

It is the responsibility of the author to make sure the pull request is ready for review. The icons indicate whether the item will be subject to the 🛠 Technical or 🧪 Scientific review.

New or updated recipe/diagnostic

@schlunma (Contributor) left a comment

Thanks for this contribution @FranziskaWinterstein! The code looks very good already, I just have a few comments.

Before this can be merged, please

  • add some documentation. Your files basically include all relevant information in their docstrings already, so I think you can re-use that by adding a file for your diagnostic to the API doc (see example here) which you can reference from the recipe doc (see example here).
  • make sure to delete all unused code (right now, there are lots of commented lines)
  • fix all the flake8 issues. This should also fix most of the Codacy issues.

Six resolved (outdated) review threads on esmvaltool/diag_scripts/lifetime/lifetime.py and esmvaltool/diag_scripts/monitor/multi_datasets.py are collapsed.
# from sklearn.metrics import r2_score
from scipy.constants import N_A, R, G

from esmvalcore.cmor._fixes.shared import add_model_level
Contributor:

I can't find this function in the current ESMValCore version. Also, please do not import private objects (i.e., those that start with a _) in your code, as those might change without any prior notice.

Contributor (Author):

Yes, I added this function to my developer version of ESMValCore (branch cmor-4-lifetime). Maybe that is not an appropriate place for it. I need the number of the model level and would like to add it via the preprocessor; however, it gets lost during the diagnostic. I am unsure whether I should leave it in ESMValCore (at some other place) or put it into the lifetime diagnostic itself. What do you think? I thought it might be useful for other applications as well.

Contributor:

Ah, I see, that makes sense. If you use the function in ESMValCore, it's fine to leave it there. You can then just add a public version of it here, which you can import via

from esmvalcore.cmor.fixes import add_model_level

If you don't use it in ESMValCore, it's better to just put it in the diagnostic.
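
As an aside, a rough sketch of what such a public helper could look like is given below. This is a hypothetical illustration (name, signature, and details are assumptions), not the actual implementation from the cmor-4-lifetime branch:

import iris.coords
import numpy as np


def add_model_level_number(cube, z_coord_name):
    """Attach a 'model_level_number' aux coord along the vertical dimension.

    Hypothetical sketch only; the real add_model_level may differ.
    """
    (z_dim,) = cube.coord_dims(cube.coord(z_coord_name))
    levels = np.arange(1, cube.shape[z_dim] + 1, dtype=np.int32)
    cube.add_aux_coord(
        iris.coords.AuxCoord(
            levels,
            standard_name='model_level_number',
            var_name='model_level_number',
            units='1',
        ),
        z_dim,
    )
    return cube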

@schlunma (Contributor) left a comment

Very nice PR @FranziskaWinterstein! I just have a couple of suggestions on how to make the code more efficient.

In addition to these specific comments, you can try to consistently use Cube.core_data() instead of Cube.data everywhere (and similarly Coord.core_points() instead of Coord.points for multidimensional coordinates if necessary), and replace np. with da. (with import dask.array as da). This will hopefully decrease the memory usage of your diagnostic dramatically.
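
As a rough illustration of the kind of change meant here (the cube and variable names are placeholders, not necessarily those used in lifetime.py):

import dask.array as da

# Lazy data access instead of Cube.data / Coord.points:
press_data = press.core_data()                      # dask array, not realised
temp_data = temp.core_data()
lat_points = press.coord('latitude').core_points()  # stays lazy if stored lazily

# numpy ufuncs replaced by their dask counterparts:
weights = da.cos(da.deg2rad(lat_points))
number_density = N_A / 1.0e6 * press_data / (R * temp_data)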

Comment on lines 69 to 70
area = calculate_area(hus.coord('latitude'),
                      hus.coord('longitude'))

You can probably replace this with

Suggested change
-area = calculate_area(hus.coord('latitude'),
-                      hus.coord('longitude'))
+lat_lon_dims = sorted(
+    tuple(set(hus.coord_dims('latitude') + hus.coord_dims('longitude')))
+)
+lat_lon_slice = next(hus.slices(['latitude', 'longitude'], ordered=False))
+area_2d = iris.analysis.cartography.area_weights(lat_lon_slice)
+area = broadcast_to_shape(
+    da.array(area_2d),
+    hus.shape,
+    lat_lon_dims,
+    chunks=hus.lazy_data().chunksize,
+)

And remove the custom calculate_area code. Note that area is now an array, not a cube (but that shouldn't be a problem I think).

In theory you should be able to simply do area = iris.analysis.cartography.area_weights(hus), but that's not fully lazy yet and will return a numpy array instead of a dask array.

press = create_press(temp)

rho = N_A / 10.**6
rho = rho * iris.analysis.maths.divide(press, R * temp)

Just out of curiosity: have you tried rho = rho * press / (R * temp)?
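
For reference, the operator form would look like this (a sketch assuming press and temp are iris cubes with lazy data; cube operators are equivalent to the iris.analysis.maths.divide() call and keep the computation lazy):

rho = N_A / 10.**6
rho = rho * press / (R * temp)   # same result as the divide() call above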

Comment on lines 177 to 245


def guess_interfaces(coordinate):
    """
    Calculate the interfaces of a coordinate given its midpoints.

    Parameters
    ----------
    coordinate : iris.cube
        The coordinate array.

    Returns
    -------
    numpy.array
        An array with the length of the coordinate plus one.
    """
    interfaces = 0.5 * (coordinate[0:-1] + coordinate[1:])
    first = 0.5 * (3 * coordinate[0] - coordinate[1])
    last = 0.5 * (3 * coordinate[-1] - coordinate[-2])

    # Check limits
    # if coordinate.name.lower() in ['lat', 'latitude']:
    #     first = np.sign(first) * min([abs(first), 90.])
    #     last = np.sign(last) * min([abs(last), 90.])

    interfaces = np.insert(interfaces, 0, first)
    interfaces = np.append(interfaces, last)

    return interfaces


def calculate_area(latitude, longitude):
    """
    Calculate the area of each grid cell on a rectangular grid.

    Parameters
    ----------
    latitude : iris.cube.Coord
        The latitude coordinate of the grid.

    longitude : iris.cube.Coord
        The longitude coordinate of the grid.

    Returns
    -------
    iris.cube
        An array with the area (in m2) and the input latitude and longitude as
        coordinates.
    """
    r_earth = 6378100.0  # from astropy.constants

    lat_i = np.deg2rad(guess_interfaces(latitude.points))
    lon_i = np.deg2rad(guess_interfaces(longitude.points))

    delta_x = abs(lon_i[1:] - lon_i[:-1])
    delta_y = abs(np.sin(lat_i[1:]) - np.sin(lat_i[:-1]))

    output = np.outer(delta_y, delta_x) * r_earth**2
    output = output.astype('float32')

    result = iris.cube.Cube(output,
                            standard_name='cell_area',
                            long_name='cell_area',
                            var_name='cell_area',
                            units='m2',
                            dim_coords_and_dims=[(latitude, 0),
                                                 (longitude, 1)])

    return result

Suggested change
(The suggestion deletes guess_interfaces() and calculate_area() entirely.)
Not necessary anymore (see above)

Comment on lines 297 to 371
def dpres_plevel_1d(plev, pmin, pmax):
    """Calculate delta pressure levels.

    The delta pressure levels are based
    on the given pressure level coordinate
    as numpy array (one-dimensional).

    pmax can be set as float or as a multidimensional
    cube. The output of dpres_plevel will have the
    same dimensionality.
    """
    increasing = (np.diff(plev) >= 0).all()
    decreasing = (np.diff(plev) <= 0).all()

    if isinstance(pmax, float):

        dplev = plev.copy()

        for i, lev in enumerate(plev):
            if increasing:
                if lev == min(plev):
                    dplev[i] = (plev[i + 1] - lev) / 2. + (lev - pmin)
                elif lev == max(plev):
                    dplev[i] = (pmax - lev) + (lev - plev[i - 1]) / 2.
                else:
                    dplev[i] = ((plev[i + 1] - lev) / 2.
                                + (lev - plev[i - 1]) / 2.)
            elif decreasing:
                if lev == min(plev):
                    dplev[i] = (lev - pmin) + (plev[i - 1] - lev) / 2.
                elif lev == max(plev):
                    dplev[i] = (lev - plev[i + 1]) / 2. + (pmax - lev)
                else:
                    dplev[i] = ((lev - plev[i + 1]) / 2.
                                + (plev[i - 1] - lev) / 2.)

    elif isinstance(pmax, iris.cube.Cube):

        cubelist = [pmax.copy() for lev in plev]

        for i, lev in enumerate(plev):
            if increasing:
                if lev == min(plev):
                    cubelist[i].data = (lev - pmin) + (plev[i + 1] - lev) / 2.
                    cubelist[i] = (((lev - pmin)
                                    + (plev[i + 1] - lev) / 2.)
                                   * cubelist[i] / cubelist[i])
                elif lev == max(plev):
                    cubelist[i] = (pmax - lev) + (lev - plev[i - 1]) / 2.
                else:
                    cubelist[i] = (((plev[i + 1] - lev) / 2.
                                    + (lev - plev[i - 1]) / 2.)
                                   * cubelist[i] / cubelist[i])
            elif decreasing:
                if lev == min(plev):
                    cubelist[i] = (((lev - pmin)
                                    + (plev[i - 1] - lev) / 2.)
                                   * cubelist[i] / cubelist[i])
                elif lev == max(plev):
                    cubelist[i] = (lev - plev[i + 1]) / 2. + (pmax - lev)
                else:
                    cubelist[i] = (((lev - plev[i + 1]) / 2.
                                    + (plev[i - 1] - lev) / 2.)
                                   * cubelist[i] / cubelist[i])
            cubelist[i].units = 'Pa'
            cubelist[i].var_name = 'air_pressure'
            cubelist[i].add_aux_coord(iris.coords.AuxCoord(lev))
        # merge to single cube
        dplev = iris.cube.CubeList(cubelist).merge_cube()

    else:
        raise NotImplementedError("Function not implemented"
                                  f" for type {type(pmax)}")

    return dplev

This is not used anywhere in the code as far as I can tell

Comment on lines 256 to 293
cubelist_dplev = [plev_slice.copy()
                  for plev_slice in plev.slices(['time',
                                                 'latitude',
                                                 'longitude'],
                                                ordered=True)]
cubelist_plev = [plev_slice.copy()
                 for plev_slice in plev.slices(['time',
                                                'latitude',
                                                'longitude'],
                                               ordered=True)]

increasing = (plev.coord(z_coord,
                         dim_coords=True).attributes['positive'] == 'down')
last = plev.coords(z_coord)[0].shape[0] - 1

for i, lev in enumerate(cubelist_plev):
    if increasing:
        increment = [i + 1, i - 1]
    else:
        increment = [i - 1, i + 1]

    if i == 0:
        cube = (cubelist_plev[increment[0]] - lev) / 2. + (lev - pmin)
        cubelist_dplev[i].data = cube.data
    elif i == last:
        cube = (pmax - lev) + (lev - cubelist_plev[increment[1]]) / 2.
        cubelist_dplev[i].data = cube.data
    else:
        cube = ((lev - cubelist_plev[increment[0]]) / 2.
                + (cubelist_plev[increment[1]] - lev) / 2.)
        cubelist_dplev[i].data = cube.data

    cubelist_dplev[i].add_aux_coord(iris.coords.AuxCoord(
        lev.coords(z_coord)[0].points[0]))

dplev = iris.cube.CubeList(cubelist_dplev).merge_cube()
dplev.transpose(new_order=[1, 0, 2, 3])
dplev = iris.util.reverse(dplev, 1)

Creating these copies of the cubes and then iterating over them is inefficient and expensive. I think it should be possible to directly operate on the arrays here. We have some code here that tries to do that, but unfortunately that uses numpy arrays instead of dask. It basically boils down to creating a copy of the array with indices shifted by one (using dask.array.roll) and then adding this to the original array.
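
For illustration, here is a sketch of the roll-based idea under some assumptions (p is a dask array of air pressure whose first axis is the vertical dimension, with pressure increasing along it, and pmin/pmax broadcast against a single level slice); this is not the code from the linked example:

import dask.array as da


def delta_plev(p, pmin, pmax):
    """Layer thickness in pressure, computed lazily with shifted copies."""
    upper = da.roll(p, 1, axis=0)    # p[i - 1] moved to position i
    lower = da.roll(p, -1, axis=0)   # p[i + 1] moved to position i
    dp_interior = (lower - upper) / 2.0
    # The first and last levels wrap around in da.roll, so they are
    # replaced by the boundary formulas used in the original loop:
    dp_first = (p[1] - p[0]) / 2.0 + (p[0] - pmin)
    dp_last = (pmax - p[-1]) + (p[-1] - p[-2]) / 2.0
    return da.concatenate(
        [dp_first[None, ...], dp_interior[1:-1], dp_last[None, ...]], axis=0
    )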

tropopause = tp_clim

z_4d = broadcast_to_shape(
    var.coord(use_z_coord).points,

Suggested change
-    var.coord(use_z_coord).points,
+    var.coord(use_z_coord).lazy_points(),

This will always give a dask array

" have a latitude cooridnate")

tpp = (300. - 215. * (
np.cos(np.deg2rad(cube.coord('latitude').points)) ** 2)) * 100.

Suggested change
-    np.cos(np.deg2rad(cube.coord('latitude').points)) ** 2)) * 100.
+    da.cos(da.deg2rad(cube.coord('latitude').lazy_points())) ** 2)) * 100.

Comment on lines 618 to 875

    @staticmethod
    def get_provenance_record(ancestor_files, **kwargs):
        """Create provenance record for the diagnostic data and plots."""
        record = {
            'authors': [
                'vegas-regidor_javier',
            ],
            'references': [
                'acknow_project',
            ],
            'ancestors': ancestor_files,
            **kwargs
        }
        return record

    def get_plot_path(self, plot_type, var_info, add_ext=True):
        """Get plot full path from variable info.

        Parameters
        ----------
        plot_type: str
            Name of the plot
        var_info: dict
            Variable information from ESMValTool
        add_ext: bool, optional (default: True)
            Add filename extension from configuration file.

        """
        return os.path.join(
            self.get_plot_folder(var_info),
            self.get_plot_name(plot_type, var_info, add_ext=add_ext),
        )

    def get_plot_folder(self, var_info):
        """Get plot storage folder from variable info.

        Parameters
        ----------
        var_info: dict
            Variable information from ESMValTool

        """
        info = {
            'real_name': self._real_name(var_info['variable_group']),
            **var_info
        }
        folder = os.path.expandvars(
            os.path.expanduser(
                list(_replace_tags(self.plot_folder, info))[0]
            )
        )
        if self.plot_folder.startswith('/'):
            folder = '/' + folder
        if not os.path.isdir(folder):
            os.makedirs(folder, exist_ok=True)
        return folder

    def get_plot_name(self, plot_type, var_info, add_ext=True):
        """Get plot filename from variable info.

        Parameters
        ----------
        plot_type: str
            Name of the plot
        var_info: dict
            Variable information from ESMValTool
        add_ext: bool, optional (default: True)
            Add filename extension from configuration file.

        """
        info = {
            "plot_type": plot_type,
            'real_name': self._real_name(var_info['variable_group']),
            **var_info
        }
        file_name = list(_replace_tags(self.plot_filename, info))[0]
        if add_ext:
            file_name = self._add_file_extension(file_name)
        return file_name

    @staticmethod
    def _set_rasterized(axes=None):
        """Rasterize all artists and collection of axes if desired."""
        if axes is None:
            axes = plt.gca()
        if not isinstance(axes, list):
            axes = [axes]
        for single_axes in axes:
            for artist in single_axes.artists:
                artist.set_rasterized(True)
            for collection in single_axes.collections:
                collection.set_rasterized(True)

    @staticmethod
    def _real_name(variable_group):
        for subfix in ('Ymean', 'Ysum', 'mean', 'sum'):
            if variable_group.endswith(subfix):
                variable_group = variable_group.replace(subfix, '')
        return variable_group

Please check whether you really need all these class methods; it seems to me that at least some of them are not used.
