Improve release tools (#3462)
bouweandela authored Dec 20, 2023
1 parent 1c57025 commit 84706a0
Showing 9 changed files with 978 additions and 902 deletions.
1,456 changes: 728 additions & 728 deletions doc/sphinx/source/changelog.rst

Large diffs are not rendered by default.

7 changes: 2 additions & 5 deletions doc/sphinx/source/community/maintainer.rst
Expand Up @@ -8,7 +8,7 @@ the interface of library functions may change when updating to new versions. Thi
withdrawal of datasets used by a recipe can cause an existing recipe to stop working. Such "broken"
recipes might require some work to fix such problems and make the recipe fully functional again.

A first **contact point** for the technical lead development team (`@ESMValGroup/technical-lead-development-team`_) in such cases is the recipe "maintainer". The recipe
A first **contact point** for the technical lead development team (:team:`technical-lead-development-team`) in such cases is the recipe "maintainer". The recipe
maintainer is then asked to check the affected recipe and if possible, fix the problems or work with the technical
lead development team to find a solution. Ideally, a recipe maintainer is able to tell whether the results of a fixed
recipe are scientifically valid and look as expected. Being a recipe maintainer consists of the following tasks:
Expand All @@ -22,10 +22,7 @@ recipe are scientifically valid and look as expected. Being a recipe maintainer
* informing the core development team when no longer available as maintainer

Ideally, a recipe maintainer is named when contributing a new recipe to the ESMValTool. Recipe maintainers are asked to inform
the core development team (`@ESMValGroup/esmvaltool-coreteam`_) when they are no longer able to act as maintainer or when they would like to step down from this duty
the core development team (:team:`esmvaltool-coreteam`) when they are no longer able to act as maintainer or when they would like to step down from this duty
for any reason. The core development team will then try to find a successor. If no recipe maintainer can be found, the
:ref:`policy on unmaintained broken (non-working) recipes<broken-recipe-policy>` might apply, eventually leading to
retirement of the affected recipe.

.. _`@ESMValGroup/technical-lead-development-team`: https://github.com/orgs/ESMValGroup/teams/technical-lead-development-team
.. _`@ESMValGroup/esmvaltool-coreteam`: https://github.com/orgs/ESMValGroup/teams/esmvaltool-coreteam
Expand Up @@ -136,8 +136,8 @@ Recipe output can be copied by doing from the VM:

.. code-block:: bash
nohup rsync -rlt /path_to_testing/esmvaltool_output/* /shared/esmvaltool/v2.x.x/
nohup rsync --exclude preproc/ -rlt /path_to_testing/esmvaltool_output/* /shared/esmvaltool/v2.x.x/
By copying the debug.html and index.html files into /shared/esmvaltool/v2.x.x/, the output
becomes available online, see for `example
<https://esmvaltool.dkrz.de/shared/esmvaltool/v2.7.0>`_.
Expand All @@ -151,13 +151,13 @@ Link the overview webpage to the release issue.
This makes it much easier to ask for feedback from recipe developers and analyse failures.

Results produced with the final ESMValCore release candidate should be put in a VM directory
named after the version number, e.g. ``v2.x.x``.
named after the version number, e.g. ``v2.x.x``.
Once the release process is over, test results produced with previous release candidates can be deleted to save space on the VM.

.. note::

If you wrote recipe runs output to Levante's `/scratch` partition, be aware that
the data will be removed after two weeks, so you will have to quickly move the
the data will be removed after two weeks, so you will have to quickly move the
output data to the VM, using the ``nohup`` command above.

Running the comparison
Expand Down Expand Up @@ -189,15 +189,15 @@ The steps to running the compare tool on the VM are the following:
- prerequisite - install `imagehash`: `pip install imagehash`
- reference run (v2.7.0; previous stable release): `export reference_dir=/work/bd0854/b382109/v270` (contains `preproc/` dirs too, 122 recipes)
- current run (v2.8.0): `export current_dir=path_to_current_run`
- run the :ref:`comparison script<compare_recipe_runs>` with:
- run the :ref:`comparison script<compare_recipe_runs>` with:

.. code-block:: bash
nohup python ESMValTool/esmvaltool/utils/testing/regression/compare.py --reference $reference_dir --current $current_dir > compare_v280_output.txt
Copy the comparison txt file to the release issue.
Some of the recipes will appear as having identical output to the one from previous release.
However, others will need human inspection.
Some of the recipes will appear as having identical output to the one from previous release.
However, others will need human inspection.
Ask the recipe maintainers (`@ESMValGroup/esmvaltool-recipe-maintainers`_) and ESMValTool Development Team (`@ESMValGroup/esmvaltool-developmentteam`_) to provide assistance in checking the results.
Here are some guidelines on how to perform the human inspection:

58 changes: 33 additions & 25 deletions doc/sphinx/source/community/release_strategy/release_strategy.rst
Expand Up @@ -54,34 +54,40 @@ With the following release schedule, we strive to have three releases per year a
Upcoming releases
^^^^^^^^^^^^^^^^^

- 2.10.0 (Release Manager: `Klaus Zimmermann`_)
- 2.11.0 (Release Manager: TBD)

+------------+--------------------------+
| 2023-10-02 |ESMValCore feature freeze |
+------------+--------------------------+
| 2023-10-09 |ESMValCore release |
+------------+--------------------------+
| 2023-10-16 |ESMValTool feature freeze |
+------------+--------------------------+
| 2023-10-23 |ESMValTool release |
+------------+--------------------------+
Planned for February or March 2024

Past releases
^^^^^^^^^^^^^

- 2.10.0 (Release Manager: `Klaus Zimmermann`_)

+------------+------------+----------------------------------------+-------------------------------------+
| Planned | Done | Event | Changelog |
+============+============+========================================+=====================================+
| 2023-10-02 | | ESMValCore `Feature Freeze`_ | |
+------------+------------+----------------------------------------+-------------------------------------+
| 2023-10-09 | 2023-12-19 | :esmvalcore-release:`v2.10.0` released | :ref:`esmvalcore:changelog-v2-10-0` |
+------------+------------+----------------------------------------+-------------------------------------+
| 2023-10-16 | | ESMValTool `Feature Freeze`_ | |
+------------+------------+----------------------------------------+-------------------------------------+
| 2023-10-23 | 2023-12-20 | :release:`v2.10.0` released | :ref:`changelog-v2-10-0` |
+------------+------------+----------------------------------------+-------------------------------------+

- 2.9.0 (Release Manager: `Bouwe Andela`_)

+------------+------------+---------------------------------------------------------------------------------------------+------------------------------------+
| Planned | Done | Event | Changelog |
+============+============+=============================================================================================+====================================+
| 2023-06-05 | | ESMValCore Feature Freeze | |
+------------+------------+---------------------------------------------------------------------------------------------+------------------------------------+
| 2023-06-12 | 2023-07-04 | `ESMValCore Release 2.9.0 <https://github.com/ESMValGroup/ESMValCore/releases/tag/v2.9.0>`_ | :ref:`esmvalcore:changelog-v2-9-0` |
+------------+------------+---------------------------------------------------------------------------------------------+------------------------------------+
| 2023-06-19 | | ESMValTool Feature Freeze | |
+------------+------------+---------------------------------------------------------------------------------------------+------------------------------------+
| 2023-06-26 | 2023-07-06 | `ESMValTool Release 2.9.0 <https://github.com/ESMValGroup/ESMValTool/releases/tag/v2.9.0>`_ | :ref:`changelog-v2-9-0` |
+------------+------------+---------------------------------------------------------------------------------------------+------------------------------------+
+------------+------------+---------------------------------------+-------------------------------------+
| Planned | Done | Event | Changelog |
+============+============+=======================================+=====================================+
| 2023-06-05 | | ESMValCore `Feature Freeze`_ | |
+------------+------------+---------------------------------------+-------------------------------------+
| 2023-06-12 | 2023-07-04 | :esmvalcore-release:`v2.9.0` released | :ref:`esmvalcore:changelog-v2-9-0` |
+------------+------------+---------------------------------------+-------------------------------------+
| 2023-06-19 | | ESMValTool `Feature Freeze`_ | |
+------------+------------+---------------------------------------+-------------------------------------+
| 2023-06-26 | 2023-07-06 | :release:`v2.9.0` released | :ref:`changelog-v2-9-0` |
+------------+------------+---------------------------------------+-------------------------------------+

- 2.8.1 (Bugfix, Release Manager: `Valeriu Predoi`_)

Expand Down Expand Up @@ -287,6 +293,8 @@ These are the detailed steps to take to make a release.
- If a bug is discovered that needs to be fixed before the release, a pull request can be made to the main branch to fix the bug. The person making the pull request can then ask the release manager to cherry-pick that commit into the release branch.
- Update the :ref:`list of broken recipes <broken-recipe-list>` with new recipes that could not be run successfully during the testing.
Open a separate GitHub issue for each failing recipe and assign the next milestone.
Open an overview issue, see :issue:`3484` for an example, and review past overview issues.
Take action to ensure that the broken recipe policy is followed.


#. ESMValCore release
Expand Down Expand Up @@ -339,7 +347,7 @@ Glossary

Feature freeze
~~~~~~~~~~~~~~
The date on which no new features may be submitted for the upcoming release.
The date on which no new features may be submitted for the upcoming release.
After this date, only critical bug fixes can still be included to the :ref:`release_branch`.
Development work can continue in the main branch.
If you are unsure whether new developments could interfere with the release, check with the :ref:`release_manager`.
Expand Down Expand Up @@ -411,7 +419,7 @@ All tests should pass before making a release (branch).
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The version number is automatically generated from the information provided by
git using [setuptools-scm](https://pypi.org/project/setuptools-scm/), but a
git using `setuptools-scm <https://pypi.org/project/setuptools-scm/>`__, but a
static version number is stored in ``CITATION.cff``.
Make sure to update the version number and release date in ``CITATION.cff``.
See https://semver.org for more information on choosing a version number.
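
Because the version in ``CITATION.cff`` is static, it can be worth checking it before tagging. The following is a minimal sketch, assuming the file carries top-level ``version`` and ``date-released`` keys (standard Citation File Format fields); ``citation_fields`` and the sample text are hypothetical and not part of the release tooling.

```python
# Sketch only: read the static version and release date from CITATION.cff text.
# `citation_fields` is a hypothetical helper; the sample below is made up.


def citation_fields(text: str) -> dict[str, str]:
    """Return the top-level `version` and `date-released` values, if present."""
    wanted = ("version", "date-released")
    fields = {}
    for line in text.splitlines():
        # Split "key: value" at the first colon; indented (nested) keys are
        # skipped because their key still carries leading whitespace.
        key, sep, value = line.partition(":")
        if sep and key in wanted:
            fields[key] = value.strip().strip("'\"")
    return fields


if __name__ == "__main__":
    sample = 'cff-version: 1.2.0\nversion: "v2.10.0"\ndate-released: 2023-12-20\n'
    print(citation_fields(sample))
```

A release manager could compare the returned values against the intended tag and release date before opening the release pull request.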
Expand Down Expand Up @@ -464,8 +472,8 @@ and create the new release from the release branch (i.e. not from ``main``).
The release tag always starts with the letter ``v`` followed by the version
number, e.g. ``v2.1.0``.
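
The tag convention can be checked mechanically. This is a small sketch, assuming tags consist of ``v`` followed by three dot-separated numbers and an optional ``rcN`` suffix for pre-releases; the suffix form and the name ``is_release_tag`` are assumptions for illustration, not something the text above prescribes.

```python
import re

# Assumed pattern: "v", then MAJOR.MINOR.PATCH, then an optional "rcN" suffix.
TAG_PATTERN = re.compile(r"v\d+\.\d+\.\d+(rc\d+)?")


def is_release_tag(tag: str) -> bool:
    """Return True if `tag` follows the assumed release tag convention."""
    return TAG_PATTERN.fullmatch(tag) is not None


if __name__ == "__main__":
    for candidate in ("v2.1.0", "v2.10.0rc1", "2.1.0"):
        print(candidate, is_release_tag(candidate))
```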

6. Mark the release in the main branch
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
6. Merge the release branch back into the main branch
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When the (pre-)release is tagged, it is time to merge the release branch back into `main`.
We do this for two reasons, namely, one, to mark the point up to which commits in `main`
Expand Down
35 changes: 35 additions & 0 deletions doc/sphinx/source/conf.py
Expand Up @@ -63,6 +63,7 @@
extensions = [
'sphinx.ext.autodoc',
'sphinx.ext.doctest',
'sphinx.ext.extlinks',
'sphinx.ext.intersphinx',
'sphinx.ext.todo',
'sphinx.ext.coverage',
Expand Down Expand Up @@ -441,6 +442,40 @@
'sklearn': ('https://scikit-learn.org/stable', None),
}

# -- Extlinks extension -------------------------------------------------------
# See https://www.sphinx-doc.org/en/master/usage/extensions/extlinks.html

extlinks = {
"discussion": (
"https://github.com/ESMValGroup/ESMValTool/discussions/%s",
"Discussion #%s",
),
"issue": (
"https://github.com/ESMValGroup/ESMValTool/issues/%s",
"Issue #%s",
),
"pull": (
"https://github.com/ESMValGroup/ESMValTool/pull/%s",
"Pull request #%s",
),
"release": (
"https://github.com/ESMValGroup/ESMValTool/releases/tag/%s",
"ESMValTool %s",
),
"esmvalcore-release": (
"https://github.com/ESMValGroup/ESMValCore/releases/tag/%s",
"ESMValCore %s",
),
"team": (
"https://github.com/orgs/ESMValGroup/teams/%s",
"@ESMValGroup/%s",
),
"user": (
"https://github.com/%s",
"@%s",
),
}
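
As a rough illustration of how these entries are used: with ``sphinx.ext.extlinks``, a role such as :issue:`3484` substitutes its target into both the URL pattern and the caption pattern. The sketch below mimics only that substitution in plain Python; ``expand_extlink`` and the trimmed ``EXTLINKS`` dictionary are hypothetical names for illustration and are not part of Sphinx or of ``conf.py``.

```python
# Illustration only: mimics the "%s" substitution applied to a role target.
# `expand_extlink` is a hypothetical helper, not a Sphinx API.
EXTLINKS = {
    "issue": (
        "https://github.com/ESMValGroup/ESMValTool/issues/%s",
        "Issue #%s",
    ),
    "team": (
        "https://github.com/orgs/ESMValGroup/teams/%s",
        "@ESMValGroup/%s",
    ),
}


def expand_extlink(role: str, target: str) -> tuple[str, str]:
    """Return the (url, caption) pair for a role such as :issue:`3484`."""
    url_pattern, caption_pattern = EXTLINKS[role]
    return url_pattern % target, caption_pattern % target


if __name__ == "__main__":
    print(expand_extlink("issue", "3484"))
    print(expand_extlink("team", "esmvaltool-coreteam"))
```

This is why the hand-written link targets in ``maintainer.rst`` could be dropped: the role alone carries enough information to build both the link and its visible text.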

# -- Custom Document processing ----------------------------------------------

import gensidebar
32 changes: 15 additions & 17 deletions doc/sphinx/source/utils.rst
Expand Up @@ -118,11 +118,11 @@ Running multiple recipes
It is possible to run more than one recipe in one go.

This can for example be achieved by using ``rose`` and/or ``cylc``, tools
that may be available at your local HPC cluster.
that may be available at your local HPC cluster.

In the case in which neither ``rose`` nor ``cylc`` are available at your HPC cluster,
it is possible to automatically generate job submission scripts, as well as a summary of the
job outputs using the scripts available in
job outputs using the scripts available in
`esmvaltool/utils/batch-jobs <https://github.com/ESMValGroup/ESMValTool/blob/main/esmvaltool/utils/batch-jobs>`__.

Using cylc
Expand Down Expand Up @@ -218,7 +218,7 @@ a copy of `u-bd684` is always located in ``/home/users/valeriu/roses/u-bd684`` o
Using the scripts in `utils/batch-jobs`
---------------------------------------

In `utils/batch-jobs <https://github.com/ESMValGroup/ESMValTool/blob/main/esmvaltool/utils/batch-jobs>`_,
In `utils/batch-jobs <https://github.com/ESMValGroup/ESMValTool/blob/main/esmvaltool/utils/batch-jobs>`_,
you can find a script to generate slurm submission scripts for all available recipes in ESMValTool,
as well as a script to parse the job outputs.

Expand All @@ -227,15 +227,15 @@ as well as a script to parse the job outputs.
Using `generate.py`
...................

The script `generate.py <https://github.com/ESMValGroup/ESMValTool/blob/main/esmvaltool/utils/batch-jobs/generate.py>`_,
The script `generate.py <https://github.com/ESMValGroup/ESMValTool/blob/main/esmvaltool/utils/batch-jobs/generate.py>`_,
is a simple Python script that creates slurm submission scripts, and
if configured, submits them to the HPC cluster. It has been tested in `DKRZ's Levante cluster <https://docs.dkrz.de/doc/levante/index.html>`_.

The following parameters have to be set in the script in order to make it run:

* ``env``, *str*: Name of the conda environment in which `esmvaltool` is installed.
* ``mail``, *bool*: Whether or not to recieve mail notifications when a submitted job fails or finishes successfully. Default is ``False``.
* ``submit``, *bool*: Wheter or not to automatically submit the job after creating the launch script. Default value is ``False``.
* ``mail``, *bool*: Whether or not to receive mail notifications when a submitted job fails or finishes successfully. Default is ``False``.
* ``submit``, *bool*: Whether or not to automatically submit the job after creating the launch script. Default value is ``False``.
* ``account``, *str*: Name of the DKRZ account in which the job will be billed.
* ``outputs``, *str*: Name of the directory in which the job outputs (.out and .err files) are going to be saved. The outputs will be saved in `/home/user/<outputs>`.
* ``conda_path``, *str*: Full path to the `mambaforge/etc/profile.d/conda.sh` executable.
Expand All @@ -247,10 +247,10 @@ Optionally, the following parameters can be edited:
* ``memory``, *str*: Amount of memory requested for each run. Default is ``64G`` to allow to run 4 recipes on the same node in parallel.
* ``time``, *str*: Time limit. Default is ``04:00:00`` to increase the job priority. Jobs can run for up to 8 hours and 12 hours on the compute and interactive partitions, respectively.
* ``default_max_parallel_tasks``, *int*: Default is ``8`` which works for most recipes. For other cases, an entry needs to be made to the ``MAX_PARALLEL_TASKS`` dictionary (see below).

The script will generate a submission script for all recipes using by default the ``interactive`` queue and with a time limit of 4h. In case a recipe
may require additional resources, they can be defined in the ``SPECIAL_RECIPES`` dictionary. The recipe name has to be given as a ``key`` in which the
values are another dictionary.
values are another dictionary.
The latter are used to specify the ``partition`` in which to submit the recipe, the new ``time`` limit and other ``memory`` requirements
given by the slurm flags ``--mem``, ``--constraint`` or ``--ntasks``. In general, an entry in ``SPECIAL_RECIPES`` should be set as:

Expand Down Expand Up @@ -284,17 +284,15 @@ Using `parse_recipes_outputs`
You can run this script (simply as a standalone Python script) after all recipes have been run, to gather a bird's eye view
of the run status for each recipe; running the script provides you with a Markdown-formatted list of recipes that succeeded,
recipes that failed due to a diagnostic error, and recipes that failed due to missing data (the two most common causes for
recipe run failure). You should add a ``SLURM_OUT_DIR`` e.g. ``SLURM_OUT_DIR = "/home/b/b382109/output_v270"`` - this is the
physical location of your SLURM output, after all recipes have finished running and a ``GLOB_PATTERN``, a glob pattern,
which is reccommended to be set to the ``*.out`` extension, so that the script finds all the ``.out`` files.

To keep the script execution fast, it is recommended to use ``log_level: info`` in your config-user.yml file so that SLURM
output files are rather small. This script also requires a list of recipes stored in a ``all_recipes.txt`` file, which can
be obtained by running:
recipe run failure). You should provide the location of the output log files from SLURM (``*.out`` and ``*.err``) to the
script as well as a list of all available recipes. To generate the list, run the command:

.. code-block:: bash
for recipe in $(esmvaltool recipes list | grep '\.yml$'); do echo "$recipe"; done > all_recipes.txt
for recipe in $(esmvaltool recipes list | grep '\.yml$'); do echo $(basename "$recipe"); done > all_recipes.txt
To keep the script execution fast, it is recommended to use ``log_level: info`` in your config-user.yml file so that SLURM
output files are rather small.
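
The point of the ``basename`` call in the updated command is that ``all_recipes.txt`` ends up holding bare file names rather than relative paths. A rough Python equivalent of that filtering step is sketched below; ``recipe_basenames`` and the sample listing are hypothetical, and the real output of ``esmvaltool recipes list`` may be formatted differently.

```python
from pathlib import PurePosixPath

# Sketch only: approximates `grep '\.yml$'` followed by `basename` on each
# entry. `recipe_basenames` is a hypothetical helper, not part of ESMValTool.


def recipe_basenames(listing: str) -> list[str]:
    """Keep whitespace-separated entries ending in .yml, reduced to file names."""
    names = []
    for token in listing.split():
        if token.endswith(".yml"):
            names.append(PurePosixPath(token).name)
    return names


if __name__ == "__main__":
    sample_listing = "Installed recipes\nexamples/recipe_python.yml\nrecipe_a.yml\n"
    print("\n".join(recipe_basenames(sample_listing)))
```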

.. _overview_page:

Expand Down Expand Up @@ -323,7 +321,7 @@ Comparing recipe runs
A command-line tool is available for comparing one or more recipe runs to
known good previous run(s).
This tool uses `xarray <https://docs.xarray.dev/en/stable/>`_ to compare NetCDF
files and difference hasing provided by
files and difference hashing provided by
`imagehash <https://pypi.org/project/ImageHash/>`_ to compare PNG images.
All other file types are compared byte for byte.
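
The choice of comparison method per file type can be pictured as a simple dispatch on the file suffix. The sketch below is not the tool's implementation: it assumes NetCDF files end in ``.nc``, only names the strategy for NetCDF and PNG files, and shows the byte-for-byte fallback using the standard library. ``comparison_strategy`` and ``same_bytes`` are hypothetical names.

```python
import filecmp
from pathlib import Path

# Sketch only: hypothetical helpers illustrating the dispatch described above.


def comparison_strategy(path: Path) -> str:
    """Name the comparison method that would apply to `path`."""
    suffix = path.suffix.lower()
    if suffix == ".nc":  # assumption: NetCDF files use the .nc suffix
        return "xarray"
    if suffix == ".png":
        return "imagehash"
    return "bytes"


def same_bytes(reference: Path, current: Path) -> bool:
    """Compare two files byte for byte (contents, not just metadata)."""
    return filecmp.cmp(reference, current, shallow=False)


if __name__ == "__main__":
    print(comparison_strategy(Path("tas.nc")))
    print(comparison_strategy(Path("plot.png")))
    print(comparison_strategy(Path("provenance.xml")))
```

Passing ``shallow=False`` matters here: without it, ``filecmp.cmp`` may treat files with matching size and modification time as equal without reading them.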
