Initialize data version control for managing test images #1036

Merged 24 commits on Mar 18, 2021
Commits
48fb2d9
Initialize data version control
weiji14 Mar 11, 2021
4999875
Set dvc remote as https://dagshub.com/GenericMappingTools/pygmt.dvc
weiji14 Mar 11, 2021
9b61c77
Temporarily installing dvc using pip instead of conda to make CI work
weiji14 Mar 11, 2021
0c35dff
Refactor test_logo to use mpl_image_compare and track png files in dvc
weiji14 Mar 11, 2021
567c967
Add dvc install and dvc pull as a step in ci_tests.yaml to pull in data
weiji14 Mar 12, 2021
7e0940c
Merge branch 'master' into data_version_control
weiji14 Mar 12, 2021
4833466
List files in pygmt directory to see what happens after dvc pull
weiji14 Mar 12, 2021
f0ab167
Do `dvc pull` before `pip install` otherwise test PNGs aren't there
weiji14 Mar 12, 2021
f5e25fe
Merge branch 'master' into data_version_control
weiji14 Mar 15, 2021
6bd7ba9
First draft of instructions for using dvc to store baseline images
weiji14 Mar 15, 2021
e30c708
Instruct to do `git push` first and then `dvc push`
weiji14 Mar 16, 2021
df1ab56
Merge branch 'master' into data_version_control
weiji14 Mar 16, 2021
3208519
New checklist item for maintainers to get added to DAGsHub dvc remote
weiji14 Mar 16, 2021
2bd88c8
Move pygmt/tests/baseline/.gitignore to top-level
weiji14 Mar 17, 2021
93f6d6e
Just use `dvc push` without setting --remote upstream
weiji14 Mar 17, 2021
1f06f9a
Clarify that `git rm -r --cached` only needs to run during migration
weiji14 Mar 17, 2021
e36fd28
Try installing dvc from conda again now that there is a Py3.9 package
weiji14 Mar 17, 2021
f34bb09
Merge branch 'master' into data_version_control
weiji14 Mar 17, 2021
f3aa3c5
Install dvc and do `dvc pull` on GMT dev tests too
weiji14 Mar 17, 2021
af79eef
Refactor test_logo tests to be simpler and more unit-test like
weiji14 Mar 17, 2021
5860a72
Mention dvc status command to see which files need staging
weiji14 Mar 17, 2021
c37bdff
Use images for logo created using GMT 6.1.1
weiji14 Mar 17, 2021
393773b
List only files under pygmt/tests/baseline
weiji14 Mar 18, 2021
14cabd7
Update test_image to use SI units and long aliases
weiji14 Mar 18, 2021
3 changes: 3 additions & 0 deletions .dvc/.gitignore
@@ -0,0 +1,3 @@
/config.local
/tmp
/cache
4 changes: 4 additions & 0 deletions .dvc/config
@@ -0,0 +1,4 @@
[core]
remote = upstream
['remote "upstream"']
url = https://dagshub.com/GenericMappingTools/pygmt.dvc
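
For readers unfamiliar with the layout of `.dvc/config`: it is an INI-style file in which `[core]` names the default remote and a quoted `remote` section holds that remote's URL. The following is a minimal sketch, using only Python's standard `configparser`, of how that lookup can be resolved. The helper name `default_remote_url` is hypothetical and is not part of dvc, which ships its own config loader.

```python
import configparser

# Contents of .dvc/config as added in this pull request.
DVC_CONFIG = """\
[core]
remote = upstream
['remote "upstream"']
url = https://dagshub.com/GenericMappingTools/pygmt.dvc
"""


def default_remote_url(config_text):
    """Return the URL of the remote named under [core] (hypothetical helper)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    name = parser["core"]["remote"]
    # The section header keeps its quotes: 'remote "upstream"'
    section = f"'remote \"{name}\"'"
    return parser[section]["url"]


print(default_remote_url(DVC_CONFIG))
```

In practice `dvc remote list` reports the same information; the sketch only illustrates how the two sections relate.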
3 changes: 3 additions & 0 deletions .dvcignore
@@ -0,0 +1,3 @@
# Add patterns of files dvc should ignore, which could improve
# the performance. Learn more at
# https://dvc.org/doc/user-guide/dvcignore
8 changes: 7 additions & 1 deletion .github/workflows/ci_tests.yaml
@@ -83,7 +83,7 @@ jobs:
       - name: Install dependencies
         run: |
           conda install gmt=6.1.1 numpy pandas xarray netCDF4 packaging \
-                        codecov coverage[toml] ipython make \
+                        codecov coverage[toml] dvc ipython make \
                         pytest-cov pytest-mpl pytest>=6.0 \
                         sphinx-gallery

@@ -109,6 +109,12 @@ jobs:
           touch ~/.gmt/server/gmt_data_server.txt ~/.gmt/server/gmt_hash_server.txt
           ls -lhR ~/.gmt

+      # Pull baseline image data from dvc remote (DAGsHub)
+      - name: Pull baseline image data from dvc remote
+        run: |
+          dvc pull
+          ls -lhR pygmt/tests/baseline/
+
       # Install the package that we want to test
       - name: Install the package
         run: |
15 changes: 11 additions & 4 deletions .github/workflows/ci_tests_dev.yaml
@@ -77,11 +77,12 @@ jobs:
           channels: conda-forge
           miniconda-version: "latest"

-      # Install build dependencies from conda-forge
-      - name: Install build dependencies
+      # Install dependencies from conda-forge
+      - name: Install dependencies
         run: |
-          conda install ninja cmake libblas libcblas liblapack fftw gdal ghostscript \
-                        libnetcdf hdf5 zlib curl pcre ipython pytest pytest-cov pytest-mpl
+          conda install ninja cmake libblas libcblas liblapack fftw gdal \
+                        ghostscript libnetcdf hdf5 zlib curl pcre ipython \
+                        dvc pytest pytest-cov pytest-mpl

       # Build and install latest GMT from GitHub
       - name: Install GMT ${{ matrix.gmt_git_ref }} branch (Linux/macOS)
@@ -113,6 +114,12 @@ jobs:
           touch ~/.gmt/server/gmt_data_server.txt ~/.gmt/server/gmt_hash_server.txt
           ls -lhR ~/.gmt

+      # Pull baseline image data from dvc remote (DAGsHub)
+      - name: Pull baseline image data from dvc remote
+        run: |
+          dvc pull
+          ls -lhR pygmt/tests/baseline/
+
       # Install the package that we want to test
       - name: Install the package
         run: |
3 changes: 3 additions & 0 deletions .gitignore
@@ -44,3 +44,6 @@ doc/tutorials/

 # macOS
 .DS_Store
+
+# Data files (tracked using dvc)
+pygmt/tests/baseline/test_*.png
70 changes: 69 additions & 1 deletion CONTRIBUTING.md
@@ -423,7 +423,75 @@ If it's correct, copy it (and only it) to `pygmt/tests/baseline`.
When you run `make test` the next time, your test should be executed and
passing.

-Don't forget to commit the baseline image as well.
+Don't forget to commit the baseline image as well!
The images should be pushed up into a remote repository using `dvc` (instead of
`git`) as will be explained in the next section.

#### Using data version control ([dvc](https://dvc.org)) to manage test images

Because the baseline images are fairly large binary files that can change often
(e.g. with new GMT versions), storing them in `git` (which is designed for
tracking plain text files) is not ideal. Instead, we use [`dvc`](https://dvc.org),
which works like `git` but for data. `dvc` records the hash (md5sum) of each
tracked file. For example, given an image file like `test_logo.png`, `dvc`
generates a plain text file called `test_logo.png.dvc` containing the hash of
the image. This `test_logo.png.dvc` file is committed to GitHub as usual, while
the `test_logo.png` file itself is stored separately on our `dvc` remote at
https://dagshub.com/GenericMappingTools/pygmt.

To **pull** or sync files from the `dvc` remote to your local repository, use
the commands below. Note how `dvc` commands are very similar to `git`.

    dvc status  # should report any files 'not_in_cache'
    dvc pull  # pull down files from DVC remote cache (fetch + checkout)

Once the sync/download is complete, you should notice two things. There will be
images stored in the `pygmt/tests/baseline` folder (e.g. `test_logo.png`) and
these images are technically reflinks/symlinks/copies of the files under the
`.dvc/cache` folder. You can now run the image comparison test suite as per
usual.

    pytest pygmt/tests/test_logo.py  # run only one test
    make test  # run the entire test suite

To **push** or sync changes from your local repository up to the `dvc` remote
at DAGsHub, you will first need to set up authentication using the commands
below. This only needs to be done once, i.e. the first time you contribute a
test image to the PyGMT project.

    dvc remote modify upstream --local auth basic
    dvc remote modify upstream --local user "$DAGSHUB_USER"
    dvc remote modify upstream --local password "$DAGSHUB_PASS"

The configuration will be stored inside your `.dvc/config.local` file. Note
that the `$DAGSHUB_PASS` token can be generated at
https://dagshub.com/user/settings/tokens after creating a DAGsHub account
(which can be linked to your GitHub account). Once you have an account set up,
please ask one of the PyGMT maintainers to add you as a collaborator at
https://dagshub.com/GenericMappingTools/pygmt/settings/collaboration before
proceeding with the next steps.

The entire workflow for generating or modifying baseline test images can be
summarized as follows:

    # Sync with both git and dvc remotes
    git pull
    dvc pull

    # Generate new baseline images
    pytest --mpl-generate-path=baseline pygmt/tests/test_logo.py
    mv baseline/*.png pygmt/tests/baseline/

    # Generate hash for baseline image and stage the *.dvc file in git
    git rm -r --cached 'pygmt/tests/baseline/test_logo.png'  # only run if migrating existing image from git to dvc
    dvc status  # check which files need to be added to dvc
    dvc add pygmt/tests/baseline/test_logo.png
    git add pygmt/tests/baseline/test_logo.png.dvc

    # Commit changes and push to both the git and dvc remotes
    git commit -m "Add test_logo.png into DVC"
    git push
    dvc push
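
During the migration from `git` to `dvc` it can help to see at a glance which baseline images still lack a `*.png.dvc` pointer file. The snippet below is a rough sketch of such a check using only the Python standard library. The function name `untracked_baselines` is hypothetical, and the check only looks for a sibling pointer file; it is not a substitute for `dvc status`, which also compares hashes.

```python
import tempfile
from pathlib import Path


def untracked_baselines(baseline_dir):
    """List test_*.png files without a sibling '<name>.png.dvc' file."""
    baseline_dir = Path(baseline_dir)
    return sorted(
        png.name
        for png in baseline_dir.glob("test_*.png")
        if not png.with_name(png.name + ".dvc").exists()
    )


# Demonstration on a throwaway directory with stand-in files.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "test_logo.png").write_bytes(b"stand-in")
    (folder / "test_logo.png.dvc").write_text("outs:\n")
    (folder / "test_new_plot.png").write_bytes(b"stand-in")
    print(untracked_baselines(folder))  # only the image without a pointer
```

In the repository this would be called as `untracked_baselines("pygmt/tests/baseline")`, and every file it lists is a candidate for `dvc add`.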

### Documentation

1 change: 1 addition & 0 deletions MAINTENANCE.md
@@ -22,6 +22,7 @@ If you want to make a contribution to the project, see the
## Onboarding Access Checklist

- [ ] Added to [python-maintainers](https://github.com/orgs/GenericMappingTools/teams/python-maintainers) team in the [GenericMappingTools](https://github.com/orgs/GenericMappingTools/teams/) organization on GitHub (gives 'maintain' permissions)
- [ ] Added as collaborator on [DAGsHub](https://dagshub.com/GenericMappingTools/pygmt/settings/collaboration) (gives 'write' permission to dvc remote storage)
- [ ] Added as moderator on [GMT forum](https://forum.generic-mapping-tools.org) (to see mod-only discussions)
- [ ] Added as member on the [PyGMT devs Slack channel](https://pygmtdevs.slack.com) (for casual conversations)
- [ ] Added as maintainer on [PyPI](https://pypi.org/project/pygmt/) and [Test PyPI](https://test.pypi.org/project/pygmt) [optional]
1 change: 1 addition & 0 deletions environment.yml
@@ -17,6 +17,7 @@ dependencies:
   - codecov
   - coverage[toml]
   - docformatter
+  - dvc
   - flake8
   - ipython
   - isort>=5
Binary file removed pygmt/tests/baseline/test_image.png
Binary file not shown.
4 changes: 4 additions & 0 deletions pygmt/tests/baseline/test_image.png.dvc
@@ -0,0 +1,4 @@
outs:
- md5: de86468aa453b14912c8362c67e51064
  size: 10403
  path: test_image.png
Binary file removed pygmt/tests/baseline/test_logo.png
Binary file not shown.
4 changes: 4 additions & 0 deletions pygmt/tests/baseline/test_logo.png.dvc
@@ -0,0 +1,4 @@
outs:
- md5: 905d5b9f0f8d8b809899dfe9e87d0e91
  size: 33347
  path: test_logo.png
Binary file removed pygmt/tests/baseline/test_logo_on_a_map.png
Binary file not shown.
4 changes: 4 additions & 0 deletions pygmt/tests/baseline/test_logo_on_a_map.png.dvc
@@ -0,0 +1,4 @@
outs:
- md5: 409119aeeec2680d106e32527009c255
  size: 77366
  path: test_logo_on_a_map.png
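
Each `.dvc` file above is a small plain-text pointer that records the md5 hash and byte size of the image it replaces. As an illustration of what those fields mean, here is a minimal sketch that checks whether a local file matches a pointer. The helper names `read_pointer` and `matches_pointer` are hypothetical, the parser only handles the flat single-output layout shown in this pull request, and none of this reflects how dvc is implemented internally.

```python
import hashlib
import tempfile
from pathlib import Path


def read_pointer(text):
    """Collect the flat 'key: value' fields of a single-output .dvc file."""
    fields = {}
    for raw in text.splitlines():
        key, sep, value = raw.strip().lstrip("- ").partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def matches_pointer(image_path, pointer_text):
    """Return True if the file's md5 and byte size equal the recorded ones."""
    fields = read_pointer(pointer_text)
    data = Path(image_path).read_bytes()
    return (
        hashlib.md5(data).hexdigest() == fields["md5"]
        and len(data) == int(fields["size"])
    )


# Pointer text copied from test_logo.png.dvc in this pull request.
LOGO_POINTER = """\
outs:
- md5: 905d5b9f0f8d8b809899dfe9e87d0e91
  size: 33347
  path: test_logo.png
"""
print(read_pointer(LOGO_POINTER))

# Stand-in bytes (not a real PNG), used only to exercise the comparison.
with tempfile.TemporaryDirectory() as tmp:
    image = Path(tmp) / "test_example.png"
    image.write_bytes(b"stand-in image bytes")
    print(matches_pointer(image, LOGO_POINTER))
```

After a successful `dvc pull`, running `matches_pointer` on `pygmt/tests/baseline/test_logo.png` with the text of its `.dvc` file would be expected to return `True`; a mismatch suggests the local image was regenerated or modified.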
2 changes: 1 addition & 1 deletion pygmt/tests/test_image.py
@@ -17,5 +17,5 @@ def test_image():
     Place images on map.
     """
     fig = Figure()
-    fig.image(TEST_IMG, D="x0/0+w1i", F="+pthin,blue")
+    fig.image(TEST_IMG, position="x0/0+w2c", box="+pthin,blue")
     return fig
32 changes: 12 additions & 20 deletions pygmt/tests/test_logo.py
@@ -1,34 +1,26 @@
 """
 Tests for fig.logo.
 """
+import pytest
 from pygmt import Figure
-from pygmt.helpers.testing import check_figures_equal


-@check_figures_equal()
+@pytest.mark.mpl_image_compare
 def test_logo():
     """
-    Plot a GMT logo of a 2 inch width as a stand-alone plot.
+    Plot the GMT logo as a stand-alone plot.
     """
-    fig_ref, fig_test = Figure(), Figure()
-    # Use single-character arguments for the reference image
-    fig_ref.logo(D="x0/0+w2i")
-    fig_test.logo(position="x0/0+w2i")
-    return fig_ref, fig_test
+    fig = Figure()
+    fig.logo()
+    return fig


-@check_figures_equal()
+@pytest.mark.mpl_image_compare
 def test_logo_on_a_map():
     """
-    Plot a GMT logo in the upper right corner of a map.
+    Plot the GMT logo at the upper right corner of a map.
     """
-    fig_ref, fig_test = Figure(), Figure()
-    # Use single-character arguments for the reference image
-    fig_ref.coast(R="-90/-70/0/20", J="M6i", G="chocolate", B="")
-    fig_ref.logo(D="jTR+o0.1i/0.1i+w3i", F="")
-
-    fig_test.coast(
-        region=[-90, -70, 0, 20], projection="M6i", land="chocolate", frame=True
-    )
-    fig_test.logo(position="jTR+o0.1i/0.1i+w3i", box=True)
-    return fig_ref, fig_test
+    fig = Figure()
+    fig.basemap(region=[-90, -70, 0, 20], projection="M15c", frame=True)
+    fig.logo(position="jTR+o0.25c/0.25c+w7.5c", box=True)
+    return fig