Major update to intake v2
including docs and tests
kthyng committed Jul 19, 2024
1 parent ec634fb commit 6b511bd
Showing 18 changed files with 579 additions and 657 deletions.
8 changes: 4 additions & 4 deletions .readthedocs.yaml
@@ -12,10 +12,10 @@ build:
# uncomment to build from this exact version of package
# the downside is the version listed in the docs will be a dev version
# if uncommenting this, comment out installing pypi version of package in docs/env file
# python:
# install:
# - method: pip
# path: ./
python:
install:
- method: pip
path: ./

conda:
environment: docs/environment.yml
142 changes: 44 additions & 98 deletions README.md
@@ -24,15 +24,13 @@ For changes prior to 2022-10-19, all contributions are Copyright James Munroe, s



Intake is a lightweight set of tools for loading and sharing data in data
science projects. Intake ERDDAP provides a set of integrations for ERDDAP.
Intake is a lightweight set of tools for loading and sharing data in data science projects. Intake ERDDAP provides a set of integrations for ERDDAP.

- Quickly identify all datasets from an ERDDAP service in a geographic region,
or containing certain variables.
- Quickly identify all datasets from an ERDDAP service in a geographic region, or containing certain variables.
- Produce a pandas DataFrame for a given dataset or query.
- Get an xarray Dataset for the Gridded datasets.

The Key features are:
The key features are:

- Pandas DataFrames for any TableDAP dataset.
- xarray Datasets for any GridDAP datasets.
@@ -59,7 +57,7 @@ project is available on PyPI, so it can be installed using `pip`
The following are prerequisites for a developer environment for this project:

- [conda](https://docs.conda.io/en/latest/miniconda.html)
- (optional but highly recommended) [mamba](https://mamba.readthedocs.io/en/latest/) Hint: `conda install -c conda-forge mamba`
- (optional but highly recommended) [mamba](https://mamba.readthedocs.io/en/latest/). Hint: `conda install -c conda-forge mamba`

Note: if `mamba` isn't installed, replace all instances of `mamba` in the following instructions with `conda`.

@@ -83,126 +81,74 @@ Note: if `mamba` isn't installed, replace all instances of `mamba` in the follow
pip install -e .
```

Note that you need to install with `pip install .` once to get the `entry_points` correct too.

## Examples

To create an intake catalog for all of the ERDDAP's TableDAP offerings use:
To create an `intake` catalog for all of the ERDDAP's TableDAP offerings use:

```python
import intake
catalog = intake.open_erddap_cat(
import intake_erddap
catalog = intake_erddap.ERDDAPCatalogReader(
server="https://erddap.sensors.ioos.us/erddap"
)
).read()
```


The catalog objects behave like a dictionary with the keys representing the
dataset's unique identifier within ERDDAP, and the values being the
`TableDAPSource` objects. To access a source object:
The catalog objects behave like a dictionary with the keys representing the dataset's unique identifier within ERDDAP, and the values being the `TableDAPReader` objects. To access a Reader object (for a single dataset, in this case for dataset_id "aoos_204"):

```python
source = catalog["datasetid"]
dataset = catalog["aoos_204"]
```

From the source object, a pandas DataFrame can be retrieved:
From the reader object, a pandas DataFrame can be retrieved:

```python
df = source.read()
df = dataset.read()
```

Find other dataset_ids available with

```python
list(catalog)
```

Consider a case where you need to find all wind data near Florida:

```python
import intake
import intake_erddap
from datetime import datetime
bbox = (-87.84, 24.05, -77.11, 31.27)
catalog = intake.open_erddap_cat(
catalog = intake_erddap.ERDDAPCatalogReader(
server="https://erddap.sensors.ioos.us/erddap",
bbox=bbox,
intersection="union",
start_time=datetime(2022, 1, 1),
end_time=datetime(2023, 1, 1),
standard_names=["wind_speed", "wind_from_direction"],
)
variables=["wind_speed", "wind_from_direction"],
).read()

df = next(catalog.values()).read()
dataset_id = list(catalog)[0]
print(dataset_id)
df = catalog[dataset_id].read()
```

Using the `standard_names` input with `intersection="union"` searches for datasets that have both "wind_speed" and "wind_from_direction". Using the `variables` input subsequently narrows the dataset to only those columns, plus "time", "latitude", "longitude", and "z".

<table class="align-default">
<thead>
<tr style="text-align: right;">
<th></th>
<th>time (UTC)</th>
<th>wind_speed (m.s-1)</th>
<th>wind_from_direction (degrees)</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>2022-12-14T19:40:00Z</td>
<td>7.0</td>
<td>140.0</td>
</tr>
<tr>
<th>1</th>
<td>2022-12-14T19:20:00Z</td>
<td>7.0</td>
<td>120.0</td>
</tr>
<tr>
<th>2</th>
<td>2022-12-14T19:10:00Z</td>
<td>NaN</td>
<td>NaN</td>
</tr>
<tr>
<th>3</th>
<td>2022-12-14T19:00:00Z</td>
<td>9.0</td>
<td>130.0</td>
</tr>
<tr>
<th>4</th>
<td>2022-12-14T18:50:00Z</td>
<td>9.0</td>
<td>130.0</td>
</tr>
<tr>
<th>...</th>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<th>48296</th>
<td>2022-01-01T00:40:00Z</td>
<td>4.0</td>
<td>120.0</td>
</tr>
<tr>
<th>48297</th>
<td>2022-01-01T00:30:00Z</td>
<td>3.0</td>
<td>130.0</td>
</tr>
<tr>
<th>48298</th>
<td>2022-01-01T00:20:00Z</td>
<td>4.0</td>
<td>120.0</td>
</tr>
<tr>
<th>48299</th>
<td>2022-01-01T00:10:00Z</td>
<td>4.0</td>
<td>130.0</td>
</tr>
<tr>
<th>48300</th>
<td>2022-01-01T00:00:00Z</td>
<td>4.0</td>
<td>130.0</td>
</tr>
</tbody>
</table>
```python
time (UTC) latitude (degrees_north) ... wind_speed (m.s-1) wind_from_direction (degrees)
0 2022-01-01T00:00:00Z 28.508 ... 3.6 126.0
1 2022-01-01T00:10:00Z 28.508 ... 3.8 126.0
2 2022-01-01T00:20:00Z 28.508 ... 3.6 124.0
3 2022-01-01T00:30:00Z 28.508 ... 3.4 125.0
4 2022-01-01T00:40:00Z 28.508 ... 3.5 124.0
... ... ... ... ... ...
52524 2022-12-31T23:20:00Z 28.508 ... 5.9 176.0
52525 2022-12-31T23:30:00Z 28.508 ... 6.8 177.0
52526 2022-12-31T23:40:00Z 28.508 ... 7.2 175.0
52527 2022-12-31T23:50:00Z 28.508 ... 7.4 169.0
52528 2023-01-01T00:00:00Z 28.508 ... 8.1 171.0

[52529 rows x 6 columns]
```
6 changes: 3 additions & 3 deletions docs/api.rst
@@ -18,11 +18,11 @@
------------------------


.. autoclass:: intake_erddap.erddap.ERDDAPSource
.. autoclass:: intake_erddap.erddap.ERDDAPReader
:members: get_client

.. autoclass:: intake_erddap.erddap.TableDAPSource
.. autoclass:: intake_erddap.erddap.TableDAPReader
:members: read, read_partition, read_chunked

.. autoclass:: intake_erddap.erddap.GridDAPSource
.. autoclass:: intake_erddap.erddap.GridDAPReader
:members: read_partition, read_chunked, to_dask, close
15 changes: 10 additions & 5 deletions docs/conf.py
@@ -26,17 +26,17 @@
# -- Project information -----------------------------------------------------

project = "intake-erddap"
copyright = "Copyright 2022 Axiom Data Science, LLC"
copyright = "Copyright 2022-2024 Axiom Data Science, LLC"
author = "Axiom Data Science, LLC"

# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
# built documents.
# see https://pypi.org/project/setuptools-scm/ for details
from pkg_resources import get_distribution
from importlib.metadata import version as imversion


release = get_distribution("intake_erddap").version
release = imversion("intake_erddap")
# for example take major/minor
version = ".".join(release.split(".")[:2])
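The move from `pkg_resources.get_distribution` to `importlib.metadata.version` in this hunk can be sketched in isolation as follows. This is a minimal sketch and not the project's `conf.py`: the helper names `major_minor` and `installed_release` and the `fallback` argument are assumptions added for illustration, while `importlib.metadata.version` and `PackageNotFoundError` are standard-library names (Python 3.8+).

```python
from importlib.metadata import PackageNotFoundError, version as imversion


def major_minor(release: str) -> str:
    """Return the first two dot-separated components of a release string."""
    return ".".join(release.split(".")[:2])


def installed_release(dist_name: str, fallback: str = "0.0.0") -> str:
    """Look up the installed version of a distribution by name.

    The fallback is an assumption for illustration only; the conf.py in
    this commit has no fallback and would raise PackageNotFoundError.
    """
    try:
        return imversion(dist_name)
    except PackageNotFoundError:
        return fallback


# Mirrors the two assignments in conf.py: full release, then major.minor.
release = installed_release("intake_erddap")
version = major_minor(release)
```

With a setuptools-scm style release such as `0.5.1.dev3+g6b511bd`, `major_minor` would keep only `0.5`, which is the part Sphinx shows as `|version|`.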

@@ -71,6 +71,11 @@

nb_execution_timeout = 120


# https://myst-nb.readthedocs.io/en/v0.9.0/use/execute.html
# jupyter_execute_notebooks = "off"
nb_execution_mode = "force"

# Add any paths that contain templates here, relative to this directory.
templates_path = ["_templates"]

@@ -85,10 +90,10 @@
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
#html_theme = "furo"
html_theme = "furo"

# furo variables
html_title = "intake-axds documentation"
html_title = "intake-erddap documentation"

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
13 changes: 7 additions & 6 deletions docs/environment.yml
@@ -1,16 +1,16 @@
name: docs
name: intake-erddap-docs
channels:
- conda-forge
- nodefaults
dependencies:
- python=3.9
- python=3.11
# If your docs code examples depend on other packages add them here
- numpy
- dask
- pandas
- erddapy
- panel
- intake
# - intake
- intake-xarray>=0.6.1
- cf_pandas
# These are needed for the docs themselves
@@ -29,10 +29,11 @@ dependencies:
- pip
- recommonmark
- pip:
- furo
- git+https://github.com/intake/intake
- intake-parquet
- intake-xarray
- intake-erddap
# - intake-parquet
# - intake-xarray
# - intake-erddap
# - "dask[complete]"
- docrep<=0.2.7
- furo