ISMIP 6 Virtualization Pipeline

This repository is used to build a "virtualized" cloud-ready version of the Ice-Sheet Model Intercomparison for CMIP 6 (ISMIP 6) model outputs. To learn more about ISMIP, visit https://www.ismip.org/

The base of the repository is virtualize_ismip6, which contains the processing pipeline that takes the original, unmodified NetCDF files and virtualizes them into an Icechunk store.

TL;DR: Browse the data at englacial.org/static/models/, or read more at englacial.org/models.html.

If you're interested in using this ISMIP 6 dataset, there are a couple of options:

Check out the online ISMIP 6 viewer here: https://englacial.org/static/models/ (source repository here: https://github.com/englacial/ismip-viewer)
Try accessing the data through Xarray or your favorite Zarr library. To get started, check out the example notebooks in the notebooks/ folder.

Accessing the virtualized ISMIP 6 dataset with Xarray

There's some boilerplate code to open the dataset. Don't worry - you can just copy and paste this part.

import icechunk

SOURCE_BUCKET = "s3://us-west-2.opendata.source.coop/englacial/ismip6"

storage = icechunk.s3_storage(
    bucket="us-west-2.opendata.source.coop",
    prefix="englacial/ismip6/icechunk-ais",
    region="us-west-2",
    anonymous=True,
)

config = icechunk.RepositoryConfig.default()
config.set_virtual_chunk_container(
    icechunk.VirtualChunkContainer(
        SOURCE_BUCKET + "/",
        store=icechunk.s3_store(region="us-west-2", anonymous=True),
    )
)
credentials = icechunk.containers_credentials({SOURCE_BUCKET + "/": None})

repo = icechunk.Repository.open(
    storage=storage,
    config=config,
    authorize_virtual_chunk_access=credentials,
)
session = repo.readonly_session(branch="main")

Once you've got it open, you can load data like this:

import xarray as xr

ds = xr.open_zarr(session.store, group="combined/JPL1_ISSM/exp05", consolidated=False)

Keep reading for more details, or check out the notebooks to get started.

More details

This repository contains tools for indexing, ingesting, and serving ISMIP6 Antarctic ice sheet model outputs. We are not associated with ISMIP. These are tools for publicly-available data that we hope are interesting and useful to the scientific community.

There are three main components:

Ingest pipeline (virtualize_ismip6/) -- Virtualizes NetCDF source files into an Icechunk store using VirtualiZarr and Lithops serverless functions on AWS Lambda.
Python library (ismip6_helper/) -- Handles file indexing, grid correction, time encoding normalization, and ignore-value detection. You shouldn't need to use this! If you use our virtualized dataset, all the fixes are already encoded!
Static indexing site (ismip_data_index_website/) -- Catalogs available outputs at docs.englacial.org/ismip-indexing/.

The interactive web viewer lives in a separate repository: englacial/ismip-viewer.

Data

Source files

A copy of the ISMIP6 outputs (originally available through Globus) is hosted on source.coop:

s3://us-west-2.opendata.source.coop/englacial/ismip6/

The bucket allows public, anonymous read access; no authentication is required. For citation guidance, see the ISMIP wiki.

Icechunk store

The ingest pipeline writes a virtualized Icechunk store to:

s3://us-west-2.opendata.source.coop/englacial/ismip6/icechunk-ais/
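
If you want to derive the bucket and prefix arguments for icechunk.s3_storage from this URL instead of typing them by hand, a minimal sketch using only the standard library could look like the following. The helper name split_s3_uri is hypothetical and not part of this repository.

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split an s3:// URI into (bucket, prefix).

    Hypothetical helper for illustration; the prefix is returned
    without leading or trailing slashes.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError(f"not an s3:// URI: {uri!r}")
    return parsed.netloc, parsed.path.strip("/")


STORE_URI = "s3://us-west-2.opendata.source.coop/englacial/ismip6/icechunk-ais/"
bucket, prefix = split_s3_uri(STORE_URI)
print(bucket)  # us-west-2.opendata.source.coop
print(prefix)  # englacial/ismip6/icechunk-ais
```

The two printed values match the bucket and prefix used in the boilerplate near the top of this README.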

This store contains virtual references to chunks in the source NetCDF files -- no data is duplicated. It is organized into three top-level groups:

combined/ -- All variables merged per model+experiment, with time binned to annual resolution
state/ -- State variables only (e.g. lithk, orog, base), native time resolution
flux/ -- Flux variables only (e.g. acabf, dlithkdt), native time resolution

See virtualize_ismip6/ICECHUNK_STORE.md for details on the store structure and how the pipeline works.
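
The group argument passed to xr.open_zarr can be assembled from these pieces. The sketch below is a small hypothetical helper (group_path is not part of this repository). It assumes that state/ and flux/ use the same model/experiment layout as the combined/JPL1_ISSM/exp05 example shown earlier; check ICECHUNK_STORE.md to confirm.

```python
# The three top-level groups described above.
STORE_TYPES = ("combined", "state", "flux")


def group_path(store_type: str, model: str, experiment: str) -> str:
    """Build a Zarr group path such as 'combined/JPL1_ISSM/exp05'.

    Assumption: every top-level group is laid out as
    <store_type>/<model>/<experiment>.
    """
    if store_type not in STORE_TYPES:
        raise ValueError(f"store_type must be one of {STORE_TYPES}, got {store_type!r}")
    return f"{store_type}/{model}/{experiment}"


print(group_path("combined", "JPL1_ISSM", "exp05"))  # combined/JPL1_ISSM/exp05
```

The result can be passed directly as group=... when opening the store with Xarray.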

Data overview

10,034 files (~1.1 TB total)
17 models from 14 institutions
94 experiments
37 variables
All Antarctic ice sheet (AIS) data

Developers

Setup

# Install dependencies
uv sync --extra dev

# Run unit tests
uv run --extra dev pytest ismip6_helper/tests/ virtualize_ismip6/tests/ -v -m "not integration"

# Run integration tests (builds local stores and compares against remote — takes ~20 min)
uv run --extra dev pytest virtualize_ismip6/tests/test_build_and_compare.py -v -m integration

Running the ingest pipeline

The pipeline virtualizes source NetCDF files and writes them to the Icechunk store using Lithops on AWS Lambda:

# Build all three store types (combined, state, flux) on AWS Lambda
python virtualize_ismip6/virtualize_with_lithops_combine_variables.py \
    --config virtualize_ismip6/lithops_aws.yaml \
    --write-creds sc_creds.json

# Or build a specific store type
python virtualize_ismip6/virtualize_with_lithops_combine_variables.py \
    --config virtualize_ismip6/lithops_aws.yaml \
    --write-creds sc_creds.json \
    --store-type flux

# Local execution (useful for testing)
python virtualize_ismip6/virtualize_with_lithops_combine_variables.py \
    --config virtualize_ismip6/lithops_local.yaml \
    --local-storage --local-execution \
    --test-model PISM1 --test-experiment ctrl_proj_std

Note: sc_creds.json holds the temporary write-access credentials, copied from source.coop.

Run python virtualize_ismip6/virtualize_with_lithops_combine_variables.py --help for all options, and see virtualize_ismip6/lithops_aws.md for AWS infrastructure setup.
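
If you prefer to launch the pipeline from a Python script, one way is to assemble the same command line programmatically. This is only a sketch: build_ingest_command is a hypothetical helper, and it uses just the flags shown in the commands above (--config, --write-creds, --store-type).

```python
import shlex

SCRIPT = "virtualize_ismip6/virtualize_with_lithops_combine_variables.py"


def build_ingest_command(config, write_creds=None, store_type=None):
    """Return the ingest command as an argument list (hypothetical helper)."""
    cmd = ["python", SCRIPT, "--config", config]
    if write_creds is not None:
        cmd += ["--write-creds", write_creds]
    if store_type is not None:
        # Omitting --store-type builds all three store types.
        cmd += ["--store-type", store_type]
    return cmd


cmd = build_ingest_command(
    "virtualize_ismip6/lithops_aws.yaml",
    write_creds="sc_creds.json",
    store_type="flux",
)
print(shlex.join(cmd))
```

The returned list can be handed to subprocess.run(cmd, check=True) from the repository root.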

Python API

You probably shouldn't be using this directly! This library fixes small inconsistencies in the metadata of the original NetCDF files. The fixes are already encoded in the virtualized dataset.

The ismip6_helper library provides utilities for working with ISMIP6 data:

from ismip6_helper import get_file_index

# Get file index (cached locally)
df = get_file_index()

# Force rebuild from source bucket
df = get_file_index(force_rebuild=True)

Key modules:

index -- File indexing and path parsing
grid_utils -- Grid coordinate correction
time_utils -- Time encoding normalization
merge_virtual -- Union time axis computation and manifest padding
variable_classification -- State/flux variable classification
ignore_value -- Sentinel value detection and annotation
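
To give a rough idea of what "union time axis computation and padding" means, here is a simplified, self-contained sketch. It is not the merge_virtual implementation: the real module works on virtual chunk manifests, which this example does not model, and the function names and sample data are made up for illustration.

```python
def union_time_axis(axes):
    """Return the sorted union of all time values across variables."""
    return sorted(set().union(*axes.values()))


def pad_to_union(times, values, union):
    """Align one variable to the union axis, using None where it has no data."""
    lookup = dict(zip(times, values))
    return [lookup.get(t) for t in union]


# Hypothetical per-variable time axes (years) that only partly overlap.
axes = {
    "lithk": [2015, 2016, 2017],
    "acabf": [2016, 2017, 2018],
}
union = union_time_axis(axes)
print(union)  # [2015, 2016, 2017, 2018]

# Placeholder values standing in for the data at each time step.
padded = pad_to_union(axes["acabf"], ["a", "b", "c"], union)
print(padded)  # [None, 'a', 'b', 'c']
```

After padding, every variable has one entry per step of the shared axis, so the variables can sit side by side in a single group.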
