hs2p is a Python package for fast, scalable whole-slide tiling. You can request tiles at any spacing, whether or not that spacing is natively present in the image pyramid. It is designed for computational pathology workflows that need reproducible coordinates.
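To illustrate how a tile can be requested at a spacing the pyramid does not store, here is a minimal sketch. It is not hs2p's implementation: `pick_level` is a hypothetical helper, and treating the tolerance as a relative difference is an assumption. It picks a level that matches the target spacing within the tolerance, and otherwise falls back to the nearest finer level plus a resize factor.

```python
def pick_level(level_spacings_um, target_spacing_um, tolerance=0.07):
    """Return (level, resize_factor) for reading tiles at target_spacing_um.

    Hypothetical helper, not part of hs2p. `tolerance` is assumed to be a
    relative difference between a level's spacing and the target spacing.
    """
    # 1) Use a native level if its spacing is close enough to the target.
    for level, spacing in enumerate(level_spacings_um):
        if abs(spacing - target_spacing_um) / target_spacing_um <= tolerance:
            return level, 1.0

    # 2) Otherwise read from the coarsest level that is still finer than the
    #    target, then resample. Fall back to level 0 if no level is finer.
    finer = [lvl for lvl, s in enumerate(level_spacings_um) if s < target_spacing_um]
    level = max(finer, key=lambda lvl: level_spacings_um[lvl]) if finer else 0
    return level, target_spacing_um / level_spacings_um[level]


# A pyramid with 0.25, 0.5, 1.0 and 2.0 um/px levels, asked for 0.75 um/px.
level, factor = pick_level([0.25, 0.5, 1.0, 2.0], target_spacing_um=0.75)
read_size_px = round(224 * factor)  # pixels to read at `level` for a 224 px tile
print(level, factor, read_size_px)  # 1 1.5 336
```

In this sketch a 224 px tile at 0.75 um/px is read as a 336 px region from the 0.5 um/px level and then downsampled.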
We support two main workflows:
- a Python API for library-style integration
- a CLI for batch preprocessing
Try hs2p interactively: hs2p-demo on HuggingFace Spaces
You can adjust tiling parameters (spacing, tile size, tissue threshold, overlap) and instantly see a tiling preview and tissue mask overlay.
You can also upload your own pyramidal WSI (up to 1 GB).
pip install hs2p
Tiling computes a reproducible grid of tile coordinates for each slide and saves them as named artifacts with extraction metadata, ready for downstream use.
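As a rough illustration of what a reproducible grid means, the sketch below enumerates tile origins from the slide size, tile size, and overlap. `grid_origins` is a hypothetical helper, not hs2p's API. Keeping only tiles that fit entirely inside the slide, and interpreting overlap as a fraction of the tile size, are assumptions.

```python
def grid_origins(width, height, tile_size, overlap=0.0):
    """Return top-left (x, y) tile origins covering a width x height region.

    Hypothetical helper, not part of hs2p. `overlap` is assumed to be the
    fraction of a tile shared with its neighbour (0.0 means no overlap).
    """
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    # Distance between consecutive tile origins.
    stride = max(1, round(tile_size * (1.0 - overlap)))
    xs = range(0, width - tile_size + 1, stride)
    ys = range(0, height - tile_size + 1, stride)
    # Iterate x in the outer loop so the same inputs always give the same order.
    return [(x, y) for x in xs for y in ys]


print(grid_origins(448, 224, tile_size=224))                    # [(0, 0), (224, 0)]
print(len(grid_origins(448, 448, tile_size=224, overlap=0.5)))  # 9
```

Because the grid depends only on its inputs, rerunning it with the same configuration yields the same coordinates.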
When a precomputed tissue mask is not provided, hs2p segments tissue on-the-fly. If you want to precompute tissue masks, a standalone script is available.
Sampling filters or partitions tile coordinates by annotation coverage so you can keep only tiles relevant to a tissue class or label.
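The idea of filtering by annotation coverage can be sketched in a few lines. This is an illustration only and does not reproduce hs2p's sampling code: `coverage` and `sample_tiles` are hypothetical helpers, the mask is a plain list of rows of integer labels, and tiles are assumed to lie fully inside the mask.

```python
def coverage(mask, x, y, tile_size, label):
    """Fraction of pixels in the tile at (x, y) whose mask value equals `label`."""
    rows = mask[y:y + tile_size]
    hits = sum(row[x:x + tile_size].count(label) for row in rows)
    return hits / (tile_size * tile_size)


def sample_tiles(mask, coords, tile_size, label, min_coverage):
    """Keep the tile origins whose coverage for `label` reaches `min_coverage`."""
    return [
        (x, y)
        for x, y in coords
        if coverage(mask, x, y, tile_size, label) >= min_coverage
    ]


# Toy 4x4 annotation mask: 0 = background, 1 and 2 = two annotation labels.
mask = [
    [0, 0, 2, 2],
    [0, 0, 2, 2],
    [1, 1, 2, 0],
    [1, 0, 0, 0],
]
coords = [(0, 0), (0, 2), (2, 0), (2, 2)]

print(sample_tiles(mask, coords, tile_size=2, label=2, min_coverage=0.5))  # [(2, 0)]
print(sample_tiles(mask, coords, tile_size=2, label=1, min_coverage=0.5))  # [(0, 2)]
```

Running the filter once per label partitions the grid by annotation. Running it for a single label keeps only the tiles relevant to that class.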
hs2p supports pre-extracted tissue masks. If you don't have such tissue masks, you can either:
- use our standalone tissue segmentation script (recommended)
- tune the SegmentationConfig parameters and let hs2p segment tissue on the fly
Minimal tiling example:
from pathlib import Path
from hs2p import (
    SlideSpec,
    TilingConfig,
    overlay_mask_on_slide,
    save_tiling_result,
    tile_slide,
    write_tiling_preview,
)
result = tile_slide(
    SlideSpec(
        sample_id="slide-1",
        image_path=Path("/data/wsi/slide-1.tif"),
        mask_path=Path("/data/mask/slide-1.tif"),
    ),
    tiling=TilingConfig(
        backend="openslide",
        target_spacing_um=0.5,
        target_tile_size_px=224,
        tolerance=0.07,
        overlap=0.0,
        tissue_threshold=0.1,
    ),
)
artifacts = save_tiling_result(result, output_dir=Path("output"))
print(artifacts.tiles_npz_path)  # output/coordinates/slide-1.tiles.npz ; more info in docs/artifacts.md
print(artifacts.tiles_meta_path)  # output/coordinates/slide-1.tiles.meta.json ; more info in docs/artifacts.md
tiling_preview_path = write_tiling_preview(
    result=result,
    output_dir=Path("output"),
    downsample=32,
)
print(tiling_preview_path)  # output/preview/tiling/slide-1.jpg ; low resolution preview of tiling result, good for QC
mask_overlay = overlay_mask_on_slide(
    wsi_path=result.image_path,
    annotation_mask_path=Path("/data/mask/slide-1.tif"),
    downsample=32,
    backend=result.backend,
)
Path("output/preview/mask").mkdir(parents=True, exist_ok=True)  # make sure the target directory exists before saving
mask_overlay.save("output/preview/mask/slide-1.jpg")
result is a TilingResult for one slide. It gives downstream pipelines the tile coordinates plus the metadata needed to relate those coordinates back to the slide pyramid and persist them as reusable named artifacts.
More API details: docs/api.md
The CLI is intended for fast batch processing of multiple slides with the same config. Both CLI entrypoints expect the same input CSV schema:
sample_id,image_path,mask_path
slide-1,/data/wsi/slide-1.tif,/data/mask/slide-1.tif
slide-2,/data/wsi/slide-2.tif,
For a first run, start from hs2p/configs/default.yaml and edit only the essentials:
- csv
- output_dir
- tiling.backend
- tiling.params.target_spacing_um
- tiling.params.target_tile_size_px
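The input CSV shown above can be generated with the standard library. The sketch below is a convenience example, not part of hs2p: `write_slide_csv` and `read_slide_csv` are hypothetical helpers. It assumes, based on the example rows, that an empty mask_path means no precomputed mask is supplied for that slide.

```python
import csv
import tempfile
from pathlib import Path

FIELDS = ["sample_id", "image_path", "mask_path"]


def write_slide_csv(csv_path, rows):
    """Write (sample_id, image_path, mask_path) tuples; mask_path may be None."""
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        for sample_id, image_path, mask_path in rows:
            writer.writerow({
                "sample_id": sample_id,
                "image_path": str(image_path),
                "mask_path": "" if mask_path is None else str(mask_path),
            })


def read_slide_csv(csv_path):
    """Read the CSV back, mapping an empty mask_path to None."""
    with open(csv_path, newline="") as f:
        return [{**row, "mask_path": row["mask_path"] or None} for row in csv.DictReader(f)]


with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "slides.csv"
    write_slide_csv(csv_path, [
        ("slide-1", "/data/wsi/slide-1.tif", "/data/mask/slide-1.tif"),
        ("slide-2", "/data/wsi/slide-2.tif", None),  # no precomputed mask
    ])
    text = csv_path.read_text()
    rows = read_slide_csv(csv_path)

print(text)
```

The resulting file can then be referenced from the csv entry of the config.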
Run tiling:
python -m hs2p.tiling --config-file /path/to/config.yaml
Run sampling:
python -m hs2p.sampling --config-file /path/to/config.yaml
For sampling, add tiling.sampling_params.pixel_mapping and tiling.sampling_params.tissue_percentage for the annotations you want to keep.
When stdout is an interactive terminal, both CLI entrypoints show live rich progress with:
- slide-level batch progress
- elapsed and remaining time
- live tile counts for tiling discovery or sampling retention
- final summary panels with output and process_list.csv locations
When stdout is redirected or otherwise non-interactive, hs2p falls back to concise plain-text stage updates.
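The switch between live progress and plain-text updates can be pictured with a small check on the output stream. This is a sketch of the general technique, not hs2p's actual code, and `choose_progress_mode` is a hypothetical name.

```python
import io
import sys


def choose_progress_mode(stream=None):
    """Return "rich" for an interactive terminal, "plain" otherwise."""
    stream = sys.stdout if stream is None else stream
    isatty = getattr(stream, "isatty", None)
    # Pipes, files and in-memory buffers report isatty() == False.
    return "rich" if callable(isatty) and isatty() else "plain"


# An in-memory buffer behaves like redirected output.
print(choose_progress_mode(io.StringIO()))  # plain
```

This is why piping the CLI output to a file or running it under a job scheduler typically produces the plain-text stage updates.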
If a run fails, check output_dir/logs/log.txt for the full log stream.
More CLI details: docs/cli.md
hs2p writes explicit named artifacts rather than anonymous coordinate dumps.
- Tiling writes coordinates/{sample_id}.tiles.npz and coordinates/{sample_id}.tiles.meta.json
- Sampling writes the same pair under coordinates/<annotation>/
- Batch runs also write process_list.csv
- Saved coordinate arrays use a deterministic column-major order: numeric x first, then numeric y within each shared x
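The column-major ordering described above amounts to a numeric sort on (x, y). The sketch below shows that ordering on a handful of coordinates. `column_major` is a hypothetical helper for illustration, not an hs2p function.

```python
def column_major(coords):
    """Sort (x, y) pairs by numeric x first, then numeric y within each x."""
    return sorted(coords, key=lambda xy: (int(xy[0]), int(xy[1])))


coords = [(1000, 0), (224, 224), (224, 0), (0, 224)]
ordered = column_major(coords)
print(ordered)  # [(0, 224), (224, 0), (224, 224), (1000, 0)]
```

The sort is numeric rather than lexicographic, so x = 224 comes before x = 1000, which a string sort would reverse.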
Artifact field reference: docs/artifacts.md
If you prefer running hs2p in a container, a published Docker image is available:
docker pull waticlems/hs2p:latest
docker run --rm -it -v /path/to/your/data:/data waticlems/hs2p:latest
