slide2vec is a Python package for efficient encoding of whole-slide images using publicly available foundation models. It builds on hs2p for fast preprocessing and exposes a focused surface around Model, Pipeline, and ExecutionOptions.
pip install slide2vec
Install the full model runtime only when you need embedding/model execution:
pip install "slide2vec[models]"
slide2vec keeps the base install focused on the core package surface and moves the heavier model stack into the optional models extra.
from slide2vec import Model, PreprocessingConfig

model = Model.from_pretrained("virchow2", level="tile")
preprocessing = PreprocessingConfig(
    target_spacing_um=0.5,
    target_tile_size_px=224,
    tissue_threshold=0.1,
)
embedded = model.embed_slide(
    "/path/to/slide.svs",
    preprocessing=preprocessing,
)
tile_embeddings = embedded.tile_embeddings
coordinates = embedded.coordinates
By default, ExecutionOptions() uses all available GPUs. Set ExecutionOptions(num_gpus=4) when you want to cap the sharding explicitly.
Use Pipeline(...) for manifest-driven batch processing when you want artifacts written to disk instead of only in-memory outputs:
from slide2vec import ExecutionOptions, Pipeline

pipeline = Pipeline(
    model=model,
    preprocessing=preprocessing,
    execution=ExecutionOptions(output_dir="outputs/demo"),
)
result = pipeline.run(manifest_path="/path/to/slides.csv")
Manifest-driven runs use the schema below. mask_path and spacing_at_level_0 are optional.
sample_id,image_path,mask_path,spacing_at_level_0
slide-1,/path/to/slide-1.svs,/path/to/mask-1.png,0.25
slide-2,/path/to/slide-2.svs,,
...
Use spacing_at_level_0 to override the level-0 spacing when the slide file reports a missing or incorrect value.
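A manifest with this schema can be generated with Python's standard csv module. The sketch below is illustrative only: write_manifest and the example paths are hypothetical and not part of slide2vec, and it assumes that an empty cell is how an omitted optional field should be written, as in the slide-2 row above.

```python
import csv
from pathlib import Path

# Column order taken from the manifest schema above.
FIELDNAMES = ["sample_id", "image_path", "mask_path", "spacing_at_level_0"]


def write_manifest(rows, manifest_path):
    """Write slide records to a CSV manifest.

    Hypothetical helper, not part of slide2vec. Optional fields that are
    missing from a record are written as empty cells.
    """
    manifest_path = Path(manifest_path)
    with manifest_path.open("w", newline="") as f:
        # restval="" fills in any key a record does not provide.
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES, restval="")
        writer.writeheader()
        for row in rows:
            writer.writerow(row)
    return manifest_path


rows = [
    {
        "sample_id": "slide-1",
        "image_path": "/path/to/slide-1.svs",
        "mask_path": "/path/to/mask-1.png",
        "spacing_at_level_0": 0.25,
    },
    # No mask and no spacing override for this slide.
    {"sample_id": "slide-2", "image_path": "/path/to/slide-2.svs"},
]
write_manifest(rows, "slides.csv")
```

The resulting file can then be passed as manifest_path to pipeline.run(...).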
The package writes explicit artifact directories:
- tile_embeddings/<sample_id>.pt or .npz
- tile_embeddings/<sample_id>.meta.json
- slide_embeddings/<sample_id>.pt or .npz
- slide_embeddings/<sample_id>.meta.json
- optional slide_latents/<sample_id>.pt or .npz
.pt remains the default format. .npz is available through ExecutionOptions(output_format="npz").
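To check which artifacts a run produced for a given sample, the layout listed above can be turned into paths. This is a minimal sketch, not part of slide2vec: expected_artifacts and existing_artifacts are hypothetical helpers, and the code assumes the artifact directories sit directly under the configured output_dir.

```python
from pathlib import Path


def expected_artifacts(output_dir, sample_id, output_format="pt"):
    """Map artifact names to the paths described in the layout above.

    Hypothetical helper. Assumes the artifact directories are direct
    children of output_dir.
    """
    if output_format not in ("pt", "npz"):
        raise ValueError(f"unsupported output format: {output_format!r}")
    root = Path(output_dir)
    paths = {}
    for kind in ("tile_embeddings", "slide_embeddings"):
        paths[kind] = root / kind / f"{sample_id}.{output_format}"
        paths[f"{kind}_meta"] = root / kind / f"{sample_id}.meta.json"
    # slide_latents is optional, so this file may legitimately be absent.
    paths["slide_latents"] = root / "slide_latents" / f"{sample_id}.{output_format}"
    return paths


def existing_artifacts(output_dir, sample_id, output_format="pt"):
    """Return only the expected artifacts that are present on disk."""
    candidates = expected_artifacts(output_dir, sample_id, output_format)
    return {name: path for name, path in candidates.items() if path.exists()}


paths = expected_artifacts("outputs/demo", "slide-1", output_format="npz")
print(paths["tile_embeddings"].as_posix())
```

Because tile-level runs may not write slide-level files, filtering with existing_artifacts is safer than assuming every path exists.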
slide2vec currently ships preset configs for 10 tile-level models and 3 slide-level models.
For the full catalog and preset names, see docs/models.md.
The CLI is a thin wrapper over the package API.
Bundled configs live under slide2vec/configs/preprocessing/ and slide2vec/configs/models/.
python -m slide2vec --config-file /path/to/config.yaml
By default, manifest-driven CLI runs use all available GPUs. Set speed.num_gpus=4 when you want to cap the sharding explicitly.
New to the CLI or doing batch runs to disk? Start with docs/cli.md for the config-driven workflow, overrides, and common run patterns.
Docker remains available when you prefer a containerized runtime:
docker pull waticlems/slide2vec:latest
docker run --rm -it \
    -v /path/to/your/data:/data \
    -e HF_TOKEN=<your-huggingface-api-token> \
    waticlems/slide2vec:latest
- docs/cli.md for the config-driven CLI guide
- docs/python-api.md for the detailed API reference
- docs/models.md for the full supported-model catalog