Pre-processing, segmentation, and comparative analysis of paired MERSCOPE and Xenium spatial transcriptomics datasets.
MerXen takes paired spatial transcriptomics datasets (one MERSCOPE, one Xenium per tissue section pair) and runs a standardised pipeline:
- SpatialData build — Builds platform-specific SpatialData zarrs from raw MERSCOPE and Xenium output folders
- Cell segmentation — Cellpose-SAM image-based segmentation followed by ProSeg transcript-based refinement
- Section alignment — Optionally registers paired adjacent sections to a Xenium reference coordinate system with Spateo
- Comparative analysis — QC metrics, gene-level comparison, visualisation, first-pass Scanpy/Squidpy clustering, and optional local MapMyCells cell type assignment across platforms
Nextflow orchestrates the workflow, so multiple sample pairs are processed in a single run with logging and reproducible execution.
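As a rough illustration of how the stages chain together for one MERSCOPE/Xenium pair, the sketch below models the documented stage list as an ordered sequence. The two optional stages (Spateo alignment and MapMyCells) are toggled by flags. The stage identifiers, their order, and the `plan_stages` helper are hypothetical. The real wiring lives in workflows/main.nf and its modules.

```python
from __future__ import annotations

# Hypothetical stage identifiers, ordered as the stage docs are listed.
# This order is an assumption, not read from workflows/main.nf.
STAGES: tuple[str, ...] = (
    "spatialdata_build",
    "segmentation",        # Cellpose-SAM, then ProSeg refinement
    "enrichment",
    "qc",
    "alignment",           # optional: Spateo registration to the Xenium frame
    "comparison",
    "visualization",
    "squidpy_clustering",
    "mapmycells",          # optional: local MapMyCells cell type assignment
)

OPTIONAL = {"alignment", "mapmycells"}


def plan_stages(*, align: bool = False, mapmycells: bool = False) -> list[str]:
    """Return the stages one sample pair would pass through (hypothetical helper)."""
    enabled = {"alignment": align, "mapmycells": mapmycells}
    # Keep every mandatory stage; keep optional ones only when their flag is set.
    return [s for s in STAGES if s not in OPTIONAL or enabled[s]]


if __name__ == "__main__":
    print(plan_stages())
    print(plan_stages(align=True, mapmycells=True))
```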
Full documentation lives in docs/. Start with docs/index.md.
- Usage: Getting started · Samplesheet format · Running the pipeline · Configuration · Outputs
- Pipeline stages: SpatialData build · Segmentation · Enrichment · QC · Alignment · Comparison · Visualization · Squidpy clustering · MapMyCells
- Developer reference: Pipeline architecture · Python API · CLI reference · Development workflow
MerXen/
├── workflows/ # Nextflow pipeline
│ ├── main.nf # DSL2 entry point
│ ├── nextflow.config # Parameters, executor, per-process resources
│ ├── samplesheet.example.csv # Template samplesheet
│ └── modules/ # One .nf module per pipeline stage
├── src/merxen/ # Installable Python package
│ ├── config.py # Pydantic configs (pipeline contract)
│ ├── cli/ # Click entry points (one per stage)
│ ├── io/ # Samplesheet, SpatialData builders, image/transcript I/O
│ ├── segmentation/ # Cellpose tiling + ProSeg subprocess
│ ├── enrichment/ # Shape layers + per-shape gene tables
│ ├── qc/ # Per-dataset and cross-platform metrics
│ ├── visualization/ # Plotting
│ ├── analysis/ # Scanpy/Squidpy downstream analyses
│ └── alignment/ # Optional Spateo cross-section registration
├── tests/ # pytest suite, mirrors src/merxen/
├── docs/ # Project documentation (start at docs/index.md)
├── notebooks/ # Exploratory notebooks only
├── pyproject.toml # Dependencies, merxen entry point, tool config
├── environment.yml # Conda env (Python 3.12 + pip)
├── requirements.lock # Pinned dependency tree
├── .env.example # Required environment variables template
├── Agents.md # Project standards (must-read for contributors)
└── CLAUDE.md # Short overview + pointer to Agents.md
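The tests/ directory mirrors src/merxen/. This sketch maps a package module to its test file. It assumes the common pytest convention of a `test_` prefix on each mirrored file, which has not been checked against the repository. `mirrored_test_path` and the example module `qc/metrics.py` are hypothetical.

```python
from pathlib import PurePosixPath

SRC_ROOT = PurePosixPath("src/merxen")
TESTS_ROOT = PurePosixPath("tests")


def mirrored_test_path(module: str) -> PurePosixPath:
    """Map a package module to its mirrored test file (assumed test_<name>.py naming)."""
    # relative_to raises ValueError if the module is not inside src/merxen/.
    rel = PurePosixPath(module).relative_to(SRC_ROOT)
    # Keep the subpackage directories; prefix the file name with "test_".
    return TESTS_ROOT / rel.parent / f"test_{rel.name}"


print(mirrored_test_path("src/merxen/qc/metrics.py"))  # tests/qc/test_metrics.py
```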
# Create conda environment
conda env create -f environment.yml
conda activate merxen
# Optional: enable Spateo-based section alignment
pip install spateo-release==1.1.1
pip install "anndata>=0.12.10"
# Install pre-commit hooks
pre-commit install
pre-commit install --hook-type pre-push
Copy .env.example to .env and fill in values:
cp .env.example .env
See .env.example for the required variables (ProSeg binary path, output root, etc.) and docs/configuration.md for the full environment and Nextflow parameter reference.
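This is a stdlib-only sketch of checking a filled-in .env before launching a run. The variable names `MERXEN_PROSEG_BINARY` and `MERXEN_OUTPUT_ROOT` are hypothetical placeholders for the ProSeg binary path and the output root. Substitute the names actually listed in .env.example. The parser handles only plain KEY=VALUE lines and `#` comments. It does not cover full dotenv syntax (`export`, multi-line values, variable expansion).

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical variable names; the real ones are listed in .env.example.
REQUIRED = ("MERXEN_PROSEG_BINARY", "MERXEN_OUTPUT_ROOT")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and one layer of quotes.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict[str, str], required: tuple[str, ...] = REQUIRED) -> list[str]:
    """Return required keys that are absent or left empty."""
    return [k for k in required if not values.get(k)]


if __name__ == "__main__":
    env_file = Path(".env")
    if env_file.exists():
        print("missing:", missing_keys(parse_env(env_file.read_text())) or "none")
```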
# Run via Nextflow with a samplesheet
nextflow run workflows/main.nf --samplesheet samples.csv --outdir ./results --proseg_binary /path/to/proseg
A template samplesheet is provided at workflows/samplesheet.example.csv. Copy it and edit the dataset-specific paths before running the workflow.
cp workflows/samplesheet.example.csv workflows/samplesheet.csv
The samplesheet points at raw platform folders, with optional paths to reusable SpatialData caches (merscope_dir, merscope_spatialdata_path, xenium_dir, xenium_spatialdata_path, plus per-platform channel, z-range, and voxel-layer settings). The full schema, validation rules, and worked examples are documented in docs/samplesheet.md. For Nextflow invocation options (resuming, stage-range runs, force rebuild, parameter overrides, cluster execution), see docs/running-the-pipeline.md.
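The sketch below writes a two-row samplesheet with Python's csv module. It then applies one plausible check: each row needs either a raw folder or a cached SpatialData path for each platform. That rule, the `sample_id` column, and all paths are assumptions for illustration. The authoritative schema and validation rules are in docs/samplesheet.md. The per-platform channel, z-range, and voxel-layer columns are omitted here.

```python
from __future__ import annotations

import csv
import io

# Platform columns from the README; "sample_id" is a hypothetical identifier column.
FIELDS = [
    "sample_id",
    "merscope_dir",
    "merscope_spatialdata_path",
    "xenium_dir",
    "xenium_spatialdata_path",
]


def row_errors(row: dict[str, str]) -> list[str]:
    """Assumed rule: each platform needs a raw folder or a cached SpatialData path."""
    errors = []
    for platform in ("merscope", "xenium"):
        if not (row.get(f"{platform}_dir") or row.get(f"{platform}_spatialdata_path")):
            errors.append(f"{row.get('sample_id', '?')}: no {platform} input")
    return errors


def write_samplesheet(rows: list[dict[str, str]]) -> str:
    """Serialise rows to CSV text; missing columns are written as empty strings."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Placeholder paths for illustration only.
rows = [
    {"sample_id": "pair01", "merscope_dir": "/data/merscope/pair01",
     "xenium_dir": "/data/xenium/pair01"},
    {"sample_id": "pair02", "xenium_spatialdata_path": "/cache/pair02_xenium.zarr"},
]
text = write_samplesheet(rows)
parsed = list(csv.DictReader(io.StringIO(text)))
problems = [e for r in parsed for e in row_errors(r)]
print(problems)
```

Reading the file back with `DictReader` before validating mirrors what a consumer of the CSV sees: absent values arrive as empty strings, which the check treats as missing.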
# All tests (excluding slow)
pytest
# Including integration tests
pytest -m "not slow"
# Full suite
pytest --run-slow
# Lint and format
ruff check . --fix
ruff format .
# Type check
mypy src/
# Regenerate lockfile after changing dependencies
uv pip compile pyproject.toml --extra dev -o requirements.lock
Project standards (layout, dependencies, naming, type hints, docstrings, git workflow, commit message prefixes) are defined in Agents.md. Day-to-day development mechanics (testing, pre-commit hooks, CI, debugging, adding a new pipeline stage) are covered in docs/development.md.