Merge pull request #257 from Dana-Farber-AIOS/v2.0.0
v2.0.0
jacob-rosenthal authored Dec 19, 2021
2 parents 5f30ae1 + a4fddcd commit dfb3d11
Showing 35 changed files with 599 additions and 665 deletions.
25 changes: 18 additions & 7 deletions CONTRIBUTING.rst
@@ -10,7 +10,6 @@ There are many ways to contribute to PathML, including:
* Writing documentation
* Fixing bugs
* Writing code for new features
* Sharing workflows [coming soon]
* Sharing trained model parameters [coming soon]
* Sharing ``PathML`` with colleagues, students, etc.

@@ -21,10 +20,12 @@ Report bugs or errors by filing an issue on GitHub. Make sure to include the fol

* Short description of the bug
* Minimum working example to reproduce the bug
* Expected result
* Actual result
* Expected result vs. actual result
* Any other useful information

If a bug cannot be reproduced by someone else on a different machine, it will usually be hard to identify
what is causing it.

Requesting a new feature
=========================
Request a new feature by filing an issue on GitHub. Make sure to include the following information:
@@ -48,17 +49,27 @@ Setting up a local development environment
Running tests
-------------

To run the full testing suite:

.. code-block::

    python -m pytest

Some tests are known to be very slow. To skip them, run instead:

.. code-block::

    python -m pytest -m "not slow"

Building documentation locally
------------------------------

.. code-block::

    cd docs     # enter docs directory
    make html   # build docs in html format
    cd docs                                     # enter docs directory
    pip install -r readthedocs-requirements     # install packages to build docs
    make html                                   # build docs in html format

Then use your favorite web browser to open ``pathml/docs/build/html/index.html``

@@ -77,10 +88,10 @@ How to contribute code, documentation, etc.

1. Create a new GitHub issue for what you will be working on, if one does not already exist
2. Create a local development environment (see above)
3. Implement your changes
3. Create a new branch from the dev branch and implement your changes
4. Write new tests as needed to maintain code coverage
5. Ensure that all tests pass
6. Commit your changes and submit a pull request referencing the corresponding issue
6. Push your changes and open a pull request on GitHub referencing the corresponding issue
7. Respond to discussion/feedback about the pull request, make changes as necessary

Versioning
30 changes: 22 additions & 8 deletions README.md
@@ -2,18 +2,23 @@

<img src=https://raw.githubusercontent.com/Dana-Farber-AIOS/pathml/master/docs/source/_static/images/overview.png width="750">

![tests](https://github.com/Dana-Farber-AIOS/pathml/actions/workflows/tests-conda.yml/badge.svg?branch=dev)
[![Documentation Status](https://readthedocs.org/projects/pathml/badge/?version=latest)](https://pathml.readthedocs.io/en/latest/?badge=latest)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![PyPI version](https://img.shields.io/pypi/v/pathml)](https://pypi.org/project/pathml/)
[![Downloads](https://pepy.tech/badge/pathml)](https://pepy.tech/project/pathml)
[![codecov](https://codecov.io/gh/Dana-Farber-AIOS/pathml/branch/master/graph/badge.svg?token=UHSQPTM28Y)](https://codecov.io/gh/Dana-Farber-AIOS/pathml)

| Branch | Test status |
| ------ | ------------- |
| master | ![tests](https://github.com/Dana-Farber-AIOS/pathml/actions/workflows/tests-conda.yml/badge.svg?branch=master) |
| dev | ![tests](https://github.com/Dana-Farber-AIOS/pathml/actions/workflows/tests-conda.yml/badge.svg?branch=dev) |

A toolkit for computational pathology and machine learning.

**View [documentation](https://pathml.readthedocs.io/en/latest/)**

**Please cite [our paper](https://www.biorxiv.org/content/10.1101/2021.10.21.465212)**
:construction: the `dev` branch is under active development, with experimental features, bug fixes, and refactors that may happen at any time!
Stable versions are available as tagged commits on the `master` branch, or as versioned releases on PyPI

# Installation

@@ -77,7 +82,7 @@ conda activate pathml

Optionally install CUDA (instructions [here](#CUDA))

Install PathML:
Install `PathML`:
````
pip install -e .
````
@@ -105,12 +110,12 @@ python -c "import torch; print(torch.cuda.is_available())"

# Using with Jupyter

Jupyter notebooks are a convenient way to work interactively. To use PathML in Jupyter notebooks:
Jupyter notebooks are a convenient way to work interactively. To use `PathML` in Jupyter notebooks:

## Set JAVA_HOME environment variable

PathML relies on Java to enable support for reading a wide range of file formats.
Before using PathML in Jupyter, you may need to manually set the `JAVA_HOME` environment variable
Before using `PathML` in Jupyter, you may need to manually set the `JAVA_HOME` environment variable
specifying the path to Java. To do so:

1. Get the path to Java by running `echo $JAVA_HOME` in the terminal in your pathml conda environment (outside of Jupyter)
@@ -120,20 +125,20 @@ specifying the path to Java. To do so:
os.environ["JAVA_HOME"] = "/opt/conda/envs/pathml" # change path as needed
````
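As a minimal sketch of the step above, the snippet below sets `JAVA_HOME` only when it is not already defined, so an existing configuration is left untouched. The helper name `ensure_java_home` and the example path are illustrative assumptions and are not part of `PathML`.

```python
import os


def ensure_java_home(java_path, environ=None):
    """Set JAVA_HOME to java_path unless it is already set.

    Hypothetical helper for illustration only; not part of PathML.
    Returns the value of JAVA_HOME that is in effect afterwards.
    """
    env = os.environ if environ is None else environ
    # setdefault leaves an existing JAVA_HOME entry untouched
    return env.setdefault("JAVA_HOME", java_path)


# A plain dict stands in for os.environ so the demo has no side effects
demo_env = {}
print(ensure_java_home("/opt/conda/envs/pathml", demo_env))  # change path as needed
```

In a notebook, calling `ensure_java_home(path)` with no second argument would modify the real process environment. Note that an empty-string `JAVA_HOME` counts as already set in this sketch.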
## Register PathML as an IPython kernel
## Register environment as an IPython kernel
````
conda activate pathml
conda install ipykernel
python -m ipykernel install --user --name=pathml
````
This makes PathML available as a kernel in jupyter lab or notebook.
This makes the pathml environment available as a kernel in jupyter lab or notebook.
# Contributing
``PathML`` is an open source project. Consider contributing to benefit the entire community!
There are many ways to contribute to PathML, including:
There are many ways to contribute to `PathML`, including:
* Submitting bug reports
* Submitting feature requests
@@ -146,6 +151,15 @@ There are many ways to contribute to PathML, including:
See [contributing](https://github.com/Dana-Farber-AIOS/pathml/blob/master/CONTRIBUTING.rst) for more details.
# Citing
If you use `PathML` in your work, please cite our paper:
Rosenthal J, Carelli R, Omar M, Brundage D, Halbert E, Nyman J, Hari SN, Van Allen EM, Marchionni L, Umeton R, Loda M.
Building tools for machine learning and artificial intelligence in cancer research: best practices and a case study
with the PathML toolkit for computational pathology. *Molecular Cancer Research*, 2021.
DOI: [10.1158/1541-7786.MCR-21-0665](https://doi.org/10.1158/1541-7786.MCR-21-0665)
# License
The GNU GPL v2 version of PathML is made available via Open Source licensing.
2 changes: 1 addition & 1 deletion docs/source/api_core_reference.rst
@@ -33,7 +33,7 @@ We also provide instantiations of common slide types for convenience:
``pathml.core.types.HE`` 'HE' None True False False False
``pathml.core.types.IHC`` 'IHC' None True False False False
``pathml.core.types.IF`` 'Fluor' None False False False False
``pathml.core.types.CODX`` 'Fluor' 'CODEX' False False False False
``pathml.core.types.CODEX`` 'Fluor' 'CODEX' False False False False
``pathml.core.types.Vectra`` 'Fluor' 'Vectra' False False False False
============================= ======= ======== ======= ======= ========== ===========

47 changes: 31 additions & 16 deletions docs/source/h5path.rst
@@ -17,7 +17,7 @@ must have sufficient storage. Performance will benefit from storage with fast re
How it Works
------------

Each :class:`~pathml.core.slide_data.SlideData` object is backed by an ``.h5path`` file on disk.
Each :class:`~pathml.core.SlideData` object is backed by an ``.h5path`` file on disk.
All interaction with the ``.h5path`` file is handled automatically by the :class:`~pathml.core.h5managers.h5pathManager`.
For example, when a user calls ``slidedata.tiles[tile_key]``, the :class:`~pathml.core.h5managers.h5pathManager` will
retrieve the tile from disk and return it, without the user needing to worry about accessing the HDF5 file themself.
@@ -67,8 +67,8 @@ HDF5 format consists of 3 types of elements:
import h5py
root = h5py.File('path/to/file.h5path', 'r')
im = root['array'][...]
im_slice = root['array'][0:100, 0:100, :]
im = root['tiles']['(0, 0)']['array'][...]
im_slice = root['tiles']['(0, 0)']['array'][0:100, 0:100, :]
``Attributes`` are stored in a ``.attrs`` object which can be queried like a dictionary:

@@ -81,15 +81,7 @@ HDF5 format consists of 3 types of elements:
``.h5path`` File Format
-----------------------

**h5path** utilizes a self-describing hierarchical file system similar to :class:`~pathml.core.slide_data`.

The full-resolution whole-slide image is stored in the ``array`` Dataset.

Whole-slide masks are stored in the ``masks/`` Group. All masks are enforced to be the same shape as the image array.

Tile metadata is stored in the ``tiles/`` Group, but tile-level images and masks are not stored separately.
Instead, to retrieve an individual tile, the coordinates and tile_shape attributes are used to slice the
corresponding region from the whole-slide image and masks.
**h5path** utilizes a self-describing hierarchical file system similar to :class:`~pathml.core.SlideData`.

Here we examine the **h5path** file format in detail:

@@ -109,7 +101,6 @@ Here we examine the **h5path** file format in detail:
│ ├── rgb (Attribute, bool)
│ ├── volumetric (Attribute, bool)
│ └── time_series (Attribute, bool)
├── array (Dataset)
├── masks/ (Group)
│ ├── mask1 (Dataset, array)
│ ├── mask2 (Dataset, array)
@@ -118,7 +109,13 @@ Here we examine the **h5path** file format in detail:
│ └── `.h5ad` format
└── tiles/ (Group)
├── tile_shape (Attribute, tuple)
├── tile_stride (Attribute, tuple)
├── tile_key1/ (Group)
│ ├── array (Dataset, array)
│ ├── masks/ (Group)
│ │ ├── mask1 (Dataset, array)
│ │ ├── mask2 (Dataset, array)
│ │ └── etc...
│ ├── coords (Attribute, tuple)
│ ├── name (Attribute, str)
│ └── labels/ (Group)
@@ -130,10 +127,28 @@ Here we examine the **h5path** file format in detail:
└── etc...


Slide-level metadata is stored in the ``fields/`` group.

Slide-level counts matrix metadata is stored in the ``counts/`` group.

The ``tiles/`` group stores tile-level data. Each tile occupies its own group, and tile coordinates are used as
keys for indexing tiles within the ``tiles/`` group. Within each tile's group, the ``array`` dataset contains the
tile image, the ``masks/`` group contains tile-level masks, and other metadata including name, labels, and coords
are stored as attributes. Slide-level metadata about tiling, including tile shape and stride, are stored as attributes
in the ``tiles/`` group.
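Because tile coordinates serve as the keys inside ``tiles/``, it can help to convert between coordinate tuples and key strings. The sketch below assumes that keys are the string form of the coordinate tuple, as in the ``'(0, 0)'`` key used in the ``h5py`` example earlier in this document; the helper names are hypothetical, and the exact key format should be checked against a real file.

```python
import ast


def coords_to_key(coords):
    """Format a coordinate tuple like the example key '(0, 0)'.

    Hypothetical helper; the key format is an assumption based on that example.
    """
    return str(tuple(int(c) for c in coords))


def key_to_coords(key):
    """Parse a key such as '(0, 0)' back into a tuple of ints."""
    value = ast.literal_eval(key)  # safely evaluates literals only
    if not isinstance(value, tuple) or not all(isinstance(v, int) for v in value):
        raise ValueError(f"not a coordinate key: {key!r}")
    return value


key = coords_to_key((0, 256))
print(key, key_to_coords(key))
```

With helpers like these, a tile's image could then be looked up as ``root['tiles'][coords_to_key((0, 0))]['array']`` on an open ``h5py.File``.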

Whole-slide masks are stored in the ``masks/`` Group. All masks are enforced to be the same shape as the image array.
However, when running a pipeline, these masks are moved to the tile-level and stored within the tile groups.
The slide-level masks are therefore not saved when calling :meth:`SlideData.write() <pathml.core.SlideData.write>`.

We use ``float16`` as the data type for all Datasets.
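To illustrate what ``float16`` storage implies, the sketch below round-trips values through IEEE 754 half precision using only the standard library. It does not reproduce PathML's own code; the function name is hypothetical, and the point is only that 8-bit pixel intensities are stored exactly while larger integers and most fractions are rounded.

```python
import struct


def roundtrip_float16(value):
    """Encode value as IEEE 754 half precision ('e' format) and decode it again."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


# Every 8-bit intensity (0..255) survives the round trip unchanged
exact_8bit = all(roundtrip_float16(v) == v for v in range(256))
print("8-bit values exact:", exact_8bit)

# Half precision has an 11-bit significand, so not every integer above 2048 is representable
print("2049 preserved:", roundtrip_float16(2049) == 2049)

# Most decimal fractions are rounded as well
print("0.1 preserved:", roundtrip_float16(0.1) == 0.1)
```

Values with a larger dynamic range (for example 16-bit fluorescence intensities) may therefore lose precision when stored this way.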

.. note:: Be aware that the ``h5path`` format specification may change between major versions
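Since the format may change between major versions, one defensive pattern is to compare major version numbers before reading a file. The sketch below is an assumption-laden illustration: the helper names are hypothetical, and this document does not state whether or where an ``.h5path`` file records the version that wrote it, so both version strings are supplied by the caller.

```python
def major_version(version):
    """Return the leading integer of a version string such as '2.0.0' or '1.0.dev4'."""
    return int(version.split(".")[0])


def same_major(written_with, running):
    """True when both version strings share a major version (hypothetical check)."""
    return major_version(written_with) == major_version(running)


# The two version strings that appear in this commit's _version.py change
print(same_major("1.0.dev4", "2.0.0"))
```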

Reading and Writing
-------------------

:class:`~pathml.core.slide_data.SlideData` objects are easily written to **h5path** format
by calling :meth:`SlideData.write() <pathml.core.slide_data.SlideData.write>`.
All files with ``.h5`` or ``.h5path`` extensions are loaded to :class:`~pathml.core.slide_data.SlideData` objects
:class:`~pathml.core.SlideData` objects are easily written to **h5path** format
by calling :meth:`SlideData.write() <pathml.core.SlideData.write>`.
All files with ``.h5`` or ``.h5path`` extensions are loaded to :class:`~pathml.core.SlideData` objects
automatically.
1 change: 1 addition & 0 deletions environment.yml
@@ -34,3 +34,4 @@ dependencies:
- sphinx-rtd-theme==1.0.0
- sphinx-autoapi==1.8.4
- sphinx-copybutton==0.4.0
- tqdm
2 changes: 1 addition & 1 deletion pathml/_version.py
@@ -3,4 +3,4 @@
License: GNU GPL 2.0
"""

__version__ = "1.0.dev4"
__version__ = "2.0.0"