Commit 4f728ef

docs: initial mkdocs conversion

1 parent 51a9f8a, commit 4f728ef

23 files changed (+1143 -1017 lines changed)

.readthedocs.yml

+23 -17

```diff
@@ -4,21 +4,27 @@ build:
   os: ubuntu-22.04
   tools:
     python: "3.11"
-  jobs:
-    post_checkout:
-      # Full history is required for dunamai to calculate the version
-      - git fetch --unshallow || true
-    post_create_environment:
-      # Install poetry
-      # https://python-poetry.org/docs/#installing-manually
-      - pip install poetry
-      # Tell poetry to not use a virtual environment
-      - poetry config virtualenvs.create false
-    post_install:
-      # Install dependencies with 'docs' dependency group
-      # https://python-poetry.org/docs/managing-dependencies/#dependency-groups
-      - poetry install --with dev,docs --all-extras
+  commands:
+    # Full history is required for dunamai to calculate the version
+    - git fetch --unshallow || true
+    # Install poetry
+    # https://python-poetry.org/docs/#installing-manually
+    - pip install poetry
+    # Install poetry-dynamic-versioning plugin
+    - poetry self add "poetry-dynamic-versioning[plugin]"
+    # Build the project
+    - poetry build --format sdist
+    # Extract the built sdist
+    - mkdir -p dist/sdist && tar -xzf dist/*.tar.gz -C dist/sdist/
+    # Replace the files from the repo with the built sdist
+    - mv dist/sdist/*/pyproject.toml .
+    # Tell poetry to not use a virtual environment
+    - poetry config virtualenvs.create false
+    # Install dependencies with 'docs' dependency group
+    # https://python-poetry.org/docs/managing-dependencies/#dependency-groups
+    - poetry install --with dev,docs --all-extras
+    # Build the docs
+    - poetry run mkdocs build --clean --site-dir $READTHEDOCS_OUTPUT/html --config-file mkdocs.yml
 
-sphinx:
-  builder: html
-  configuration: docs/conf.py
+mkdocs:
+  configuration: mkdocs.yml
```
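In the new `build.commands` sequence, the repository's `pyproject.toml` is swapped for the copy inside the built sdist, presumably so that `poetry install` sees the version that poetry-dynamic-versioning wrote into the sdist. The extract-and-move steps can be sketched in Python with the standard `tarfile` module. `promote_sdist_pyproject` is a hypothetical helper, not part of the repository, and it assumes exactly one `*.tar.gz` sits in `dist/`.

```python
import shutil
import tarfile
from pathlib import Path


def promote_sdist_pyproject(dist_dir: Path, repo_root: Path) -> Path:
    """Hypothetical helper mirroring two shell steps in build.commands:

        mkdir -p dist/sdist && tar -xzf dist/*.tar.gz -C dist/sdist/
        mv dist/sdist/*/pyproject.toml .

    Assumes dist_dir holds exactly one sdist archive (*.tar.gz).
    """
    archives = sorted(dist_dir.glob("*.tar.gz"))
    if len(archives) != 1:
        raise RuntimeError(f"expected one sdist in {dist_dir}, found {len(archives)}")

    # Equivalent of: mkdir -p dist/sdist && tar -xzf ... -C dist/sdist/
    extract_dir = dist_dir / "sdist"
    extract_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archives[0], "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Pythons with extraction filters: "data" rejects unsafe members.
            tar.extractall(extract_dir, filter="data")
        else:
            tar.extractall(extract_dir)

    # An sdist unpacks into one top-level "<name>-<version>/" directory.
    candidates = list(extract_dir.glob("*/pyproject.toml"))
    if len(candidates) != 1:
        raise RuntimeError("expected exactly one pyproject.toml inside the sdist")

    # Equivalent of: mv dist/sdist/*/pyproject.toml .
    target = repo_root / "pyproject.toml"
    shutil.move(str(candidates[0]), str(target))
    return target
```

Unlike the shell glob, the helper fails loudly when zero or several archives are present, which makes a misconfigured build easier to diagnose.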

Makefile

+2 -2

```diff
@@ -18,8 +18,8 @@ test: ## Test the code with pytest
 
 .PHONY: docs
 docs: ## Build the documentation
-	@echo "📚 Building documentation"
-	@poetry run sphinx-build docs build
+	@echo "📚 Serving documentation"
+	@mkdocs serve
 
 .PHONY: build
 build: clean-build ## Build wheel file using poetry
```

PKG-INFO

+250

```diff
@@ -0,0 +1,250 @@
+Metadata-Version: 2.1
+Name: pycytominer
+Version: 1.1.0.post29.dev0+9615c0d
+Summary: Python package for processing image-based profiling data
+Home-page: https://pycytominer.readthedocs.io/
+License: BSD-3-Clause
+Author: Erik Serrano
+Maintainer: Gregory P. Way
+Maintainer-email: [email protected]
+Requires-Python: >=3.8,<4.0
+Classifier: License :: OSI Approved :: BSD License
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.8
+Classifier: Programming Language :: Python :: 3.9
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Provides-Extra: cell-locations
+Provides-Extra: collate
+Requires-Dist: boto3 (>=1.26.79) ; extra == "cell-locations"
+Requires-Dist: cytominer-database (==0.3.4) ; extra == "collate"
+Requires-Dist: fire (>=0.5.0) ; extra == "cell-locations"
+Requires-Dist: fsspec (>=2023.1.0) ; extra == "cell-locations"
+Requires-Dist: numpy (>=1.16.5)
+Requires-Dist: pandas (>=1.2.0)
+Requires-Dist: pyarrow (>=8.0.0)
+Requires-Dist: s3fs (>=2023.4.0) ; extra == "cell-locations"
+Requires-Dist: scikit-learn (>=0.21.2)
+Requires-Dist: scipy (>=1.5)
+Requires-Dist: sqlalchemy (>=1.3.6,<2)
+Project-URL: Repository, https://github.com/cytomining/pycytominer
+Description-Content-Type: text/markdown
+
+<img height="200" src="https://raw.githubusercontent.com/cytomining/pycytominer/main/logo/with-text-for-light-bg.png?raw=true">
+
+- [Data processing for image-based profiling](#data-processing-for-image-based-profiling)
+- [Installation](##installation)
+- [Frameworks](#frameworks)
+- [API](#api)
+- [Usage](#usage)
+- [Pipeline orchestration](#pipeline-orchestration)
+- [Other functionality](#other-functionality)
+- [CellProfiler CSV collation](#cellprofiler-csv-collation)
+- [Creating a cell locations lookup table](#creating-a-cell-locations-lookup-table)
+- [Generating a GCT file for morpheus](#generating-a-gct-file-for-morpheus)
+- [Citing pycytominer](#citing-pycytominer)
+
+# Data processing for image-based profiling
+
+[![Build Status](https://github.com/cytomining/pycytominer/actions/workflows/integration-test.yml/badge.svg?branch=main)](https://github.com/cytomining/pycytominer/actions/workflows/integration-test.yml?query=branch%3Amain)
+[![Coverage Status](https://codecov.io/gh/cytomining/pycytominer/branch/main/graph/badge.svg)](https://codecov.io/github/cytomining/pycytominer?branch=main)
+[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
+[![RTD](https://readthedocs.org/projects/pycytominer/badge/?version=latest&style=flat)](https://pycytominer.readthedocs.io/)
+[![DOI](https://img.shields.io/badge/DOI-10.48550/arXiv.2311.13417-blue)](https://doi.org/10.48550/arXiv.2311.13417)
+
+Pycytominer is a suite of common functions used to process high dimensional readouts from high-throughput cell experiments.
+The tool is most often used for processing data through the following pipeline:
+
+<img height="325" alt="Description of the pycytominer pipeline. Images flow from feature extraction and are processed with a series of steps" src="https://github.com/cytomining/pycytominer/blob/main/media/pipeline.png?raw=true">
+
+[Click here for high resolution pipeline image](https://github.com/cytomining/pycytominer/blob/main/media/pipeline.png)
+
+Image data flow from a microscope to cell segmentation and feature extraction tools (e.g. CellProfiler or DeepProfiler).
+From here, additional single cell processing tools curate the single cell readouts into a form manageable for pycytominer input.
+For CellProfiler, we use [cytominer-database](https://github.com/cytomining/cytominer-database) or [CytoTable](https://github.com/cytomining/CytoTable).
+For DeepProfiler, we include single cell processing tools in [pycytominer.cyto_utils](pycytominer/cyto_utils/).
+
+From the single cell output, pycytominer performs five steps using a simple API (described below), before passing along data to [cytominer-eval](https://github.com/cytomining/cytominer-eval) for quality and perturbation strength evaluation.
+
+## Installation
+
+You can install pycytominer via pip:
+
+```bash
+pip install pycytominer
+```
+
+or conda:
+
+```bash
+conda install -c conda-forge pycytominer
+```
+
+## Frameworks
+
+Pycytominer is primarily built on top of [pandas](https://pandas.pydata.org/docs/index.html), also using aspects of SQLAlchemy, sklearn, and pyarrow.
+
+Pycytominer currently supports [parquet](https://parquet.apache.org/) and compressed text file (e.g. `.csv.gz`) i/o.
+
+## API
+
+Pycytominer has five major processing functions:
+
+1. Aggregate - Average single-cell profiles based on metadata information (most often "well").
+2. Annotate - Append metadata (most often from the platemap file) to the feature profile
+3. Normalize - Transform input feature data into consistent distributions
+4. Feature select - Exclude non-informative or redundant features
+5. Consensus - Average aggregated profiles by replicates to form a "consensus signature"
+
+The API is consistent for each of these functions:
+
+```python
+# Each function takes as input a pandas DataFrame or file path
+# and transforms the input data based on the provided options and methods
+df = function(
+    profiles_or_path,
+    features,
+    samples,
+    method,
+    output_file,
+    additional_options...
+)
+```
+
+Each processing function has unique arguments, see our [documentation](https://pycytominer.readthedocs.io/) for more details.
+
+## Usage
+
+The default way to use pycytominer is within python scripts, and using pycytominer is simple and fun.
+
+```python
+# Real world example
+import pandas as pd
+import pycytominer
+
+commit = "da8ae6a3bc103346095d61b4ee02f08fc85a5d98"
+url = f"https://media.githubusercontent.com/media/broadinstitute/lincs-cell-painting/{commit}/profiles/2016_04_01_a549_48hr_batch1/SQ00014812/SQ00014812_augmented.csv.gz"
+
+df = pd.read_csv(url)
+
+normalized_df = pycytominer.normalize(
+    profiles=df,
+    method="standardize",
+    samples="Metadata_broad_sample == 'DMSO'"
+)
+```
+
+### Pipeline orchestration
+
+Pycytominer is a collection of different functions with no explicit link between steps.
+However, some options exist to use pycytominer within a pipeline framework.
+
+| Project | Format | Environment | pycytominer usage |
+| :------------------------------------------------------------------------------- | :-------- | :------------------- | :---------------------- |
+| [Profiling-recipe](https://github.com/cytomining/profiling-recipe) | yaml | agnostic | full pipeline support |
+| [CellProfiler-on-Terra](https://github.com/broadinstitute/cellprofiler-on-Terra) | WDL | google cloud / Terra | single-cell aggregation |
+| [CytoSnake](https://github.com/WayScience/CytoSnake) | snakemake | agnostic | full pipeline support |
+
+A separate project called [AuSPICES](https://github.com/broadinstitute/AuSPICEs) offers pipeline support up to image feature extraction.
+
+## Other functionality
+
+Pycytominer was written with a goal of processing any high-throughput image-based profiling data.
+However, the initial use case was developed for processing image-based profiling experiments specifically.
+And, more specifically than that, image-based profiling readouts from [CellProfiler](https://github.com/CellProfiler) measurements from [Cell Painting](https://www.nature.com/articles/nprot.2016.105) data.
+
+Therefore, we have included some custom tools in `pycytominer/cyto_utils` that provides other functionality:
+
+Note, [`pycytominer.cyto_utils.cells.SingleCells()`](./pycytominer/cyto_utils/cells.py) contains code to interact with single-cell SQLite files, which are output from CellProfiler.
+Processing capabilities for SQLite files depends on SQLite file size and your available computational resources (for ex. memory and cores).
+
+### CellProfiler CSV collation
+
+If running your images on a cluster, unless you have a MySQL or similar large database set up then you will likely end up with lots of different folders from the different cluster runs (often one per well or one per site), each one containing an `Image.csv`, `Nuclei.csv`, etc.
+In order to look at full plates, therefore, we first need to collate all of these CSVs into a single file (currently SQLite) per plate.
+We currently do this with a library called [cytominer-database](https://github.com/cytomining/cytominer-database).
+
+If you want to perform this data collation inside pycytominer using the `cyto_utils` function `collate` (and/or you want to be able to run the tests and have them all pass!), you will need `cytominer-database==0.3.4`; this will change your installation commands slightly:
+
+```bash
+# Example for general case commit:
+pip install "pycytominer[collate]"
+
+# Example for specific commit:
+pip install "pycytominer[collate] @ git+https://github.com/cytomining/pycytominer@77d93a3a551a438799a97ba57d49b19de0a293ab"
+```
+
+If using `pycytominer` in a conda environment, in order to run `collate.py`, you will also want to make sure to add `cytominer-database=0.3.4` to your list of dependencies.
+
+### Creating a cell locations lookup table
+
+The `CellLocation` class offers a convenient way to augment a [LoadData](https://cellprofiler-manual.s3.amazonaws.com/CPmanual/LoadData.html) file with X,Y locations of cells in each image.
+The locations information is obtained from a single cell SQLite file.
+
+To use this functionality, you will need to modify your installation command, similar to above:
+
+```bash
+# Example for general case commit:
+pip install "pycytominer[cell_locations]"
+```
+
+Example using this functionality:
+
+```bash
+metadata_input="s3://cellpainting-gallery/test-cpg0016-jump/source_4/workspace/load_data_csv/2021_08_23_Batch12/BR00126114/test_BR00126114_load_data_with_illum.parquet"
+single_single_cell_input="s3://cellpainting-gallery/test-cpg0016-jump/source_4/workspace/backend/2021_08_23_Batch12/BR00126114/test_BR00126114.sqlite"
+augmented_metadata_output="~/Desktop/load_data_with_illum_and_cell_location_subset.parquet"
+
+python \
+    -m pycytominer.cyto_utils.cell_locations_cmd \
+    --metadata_input ${metadata_input} \
+    --single_cell_input ${single_single_cell_input} \
+    --augmented_metadata_output ${augmented_metadata_output} \
+    add_cell_location
+
+# Check the output
+
+python -c "import pandas as pd; print(pd.read_parquet('${augmented_metadata_output}').head())"
+
+# It should look something like this (depends on the width of your terminal):
+
+# Metadata_Plate Metadata_Well Metadata_Site ... PathName_OrigRNA ImageNumber CellCenters
+# 0 BR00126114 A01 1 ... s3://cellpainting-gallery/cpg0016-jump/source_... 1 [{'Nuclei_Location_Center_X': 943.512129380054...
+# 1 BR00126114 A01 2 ... s3://cellpainting-gallery/cpg0016-jump/source_... 2 [{'Nuclei_Location_Center_X': 29.9516027655562...
+```
+
+### Generating a GCT file for morpheus
+
+The software [morpheus](https://software.broadinstitute.org/morpheus/) enables profile visualization in the form of interactive heatmaps.
+Pycytominer can convert profiles into a `.gct` file for drag-and-drop input into morpheus.
+
+```python
+# Real world example
+import pandas as pd
+import pycytominer
+
+commit = "da8ae6a3bc103346095d61b4ee02f08fc85a5d98"
+plate = "SQ00014812"
+url = f"https://media.githubusercontent.com/media/broadinstitute/lincs-cell-painting/{commit}/profiles/2016_04_01_a549_48hr_batch1/{plate}/{plate}_normalized_feature_select.csv.gz"
+
+df = pd.read_csv(url)
+output_file = f"{plate}.gct"
+
+pycytominer.cyto_utils.write_gct(
+    profiles=df,
+    output_file=output_file
+)
+```
+
+## Citing pycytominer
+
+If you have used `pycytominer` in your project, please use the citation below.
+You can also find the citation in the 'cite this repository' link at the top right under `about` section.
+
+APA:
+
+```text
+Serrano, E., Chandrasekaran, N., Bunten, D., Brewer, K., Tomkinson, J., Kern, R., Bornholdt, M., Fleming, S., Pei, R., Arevalo, J., Tsang, H., Rubinetti, V., Tromans-Coia, C., Becker, T., Weisbart, E., Bunne, C., Kalinin, A. A., Senft, R., Taylor, S. J., Jamali, N., Adeboye, A., Abbasi, H. S., Goodman, A., Caicedo, J., Carpenter, A. E., Cimini, B. A., Singh, S., & Way, G. P. Reproducible image-based profiling with Pycytominer. https://doi.org/10.48550/arXiv.2311.13417
+```
+
```
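PKG-INFO follows the Python core-metadata layout: RFC 822-style header fields, followed by the long description (here, a copy of the README) as the message body. As a rough sketch, the standard library's `email.parser.HeaderParser` can read the headers and group `Requires-Dist` entries by extra. `summarize_pkg_info` is a hypothetical helper, and its marker handling is a simplification that only recognizes bare `extra == "..."` markers like the ones in this file.

```python
from email.parser import HeaderParser


def summarize_pkg_info(text: str) -> dict:
    """Hypothetical helper: parse core-metadata headers into a small summary.

    Simplified: only bare `extra == "..."` markers are recognized; any other
    environment marker is dropped and its requirement counted as a base dependency.
    """
    # HeaderParser reads the header block and leaves the body (the README) unparsed.
    msg = HeaderParser().parsestr(text)
    base, by_extra = [], {}
    for req in msg.get_all("Requires-Dist") or []:
        # Entries look like: boto3 (>=1.26.79) ; extra == "cell-locations"
        spec, _, marker = req.partition(";")
        spec, marker = spec.strip(), marker.strip()
        if marker.startswith("extra =="):
            extra = marker.split("==", 1)[1].strip().strip("\"'")
            by_extra.setdefault(extra, []).append(spec)
        else:
            base.append(spec)
    return {
        "name": msg["Name"],
        "version": msg["Version"],
        "extras": msg.get_all("Provides-Extra") or [],
        "base": base,
        "by_extra": by_extra,
    }
```

For markers beyond `extra == ...` (for example `python_version`), `packaging.requirements.Requirement` is the more robust choice.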

README.md

+13 -13

```diff
@@ -1,5 +1,17 @@
 <img height="200" src="https://raw.githubusercontent.com/cytomining/pycytominer/main/logo/with-text-for-light-bg.png?raw=true">
 
+- [Data processing for image-based profiling](#data-processing-for-image-based-profiling)
+- [Installation](##installation)
+- [Frameworks](#frameworks)
+- [API](#api)
+- [Usage](#usage)
+- [Pipeline orchestration](#pipeline-orchestration)
+- [Other functionality](#other-functionality)
+- [CellProfiler CSV collation](#cellprofiler-csv-collation)
+- [Creating a cell locations lookup table](#creating-a-cell-locations-lookup-table)
+- [Generating a GCT file for morpheus](#generating-a-gct-file-for-morpheus)
+- [Citing pycytominer](#citing-pycytominer)
+
 # Data processing for image-based profiling
 
 [![Build Status](https://github.com/cytomining/pycytominer/actions/workflows/integration-test.yml/badge.svg?branch=main)](https://github.com/cytomining/pycytominer/actions/workflows/integration-test.yml?query=branch%3Amain)
@@ -111,19 +123,7 @@ And, more specifically than that, image-based profiling readouts from [CellProfi
 
 Therefore, we have included some custom tools in `pycytominer/cyto_utils` that provides other functionality:
 
-- [Data processing for image-based profiling](#data-processing-for-image-based-profiling)
-- [Installation](#installation)
-- [Frameworks](#frameworks)
-- [API](#api)
-- [Usage](#usage)
-- [Pipeline orchestration](#pipeline-orchestration)
-- [Other functionality](#other-functionality)
-- [CellProfiler CSV collation](#cellprofiler-csv-collation)
-- [Creating a cell locations lookup table](#creating-a-cell-locations-lookup-table)
-- [Generating a GCT file for morpheus](#generating-a-gct-file-for-morpheus)
-- [Citing pycytominer](#citing-pycytominer)
-
-Note, [`pycytominer.cyto_utils.cells.SingleCells()`](pycytominer/cyto_utils/cells.py) contains code to interact with single-cell SQLite files, which are output from CellProfiler.
+Note, [`pycytominer.cyto_utils.cells.SingleCells()`](./pycytominer/cyto_utils/cells.py) contains code to interact with single-cell SQLite files, which are output from CellProfiler.
 Processing capabilities for SQLite files depends on SQLite file size and your available computational resources (for ex. memory and cores).
 
 ### CellProfiler CSV collation
```
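The relocated table of contents links to heading anchors such as `#installation`. The sketch below approximates GitHub's anchor rule (lowercase, drop punctuation other than hyphens and underscores, spaces to hyphens) and lists in-page links that match no heading. It is an assumption-laden simplification: it ignores GitHub's de-duplication suffixes, does not skip fenced code blocks, and mkdocs builds anchors with its own Markdown toolchain, so results may differ for unusual headings. `github_slug` and `broken_toc_links` are hypothetical names.

```python
import re


def github_slug(heading: str) -> str:
    """Approximate GitHub's heading anchor (simplified assumption)."""
    slug = heading.strip().lower()
    # Keep letters, digits, underscores, hyphens and spaces; drop the rest.
    slug = re.sub(r"[^\w\- ]", "", slug)
    return slug.replace(" ", "-")


def broken_toc_links(markdown: str) -> list:
    """Return in-page link targets like '(#foo)' that match no heading.

    Note: '#' lines inside fenced code blocks are also treated as headings.
    """
    headings = re.findall(r"^#{1,6}[ \t]+(.+?)[ \t]*$", markdown, flags=re.MULTILINE)
    anchors = {github_slug(h) for h in headings}
    targets = re.findall(r"\]\((#[^)]*)\)", markdown)
    return [t for t in targets if t[1:] not in anchors]
```

Run against the new README, the checker reports `##installation`, since the heading `## Installation` yields the anchor `installation`.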

docs/cyto_utils.md

+5

```diff
@@ -0,0 +1,5 @@
+# Cyto utilities
+
+Functions enabling smooth interaction with CellProfiler and DeepProfiler output formats.
+
+::: pycytominer.cyto_utils
```

docs/functions.md

+5

```diff
@@ -0,0 +1,5 @@
+# Main Functions
+
+::: pycytominer
+    options:
+      members: - aggregate - annotate - consensus - feature_select - normalize
```

docs/index.md

+3

```diff
@@ -0,0 +1,3 @@
+{%
+include-markdown "../README.md"
+%}
```

docs/index.rst

-22
This file was deleted.
