Commit

fixed documentation bugs
akmorrow13 committed Apr 20, 2020
1 parent 9090807 commit 839a880
Showing 8 changed files with 72 additions and 50 deletions.
5 changes: 2 additions & 3 deletions docs/conf.py
@@ -40,9 +40,8 @@
     "matplotlib.pyplot",
     "matplotlib.backends",
     "matplotlib.lines",
-    "matplotlib.transforms"
-
-
+    "matplotlib.transforms",
+    "sklearn.calibration"
 ]
 
 for mod_name in MOCK_MODULES:
Binary file removed docs/figures/epitome_diagram.png
Binary file not shown.
Binary file added docs/figures/epitome_diagram_celllines.png
2 changes: 1 addition & 1 deletion docs/index.rst
@@ -5,7 +5,7 @@ Introduction
 
 Epitome is a computational model that leverages chromatin accessibility data to predict ChIP-seq peaks on unseen cell types. Epitome computes the chromatin similarity between 11 cell types in ENCODE and the novel cell types, and uses chromatin similarity to transfer ChIP-seq peaks in known cell types to a novel cell type of interest.
 
-.. image:: figures/epitome_diagram.png
+.. image:: figures/epitome_diagram_celllines.png
 
 
 .. toctree::
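The similarity-and-transfer idea in this introduction can be sketched in plain Python. This is a toy illustration of the concept only — Epitome's real model is a trained neural network, and the helper names (`jaccard`, `transfer_peaks`) are hypothetical, not part of the Epitome API:

```python
def jaccard(a, b):
    """Jaccard similarity between two binary accessibility vectors."""
    inter = sum(x & y for x, y in zip(a, b))
    union = sum(x | y for x, y in zip(a, b))
    return inter / union if union else 0.0

def transfer_peaks(novel_accessibility, reference):
    """Score ChIP-seq peaks for a novel cell type by weighting each
    reference cell type's peak calls by its chromatin similarity."""
    weights = {ct: jaccard(novel_accessibility, acc)
               for ct, (acc, _) in reference.items()}
    total = sum(weights.values()) or 1.0
    scores = [0.0] * len(novel_accessibility)
    for ct, (acc, peaks) in reference.items():
        w = weights[ct] / total
        for i, p in enumerate(peaks):
            scores[i] += w * p
    return scores

# Toy example: two reference cell types, binary DNase accessibility
# and binary ChIP-seq peak calls over 5 genomic bins.
reference = {
    "GM12878": ([1, 1, 0, 0, 1], [1, 0, 0, 0, 1]),
    "K562":    ([0, 1, 1, 0, 0], [0, 1, 1, 0, 0]),
}
novel = [1, 1, 0, 0, 1]
print(transfer_peaks(novel, reference))
```

Bins whose peaks come from the more similar reference cell type receive higher scores.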
31 changes: 20 additions & 11 deletions docs/installation/source.rst
@@ -1,5 +1,5 @@
-Building Epitome from Source
-============================
+Installing Epitome
+==================
 
 **Note**: Epitome is configured for tensorflow 2/Cuda 9. If you have a different
 version of cuda, update tensorflow-gpu version accordingly.
@@ -10,6 +10,7 @@ Requirements
 * `conda <https://docs.conda.io/en/latest/miniconda.html>`__
 * python 3.7
 
+
 Installation
 ------------
 
@@ -20,25 +21,33 @@ Installation
    conda create --name EpitomeEnv python=3.7
    source activate EpitomeEnv
 
-2. Get Epitome code:
+2. Install Epitome from Pypi:
 
 .. code:: bash
 
-   git clone [email protected]:akmorrow13/epitome.git
-   cd epitome
+   pip install epitome
+
+From Source
+-----------
+
+1. Create and activate a python 3.7 conda venv:
+
+.. code:: bash
+
+   conda create --name EpitomeEnv python=3.7
+   source activate EpitomeEnv
+
+2. Get Epitome code:
+
+.. code:: bash
+
+   git clone https://github.com/YosefLab/epitome.git
+   cd epitome
 
 3. Install Epitome and its requirements
 
 .. code:: bash
 
    pip install -e .
-
-Configuring Data
-----------------
-
-Epitome requires data for training, validation and test. See `Configuring Epitome data <../usage/data.html>`__ for more information
-on how to download data for Epitome or generate your own dataset.
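After either install path, a quick sanity check confirms the package is importable from the active environment. This is a generic snippet; `check_install` is an illustrative helper, not an Epitome utility:

```python
import importlib.util

def check_install(package="epitome"):
    """Return True if the given package is importable in this environment."""
    return importlib.util.find_spec(package) is not None

if __name__ == "__main__":
    # Prints a hint instead of raising if the package is missing.
    status = "installed" if check_install() else "missing - activate EpitomeEnv first"
    print("epitome:", status)
```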
21 changes: 11 additions & 10 deletions docs/usage/data.rst
@@ -2,31 +2,32 @@ Configuring data
 ================
 
 Epitome pre-processes ChIP-seq peaks and DNase-seq peaks from ENCODE for usage
-in the Epitome models. Pre-processed data can be downloaded from:
+in the Epitome models. Pre-processed datasets are lazily downloaded from `Amazon S3 <https://epitome-data.s3-us-west-1.amazonaws.com/data.zip>`__ when users run an Epitome model.
 
-TODO: upload to AWS
-
-This data contains the following:
-- train.npz, valid.npz, and test.npz: compressed numpy data matrices. Valid.npz includes chr7 data, test.npz includes chr8 and chr8,
-and train.npz includes data from all other chromosomes.
-- all.pos.bed.gz: gunzipped genomic regions matching the numpy data matrices
-- feature_name: ChIP-seq and DNase-seq peaks corresponding to the data matrix.
+This dataset contains the following files:
+
+- **train.npz, valid.npz, and test.npz**: compressed numpy data matrices. Valid.npz includes chr7 data, test.npz includes chr8 and chr9, and train.npz includes data from all other chromosomes.
+
+- **all.pos.bed.gz**: gzipped genomic regions matching the numpy data matrices.
+
+- **feature_name**: ChIP-seq and DNase-seq peaks corresponding to the data matrix.
 
 
 Generating data for Epitome
 ---------------------------
 
-TODO: need to add this script as a binary in the module.
-
 You can generate your own Epitome dataset from ENCODE using the following command:
+``download_encode.py``.
 
 .. code:: bash
 
-   python get_deepsea_data.py -h
+   python download_encode.py -h
+   usage: download_encode.py [-h] [--metadata_url METADATA_URL]
+                             [--min_chip_per_cell MIN_CHIP_PER_CELL]
+                             [--regions_file REGIONS_FILE]
+                             download_path {hg19,mm10,GRCh38} bigBedToBed
+                             output_path
+
+TODO: need to add this script as a binary in the module.
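The chromosome-based split described above (chr7 for validation, chr8/chr9 for test, everything else for training) can be sketched as follows. `split_regions` is a hypothetical helper for illustration, not part of the dataset-generation script:

```python
# Split mirroring the dataset layout: chr7 -> valid, chr8/chr9 -> test.
VALID_CHRS = {"chr7"}
TEST_CHRS = {"chr8", "chr9"}

def split_regions(regions):
    """Partition (chrom, start, end) tuples into train/valid/test by chromosome."""
    split = {"train": [], "valid": [], "test": []}
    for region in regions:
        chrom = region[0]
        if chrom in VALID_CHRS:
            split["valid"].append(region)
        elif chrom in TEST_CHRS:
            split["test"].append(region)
        else:
            split["train"].append(region)
    return split

regions = [("chr1", 0, 200), ("chr7", 0, 200), ("chr8", 0, 200), ("chr9", 400, 600)]
print({k: len(v) for k, v in split_regions(regions).items()})
```

Holding out whole chromosomes (rather than random regions) keeps the evaluation free of positional leakage between train and test.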
8 changes: 5 additions & 3 deletions docs/usage/predict.rst
@@ -12,17 +12,19 @@ To get predictions on the whole genome, run:
 
 .. code:: python
 
    peak_result = model.score_whole_genome(peak_file, # chromatin accessibility peak file
-       output_path, # where to save results
-       chrs=["chr8","chr9"]) # chromosomes you would like to score. Leave blank for whole genome.
+                                          output_path, # where to save results
+                                          chrs=["chr8","chr9"]) # chromosomes you would like to score. Leave blank for whole genome.
 
 **Note:** Scoring on the whole genome scores about 7 million regions and takes about 1.5 hours.
 
 TODO: talk about including histone modification files.
 
 
 You can also get predictions on specific genomic regions:
 
 .. code:: python
 
    results = model.score_peak_file(peak_file, # chromatin accessibility peak file
-       regions_file) # bed file of regions to score
+                                   regions_file) # bed file of regions to score
 
 This method returns a dataframe of the scored predictions.
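The `regions_file` passed to `score_peak_file` is a standard BED file. A minimal parser shows the expected layout (an illustrative sketch — Epitome handles BED parsing internally):

```python
import io

def read_bed(handle):
    """Parse a minimal 3-column BED file into (chrom, start, end) tuples.
    BED coordinates are 0-based, half-open; track/comment lines are skipped."""
    regions = []
    for line in handle:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        regions.append((chrom, int(start), int(end)))
    return regions

bed = io.StringIO("track name=peaks\nchr8\t100\t600\nchr9\t1000\t1400\n")
print(read_bed(bed))
```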
55 changes: 33 additions & 22 deletions docs/usage/train.rst
@@ -1,12 +1,12 @@
 Training an Epitome Model
 =========================
 
-Once you have `installed Epitome and configured the data <../installation/source.html>`__, you are ready to train a model.
+Once you have `installed Epitome <../installation/source.html>`__, you are ready to train a model.
 
 Training a Model
 ----------------
 
-First, import Epitome and specified the `path to Epitome data: <./data.html>`__
+First, import Epitome:
 
 .. code:: python
@@ -16,36 +16,45 @@ First, import Epitome and specified the `path to Epitome data: <./data.html>`__
 
    from epitome.functions import *
    from epitome.viz import *
 
-   epitome_data_path = <path_to_dataset>
+Quick Start
+^^^^^^^^^^^
 
-To train a model, you will need to first specify the assays and cell lines you would like to train with:
+First, define the assays you would like to train. Then you can create a `VLP` model:
 
 TODO: have a function that lists all of the assays you can build a model from.
 
 .. code:: python
 
    assays = ['CTCF','RAD21','SMC3']
+   model = VLP(assays, test_celltypes = ["K562"]) # cell line reserved for testing
+
+To train a model on a specific set of targets and cell lines, you will need to first specify the assays and cell lines you would like to train with:
 
 .. code:: python
 
-   matrix, cellmap, assaymap = get_assays_from_feature_file(feature_path=os.path.join(epitome_data_path,'feature_name'),
-                                                            eligible_assays = None,
-                                                            eligible_cells = None,
-                                                            min_assays_per_cell=2,
-                                                            min_cells_per_assay=2)
+   matrix, cellmap, assaymap = get_assays_from_feature_file(eligible_assays = None,
+                                                            eligible_cells = None,
+                                                            min_assays_per_cell=6,
+                                                            min_cells_per_assay=8)
 
    # visualize cell lines and ChIP-seq peaks you have selected
    plot_assay_heatmap(matrix, cellmap, assaymap)
 
-Next train a model for 5000 iterations:
+Next define a model:
 
 .. code:: python
 
-   # for each, train a model
-   shuffle_size = 2
-   model = MLP(epitome_data_path,
-               ["K562"], # cell line reserved for testing
-               matrix,
-               assaymap,
-               cellmap,
-               shuffle_size=shuffle_size,
-               prefetch_size = 64,
-               debug = False,
-               batch_size=64)
+   model = VLP(list(assaymap),
+               matrix = matrix,
+               assaymap = assaymap,
+               cellmap = cellmap,
+               test_celltypes = ["K562"]) # cell line reserved for testing
+
+Next, train the model. Here, we train the model for 5000 iterations:
 
 .. code:: python
 
    model.train(5000)
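The `min_assays_per_cell` and `min_cells_per_assay` arguments used above prune the cell-by-assay support matrix. A dependency-free sketch of that pruning logic (illustrative only; `filter_matrix` is not the Epitome implementation):

```python
def filter_matrix(available, min_assays_per_cell=2, min_cells_per_assay=2):
    """available: dict mapping cell type -> set of assays with data.
    Repeatedly drop cells and assays below the thresholds until stable,
    since removing one can push the other under its minimum."""
    cells = {c: set(a) for c, a in available.items()}
    changed = True
    while changed:
        changed = False
        # Drop cells supported by too few assays.
        for c in [c for c, a in cells.items() if len(a) < min_assays_per_cell]:
            del cells[c]
            changed = True
        # Drop assays present in too few cells.
        counts = {}
        for a in (x for s in cells.values() for x in s):
            counts[a] = counts.get(a, 0) + 1
        weak = {a for a, n in counts.items() if n < min_cells_per_assay}
        if weak:
            for c in cells:
                cells[c] -= weak
            changed = True
    return cells

kept = filter_matrix({"K562": {"DNase", "CTCF", "RAD21"},
                      "GM12878": {"DNase", "CTCF"},
                      "HepG2": {"DNase"}})
print(kept)
```

Raising the thresholds (e.g. 6 and 8, as in the snippet above) trades coverage for a denser, better-supported training matrix.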
@@ -54,6 +63,8 @@ You can then evaluate model performance on held out test cell lines specified in
 
 .. code:: python
 
-   results = model.test(self, 10000, log=True)
+   results = model.test(10000,
+                        mode = Dataset.TEST,
+                        calculate_metrics=True)
 
 The output of `results` will contain the predictions and truth values, a dictionary of assay specific performance metrics, and the average auROC and auPRC across all evaluated assays.
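The average auROC reported in `results` follows the standard rank-based definition, which can be computed with no dependencies (a sketch, not Epitome's metrics code):

```python
def auroc(y_true, y_score):
    """Area under the ROC curve via the Mann-Whitney U statistic:
    the probability that a random positive outscores a random negative,
    counting ties as half."""
    pos = [s for s, t in zip(y_score, y_true) if t == 1]
    neg = [s for s, t in zip(y_score, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("auROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Perfect separation of peak labels gives an auROC of 1.0.
print(auroc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))
```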
