Update documentation (#2)
* ✨ add doc dependencies and build website locally

- some hints on how to build docs
- added missing file
- cleaned-up configuration

* ✨ build documentation references on Read the Docs

* 🐛 install package on readthedocs

* 🎨 switch theme

* 🎨 add intersphinx and typehint highlighting

* 🐛 update to GitHub (actions)

- 🐛 update repository references
- 🔥 remove travis CI config file

* 🎨 add an executed example to the docs (jupytext+mystnb)

🐛 pass on parameters in umap fct.

* 🐛 fix various type annotation bugs (intersphinx linking)

annotations need to be valid types (fix typos) or fully qualified module.object names (add the missing module name)

* 🎨 style landing page
enryH authored Sep 6, 2024
1 parent 1205c28 commit 43de3ca
Showing 20 changed files with 289 additions and 142 deletions.
10 changes: 7 additions & 3 deletions .readthedocs.yaml
@@ -20,9 +20,13 @@ sphinx:
# - pdf
# - epub


# Optional but recommended, declare the Python requirements required
# to build your documentation
# See https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html
# python:
# install:
# - requirements: docs/requirements.txt
python:
  install:
    - method: pip
      path: .
      extra_requirements:
        - docs
28 changes: 0 additions & 28 deletions .travis.yml

This file was deleted.

23 changes: 8 additions & 15 deletions CONTRIBUTING.rst
@@ -15,7 +15,7 @@ Types of Contributions
Report Bugs
~~~~~~~~~~~

Report bugs at https://github.com/albsantosdel/acore/issues.
Report bugs at https://github.com/Multiomics-Analytics-Group/acore/issues.

If you are reporting a bug, please include:

@@ -45,7 +45,7 @@ articles, and such.
Submit Feedback
~~~~~~~~~~~~~~~

The best way to send feedback is to file an issue at https://github.com/albsantosdel/acore/issues.
The best way to send feedback is to file an issue at https://github.com/Multiomics-Analytics-Group/acore/issues.

If you are proposing a feature:

@@ -62,7 +62,7 @@ Ready to contribute? Here's how to set up `acore` for local development.
1. Fork the `acore` repo on GitHub.
2. Clone your fork locally::

$ git clone git@github.com:your_name_here/acore.git
$ git clone https://github.com/Multiomics-Analytics-Group/acore.git

3. Install your local copy into a virtualenv. Assuming you have virtualenvwrapper installed, this is how you set up your fork for local development::

@@ -102,27 +102,20 @@ Before you submit a pull request, check that it meets these guidelines:
2. If the pull request adds functionality, the docs should be updated. Put
your new functionality into a function with a docstring, and add the
feature to the list in README.rst.
3. The pull request should work for Python 3.5, 3.6, 3.7 and 3.8, and for PyPy. Check
https://travis-ci.com/albsantosdel/acore/pull_requests
and make sure that the tests pass for all supported Python versions.

3. The pull request should pass the workflows on GitHub.

Tips
----

To run a subset of tests::

$ pytest tests.test_acore
$ pytest tests


Deploying
---------

A reminder for the maintainers on how to deploy.
Make sure all your changes are committed (including an entry in HISTORY.rst).
Then run::

$ bump2version patch # possible: major / minor / patch
$ git push
$ git push --tags

Travis will then deploy to PyPI if tests pass.
Then run create a new `GitHub release <https://github.com/Multiomics-Analytics-Group/acore/releases>`_.
GitHub will then deploy to PyPI if the tests pass.
4 changes: 2 additions & 2 deletions README.rst
@@ -51,8 +51,8 @@ A Python package with statistical functions to analyse multimodal molecular data
* Documentation: https://analytics-core.readthedocs.io.


Installation
============
PyPI Installation
=================

::

12 changes: 6 additions & 6 deletions acore/correlation_analysis.py
@@ -12,8 +12,8 @@ def calculate_correlations(x, y, method='pearson'):
"""
Calculates a Spearman (nonparametric) or a Pearson (parametric) correlation coefficient and p-value to test for non-correlation.
:param ndarray x: array 1
:param ndarray y: array 2
:param numpy.ndarray x: array 1
:param numpy.ndarray y: array 2
:param str method: chooses which kind of correlation method to run
:return: Tuple with two floats, correlation coefficient and two-tailed p-value.
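The docstring above describes a thin dispatch over SciPy's correlation tests; a minimal sketch of that behaviour (an illustration under that assumption, not the package's verbatim code):

```python
from scipy import stats


def calculate_correlations(x, y, method="pearson"):
    """Correlation coefficient and two-tailed p-value for two arrays."""
    if method == "pearson":
        # Parametric: assumes roughly normally distributed data.
        coefficient, pvalue = stats.pearsonr(x, y)
    elif method == "spearman":
        # Nonparametric: rank-based, robust to outliers.
        coefficient, pvalue = stats.spearmanr(x, y)
    else:
        raise ValueError(f"unsupported method: {method}")
    return coefficient, pvalue
```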
@@ -37,8 +37,8 @@ def run_correlation(df, alpha=0.05, subject='subject', group='group', method='pe
:param str subject: name of column containing subject identifiers.
:param str group: name of column containing group identifiers.
:param str method: method to use for correlation calculation ('pearson', 'spearman').
:param floar alpha: error rate. Values velow alpha are considered significant.
:param string correction: type of correction see apply_pvalue_correction for methods
:param float alpha: error rate. Values below alpha are considered significant.
:param str correction: type of correction; see apply_pvalue_correction for methods
:return: Pandas dataframe with columns: 'node1', 'node2', 'weight', 'padj' and 'rejected'.
Example::
@@ -83,7 +83,7 @@ def run_multi_correlation(df_dict, alpha=0.05, subject='subject', on=['subject',
:param list on: column names to join dataframes on (must be found in all dataframes).
:param str method: method to use for correlation calculation ('pearson', 'spearman').
:param float alpha: error rate. Values below alpha are considered significant.
:param string correction: type of correction see apply_pvalue_correction for methods
:param str correction: type of correction; see apply_pvalue_correction for methods
:return: Pandas dataframe with columns: 'node1', 'node2', 'weight', 'padj' and 'rejected'.
Example::
@@ -131,7 +131,7 @@ def run_rm_correlation(df, alpha=0.05, subject='subject', correction='fdr_bh'):
:param df: pandas dataframe with samples as rows and features as columns.
:param str subject: name of column containing subject identifiers.
:param float alpha: error rate. Values below alpha are considered significant.
:param string correction: type of correction type see apply_pvalue_correction for methods
:param str correction: type of correction; see apply_pvalue_correction for methods
:return: Pandas dataframe with columns: 'node1', 'node2', 'weight', 'pvalue', 'dof', 'padj' and 'rejected'.
Example::
6 changes: 3 additions & 3 deletions acore/differential_regulation.py
@@ -438,7 +438,7 @@ def run_repeated_measurements_anova(
:param df: pandas dataframe with samples as rows and protein identifiers as columns (with additional columns 'group', 'sample' and 'subject').
:param str subject: column with subject identifiers
:param srt within: column with within factor identifiers
:param str within: column with within factor identifiers
:param list drop_cols: column labels to be dropped from the dataframe
:param float alpha: error rate for multiple hypothesis correction
:param int permutations: number of permutations used to estimate false discovery rates
@@ -488,8 +488,8 @@ def run_mixed_anova(
:param df: pandas dataframe with samples as rows and protein identifiers as columns (with additional columns 'group', 'sample' and 'subject').
:param str subject: column with subject identifiers
:param srt within: column with within factor identifiers
:param srt between: column with between factor identifiers
:param str within: column with within factor identifiers
:param str between: column with between factor identifiers
:param list drop_cols: column labels to be dropped from the dataframe
:param float alpha: error rate for multiple hypothesis correction
:param int permutations: number of permutations used to estimate false discovery rates
6 changes: 3 additions & 3 deletions acore/exploratory_analysis.py
@@ -12,9 +12,9 @@ def calculate_coefficient_variation(values):
deviation to the mean, in percentage. For more information
visit https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.variation.html.
:param ndarray values: numpy array
:param numpy.ndarray values: numpy array
:return: The calculated variation along rows.
:rtype: ndarray
:rtype: numpy.ndarray
Example::
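The diff truncates the docstring's example. As a rough sketch, the computation described above reduces to the following (the helper name is illustrative; population statistics are assumed, as `scipy.stats.variation` uses by default):

```python
import numpy as np


def coefficient_variation_percent(values):
    # Ratio of the standard deviation to the mean, in percent,
    # computed along axis 0 like scipy.stats.variation.
    values = np.asarray(values, dtype=float)
    return np.std(values, axis=0) / np.mean(values, axis=0) * 100
```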
@@ -283,7 +283,7 @@ def run_umap(data, drop_cols=['sample', 'subject'], group='group', annotation_co
annotations = data[annotation_cols]

if X.size:
X = umap.UMAP(n_neighbors=10, min_dist=0.3, metric=metric).fit_transform(X)
X = umap.UMAP(n_neighbors=n_neighbors, min_dist=min_dist, metric=metric).fit_transform(X)
args = {"x_title": "C1", "y_title": "C2"}
resultDf = pd.DataFrame(X, index=y)
resultDf = resultDf.reset_index()
2 changes: 1 addition & 1 deletion acore/imputation_analysis.py
@@ -12,7 +12,7 @@ def imputation_KNN(data, drop_cols=['group', 'sample', 'subject'], group='group'
:param str group: column label containing group identifiers.
:param list drop_cols: column labels to be dropped. Final dataframe should only have gene/protein/etc identifiers as columns.
:param float cutoff: minimum ratio of missing/valid values required to impute in each column.
:param boolean alone: if True removes all columns with any missing values.
:param bool alone: if True removes all columns with any missing values.
:return: Pandas dataframe with samples as rows and protein identifiers as columns.
Example::
8 changes: 4 additions & 4 deletions acore/multiple_testing.py
@@ -10,7 +10,7 @@ def apply_pvalue_correction(pvalues, alpha=0.05, method='bonferroni'):
"""
Performs p-value correction using the specified method. For more information visit https://www.statsmodels.org/dev/generated/statsmodels.stats.multitest.multipletests.html.
:param ndarray pvalues: et of p-values of the individual tests.
:param numpy.ndarray pvalues: set of p-values of the individual tests.
:param float alpha: error rate.
:param str method: method of p-value correction:
- bonferroni : one-step correction
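For the one-step Bonferroni option named above, a minimal numpy sketch of what the correction does (the real function delegates to `statsmodels.stats.multitest.multipletests`; this helper is illustrative):

```python
import numpy as np


def bonferroni_correction(pvalues, alpha=0.05):
    # One-step correction: scale every p-value by the number of tests,
    # cap at 1.0, and reject H0 where the adjusted value stays below alpha.
    pvals = np.asarray(pvalues, dtype=float)
    padj = np.minimum(pvals * pvals.size, 1.0)
    return padj < alpha, padj
```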
@@ -42,7 +42,7 @@ def apply_pvalue_fdrcorrection(pvalues, alpha=0.05, method='indep'):
"""
Performs p-value correction for false discovery rate. For more information visit https://www.statsmodels.org/devel/generated/statsmodels.stats.multitest.fdrcorrection.html.
:param ndarray pvalues: et of p-values of the individual tests.
:param numpy.ndarray pvalues: set of p-values of the individual tests.
:param float alpha: error rate.
:param str method: method of p-value correction ('indep', 'negcorr').
:return: Tuple with two arrays, boolean for rejecting H0 hypothesis and float for adjusted p-value.
@@ -60,7 +60,7 @@ def apply_pvalue_twostage_fdrcorrection(pvalues, alpha=0.05, method='bh'):
"""
Iterated two stage linear step-up procedure with estimation of number of true hypotheses. For more information visit https://www.statsmodels.org/dev/generated/statsmodels.stats.multitest.fdrcorrection_twostage.html.
:param ndarray pvalues: et of p-values of the individual tests.
:param numpy.ndarray pvalues: set of p-values of the individual tests.
:param float alpha: error rate.
:param str method: method of p-value correction ('bky', 'bh').
:return: Tuple with two arrays, boolean for rejecting H0 hypothesis and float for adjusted p-value.
@@ -139,7 +139,7 @@ def get_counts_permutation_fdr(value, random, observed, n, alpha):
Calculates local FDR values (q-values) by computing the fraction of accepted hits from the permuted data over accepted hits from the measured data normalized by the total number of permutations.
:param float value: computed p-value on measured data for a feature.
:param ndarray random: p-values computed on the permuted data.
:param numpy.ndarray random: p-values computed on the permuted data.
:param observed: pandas Series with p-values calculated on the originally measured data.
:param int n: number of permutations to be applied.
:param float alpha: error rate. Values below alpha are considered significant.
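One reading of that description, as a numpy sketch (the helper name and the exact normalization are assumptions here; see the library source for the authoritative version):

```python
import numpy as np


def permutation_fdr(value, random, observed, n, alpha):
    # Accepted hits among the permuted p-values, averaged over n
    # permutations, divided by accepted hits among the observed p-values.
    permuted_hits = (np.asarray(random) <= value).sum() / n
    observed_hits = max((np.asarray(observed) <= value).sum(), 1)
    qvalue = permuted_hits / observed_hits
    return qvalue, qvalue <= alpha
```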
2 changes: 1 addition & 1 deletion acore/normalization_analysis.py
@@ -56,7 +56,7 @@ def normalize_data(data, method='median', normalize=None):
This function normalizes the data using the selected method
:param data: DataFrame with the data to be normalized (samples x features)
:param string method: normalization method to choose among: median_polish, median,
:param str method: normalization method to choose among: median_polish, median,
quantile, linear
:param str normalize: whether the normalization should be done by 'features' (columns) or 'samples' (rows) (default None)
:return: Pandas dataframe.
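As an illustration of the `median` option named above — a sketch assuming per-sample normalization (`normalize='samples'`); the helper name is not the package's:

```python
import pandas as pd


def normalize_median_by_sample(data):
    # Subtract each sample's (row's) median so that all samples
    # share a common centre of zero.
    return data.sub(data.median(axis=1), axis=0)
```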
4 changes: 4 additions & 0 deletions docs/.gitignore
@@ -0,0 +1,4 @@
_build
jupyter_execute
reference

31 changes: 31 additions & 0 deletions docs/README.md
@@ -0,0 +1,31 @@
# Docs creation

To build the docs you need to:

1. install sphinx and additional support packages
2. build the package reference files
3. run sphinx to create a local html version

The documentation is built automatically on Read the Docs.

Install the docs dependencies of the package (as specified in the TOML configuration):

```bash
# in main folder
pip install .[docs]
```

## Build docs using Sphinx command line tools

Commands to be run from `path/to/docs`, i.e. from within the `docs` folder:

Options:
- `--separate` to build separate pages for each (sub-)module

```bash
# pwd: docs
# apidoc
sphinx-apidoc --force --implicit-namespaces --module-first -o reference ../acore
# build docs
sphinx-build -n -W --keep-going -b html ./ ./_build/
```
61 changes: 61 additions & 0 deletions docs/api_examples/exploratory_analysis.py
@@ -0,0 +1,61 @@
# %% [markdown]
# # Exploratory Analysis

# %%
import pandas as pd
import acore.exploratory_analysis as ea

data = pd.DataFrame(
{
"group": ["A", "A", "B", "B"],
"protein1": [1.4, 2.2, 5.3, 4.2],
"protein2": [5.6, 0.3, 2.1, 8.1],
"protein3": [9.1, 10.01, 11.2, 12.9],
}
)

# %% [markdown]
# Show first two principal components of the data.

# %%
result_dfs, annotation = ea.run_pca(
data, drop_cols=[], annotation_cols=[], group="group", components=2, dropna=True
)

# %% [markdown]
# Show what was computed:

# %%
result_dfs[0]

# %%
result_dfs[1]

# %%
result_dfs[2]

# %%
annotation

# %% [markdown]
# Visualize UMAP low-dimensional embedding of the data.

# %%
result, annotation = ea.run_umap(
data,
drop_cols=["sample", "subject"],
group="group",
n_neighbors=10,
min_dist=0.3,
metric="cosine",
dropna=True,
)

# %%
result['umap']

# %%
annotation

# %% [markdown]
# Make sure to check the parameter annotations in the API docs.