🎨 more explanations, less

Multiomics-Analytics-Group · Feb 18, 2025 · 7ea8191
1 parent 469a2c4
commit 7ea8191

Showing 2 changed files with 48 additions and 32 deletions.
diff --git a/docs/api_examples/enrichment_analysis.ipynb b/docs/api_examples/enrichment_analysis.ipynb
@@ -8,23 +8,24 @@
    },
    "source": [
     "# Enrichment analysis\n",
+    "requires\n",
+    "- some cluster of proteins/genes (e.g. up- and downregulated proteins/genes)\n",
+    "- functional annotations, i.e. a category summarizing a set of proteins/genes.\n",
     "\n",
-    "- we need some groups of genes to compute clusters\n",
-    "- we need functional annotations, i.e. a category summarizing a set of genes.\n",
-    "-\n",
     "You can start with watching Lars Juhl Jensen's brief introduction to enrichment analysis\n",
     "on [youtube](https://www.youtube.com/watch?v=2NC1QOXmc5o).\n",
     "\n",
-    "Use example data for ovarian cancer\n",
-    "([PXD010372](https://github.com/Multiomics-Analytics-Group/acore/tree/main/example_data/PXD010372))"
+    "Here we use example data from an ovarian cancer dataset:\n",
+    "[PXD010372](https://github.com/Multiomics-Analytics-Group/acore/tree/main/example_data/PXD010372)\n",
+    "\n",
+    "First make sure you have the required packages installed:"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "id": "956ed7b7",
    "metadata": {
-    "lines_to_next_cell": 2,
     "tags": [
      "hide-output"
     ]
@@ -88,7 +89,8 @@
    "id": "10ed1830",
    "metadata": {},
    "source": [
-    "# Load processed data"
+    "# Load processed data\n",
+    "from our repository. See details on obtaining the data under the example data section."
    ]
   },
   {
@@ -125,7 +127,7 @@
    "metadata": {},
    "source": [
     "Keep only features with a certain amount of non-NaN values and select 100 of these\n",
-    "for illustration. Add the ones which were differently regulated in the ANOVA using all\n",
+    "for illustration. Always add four that were differentially regulated in the ANOVA using all\n",
     "the protein groups."
    ]
   },
@@ -136,16 +138,14 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "idx_always_included = [\"Q5HYN5\", \"P39059\", \"O43432\", \"O43175\"]\n",
-    "df_omics[idx_always_included]"
+    "idx_always_included = [\"Q5HYN5\", \"P39059\", \"O43432\", \"O43175\"]"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "id": "1145a2cd",
    "metadata": {
-    "lines_to_next_cell": 2,
     "tags": [
      "hide-input"
     ]
@@ -167,6 +167,15 @@
     "df_omics"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "ff72465c",
+   "metadata": {},
+   "source": [
+    "And we have the following patient metadata, from which we will use the `Status` column as\n",
+    "our dependent variable and the `PlatinumValue` as a covariate."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -184,7 +193,7 @@
    "id": "4bbf5dc4",
    "metadata": {},
    "source": [
-    "# Compute up and downregulated genes\n",
+    "# ANCOVA: Compute up and downregulated genes\n",
     "These will be used to find enrichments in the set of both up and downregulated genes."
    ]
   },
@@ -222,7 +231,8 @@
    "id": "d6c0a225",
    "metadata": {},
    "source": [
-    "# Find functional annotations, here pathways\n"
+    "# Download functional annotations, here pathways, for the protein groups\n",
+    "in our selection of the dataset."
    ]
   },
   {
@@ -276,7 +286,8 @@
    "id": "d4734452",
    "metadata": {},
    "source": [
-    "See how many protein groups are associated with each annotation."
+    "See how many protein groups are associated with each annotation. We observe that most\n",
+    "functional annotations are associated with only a single protein group in our dataset."
    ]
   },
   {
@@ -391,8 +402,8 @@
    "id": "e51bd7e3",
    "metadata": {},
    "source": [
-    "And even more if we do not restrict the analysis to functional annotation to at least\n",
-    "finding two proteins in a functional set."
+    "And even more if we do not require finding at least two proteins of a functional\n",
+    "set in our dataset (i.e. we only need to find one match from the set)."
    ]
   },
   {
@@ -504,7 +515,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "enrichtments[\"NES\"].plot.hist()"
+    "ax = enrichtments[\"NES\"].plot.hist()"
    ]
   },
   {

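Note on the notebook text changed above: the two requirements it lists (a cluster of proteins/genes and functional annotations) and the restriction to "at least two proteins" of a functional set can be illustrated with a toy over-representation test. This is a minimal sketch under stated assumptions, not acore's implementation. The function names `hypergeom_sf` and `over_representation`, the `min_detected` parameter, and all protein and pathway labels are hypothetical, and the one-sided hypergeometric test is a common choice used here only for illustration.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) when drawing N items from M, of which n are 'successes'."""
    upper = min(n, N)
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1)) / comb(M, N)


def over_representation(foreground, background, annotations, min_detected=2):
    """Return {term: p-value} for terms with at least `min_detected` foreground hits.

    foreground:  the cluster of interest (e.g. regulated proteins)
    background:  all proteins that were measured
    annotations: mapping of term -> proteins belonging to that term
    """
    background = set(background)
    foreground = set(foreground) & background
    results = {}
    for term, members in annotations.items():
        members = set(members) & background  # only proteins present in the data
        hits = members & foreground
        if len(hits) < min_detected:
            continue  # too few matches from this functional set
        results[term] = hypergeom_sf(
            len(hits), len(background), len(members), len(foreground)
        )
    return results


# Toy data (hypothetical labels, not taken from PXD010372)
background = [f"P{i}" for i in range(10)]
foreground = ["P0", "P1", "P2"]
annotations = {
    "pathway_A": ["P0", "P1", "P8"],
    "pathway_B": ["P2"],
    "pathway_C": ["P5", "P6"],
}

strict = over_representation(foreground, background, annotations, min_detected=2)
lenient = over_representation(foreground, background, annotations, min_detected=1)
print(sorted(strict), sorted(lenient))
```

Lowering `min_detected` to 1 lets annotations with a single match through, which is why more terms are reported once the restriction is dropped.
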
diff --git a/docs/api_examples/enrichment_analysis.py b/docs/api_examples/enrichment_analysis.py
@@ -1,20 +1,21 @@
 # %% [markdown]
 # # Enrichment analysis
+# requires
+# - some cluster of proteins/genes (e.g. up- and downregulated proteins/genes)
+# - functional annotations, i.e. a category summarizing a set of proteins/genes.
 #
-# - we need some groups of genes to compute clusters
-# - we need functional annotations, i.e. a category summarizing a set of genes.
-# -
 # You can start with watching Lars Juhl Jensen's brief introduction to enrichment analysis
 # on [youtube](https://www.youtube.com/watch?v=2NC1QOXmc5o).
 #
-# Use example data for ovarian cancer
-# ([PXD010372](https://github.com/Multiomics-Analytics-Group/acore/tree/main/example_data/PXD010372))
+# Here we use example data from an ovarian cancer dataset:
+# [PXD010372](https://github.com/Multiomics-Analytics-Group/acore/tree/main/example_data/PXD010372)
+#
+# First make sure you have the required packages installed:
 
 
 # %% tags=["hide-output"]
 # %pip install acore vuecore 'plotly<6'
 
-
 # %%
 from pathlib import Path
 
@@ -43,6 +44,7 @@
 
 # %% [markdown]
 # # Load processed data
+# from our repository. See details on obtaining the data under the example data section.
 
 # %%
 df_omics = pd.read_csv(omics, index_col=0)
@@ -60,12 +62,11 @@
 
 # %% [markdown]
 # Keep only features with a certain amount of non-NaN values and select 100 of these
-# for illustration. Add the ones which were differently regulated in the ANOVA using all
+# for illustration. Always add four that were differentially regulated in the ANOVA using all
 # the protein groups.
 
 # %%
 idx_always_included = ["Q5HYN5", "P39059", "O43432", "O43175"]
-df_omics[idx_always_included]
 
 # %% tags=["hide-input"]
 df_omics = (
@@ -82,13 +83,16 @@
 )
 df_omics
 
+# %% [markdown]
+# And we have the following patient metadata, from which we will use the `Status` column as
+# our dependent variable and the `PlatinumValue` as a covariate.
 
 # %%
 df_meta
 
 
 # %% [markdown]
-# # Compute up and downregulated genes
+# # ANCOVA: Compute up and downregulated genes
 # These will be used to find enrichments in the set of both up and downregulated genes.
 
 # %%
@@ -107,8 +111,8 @@
 diff_reg.query("rejected")
 
 # %% [markdown]
-# # Find functional annotations, here pathways
-#
+# # Download functional annotations, here pathways, for the protein groups
+# in our selection of the dataset.
 
 # %%
 fname_annotations = f"downloaded/annotations_{features_to_sample}.csv"
@@ -150,7 +154,8 @@
 annotations
 
 # %% [markdown]
-# See how many protein groups are associated with each annotation.
+# See how many protein groups are associated with each annotation. We observe that most
+# functional annotations are associated with only a single protein group in our dataset.
 
 # %% tags=["hide-input"]
 _ = (
@@ -209,8 +214,8 @@
 ret
 
 # %% [markdown]
-# And even more if we do not restrict the analysis to functional annotation to at least
-# finding two proteins in a functional set.
+# And even more if we do not require finding at least two proteins of a functional
+# set in our dataset (i.e. we only need to find one match from the set).
 
 # %%
 ret = acore.enrichment_analysis.run_up_down_regulation_enrichment(
@@ -265,7 +270,7 @@
 enrichtments.iloc[0].to_dict()
 
 # %%
-enrichtments["NES"].plot.hist()
+ax = enrichtments["NES"].plot.hist()
 
 # %% [markdown]
 # The normalised enrichment score (NES) can be used in a PCA plot to see if the samples