🎨 more explanations, less
enryH committed Feb 18, 2025
1 parent 469a2c4 commit 7ea8191
Showing 2 changed files with 48 additions and 32 deletions.
45 changes: 28 additions & 17 deletions docs/api_examples/enrichment_analysis.ipynb
@@ -8,23 +8,24 @@
},
"source": [
"# Enrichment analysis\n",
"requires\n",
"- some cluster of proteins/genes (e.g. up- and downregulated proteins/genes)\n",
"- functional annotations, i.e. a category summarizing a set of proteins/genes.\n",
"\n",
"- we need some groups of genes to compute clusters\n",
"- we need functional annotations, i.e. a category summarizing a set of genes.\n",
"-\n",
"You can start with watching Lars Juhl Jensen's brief introduction to enrichment analysis\n",
"on [youtube](https://www.youtube.com/watch?v=2NC1QOXmc5o).\n",
"\n",
"Use example data for ovarian cancer\n",
"([PXD010372](https://github.com/Multiomics-Analytics-Group/acore/tree/main/example_data/PXD010372))"
"Here we use as example data from an ovarian cancer dataset:\n",
"[PXD010372](https://github.com/Multiomics-Analytics-Group/acore/tree/main/example_data/PXD010372)\n",
"\n",
"First make sure you have the required packages installed:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "956ed7b7",
"metadata": {
"lines_to_next_cell": 2,
"tags": [
"hide-output"
]
@@ -88,7 +89,8 @@
"id": "10ed1830",
"metadata": {},
"source": [
"# Load processed data"
"# Load processed data\n",
"from our repository. See details on obtaining the data under the example data section."
]
},
{
@@ -125,7 +127,7 @@
"metadata": {},
"source": [
"Keep only features with a certain amount of non-NaN values and select 100 of these\n",
"for illustration. Add the ones which were differently regulated in the ANOVA using all\n",
"for illustration. Add always four which were differently regulated in the ANOVA using all\n",
"the protein groups."
]
},
@@ -136,16 +138,14 @@
"metadata": {},
"outputs": [],
"source": [
"idx_always_included = [\"Q5HYN5\", \"P39059\", \"O43432\", \"O43175\"]\n",
"df_omics[idx_always_included]"
"idx_always_included = [\"Q5HYN5\", \"P39059\", \"O43432\", \"O43175\"]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1145a2cd",
"metadata": {
"lines_to_next_cell": 2,
"tags": [
"hide-input"
]
@@ -167,6 +167,15 @@
"df_omics"
]
},
{
"cell_type": "markdown",
"id": "ff72465c",
"metadata": {},
"source": [
"And we have the following patient metadata, from which we will use the `Status` column as\n",
"our dependent variable and the `PlatinumValue` as a covariate."
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -184,7 +193,7 @@
"id": "4bbf5dc4",
"metadata": {},
"source": [
"# Compute up and downregulated genes\n",
"# ANCOVA: Compute up and downregulated genes\n",
"These will be used to find enrichments in the set of both up and downregulated genes."
]
},
@@ -222,7 +231,8 @@
"id": "d6c0a225",
"metadata": {},
"source": [
"# Find functional annotations, here pathways\n"
"# Download functional annotations, here pathways, for the protein groups\n",
"in our selection of the dataset."
]
},
{
@@ -276,7 +286,8 @@
"id": "d4734452",
"metadata": {},
"source": [
"See how many protein groups are associated with each annotation."
"See how many protein groups are associated with each annotation. We observe that most\n",
"functional annotations are associated only to a single protein group in our dataset."
]
},
{
@@ -391,8 +402,8 @@
"id": "e51bd7e3",
"metadata": {},
"source": [
"And even more if we do not restrict the analysis to functional annotation to at least\n",
"finding two proteins in a functional set."
"And even more if we do not restrict the analysis of finding at least two proteins\n",
"of a functional set in our data set (i.e. we only need to find one match from the set)."
]
},
{
@@ -504,7 +515,7 @@
"metadata": {},
"outputs": [],
"source": [
"enrichtments[\"NES\"].plot.hist()"
"ax = enrichtments[\"NES\"].plot.hist()"
]
},
{
35 changes: 20 additions & 15 deletions docs/api_examples/enrichment_analysis.py
@@ -1,20 +1,21 @@
# %% [markdown]
# # Enrichment analysis
# requires
# - some cluster of proteins/genes (e.g. up- and downregulated proteins/genes)
# - functional annotations, i.e. a category summarizing a set of proteins/genes.
#
# - we need some groups of genes to compute clusters
# - we need functional annotations, i.e. a category summarizing a set of genes.
# -
# You can start with watching Lars Juhl Jensen's brief introduction to enrichment analysis
# on [youtube](https://www.youtube.com/watch?v=2NC1QOXmc5o).
#
# Use example data for ovarian cancer
# ([PXD010372](https://github.com/Multiomics-Analytics-Group/acore/tree/main/example_data/PXD010372))
# Here we use as example data from an ovarian cancer dataset:
# [PXD010372](https://github.com/Multiomics-Analytics-Group/acore/tree/main/example_data/PXD010372)
#
# First make sure you have the required packages installed:


# %% tags=["hide-output"]
# %pip install acore vuecore 'plotly<6'


# %%
from pathlib import Path

@@ -43,6 +44,7 @@

# %% [markdown]
# # Load processed data
# from our repository. See details on obtaining the data under the example data section.

# %%
df_omics = pd.read_csv(omics, index_col=0)
@@ -60,12 +62,11 @@

# %% [markdown]
# Keep only features with a certain amount of non-NaN values and select 100 of these
# for illustration. Add the ones which were differently regulated in the ANOVA using all
# for illustration. Add always four which were differently regulated in the ANOVA using all
# the protein groups.

# %%
idx_always_included = ["Q5HYN5", "P39059", "O43432", "O43175"]
df_omics[idx_always_included]

# %% tags=["hide-input"]
df_omics = (
@@ -82,13 +83,16 @@
)
df_omics

# %% [markdown]
# And we have the following patient metadata, from which we will use the `Status` column as
# our dependent variable and the `PlatinumValue` as a covariate.

# %%
df_meta


# %% [markdown]
# # Compute up and downregulated genes
# # ANCOVA: Compute up and downregulated genes
# These will be used to find enrichments in the set of both up and downregulated genes.

# %%
@@ -107,8 +111,8 @@
diff_reg.query("rejected")

# %% [markdown]
# # Find functional annotations, here pathways
#
# # Download functional annotations, here pathways, for the protein groups
# in our selection of the dataset.

# %%
fname_annotations = f"downloaded/annotations_{features_to_sample}.csv"
@@ -150,7 +154,8 @@
annotations

# %% [markdown]
# See how many protein groups are associated with each annotation.
# See how many protein groups are associated with each annotation. We observe that most
# functional annotations are associated only to a single protein group in our dataset.

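The observation above (most annotations map to a single protein group) can be checked by counting. The following is a minimal sketch using only the standard library; the annotation names and pairings are made-up illustration data, not values from PXD010372, and the notebook's own (collapsed) cell may compute this differently.

```python
from collections import Counter

# Hypothetical (annotation, protein group) pairs for illustration only.
annotations = [
    ("Pathway A", "P39059"),
    ("Pathway A", "O43175"),
    ("Pathway B", "Q5HYN5"),
    ("Pathway C", "O43432"),
]

# Number of distinct protein groups per annotation
# (set() drops accidental duplicate pairs first).
groups_per_annotation = Counter(
    annotation for annotation, _ in set(annotations)
)

# How many annotations are backed by 1, 2, ... protein groups.
size_distribution = Counter(groups_per_annotation.values())
print(dict(size_distribution))
```

If most of the mass sits at size 1, a filter requiring at least two detected proteins per set will exclude most annotations from testing.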
# %% tags=["hide-input"]
_ = (
@@ -209,8 +214,8 @@
ret

# %% [markdown]
# And even more if we do not restrict the analysis to functional annotation to at least
# finding two proteins in a functional set.
# And even more if we do not restrict the analysis of finding at least two proteins
# of a functional set in our data set (i.e. we only need to find one match from the set).

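To illustrate why relaxing the "at least two proteins" restriction yields more tested sets, here is a rough sketch of an over-representation test with such a filter. It is not acore's implementation: the function names, the `min_detected_in_set` parameter name, the choice of a one-sided hypergeometric test, and the toy protein identifiers are all assumptions made for this example.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) when drawing N items from M, of which n are in the set."""
    upper = min(n, N)
    total = comb(M, N)
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1)) / total


def enrich(foreground, background, annotation_sets, min_detected_in_set=2):
    """Return {annotation: p-value} for sets with enough detected members."""
    universe = foreground | background
    results = {}
    for name, members in annotation_sets.items():
        detected = members & universe  # set members found in the data
        if len(detected) < min_detected_in_set:
            continue  # too few detected members: set is not tested
        hits = detected & foreground
        results[name] = hypergeom_sf(
            len(hits), len(universe), len(detected), len(foreground)
        )
    return results


# Toy data (hypothetical identifiers).
universe = {f"p{i}" for i in range(10)}
foreground = {"p0", "p1", "p2"}  # e.g. the regulated proteins
background = universe - foreground
sets = {"A": {"p0", "p1"}, "B": {"p2"}, "C": {"p3", "p4", "p5"}}

strict = enrich(foreground, background, sets, min_detected_in_set=2)
relaxed = enrich(foreground, background, sets, min_detected_in_set=1)
print(sorted(strict), sorted(relaxed))
```

With the relaxed threshold the single-member set `B` is tested as well, which mirrors the "even more" results mentioned above, at the cost of testing sets supported by a single protein.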
# %%
ret = acore.enrichment_analysis.run_up_down_regulation_enrichment(
@@ -265,7 +270,7 @@
enrichtments.iloc[0].to_dict()

# %%
enrichtments["NES"].plot.hist()
ax = enrichtments["NES"].plot.hist()

# %% [markdown]
# The normalised enrichment score (NES) can be used in a PCA plot to see if the samples