:constructions: add enrichtment example to docs
enryH committed Nov 25, 2024
1 parent 3961d84 commit 7f51e02
Showing 2 changed files with 332 additions and 0 deletions.
331 changes: 331 additions & 0 deletions docs/api_examples/enrichment_analysis.ipynb
@@ -0,0 +1,331 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "f79a8051",
"metadata": {
"lines_to_next_cell": 2
},
"source": [
"# Enrichment analysis\n",
"\n",
"- We need groups of genes, e.g. clusters or sets of differentially regulated genes.\n",
"- We need functional annotations, i.e. categories that each summarize a set of genes.\n",
"\n",
"You can start by watching Lars Juhl Jensen's brief introduction to enrichment analysis\n",
"on [YouTube](https://www.youtube.com/watch?v=2NC1QOXmc5o).\n",
"\n",
"This notebook uses example data for ovarian cancer\n",
"([PXD010372](https://github.com/Multiomics-Analytics-Group/acore/tree/main/example_data/PXD010372))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "956ed7b7",
"metadata": {
"lines_to_next_cell": 2,
"tags": [
"hide-output"
]
},
"outputs": [],
"source": [
"%pip install acore"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a3030d08",
"metadata": {},
"outputs": [],
"source": [
"from pathlib import Path\n",
"\n",
"import pandas as pd\n",
"\n",
"import acore\n",
"import acore.differential_regulation\n",
"import acore.enrichment_analysis"
]
},
{
"cell_type": "markdown",
"id": "fddd607c",
"metadata": {},
"source": [
"Parameters of this notebook"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6af9349a",
"metadata": {
"tags": [
"parameters"
]
},
"outputs": [],
"source": [
"base_path: str = (\n",
"    \"https://raw.githubusercontent.com/Multiomics-Analytics-Group/acore/refs/heads/main/\"\n",
"    \"example_data/PXD010372/processed\"\n",
")\n",
"omics: str = f\"{base_path}/omics.csv\"\n",
"meta_pgs: str = f\"{base_path}/meta_pgs.csv\"\n",
"meta: str = f\"{base_path}/meta_patients.csv\"\n",
"N_to_sample: int = 1_000"
]
},
{
"cell_type": "markdown",
"id": "10ed1830",
"metadata": {},
"source": [
"## Load processed data"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8d70ef4c",
"metadata": {},
"outputs": [],
"source": [
"df_omics = pd.read_csv(omics, index_col=0)\n",
"df_meta_pgs = pd.read_csv(meta_pgs, index_col=0)\n",
"df_meta = pd.read_csv(meta, index_col=0)\n",
"df_omics"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b3897e1f",
"metadata": {},
"outputs": [],
"source": [
"df_omics.notna().sum().sort_values(ascending=True).plot()"
]
},
{
"cell_type": "markdown",
"id": "8ce47108",
"metadata": {},
"source": [
"Keep only features with at least 18 non-NaN values and sample from these so that\n",
"`N_to_sample` features remain in total (1,000 by default). The sample always includes the\n",
"four protein groups that were differentially regulated in the ANOVA using all\n",
"the protein groups."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f3a8ab49",
"metadata": {},
"outputs": [],
"source": [
"idx_always_included = [\"Q5HYN5\", \"P39059\", \"O43432\", \"O43175\"]\n",
"df_omics[idx_always_included]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1145a2cd",
"metadata": {
"lines_to_next_cell": 2
},
"outputs": [],
"source": [
"df_omics = (\n",
"    df_omics\n",
"    # .dropna(axis=1)\n",
"    .drop(idx_always_included, axis=1)\n",
"    .dropna(thresh=18, axis=1)\n",
"    .sample(\n",
"        N_to_sample - len(idx_always_included),\n",
"        axis=1,\n",
"        random_state=42,\n",
"    )\n",
"    .join(df_omics[idx_always_included])\n",
")\n",
"df_omics"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "aea77e80",
"metadata": {
"lines_to_next_cell": 2
},
"outputs": [],
"source": [
"df_meta"
]
},
{
"cell_type": "markdown",
"id": "4bbf5dc4",
"metadata": {},
"source": [
"## Compute up and downregulated genes\n",
"These will be used to find enrichments in the set of both up and downregulated genes."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "231bb6da",
"metadata": {},
"outputs": [],
"source": [
"group = \"Status\"\n",
"covariates = [\"PlatinumValue\"]\n",
"diff_reg = acore.differential_regulation.run_anova(\n",
"    df_omics.join(df_meta[[group]]),\n",
"    drop_cols=[],\n",
"    subject=None,\n",
"    group=group,\n",
")\n",
"diff_reg.describe(exclude=[\"float\"])"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1e347b06",
"metadata": {},
"outputs": [],
"source": [
"diff_reg.query(\"rejected == True\")"
]
},
{
"cell_type": "markdown",
"id": "d6c0a225",
"metadata": {},
"source": [
"## Find functional annotations, here GO terms\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d2668415",
"metadata": {},
"outputs": [],
"source": [
"from acore.io.uniprot import (\n",
"    check_id_mapping_results_ready,\n",
"    get_id_mapping_results_link,\n",
"    get_id_mapping_results_search,\n",
"    submit_id_mapping,\n",
")\n",
"\n",
"\n",
"def fetch_annotations(\n",
"    ids: pd.Index | list,\n",
"    fields: str = \"accession,go_p,go_c,go_f\",\n",
") -> pd.DataFrame:\n",
"    \"\"\"Fetch annotations for UniProt IDs. Combines several calls to the API of UniProt's\n",
"    knowledgebase (KB).\n",
"\n",
"    Parameters\n",
"    ----------\n",
"    ids : pd.Index | list\n",
"        Iterable of UniProt IDs. Fetches annotations as specified by `fields`.\n",
"    fields : str, optional\n",
"        Fields to fetch, by default \"accession,go_p,go_c,go_f\". Available fields are\n",
"        listed at https://www.uniprot.org/help/return_fields\n",
"\n",
"    Returns\n",
"    -------\n",
"    pd.DataFrame\n",
"        DataFrame with annotations of the UniProt IDs.\n",
"    \"\"\"\n",
"    job_id = submit_id_mapping(from_db=\"UniProtKB_AC-ID\", to_db=\"UniProtKB\", ids=ids)\n",
"\n",
"    if check_id_mapping_results_ready(job_id):\n",
"        link = get_id_mapping_results_link(job_id)\n",
"        # add fields to the link to get more information\n",
"        # From and Entry (accession) are the same for UniProt IDs.\n",
"        results = get_id_mapping_results_search(\n",
"            link + f\"?fields={fields}&format=tsv\"\n",
"        )\n",
"        # the first line is the tab-separated header, the rest are data rows\n",
"        header = results.pop(0).split(\"\\t\")\n",
"        results = [line.split(\"\\t\") for line in results]\n",
"        df = pd.DataFrame(results, columns=header)\n",
"        return df\n",
"\n",
"\n",
"fname_annotations = \"downloaded/annotations.csv\"\n",
"fname = Path(fname_annotations)\n",
"try:\n",
"    annotations = pd.read_csv(fname, index_col=0)\n",
"    print(f\"Loaded annotations from {fname}\")\n",
"except FileNotFoundError:\n",
"    print(f\"Fetching annotations for {df_omics.columns.size} UniProt IDs.\")\n",
"    annotations = fetch_annotations(df_omics.columns)\n",
"    annotations = (\n",
"        annotations.set_index(\"Entry\")\n",
"        .rename_axis(\"identifier\")\n",
"        .drop(\"From\", axis=1)\n",
"        .rename_axis(\"source\", axis=1)\n",
"        .stack()\n",
"        .to_frame(\"annotation\")\n",
"        .replace(\"\", pd.NA)\n",
"        .dropna()\n",
"        .sort_values([\"source\", \"annotation\"])\n",
"        .reset_index()\n",
"    )\n",
"    fname.parent.mkdir(exist_ok=True, parents=True)\n",
"    annotations.to_csv(fname, index=True)\n",
"\n",
"annotations"
]
},
{
"cell_type": "markdown",
"id": "4165bc94",
"metadata": {},
"source": [
"## Enrichment analysis\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f300c5b5",
"metadata": {
"lines_to_next_cell": 2
},
"outputs": [],
"source": [
"ret = acore.enrichment_analysis.run_regulation_enrichment(\n",
"    regulation_data=diff_reg,\n",
"    annotation=annotations,\n",
"    correction_alpha=0.01,\n",
")\n",
"ret"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6dd57b99",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"jupytext": {
"cell_metadata_filter": "tags,-all",
"main_language": "python",
"notebook_metadata_filter": "-all"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
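
Note on the notebook's final step: it calls `acore.enrichment_analysis.run_regulation_enrichment` without showing what an enrichment test computes. As a rough illustration only, and not necessarily what acore does internally, the sketch below computes a one-sided over-representation p-value from the hypergeometric distribution (equivalent to a one-sided Fisher's exact test), using only the Python standard library. The function name `hypergeom_enrichment_pvalue` and the toy identifiers are hypothetical and come neither from the notebook nor from acore.

```python
from math import comb


def hypergeom_enrichment_pvalue(
    n_background: int, n_annotated: int, n_selected: int, n_overlap: int
) -> float:
    """Probability of seeing at least `n_overlap` annotated items when drawing
    `n_selected` items without replacement from `n_background` items, of which
    `n_annotated` carry the annotation."""
    total = comb(n_background, n_selected)
    upper = min(n_annotated, n_selected)
    # sum the upper tail of the hypergeometric distribution
    tail = sum(
        comb(n_annotated, k) * comb(n_background - n_annotated, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy data (hypothetical identifiers): 20 background proteins,
# 5 of them annotated with some term, 4 regulated, 3 in the overlap.
background = {f"P{i:02d}" for i in range(20)}
annotated = {"P00", "P01", "P02", "P03", "P04"}
regulated = {"P00", "P01", "P02", "P10"}

p_value = hypergeom_enrichment_pvalue(
    n_background=len(background),
    n_annotated=len(annotated),
    n_selected=len(regulated),
    n_overlap=len(annotated & regulated),
)
print(f"p-value: {p_value:.4f}")
```

In the toy sets, 3 of the 4 regulated proteins carry the annotation while only 5 of the 20 background proteins do, so the p-value is small. A real analysis runs one such test per annotation term and then corrects the p-values for multiple testing, which is presumably what `correction_alpha` in the notebook controls.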
1 change: 1 addition & 0 deletions docs/index.rst
@@ -16,6 +16,7 @@

api_examples/exploratory_analysis
api_examples/normalization_analysis
api_examples/enrichment_analysis

.. toctree::
:maxdepth: 1