✨ mostly cleaned-up ANCOVA notebook

Multiomics-Analytics-Group · Jan 14, 2025 · 47a650c
1 parent 7909c07
commit 47a650c
Showing 1 changed file with 52 additions and 40 deletions.
diff --git a/docs/api_examples/ANCOVA_analysis.ipynb b/docs/api_examples/ANCOVA_analysis.ipynb
@@ -4,9 +4,14 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# ANCOVA analysis\n",
-    "\n",
-    "- [ ] include a PCA colored by groups as well as covariance factors"
+    "# ANCOVA analysis\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "This notebook shows how to use ANCOVA to compare groups while accounting for one or more covariates, using a proteomics dataset and the corresponding clinical metadata."
    ]
   },
   {
@@ -25,7 +30,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 2,
    "metadata": {
     "tags": [
      "hide-input"
@@ -47,9 +52,17 @@
     ")"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Specify your data here. This example uses an Alzheimer's disease dataset hosted on GitHub. <br>  \n",
+    "We also define freq_cutoff here, which is used for filtering later."
+   ]
+  },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 3,
    "id": "3b15eca2",
    "metadata": {
     "tags": [
@@ -74,8 +87,8 @@
    "id": "9bc64629",
    "metadata": {},
    "source": [
-    "## Load data.\n",
-    "Clinical data:"
+    "## Load data\n",
+    "View your clinical data:"
    ]
   },
   {
@@ -98,7 +111,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Proteomics data:"
+    "View your omics data:"
    ]
   },
   {
@@ -125,7 +138,9 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "If data is already filtered and/or imputed, skip this step."
+    "In this step, the omics data is filtered using the freq_cutoff defined earlier, i.e. the minimum percentage of values that must be present for a feature to be kept. <br>  \n",
+    "Protein group names are also shortened to the first protein of each group, and intensities are log2-transformed. <br>  \n",
+    "If your data is already filtered and/or imputed, you can skip this step."
    ]
   },
   {
@@ -156,19 +171,12 @@
     "omics"
    ]
   },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Consider replacing with the filter from the acore package!"
-   ]
-  },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
     "## Preparing metadata\n",
-    "add both relevant clinical information to the omics data"
+    "Add relevant clinical information to the omics data. In this case, 'AD' is the Alzheimer's Disease status, and gender ('male') and age are covariates of interest."
    ]
   },
   {
@@ -205,7 +213,7 @@
    "metadata": {},
    "source": [
     "### Checking missing data\n",
-    "... between two AD groups (after previous filtering)"
+    "We now compare missing data between the two AD groups (after previous filtering)."
    ]
   },
   {
@@ -229,7 +237,7 @@
    "id": "7b673a65",
    "metadata": {},
    "source": [
-    "Plot number of missing values per group, ordered by proportion of non-misisng values\n",
+    "Plot number of missing values per group, ordered by proportion of non-missing values\n",
     "in non-Alzheimer disease group"
    ]
   },
@@ -251,7 +259,7 @@
    "id": "a6fa14ee",
    "metadata": {},
    "source": [
-    "Plot 20 protein groups with biggest difference in missing values between groups"
+    "We can then plot the 20 protein groups with the biggest difference in missing values between groups."
    ]
   },
   {
@@ -292,7 +300,7 @@
    "metadata": {},
    "source": [
     "## Running ANCOVA analysis\n",
-    "Use combined dataset for ANCOVA analysis."
+    "We use the combined dataset to run the ANCOVA analysis, i.e. the omics data together with the grouping variable AD and the covariates."
    ]
   },
   {
@@ -313,7 +321,7 @@
    "id": "12a07afd",
    "metadata": {},
    "source": [
-    "metadata here is of type integer. All floats are proteomics measurements."
+    "The metadata columns here are of integer type. All float columns are proteomics measurements."
    ]
   },
   {
@@ -350,7 +358,7 @@
    "id": "a0ff095c",
    "metadata": {},
    "source": [
-    "run ANCOVA analysis"
+    "Run ANCOVA analysis."
    ]
   },
   {
@@ -359,19 +367,16 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# omics_and_clinic = omics_and_clinic.astype(float)\n",
-    "# ? this is no needed for run_ancova (the regex where groups are joined)\n",
     "ancova = (\n",
     "    ad.run_ancova(\n",
-    "        omics_and_clinic.astype({\"AD\": str}),  # ! target needs to be of type str\n",
-    "        # subject='Sample ID', # not used\n",
+    "        omics_and_clinic.astype({\"AD\": str}),  # hint: target needs to be of type str\n",
     "        drop_cols=[],\n",
-    "        group=\"AD\",  # needs to be a string\n",
+    "        group=\"AD\",\n",
     "        covariates=covariates,\n",
     "    )\n",
     "    .set_index(\"identifier\")\n",
     "    .sort_values(by=\"posthoc padj\")\n",
-    ")  # need to be floats?\n",
+    ")\n",
     "ancova_acore = ancova\n",
     "ancova"
    ]
@@ -381,17 +386,24 @@
    "id": "b7cee413",
    "metadata": {},
    "source": [
-    "The first columns contain group averages for each group for the specific\n",
-    "protein group\n",
-    "ancova.iloc[:, :6]"
+    "The first columns contain the per-group averages for each protein group:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "ancova.iloc[:, :6]"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "3cc00d86",
    "metadata": {},
    "source": [
-    "The others contain the test results (based on a linear model) for each protein group\n",
+    "The last columns contain the test results (based on a linear model) for each protein group\n",
     "(on each row). Some information is duplicated."
    ]
   },
@@ -436,7 +448,7 @@
    "metadata": {},
    "source": [
     "## ANOVA analysis\n",
-    "not controlling for covariates\n",
+    "ANOVA compares groups in the same way as ANCOVA, but without adjusting for covariates. We run it here to compare the significant hits obtained with and without covariates.\n",
     "> To check: pvalues for proteins with missing mean values? some merging issue?"
    ]
   },
@@ -464,7 +476,7 @@
    "id": "28434ecb",
    "metadata": {},
    "source": [
-    "view averages per protein group"
+    "The output table is similar to that of ANCOVA. In the first columns, averages per protein group are listed."
    ]
   },
   {
@@ -487,7 +499,7 @@
    "id": "3d362360",
    "metadata": {},
    "source": [
-    "Test results"
+    "The last columns contain the test results for each protein group (the identifiers)."
    ]
   },
   {
@@ -512,7 +524,7 @@
    "id": "d02676e9",
    "metadata": {},
    "source": [
-    "Other information"
+    "The remaining columns are related to the test method."
    ]
   },
   {
@@ -535,7 +547,7 @@
    "metadata": {},
    "source": [
     "## Comparing ANOVA and ANCOVA results\n",
-    "Cross tabulated results after FDR correction for both ANOVA and ANCOVA"
+    "Cross-tabulated results after FDR correction for both ANOVA and ANCOVA."
    ]
   },
   {
@@ -565,7 +577,7 @@
  ],
  "metadata": {
   "kernelspec": {
-   "display_name": "acore",
+   "display_name": ".venv",
    "language": "python",
    "name": "python3"
   },
@@ -579,7 +591,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.11.10"
+   "version": "3.9.6"
   }
  },
  "nbformat": 4,