Commit

✨ mostly cleaned-up ANCOVA notebook
FilippaQ committed Jan 14, 2025
1 parent 7909c07 commit 47a650c
Showing 1 changed file with 52 additions and 40 deletions.
92 changes: 52 additions & 40 deletions docs/api_examples/ANCOVA_analysis.ipynb
@@ -4,9 +4,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# ANCOVA analysis\n",
"\n",
"- [ ] include a PCA colored by groups as well as covariance factors"
"# ANCOVA analysis\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook shows how to use ANCOVA to compare groups while adjusting for one or more covariates, using a proteomics dataset and the corresponding clinical metadata."
]
},
{
@@ -25,7 +30,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 2,
"metadata": {
"tags": [
"hide-input"
@@ -47,9 +52,17 @@
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here you can specify your data. In this example, we load an Alzheimer's disease dataset that is available on GitHub. <br> \n",
"We also define `freq_cutoff`, which is used for filtering later."
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 3,
"id": "3b15eca2",
"metadata": {
"tags": [
@@ -74,8 +87,8 @@
"id": "9bc64629",
"metadata": {},
"source": [
"## Load data.\n",
"Clinical data:"
"## Load data\n",
"View your clinical data:"
]
},
{
@@ -98,7 +111,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Proteomics data:"
"View your omics data:"
]
},
{
@@ -125,7 +138,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"If data is already filtered and/or imputed, skip this step."
"In this step, the omics data is filtered using the previously defined freq_cutoff, the minimum percentage of values that must be present for a feature to be kept. <br> \n",
"Protein group names are shortened to the first protein of each group, and intensities are log2-transformed. <br> \n",
"If your data is already filtered and/or imputed, you can skip this step."
]
},
{
@@ -156,19 +171,12 @@
"omics"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Consider replacing with the filter from the acore package!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Preparing metadata\n",
"add both relevant clinical information to the omics data"
"Add relevant clinical information to the omics data. In this case, 'AD' is the Alzheimer's Disease status, and gender ('male') and age are covariates of interest."
]
},
{
@@ -205,7 +213,7 @@
"metadata": {},
"source": [
"### Checking missing data\n",
"... between two AD groups (after previous filtering)"
"We now compare missing data between the two AD groups (after previous filtering)."
]
},
{
@@ -229,7 +237,7 @@
"id": "7b673a65",
"metadata": {},
"source": [
"Plot number of missing values per group, ordered by proportion of non-missing values\n",
"Plot the number of missing values per group, ordered by the proportion of non-missing values\n",
"in the non-Alzheimer's disease group."
]
},
@@ -251,7 +259,7 @@
"id": "a6fa14ee",
"metadata": {},
"source": [
"Plot 20 protein groups with biggest difference in missing values between groups"
"We can then plot the 20 protein groups with the biggest difference in missing values between groups."
]
},
{
@@ -292,7 +300,7 @@
"metadata": {},
"source": [
"## Running ANCOVA analysis\n",
"Use combined dataset for ANCOVA analysis."
"We use the combined dataset to run the ANCOVA, i.e. the omics data joined with the grouping variable AD and the covariates."
]
},
{
@@ -313,7 +321,7 @@
"id": "12a07afd",
"metadata": {},
"source": [
"metadata here is of type integer. All floats are proteomics measurements."
"The metadata columns are of integer type; all float columns are proteomics measurements."
]
},
{
@@ -350,7 +358,7 @@
"id": "a0ff095c",
"metadata": {},
"source": [
"run ANCOVA analysis"
"Run ANCOVA analysis."
]
},
{
@@ -359,19 +367,16 @@
"metadata": {},
"outputs": [],
"source": [
"# omics_and_clinic = omics_and_clinic.astype(float)\n",
"# ? this is no needed for run_ancova (the regex where groups are joined)\n",
"ancova = (\n",
" ad.run_ancova(\n",
" omics_and_clinic.astype({\"AD\": str}), # ! target needs to be of type str\n",
" # subject='Sample ID', # not used\n",
" omics_and_clinic.astype({\"AD\": str}), # the group column must be of type str\n",
" drop_cols=[],\n",
" group=\"AD\", # needs to be a string\n",
" group=\"AD\",\n",
" covariates=covariates,\n",
" )\n",
" .set_index(\"identifier\")\n",
" .sort_values(by=\"posthoc padj\")\n",
") # need to be floats?\n",
")\n",
"ancova_acore = ancova\n",
"ancova"
]
@@ -381,17 +386,24 @@
"id": "b7cee413",
"metadata": {},
"source": [
"The first columns contain group averages for each group for the specific\n",
"protein group\n",
"ancova.iloc[:, :6]"
"The first columns contain the group averages for each protein group:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ancova.iloc[:, :6]"
]
},
{
"cell_type": "markdown",
"id": "3cc00d86",
"metadata": {},
"source": [
"The others contain the test results (based on a linear model) for each protein group\n",
"The last columns contain the test results (based on a linear model) for each protein group\n",
"(on each row). Some information is duplicated."
]
},
@@ -436,7 +448,7 @@
"metadata": {},
"source": [
"## ANOVA analysis\n",
"not controlling for covariates\n",
"ANOVA compares groups in the same way as ANCOVA, but without adjusting for covariates. We run it here to compare the significant hits obtained with and without covariates.\n",
"> To check: pvalues for proteins with missing mean values? some merging issue?"
]
},
@@ -464,7 +476,7 @@
"id": "28434ecb",
"metadata": {},
"source": [
"view averages per protein group"
"The output table is similar to that of ANCOVA. The first columns list the group averages for each protein group."
]
},
{
@@ -487,7 +499,7 @@
"id": "3d362360",
"metadata": {},
"source": [
"Test results"
"The last columns contain the test results for each protein group (the identifiers)."
]
},
{
@@ -512,7 +524,7 @@
"id": "d02676e9",
"metadata": {},
"source": [
"Other information"
"The remaining columns describe the test method."
]
},
{
@@ -535,7 +547,7 @@
"metadata": {},
"source": [
"## Comparing ANOVA and ANCOVA results\n",
"Cross tabulated results after FDR correction for both ANOVA and ANCOVA"
"Cross-tabulated results after FDR correction for both ANOVA and ANCOVA."
]
},
{
@@ -565,7 +577,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "acore",
"display_name": ".venv",
"language": "python",
"name": "python3"
},
@@ -579,7 +591,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.10"
"version": "3.9.6"
}
},
"nbformat": 4,
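
The preprocessing that the notebook text describes (filter features by `freq_cutoff`, keep the first protein of each protein group, log2-transform) could look roughly like the sketch below. The notebook's actual filtering cell is not part of this diff, so the helper name `filter_and_log2`, the toy intensities, the `;` separator in protein group names, and the reading of `freq_cutoff` as a fraction between 0 and 1 are all assumptions for illustration.

```python
import numpy as np
import pandas as pd


def filter_and_log2(omics: pd.DataFrame, freq_cutoff: float) -> pd.DataFrame:
    """Hypothetical helper: filter features by completeness, then log2-transform."""
    # Fraction of non-missing values per feature (column).
    completeness = omics.notna().mean(axis=0)
    kept = omics.loc[:, completeness >= freq_cutoff]
    # Keep only the first protein of each protein group name
    # (assumes names are ';'-separated).
    kept = kept.rename(columns=lambda name: name.split(";")[0])
    # log2-transform the intensities; missing values stay missing.
    return np.log2(kept)


# Toy intensities: rows are samples, columns are protein groups (made-up values).
omics = pd.DataFrame(
    {
        "P1;P2": [1024.0, 2048.0, np.nan, 4096.0],
        "P3": [np.nan, np.nan, np.nan, 512.0],
        "P4;P5;P6": [256.0, 256.0, 512.0, 1024.0],
    },
    index=["s1", "s2", "s3", "s4"],
)

filtered = filter_and_log2(omics, freq_cutoff=0.7)
print(filtered)
```

In this toy example `P3` is dropped because only one of four values is present, while the other two features are kept under their shortened names.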

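The final comparison in the notebook cross-tabulates which protein groups are rejected after FDR correction by ANOVA and by ANCOVA. A minimal sketch of that idea, using made-up p-values and a hand-written Benjamini-Hochberg helper, is shown below. The helper, the column names, and the 0.05 level are assumptions for illustration and are not taken from acore.

```python
import numpy as np
import pandas as pd


def benjamini_hochberg(pvalues, alpha: float = 0.05) -> np.ndarray:
    """Return a boolean mask of hypotheses rejected at FDR level alpha."""
    pvalues = np.asarray(pvalues, dtype=float)
    n = pvalues.size
    order = np.argsort(pvalues)
    # Step-up thresholds alpha * rank / n for the sorted p-values.
    thresholds = alpha * np.arange(1, n + 1) / n
    passed = pvalues[order] <= thresholds
    rejected = np.zeros(n, dtype=bool)
    if passed.any():
        # Reject every hypothesis up to the largest rank that passes.
        last = np.nonzero(passed)[0].max()
        rejected[order[: last + 1]] = True
    return rejected


# Made-up p-values for five protein groups under both tests.
pvals = pd.DataFrame(
    {
        "anova": [0.001, 0.010, 0.025, 0.200, 0.800],
        "ancova": [0.002, 0.300, 0.004, 0.500, 0.900],
    },
    index=["P1", "P2", "P3", "P4", "P5"],
)

rejected = pd.DataFrame(
    {col: benjamini_hochberg(pvals[col].to_numpy()) for col in pvals.columns},
    index=pvals.index,
)

# Rows: ANOVA decision, columns: ANCOVA decision.
table = pd.crosstab(rejected["anova"], rejected["ancova"])
print(table)
```

The off-diagonal cells of the table are the interesting ones: they count protein groups that are significant only with or only without covariate adjustment.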