Code for the biomarker exercise on breast cancer survival. The dataset is the METABRIC breast cancer gene expression data from Kaggle: https://www.kaggle.com/datasets/raghadalharbi/breast-cancer-gene-expression-profiles-metabric. The exercise has three tasks:
- Compute a PCA (marked with TODO in the code)
- Explain what the code does
- Change the gene panel to genes of your choice and report your results
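For the PCA task, the following is a minimal sketch of how a PCA can be computed with scikit-learn. It is not the notebook's solution. The matrix `X` is synthetic random data standing in for an expression matrix, and standardizing the features first is an assumed (though common) preprocessing choice.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for an expression matrix (assumption, not real data):
# 100 samples (rows) x 20 genes (columns).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))

# Standardize each gene to zero mean and unit variance so that
# genes with a large numeric range do not dominate the components.
X_scaled = StandardScaler().fit_transform(X)

# Project the samples onto the first two principal components.
pca = PCA(n_components=2)
components = pca.fit_transform(X_scaled)

print(components.shape)               # one row per sample, one column per component
print(pca.explained_variance_ratio_)  # fraction of variance captured by each component
```

In the notebook, `X` would be replaced by the expression columns of the gene panel, and the two columns of `components` can be plotted against each other, for example with seaborn.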
You can either clone this repository and work locally in a notebook, or work in Google Colab. If you work locally, we recommend creating a conda environment or a virtual environment and installing notebook, pandas, numpy, scikit-learn, scikit-survival, matplotlib, and seaborn in it. For example, with venv:
python3 -m venv myvenv
source myvenv/bin/activate
pip install notebook pandas numpy scikit-learn scikit-survival matplotlib seaborn
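To confirm that the environment is complete, a short check like the one below can be run with the environment's Python. It is a sketch, and `check_packages` is a hypothetical helper name. Note that the import names differ from the pip names for two packages: scikit-learn is imported as `sklearn` and scikit-survival as `sksurv`.

```python
import importlib.util

# Import names of the packages installed above
# (scikit-learn -> sklearn, scikit-survival -> sksurv).
REQUIRED = ["notebook", "pandas", "numpy", "sklearn", "sksurv", "matplotlib", "seaborn"]

def check_packages(names):
    """Return a dict mapping each import name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

status = check_packages(REQUIRED)
for name, found in status.items():
    print(f"{name}: {'ok' if found else 'MISSING'}")
```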
On Colab, all of these packages except scikit-survival are already installed. You can install it by adding the following line to a notebook cell:
!pip install scikit-survival
You can upload the data file through the file browser in the left sidebar (folder icon).
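Once the file is uploaded, it can be read with pandas. The snippet below is a sketch: `load_dataset` is a hypothetical helper, and the file name in the commented usage line is an assumption about how the Kaggle CSV is named, so adjust it to the name of the file you actually uploaded.

```python
import pandas as pd

def load_dataset(path):
    """Read the dataset CSV at `path` into a DataFrame."""
    # low_memory=False makes pandas read the whole file before inferring
    # column types, which avoids mixed-type warnings on wide tables.
    return pd.read_csv(path, low_memory=False)

# Example usage (file name is an assumption, adjust as needed):
# df = load_dataset("METABRIC_RNA_Mutation.csv")
# print(df.shape)
```

On Colab, uploaded files are placed in the current working directory by default, so a bare file name is usually enough as the path.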