
Commit bf2c711

Upload custom data analysis notebooks
1 parent 4a7bf0f commit bf2c711

File tree

5 files changed: +638 −5 lines


02-custom-data-analysis/README.md

Lines changed: 4 additions & 2 deletions
@@ -1,13 +1,15 @@
-# Custom Data Notebook
+# Custom Data Analysis
 
 Inside this folder, you'll find the Jupyter notebook `custom-data-notebook.ipynb`. This notebook is designed to be your starting point for your data science journey, leveraging the knowledge you've gained from our previous workshop sessions. Here, you'll have the opportunity to apply your skills in prompt engineering and data science using a dataset of your choice.
 
 ## Notebook Structure
 
-The notebook is structured into three main sections: set-up, data analysis, and data modeling. It follows the same format as the notebooks in `01-cancer-data-analysis` directory, with the additional inclusion of the data modeling section. This new section is dedicated to building machine learning models tailored for datasets that exhibit predictability potential.
+The notebooks are structured into three main sections: set-up, data analysis, and data modeling. They follow the same format as the notebooks in the `01-cancer-data-analysis` directory, with the addition of a data modeling section dedicated to building machine learning models for datasets with predictive potential.
 
 ## Recommended datasets
 
+In addition to the `custom-data-notebook.ipynb` notebook, there are several other notebooks in this directory. They already have the data loading step completed, so you can jump straight into the fun stuff (exploration, processing, visualization/analysis, modeling).
+
 ### [RODA](https://registry.opendata.aws/)
 
 The Registry of Open Data on AWS (RODA) makes it easy for people to find datasets that are publicly available through AWS. Below are recommended datasets from RODA:
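
As a quick illustration (not part of the commit itself), one of the recommended scikit-learn practice datasets can be loaded directly as a pandas DataFrame, which is the same shape of starting point the notebooks assume; the `as_frame=True` flag is standard scikit-learn API.

```python
# Illustrative sketch: load a built-in scikit-learn dataset as a DataFrame,
# a convenient stand-in for "a dataset of your choice" while practicing.
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer(as_frame=True)  # Bunch with DataFrame members
df = data.frame                           # 30 feature columns + "target"

print(df.shape)  # (569, 31)
print(df["target"].value_counts())
```

Because the data ships with scikit-learn, this runs offline, unlike the RODA and UCI sources, which fetch over the network.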
Lines changed: 208 additions & 0 deletions
@@ -0,0 +1,208 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Power Up Research Software Development with Github Copilot\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "In this notebook, you will have the creative freedom to work with any dataset of your interest. Below are some sources for datasets that may be fun to work with.\n",
+    "\n",
+    "- [RODA](https://registry.opendata.aws/) - The Registry of Open Data on AWS (RODA) makes it easy for people to find datasets that are publicly available through AWS.\n",
+    "\n",
+    "- [UCI Machine Learning Repository](https://archive.ics.uci.edu/datasets) - The UCI Machine Learning Repository is a collection of databases, domain theories, and data generators that are used by the machine learning community for the empirical analysis of machine learning algorithms.\n",
+    "\n",
+    "- [scikit](https://scikit-learn.org/stable/datasets.html) - Scikit-learn is a popular machine learning library in Python. It provides various datasets for practice and experimentation, often used in tutorials and examples to demonstrate machine learning algorithms and techniques."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 1.0 Set-up"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pandas as pd\n",
+    "import numpy as np\n",
+    "import matplotlib.pyplot as plt\n",
+    "import seaborn as sns"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%pip install ucimlrepo"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from ucimlrepo import fetch_ucirepo \n",
+    " \n",
+    "# fetch dataset \n",
+    "bank_marketing = fetch_ucirepo(id=222) \n",
+    " \n",
+    "# data (as pandas dataframes) \n",
+    "X = bank_marketing.data.features \n",
+    "y = bank_marketing.data.targets \n",
+    " \n",
+    "# metadata \n",
+    "print(bank_marketing.metadata) \n",
+    " \n",
+    "# variable information \n",
+    "print(bank_marketing.variables) \n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 2.0 Data analysis"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### 2.1 Data exploration"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### 2.2 Data processing"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": []
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### 2.3 Data visualization"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": []
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### 2.4 Additional analysis"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": []
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 3.0 Data Modelling"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "githubcopilotworkshop",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.14"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
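
The new notebook leaves sections 2.1 and 2.2 as empty cells for attendees to fill in. A minimal sketch of that exploration-then-processing flow might look like the following; the column names here are invented for illustration (in the notebook, `X` comes from `fetch_ucirepo(id=222)` instead).

```python
import numpy as np
import pandas as pd

# Stand-in features; these columns are hypothetical, not the real
# bank-marketing schema.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "age": rng.integers(18, 90, size=100),
    "balance": rng.normal(1500.0, 500.0, size=100),
    "job": [["admin.", "technician", "services"][i % 3] for i in range(100)],
})

# 2.1 Data exploration: shape, dtypes, missingness
print(X.shape)         # (100, 3)
print(X.dtypes)
print(X.isna().sum())

# 2.2 Data processing: one-hot encode the categorical column
X_enc = pd.get_dummies(X, columns=["job"])
print(X_enc.shape)     # (100, 5) -- 2 numeric + 3 dummy columns
```

Keeping exploration and processing in separate cells, as the notebook's section headings suggest, makes it easy to re-run the encoding step alone after changing it.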

02-custom-data-analysis/custom-data-notebook.ipynb

Lines changed: 22 additions & 3 deletions
@@ -29,10 +29,15 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 1,
    "metadata": {},
    "outputs": [],
-   "source": []
+   "source": [
+    "import pandas as pd\n",
+    "import numpy as np\n",
+    "import matplotlib.pyplot as plt\n",
+    "import seaborn as sns"
+   ]
   },
   {
    "cell_type": "code",
@@ -149,8 +154,22 @@
   }
  ],
  "metadata": {
+  "kernelspec": {
+   "display_name": "githubcopilotworkshop",
+   "language": "python",
+   "name": "python3"
+  },
   "language_info": {
-   "name": "python"
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.14"
  }
 },
 "nbformat": 4,
