
Commit bf2c711

Upload custom data analysis notebooks
1 parent 4a7bf0f commit bf2c711

File tree

5 files changed: +638 −5 lines


02-custom-data-analysis/README.md

Lines changed: 4 additions & 2 deletions
@@ -1,13 +1,15 @@
-# Custom Data Notebook
+# Custom Data Analysis
 
 Inside this folder, you'll find the Jupyter notebook `custom-data-notebook.ipynb`. This notebook is designed to be your starting point for your data science journey, leveraging the knowledge you've gained from our previous workshop sessions. Here, you'll have the opportunity to apply your skills in prompt engineering and data science using a dataset of your choice.
 
 ## Notebook Structure
 
-The notebook is structured into three main sections: set-up, data analysis, and data modeling. It follows the same format as the notebooks in `01-cancer-data-analysis` directory, with the additional inclusion of the data modeling section. This new section is dedicated to building machine learning models tailored for datasets that exhibit predictability potential.
+The notebooks are structured into three main sections: set-up, data analysis, and data modeling. They follow the same format as the notebooks in the `01-cancer-data-analysis` directory, with the addition of a data modeling section dedicated to building machine learning models for datasets with predictive potential.
 
 ## Recommended datasets
 
+In addition to the `custom-data-notebook.ipynb` notebook, there are several other notebooks in this directory. They already have the data loading step completed, so you can jump straight into the fun stuff (exploration, processing, visualization/analysis, modeling).
+
 ### [RODA](https://registry.opendata.aws/)
 
 The Registry of Open Data on AWS (RODA) makes it easy for people to find datasets that are publicly available through AWS. Below are recommended datasets from RODA:
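
As a quick illustration (not part of the commit itself), one of the recommended scikit-learn practice datasets can be loaded directly as a pandas DataFrame, which is the same shape of starting point the notebooks assume; the `as_frame=True` flag is standard scikit-learn API.

```python
# Illustrative sketch: load a built-in scikit-learn dataset as a DataFrame,
# a convenient stand-in for "a dataset of your choice" while practicing.
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer(as_frame=True)  # Bunch with DataFrame members
df = data.frame                           # 30 feature columns + "target"

print(df.shape)  # (569, 31)
print(df["target"].value_counts())
```

Because the data ships with scikit-learn, this runs offline, unlike the RODA and UCI sources, which fetch over the network.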
Lines changed: 208 additions & 0 deletions
@@ -0,0 +1,208 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Power Up Research Software Development with Github Copilot\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "In this notebook, you will have the creative freedom to work with any dataset of your interest. Below are some sources for datasets that may be fun to work with.\n",
+    "\n",
+    "- [RODA](https://registry.opendata.aws/) - The Registry of Open Data on AWS (RODA) makes it easy for people to find datasets that are publicly available through AWS.\n",
+    "\n",
+    "- [UCI Machine Learning Repository](https://archive.ics.uci.edu/datasets) - The UCI Machine Learning Repository is a collection of databases, domain theories, and data generators that are used by the machine learning community for the empirical analysis of machine learning algorithms.\n",
+    "\n",
+    "- [scikit](https://scikit-learn.org/stable/datasets.html) - Scikit-learn is a popular machine learning library in Python. It provides various datasets for practice and experimentation, often used in tutorials and examples to demonstrate machine learning algorithms and techniques."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 1.0 Set-up"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pandas as pd\n",
+    "import numpy as np\n",
+    "import matplotlib.pyplot as plt\n",
+    "import seaborn as sns"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%pip install ucimlrepo"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from ucimlrepo import fetch_ucirepo \n",
+    " \n",
+    "# fetch dataset \n",
+    "bank_marketing = fetch_ucirepo(id=222) \n",
+    " \n",
+    "# data (as pandas dataframes) \n",
+    "X = bank_marketing.data.features \n",
+    "y = bank_marketing.data.targets \n",
+    " \n",
+    "# metadata \n",
+    "print(bank_marketing.metadata) \n",
+    " \n",
+    "# variable information \n",
+    "print(bank_marketing.variables) \n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 2.0 Data analysis"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### 2.1 Data exploration"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### 2.2 Data processing"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": []
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### 2.3 Data visualization"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": []
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### 2.4 Additional analysis"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": []
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 3.0 Data Modelling"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "githubcopilotworkshop",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.14"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
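
The new notebook leaves sections 2.1 and 2.2 as empty cells for attendees to fill in. A minimal sketch of that exploration-then-processing flow might look like the following; the column names here are invented for illustration (in the notebook, `X` comes from `fetch_ucirepo(id=222)` instead).

```python
import numpy as np
import pandas as pd

# Stand-in features; these columns are hypothetical, not the real
# bank-marketing schema.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "age": rng.integers(18, 90, size=100),
    "balance": rng.normal(1500.0, 500.0, size=100),
    "job": [["admin.", "technician", "services"][i % 3] for i in range(100)],
})

# 2.1 Data exploration: shape, dtypes, missingness
print(X.shape)         # (100, 3)
print(X.dtypes)
print(X.isna().sum())

# 2.2 Data processing: one-hot encode the categorical column
X_enc = pd.get_dummies(X, columns=["job"])
print(X_enc.shape)     # (100, 5) -- 2 numeric + 3 dummy columns
```

Keeping exploration and processing in separate cells, as the notebook's section headings suggest, makes it easy to re-run the encoding step alone after changing it.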

02-custom-data-analysis/custom-data-notebook.ipynb

Lines changed: 22 additions & 3 deletions
@@ -29,10 +29,15 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 1,
    "metadata": {},
    "outputs": [],
-   "source": []
+   "source": [
+    "import pandas as pd\n",
+    "import numpy as np\n",
+    "import matplotlib.pyplot as plt\n",
+    "import seaborn as sns"
+   ]
   },
   {
    "cell_type": "code",
@@ -149,8 +154,22 @@
   }
  ],
  "metadata": {
+  "kernelspec": {
+   "display_name": "githubcopilotworkshop",
+   "language": "python",
+   "name": "python3"
+  },
   "language_info": {
-   "name": "python"
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.14"
  }
 },
 "nbformat": 4,
