Commit 9e80987

restructure + formatting
1 parent 6bd29cb commit 9e80987

7 files changed: +2671 -1 lines changed

README.md

+2-1
@@ -72,4 +72,5 @@ Here are other Python workshops offered by the D-Lab:
72 72
* [Python Data Wrangling and Manipulation with Pandas](https://dlab.berkeley.edu/events/python-data-wrangling-and-manipulation-pandas/2024-10-10)
73 73

74 74
# Contributors
75-
* Aaron Culich
75+
* Aaron Culich
76+
* Tom van Nuenen

lessons/.DS_Store

6 KB
Binary file not shown.

lessons/1_SQLite-Setup.ipynb

+379
@@ -0,0 +1,379 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"id": "6d9b77b7-650a-44c5-a5f6-f89a8fafbc5f",
6+
"metadata": {},
7+
"source": [
8+
"# SQLite in Python: Intro & Setup\n",
9+
"\n",
10+
"* * * \n",
11+
"\n",
12+
"<div class=\"alert alert-success\"> \n",
13+
" \n",
14+
"### Learning Objectives \n",
15+
"\n",
16+
"* Navigate and understand the structure of a relational database.\n",
17+
"* Become familiar with relational database concepts.\n",
18+
"* Set up SQLite in Jupyter.\n",
19+
"* Set up SQLite Browser.\n",
20+
"* Convert a spreadsheet to SQLite.\n",
21+
"</div>\n",
22+
"\n",
23+
"### Icons Used in This Notebook\n",
24+
"🔔 **Question**: A quick question to help you understand what's going on.<br>\n",
25+
"💡 **Tip**: How to do something a bit more efficiently or effectively.<br>\n",
26+
"📝 **Poll:** A Zoom poll to check in with you.<br>\n",
27+
"\n",
28+
"### Sections\n",
29+
"1. [This workshop](#this)\n",
30+
"2. [What is SQL?](#what)\n",
31+
"3. [Why SQLite?](#why)\n",
32+
"4. [SQLite Jupyter Notebook Setup](#setup)\n",
33+
"5. [SQLite Browser Setup](#browser)\n",
34+
"6. [Converting a Spreadsheet to SQLite](#convert)"
35+
]
36+
},
37+
{
38+
"cell_type": "markdown",
39+
"id": "9ce7639d-2285-4b26-a5bf-972169372175",
40+
"metadata": {},
41+
"source": [
42+
"<a id='this'></a>\n",
43+
"\n",
44+
"# This workshop\n",
45+
"\n",
46+
"This workshop introduces the fundamentals of SQL, with a focus on using SQLite (often described as the most widely deployed database engine in the world) for data science tasks. We'll explore how SQL can be used to query and manipulate relational databases, with hands-on exercises based on real-world datasets.\n"
47+
]
48+
},
49+
{
50+
"cell_type": "markdown",
51+
"id": "98b3152f-8322-4dbc-bd4c-d4f1f7d59525",
52+
"metadata": {},
53+
"source": [
54+
"<a id='what'></a>\n",
55+
"\n",
56+
"# What is SQL?\n",
57+
"\n",
58+
"Imagine you have a large spreadsheet filled with **thousands of rows of data**. It quickly becomes overwhelming to find exactly what you’re looking for, update details, or combine information from different sheets. \n",
59+
"\n",
60+
"**SQL (Structured Query Language)** solves this problem. It’s like having a conversation with your data. You ask clear, structured questions, and your data gives precise answers. SQL lets you quickly organize, filter, and analyze large amounts of data without the hassle of scrolling through endless rows.\n",
61+
"\n",
62+
"SQL is a **standardized language** (ISO/IEC 9075) used to manage, query, and interact with relational databases. It allows you to perform tasks such as:\n",
63+
"\n",
64+
"* Storing data: Create and populate tables.\n",
65+
"* Querying data: Retrieve information based on specific conditions.\n",
66+
"* Manipulating data: Insert, update, or delete records.\n",
67+
"* Analyzing data: Summarize, aggregate, and join tables to extract insights.\n",
68+
"\n",
69+
"SQL makes working with structured data efficient and organized.\n",
70+
"\n",
71+
"Common SQL operations include:\n",
72+
"* SELECT: Retrieve data from tables\n",
73+
"* INSERT: Add new data\n",
74+
"* UPDATE: Modify existing data\n",
75+
"* DELETE: Remove data\n",
76+
"* JOIN: Combine data from multiple tables\n"
77+
]
78+
},
79+
{
80+
"cell_type": "markdown",
81+
"id": "7ca29a25-f696-4c76-8f0e-72ed98f67ebc",
82+
"metadata": {},
83+
"source": [
84+
"<a id='why'></a>\n",
85+
"\n",
86+
"# Why SQLite?\n",
87+
"\n",
88+
"SQLite is a lightweight, open-source database system ideal for learning, prototyping, and small- to medium-sized applications. It’s a popular choice for data science workshops because:\n",
89+
"* Self-contained: Requires no server or complex setup.\n",
90+
"* Zero-configuration: Easy to install and use immediately.\n",
91+
"* Portable: Stores data in a single file, easily shareable.\n",
92+
"* Widely used: Powers many applications and devices (browsers, mobile apps, etc.).\n",
93+
"* Great for learning: Simple, intuitive, and fast—perfect for beginners and teaching.\n",
94+
"\n",
95+
"SQLite enables you to quickly apply SQL in a hands-on environment without the overhead of heavier database systems."
96+
]
97+
},
98+
{
99+
"cell_type": "markdown",
100+
"id": "193c6046-ab41-49bd-a052-0fa88e94d8ee",
101+
"metadata": {},
102+
"source": [
103+
"<a id='setup'></a>\n",
104+
"\n",
105+
"# SQLite Jupyter Notebook Setup\n",
106+
"\n",
107+
"### 1. Install Python Packages"
108+
]
109+
},
110+
{
111+
"cell_type": "code",
112+
"execution_count": null,
113+
"id": "360b0379-7df8-4e67-bf82-69d2bb358d93",
114+
"metadata": {},
115+
"outputs": [],
116+
"source": [
117+
"%pip install ipython-sql\n",
118+
"%pip install sqlalchemy\n",
119+
"%pip install prettytable==3.11.0"
120+
]
121+
},
122+
{
123+
"cell_type": "markdown",
124+
"id": "6f1ecf9c-ed62-4303-8030-ba723d47dc1f",
125+
"metadata": {},
126+
"source": [
127+
"### 2. Load SQL extension for Jupyter"
128+
]
129+
},
130+
{
131+
"cell_type": "code",
132+
"execution_count": null,
133+
"id": "a3e9f8af-6fbd-431f-aa4c-be49ce5fda45",
134+
"metadata": {},
135+
"outputs": [],
136+
"source": [
137+
"%load_ext sql"
138+
]
139+
},
140+
{
141+
"cell_type": "markdown",
142+
"id": "cab3c394-edaa-4f6b-a48f-c3c86326e772",
143+
"metadata": {},
144+
"source": [
145+
"### 3. Open the `simplefolks.db` SQLite database"
146+
]
147+
},
148+
{
149+
"cell_type": "code",
150+
"execution_count": null,
151+
"id": "ce9f43af-2dbd-4425-bfa8-bd2434861c2c",
152+
"metadata": {},
153+
"outputs": [],
154+
"source": [
155+
"%sql sqlite:///data/simplefolks.db"
156+
]
157+
},
158+
{
159+
"cell_type": "markdown",
160+
"id": "b7fcd957-424b-446b-8f4e-ca2bcc0e2e73",
161+
"metadata": {},
162+
"source": [
163+
"### 4. List the names of all tables in the database"
164+
]
165+
},
166+
{
167+
"cell_type": "code",
168+
"execution_count": null,
169+
"id": "de7403d2-f46f-4677-b049-9a48934a65e8",
170+
"metadata": {},
171+
"outputs": [],
172+
"source": [
173+
"%sql SELECT name FROM sqlite_master WHERE type = 'table';"
174+
]
175+
},
176+
{
177+
"cell_type": "markdown",
178+
"id": "57d8c882-f86c-4460-9b70-63d6e36335d1",
179+
"metadata": {},
180+
"source": [
181+
"### 5. List tables with schema CREATE statements"
182+
]
183+
},
184+
{
185+
"cell_type": "code",
186+
"execution_count": null,
187+
"id": "e373b297-5ffd-4b13-a163-6b004fa5ae08",
188+
"metadata": {},
189+
"outputs": [],
190+
"source": [
191+
"%sql SELECT * FROM sqlite_master;"
192+
]
193+
},
194+
{
195+
"cell_type": "markdown",
196+
"id": "a89a37bd-89d7-423d-b51c-f506c666258b",
197+
"metadata": {},
198+
"source": [
199+
"<a id='browser'></a>\n",
200+
"\n",
201+
"# SQLite Browser Setup Instructions\n",
202+
"\n",
203+
"### 1. Install SQLite Browser\n",
204+
"\n",
205+
"**SQLite Browser** (distributed as *DB Browser for SQLite*) is a free, open-source tool for creating, designing, and editing SQLite database files.\n",
206+
"\n",
207+
"- **For Windows:**\n",
208+
" - Download the [Standard installer for 64-bit Windows](https://download.sqlitebrowser.org/DB.Browser.for.SQLite-v3.13.1-win64.msi) from the [SQLite Browser Download Page](https://sqlitebrowser.org/dl/).\n",
209+
" - Run the installer and follow the on-screen instructions.\n",
210+
"\n",
211+
"- **For macOS:**\n",
212+
" - Download the `.dmg` file called [DB Browser for SQLite (Universal)](https://download.sqlitebrowser.org/DB.Browser.for.SQLite-v3.13.1.dmg) from the [SQLite Browser Download Page](https://sqlitebrowser.org/dl/).\n",
213+
" - Open the `.dmg` file and drag the app to your Applications folder.\n",
214+
" *Or*\n",
215+
" - Install via Homebrew:\n",
216+
" ```bash\n",
217+
" brew install --cask db-browser-for-sqlite\n",
218+
" ```\n",
219+
"\n",
220+
"- **For Linux:**\n",
221+
" - Install via your package manager. For example, on Ubuntu:\n",
222+
" ```bash\n",
223+
" sudo apt-get install sqlitebrowser\n",
224+
" ```\n",
225+
"\n",
226+
"### 2. Open the Database in SQLite Browser\n",
227+
"\n",
228+
"- Launch **SQLite Browser**.\n",
229+
"- Go to `File` > `Open Database`.\n",
230+
"- Navigate to the folder containing the SQLite database files `simplefolks.db` and `northwind.db`, then open one of them.\n",
231+
"\n",
232+
"---\n"
233+
]
234+
},
235+
{
236+
"cell_type": "markdown",
237+
"id": "dc376eb2-7994-4989-89af-004400d72648",
238+
"metadata": {},
239+
"source": [
240+
"<a id='convert'></a>\n",
241+
"\n",
242+
"# Converting a Spreadsheet to SQLite\n",
243+
"Try converting an Excel spreadsheet (`.xlsx`, downloaded from Google Sheets) to a SQLite database.\n",
244+
"\n",
245+
"Make sure `openpyxl` is installed."
246+
]
247+
},
248+
{
249+
"cell_type": "code",
250+
"execution_count": null,
251+
"id": "7b8dff20-78cf-472a-b4d0-81c4c6d69886",
252+
"metadata": {},
253+
"outputs": [],
254+
"source": [
255+
"%pip install pandas openpyxl"
256+
]
257+
},
258+
{
259+
"cell_type": "markdown",
260+
"id": "2dece4a7-e3b9-461d-bf2a-68fadf7a5aed",
261+
"metadata": {},
262+
"source": [
263+
"## 1. Download Google Sheet as Excel File\n",
264+
"\n",
265+
"- Open this [Google Sheet](https://docs.google.com/spreadsheets/d/1OzM6Sp92cDG_jkO4_WGEacuD0rwYF_g2kHOIuq6NgZM/view).\n",
266+
"- Click **File > Download > Microsoft Excel (.xlsx)**\n",
267+
"- Save the file to the `data` folder as `simplefolks.xlsx`, the path the conversion code expects"
268+
]
269+
},
270+
{
271+
"cell_type": "markdown",
272+
"id": "e8750ea2-8bd7-4580-abf6-989510990660",
273+
"metadata": {},
274+
"source": [
275+
"## 2. Convert Excel to SQLite using openpyxl and Pandas\n",
276+
"[openpyxl](https://openpyxl.readthedocs.io/en/stable/) is a Python library designed for reading and writing Excel 2010 files (with extensions .xlsx, .xlsm, .xltx, .xltm). It enables Python programs to interact with Excel files, allowing for the creation, modification, and extraction of data.\n",
277+
"\n",
278+
"When converting an Excel spreadsheet with multiple worksheets into a SQLite database, openpyxl is useful for:\n",
279+
" - **Reading Excel files:** It loads Excel workbooks and gives access to individual sheets, so data can be extracted from each worksheet.\n",
280+
" - **Feeding pandas:** pandas can use openpyxl as its engine for reading `.xlsx` files, so each worksheet is loaded as a DataFrame that can then be processed and stored in a SQLite database.\n",
281+
"\n",
282+
"Using openpyxl together with pandas, you can read each worksheet of an Excel file and write it to a corresponding table in a SQLite database, with pandas inferring the column data types."
283+
]
284+
},
285+
{
286+
"cell_type": "code",
287+
"execution_count": null,
288+
"id": "28c8e145-72a8-48e5-ad64-b56c623db387",
289+
"metadata": {},
290+
"outputs": [],
291+
"source": [
292+
"import pandas as pd\n",
293+
"import sqlite3\n",
294+
"\n",
295+
"# Path to your Excel file\n",
296+
"excel_file = 'data/simplefolks.xlsx'\n",
297+
"\n",
298+
"# Connect to the SQLite database (or create it if it doesn't exist)\n",
299+
"conn = sqlite3.connect('data/sheet2simplefolks.db')\n",
300+
"\n",
301+
"# Read all sheets into a dictionary of DataFrames\n",
302+
"sheets = pd.read_excel(excel_file, sheet_name=None, engine='openpyxl')\n",
303+
"\n",
304+
"# Iterate over the dictionary and write each DataFrame to a SQLite table\n",
305+
"for sheet_name, df in sheets.items():\n",
306+
" df.to_sql(sheet_name, conn, if_exists='replace', index=False)\n",
307+
"\n",
308+
"# Close the database connection\n",
309+
"conn.close()\n",
310+
"\n",
311+
"print(\"All sheets have been successfully converted to SQLite tables.\")"
312+
]
313+
},
314+
{
315+
"cell_type": "code",
316+
"execution_count": null,
317+
"id": "ec65e1b2-f880-4962-aa0e-0fda795a0988",
318+
"metadata": {},
319+
"outputs": [],
320+
"source": [
321+
"%sql sqlite:///data/sheet2simplefolks.db"
322+
]
323+
},
324+
{
325+
"cell_type": "code",
326+
"execution_count": null,
327+
"id": "fabc752a-fecc-4a52-9ae0-24c53e67c904",
328+
"metadata": {},
329+
"outputs": [],
330+
"source": [
331+
"%sql SELECT * FROM sqlite_master;"
332+
]
333+
},
334+
{
335+
"cell_type": "markdown",
336+
"id": "0bf7ee4e-fbbe-453e-8015-8380f3874602",
337+
"metadata": {},
338+
"source": [
339+
"## Optional: SQLite from the Command Line\n",
340+
"The SQLite project provides a simple command-line program named `sqlite3` (or `sqlite3.exe` on Windows) that lets you manually enter and execute SQL statements against a SQLite database or a ZIP archive.\n",
341+
"\n",
342+
"Start the sqlite3 program by typing \"sqlite3\" at the command prompt, optionally followed by the name of the file that holds the SQLite database (or ZIP archive). If the named file does not exist, a new database file with the given name will be created automatically. If no database file is specified on the command-line, a temporary database is created and automatically deleted when the \"sqlite3\" program exits.\n",
343+
"\n",
344+
"Read more here: https://www.sqlite.org/cli.html"
345+
]
346+
},
347+
{
348+
"cell_type": "code",
349+
"execution_count": null,
350+
"id": "ffb47489-fa52-4d70-b096-da9264d815d8",
351+
"metadata": {},
352+
"outputs": [],
353+
"source": [
354+
"!echo .tables | sqlite3 data/sheet2simplefolks.db"
355+
]
356+
}
357+
],
358+
"metadata": {
359+
"kernelspec": {
360+
"display_name": "Python 3 (ipykernel)",
361+
"language": "python",
362+
"name": "python3"
363+
},
364+
"language_info": {
365+
"codemirror_mode": {
366+
"name": "ipython",
367+
"version": 3
368+
},
369+
"file_extension": ".py",
370+
"mimetype": "text/x-python",
371+
"name": "python",
372+
"nbconvert_exporter": "python",
373+
"pygments_lexer": "ipython3",
374+
"version": "3.11.7"
375+
}
376+
},
377+
"nbformat": 4,
378+
"nbformat_minor": 5
379+
}
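
Step 4 of the notebook queries `sqlite_master`, which holds one row per schema object, so indexes, views, and triggers appear there alongside tables. The following is a minimal sketch, not part of the commit, of listing only user tables with Python's built-in `sqlite3` module. The in-memory database, the table names `people` and `pets`, the index name, and the helper `list_tables` are all hypothetical and chosen for illustration only.

```python
import sqlite3


def list_tables(conn):
    """Return user table names, skipping indexes, views, and SQLite's internal tables."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
        "ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


# Hypothetical in-memory database, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.execute("CREATE TABLE pets (name TEXT, owner_name TEXT)")
conn.execute("CREATE INDEX idx_pets_owner ON pets (owner_name)")

# Filtered listing: tables only.
tables = list_tables(conn)

# Unfiltered listing: every schema object, including the index.
all_objects = [row[0] for row in conn.execute("SELECT name FROM sqlite_master ORDER BY name")]

print(tables)
print(all_objects)
conn.close()
```

In this sketch the unfiltered query also returns the index name, while the filtered one returns only the two tables. The same `WHERE type = 'table'` condition should work in a `%sql` cell against `data/simplefolks.db`, assuming that file is present.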
