Commit 2e51c11

committed
Added functionality to convert Excel spreadsheets to SQLite databases using openpyxl and pandas.
1 parent ab053c1 commit 2e51c11

1 file changed

+131
-1
lines changed

SQLite-Setup.ipynb

Lines changed: 131 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -137,10 +137,140 @@
137 137
"---\n"
138 138
]
139 139
},
140+
{
141+
"cell_type": "markdown",
142+
"id": "dc376eb2-7994-4989-89af-004400d72648",
143+
"metadata": {},
144+
"source": [
145+
"# Optional: Converting a Spreadsheet to SQLite\n",
146+
"Try converting an .xlsx Excel spreadsheet (downloaded from Google Sheets) to a SQLite database.\n",
147+
"\n",
148+
"Make sure `openpyxl` is installed."
149+
]
150+
},
151+
{
152+
"cell_type": "code",
153+
"execution_count": null,
154+
"id": "7b8dff20-78cf-472a-b4d0-81c4c6d69886",
155+
"metadata": {},
156+
"outputs": [],
157+
"source": [
158+
"%pip install pandas openpyxl"
159+
]
160+
},
161+
{
162+
"cell_type": "markdown",
163+
"id": "2dece4a7-e3b9-461d-bf2a-68fadf7a5aed",
164+
"metadata": {},
165+
"source": [
166+
"## 1. Download Google Sheet as Excel File\n",
167+
"\n",
168+
"- Open this [Google Sheet](https://docs.google.com/spreadsheets/d/1OzM6Sp92cDG_jkO4_WGEacuD0rwYF_g2kHOIuq6NgZM/view).\n",
169+
"- Click **File > Download > Microsoft Excel (.xlsx)**\n",
170+
"- Save the file to the `data` folder as `simplefolks.xlsx`, the path the conversion cell reads."
171+
]
172+
},
173+
{
174+
"cell_type": "markdown",
175+
"id": "e8750ea2-8bd7-4580-abf6-989510990660",
176+
"metadata": {},
177+
"source": [
178+
"## 2. Convert Excel to SQLite using openpyxl and Pandas\n",
179+
"[openpyxl](https://openpyxl.readthedocs.io/en/stable/) is a Python library for reading and writing Excel 2010 files (.xlsx, .xlsm, .xltx, .xltm). It lets Python programs create, modify, and extract data from Excel workbooks.\n",
180+
"\n",
181+
"When converting an Excel workbook with multiple worksheets into a SQLite database, openpyxl does two jobs:\n",
182+
" - **Reading Excel files:** It loads the workbook and exposes each worksheet so its data can be extracted.\n",
183+
" - **Feeding pandas:** pandas uses openpyxl as the engine for `read_excel` (`engine='openpyxl'`), returning each worksheet as a DataFrame that can then be written to SQLite.\n",
184+
"\n",
185+
"Together, pandas and openpyxl read every worksheet of an Excel file and write each one to its own SQLite table, with column types inferred by pandas."
186+
]
187+
},
188+
{
189+
"cell_type": "code",
190+
"execution_count": null,
191+
"id": "28c8e145-72a8-48e5-ad64-b56c623db387",
192+
"metadata": {},
193+
"outputs": [],
194+
"source": [
195+
"import pandas as pd\n",
196+
"import sqlite3\n",
197+
"\n",
198+
"# Path to your Excel file\n",
199+
"excel_file = 'data/simplefolks.xlsx'\n",
200+
"\n",
201+
"# Connect to the SQLite database (or create it if it doesn't exist)\n",
202+
"conn = sqlite3.connect('data/sheet2simplefolks.db')\n",
203+
"\n",
204+
"# Read all sheets into a dictionary of DataFrames\n",
205+
"sheets = pd.read_excel(excel_file, sheet_name=None, engine='openpyxl')\n",
206+
"\n",
207+
"# Iterate over the dictionary and write each DataFrame to a SQLite table\n",
208+
"for sheet_name, df in sheets.items():\n",
209+
"    df.to_sql(sheet_name, conn, if_exists='replace', index=False)\n",
210+
"\n",
211+
"# Close the database connection\n",
212+
"conn.close()\n",
213+
"\n",
214+
"print(\"All sheets have been successfully converted to SQLite tables.\")"
215+
]
216+
},
217+
{
218+
"cell_type": "code",
219+
"execution_count": null,
220+
"id": "ec65e1b2-f880-4962-aa0e-0fda795a0988",
221+
"metadata": {},
222+
"outputs": [],
223+
"source": [
224+
"%sql sqlite:///data/sheet2simplefolks.db"
225+
]
226+
},
227+
{
228+
"cell_type": "code",
229+
"execution_count": null,
230+
"id": "fabc752a-fecc-4a52-9ae0-24c53e67c904",
231+
"metadata": {},
232+
"outputs": [],
233+
"source": [
234+
"%sql SELECT * FROM sqlite_master;"
235+
]
236+
},
237+
{
238+
"cell_type": "markdown",
239+
"id": "0bf7ee4e-fbbe-453e-8015-8380f3874602",
240+
"metadata": {},
241+
"source": [
242+
"## Optional: SQLite from the Command Line\n",
243+
"\n",
244+
"The SQLite project provides a simple command-line program named sqlite3 (or sqlite3.exe on Windows) that lets you enter and execute SQL statements against a SQLite database or a ZIP archive. This section gives a brief introduction to the sqlite3 program.\n",
245+
"\n",
246+
"Start the sqlite3 program by typing \"sqlite3\" at the command prompt, optionally followed by the name of the file that holds the SQLite database (or ZIP archive). If the named file does not exist, a new database file with that name is created automatically. If no database file is specified on the command line, a temporary database is created and automatically deleted when the \"sqlite3\" program exits.\n",
247+
"\n",
248+
"Read more here: https://www.sqlite.org/cli.html"
249+
]
250+
},
251+
{
252+
"cell_type": "code",
253+
"execution_count": 2,
254+
"id": "ffb47489-fa52-4d70-b096-da9264d815d8",
255+
"metadata": {},
256+
"outputs": [
257+
{
258+
"name": "stdout",
259+
"output_type": "stream",
260+
"text": [
261+
"exercises people politicians \n",
262+
"homes pets spreadsheet_data\n"
263+
]
264+
}
265+
],
266+
"source": [
267+
"!echo .tables | sqlite3 data/sheet2simplefolks.db"
268+
]
269+
},
140 270
{
141 271
"cell_type": "code",
142 272
"execution_count": null,
143-
"id": "1548534c-c1b7-498a-92d2-a7c9098f9a13",
273+
"id": "f018bc2b-d482-4fd6-95ab-8fe1ff2dfa9e",
144 274
"metadata": {},
145 275
"outputs": [],
146 276
"source": []

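The notebook checks the converted database with `%sql SELECT * FROM sqlite_master;` and with `.tables` in the sqlite3 CLI. The same check can be sketched with Python's standard `sqlite3` module alone, which may help where neither the `%sql` magic nor the CLI is available. This is a minimal sketch, not part of the commit: `table_row_counts` is a hypothetical helper name, and the demo tables are made up for illustration.

```python
import sqlite3


def table_row_counts(conn: sqlite3.Connection) -> dict[str, int]:
    """Return {table_name: row_count} for every user table in the database.

    Hypothetical helper for illustration; it is not defined in the notebook.
    """
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    # Skip SQLite's internal tables, whose names start with "sqlite_".
    names = [name for (name,) in rows if not name.startswith("sqlite_")]

    counts = {}
    for name in names:
        # Table names cannot be bound as SQL parameters, so quote the
        # identifier (doubling embedded double quotes) before interpolating.
        quoted = '"' + name.replace('"', '""') + '"'
        counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
    return counts


if __name__ == "__main__":
    # Demo on an in-memory database with made-up tables. For the notebook's
    # file, connect to "data/sheet2simplefolks.db" instead.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (name TEXT)")
    conn.executemany("INSERT INTO people VALUES (?)", [("Ann",), ("Bob",)])
    conn.execute("CREATE TABLE pets (name TEXT)")
    print(table_row_counts(conn))
    conn.close()
```

Quoting the identifier matters here because worksheet names become table names in the conversion cell, and worksheet names may contain spaces.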