Skip to content

Files

59 lines (38 loc) · 2.12 KB

Pandasql_Query_Excel_Using_SQL.md

File metadata and controls

59 lines (38 loc) · 2.12 KB



Template request | Bug report | Generate Data Product

Tags: #pandas #excel #snippet #read #dataframe #sql #pandasql #operations

Author: Minura Punchihewa

Description: This notebook demonstrates how to use Pandasql to query Excel files as if they were relational databases, using SQL syntax. The aim is to provide an alternative to traditional Pandas methods for filtering, grouping, and aggregating data, and make it easier for users who are familiar with SQL to perform these tasks.

Input

Import libraries

import pandas as pd
try:
    from pandasql import sqldf
except:
    !pip install pandasql --user
    from pandasql import sqldf

Setup Variables

# Inputs
file_path = "/home/minura/Documents/data/iris.xlsx"
query = """SELECT sepal_length, sepal_width, variety FROM df;"""  # query to be executed

Model

Read the Excel file into a DataFrame

# Read the Excel file into a DataFrame
df = pd.read_excel(file_path)
df

Use sqldf to execute the query

# Use sqldf to execute the query
output_df = sqldf(query)

Output

Display the resulting DataFrame

The output of the code will be a DataFrame containing only the A column from the original DataFrame df.

output_df