Tags: #pandas #parquet #snippet #read #dataframe #sql #pandasql #operations
Author: Minura Punchihewa
Description: This notebook demonstrates how to load a Parquet file into a Pandas DataFrame and query it with SQL syntax using pandasql, as if it were a table in a relational database. The aim is to offer an alternative to the usual Pandas methods for filtering, grouping, and aggregating data, which makes these tasks easier for users who are already familiar with SQL.
import pandas as pd
try:
    from pandasql import sqldf
except ImportError:
    # pandasql is not installed yet: install it, then retry the import
    !pip install pandasql --user
    from pandasql import sqldf
# Inputs
file_path = "/home/minura/Documents/data/iris.parquet"
query = """SELECT sepal_length, sepal_width, variety FROM df;""" # query to be executed
# Read Parquet file into DataFrame
df = pd.read_parquet(file_path)
df
# Use sqldf to execute the query
output_df = sqldf(query)
The output is a DataFrame containing only the sepal_length, sepal_width, and variety columns of the original DataFrame df.
output_df
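The description mentions filtering, grouping, and aggregating, while the query above only selects columns. The following is a minimal sketch of what such a query could look like. It is not part of the original notebook: because pandasql runs queries on an in-memory SQLite database by default, the sketch uses the standard-library sqlite3 module directly, and the rows are hypothetical stand-ins for the iris data, reusing only the column names from the query above.

```python
import sqlite3

# Hypothetical sample rows (sepal_length, sepal_width, variety).
# These are illustrative values, not real measurements from iris.parquet.
rows = [
    (5.0, 3.5, "Setosa"),
    (4.0, 3.0, "Setosa"),
    (6.0, 3.0, "Versicolor"),
    (7.0, 3.0, "Versicolor"),
    (6.5, 3.0, "Virginica"),
    (9.0, 2.5, "Virginica"),  # excluded by the WHERE clause below
]

# In-memory SQLite database with a table named df, mirroring the DataFrame name
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE df (sepal_length REAL, sepal_width REAL, variety TEXT)")
conn.executemany("INSERT INTO df VALUES (?, ?, ?)", rows)

# Filter (WHERE), group (GROUP BY), and aggregate (COUNT, AVG) in one query
agg_query = """
SELECT variety,
       COUNT(*) AS n_rows,
       AVG(sepal_length) AS avg_sepal_length
FROM df
WHERE sepal_width >= 3.0
GROUP BY variety
ORDER BY variety;
"""

result = conn.execute(agg_query).fetchall()
conn.close()

for variety, n_rows, avg_sepal_length in result:
    print(variety, n_rows, avg_sepal_length)
```

In the notebook itself, the same SQL string could presumably be assigned to query and passed to sqldf, as long as the column names match those in the Parquet file.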