Template request | Bug report | Generate Data Product

Tags: #pandas #parquet #snippet #read #dataframe #sql #pandasql #operations

Description: This notebook demonstrates how to use Pandasql to query Parquet files as if they were relational databases, using SQL syntax. The aim is to provide an alternative to traditional Pandas methods for filtering, grouping, and aggregating data, and make it easier for users who are familiar with SQL to perform these tasks.

Input

Import libraries

import pandas as pd
try:
    from pandasql import sqldf
except:
    !pip install pandasql --user
    from pandasql import sqldf

Setup Variables

# Inputs
file_path = "/home/minura/Documents/data/iris.parquet"
query = """SELECT sepal_length, sepal_width, variety FROM df;"""  # query to be executed

Model

Read Parquet file into DataFrame

# Read Parquet file into DataFrame
df = pd.read_parquet(file_path)
df

Use sqldf to execute the query

# Use sqldf to execute the query
output_df = sqldf(query)

Output

Display the resulting DataFrame

The output of the code will be a DataFrame containing only the A column from the original DataFrame df.

output_df

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!