Pandas makes bad DESCRIBE query when using SQLAlchemy #184

Open
@freud14-tm

When I use the SQLAlchemy engine with Pandas, Pandas seems to issue a bad DESCRIBE query. Here is the code:

import os

import pandas as pd

from sqlalchemy import create_engine


server_hostname = os.getenv("DATABRICKS_SERVER_HOSTNAME")
http_path = os.getenv("DATABRICKS_HTTP_PATH")
access_token = os.getenv("DATABRICKS_TOKEN")
engine = create_engine(
    f"databricks://token:{access_token}@{server_hostname}?http_path={http_path}&catalog=hive_metastore&schema=default",
)
with engine.connect() as connection:
    print(pd.read_sql("SELECT * FROM test", connection))

Here are the two queries resulting from that code:
image

It does not do that when using the Databricks SQL connector directly instead:

import os

import pandas as pd

from databricks import sql


with sql.connect(
    server_hostname=os.getenv("DATABRICKS_SERVER_HOSTNAME"),
    http_path=os.getenv("DATABRICKS_HTTP_PATH"),
    access_token=os.getenv("DATABRICKS_TOKEN"),
) as connection:
    print(pd.read_sql("SELECT * FROM test", connection))

Here are the version numbers:

In [1]: import sqlalchemy

In [2]: sqlalchemy.__version__
Out[2]: '1.4.49'

In [3]: from databricks import sql

In [4]: sql.__version__
Out[4]: '2.8.0'
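
A possible explanation, which I have not confirmed for the Databricks dialect: `pd.read_sql` accepts either a table name or a query, so when it is given a SQLAlchemy connectable it first asks the dialect whether the string names a table, and only then runs the query. The sketch below reproduces that behaviour with an in-memory SQLite engine as a stand-in (an assumption made so it runs without Databricks credentials; SQLite's table probe is a PRAGMA statement, not DESCRIBE). `capture_statements` is a hypothetical helper written for this illustration.

```python
import pandas as pd
from sqlalchemy import create_engine, event, text


def capture_statements(read):
    """Return the SQL statements emitted while `read` runs a SELECT.

    `read` is a pandas reader such as pd.read_sql or pd.read_sql_query.
    SQLite is only a stand-in for the Databricks engine (assumption).
    """
    engine = create_engine("sqlite://")
    statements = []

    # Record every statement that SQLAlchemy sends to the DBAPI cursor.
    @event.listens_for(engine, "before_cursor_execute")
    def record(conn, cursor, statement, parameters, context, executemany):
        statements.append(statement)

    with engine.connect() as connection:
        connection.execute(text("CREATE TABLE test (id INTEGER)"))
        statements.clear()  # ignore the setup statement
        read("SELECT * FROM test", connection)
    return statements


if __name__ == "__main__":
    # read_sql first checks whether the string is a table name, then selects.
    print(capture_statements(pd.read_sql))
    # read_sql_query skips the table-name check and only runs the SELECT.
    print(capture_statements(pd.read_sql_query))
```

If the same thing is happening with Databricks, calling `pd.read_sql_query` instead of `pd.read_sql` might avoid the extra DESCRIBE query, since it does not need to check for a table name.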
