Fix BUG: read_sql tries to convert blob/varbinary to string with pyarrow backend #60105
base: main
Changes from 27 commits
@@ -4358,3 +4358,46 @@ def test_xsqlite_if_exists(sqlite_buildin):
        (5, "E"),
    ]
    drop_table(table_name, sqlite_buildin)


@pytest.mark.parametrize("con", all_connectable)
@pytest.mark.parametrize("dtype_backend", ["pyarrow", "numpy_nullable", lib.no_default])
def test_bytes_column(con, dtype_backend, request):
    # GitHub Issue #59242
    conn = request.getfixturevalue(con)
    pa = pytest.importorskip("pyarrow")

    dtype = "O"
Reviewer: Hmm, why is this defaulting to object out here? That seems like something to avoid for the pyarrow backend. I think you are doing that in this test anyway, but the structure is confusing. I would just set up the expected data type from the backend up front, so something like:

    if dtype_backend == "pyarrow":
        exp_dtype = ...
    else:
        ...

Author: Makes sense, let me know if the modified structure is less confusing!
    val = b"\x01#Eg\x89\xab\xcd\xef\x01#Eg\x89\xab\xcd\xef"

Reviewer: What does this value represent?
Author: This is the raw bytes returned from the hex string.
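For reference, the value is just the decoded form of the hex literal that the query below selects; a quick standalone check (plain Python, not part of the test):

    raw = bytes.fromhex("0123456789abcdef0123456789abcdef")
    assert raw == b"\x01#Eg\x89\xab\xcd\xef\x01#Eg\x89\xab\xcd\xef"
    # The repr shows '#', 'E', 'g' because 0x23, 0x45 and 0x67 are printable ASCII.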
if "postgres" in con: | ||
val = ( | ||
b"\x00\x00\x00\x80\x01#Eg\x89\xab\xcd\xef\x01#Eg\x89\xab\xcd\xef" | ||
if "adbc" in con | ||
else "0000000100100011010001010110011110001001101010" | ||
"11110011011110111100000001001000110100010101100" | ||
"11110001001101010111100110111101111" | ||
) | ||
if dtype_backend == "pyarrow": | ||
dtype = ( | ||
pd.ArrowDtype(pa.string()) | ||
if "adbc" not in con | ||
else pd.ArrowDtype(pa.opaque(pa.binary(), "bit", "PostgreSQL")) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why is this an opaque type instead of just binary? That seems like a bug somewhere in this implementation or the driver There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this is coming from
|
||
) | ||
    if "psycopg2" in con:
        if dtype_backend == "numpy_nullable":
            dtype = pd.StringDtype()
        elif dtype_backend == lib.no_default and pd.options.future.infer_string:
            dtype = pd.StringDtype(storage="pyarrow", na_value=np.nan)

    if "postgres" not in con and dtype_backend == "pyarrow":
        dtype = pd.ArrowDtype(pa.binary())

    expected = DataFrame([{"a": val}], dtype=dtype)
    df = pd.read_sql(
        "select x'0123456789abcdef0123456789abcdef' a",
        conn,
        dtype_backend=dtype_backend,
    )
    tm.assert_frame_equal(df, expected)
Reviewer: Hmm, AFAIK the OpaqueType is returned when there is no suitable database type, but isn't binary already returned directly? It is a bit unclear what this additional isinstance check is for.
Author: I had to add this because adbc-driver-postgresql is returning an OpaqueType - without it, an exception is thrown.
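A minimal standalone sketch, under stated assumptions, of the kind of guard being discussed: if a column arrives as a pyarrow OpaqueType extension (as adbc-driver-postgresql does for the PostgreSQL bit column here), unwrap it to its storage array rather than attempting a string conversion. This is not the PR's actual change; the helper name is made up, and pa.opaque / pa.OpaqueType require a recent pyarrow release:

    import pyarrow as pa

    def unwrap_opaque(arr: pa.Array) -> pa.Array:
        # Hypothetical guard: extension arrays expose their underlying storage
        # array, so an OpaqueType-wrapped binary column falls back to binary.
        if isinstance(arr.type, pa.OpaqueType):
            return arr.storage
        return arr

    # Build an opaque-typed array the same way the test's expected dtype does.
    opaque_type = pa.opaque(pa.binary(), "bit", "PostgreSQL")
    storage = pa.array([b"\x01#Eg\x89\xab\xcd\xef"], type=pa.binary())
    ext_arr = pa.ExtensionArray.from_storage(opaque_type, storage)
    print(unwrap_opaque(ext_arr).type)  # binary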