Type conversion on load datetime64[ns] ->datetime64[ns, UTC] #216
Comments
Hi @shraik. Thanks for another report about potential type mapping improvements. We will look into it.
We will probably start investigating by comparing against PostgreSQL, in order to get a feeling for whether the behavior is intended with CrateDB, or whether something else should be improved, most likely within the SQLAlchemy dialect implementation. |
I checked with the postgres:13 Docker image: no time zone is added. For testing, I installed the library with pip install psycopg2 and changed 2 lines
Maybe this will help you in testing. |
Hi again. We've investigated your observations, thank you again. The outcome is that this is currently expected behavior, because CrateDB does not store DATE types natively. They are stored as BIGINT, in the same spirit as TIMESTAMP types, and on their way back they naturally converge into timezone-aware DATETIME types, because that's probably the default mapping. Weird, but in this case expected. However, pandas provides an easy workaround to adjust the type mapping for date and datetime columns, using the parse_dates parameter: pd.read_sql_table("test_date", conn, parse_dates={"date_1": "date", "date_2": "date"}). I think it is a good idea to add this to our documentation in one way or another, so let's keep the issue open as a reminder for that. |
Yes, I can work with this date conversion option, it's not a problem. PS: |
Hi @shraik.
Excellent, thanks.
I see, and I also kind of expected that. Without doing schema introspection before, it is certainly inconvenient. We may add such a "universal wrapper" to I think the snippet you've shared above would already make an excellent start.

    df2 = df_load.select_dtypes("datetimetz")
    df_load[df2.columns] = df2.apply(lambda x: x.dt.tz_convert(None))

We will be happy to improve anything where you can spot flaws, in order to incrementally improve. Is it in this case a particular spot in the documentation you are referring to? |
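The timezone-stripping snippet quoted above could be wrapped into a small helper along these lines. This is only a sketch: the function name `strip_timezones` and the sample frame are made up for illustration and are not part of pandas or sqlalchemy-cratedb.

```python
import pandas as pd


def strip_timezones(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `df` whose tz-aware datetime columns are made tz-naive.

    Hypothetical helper, not part of any library. Values are converted to
    UTC wall time before the timezone is dropped (`tz_convert(None)`).
    """
    out = df.copy()
    # "datetimetz" selects only timezone-aware datetime columns.
    for name in out.select_dtypes("datetimetz").columns:
        out[name] = out[name].dt.tz_convert(None)
    return out


# Sample frame standing in for what pd.read_sql_table() returned in this issue.
df_load = pd.DataFrame(
    {
        "id": [1, 2],
        "date_1": pd.to_datetime(["2025-01-01", "2025-01-02"], utc=True),
    }
)
df_naive = strip_timezones(df_load)
print(df_naive.dtypes)
```

Working on a copy keeps the loaded frame untouched, so the original dtypes remain available for comparison.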
Ah! Currently, when storing |
Thank you for clarifying. We will see if we can improve on those little details here, given that DATE is in a twilight zone anyway.
I don't see a reason not to converge |
I wonder if it's the custom JSON encoder in the lower level Python driver that would need to be improved here, specifically where |
In my opinion, those dates as strings that you insert in the first place should indeed be stored into a This topic is always weird. For PostgreSQL if you store |
That's all true, thank you. Still,
So, when looking at this specific detail, with or without the other obstacles about losing the timezone information when storing dates or times, which is always the case, I think it does not matter much which data type will be selected, i.e. it won't harm to choose the timezone-aware one in order to get rid of this miniature I/O anomaly? |
Compare
I've used your example program, now also at pandas_cratedb_date_type.py, to check and compare PostgreSQL and CrateDB.
Observations
Both store ingress DATE types in this context using the same data type, which is TIMESTAMP WITHOUT TIME ZONE.
PostgreSQL
CrateDB
Recap
@matriv: There is an I/O anomaly when using pandas and CrateDB: the outcome is different than with PostgreSQL. What we are looking at here is whether we could possibly improve the situation.
PostgreSQL
CrateDB
|
@amotl Could you please clarify what you propose? For me DATE should be Unless I'm confused, and you propose something different. |
Hi. I think it's clear that both database servers behave in the same way, using the data type TIMESTAMP WITHOUT TIME ZONE for storing this field, so it's all good on this end. However, as outlined above, when using CrateDB with pandas, the returned data type is timezone-aware ( My intention is to find the flaw, and mitigate it when possible, because it's confusing to users. |
So the fix should be there, when we return date and timestamp WITHOUT time zone we shouldn't add the
which imho is not the way to go. |
Yes, you are right. I hereby officially retract my previous statement. Sorry if that stirred confusion. CrateDB behaves the same as PostgreSQL, so it's all right in this regard. The fix needs to be applied somewhere in the Python client layers, where possible. Thanks! 🍀 |
Another example of strange behavior. Using: crate 2.0.0 Test-example:
Output:
|
Thank you very much for your report again. We will also look into this. It feels like the type mapper needs more improvements. |
Hi again, we just identified the spot in pandas where CrateDB/SQLAlchemy follows a different code path than PostgreSQL/SQLAlchemy. The CrateDB dialect currently apparently returns

    # we have a timezone capable type
    if not sqltype.timezone:
        return datetime
    return DatetimeTZDtype

We will see what we can do about it. With kind regards, |
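To make the effect of that branch concrete, here is a simplified paraphrase of the decision, not the actual pandas source. The function name `pick_datetime_dtype` is invented for this sketch; only the `timezone` attribute on SQLAlchemy's `DateTime` types and `pd.DatetimeTZDtype` are real.

```python
from datetime import datetime

import pandas as pd
import sqlalchemy as sa


def pick_datetime_dtype(sqltype):
    """Simplified stand-in for the pandas branch quoted above (hypothetical name).

    A reflected column type whose `timezone` flag is true is mapped to a
    timezone-aware dtype; otherwise a naive datetime is chosen.
    """
    if isinstance(sqltype, sa.DateTime):
        # we have a timezone capable type
        if not sqltype.timezone:
            return datetime
        return pd.DatetimeTZDtype
    return object


# A dialect reporting timezone=False leads to naive datetimes ...
print(pick_datetime_dtype(sa.TIMESTAMP(timezone=False)))
# ... while timezone=True sends pandas down the tz-aware path.
print(pick_datetime_dtype(sa.TIMESTAMP(timezone=True)))
```

So whatever `timezone` flag the dialect reports on reflection decides whether the loaded column ends up timezone-aware.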
That's a little pure-SQLAlchemy reproducer which demonstrates the problem around

    import sqlalchemy as sa

    def reflect():
        # Switch the URI to compare CrateDB against PostgreSQL.
        dburi = "crate://"
        #dburi = "postgresql://postgres@localhost:5433/"
        engine = sa.create_engine(dburi)
        with engine.connect() as conn:
            conn.execute(sa.text("CREATE TABLE IF NOT EXISTS t2 (date TIMESTAMP WITHOUT TIME ZONE)"))
            conn.commit()
        # Reflect the table and print the timezone flag of each column type.
        metadata = sa.MetaData()
        inspector = sa.inspect(engine)
        table = sa.Table("t2", metadata)
        inspector.reflect_table(table, None)
        for column in table.columns:
            print("column:", column, column.type, column.type.timezone)

|
Well, that's an obvious and silly mixup flaw coming from GH-24, where it sqlalchemy-cratedb/src/sqlalchemy_cratedb/dialect.py Lines 44 to 46 in cccf39e
|
We just applied a fix per 04f475d, and released sqlalchemy-cratedb==0.42.0.dev2. Can you validate that this resolves the problem you observed? |
Hi, I checked, all the problems I described are now solved.
Thanks for the correction.
|
When loading a table with dates into pandas, the UTC timezone is added to the dtype.
This is confusing.
Is this correct, or is it a bug?
Package Version
crate 2.0.0
pandas 2.2.3
SQLAlchemy 2.0.39
sqlalchemy-cratedb 0.42.0.dev0
test
Schema

Output:
After loading, to remove the time zone, I do this