Mismatch in Delta Lake timestamp Mapping to Presto #24367

Open
imjalpreet opened this issue Jan 15, 2025 · 2 comments

@imjalpreet
Member

imjalpreet commented Jan 15, 2025

Starting from Delta Kernel 3.2, Delta Lake supports two distinct timestamp types:
1. timestamp: includes time zone context.
2. timestamp_ntz: represents a timestamp without time zone context (currently unsupported in Presto).
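To make the difference between the two types concrete, here is a small sketch that uses `java.time` purely as an analogy; it is not Delta Lake or Presto code. It assumes the semantics described in the Databricks documentation: a `timestamp` is an absolute instant rendered in the reader's session time zone, while a `timestamp_ntz` is a wall-clock value that looks the same everywhere. `TimestampSemantics`, `renderLtz` and `renderNtz` are hypothetical names used only for illustration.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

// Analogy only (hypothetical helper class, not Delta Lake or Presto APIs):
// - Delta `timestamp` behaves like an Instant: one point on the global timeline,
//   displayed in whatever session time zone the reader uses.
// - Delta `timestamp_ntz` behaves like a LocalDateTime: a wall-clock value
//   displayed identically regardless of the session time zone.
final class TimestampSemantics {

    // Render an instant as the wall-clock time seen in the given session zone.
    static String renderLtz(Instant instant, String sessionZone) {
        return LocalDateTime.ofInstant(instant, ZoneId.of(sessionZone)).toString();
    }

    // A zone-less value ignores the session zone entirely (the parameter is
    // kept only to show that it has no effect).
    static String renderNtz(LocalDateTime wallClock, String sessionZone) {
        return wallClock.toString();
    }

    public static void main(String[] args) {
        Instant ltz = Instant.parse("2025-01-15T12:00:00Z");
        LocalDateTime ntz = LocalDateTime.parse("2025-01-15T12:00:00");

        for (String zone : new String[] {"UTC", "Asia/Kolkata", "America/New_York"}) {
            System.out.printf("%-17s timestamp=%s  timestamp_ntz=%s%n",
                    zone, renderLtz(ltz, zone), renderNtz(ntz, zone));
        }
    }
}
```

For the instant 12:00 UTC, the `timestamp` value renders as 17:30 in `Asia/Kolkata` and 07:00 in `America/New_York`, while the `timestamp_ntz` value stays at 12:00 in every zone.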

The Delta Lake connector currently maps the Delta Lake timestamp type to Presto's TIMESTAMP type, which carries no time zone information. Because a Delta Lake timestamp inherently includes time zone context, this mapping can lead to inconsistencies or misinterpreted data when querying Delta tables through Presto.

```java
else if (deltaType instanceof TimestampType) {
    return TIMESTAMP;
}
```
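As a rough illustration of the misinterpretation risk, the sketch below decodes one stored Delta `timestamp` value in two ways using plain `java.time`. It assumes the Delta protocol's encoding of `timestamp` as microseconds since 1970-01-01T00:00:00Z (a UTC-adjusted instant), and it assumes a zone-less reading surfaces that UTC wall-clock as-is; how Presto actually renders a `TIMESTAMP` may also depend on session settings. `DeltaTimestampDecoding`, `asZoneless` and `asZoned` are hypothetical helpers, not connector code.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;

// Hypothetical sketch (java.time only; not Presto or Delta Kernel code).
// Assumption: a Delta `timestamp` is stored as microseconds since the Unix epoch in UTC.
final class DeltaTimestampDecoding {

    // Convert stored epoch microseconds to an Instant.
    static Instant toInstant(long epochMicros) {
        return Instant.EPOCH.plus(epochMicros, ChronoUnit.MICROS);
    }

    // Zone-less reading: the UTC wall-clock is exposed with no zone attached,
    // so a reader cannot tell it is UTC rather than their own local time.
    static String asZoneless(long epochMicros) {
        return LocalDateTime.ofInstant(toInstant(epochMicros), ZoneOffset.UTC).toString();
    }

    // Zone-aware reading: the same instant keeps its time zone context.
    static String asZoned(long epochMicros, String sessionZone) {
        return ZonedDateTime.ofInstant(toInstant(epochMicros), ZoneId.of(sessionZone)).toString();
    }

    public static void main(String[] args) {
        // A writer in Asia/Kolkata records 2025-01-15 09:00 local time.
        ZonedDateTime written = ZonedDateTime.of(2025, 1, 15, 9, 0, 0, 0, ZoneId.of("Asia/Kolkata"));
        long storedMicros = ChronoUnit.MICROS.between(Instant.EPOCH, written.toInstant());

        System.out.println("zone-less: " + asZoneless(storedMicros));
        System.out.println("zoned:     " + asZoned(storedMicros, "Asia/Kolkata"));
    }
}
```

With a writer in `Asia/Kolkata` recording 09:00 local time, the zone-less reading shows `2025-01-15T03:30` with nothing to indicate that it is UTC, while the zone-aware reading keeps `2025-01-15T09:00+05:30[Asia/Kolkata]`.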

https://docs.databricks.com/en/sql/language-manual/data-types/timestamp-type.html
https://docs.databricks.com/en/sql/language-manual/data-types/timestamp-ntz-type.html

Your Environment

  • Presto version used: Latest master (0.291-SNAPSHOT)
  • Storage (HDFS/S3/GCS..): N/A
  • Data source and connector used: Delta Lake Connector
  • Deployment (Cloud or On-prem): N/A

Expected Behavior

Delta Lake’s timestamp type should map to Presto’s TIMESTAMP WITH TIME ZONE to preserve the time zone context and ensure the data is interpreted correctly.

Current Behavior

Delta Lake’s timestamp type is mapped to Presto’s TIMESTAMP, which does not account for time zone information.

Possible Solution

Update the Delta Lake connector in Presto to map the Delta Lake timestamp type to Presto's TIMESTAMP WITH TIME ZONE. This would represent the data accurately and align with Delta Lake's data type semantics.

Additionally, map Delta Lake timestamp_ntz to Presto's TIMESTAMP.
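The proposed type mapping can be sketched as follows. This is an illustration only: `DeltaTimestamp`, `DeltaTimestampNtz`, `PrestoType` and `toPrestoType` are hypothetical stand-ins. An actual change in the connector would use Delta Kernel's timestamp type classes and Presto's `TIMESTAMP` / `TIMESTAMP WITH TIME ZONE` types, and would likely also need to convert the values themselves, not just the declared types.

```java
// Hypothetical sketch of the proposed mapping; all types below are local
// stand-ins, not the Delta Kernel or Presto classes.
final class ProposedTimestampMapping {

    // Stand-ins for the Delta Lake types involved.
    interface DeltaType {}
    static final class DeltaTimestamp implements DeltaType {}
    static final class DeltaTimestampNtz implements DeltaType {}

    // Stand-in for the Presto types involved.
    enum PrestoType { TIMESTAMP, TIMESTAMP_WITH_TIME_ZONE }

    static PrestoType toPrestoType(DeltaType deltaType) {
        if (deltaType instanceof DeltaTimestamp) {
            // Delta `timestamp` is an instant, so keep its time zone context.
            return PrestoType.TIMESTAMP_WITH_TIME_ZONE;
        } else if (deltaType instanceof DeltaTimestampNtz) {
            // Delta `timestamp_ntz` is a wall-clock value with no zone.
            return PrestoType.TIMESTAMP;
        }
        // Fail loudly rather than silently picking a possibly wrong type.
        throw new UnsupportedOperationException(
                "Type not covered by this sketch: " + deltaType.getClass().getSimpleName());
    }

    public static void main(String[] args) {
        System.out.println("timestamp     -> " + toPrestoType(new DeltaTimestamp()));
        System.out.println("timestamp_ntz -> " + toPrestoType(new DeltaTimestampNtz()));
    }
}
```

Throwing for unmapped types keeps the sketch from guessing; the real connector handles many other types in the same method.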

Context

This issue could lead to incorrect query results when dealing with time zone-sensitive data in Delta Lake tables, potentially causing significant inaccuracies in time-based calculations or reports.

@infvg
Contributor

infvg commented Jan 16, 2025

Working on this

@imjalpreet
Member Author

TimestampNTZ support: PR #24418

Projects
Status: 🆕 Unprioritized

2 participants