Support INTERVAL data type in DB-API, arrow, and pandas connectors #836

Open
7 tasks
tswast opened this issue Jul 29, 2021 · 2 comments
Labels
  • api: bigquery: Issues related to the googleapis/python-bigquery API.
  • external: This issue is blocked on a bug with the actual product.
  • status: blocked: Resolving the issue is dependent on other work.
  • type: feature request: ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@tswast
Contributor

tswast commented Jul 29, 2021

Follow-up to #826, since pandas and Arrow do not yet have a structured type that aligns with INTERVAL. The existing Timedelta support would work for INTERVAL values that have only a time component, but Timedelta is not calendar-aware, so supporting year, month, and day intervals would require some mapping to timedelta, which is not ideal.

Why is a new data type needed?

  • YEAR: Leap years are a thing. Not every year is 365 days long.
  • MONTH: Not every month is the same length.
  • DAY: Daylight saving time is a thing. Not every day is 24 hours long.
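
To make the YEAR and MONTH points concrete, here is a minimal sketch using only the standard library. The `add_months` helper is hypothetical, and clamping to the last day of the target month is just one possible convention, assumed here for illustration. The DAY case is left out because demonstrating it needs time zone data.

```python
import calendar
from datetime import date, timedelta


def add_months(start: date, months: int) -> date:
    """Hypothetical calendar-aware month addition.

    Assumption: when the target month is shorter, clamp to its last day.
    """
    index = start.year * 12 + (start.month - 1) + months
    year, month_index = divmod(index, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return start.replace(year=year, month=month, day=min(start.day, last_day))


# YEAR: 2020 is a leap year, 2021 is not.
days_in_2020 = (date(2021, 1, 1) - date(2020, 1, 1)).days
days_in_2021 = (date(2022, 1, 1) - date(2021, 1, 1)).days

# MONTH: February 2021 and July 2021 have different lengths.
days_in_feb = calendar.monthrange(2021, 2)[1]
days_in_jul = calendar.monthrange(2021, 7)[1]

# A fixed-length timedelta cannot express "one month".
fixed_length = date(2021, 1, 31) + timedelta(days=30)
calendar_aware = add_months(date(2021, 1, 31), 1)

print(days_in_2020, days_in_2021)
print(days_in_feb, days_in_jul)
print(fixed_length, calendar_aware)
```

Treating "one month" as 30 days overshoots into March here, while the calendar-aware version stays in February.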

Note: DB-API support is included here because it uses the BigQuery Storage API, where we use the Arrow wire format.

TODO:

  • Auto-detect data type in DB-API query parameters
    • Might be possible to do this before reading INTERVAL columns is supported.
  • Row data is converted to relevant type in DB-API
  • Row data is converted to relevant type in to_dataframe
    • Might need to be object, since timedelta64 doesn't have years/months.
  • Check if to_arrow type is expected datatype
  • Convert data type in insert_rows_from_dataframe
  • Convert data type in load_rows_from_dataframe (CSV)
  • Convert data type in load_rows_from_dataframe (Parquet)
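
As a rough illustration of what "converted to relevant type" could involve, the sketch below parses an INTERVAL string into separate month, day, and sub-day components so that no calendar information is lost. It is not the library's implementation: `IntervalParts` and `parse_interval` are hypothetical names, and the input format `[sign]Y-M [sign]D [sign]H:M:S[.F]` is an assumption about how values arrive as text.

```python
import re
from typing import NamedTuple


class IntervalParts(NamedTuple):
    """Hypothetical container that keeps calendar units separate."""

    months: int
    days: int
    microseconds: int


# Assumed text format: "[sign]Y-M [sign]D [sign]H:M:S[.F]"
_INTERVAL_PATTERN = re.compile(
    r"^(?P<ym_sign>-?)(?P<years>\d+)-(?P<months>\d+) "
    r"(?P<d_sign>-?)(?P<days>\d+) "
    r"(?P<t_sign>-?)(?P<hours>\d+):(?P<minutes>\d+):(?P<seconds>\d+)"
    r"(?:\.(?P<fraction>\d+))?$"
)


def parse_interval(text: str) -> IntervalParts:
    match = _INTERVAL_PATTERN.match(text)
    if match is None:
        raise ValueError(f"Unrecognized interval string: {text!r}")

    ym_sign = -1 if match["ym_sign"] else 1
    d_sign = -1 if match["d_sign"] else 1
    t_sign = -1 if match["t_sign"] else 1

    months = ym_sign * (int(match["years"]) * 12 + int(match["months"]))
    days = d_sign * int(match["days"])

    # Pad or truncate the fractional seconds to microsecond precision.
    fraction = (match["fraction"] or "").ljust(6, "0")[:6]
    whole_seconds = (
        int(match["hours"]) * 3600
        + int(match["minutes"]) * 60
        + int(match["seconds"])
    )
    microseconds = t_sign * (whole_seconds * 1_000_000 + int(fraction))

    return IntervalParts(months, days, microseconds)


parts = parse_interval("1-2 3 4:5:6.789")
print(parts)
```

Keeping months and days apart from the time component avoids committing to a fixed number of days per month or hours per day until the interval is applied to a concrete date.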
@tswast tswast added the type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. label Jul 29, 2021
@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery API. label Jul 29, 2021
@tswast
Contributor Author

tswast commented Jul 29, 2021

Update: Maybe pandas does have the right data type? https://pandas.pydata.org/docs/reference/api/pandas.tseries.offsets.DateOffset.html

@tswast tswast added external This issue is blocked on a bug with the actual product. status: blocked Resolving the issue is dependent on other work. labels Jul 29, 2021
@tswast
Contributor Author

tswast commented Feb 25, 2022

Note: There is currently a bug with the interval type in pyarrow. https://issues.apache.org/jira/projects/ARROW/issues/ARROW-15783

It sounds like we need to call pa.array([1]) first, to initialize some pandas-related code, before attempting to convert an interval column to pandas.
