Implement default time resolution for CF time encoding/decoding #9580
Closed
Commits (6):
- fa5d2e3 implement default precision timestamp using predefined options with d… (kmuehlbauer)
- b0a325d match test warning (kmuehlbauer)
- f194093 fix test issues, work around pandas _as_unit/as_unit issue (kmuehlbauer)
- 11ed195 fix mypy in options (kmuehlbauer)
- 6f4d194 fix typing (kmuehlbauer)
- cecd561 refactor out _check_date_for_units_since_refdate (kmuehlbauer)
Review comment:
Have you considered the case where `flat_num_dates` has a floating point dtype? In that situation one could end up needing finer precision than the `time_units` specified in the encoding.

But maybe another question at this stage is whether we would want the encoding information (reference date or time units) to affect the datetime precision we decode to. Currently it appears one could end up with non-default precision if the time units in the file specified something finer than the user-specified precision via `set_options`, which could be surprising.

In the long term, however, I agree that it would be nice to have some way of inferring this automatically.
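The floating-point concern can be illustrated with plain NumPy (an illustrative sketch, not xarray's actual decoding code; the variable name `flat_num_dates` is taken from the comment above):

```python
import numpy as np

# Hypothetical on-disk values with units "days since 1970-01-01" and a
# floating point dtype.
flat_num_dates = np.array([0.0, 0.5, 1.25])

# The encoded time unit is days, but the fractional values require
# finer-than-day resolution; a days-based datetime64 would truncate them.
seconds = flat_num_dates * 86400  # [0, 43200, 108000]
decoded = np.datetime64("1970-01-01", "s") + seconds.astype("timedelta64[s]")
print(decoded)  # ... '1970-01-01T12:00:00' '1970-01-02T06:00:00'
```

Note that in general (e.g. a value of 1/86400 days) even second resolution may not round-trip floating point inputs exactly, which is why the values themselves, not just the metadata, matter here.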
Review comment:
From my perspective the decoding process should just do what it is supposed to do, which is decode into the designated datetimes as given by `time_unit since reference_date`. As our final dtype (`np.datetime64`) is based on the Unix epoch (`1970-01-01`, with different possible resolutions), we need to find the fitting resolution:

- `days since 1970-01-01 00:00:00` -> `"s"` (reference_date takes precedence over time_unit since it is in second resolution)
- `seconds since 1970-01-01` -> `"s"` (time_unit takes precedence over reference_date)
- `milliseconds since 1970-01-01 00:00:00.000000001` -> `"ns"` (reference_date takes precedence over time_unit)

For now we have a third possible element to consider, the default resolution, which currently is `"ns"` (or might be changed to something else now). The currently proposed workflow with the default resolution should finally not be needed any more, when all issues have been ironed out. Or, we use something like `None` to specify automatic inference.

Maybe the idea of a default resolution should only come into consideration in the encoding process, when the user does not specify any `units`
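The precedence rules above can be sketched as a toy, string-based helper (the names `infer_resolution` and `_ref_resolution` are made up for illustration and are not xarray API):

```python
# Map CF time units to numpy datetime64 resolution codes.
_UNIT_RES = {"days": "D", "hours": "h", "minutes": "m", "seconds": "s",
             "milliseconds": "ms", "microseconds": "us", "nanoseconds": "ns"}
_ORDER = ["D", "h", "m", "s", "ms", "us", "ns"]  # coarse -> fine

def _ref_resolution(ref_date: str) -> str:
    # Resolution implied by how the reference date string is written.
    if "." in ref_date:
        frac = ref_date.split(".", 1)[1]
        return {3: "ms", 6: "us"}.get(len(frac), "ns")
    if ":" in ref_date:
        return "s"
    return "D"

def infer_resolution(units: str) -> str:
    time_unit, _, ref_date = units.partition(" since ")
    # Whichever of the two implied resolutions is finer takes precedence.
    return max(_UNIT_RES[time_unit], _ref_resolution(ref_date), key=_ORDER.index)
```

Under these assumptions the three examples above resolve to `"s"`, `"s"`, and `"ns"` respectively.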
Review comment:
Or no default resolution at all, and try to infer everything (if that's possible at all)?

A good example showing the ambiguities between on-disk and in-memory is something like `[0, 1, 2, 3]` with `days since 1970-01-01 12:00:00`, which will translate to:

- `[12, 36, 60, 84]` with `datetime64[h]`
- with `"s"`: `[43200, 129600, 216000, 302400]` with `datetime64[s]`

A default resolution of seconds (`"s"`) would be no problem for the above, but for this example, `seconds since 1970-01-01 00:00:00.001`, which will translate to:

- `datetime64[ms]`
- with `"s"`: `[0, 1, 2, 3]` with `datetime64[s]`

we would run into issues. The more I think about this, the more I think inferring as much as possible is the way to go, to not round/cut off or otherwise get time mismatches between on-disk and in-memory.
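The two decodings of the first example can be reproduced directly with NumPy, showing that both resolutions represent the same instants:

```python
import numpy as np

values = np.array([0, 1, 2, 3])  # on-disk: days since 1970-01-01 12:00:00
ref = np.datetime64("1970-01-01T12:00:00")

# Decoded at hour resolution: integer offsets from the epoch in hours.
as_hours = ref.astype("datetime64[h]") + (values * 24).astype("timedelta64[h]")
print(as_hours.astype("int64"))  # [12 36 60 84]

# The same instants decoded at second resolution.
as_seconds = ref.astype("datetime64[s]") + (values * 86400).astype("timedelta64[s]")
print(as_seconds.astype("int64"))  # [ 43200 129600 216000 302400]
```

In the second example the millisecond offset of the reference date cannot be represented at second resolution, which is where the rounding/cutoff issue appears.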
Review comment:
Indeed, I agree that long-term some kind of inference would be best; ideally we do not lose precision based on what was written on disk. It is just somewhat complicated (in the case of floating point times, one would need to take the values themselves into account in addition to the metadata), but we can try to hash out now what that could look like (or punt and just say floats are always decoded to a user-configurable precision, defaulting to nanosecond).

For context, I interpreted this PR as temporarily aiming to maintain a consistent NumPy datetime precision throughout xarray, while at least allowing for some flexibility (hence the modifications to the casting logic in `variable.py` to also respect the `time_resolution` set in `set_options`), so it was surprising to see one place where datetime precisions other than that specified in `set_options` could appear.

In your eyes, at what point do we remove the casting logic and fully allow mixed-precision datetime values? Is this merely an issue with the tests at this stage?
Review comment:
@spencerkclark Sorry, I went down that rabbit hole the last few days. I think I'm on a good way to having things sorted out. There are different locations in the code and tests where the gears mesh together.

The overall decoding and encoding logic seems to work in my local feature branch. I'll update this PR when I've fixed all the bits and pieces. Need to dig further now, back into the hole 🐰
Review comment:
No worries. If at any point you feel it would be helpful to discuss anything in a more interactive way, we could try to find a time to meet. Happy to keep things asynchronous / open too, though. Thanks again for your work on this!