API: timestamp resolution inference - default to one unit (if possible) instead of being data-dependent?

After https://github.com/pandas-dev/pandas/pull/55901, we now do inference of the best resolution, and so allow to create non-nanosecond data by default (instead of raising for out of bounds data). 

To be clear, it is a _very_ nice improvement to stop raising those OutOfBounds errors while the timestamp would perfectly fit in another resolution. But I do think we could maybe reconsider the exact logic of how to determine the resolution.

With the latest changes you get the following:

```python
>>> pd.to_datetime(["2024-03-22 11:43:01"]).dtype
dtype('<M8[s]')
>>> pd.to_datetime(["2024-03-22 11:43:01.002"]).dtype
dtype('<M8[ms]')
>>> pd.to_datetime(["2024-03-22 11:43:01.002003"]).dtype
dtype('<M8[us]')
```

The resulting dtype instance depends on the exact input value (not type). I do think this has some downsides:

- The result dtype becomes very data dependent (while in general we want to avoid value dependent behavior)
- You can very easily get multiple datetime dtypes in a workflow, causing more casting (to different unit) than necessary

The fact that pandas by default truncates the string repr of datetimes (i.e. we don't show the subsecond parts if they are all zero, regardless of the actual resolution), in contrast to numpy, also means that round-tripping through a text representation (eg CSV) will very often lead to a change in dtype.

As a potential alternative, we could also decide to have a fixed _default resolution_ (e.g. microseconds), and then the logic for inferring the resolution could be: try to use the default resolution, and only if that does not work (either out of bounds or too much precision, i.e. nanoseconds present), use the inferred resolution from the data. 

That still gives some values dependent behaviour, but I think this would make it _a lot_ less common to see. And using a resolution like microseconds is sufficient for by far most use cases (in terms of bounds it supports: [290301 BC, 294241 AD])

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GitHub Sponsors

Uh oh!

API: timestamp resolution inference - default to one unit (if possible) instead of being data-dependent? #58989

15 remaining items

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Participants

API: timestamp resolution inference - default to one unit (if possible) instead of being data-dependent? #58989

Description

Activity

jorisvandenbossche commented on Jun 17, 2024

WillAyd commented on Jun 17, 2024

bashtage commented on Jun 17, 2024

jorisvandenbossche commented on Jun 17, 2024

Pranav-Wadhwa commented on Jul 19, 2024

Pranav-Wadhwa commented on Jul 19, 2024

Pranav-Wadhwa commented on Aug 2, 2024

WillAyd commented on Aug 3, 2024

jbrockmendel commented on Aug 4, 2024

15 remaining items

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Participants

Issue actions