Skip to content

BUG: Timestamp.normalize() can overflow silently #60583

@wjandrea

Description

@wjandrea
Contributor

Pandas version checks

  • I have checked that this issue has not already been reported.
    • Query: is:issue normalize overflow
  • I have confirmed this bug exists on the latest version of pandas.
    I have confirmed this bug exists on the main branch of pandas.

Issue description

I was trying to get the minimum date, so I tried pd.Timestamp.min.normalize() before realizing that would be out of range, but what's weird is that it overflows without warning:

>>> pd.Timestamp.min
Timestamp('1677-09-21 00:12:43.145224193')
>>> pd.Timestamp.min.normalize()
Timestamp('2262-04-11 23:34:33.709551616')

For reference:

>>> pd.Timestamp.max
Timestamp('2262-04-11 23:47:16.854775807')

Compare that to .floor('D'), which I thought would be equivalent, but it raises:

>>> pd.Timestamp.min.floor('D')
OverflowError: value too large


The above exception was the direct cause of the following exception:

OutOfBoundsDatetime: Cannot round 1677-09-21 00:12:43.145224193 to freq=<Day> without overflow

as well as .round('D') (same error).

Sidenote: What I really wanted was .ceil('D'), which gets the first whole day:

>>> pd.Timestamp.min.ceil('D')
Timestamp('1677-09-22 00:00:00')

Installed Versions

INSTALLED VERSIONS
------------------
commit                : d41884b2dd0823dc6288ab65d06650302e903c6b
python                : 3.12.8
python-bits           : 64
OS                    : Linux
OS-release            : 5.4.0-202-generic
Version               : #222-Ubuntu SMP Fri Nov 8 14:45:04 UTC 2024
machine               : x86_64
processor             : x86_64
byteorder             : little
LC_ALL                : None
LANG                  : fr_CA.UTF-8
LOCALE                : fr_CA.UTF-8

pandas                : 3.0.0.dev0+1777.gd41884b2dd
numpy                 : 2.1.0.dev0+git20240403.e59c074
dateutil              : 2.9.0.post0
pip                   : 24.3.1
Cython                : None
sphinx                : None
IPython               : 8.22.2
adbc-driver-postgresql: None
adbc-driver-sqlite    : None
bs4                   : None
blosc                 : None
bottleneck            : None
fastparquet           : None
fsspec                : None
html5lib              : None
hypothesis            : None
gcsfs                 : None
jinja2                : None
lxml.etree            : None
matplotlib            : None
numba                 : None
numexpr               : None
odfpy                 : None
openpyxl              : None
psycopg2              : None
pymysql               : None
pyarrow               : None
pyreadstat            : None
pytest                : None
python-calamine       : None
pytz                  : 2024.1
pyxlsb                : None
s3fs                  : None
scipy                 : None
sqlalchemy            : None
tables                : None
tabulate              : None
xarray                : None
xlrd                  : None
xlsxwriter            : None
zstandard             : None
tzdata                : 2024.1
qtpy                  : None
pyqt5                 : None

Activity

added
Needs TriageIssue that has not been reviewed by a pandas team member
on Dec 16, 2024
changed the title [-]Is this a bug? `Timestamp.normalize()`[/-] [+]Is this a bug? `Timestamp.normalize()` can overflow silently (vs `.floor()` which raises)[/+] on Dec 16, 2024
shashankrushiya

shashankrushiya commented on Dec 19, 2024

@shashankrushiya

I don't think the above mention is a bug. The reason is because-
Here, when we use normalize() method, it sets the time part of the timestamp to 00:00:00. However, pd.Timestamp.min is already extremely close to the lower boundary of supported date times. And talking about floor() and round() methods, both are more strict in their handling. They basically check whether the operation would cause an overflow and raise an OverFlowError if it would.
If you consider ceil('D'), it will work as this method rounds up to the next day, which is still within the valid range for pd.Timestamp.

wjandrea

wjandrea commented on Dec 19, 2024

@wjandrea
ContributorAuthor

@shashankrushiya I'm not following. How does that mean it's not a bug?

talking about floor() and round() methods, both are more strict in their handling. They basically check whether the operation would cause an overflow and raise an OverFlowError if it would.

So then why doesn't .normalize() check? Is there a good reason or is it just an oversight? (i.e. a bug, or maybe not a bug per se but a potential enhancement)

BTW, note that pd.Timestamp.min is the lower boundary, not "extremely close" to the boundary.

added
Timestamppd.Timestamp and associated methods
and removed
Needs TriageIssue that has not been reviewed by a pandas team member
on Dec 20, 2024
changed the title [-]Is this a bug? `Timestamp.normalize()` can overflow silently (vs `.floor()` which raises)[/-] [+]BUG: `Timestamp.normalize()` can overflow silently[/+] on Dec 20, 2024
rhshadrach

rhshadrach commented on Dec 20, 2024

@rhshadrach
Member

Thanks for the report! Agreed normalize should raise an OverflowError here. PRs to fix are welcome!

@shashankrushiya Sorry, is this text AI-generated? It's not very helpful.

I view this as a violation of pandas' Code of conduct and ask you to refrain from making such statements in the future. We'd like to make pandas a welcome place for all contributors, and this type of comment works against this goal.

wjandrea

wjandrea commented on Dec 20, 2024

@wjandrea
ContributorAuthor

@rhshadrach My bad. I've edited my above comment to be respectful and more constructive. Sorry @shashankrushiya!

anishkarki

anishkarki commented on Dec 22, 2024

@anishkarki
Contributor

take

Anurag-Varma

Anurag-Varma commented on Feb 21, 2025

@Anurag-Varma
Contributor

take

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

Labels

BugTimestamppd.Timestamp and associated methods

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

    Participants

    @wjandrea@anishkarki@rhshadrach@Anurag-Varma@shashankrushiya

    Issue actions

      BUG: `Timestamp.normalize()` can overflow silently · Issue #60583 · pandas-dev/pandas