-
-
Notifications
You must be signed in to change notification settings - Fork 147
GH1327 Add overloead for date_range and timedelta_range #1333
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH1327 Add overloead for date_range and timedelta_range #1333
Conversation
tests/test_timefuncs.py
Outdated
day1 = pd.Timedelta(1, unit="D") | ||
day10 = pd.Timedelta(10, unit="D") | ||
pd.timedelta_range( | ||
day1, day10, periods=10, freq="D" # type: ignore[call-overload] # pyright: ignore[reportArgumentType] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Passing all four should be banned
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You've added new overloads, but I don't see the benefit of them, without tests that don't work with the current stubs, but would work with your changes.
def date_range( | ||
end: str | DateAndDatetimeLike, | ||
periods: int, | ||
freq: str | timedelta | Timedelta | BaseOffset | None = None, | ||
tz: TimeZones = None, | ||
normalize: bool = False, | ||
name: Hashable | None = None, | ||
inclusive: IntervalClosedType = "both", | ||
unit: TimeUnit | None = None, | ||
) -> DatetimeIndex: ... | ||
@overload | ||
def date_range( | ||
start: str | DateAndDatetimeLike, | ||
periods: int, | ||
freq: str | timedelta | Timedelta | BaseOffset | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These would only work if the arguments are keyword arguments, so I think you need to put an asterisk (*
) right after the opening parenthesis.
def date_range( | ||
start: str | DateAndDatetimeLike | None, | ||
end: str | DateAndDatetimeLike | None, | ||
freq: str | timedelta | Timedelta | BaseOffset | None = None, | ||
tz: TimeZones = None, | ||
normalize: bool = False, | ||
name: Hashable | None = None, | ||
inclusive: IntervalClosedType = "both", | ||
unit: TimeUnit | None = None, | ||
) -> DatetimeIndex: ... | ||
@overload | ||
def date_range( | ||
start: str | DateAndDatetimeLike | None = None, | ||
end: str | DateAndDatetimeLike | None = None, | ||
periods: int | None = None, | ||
freq: str | timedelta | Timedelta | BaseOffset = "D", | ||
start: str | DateAndDatetimeLike, | ||
end: str | DateAndDatetimeLike, | ||
periods: int, | ||
freq: None = None, | ||
tz: TimeZones = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see the point here of having separate overloads. Also, you need to put back the default values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the original goal was to only allow three out of the four values among start
, end
, periods
and freq
according to the docs.
Although the freq
argument is often not passed.
start: TimedeltaConvertibleTypes, | ||
end: TimedeltaConvertibleTypes, | ||
periods: int, | ||
freq: None = None, | ||
name: Hashable | None = None, | ||
closed: Literal["left", "right"] | None = None, | ||
*, | ||
unit: None | str = ..., | ||
) -> TimedeltaIndex: ... | ||
@overload | ||
def timedelta_range( | ||
start: TimedeltaConvertibleTypes, | ||
periods: int, | ||
freq: None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're going to have a similar issue in terms of mixing positional vs. keyword arguments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now fixed.
I have added two tests for Type invalid because this is what we want to check (that passing four arguments is not allowed). |
end: str | DateAndDatetimeLike = ..., | ||
periods: int = ..., |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would allow pd.date_range()
to be accepted.
The error message is " Of the four parameters: start, end, periods, and freq, exactly three must be specified".
So if you want this to work, you have to have overloads where 3 are required. You can't have = ...
on any of them (or default values either), because that means the argument is optional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point where I am a bit unsure is that most of the usage I have seen it just about passing start and end, freq is often left alone so the usage does not reflect the docs.
Should we force the user to use three out of the four or allow only two?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I think the docs and error message are a bit misleading.
You'll need to do a test to determine when 2 are allowed. I think the rule is as follows:
- Whether or not
freq
is specified, you need at least 2 out of the 3 ofstart
,end
,periods
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I look at the docs, they don't even use three exactly, when looking at the code they are using the fact that freq defaults to None but is then replaced by D for day.
https://pandas.pydata.org/docs/dev/reference/api/pandas.timedelta_range.html#pandas.timedelta_range
See https://github.com/pandas-dev/pandas/blob/78ec27668163808b3488da6ed2c64d0b8e9bfbc8/pandas/core/indexes/timedeltas.py#L330-L331 for the code where they change freq=None to freq="D"
Should we force the user to do 3 args even if the docs example only uses 2.
Happy to open a PR in the pandas repo to ask for clarification.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry took a bit of time to find all the docs and you had answered before.
Let me setup a quick test to validate that we need two out of three of start
, end
, and periods
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like the freq is indeed the argument that does not need to be passed (it is also the last one in the order).
I think the correct way to type hint is like you said forcing 2 out of the 3 with freq being an str | None = ...
without the need to be passed when you pass two other. I have also tried passing four and that works as long as one of the three is None (yet it would be a strange use case of passing None since it is a default).
>>> pd.date_range("2023-04-05", "2023-04-07", freq="D")
DatetimeIndex(['2023-04-05', '2023-04-06', '2023-04-07'], dtype='datetime64[ns]', freq='D')
>>> pd.date_range("2023-04-05", "2023-04-07")
DatetimeIndex(['2023-04-05', '2023-04-06', '2023-04-07'], dtype='datetime64[ns]', freq='D')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have asked for clarifications.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have read and reread the docs, it is a bit confusing at first glance but I will add all the tests for it.
start: TimedeltaConvertibleTypes = ..., | ||
end: TimedeltaConvertibleTypes = ..., | ||
periods: int = ..., |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same issue here as with date_range()
tests/test_timefuncs.py
Outdated
if TYPE_CHECKING_INVALID_USAGE: | ||
day1 = pd.Timedelta(1, unit="D") | ||
day10 = pd.Timedelta(10, unit="D") | ||
pd.timedelta_range( | ||
day1, day10, 10, "D" # type: ignore[call-overload] # pyright: ignore[reportArgumentType] | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should add checks where only 0, 1 or 2 of the 3 arguments are specified, as that should fail as well.
tests/test_timefuncs.py
Outdated
if TYPE_CHECKING_INVALID_USAGE: | ||
day1 = pd.Timestamp("2023-04-03") | ||
day10 = pd.Timedelta("2023-04-08") | ||
pd.date_range( | ||
day1, day10, 10, "D" # type: ignore[call-overload] # pyright: ignore[reportCallIssue] | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment - need to check that 0, 1 or 2 arguments specified causes a failure.
Rewrote the tests and the overloads properly with 2 (when omitting freq), 3 and forbidding 1 and 4 arguments among |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @loicdiridollou
assert_type()
to assert the type of any return value