-
-
Notifications
You must be signed in to change notification settings - Fork 146
Add defaults for parameters #1293
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
start: TimedeltaConvertibleTypes = ..., | ||
end: TimedeltaConvertibleTypes = ..., | ||
periods: int | None = ..., | ||
start: TimedeltaConvertibleTypes | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docstring says the default is None so I had to change the annotation
Based on the docstring, this should probably be rewritten to have overloads:
Of the four parameters
start
,end
,periods
, andfreq
,
exactly three must be specified.
@@ -35,13 +35,22 @@ class TimelikeOps: | |||
def unit(self) -> TimeUnit: ... | |||
def as_unit(self, unit: TimeUnit) -> Self: ... | |||
def round( | |||
self, freq, ambiguous: TimeAmbiguous = ..., nonexistent: TimeNonexistent = ... | |||
self, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason why your PR would change the formatting? I don't see a change in the argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what happened there, either libCST changed the whitespace (which is weird but it's happened before) or the pre-commit did.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As long as the pre-commit passes that's not a big deal, I was trying to find a change and kept comparing the line but could not find any!
Just for my own knowledge, what is adding the defaults instead of the ellipsis trying to achieve? Is it to help when looking at the docs in the IDE, for pyrefly, or something else? |
@@ -802,8 +808,8 @@ class DataFrame(NDFrame, OpsMixin, _GetItemHack): | |||
self, | |||
other: NDFrameT, | |||
join: AlignJoin = ..., | |||
axis: Axis | None = ..., | |||
level: Level | None = ..., | |||
axis: Axis | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This annotation changed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the kind of weird thing is, I can't actually find the docstring anywhere in the pandas repo. But at runtime pandas.core.frame.DataFrame.shift.__doc__
is:
Shift index by desired number of periods with an optional time `freq`.
When `freq` is not passed, shift the index without realigning the data.
If `freq` is passed (in this case, the index must be date or datetime,
or it will raise a `NotImplementedError`), the index will be
increased using the periods and the `freq`. `freq` can be inferred
when specified as "infer" as long as either freq or inferred_freq
attribute is set in the index.
Parameters
----------
periods : int or Sequence
Number of periods to shift. Can be positive or negative.
If an iterable of ints, the data will be shifted once by each int.
This is equivalent to shifting by one value at a time and
concatenating all resulting frames. The resulting columns will have
the shift suffixed to their column names. For multiple periods,
axis must not be 1.
freq : DateOffset, tseries.offsets, timedelta, or str, optional
Offset to use from the tseries module or time rule (e.g. \'EOM\').
If `freq` is specified then the index values are shifted but the
data is not realigned. That is, use `freq` if you would like to
extend the index when shifting and preserve the original data.
If `freq` is specified as "infer" then it will be inferred from
the freq or inferred_freq attributes of the index. If neither of
those attributes exist, a ValueError is thrown.
axis : {0 or \'index\', 1 or \'columns\', None}, default None
Shift direction. For `Series` this parameter is unused and defaults to 0.
fill_value : object, optional
The scalar value to use for newly introduced missing values.
the default depends on the dtype of `self`.
For numeric data, ``np.nan`` is used.
For datetime, timedelta, or period data, etc. :attr:`NaT` is used.
For extension dtypes, ``self.dtype.na_value`` is used.
suffix : str, optional
If str and periods is an iterable, this is added after the column
name and before the shift value for each shifted column name.
Returns
-------
DataFrame
Copy of input object, shifted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usually when the functions are not in the correct file for example pd.Series.shift
does not exist in the series.py
file, that often means that they are in the class it inherits from (NDFrame which sits in the generic.py file).
): ... | ||
def floor( | ||
self, freq, ambiguous: TimeAmbiguous = ..., nonexistent: TimeNonexistent = ... | ||
self, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IDK why the pre-commit did these whitespace changes
@@ -1133,7 +1141,7 @@ class Series(IndexOpsMixin[S1], NDFrame): | |||
self, | |||
periods: int | Sequence[int] = ..., | |||
freq: DateOffset | timedelta | _str | None = ..., | |||
axis: Axis = ..., | |||
axis: Axis | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this annotation changed
It's to help with looking at docs/signature in the IDE. This work is unrelated to Pyrefly. Here's a thread on typeshed w/ some discussion on the pros and cons of defaults: python/typeshed#8988 The consensus seems to be that if the default is simple (like None, or some literal) then it's fine, but if it's complex we use The second reason I made the PR is just to test out this way of doing LLM-assisted codemods. I have another PR in the works that uses the same approach to add a few hundred types for un-annotated parameters and returns, but that one requires more manual filtering to make sure the generated types are good. |
Closes #1292
This PR uses a semi-automated/LLM-assisted method to add a few hundred parameter defaults to pandas-stubs, replacing what was previously
...
.There were 3 annotations that had to change because their type signatures are not compatible with the default.
The codemod that made these transformations is here: https://gist.github.com/yangdanny97/0d81e16f6374582413dd5bcf90d902fd
How it works is:
...
The main thing that needs careful reviewing is the
default_type_map
which maps the docstring snippet to the type annotation we replace the...
with. That part was generated with chatGPT's help (I gave it every snippet I got after step 2), and I commented out the mappings that seemed wrong or unclear. So, even though this was "LLM-assisted" the script is deterministic and will always give you the same thing each time, and the LLM is only involved in a one-time manual step.