-
Notifications
You must be signed in to change notification settings - Fork 136
feat: Spec'd-out CompliantExpr
#2119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- `pyright` now gives me **54** errors in `expr.py` - Resolving those will identify what is currently **under-specified**
Shouldn't be `Expr`, e.g. `PandasLikeExpr` is narrower
Can't type these correctly until we have generic `Expr` namespaces (f725248#diff-997700f1cad523f14606cfbaa39c13c2c4ed5b142c4ed4cd369fa5ca214c50ec)
- I haven't worked out how these are being handled currently - Just following through warnings I'm now getting from `pyright`
Haven't followed through to `.join()` to see if that can also be widen-ed
Was surpised to find after (26b967f) - that there is no handling for this
- Need to solve the `property` issue later - Want to avoid making a descriptor (if possible)
- Big overlap with the gaps in `DaskExpr` - Probably should move bits into #2044 (comment)
- Very similar to `DuckDBExpr`
`.otherwise` being set is not part of the typing?
Was causing incompatible override
It'll do for now, but doesn't provide an easy way to retrieve `__narwhals_not_implemented__`
narwhals/_dask/expr.py
Outdated
@property | ||
def name(self: Self) -> DaskExprNameNamespace: | ||
return DaskExprNameNamespace(self) | ||
|
||
arg_min = not_implemented("arg_min") | ||
arg_max = not_implemented("arg_max") | ||
arg_true = not_implemented("arg_true") | ||
head = not_implemented("head") | ||
tail = not_implemented("tail") | ||
mode = not_implemented("mode") | ||
sort = not_implemented("sort") | ||
rank = not_implemented("rank") | ||
sample = not_implemented("sample") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note
If we went with this - we'd also need to revise https://github.com/narwhals-dev/narwhals/blob/77a0150ffb5121086d74b0f9a10ec26781162529/utils/generate_backend_completeness.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a particular reason for this implementation over a not_implemented
decorator? There could definitely be a technical detail that I missed.
My thoughts boil down to
- Eliminate the repetition of the name, we wouldn't need to repeat the name as both class variable and as a string
- Searchability, I often use simple greps through code bases so if I was trying to find the
arg_min
function/method I would use the specific pattern'def arg_min'
which wouldn't work here.
alternatively, we could have not_implemented
return a descriptor- this would keep the usage pattern the exact same but let us avoid needing to repeat the name.
Here's a couple of examples for the current impl vs decorator vs descriptor.
# current
class T:
arg_min = not_implemented("arg_min")
# decorator pattern
class T:
@not_implemented
def arg_min(self):
... # elipsis to indicate that this just hasn't been written yet
# descriptor
class T:
arg_min = not_implemented()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@camriddell glad to see you here!
I will try to respond in a bit more detail tomorrow, but on this point specifically:
Eliminate the repetition of the name, we wouldn't need to repeat the name as both class variable and as a string
I've got a descriptor implementation that solves that issue
#2119 (comment)
I'm leaning towards that most strongly at the moment - but it is mainly used in tests only at the moment (not_implemented_alt
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Searchability, I often use simple greps through code bases so if I was trying to find the arg_min function/method I would use the specific pattern 'def arg_min' which wouldn't work here.
@camriddell I've added this (no talking) video to try and show off what the improved typing lets you do instead of grepping the code base:
2025-03-04.13-13-19.-.Expr-Not.Implemented-30-1.mp4
Important
We can now statically know what is and is not implemented in VSCode
I imagine other IDEs probably have a similar functionality as well
Is there a particular reason for this implementation over a
not_implemented
decorator? There could definitely be a technical detail that I missed.
This gives us a clear line between not/implemented while also being very compact.
Adding decorated signatures for just 4 methods creates this diff:
diff --git a/narwhals/_dask/expr.py b/narwhals/_dask/expr.py
index a1bfb0b8..41c86132 100644
--- a/narwhals/_dask/expr.py
+++ b/narwhals/_dask/expr.py
@@ -25,6 +25,7 @@ from narwhals.utils import Implementation
from narwhals.utils import generate_temporary_column_name
from narwhals.utils import not_implemented
from narwhals.utils import not_implemented_alt
+from narwhals.utils import unstable # <------------------------ using just for syntactically valid decorator demo
if TYPE_CHECKING:
try:
@@ -615,10 +616,27 @@ class DaskExpr(CompliantExpr["DaskLazyFrame", "dx.Series"]): # pyright: ignore[
sample = not_implemented("sample")
map_batches = not_implemented("map_batches")
ewm_mean = not_implemented("ewm_mean")
- rolling_sum = not_implemented("rolling_sum")
- rolling_mean = not_implemented("rolling_mean")
- rolling_var = not_implemented("rolling_var")
- rolling_std = not_implemented("rolling_std")
+
+ @unstable
+ def rolling_mean(
+ self, window_size: int, *, min_samples: int | None, center: bool
+ ) -> Self: ...
+
+ @unstable
+ def rolling_std(
+ self, window_size: int, *, min_samples: int | None, center: bool, ddof: int
+ ) -> Self: ...
+
+ @unstable
+ def rolling_sum(
+ self, window_size: int, *, min_samples: int | None, center: bool
+ ) -> Self: ...
+
+ @unstable
+ def rolling_var(
+ self, window_size: int, *, min_samples: int | None, center: bool, ddof: int
+ ) -> Self: ...
+
gather_every = not_implemented("gather_every")
replace_strict = not_implemented_alt()
They now all fail type checking - because ...
was misused:
>>> mypy
narwhals/_dask/expr.py:621: error: Missing return statement [empty-body]
def rolling_mean(
^
narwhals/_dask/expr.py:621: note: If the method is meant to be abstract, use @abc.abstractmethod
narwhals/_dask/expr.py:626: error: Missing return statement [empty-body]
def rolling_std(
^
narwhals/_dask/expr.py:626: note: If the method is meant to be abstract, use @abc.abstractmethod
narwhals/_dask/expr.py:631: error: Missing return statement [empty-body]
def rolling_sum(
^
narwhals/_dask/expr.py:631: note: If the method is meant to be abstract, use @abc.abstractmethod
narwhals/_dask/expr.py:636: error: Missing return statement [empty-body]
def rolling_var(
^
narwhals/_dask/expr.py:636: note: If the method is meant to be abstract, use @abc.abstractmethod
Found 4 errors in 1 file (checked 348 source files)
But lets say that wasn't an issue.
We now have the additional maintenance burden of keeping the signature annotations in sync - for a no-op π
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@camriddell
the big thing that's changed since your work on #1807 (for which I'm super grateful π) is the expanded CompliantExpr
protocol in (https://github.com/narwhals-dev/narwhals/pull/2119/files#diff-705564b396213c63fc71e45d1bf5e3836b72b13d290ffa52b442fcaf05625b4d)
In my big PR description ramble I mentioned that this was where the PR started.
It just so happened to tie in really well with your not_implemented
idea - but with the goalposts moved this tweak on it seemed like the best compromise to me personally
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
@property | ||
def str(self) -> Any: ... | ||
@property | ||
def name(self) -> Any: ... | ||
@property | ||
def dt(self) -> Any: ... | ||
@property | ||
def cat(self) -> Any: ... | ||
@property | ||
def list(self) -> Any: ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The return type for these would be this protocol
Lines 150 to 157 in fb1c9f2
class _ExprNamespace( # type: ignore[misc] # noqa: PYI046 | |
_StoresCompliant[CompliantExprT_co], Protocol[CompliantExprT_co] | |
): | |
_compliant_expr: CompliantExprT_co | |
@property | |
def compliant(self) -> CompliantExprT_co: | |
return self._compliant_expr |
An obvious extension to this would be sub-protocols for each of the accessors
narwhals/typing.py
Outdated
if TYPE_CHECKING: | ||
import sys | ||
|
||
if sys.version_info >= (3, 13): | ||
from warnings import deprecated | ||
else: | ||
from typing_extensions import deprecated | ||
else: | ||
|
||
def deprecated(message: str, /) -> Callable[[_Fn], _Fn]: # noqa: ARG001 | ||
def wrapper(func: _Fn, /) -> _Fn: | ||
return func | ||
|
||
return wrapper |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This came as a surprise - getting the IDE support like (vega/altair#3455) - but without a runtime dependency on typing_extensions
π
def __init__(self, alias: str | None = None, /) -> None: | ||
# NOTE: Don't like this | ||
# Trying to workaround `mypy` requiring `@property` everywhere | ||
self._alias: str | None = alias | ||
|
||
def __repr__(self) -> str: | ||
return f"<{type(self).__name__}>: {self._name_owner}.{self._name}" | ||
|
||
def __set_name__(self, owner: type[_T], name: str) -> None: | ||
# https://docs.python.org/3/howto/descriptor.html#customized-names | ||
self._name_owner: str = owner.__name__ | ||
self._name: str = self._alias or name | ||
|
||
def __get__( | ||
self, instance: _T | Literal["raise"] | None, owner: type[_T] | None = None, / | ||
) -> Any: | ||
if instance is None: | ||
# NOTE: Branch for `cls._name` | ||
# We can check that to see if an instance of `type(self)` for | ||
# https://narwhals-dev.github.io/narwhals/api-completeness/expr/ | ||
return self | ||
# NOTE: Prefer not exposing the actual class we're defining in | ||
# `_implementation` may not be available everywhere | ||
who = getattr(instance, "_implementation", self._name_owner) | ||
raise _not_implemented_error(self._name, who) | ||
|
||
def __call__(self, *args: Any, **kwds: Any) -> Any: | ||
# NOTE: Purely to duck-type as assignable to **any** instance method | ||
# Wouldn't be reachable through *regular* attribute access | ||
return self.__get__("raise") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO
- Remove/reduce comments before merging
No longer causing a commotion #2119 (comment)
No longer causing a commotion #2119 (comment)
Reverts (990ee7b) Was able to get it under the 10mb limit for GitHub by reducing 60->30fps
@MarcoGorelli I'm happy with where the PR is at now - planning to merge shortly I've got a few ideas for follow-ups, but I really wanna get this part in before things start deviating from this version of the protocol. E.g, these two PRs will require updating things
Follow-up ideasEstablish some protocol member order
Moving
|
Follow-up to #2119 (comment)
Resolves: (#2149 (comment)) Doing this properly surfaced lots of issues - `Polars*` classes are *similar* to `Compliant*` - Not close enough to support typing - Lots of stuff is behind `__getattr__` - We don't need `PolarsExpr` to do a lot of the work `CompliantExpr` does - `nw.Series._compliant_series` is still a big PR away - Repeat of (#2119) - Last big hurdle to get (#2104 (comment))
What type of PR is this? (check all applicable)
Related issues
TypeVar
used in(SparkLike|DuckDB)Expr
Β #2044 (comment)Checklist
If you have comments or can explain your changes, please do so below
CompliantExpr
This all came together really quickly after wondering if it would be possible to improve the typing in
expr.py
.For example, we currently get no feedback from a type checker here:
narwhals/narwhals/expr.py
Lines 1070 to 1074 in 77a0150
That seemed pretty simple to solve - all we needed was to annotate
_to_compliant_expr
(0a95012).However, that change introduced 54 errors in
expr.py
alone:Whole lotta errors
Resolving all of that took me down the rabbit hole of fully spec-ing out
CompliantExpr
.So now we get a happy IDE:
not_implemented
During that process - I spotted a link between (#2044) and (#1807).
I've borrowed from @camriddell's
not_implemented
idea (thanks btw π) and implemented a very low-tech version of it.With some extra magic - things are looking quite interesting now
Linking back to the
(Eager|Lazy)Expr
idea - a lot of the new code can be moved into those protocols.Which would leave us with less repetition between the backends
General IDE demo
2025-03-04.13-13-19.-.Expr-Not.Implemented-30-1.mp4