Skip to content

feat: Spec'd-out CompliantExpr #2119

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 48 commits into from
Mar 5, 2025
Merged

feat: Spec'd-out CompliantExpr #2119

merged 48 commits into from
Mar 5, 2025

Conversation

dangotbanned
Copy link
Member

@dangotbanned dangotbanned commented Feb 28, 2025

What type of PR is this? (check all applicable)

  • πŸ’Ύ Refactor
  • ✨ Feature
  • πŸ› Bug Fix
  • πŸ”§ Optimization
  • πŸ“ Documentation
  • βœ… Test
  • 🐳 Other

Related issues

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below

CompliantExpr

This all came together really quickly after wondering if it would be possible to improve the typing in expr.py.

For example, we currently get no feedback from a type checker here:

narwhals/narwhals/expr.py

Lines 1070 to 1074 in 77a0150

return self._from_callable(
lambda plx: self._to_compliant_expr(plx).replace_strict(
old, new, return_dtype=return_dtype
)
)

image

That seemed pretty simple to solve - all we needed was to annotate _to_compliant_expr (0a95012).

However, that change introduced 54 errors in expr.py alone:

Whole lotta errors

image

Resolving all of that took me down the rabbit hole of fully spec-ing out CompliantExpr.
So now we get a happy IDE:

image

not_implemented

During that process - I spotted a link between (#2044) and (#1807).
I've borrowed from @camriddell's not_implemented idea (thanks btw πŸ˜‰) and implemented a very low-tech version of it.

With some extra magic - things are looking quite interesting now

image

Linking back to the (Eager|Lazy)Expr idea - a lot of the new code can be moved into those protocols.
Which would leave us with less repetition between the backends

General IDE demo

2025-03-04.13-13-19.-.Expr-Not.Implemented-30-1.mp4

- `pyright` now gives me **54** errors in `expr.py`
- Resolving those will identify what is currently **under-specified**
Shouldn't be `Expr`, e.g. `PandasLikeExpr` is narrower
- I haven't worked out how these are being handled currently
- Just following through warnings I'm now getting from `pyright`
Haven't followed through to `.join()` to see if that can also be widen-ed
Was surpised to find after (26b967f) - that there is no handling for this
- Need to solve the `property` issue later
- Want to avoid making a descriptor (if possible)
- Big overlap with the gaps in `DaskExpr`
- Probably should move bits into #2044 (comment)
- Very similar to `DuckDBExpr`
`.otherwise` being set is not part of the typing?
Was causing incompatible override
It'll do for now, but doesn't provide an easy way to retrieve `__narwhals_not_implemented__`
Comment on lines 593 to 605
@property
def name(self: Self) -> DaskExprNameNamespace:
return DaskExprNameNamespace(self)

arg_min = not_implemented("arg_min")
arg_max = not_implemented("arg_max")
arg_true = not_implemented("arg_true")
head = not_implemented("head")
tail = not_implemented("tail")
mode = not_implemented("mode")
sort = not_implemented("sort")
rank = not_implemented("rank")
sample = not_implemented("sample")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@camriddell camriddell Mar 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a particular reason for this implementation over a not_implemented decorator? There could definitely be a technical detail that I missed.

My thoughts boil down to

  1. Eliminate the repetition of the name, we wouldn't need to repeat the name as both class variable and as a string
  2. Searchability, I often use simple greps through code bases so if I was trying to find the arg_min function/method I would use the specific pattern 'def arg_min' which wouldn't work here.

alternatively, we could have not_implemented return a descriptor- this would keep the usage pattern the exact same but let us avoid needing to repeat the name.

Here's a couple of examples for the current impl vs decorator vs descriptor.

# current
class T:
    arg_min = not_implemented("arg_min")

# decorator pattern
class T:
    @not_implemented
    def arg_min(self):
        ... # elipsis to indicate that this just hasn't been written yet

# descriptor
class T:
    arg_min = not_implemented()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@camriddell glad to see you here!

I will try to respond in a bit more detail tomorrow, but on this point specifically:

Eliminate the repetition of the name, we wouldn't need to repeat the name as both class variable and as a string

I've got a descriptor implementation that solves that issue
#2119 (comment)

I'm leaning towards that most strongly at the moment - but it is mainly used in tests only at the moment (not_implemented_alt)

Copy link
Member Author

@dangotbanned dangotbanned Mar 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Searchability, I often use simple greps through code bases so if I was trying to find the arg_min function/method I would use the specific pattern 'def arg_min' which wouldn't work here.

@camriddell I've added this (no talking) video to try and show off what the improved typing lets you do instead of grepping the code base:

2025-03-04.13-13-19.-.Expr-Not.Implemented-30-1.mp4

Important

We can now statically know what is and is not implemented in VSCode
I imagine other IDEs probably have a similar functionality as well

Is there a particular reason for this implementation over a not_implemented decorator? There could definitely be a technical detail that I missed.

This gives us a clear line between not/implemented while also being very compact.

Adding decorated signatures for just 4 methods creates this diff:

diff --git a/narwhals/_dask/expr.py b/narwhals/_dask/expr.py
index a1bfb0b8..41c86132 100644
--- a/narwhals/_dask/expr.py
+++ b/narwhals/_dask/expr.py
@@ -25,6 +25,7 @@ from narwhals.utils import Implementation
 from narwhals.utils import generate_temporary_column_name
 from narwhals.utils import not_implemented
 from narwhals.utils import not_implemented_alt
+from narwhals.utils import unstable  # <------------------------ using just for syntactically valid decorator demo
 
 if TYPE_CHECKING:
     try:
@@ -615,10 +616,27 @@ class DaskExpr(CompliantExpr["DaskLazyFrame", "dx.Series"]):  # pyright: ignore[
     sample = not_implemented("sample")
     map_batches = not_implemented("map_batches")
     ewm_mean = not_implemented("ewm_mean")
-    rolling_sum = not_implemented("rolling_sum")
-    rolling_mean = not_implemented("rolling_mean")
-    rolling_var = not_implemented("rolling_var")
-    rolling_std = not_implemented("rolling_std")
+
+    @unstable
+    def rolling_mean(
+        self, window_size: int, *, min_samples: int | None, center: bool
+    ) -> Self: ...
+
+    @unstable
+    def rolling_std(
+        self, window_size: int, *, min_samples: int | None, center: bool, ddof: int
+    ) -> Self: ...
+
+    @unstable
+    def rolling_sum(
+        self, window_size: int, *, min_samples: int | None, center: bool
+    ) -> Self: ...
+
+    @unstable
+    def rolling_var(
+        self, window_size: int, *, min_samples: int | None, center: bool, ddof: int
+    ) -> Self: ...
+
     gather_every = not_implemented("gather_every")
     replace_strict = not_implemented_alt()
 

They now all fail type checking - because ... was misused:

>>> mypy
narwhals/_dask/expr.py:621: error: Missing return statement  [empty-body]
        def rolling_mean(
        ^
narwhals/_dask/expr.py:621: note: If the method is meant to be abstract, use @abc.abstractmethod
narwhals/_dask/expr.py:626: error: Missing return statement  [empty-body]
        def rolling_std(
        ^
narwhals/_dask/expr.py:626: note: If the method is meant to be abstract, use @abc.abstractmethod
narwhals/_dask/expr.py:631: error: Missing return statement  [empty-body]
        def rolling_sum(
        ^
narwhals/_dask/expr.py:631: note: If the method is meant to be abstract, use @abc.abstractmethod
narwhals/_dask/expr.py:636: error: Missing return statement  [empty-body]
        def rolling_var(
        ^
narwhals/_dask/expr.py:636: note: If the method is meant to be abstract, use @abc.abstractmethod
Found 4 errors in 1 file (checked 348 source files)

But lets say that wasn't an issue.
We now have the additional maintenance burden of keeping the signature annotations in sync - for a no-op 😞

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@camriddell
the big thing that's changed since your work on #1807 (for which I'm super grateful 😍) is the expanded CompliantExpr protocol in (https://github.com/narwhals-dev/narwhals/pull/2119/files#diff-705564b396213c63fc71e45d1bf5e3836b72b13d290ffa52b442fcaf05625b4d)

In my big PR description ramble I mentioned that this was where the PR started.
It just so happened to tie in really well with your not_implemented idea - but with the goalposts moved this tweak on it seemed like the best compromise to me personally

This comment was marked as outdated.

Comment on lines +209 to +218
@property
def str(self) -> Any: ...
@property
def name(self) -> Any: ...
@property
def dt(self) -> Any: ...
@property
def cat(self) -> Any: ...
@property
def list(self) -> Any: ...
Copy link
Member Author

@dangotbanned dangotbanned Feb 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return type for these would be this protocol

narwhals/narwhals/utils.py

Lines 150 to 157 in fb1c9f2

class _ExprNamespace( # type: ignore[misc] # noqa: PYI046
_StoresCompliant[CompliantExprT_co], Protocol[CompliantExprT_co]
):
_compliant_expr: CompliantExprT_co
@property
def compliant(self) -> CompliantExprT_co:
return self._compliant_expr

An obvious extension to this would be sub-protocols for each of the accessors

Comment on lines 50 to 63
if TYPE_CHECKING:
import sys

if sys.version_info >= (3, 13):
from warnings import deprecated
else:
from typing_extensions import deprecated
else:

def deprecated(message: str, /) -> Callable[[_Fn], _Fn]: # noqa: ARG001
def wrapper(func: _Fn, /) -> _Fn:
return func

return wrapper
Copy link
Member Author

@dangotbanned dangotbanned Feb 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This came as a surprise - getting the IDE support like (vega/altair#3455) - but without a runtime dependency on typing_extensions πŸ˜…

@dangotbanned dangotbanned marked this pull request as draft March 4, 2025 17:17
Comment on lines +1452 to +1481
def __init__(self, alias: str | None = None, /) -> None:
# NOTE: Don't like this
# Trying to workaround `mypy` requiring `@property` everywhere
self._alias: str | None = alias

def __repr__(self) -> str:
return f"<{type(self).__name__}>: {self._name_owner}.{self._name}"

def __set_name__(self, owner: type[_T], name: str) -> None:
# https://docs.python.org/3/howto/descriptor.html#customized-names
self._name_owner: str = owner.__name__
self._name: str = self._alias or name

def __get__(
self, instance: _T | Literal["raise"] | None, owner: type[_T] | None = None, /
) -> Any:
if instance is None:
# NOTE: Branch for `cls._name`
# We can check that to see if an instance of `type(self)` for
# https://narwhals-dev.github.io/narwhals/api-completeness/expr/
return self
# NOTE: Prefer not exposing the actual class we're defining in
# `_implementation` may not be available everywhere
who = getattr(instance, "_implementation", self._name_owner)
raise _not_implemented_error(self._name, who)

def __call__(self, *args: Any, **kwds: Any) -> Any:
# NOTE: Purely to duck-type as assignable to **any** instance method
# Wouldn't be reachable through *regular* attribute access
return self.__get__("raise")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO

  • Remove/reduce comments before merging

@dangotbanned dangotbanned marked this pull request as ready for review March 4, 2025 18:29
dangotbanned added a commit that referenced this pull request Mar 4, 2025
No longer causing a commotion #2119 (comment)
@dangotbanned dangotbanned mentioned this pull request Mar 4, 2025
10 tasks
MarcoGorelli pushed a commit that referenced this pull request Mar 4, 2025
No longer causing a commotion #2119 (comment)
@dangotbanned
Copy link
Member Author

dangotbanned commented Mar 5, 2025

@MarcoGorelli I'm happy with where the PR is at now - planning to merge shortly

I've got a few ideas for follow-ups, but I really wanna get this part in before things start deviating from this version of the protocol.

E.g, these two PRs will require updating things

Follow-up ideas

Establish some protocol member order

  • Current order is just in-order of responding to pyright warnings
  • Could be alphabetical, categorical (by operation or private/public/property/method etc)
    • Or some combination of the two

Moving Compliant* into _compliant package

  • typing can still import from there
  • _selectors -> _compliant.selectors
  • typing.CompliantExpr -> _compliant.expr.CompliantExpr
    • Gives a natural home to start developing (Eager|Lazy)Expr
    • We can also move some of the not_implemented up into LazyExpr to reduce repetition
  • The other protocols can also map to the same kind of module structure
    • With an eventual goal of giving those the same treatment Expr has had here

@dangotbanned dangotbanned merged commit b986bfd into main Mar 5, 2025
29 checks passed
@dangotbanned dangotbanned deleted the compliant-expr-spec branch March 5, 2025 14:01
dangotbanned added a commit that referenced this pull request Mar 5, 2025
dangotbanned added a commit that referenced this pull request Mar 8, 2025
Resolves: (#2149 (comment))

Doing this properly surfaced lots of issues
- `Polars*` classes are *similar* to `Compliant*`
  - Not close enough to support typing
  - Lots of stuff is behind `__getattr__`
  - We don't need `PolarsExpr` to do a lot of the work `CompliantExpr` does
- `nw.Series._compliant_series` is still a big PR away
  - Repeat of (#2119)
  - Last big hurdle to get (#2104 (comment))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants