-
Notifications
You must be signed in to change notification settings - Fork 136
feat: IbisLazyFrame
support
#2000
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat: IbisLazyFrame
support
#2000
Conversation
I want ibis support in narwhals and can help to get this landed. Ping me if you want a review or help! Cc for visibility @cpcloud, the maintainer of ibis |
thanks @NickCrews for your help! sure - we're in the middle of some large refactors so it may be prudent to wait a bit to avoid too many merge conflicts, but we will get to this |
|
Hey @rwhitten577 - we're done with the big refactors, so if it interests you to continue, we'd love to ship this 🚢 🚀 I'd suggest not worrying about doing anything different for selectors just yet (it looks to me like they call |
How would this land? (Just considering to be ready when ibis ships to Narwhals :) ) Something like this: NarwhalsFrame = from_native(ibis_frame) .. do narwhals operations |
yup,that's right! we'd just translate to the ibis api, so whatever backend is set there would be used |
Hey @MarcoGorelli, I will try to pick this back up in the next few weeks. I haven't looked through all the refactor changes yet, but do you think I'd be better off rebasing this old branch or are the changes significant enough where I'd want to start over and use duckdb or spark as an example? |
awesome, thanks @rwhitten577 ! i think it might be salvageable to continue from here, but still using |
b0763ae
to
82c41f8
Compare
5b12e12
to
15e89c9
Compare
@MarcoGorelli do we need to update some config to fix this? https://results.pre-commit.ci/run/github/760058710/1744817752.vQ_lZmv5RqOHrGGzLqhyoQ I thought it was fine to import when we're in the backend package |
yes you could update narwhals/utils/import_check.py Lines 21 to 27 in aa48faa
|
narwhals/_ibis/expr.py
Outdated
from narwhals.utils import _FullContext | ||
|
||
|
||
class IbisExpr(LazyExpr["IbisLazyFrame", "ir.Expr"]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using ir.Expr
seems to be the main source of typing issues.
I was initially thinking of changing the definition of NativeExpr
narwhals/narwhals/_compliant/expr.py
Lines 80 to 89 in 7059f5b
class NativeExpr(Protocol): | |
"""An `Expr`-like object from a package with [Lazy-only support](https://narwhals-dev.github.io/narwhals/extending/#levels-of-support). | |
Protocol members are chosen *purely* for matching statically - as they | |
are common to all currently supported packages. | |
""" | |
def between(self, *args: Any, **kwds: Any) -> Any: ... | |
def isin(self, *args: Any, **kwds: Any) -> Any: ... | |
But most of the methods being called don't exist on ir.Expr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
E.g. here, all other the methods being called on expr
are referring to something other than ir.Expr
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think the Ibis types reported are going to be specific e.g. IntegerColumn
or IntegerValue
. Will explore a TypeAlias that covers these
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like we might want to use one (or more) of Value
, Scalar
, Column
:
- https://github.com/ibis-project/ibis/blob/d7cd846627269b1f4c901cc3f0e380ee9024f796/ibis/expr/types/generic.py#L39
- https://github.com/ibis-project/ibis/blob/d7cd846627269b1f4c901cc3f0e380ee9024f796/ibis/expr/types/generic.py#L1368
- https://github.com/ibis-project/ibis/blob/d7cd846627269b1f4c901cc3f0e380ee9024f796/ibis/expr/types/generic.py#L1487
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think the Ibis types reported are going to be specific e.g.
IntegerColumn
orIntegerValue
. Will explore a TypeAlias that covers these
Right I see, they have things quite a bit more strongly typed.
With methods split up by class - rather than a unified Expr
or Column
or Series
with namespace accessors 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or at least coming up with some common patterns to follow, where we block certain ops and raise consistent errors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A basic solution would be like ArrowSeries.median
narwhals/narwhals/_arrow/series.py
Lines 322 to 331 in 7059f5b
def median(self: Self, *, _return_py_scalar: bool = True) -> float: | |
from narwhals.exceptions import InvalidOperationError | |
if not self.dtype.is_numeric(): | |
msg = "`median` operation not supported for non-numeric input type." | |
raise InvalidOperationError(msg) | |
return maybe_extract_py_scalar( | |
pc.approximate_median(self.native), _return_py_scalar | |
) |
Not sure what kind of overhead this might have if we need to check frequently
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MarcoGorelli we may need to consider abstracting this out - since IIUC we might run into AttributeErrors if we aren't careful
we already don't allow aggregating scalar-like expressions
In [2]: nw.col('a').mean().mean()
---------------------------------------------------------------------------
InvalidOperationError Traceback (most recent call last)
Cell In[2], line 1
----> 1 nw.col('a').mean().mean()
File ~/scratch/.venv/lib/python3.12/site-packages/narwhals/expr.py:560, in Expr.mean(self)
541 def mean(self: Self) -> Self:
542 """Get mean value.
543
544 Returns:
(...)
558 └──────────────────┘
559 """
--> 560 return self._with_aggregation(lambda plx: self._to_compliant_expr(plx).mean())
File ~/scratch/.venv/lib/python3.12/site-packages/narwhals/expr.py:83, in Expr._with_aggregation(self, to_compliant_expr)
81 if self._metadata.kind.is_scalar_like():
82 msg = "Aggregations can't be applied to scalar-like expressions."
---> 83 raise InvalidOperationError(msg)
84 return self.__class__(
85 to_compliant_expr, self._metadata.with_kind(ExprKind.AGGREGATION)
86 )
InvalidOperationError: Aggregations can't be applied to scalar-like expressions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok pyright should be fixed now. Ibis' type system is funky but did my best to set the correct types or cast to something that made sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Much appreciated @rwhitten577!
It seems the casts might be unavoidable for now (#2000 (comment)) 😔
I had to do a lot of that in
which eventually led to
That is to say, identifying the issues is a big step and could even lead to improvements down the line 😅
def diff(self) -> Self: | ||
def _func(window_inputs: WindowInputs) -> ir.NumericValue: | ||
expr = cast("ir.NumericColumn", window_inputs.expr) | ||
return expr - cast( | ||
"ir.NumericColumn", expr.lag().over(ibis.window(following=0)) | ||
) | ||
|
||
return self._with_window_function(_func) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@NickCrews it seems like some of our issues (#2000 (comment)) might be solved by changing the return type of some ibis
methods to Self
.
Just throwing this out there 🙂
Picking one at random, but say we start with:
expr: ir.NumericColumn
We first call:
op_1: ir.Column = expr.lag()
Then:
op_2: ir.Value = op_1.over(ibis.window(following=0))
But at that point we can't do the binary op, since the right-hand side isn't a ir.NumericValue
:
# Operator "-" not supported for types "NumericColumn" and "Value"Pylance[reportOperatorIssue]
result = expr - op_2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep that would help greatly here to return Self
from funcs like over
, name
, etc which currently lose the original column type.
Just added this guard for the few places |
@rwhitten577 I fumbled big time on the spelling, but I've used (#2000 (comment)) for these in |
narwhals/_ibis/dataframe.py
Outdated
def __init__( | ||
self: Self, | ||
df: ir.Table, | ||
*, | ||
backend_version: tuple[int, ...], | ||
version: Version, | ||
self, df: ir.Table, *, backend_version: tuple[int, ...], version: Version | ||
) -> None: | ||
self._native_frame: ir.Table = df | ||
self._version = version |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI @MarcoGorelli
You might have noticed I've been gradually doing this where it isn't needed.
Now seemed like a good isolated time to remove them all for ibis
🙂
I've been leaving them on the public API, in case there was some benefit I didn't know about
@MarcoGorelli we can do
I'm not sure how to disable it only on the |
What type of PR is this? (check all applicable)
Related issues
Checklist
If you have comments or can explain your changes, please do so below