feat: IbisLazyFrame support #2000

Open: rwhitten577 wants to merge 47 commits into base: main

Conversation

@rwhitten577 rwhitten577 commented Feb 12, 2025

What type of PR is this? (check all applicable)

  • 💾 Refactor
  • ✨ Feature
  • 🐛 Bug Fix
  • 🔧 Optimization
  • 📝 Documentation
  • ✅ Test
  • 🐳 Other

Related issues

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below

@NickCrews

I want ibis support in narwhals and can help to get this landed. Ping me if you want a review or help!

Cc for visibility @cpcloud, the maintainer of ibis

@MarcoGorelli
Member

thanks @NickCrews for your help! sure - we're in the middle of some large refactors so it may be prudent to wait a bit to avoid too many merge conflicts, but we will get to this

@dangotbanned
Member

ibis.selectors

I'm interested in how we might be able to take advantage of ibis.selectors.
I noted in (#2064 (comment)) that ibis is an outlier because of its native support for selectors

Initial thoughts are that our adaptation could work quite differently to the other backends - maybe even being more performant

Since (#2064) every backend besides polars is sharing a lot of code - which recently moved to nw._compliant.selectors.
Most of this is simply performing operations on nw.Schema
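
To make the "operations on nw.Schema" point concrete, here is a minimal sketch of a schema-driven selector. It does not use narwhals or ibis: `Schema`, `by_dtype`, `matches`, and `union` are hypothetical stand-ins, and the dtype strings are illustrative only.

```python
from __future__ import annotations

import re
from typing import Callable, List, Mapping

# Hypothetical stand-in for a schema: column name -> dtype name.
Schema = Mapping[str, str]
# A selector is just a function from a schema to the column names it keeps.
Selector = Callable[[Schema], List[str]]


def by_dtype(*dtypes: str) -> Selector:
    """Select columns whose dtype is one of `dtypes`."""
    wanted = set(dtypes)
    return lambda schema: [name for name, dtype in schema.items() if dtype in wanted]


def matches(pattern: str) -> Selector:
    """Select columns whose name matches the regular expression `pattern`."""
    regex = re.compile(pattern)
    return lambda schema: [name for name in schema if regex.search(name)]


def union(left: Selector, right: Selector) -> Selector:
    """Combine two selectors, keeping the schema's column order."""

    def select(schema: Schema) -> List[str]:
        keep = set(left(schema)) | set(right(schema))
        return [name for name in schema if name in keep]

    return select


schema = {"id": "Int64", "price": "Float64", "name": "String", "price_note": "String"}
numeric = by_dtype("Int64", "Float64")
print(numeric(schema))
print(union(numeric, matches("^price"))(schema))
```

Because every selector only needs column names and dtypes, this shape works for any backend that can report a schema, which is presumably why the shared implementation is possible.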

@MarcoGorelli
Member

Hey @rwhitten577 - we're done with the big refactors, so if it interests you to continue, we'd love to ship this 🚢 🚀

I'd suggest not worrying about doing anything different for selectors just yet (it looks to me like they call .schema anyway?), mirroring what we do for duckdb / sqlframe / pyspark should be fine 👍

@afrisgaard

How would this land? (Just considering to be ready when ibis ships to Narwhals :) )

Something like this:
ibis_frame = ibis.use_backend("BigQuery"/"DuckDb"/etc)

NarwhalsFrame = from_native(ibis_frame)

.. do narwhals operations

@MarcoGorelli
Member

yup, that's right! we'd just translate to the ibis api, so whatever backend is set there would be used

@rwhitten577
Author

Hey @MarcoGorelli, I will try to pick this back up in the next few weeks. I haven't looked through all the refactor changes yet, but do you think I'd be better off rebasing this old branch or are the changes significant enough where I'd want to start over and use duckdb or spark as an example?

@MarcoGorelli
Member

awesome, thanks @rwhitten577 !

i think it might be salvageable to continue from here, but still using _spark_like / _duckdb as a reference. looks like most of the merge conflicts are in the tests anyway

@rwhitten577 rwhitten577 force-pushed the feature/initial-ibis-lazyframe branch from b0763ae to 82c41f8 Compare April 14, 2025 16:40
@rwhitten577 rwhitten577 force-pushed the feature/initial-ibis-lazyframe branch from 5b12e12 to 15e89c9 Compare April 14, 2025 21:15
@dangotbanned
Member

@MarcoGorelli do we need to update some config to fix this? https://results.pre-commit.ci/run/github/760058710/1744817752.vQ_lZmv5RqOHrGGzLqhyoQ

I thought it was fine to import when we're in the backend package

@MarcoGorelli
Member

yes you could update

ALLOWED_IMPORTS = {
    "_pandas_like": {"pandas", "numpy"},
    "_arrow": {"pyarrow", "pyarrow.compute", "pyarrow.parquet"},
    "_dask": {"dask.dataframe", "pandas", "dask_expr"},
    "_polars": {"polars"},
    "_duckdb": {"duckdb"},
}
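
For readers unfamiliar with this kind of check, below is a rough sketch of how a per-package import allowlist could be enforced. It is not the project's actual check script: `find_banned_imports`, `BANNED_IMPORTS`, and the `"_ibis"` entry are assumptions made for illustration, and matching on the top-level package name is a simplification.

```python
from __future__ import annotations

import ast

# Mirrors the mapping quoted above. The "_ibis" entry is an assumption about
# what this PR would add; it is not taken from the thread.
ALLOWED_IMPORTS = {
    "_pandas_like": {"pandas", "numpy"},
    "_arrow": {"pyarrow", "pyarrow.compute", "pyarrow.parquet"},
    "_dask": {"dask.dataframe", "pandas", "dask_expr"},
    "_polars": {"polars"},
    "_duckdb": {"duckdb"},
    "_ibis": {"ibis"},  # hypothetical addition
}
# Hypothetical set of top-level packages that only their own backend may import.
BANNED_IMPORTS = {"pandas", "numpy", "pyarrow", "dask", "polars", "duckdb", "ibis"}


def find_banned_imports(source: str, backend_package: str) -> list[str]:
    """Return banned top-level packages imported by `source`.

    Packages listed for `backend_package` in ALLOWED_IMPORTS are permitted.
    """
    allowed = ALLOWED_IMPORTS.get(backend_package, set())
    allowed_roots = {module.split(".")[0] for module in allowed}
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module is not None:
            modules = [node.module]
        else:
            continue
        for module in modules:
            root = module.split(".")[0]
            if root in BANNED_IMPORTS and root not in allowed_roots:
                found.append(root)
    return found


sample = "import ibis\nimport ibis.expr.types as ir\nimport pandas as pd\n"
print(find_banned_imports(sample, "_ibis"))
```

With the hypothetical `"_ibis"` entry in place, importing `ibis` inside that package passes while a stray `pandas` import is still flagged.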

from narwhals.utils import _FullContext


class IbisExpr(LazyExpr["IbisLazyFrame", "ir.Expr"]):
Member

Using ir.Expr seems to be the main source of typing issues.

I was initially thinking of changing the definition of NativeExpr

class NativeExpr(Protocol):
    """An `Expr`-like object from a package with [Lazy-only support](https://narwhals-dev.github.io/narwhals/extending/#levels-of-support).

    Protocol members are chosen *purely* for matching statically - as they
    are common to all currently supported packages.
    """

    def between(self, *args: Any, **kwds: Any) -> Any: ...
    def isin(self, *args: Any, **kwds: Any) -> Any: ...

But most of the methods being called don't exist on ir.Expr

https://github.com/ibis-project/ibis/blob/d7cd846627269b1f4c901cc3f0e380ee9024f796/ibis/expr/types/core.py#L67-L920
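
A small sketch of why a base class can fail to match such a protocol while a more specific subclass matches. The classes here are toy stand-ins, not ibis types, and `NativeExprLike` is a hypothetical copy of the protocol with `runtime_checkable` added so the match can be observed at runtime.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class NativeExprLike(Protocol):
    """Hypothetical protocol mirroring the two members quoted above."""

    def between(self, *args: Any, **kwds: Any) -> Any: ...
    def isin(self, *args: Any, **kwds: Any) -> Any: ...


class BaseExpr:
    """Stand-in for a base expression class that defines neither method."""


class ValueExpr(BaseExpr):
    """Stand-in for a more specific subclass that defines both methods."""

    def between(self, lower: Any, upper: Any) -> str:
        return f"between({lower}, {upper})"

    def isin(self, values: Any) -> str:
        return f"isin({values})"


print(isinstance(BaseExpr(), NativeExprLike))   # base class lacks the members
print(isinstance(ValueExpr(), NativeExprLike))  # subclass provides them
```

Note that the runtime check only looks for the member names; a static type checker additionally compares signatures.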

Member

E.g. here, all the other methods being called on expr are referring to something other than ir.Expr?

https://github.com/rwhitten577/narwhals/blob/6e44a8b54c331f6056580a79f2bedf950e78d825/narwhals/_ibis/expr.py#L136-L152

Author

Yeah I think the Ibis types reported are going to be specific e.g. IntegerColumn or IntegerValue. Will explore a TypeAlias that covers these

Member

> Yeah I think the Ibis types reported are going to be specific e.g. IntegerColumn or IntegerValue. Will explore a TypeAlias that covers these

Right I see, they have things quite a bit more strongly typed.
With methods split up by class - rather than a unified Expr or Column or Series with namespace accessors 🤔

Member

Or at least coming up with some common patterns to follow, where we block certain ops and raise consistent errors

Member

@dangotbanned dangotbanned Apr 16, 2025

A basic solution would be like ArrowSeries.median

def median(self: Self, *, _return_py_scalar: bool = True) -> float:
    from narwhals.exceptions import InvalidOperationError

    if not self.dtype.is_numeric():
        msg = "`median` operation not supported for non-numeric input type."
        raise InvalidOperationError(msg)
    return maybe_extract_py_scalar(
        pc.approximate_median(self.native), _return_py_scalar
    )

Not sure what kind of overhead this might have if we need to check frequently
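
One way the "block certain ops and raise consistent errors" idea could be factored out is a decorator, sketched below. Everything here is hypothetical: `requires_numeric`, `DType`, and `ToySeries` are illustration-only names, and `InvalidOperationError` is a local stand-in rather than the narwhals exception.

```python
from __future__ import annotations

from functools import wraps
from statistics import median as _median
from typing import Any, Callable


class InvalidOperationError(Exception):
    """Local stand-in for the exception used in the snippet above."""


def requires_numeric(func: Callable[..., Any]) -> Callable[..., Any]:
    """Raise a consistent error when `self.dtype` is not numeric."""

    @wraps(func)
    def wrapper(self: Any, *args: Any, **kwargs: Any) -> Any:
        if not self.dtype.is_numeric():
            msg = f"`{func.__name__}` operation not supported for non-numeric input type."
            raise InvalidOperationError(msg)
        return func(self, *args, **kwargs)

    return wrapper


class DType:
    """Toy dtype that only knows whether it is numeric."""

    def __init__(self, name: str) -> None:
        self.name = name

    def is_numeric(self) -> bool:
        return self.name in {"Int64", "Float64"}


class ToySeries:
    """Toy series holding plain Python values and a dtype."""

    def __init__(self, values: list, dtype: DType) -> None:
        self.values = values
        self.dtype = dtype

    @requires_numeric
    def median(self) -> float:
        return float(_median(self.values))


print(ToySeries([3, 1, 2], DType("Int64")).median())
```

In this sketch the guard costs one attribute lookup and one method call per invocation; whether that is acceptable for a real backend would need measuring.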

Member

@MarcoGorelli we may need to consider abstracting this out - since IIUC we might run into AttributeErrors if we aren't careful

we already don't allow aggregating scalar-like expressions

In [2]: nw.col('a').mean().mean()
---------------------------------------------------------------------------
InvalidOperationError                     Traceback (most recent call last)
Cell In[2], line 1
----> 1 nw.col('a').mean().mean()

File ~/scratch/.venv/lib/python3.12/site-packages/narwhals/expr.py:560, in Expr.mean(self)
    541 def mean(self: Self) -> Self:
    542     """Get mean value.
    543
    544     Returns:
   (...)
    558         └──────────────────┘
    559     """
--> 560     return self._with_aggregation(lambda plx: self._to_compliant_expr(plx).mean())

File ~/scratch/.venv/lib/python3.12/site-packages/narwhals/expr.py:83, in Expr._with_aggregation(self, to_compliant_expr)
     81 if self._metadata.kind.is_scalar_like():
     82     msg = "Aggregations can't be applied to scalar-like expressions."
---> 83     raise InvalidOperationError(msg)
     84 return self.__class__(
     85     to_compliant_expr, self._metadata.with_kind(ExprKind.AGGREGATION)
     86 )

InvalidOperationError: Aggregations can't be applied to scalar-like expressions.

Author

Ok pyright should be fixed now. Ibis' type system is funky but did my best to set the correct types or cast to something that made sense

Member

Much appreciated @rwhitten577!

It seems the casts might be unavoidable for now (#2000 (comment)) 😔

I had to do a lot of that in

which eventually led to

That is to say, identifying the issues is a big step and could even lead to improvements down the line 😅

Comment on lines +566 to +573
def diff(self) -> Self:
    def _func(window_inputs: WindowInputs) -> ir.NumericValue:
        expr = cast("ir.NumericColumn", window_inputs.expr)
        return expr - cast(
            "ir.NumericColumn", expr.lag().over(ibis.window(following=0))
        )

    return self._with_window_function(_func)
Member

@dangotbanned dangotbanned Apr 16, 2025

@NickCrews it seems like some of our issues (#2000 (comment)) might be solved by changing the return type of some ibis methods to Self.
Just throwing this out there 🙂

Picking one at random, but say we start with:

expr: ir.NumericColumn

We first call:

https://github.com/ibis-project/ibis/blob/d7cd846627269b1f4c901cc3f0e380ee9024f796/ibis/expr/types/generic.py#L2707-L2711

op_1: ir.Column = expr.lag()

Then:

https://github.com/ibis-project/ibis/blob/d7cd846627269b1f4c901cc3f0e380ee9024f796/ibis/expr/types/generic.py#L817-L125

op_2: ir.Value = op_1.over(ibis.window(following=0))

But at that point we can't do the binary op, since the right-hand side isn't an ir.NumericValue:

https://github.com/ibis-project/ibis/blob/d7cd846627269b1f4c901cc3f0e380ee9024f796/ibis/expr/types/numeric.py#L646-L648

# Operator "-" not supported for types "NumericColumn" and "Value" Pylance[reportOperatorIssue]
result = expr - op_2
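
The effect described above can be reproduced with toy classes. This is a sketch only: `Value`, `NumericValue`, `over_widened`, and `over_preserving` are hypothetical names and not ibis APIs. It uses a self-typed `TypeVar`, which a type checker treats much like a `Self` return annotation.

```python
from __future__ import annotations

from typing import TypeVar

V = TypeVar("V", bound="Value")


class Value:
    """Toy expression node; not an ibis class."""

    def __init__(self, text: str) -> None:
        self.text = text

    def over_widened(self) -> Value:
        # Annotated as the base class: a type checker forgets the subclass.
        return type(self)(f"{self.text}.over()")

    def over_preserving(self: V) -> V:
        # Annotated with a self-typed TypeVar: the subclass is kept.
        return type(self)(f"{self.text}.over()")


class NumericValue(Value):
    """Toy numeric node that supports subtraction with other numeric nodes."""

    def __sub__(self, other: NumericValue) -> NumericValue:
        return NumericValue(f"({self.text} - {other.text})")


expr = NumericValue("a")
kept = expr.over_preserving()  # statically a NumericValue
widened = expr.over_widened()  # statically only a Value
result = expr - kept           # needs no cast
# `expr - widened` runs fine here, but a checker such as pyright would be
# expected to reject it, which is the situation that forces a cast.
print(result.text)
```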

Author

Yep that would help greatly here to return Self from funcs like over, name, etc which currently lose the original column type.

@rwhitten577
Author

> if we're adding initial support, i'd advocate for keeping it simple and just raising if self._backend_version < (0,10) for some operations, we can always lower the minimum version later

Just added this guard for the few places ibis.cases is used
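
A guard of this kind might look roughly like the sketch below. `require_backend_version` is a hypothetical helper, and the version numbers and feature name are placeholders rather than the real ibis requirement.

```python
from __future__ import annotations


def require_backend_version(
    backend_version: tuple[int, ...],
    minimum: tuple[int, ...],
    feature: str,
) -> None:
    """Raise if `backend_version` is older than `minimum`."""
    # Tuples compare element by element, so (1, 9, 0) < (2, 0) is True.
    if backend_version < minimum:
        required = ".".join(str(part) for part in minimum)
        found = ".".join(str(part) for part in backend_version)
        msg = f"`{feature}` requires backend version >= {required}, found {found}."
        raise NotImplementedError(msg)


# Placeholder versions: passes silently because (2, 1, 0) >= (2, 0).
require_backend_version((2, 1, 0), (2, 0), "some_feature")
```

Keeping the check in one helper means every guarded operation raises the same message shape, and lowering the minimum later is a one-line change per call site.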

@dangotbanned
Member

#2000 (comment)

@rwhitten577 I fumbled big time on the spelling, but I've used (#2000 (comment)) for these in

Comment on lines 52 to 56
    def __init__(
-        self: Self,
-        df: ir.Table,
-        *,
-        backend_version: tuple[int, ...],
-        version: Version,
+        self, df: ir.Table, *, backend_version: tuple[int, ...], version: Version
    ) -> None:
        self._native_frame: ir.Table = df
        self._version = version
Member

@dangotbanned dangotbanned Apr 17, 2025

FYI @MarcoGorelli

You might have noticed I've been gradually doing this where it isn't needed.

Now seemed like a good isolated time to remove them all for ibis 🙂

I've been leaving them on the public API, in case there was some benefit I didn't know about

@dangotbanned
Member

yeah CI check would at least need to be green before merging

we can't run this in every environment that we measure coverage on due to upper-bound constraints

they accepted my pr to remove upper-bound constraints, so we can probably include ibis in the coverage metrics

@MarcoGorelli we can do ibis coverage on 3.11 & 3.13 jobs

I'm not sure how to disable it only on the 3.8 & 3.9 jobs 😞

Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Request for contributions: Ibis support
5 participants