-
Notifications
You must be signed in to change notification settings - Fork 624
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bug: compilation failure in table expression mixing ibis.row_number()
, .isin()
and .sample()
#8058
Comments
AttributeError: 'str' object has no attribute 'alias'
compiling a table expression that involves .isin()
and .sample()
AttributeError: 'str' object has no attribute 'alias'
compiling a table expression that involves ibis.row_number()
, .isin()
and .sample()
looks like this might already be fixed on WITH t0 AS (
SELECT
t2.a AS a,
ROW_NUMBER() OVER () - 1 AS _id
FROM ibis_pandas_memtable_w4odh7uo5nesdc3vigve5xmk7u AS t2
)
SELECT
t1.a,
t1._id,
t1.is_test
FROM (
SELECT
t0.a AS a,
t0._id AS _id,
t0._id IN (
SELECT
t2_1._id
FROM t2 AS t2_1 TABLESAMPLE BERNOULLI (25.0 PERCENT) REPEATABLE (0)
) AS is_test
FROM t0
) AS t1
WHERE
NOT t1.is_test |
it does fail on Show error
|
Yeah, this was fixed in |
cc @kszucs Can you take a look at this one for |
I tried with the >>> import ibis
...
... t = ibis.memtable({"a": range(30)})
... t = t.mutate(_id=ibis.row_number())
... test = t.sample(fraction=0.25, seed=0)
... t = t.mutate(is_test=t._id.isin(test._id))
... train = t.filter(~t.is_test)
... train.execute()
Traceback (most recent call last):
File ~/miniforge3/envs/dev/lib/python3.11/site-packages/sqlalchemy/engine/base.py:1910 in _execute_context
self.dialect.do_execute(
File ~/miniforge3/envs/dev/lib/python3.11/site-packages/sqlalchemy/engine/default.py:736 in do_execute
cursor.execute(statement, parameters)
File ~/miniforge3/envs/dev/lib/python3.11/site-packages/duckdb_engine/__init__.py:160 in execute
self.__c.execute(statement, parameters)
CatalogException: Catalog Error: Table with name t2 does not exist!
Did you mean "temp.information_schema.tables"?
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
Cell In[28], line 8
train.execute()
File ~/code/ibis/ibis/expr/types/core.py:322 in execute
return self._find_backend(use_default=True).execute(
File ~/code/ibis/ibis/backends/base/sql/__init__.py:342 in execute
with self._safe_raw_sql(sql, **kwargs) as cursor:
File ~/miniforge3/envs/dev/lib/python3.11/contextlib.py:137 in __enter__
return next(self.gen)
File ~/code/ibis/ibis/backends/base/sql/alchemy/__init__.py:205 in _safe_raw_sql
yield con.execute(*args, **kwargs)
File ~/miniforge3/envs/dev/lib/python3.11/site-packages/sqlalchemy/engine/base.py:1385 in execute
return meth(self, multiparams, params, _EMPTY_EXECUTION_OPTS)
File ~/miniforge3/envs/dev/lib/python3.11/site-packages/sqlalchemy/sql/elements.py:334 in _execute_on_connection
return connection._execute_clauseelement(
File ~/miniforge3/envs/dev/lib/python3.11/site-packages/sqlalchemy/engine/base.py:1577 in _execute_clauseelement
ret = self._execute_context(
File ~/miniforge3/envs/dev/lib/python3.11/site-packages/sqlalchemy/engine/base.py:1953 in _execute_context
self._handle_dbapi_exception(
File ~/miniforge3/envs/dev/lib/python3.11/site-packages/sqlalchemy/engine/base.py:2134 in _handle_dbapi_exception
util.raise_(
File ~/miniforge3/envs/dev/lib/python3.11/site-packages/sqlalchemy/util/compat.py:211 in raise_
raise exception
File ~/miniforge3/envs/dev/lib/python3.11/site-packages/sqlalchemy/engine/base.py:1910 in _execute_context
self.dialect.do_execute(
File ~/miniforge3/envs/dev/lib/python3.11/site-packages/sqlalchemy/engine/default.py:736 in do_execute
cursor.execute(statement, parameters)
File ~/miniforge3/envs/dev/lib/python3.11/site-packages/duckdb_engine/__init__.py:160 in execute
self.__c.execute(statement, parameters)
ProgrammingError: (duckdb.duckdb.CatalogException) Catalog Error: Table with name t2 does not exist!
Did you mean "temp.information_schema.tables"?
[SQL: WITH t0 AS
(SELECT t2.a AS a, row_number() OVER () - ? AS _id
FROM ibis_pandas_memtable_6rxmv3tjrjauvgll5a6mzbefse AS t2)
SELECT t1.a, t1._id, t1.is_test
FROM (SELECT t0.a AS a, t0._id AS _id, t0._id IN (SELECT t2_1._id
FROM t2 AS t2_1 TABLESAMPLE bernoulli(25.0 PERCENT) REPEATABLE (0)) AS is_test
FROM t0) AS t1
WHERE NOT t1.is_test]
[parameters: (1,)]
(Background on this error at: https://sqlalche.me/e/14/f405) |
@cpcloud @lostmygithubaccount in case you had not seen my last comment, this expression still fails on |
Thanks @ogrisel. The part of the compiler this is related to is pretty complex. I'll take a look to see if there's a quick fix we can do on |
Ok, after looking at this for a bit I'm going to say this is too hairy a bug to fix in the SQLAlchemy compiler at this time (the issue has to do with aliases being used incorrectly when compiling subqueries, but only sometimes). I suggest we focus on fixing this bug in A cleaned up reproducible example: import ibis
from ibis import _
query = (
ibis.memtable({"a": range(30)})
.mutate(id=ibis.row_number())
.sample(fraction=0.25, seed=0)
.mutate(is_test=_.id.isin(_.id))
.filter(~_.is_test)
)
# Generate SQL for duckdb (currently fails here)
print(ibis.to_sql(query))
# Execute on duckdb
query.compute() @kszucs: this currently fails on |
AttributeError: 'str' object has no attribute 'alias'
compiling a table expression that involves ibis.row_number()
, .isin()
and .sample()
ibis.row_number()
, .isin()
and .sample()
Sounds good to me. |
Resolved by #8124 |
What happened?
The following snippet that uses
table.sample
raises a low level SQLAlchemy exception.raises:
EDIT: on
main
the SQL generation works but querying DuckDB fails: #8058 (comment)while the same code with a non-sample based selection of the test set such as the following works as expected:
or alternatively, keeping the call to
sample
but not usingibis.row_number
:Note: the first snippet was suggested when discussing the possibility to pass a fixed integer seed to
ibis.random
:ibis.random
#8054 (comment)What version of ibis are you using?
What backend(s) are you using, if any?
DuckDB
Relevant log output
No response
Code of Conduct
The text was updated successfully, but these errors were encountered: