Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug: surprising semantics for ibis.random() expressions #8055

Closed
1 task done
ogrisel opened this issue Jan 21, 2024 · 1 comment · Fixed by #8112
Closed
1 task done

bug: surprising semantics for ibis.random() expressions #8055

ogrisel opened this issue Jan 21, 2024 · 1 comment · Fixed by #8112
Assignees
Labels
bug Incorrect behavior inside of ibis

Comments

@ogrisel
Copy link
Contributor

ogrisel commented Jan 21, 2024

What happened?

I am not sure if this is a bug or not, but if it is not, at least the docstring for ibis.random() could clarify the expected semantics.

Consider the following:

>>> import ibis
>>> ibis.options.interactive = True

>>> t = ibis.memtable({"a": range(10)})
>>> r_a = ibis.random()
>>> t.mutate(random_a=r_a, random_b=r_a).mutate(random_c=ibis._.random_a).order_by(r_a)
┏━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┓
┃ arandom_arandom_brandom_c ┃
┡━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━┩
│ int64float64float64float64  │
├───────┼──────────┼──────────┼──────────┤
│     10.2383280.4797950.238328 │
│     30.9517080.9824690.951708 │
│     90.9230800.6388010.923080 │
│     80.3388510.2703370.338851 │
│     50.4696030.4186770.469603 │
│     70.2479540.0439700.247954 │
│     00.5586760.5663430.558676 │
│     60.2765630.2206460.276563 │
│     20.0684360.0454290.068436 │
│     40.3601410.8461830.360141 │
└───────┴──────────┴──────────┴──────────┘

What I find surprising:

  • despite the reuse of the r_a instance for both random_a and random_b columns, their values are not the same;
  • similarly, despite the reuse of the r_a instance to order the table, the random_a column values are not sorted in increasing order.

random_c on the other hand has identical values with random_a as expected.

I would have expected the above results to happen with independent instances of ibis.random(), that is with the following snippet where all column values and ordering as explicitly made independent:

t = ibis.memtable({"a": range(10)})
t.mutate(random_a=ibis.random(), random_b=ibis.random()).mutate(
    random_c=ibis._.random_a
).order_by(ibis.random())

Related: #8054.

What version of ibis are you using?

7.2.0

What backend(s) are you using, if any?

DuckDB

Relevant log output

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@ogrisel ogrisel added the bug Incorrect behavior inside of ibis label Jan 21, 2024
@NickCrews
Copy link
Contributor

Not sure about what we want or is possible (wait for a maintainer to chime in), but I have the explanation: this has to do with the semantics of the underlying duckdb:

import ibis
ibis.options.interactive = True
t = ibis.memtable({"a": range(10)})
r_a = ibis.random()
t2 = t.mutate(random_a=r_a, random_b=r_a).mutate(random_c=ibis._.random_a).order_by(r_a)
ibis.to_sql(t2)

gives

WITH t0 AS (
  SELECT
    t2.a AS a,
    RANDOM() AS random_a,
    RANDOM() AS random_b
  FROM ibis_pandas_memtable_clvyqxmilvg3tnmhfadqgybpxi AS t2
)
SELECT
  t1.a,
  t1.random_a,
  t1.random_b,
  t1.random_c
FROM (
  SELECT
    t0.a AS a,
    t0.random_a AS random_a,
    t0.random_b AS random_b,
    t0.random_a AS random_c
  FROM t0
) AS t1
ORDER BY
  RANDOM() ASC

and every individual RANDOM() call is totally unrelated to the others.

I believe that you can always work around this limitation by .mutate()ing earlier/more often and then using the new columns, eg t.mutate(random_a=r_a).mutate( random_b=ibis._.random_a).mutate(random_c=ibis._.random_a).order_by(ibis._.random_a). I agree this is less beautiful than what you posted originally, but if you;re just trying to get something done...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Incorrect behavior inside of ibis
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

3 participants