Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug: ibis.memtable(<Structs>) doesn't preserve struct field order #8166

Closed
1 task done
NickCrews opened this issue Jan 31, 2024 · 6 comments
Closed
1 task done

bug: ibis.memtable(<Structs>) doesn't preserve struct field order #8166

NickCrews opened this issue Jan 31, 2024 · 6 comments
Labels
bug Incorrect behavior inside of ibis duckdb The DuckDB backend

Comments

@NickCrews
Copy link
Contributor

What happened?

import ibis

d1 = {
    "structs": [
        {"price": 100, "sku": "A"},
        {"price": 400, "sku": "B"},
    ],
}
d2 = {
    "structs": [
        {"sku": "A", "price": 100},
        {"sku": "B", "price": 400},
    ],
}
ibis.memtable(d1).structs.type(), ibis.memtable(d2).structs.type()
# ('Struct([('price', Int64(nullable=True)), ('sku', String(nullable=True))], nullable=True),
# 'Struct([('price', Int64(nullable=True)), ('sku', String(nullable=True))], nullable=True))

Both field orders are the same, I assume in alphabetical order. I would expect the field order to be preserved from the key order in the dict. Maybe this is an upstream issue with duckdb?

What version of ibis are you using?

main, 80e13b4

What backend(s) are you using, if any?

No response

Relevant log output

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@NickCrews NickCrews added the bug Incorrect behavior inside of ibis label Jan 31, 2024
@cpcloud
Copy link
Member

cpcloud commented Jan 31, 2024

Thanks for the issue.

I initially thought this was a repr issue, but when I create physical tables in a file their field order is preserved.

❯ duckdb test.ddb
v0.9.2
Enter ".help" for usage hints.
D create or replace table t2 as select {'sku':'A', 'price':100} as structs union select {'sku':'B', 'price': 400} as structs;
D create or replace table t1 as select {'price':100, 'sku':'A'} as structs union select {'price': 400, 'sku':'B'} as structs;
D

Then in IPython

In [1]: from ibis.interactive import *
c
In [2]: con = ibis.duckdb.connect("test.ddb")

In [3]: t1 = con.tables.t1

In [4]: t1
Out[4]:
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ structs                           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ struct<price: int32, sku: string> │
├───────────────────────────────────┤
│ {'price': 400, 'sku': 'B'}        │
│ {'price': 100, 'sku': 'A'}        │
└───────────────────────────────────┘

In [5]: t2 = con.tables.t2

In [6]: t2
Out[6]:
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ structs                           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ struct<sku: string, price: int32> │
├───────────────────────────────────┤
│ {'sku': 'A', 'price': 100}        │
│ {'sku': 'B', 'price': 400}        │
└───────────────────────────────────┘

The next place to look is in ibis.memtable to see if any funny business is happening there.

@cpcloud cpcloud added the duckdb The DuckDB backend label Feb 1, 2024
@cpcloud
Copy link
Member

cpcloud commented Feb 1, 2024

This appears to be the result of PyArrow's type inference, which is what we're using to infer the schema of data in memtables.

It looks like PyArrow sorts the fields, I haven't looked into why.

I'm not sure what we can do about this. Other than being perhaps a bit surprising, is this impeding your work in some way?

@NickCrews
Copy link
Contributor Author

It was perplexing me as I wrote tests that were weirdly failing, but it wasn't that hard to workaround once I figured out what it was. Not a high priority for me at the moment to fix this.

@NickCrews
Copy link
Contributor Author

I'd be fine with filing an issue with pyarrow and then closing this out, waiting for them to fix it upstream. Adding in compat code here in ibis seems like overkill

@cpcloud
Copy link
Member

cpcloud commented Feb 1, 2024

Thanks, I think we're probably unlikely to address this on the Ibis side anytime soon.

@cpcloud cpcloud closed this as not planned Won't fix, can't repro, duplicate, stale Feb 1, 2024
@github-project-automation github-project-automation bot moved this from backlog to done in Ibis planning and roadmap Feb 1, 2024
@NickCrews
Copy link
Contributor Author

noting that this is caused by

  1. converting the given dict a pd DF. fields are still in the right order
  2. then we infer the ibis schema from that. We call pa.array(series, from_pandas=True), and that is where the problem happens.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Incorrect behavior inside of ibis duckdb The DuckDB backend
Projects
Archived in project
Development

No branches or pull requests

2 participants