Unexpected schema for polars Array type of dimensionality > 1 #1908

Open
2 of 3 tasks
hsuominen opened this issue Feb 11, 2025 · 1 comment · May be fixed by #1909
Labels
bug Something isn't working

@hsuominen

Describe the bug
Pandera infers the wrong shape for a polars Array dtype whose dimensionality is greater than 1.

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of pandera.
  • (optional) I have confirmed this bug exists on the main branch of pandera.

Code Sample, a copy-pastable example

import polars as pl
import pandera.polars as pa

class Schema(pa.DataFrameModel):
    array: pl.Array = pa.Field(dtype_kwargs={"inner": pl.Int64(), "shape": (2, 2)})

Schema.to_schema().columns["array"].dtype

resolves to DataType(Array(Int64, shape=(2, 2, 2))) instead of DataType(Array(Int64, shape=(2, 2))).

If shape is an int or a tuple of length 1, the behaviour is as expected. For longer tuples the inferred shape grows with the tuple length: a shape with n elements yields a shape with 2n - 1 elements, e.g. a tuple with 3 elements yields a shape of length 5.

Expected behavior

Constructing the same dtype directly in polars gives the expected shape:

pl.Array(pl.Int64(), shape=(2, 2))

returns Array(Int64, shape=(2, 2))

Desktop:

  • OS: macOS Sequoia 15.3
  • Python: 3.12
  • Pandera version: 0.22.1
  • Polars version: 1.22.0
hsuominen added the bug label Feb 11, 2025

hsuominen commented Feb 11, 2025

A slightly more minimal repro that might point to the source of the issue:

from pandera.engines.polars_engine import Engine
from pandera.engines import engine
import polars as pl

engine.Engine.dtype(Engine, pl.Array(pl.Int64(), shape=(2, 2)))

and one step deeper:

from pandera.engines.polars_engine import Array
Array.from_parametrized_dtype(pl.Array(pl.Int64(), shape=(2, 2)))

which looks like it points to the root cause:

dt = pl.Array(pl.Int64(), shape=(2, 2))

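Here dt.inner is Array(Int64, shape=(2,)), while dt.shape is (2, 2).

It looks like polars explicitly represents a multi-dimensional Array as nested Arrays: inner already holds the trailing dimensions, and shape reports the full shape. If pandera rebuilds the dtype from both inner and shape, the trailing dimensions are applied a second time, which would explain the (2, 2, 2) result.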
