Fix datasets task 7309: pin pyarrow < 20.0 for geoparquet compatibility #43

Merged
akhatua2 merged 1 commit into cooperbench:main from AlienKevin:fix/datasets-t7309-pyarrow
Mar 21, 2026

@AlienKevin
Contributor

Summary

  • The Dockerfile for datasets task 7309 currently pins pyarrow==20.0.0
  • PyArrow 20.0 changed the default Arrow string type mapping, causing geoparquet geometry columns to report as large_string instead of string
  • The pre-existing test test_parquet_read_geoparquet asserts dataset.features[feature].dtype == 'string', which fails with large_string
  • This breaks all feature pairs for this task (both feature 1 and feature 2 tests fail on the same pre-existing test)

Fix

Change the pyarrow pin in the Dockerfile from:

RUN uv pip uninstall --system pyarrow && uv pip install --system "pyarrow==20.0.0"

to:

RUN uv pip uninstall --system pyarrow && uv pip install --system "pyarrow>=17.0.0,<20.0.0"

The Docker image (akhatua/cooperbench-huggingface-datasets:task7309) needs to be rebuilt and pushed to DockerHub after this change.
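After the rebuild, it may be worth confirming that the image actually resolved to a pyarrow release inside the new range. The following is a minimal sketch of such a check, not part of this PR: the helper names `parse_version` and `satisfies_pin` are hypothetical, and the bounds simply mirror the `>=17.0.0,<20.0.0` specifier above. It compares only the numeric release components, so a pre-release of 20.0.0 is treated as outside the range.

```python
import re


def parse_version(text):
    """Return the leading numeric release components as a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def satisfies_pin(version, lower=(17, 0, 0), upper=(20, 0, 0)):
    """True when lower <= version < upper (hypothetical helper, mirrors the pin)."""
    release = parse_version(version)
    # Pad short versions such as "18" to three components before comparing.
    release = release + (0,) * (3 - len(release))
    return lower <= release < upper


if __name__ == "__main__":
    try:
        import pyarrow
    except ImportError:
        print("pyarrow is not installed")
    else:
        status = "within" if satisfies_pin(pyarrow.__version__) else "outside"
        print(f"pyarrow {pyarrow.__version__} is {status} >=17.0.0,<20.0.0")
```

Running this script inside the rebuilt container prints whether the installed version falls within the pinned range; it does not replace running the oracle test itself.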

Test plan

  • Rebuild Docker image with the new pyarrow pin (>=17.0.0,<20.0.0)
  • Oracle test passes for datasets task 7309 (both features)

PyArrow 20.0 changed default string type mapping, causing geoparquet
geometry columns to report as 'large_string' instead of 'string'. The
pre-existing test_parquet_read_geoparquet asserts dtype == 'string',
breaking all feature pairs for this task.

Pin pyarrow to >=17.0.0,<20.0.0 to keep a modern version while avoiding
the large_string default change. The Docker image needs to be rebuilt.

@akhatua2 merged commit ef6b3e6 into cooperbench:main on Mar 21, 2026
3 checks passed