bug: ibis.read_csv errors with "parsy.ParseError: expected 'EOF' at 0:3" when specifying columns argument #10695

Closed
1 task done
amoeba opened this issue Jan 21, 2025 · 3 comments · Fixed by #10696
Labels
bug Incorrect behavior inside of ibis

Comments

amoeba commented Jan 21, 2025

What happened?

Code that works fine under Ibis 8.0.0 no longer works under Ibis 9.5.0, apparently because it specifies an INTEGER column type. My actual code is different, but I was able to reduce it to this:

>>> ibis.read_csv("doesntexist.csv", columns = {"whatever": "INTEGER"})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/bryce/tmp/ibis-read-csv-bug/.venv/lib/python3.12/site-packages/ibis/expr/api.py", line 1437, in read_csv
    return con.read_csv(sources, table_name=table_name, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/bryce/tmp/ibis-read-csv-bug/.venv/lib/python3.12/site-packages/ibis/backends/duckdb/__init__.py", line 765, in read_csv
    options.append(C.columns.eq(make_struct_argument(columns)))
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/bryce/tmp/ibis-read-csv-bug/.venv/lib/python3.12/site-packages/ibis/backends/duckdb/__init__.py", line 752, in make_struct_argument
    typ = dt.dtype(typ)
          ^^^^^^^^^^^^^
  File "/Users/bryce/tmp/ibis-read-csv-bug/.venv/lib/python3.12/site-packages/ibis/common/dispatch.py", line 140, in call
    return impl(arg, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/bryce/tmp/ibis-read-csv-bug/.venv/lib/python3.12/site-packages/ibis/expr/datatypes/core.py", line 77, in from_string
    return DataType.from_string(value)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/bryce/tmp/ibis-read-csv-bug/.venv/lib/python3.12/site-packages/ibis/expr/datatypes/core.py", line 172, in from_string
    return parse(value)
           ^^^^^^^^^^^^
  File "/Users/bryce/tmp/ibis-read-csv-bug/.venv/lib/python3.12/site-packages/ibis/expr/datatypes/parse.py", line 210, in parse
    return ty.parse(text)
           ^^^^^^^^^^^^^^
  File "/Users/bryce/tmp/ibis-read-csv-bug/.venv/lib/python3.12/site-packages/parsy/__init__.py", line 98, in parse
    (result, _) = (self << eof).parse_partial(stream)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/bryce/tmp/ibis-read-csv-bug/.venv/lib/python3.12/site-packages/parsy/__init__.py", line 112, in parse_partial
    raise ParseError(result.expected, stream, result.furthest)
parsy.ParseError: expected 'EOF' at 0:3

Some notes:

  • It doesn't matter if a file exists at the specified path
  • The error doesn't happen if I don't pass the columns kwarg
  • The error doesn't happen if I specify STRING instead of INTEGER
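
A note on the error position: `0:3` reads as line 0, column 3, which is consistent with the parser having consumed `INT` and then expecting end of input where `EGER` follows. The toy sketch below reproduces that message shape with a simple prefix matcher. `KNOWN_TYPES` and `parse_type` are hypothetical names made up for illustration; this is not ibis's actual grammar or API, only a model of the suspected failure mode.

```python
import re

# Hypothetical subset of type names, longest first; not ibis's real grammar.
KNOWN_TYPES = ("int64", "int32", "int", "string")


def parse_type(text: str) -> str:
    """Match one known type name at the start of text, then require end of input."""
    for name in KNOWN_TYPES:
        match = re.match(re.escape(name), text, flags=re.IGNORECASE)
        if match:
            pos = match.end()
            if pos != len(text):
                # A prefix matched but input remains, e.g. "INT" + "EGER".
                raise ValueError(f"expected 'EOF' at 0:{pos}")
            return name
    raise ValueError("expected one of " + ", ".join(KNOWN_TYPES) + " at 0:0")


if __name__ == "__main__":
    print(parse_type("STRING"))  # fully matched, case-insensitively
    try:
        parse_type("INTEGER")
    except ValueError as exc:
        print(exc)  # only the "INT" prefix is recognised, so EOF is expected at column 3
```

If this reading is right, it would also explain why `STRING` is accepted: the whole string matches a known name, so nothing is left over before end of input.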

What version of ibis are you using?

  • 8.0.0: Bug doesn't show up
  • 9.5.0: Bug does show up

What backend(s) are you using, if any?

DuckDB

Relevant log output

$ uv python pin 3.12
Pinned `.python-version` to `3.12`

~/tmp/ibis-read-csv-bug
$ uv venv
Using CPython 3.12.7
Creating virtual environment at: .venv
Activate with: source .venv/bin/activate.fish

~/tmp/ibis-read-csv-bug
$ source .venv/bin/activate.fish

~/tmp/ibis-read-csv-bug
.venv $ uv pip install "ibis-framework[duckdb]==8.0.0"
Resolved 25 packages in 85ms
Installed 25 packages in 109ms
 + atpublic==4.1.0
 + bidict==0.23.1
 + duckdb==0.10.3
 + duckdb-engine==0.15.0
 + ibis-framework==8.0.0
 + markdown-it-py==3.0.0
 + mdurl==0.1.2
 + multipledispatch==1.0.0
 + numpy==1.26.4
 + packaging==24.2
 + pandas==2.2.3
 + parsy==2.1
 + pyarrow==15.0.2
 + pyarrow-hotfix==0.6
 + pygments==2.19.1
 + python-dateutil==2.9.0.post0
 + pytz==2024.2
 + rich==13.9.4
 + six==1.17.0
 + sqlalchemy==2.0.37
 + sqlalchemy-views==0.3.2
 + sqlglot==20.11.0
 + toolz==0.12.1
 + typing-extensions==4.12.2
 + tzdata==2024.2

~/tmp/ibis-read-csv-bug
.venv $ python
Python 3.12.7 (main, Oct 16 2024, 07:12:08) [Clang 18.1.8 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import ibis
>>> ibis.__version__
'8.0.0'
>>> ibis.read_csv("data.csv", columns = {"whatever": "INTEGER"})
DatabaseTable: ibis_read_csv_kz4cq22gqnbbha65cs2zejizfa
  whatever int32
>>> ^D

~/tmp/ibis-read-csv-bug 25s
.venv $ uv pip install "ibis-framework[duckdb]==9.5.0"
Resolved 20 packages in 12ms
Uninstalled 2 packages in 400ms
Installed 2 packages in 24ms
 - ibis-framework==8.0.0
 + ibis-framework==9.5.0
 - sqlglot==20.11.0
 + sqlglot==25.20.2

~/tmp/ibis-read-csv-bug
.venv $ python
Python 3.12.7 (main, Oct 16 2024, 07:12:08) [Clang 18.1.8 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import ibis
>>> ibis.__version__
'9.5.0'
>>> ibis.read_csv("data.csv", columns = {"whatever": "INTEGER"})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/bryce/tmp/ibis-read-csv-bug/.venv/lib/python3.12/site-packages/ibis/expr/api.py", line 1437, in read_csv
    return con.read_csv(sources, table_name=table_name, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/bryce/tmp/ibis-read-csv-bug/.venv/lib/python3.12/site-packages/ibis/backends/duckdb/__init__.py", line 765, in read_csv
    options.append(C.columns.eq(make_struct_argument(columns)))
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/bryce/tmp/ibis-read-csv-bug/.venv/lib/python3.12/site-packages/ibis/backends/duckdb/__init__.py", line 752, in make_struct_argument
    typ = dt.dtype(typ)
          ^^^^^^^^^^^^^
  File "/Users/bryce/tmp/ibis-read-csv-bug/.venv/lib/python3.12/site-packages/ibis/common/dispatch.py", line 140, in call
    return impl(arg, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/bryce/tmp/ibis-read-csv-bug/.venv/lib/python3.12/site-packages/ibis/expr/datatypes/core.py", line 77, in from_string
    return DataType.from_string(value)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/bryce/tmp/ibis-read-csv-bug/.venv/lib/python3.12/site-packages/ibis/expr/datatypes/core.py", line 172, in from_string
    return parse(value)
           ^^^^^^^^^^^^
  File "/Users/bryce/tmp/ibis-read-csv-bug/.venv/lib/python3.12/site-packages/ibis/expr/datatypes/parse.py", line 210, in parse
    return ty.parse(text)
           ^^^^^^^^^^^^^^
  File "/Users/bryce/tmp/ibis-read-csv-bug/.venv/lib/python3.12/site-packages/parsy/__init__.py", line 98, in parse
    (result, _) = (self << eof).parse_partial(stream)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/bryce/tmp/ibis-read-csv-bug/.venv/lib/python3.12/site-packages/parsy/__init__.py", line 112, in parse_partial
    raise ParseError(result.expected, stream, result.furthest)
parsy.ParseError: expected 'EOF' at 0:3
>>> ibis.read_csv("data.csv", columns = {"whatever": "STRING"})
DatabaseTable: ibis_read_csv_33s6j4mt55horeoglmqdct7jma
  whatever string

Code of Conduct

  • I agree to follow this project's Code of Conduct
@amoeba amoeba added the bug Incorrect behavior inside of ibis label Jan 21, 2025

cpcloud commented Jan 21, 2025

Interesting, thanks for the report, I can reproduce it!

cpcloud commented Jan 21, 2025

It looks like this was introduced along with the ability to pass the columns and types arguments in c1dcf67, and the parsing bit hasn't changed since it was merged.

This is good news: because the implementation has been slightly broken ever since it was introduced, fixing it won't break any code that wasn't already failing after 8.0.0.

amoeba commented Jan 21, 2025

Sounds great. Thanks for the quick look and fix.

@github-project-automation github-project-automation bot moved this from backlog to done in Ibis planning and roadmap Jan 22, 2025
@github-actions github-actions bot added this to the 10.0 milestone Jan 22, 2025