Misidentification of column data type leads to certain sheets being unreadable #43

Open
ivanfsimeon opened this issue Nov 10, 2024 · 3 comments

Comments

@ivanfsimeon

Sample column values:

ref_pfr_no
211110015349060
211150015009060
DIV
221400000671023
221170000731023
DIV

When a column's first values look numeric but later rows contain text, querying and displaying the column fails with the error "duckdb.duckdb.InvalidInputException: Invalid Input Error: Failed to cast value: Could not convert string 'DIV' to DOUBLE"

Adjusting the query to cast the values doesn't resolve the issue:
con.sql("SELECT ref_pfr_no::text FROM read_gsheet('', sheet='');").show()

@archiewood
Member

Yes, this is a bug.

Thanks for reporting.

Possible options:

  • read everything as VARCHAR and let users cast to other types
  • read more rows before deciding on a column's type
  • reuse whatever DuckDB's CSV sniffer does for type detection
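
To make the first option concrete, here is a minimal Python sketch, not the extension's actual code. It assumes every cell arrives as text and that the user applies a tolerant cast afterwards, similar in spirit to DuckDB's TRY_CAST. The helper name `try_cast_double` is hypothetical; the values are the sample from the report.

```python
from typing import List, Optional

# Sample values from the report, kept as text exactly as read from the sheet.
ref_pfr_no: List[str] = [
    "211110015349060",
    "211150015009060",
    "DIV",
    "221400000671023",
    "221170000731023",
    "DIV",
]


def try_cast_double(value: str) -> Optional[float]:
    """Hypothetical helper: return a float, or None when the text is not numeric."""
    try:
        return float(value)
    except ValueError:
        return None


# The read itself never fails because nothing is cast up front;
# only the user's explicit, tolerant cast decides what counts as a number.
as_doubles = [try_cast_double(v) for v in ref_pfr_no]
```

With this approach the mixed column stays readable as text, and rows such as 'DIV' become missing values only if the user asks for a numeric view.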

@archiewood
Member

@Alex-Monahan makes the point that we are already looping through every row for bool/double processing, so we might as well use that same pass for column type identification!
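
A rough sketch of what a full-column pass could look like, written in Python for illustration only. The function name `infer_column_type`, the three type labels, and the rule that blank cells are ignored are all assumptions, not the extension's implementation.

```python
from typing import Iterable


def infer_column_type(values: Iterable[str]) -> str:
    """Return "BOOLEAN", "DOUBLE", or "VARCHAR" after checking every value."""
    could_be_bool = True
    could_be_double = True
    seen_any = False
    for value in values:
        if value == "":
            continue  # assumption: blank cells are missing and do not vote
        seen_any = True
        if value.upper() not in ("TRUE", "FALSE"):
            could_be_bool = False
        try:
            float(value)
        except ValueError:
            could_be_double = False
        if not (could_be_bool or could_be_double):
            return "VARCHAR"  # one non-conforming cell settles the type
    if not seen_any:
        return "VARCHAR"  # empty column: fall back to text
    return "BOOLEAN" if could_be_bool else "DOUBLE"


sample = ["211110015349060", "211150015009060", "DIV",
          "221400000671023", "221170000731023", "DIV"]
column_type = infer_column_type(sample)
```

For the sample column from the report, the third value 'DIV' rules out both BOOLEAN and DOUBLE, so the column would be typed as VARCHAR instead of failing later at cast time.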

@mharrisb1
Contributor

This is partially solved now with #55


3 participants