Create new empty external table #7228
Comments
I think Option 2 would be ideal and the easiest to use. Maybe we could even add a HINT in the message if
I agree UX needs work. An additional problem is that the insert mode can only be chosen when registering a table from Rust:

```rust
ctx.register_json(
    "json_table_sink",
    out_path,
    NdJsonReadOptions::default().insert_mode(ListingTableInsertMode::AppendNewFiles),
)
.await?;
ctx.sql("insert into json_table_sink ....")
```

Registering a table via SQL doesn't expose this setting, so it always defaults to `ListingTableInsertMode::AppendToFile`.

I think Option 3 makes the most sense to solve these issues. My concern with Option 2 is that we would have to add a lot of new SQL grammar to support these and future write options (e.g. compression, Parquet row group size, etc.). For Parquet read configs, we have a lot of options controlled via session config, and we could follow a similar pattern for default write behaviors (both Parquet-specific and general write behaviors).

For Option 2, maybe some syntax like
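To make the session-config idea concrete, here is a minimal, dependency-free Rust sketch of "use the per-table setting if present, else the session default". `InsertMode`, `SessionWriteDefaults`, `TableWriteOptions`, and `effective_insert_mode` are hypothetical names for illustration only, not DataFusion APIs; `InsertMode` merely mirrors the two `ListingTableInsertMode` variants discussed in this comment.

```rust
/// Hypothetical stand-in for ListingTableInsertMode (illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InsertMode {
    AppendToFile,
    AppendNewFiles,
}

/// Hypothetical session-wide defaults for write behavior.
#[derive(Debug, Clone)]
pub struct SessionWriteDefaults {
    pub insert_mode: InsertMode,
}

impl Default for SessionWriteDefaults {
    fn default() -> Self {
        // Mirrors the behavior described above: SQL-registered tables append to a file.
        Self { insert_mode: InsertMode::AppendToFile }
    }
}

/// Hypothetical per-table options; `None` means the option was not set at registration.
#[derive(Debug, Clone, Default)]
pub struct TableWriteOptions {
    pub insert_mode: Option<InsertMode>,
}

/// An explicit per-table setting wins; otherwise fall back to the session default.
pub fn effective_insert_mode(session: &SessionWriteDefaults, table: &TableWriteOptions) -> InsertMode {
    table.insert_mode.unwrap_or(session.insert_mode)
}

fn main() {
    let session = SessionWriteDefaults { insert_mode: InsertMode::AppendNewFiles };

    // A table created via SQL has no per-table setting, so it inherits the session default.
    let from_sql = TableWriteOptions::default();
    assert_eq!(effective_insert_mode(&session, &from_sql), InsertMode::AppendNewFiles);

    // A table registered from Rust with an explicit mode keeps its own choice.
    let explicit = TableWriteOptions { insert_mode: Some(InsertMode::AppendToFile) };
    assert_eq!(effective_insert_mode(&session, &explicit), InsertMode::AppendToFile);
}
```

The point of this shape is that new write options (compression, row group size, and so on) only need a session-level default plus an optional override, rather than new SQL grammar for each one.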
I would like to implement this feature; I'll try to work on it this weekend. For now I prefer to implement Option 3. Any advice is welcome at any time. Thanks!
The recent work on data export looks awesome 👀
This syntax looks really nice 👍🏼
I think option 3 should be a relatively straightforward exercise of:
I did not intend to close this issue with my recent PR. Can we reopen it? #7276 provides a possible workaround to the issue here, but not a complete solution.
Is your feature request related to a problem or challenge?
Thanks to the great work from @metesynnada, @devinjdangelo, and others, it is now possible to have DataFusion insert data into CSV and JSON tables.
However, the user experience (UX) is tough, because it is not easy to write datasets: external tables require their location to already exist, even if empty.
So, for example, if I wanted to write JSON files to `/tmp/my_table`, I currently need to create the target file / directory externally.
Describe the solution you'd like
I would like to be able to have DataFusion create the targets directly.
Something like this, without any setup:
Describe alternatives you've considered
Option 1: Automatically create the target
One option is simply to remove the existence check on `CREATE EXTERNAL TABLE`.
The downside is that `CREATE EXTERNAL TABLE` errors when the target doesn't exist precisely to help people debug errors when reading.
Option 2: Add new DDL to `CREATE EXTERNAL TABLE`
Perhaps we could add a phrase to `CREATE EXTERNAL TABLE` like `FOR WRITE`.
The semantics would be: if the target file/directory doesn't already exist, then create it rather than error.
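A minimal sketch of the proposed `FOR WRITE` semantics using only the Rust standard library. `prepare_table_location` is a hypothetical helper, not part of DataFusion, and it assumes the table location is a local directory (as in the `/tmp/my_table` example); object stores and single-file targets are out of scope here.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical check run when planning `CREATE EXTERNAL TABLE`.
/// `for_write` models the proposed `FOR WRITE` phrase.
/// Assumes the location is a local directory.
pub fn prepare_table_location(location: &Path, for_write: bool) -> io::Result<()> {
    if location.exists() {
        // Existing targets behave exactly as they do today.
        return Ok(());
    }
    if for_write {
        // Proposed semantics: create the missing directory (and any parents) instead of erroring.
        fs::create_dir_all(location)
    } else {
        // Current semantics: a missing location is an error, which helps debug read queries.
        Err(io::Error::new(
            io::ErrorKind::NotFound,
            format!("table location {} does not exist", location.display()),
        ))
    }
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir().join("datafusion_for_write_sketch");
    let _ = fs::remove_dir_all(&dir); // start from a clean slate

    // Without FOR WRITE, a missing location is rejected.
    assert!(prepare_table_location(&dir, false).is_err());

    // With FOR WRITE, the location is created.
    prepare_table_location(&dir, true)?;
    assert!(dir.is_dir());

    fs::remove_dir_all(&dir)?;
    Ok(())
}
```

Keeping the error path unchanged when `for_write` is false preserves the debugging benefit described under Option 1.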
Option 3: Add a config parameter to control the default behavior
We could also add a config parameter like
with the same semantics as Option 2 (create the target if it doesn't exist).
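A minimal sketch of Option 3, assuming string-valued session settings such as those changed with a SQL `SET` statement. The actual parameter name is not given here, so `example.create_missing_table_location` is a hypothetical placeholder key, and `create_missing_enabled`, `decide`, and `LocationAction` are illustrative helpers, not DataFusion APIs. The flag defaults to `false`, so reads keep today's strict existence check unless a user opts in.

```rust
use std::collections::HashMap;

/// Hypothetical config key (placeholder; not a real DataFusion setting).
pub const CREATE_MISSING_KEY: &str = "example.create_missing_table_location";

#[derive(Debug, PartialEq, Eq)]
pub enum LocationAction {
    UseExisting,
    Create,
    Error,
}

/// Read the flag from string-valued session settings, defaulting to `false`.
pub fn create_missing_enabled(settings: &HashMap<String, String>) -> Result<bool, String> {
    match settings.get(CREATE_MISSING_KEY) {
        None => Ok(false),
        Some(v) => v
            .trim()
            .parse::<bool>()
            .map_err(|_| format!("invalid boolean for {}: {}", CREATE_MISSING_KEY, v)),
    }
}

/// Decide what to do with a table location given whether it exists and the session flag.
pub fn decide(location_exists: bool, create_missing: bool) -> LocationAction {
    match (location_exists, create_missing) {
        (true, _) => LocationAction::UseExisting,
        (false, true) => LocationAction::Create,
        (false, false) => LocationAction::Error,
    }
}

fn main() {
    let mut settings: HashMap<String, String> = HashMap::new();
    // Unset means the current strict behavior.
    assert_eq!(create_missing_enabled(&settings), Ok(false));

    // Opting in via the session setting changes the default for every new table.
    settings.insert(CREATE_MISSING_KEY.to_string(), "true".to_string());
    let enabled = create_missing_enabled(&settings).unwrap();
    assert_eq!(decide(false, enabled), LocationAction::Create);
    assert_eq!(decide(true, enabled), LocationAction::UseExisting);
}
```

Because the decision is made from session state rather than from the DDL, no new SQL grammar is needed, which matches the concern raised about Option 2.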
Additional context
No response