RFC: Implement initial support for COPY ... TO ... statement #6313
@@ -450,6 +457,36 @@ impl SessionContext {
self.read_batch(record_batch)
}
+ // Execute a COPY TO statement, returning the number of rows
Note that this is the same mechanism used by CREATE TABLE AS SELECT (aka LogicalPlan::CreateMemTable). It is different from the mechanism used by INSERT INTO ... (added in #5520 by @metesynnada), which uses an ExecutionPlan.
The difference bothers me, but I can see the benefits of both approaches.
I was playing with this more this evening and I think I came up with something halfway between the two that I like even better. I will keep iterating and report back.
I think we have something you may be able to leverage; check this PR out (https://github.com/synnada-ai/arrow-datafusion/pull/89). It extends the ExecutionPlan approach to writing files, and I think you can leverage that work here too. With that change, COPY TO and INSERT INTO will use the same ExecutionPlan-based approach; the only difference would be related to appending vs overwriting.
FYI, if you are curious about timing, we plan to finalize and submit to upstream in a week or so.
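As a rough illustration of the idea above (not code from either PR), the sketch below models a single write node shared by both statements, differing only in an append-vs-overwrite flag. Every name here (`WriteMode`, `WritePlan`, `Storage`, `plan_insert_into`, `plan_copy_to`) is hypothetical and is not part of DataFusion's API; rows are plain integers and storage is an in-memory map, purely to keep the example self-contained.

```rust
use std::collections::HashMap;

/// Hypothetical: how a write treats rows already present at the target.
#[derive(Debug, Clone, Copy, PartialEq)]
enum WriteMode {
    /// INSERT INTO: keep existing rows and add the new ones.
    Append,
    /// COPY ... TO: replace whatever is at the target.
    Overwrite,
}

/// Hypothetical stand-in for a physical plan node that writes its input.
struct WritePlan {
    target: String,
    mode: WriteMode,
    input: Vec<i64>,
}

/// Hypothetical stand-in for storage: target name -> rows.
type Storage = HashMap<String, Vec<i64>>;

impl WritePlan {
    /// Runs the write and returns the number of rows written.
    fn execute(&self, storage: &mut Storage) -> usize {
        let rows = storage.entry(self.target.clone()).or_default();
        if self.mode == WriteMode::Overwrite {
            rows.clear();
        }
        rows.extend_from_slice(&self.input);
        self.input.len()
    }
}

/// Both statements plan to the same node; only the mode differs.
fn plan_insert_into(table: &str, input: Vec<i64>) -> WritePlan {
    WritePlan { target: table.to_string(), mode: WriteMode::Append, input }
}

fn plan_copy_to(path: &str, input: Vec<i64>) -> WritePlan {
    WritePlan { target: path.to_string(), mode: WriteMode::Overwrite, input }
}

fn main() {
    let mut storage = Storage::new();
    storage.insert("t".to_string(), vec![1, 2]);

    let appended = plan_insert_into("t", vec![3]).execute(&mut storage);
    println!("appended {} row(s): {:?}", appended, storage["t"]);

    let copied = plan_copy_to("t", vec![9, 8]).execute(&mut storage);
    println!("copied {} row(s): {:?}", copied, storage["t"]);
}
```

In this toy model, the row count returned by `execute` plays the role of the "number of rows" mentioned in the diff comment; a real plan would stream record batches rather than hold a `Vec`.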
Thanks @ozankabak -- I have reviewed https://github.com/synnada-ai/arrow-datafusion/pull/89 and thought about how to incorporate the same structure.
I really like your idea to use the same plans for COPY TO and INSERT INTO. After some more thought, I have an idea of how to plan COPY statements using the same plans as an INSERT.
Here is a proposal for a simplified API: #6339
I'll try and bang out a PR shortly.
Thank you for the heads up - I will study the PR you mention.
I have enough feedback for now and I am working to add this functionality in pieces, so there is no need for this PR now.
Which issue does this PR close?
Closes #5654
Closes #5988
Rationale for this change
What changes are included in this PR?
(I'll try and break this up into smaller pieces for easier review but I want to show it all working together)
- COPY .. TO ... statements
- LogicalPlan::CopyTo variant
- [ ] Parser tests
- [ ] Add end user documentation
- [ ] Properly support writing single parquet files
- [ ] sqllogictests
I also plan to file follow-on tasks (like support for other file formats, options, etc.).
Are these changes tested?
Yes
Are there any user-facing changes?
Yes, there is a new (documented) statement.