feat(sql): support inserts with default constraints #9844

IndexSeek · 2024-08-15T01:40:07Z

Description of changes

This change supports inserting into a table whose columns have DEFAULT values, in which case the incoming object may have fewer columns than the target table schema.

It's common to insert into tables with DEFAULT constraints; in many cases, these are sequences, and it is a bit tricky to provide the sequence "nextval" approach today.

This topic was initially brought up in Zulip.

Here is an example of this in practice:

import ibis
import pandas as pd

con = ibis.duckdb.connect()
con.raw_sql("CREATE TABLE example (a INTEGER DEFAULT 1, b VARCHAR NOT NULL);")

df = pd.DataFrame(
    {
        "a": [1, 2, 3],
        "b": ["foo", "bar", "baz"],
    }
)

Backend error

If the user excludes a required column (e.g., NOT NULL without a DEFAULT) from the container used for the insert, the backend raises its own error.

con.insert("example", df[["a"]])
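The same failure mode can be reproduced without ibis. This is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for the backend (an assumption for illustration; the example above uses DuckDB, and each backend's error type and message will differ). The helper name try_insert_a_only is hypothetical.

```python
import sqlite3

# Stand-in backend: an in-memory SQLite database (not the DuckDB backend used above).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE example (a INTEGER DEFAULT 1, b VARCHAR NOT NULL)")


def try_insert_a_only(con):
    """Insert only column a; b is NOT NULL with no DEFAULT, so this should fail."""
    try:
        con.executemany("INSERT INTO example (a) VALUES (?)", [(1,), (2,), (3,)])
    except sqlite3.IntegrityError as exc:
        # The database engine, not the client, reports the violated constraint.
        return str(exc)
    return None


error_message = try_insert_a_only(con)
print(error_message)
```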

Successful insert with excluded columns

con.insert("example", df[["b"]])
con.table("example").to_pandas()

a	b
1	foo
1	bar
1	baz
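For comparison, here is roughly what the successful case looks like at the SQL level. This sketch again uses the built-in sqlite3 module as a stand-in (an assumption; it is not the DuckDB backend from the example), with an explicit column list so the database fills in the DEFAULT for the omitted column.

```python
import sqlite3

# Stand-in backend: an in-memory SQLite database.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE example (a INTEGER DEFAULT 1, b VARCHAR NOT NULL)")

# Only column b is listed, so column a falls back to its DEFAULT of 1.
con.executemany(
    "INSERT INTO example (b) VALUES (?)",
    [("foo",), ("bar",), ("baz",)],
)

rows = con.execute("SELECT a, b FROM example ORDER BY rowid").fetchall()
print(rows)
```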

I'm still working through the tests for this, but wanted to put this out here in case anyone wanted to take a look before I can return to it.

IndexSeek · 2024-08-16T02:47:12Z

I have made more progress on the tests. I got the Oracle backend passing, but it required a hacky solution to ensure the identifier was being quoted.

import ibis.backends.sql.compilers as sc

quoted = getattr(sc, con.dialect.__name__.lower()).compiler.quoted

It works, but I wonder if there is a better way.

I will look into the backends that are still failing: ClickHouse, Trino, Druid, and Exasol.

Edit:
I was able to get the ClickHouse query to compile by suffixing it with ORDER BY "a", but I noticed on another line that create table is not implemented, so I will mark this as notimpl as well.

cpcloud · 2024-08-23T21:51:28Z

ibis/backends/sql/__init__.py

+            columns=[
+                sg.to_identifier(col, quoted=quoted)
+                for col in columns
+                if col in source_cols


Let's make a variable for source.schema and then use col in variable to do this lookup.

I believe the change I made reduces the time complexity by avoiding a repeated check against the source_cols list. I have rewritten things a bit here, using a variable named columns to hold the column list used for the insert SQL expression.

ibis/backends/tests/test_client.py

IndexSeek · 2024-08-24T17:12:13Z

I swapped things around here to use issubset - my idea was that if the source contains fewer columns than the target, but the column names are the same, that's fine, as we can use the source's columns in the insert list.

issubset will also evaluate to True if the columns are the same in both source and target, regardless of order. In this scenario, we want to use the order of source, so that the SQL expression will be written properly.

So if my source schema is:

ibis.Schema {
  b  string
  a  int64
}

But my target schema is:

ibis.Schema {
  a  int64
  b  string
}

This is okay, because the underlying SQL expression comes out to:

INSERT INTO target (b, a)
SELECT * 
FROM source;

And if the target schema has a default constraint on column "a", and our source schema looks like:

ibis.Schema {
  b  string
}

the query will be written like so:

INSERT INTO target (b)
SELECT * 
FROM source;
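The rule described above can be sketched in plain Python. The helper names insert_column_list and render_insert are hypothetical and are not the ibis implementation; raising on unknown columns is an assumption made only to keep the sketch small.

```python
def insert_column_list(source_cols, target_cols):
    """Return the INSERT column list in source order (hypothetical helper).

    The source may have fewer columns than the target, or the same columns
    in a different order, as long as every source column exists in the target.
    """
    target = set(target_cols)
    unknown = [col for col in source_cols if col not in target]
    if unknown:
        # Assumption for this sketch: reject columns the target does not have.
        raise ValueError(f"columns not in target: {unknown}")
    return list(source_cols)


def render_insert(target_name, source_name, source_cols, target_cols):
    """Render the INSERT ... SELECT statement with quoted identifiers."""
    cols = ", ".join(f'"{col}"' for col in insert_column_list(source_cols, target_cols))
    return f'INSERT INTO "{target_name}" ({cols}) SELECT * FROM "{source_name}"'


# Same columns, different order: the source order is used.
reordered = render_insert("target", "source", ["b", "a"], ["a", "b"])
# Fewer columns: column a is left to its default constraint.
partial = render_insert("target", "source", ["b"], ["a", "b"])
print(reordered)
print(partial)
```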

IndexSeek · 2024-08-24T17:26:21Z

I have marked this one ready for review, but I have a couple of questions.

The Oracle backend is still failing, and I wasn't sure if there was a clean way to address that with a SQLGlot method to ensure quoting the identifiers in the Create expression from parse_one or if I should adjust the ct_sql variable to enclose the object identifiers in double-quotes. My concern is that the Snowflake backend may encounter a similar outcome when cloud runs are triggered.
Druid is marked "notyet" but fails because raw_sql is not supported. Should I include it in the list with Exasol?

ibis/backends/tests/test_client.py

cpcloud · 2024-08-25T12:46:16Z

I'll add an xfail marker to the Druid backend, since it doesn't support CREATE TABLE.

cpcloud · 2024-08-25T12:51:57Z

Oh, looks like you did that already. Let me see what's failing then.

cpcloud · 2024-08-25T12:54:11Z

@IndexSeek I improved the implementation a bit: we can use our Schema.keys() method (whose output behaves like a set) instead of constructing a set and calling issubset with a sequence (which itself will construct a set if the input argument isn't already a set).

This avoids constructing two throwaway sets, the effects of which can show up in wide-table use cases.
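The set-like behavior of a keys view can be illustrated with ordinary dicts. This sketch uses plain dicts as a stand-in for ibis schemas (an assumption for illustration); the schema contents are taken from the example earlier in the thread.

```python
# Plain dicts standing in for schemas (column name -> type name).
source_schema = {"b": "string"}
target_schema = {"a": "int64", "b": "string"}

# dict.keys() returns a set-like view, so subset comparison works directly
# without building a set from either side.
is_subset = source_schema.keys() <= target_schema.keys()

# The earlier approach: build a set, then pass a sequence to issubset.
is_subset_via_sets = set(source_schema).issubset(list(target_schema))

# Order does not matter for the comparison, only membership.
same_columns_reordered = {"b": "string", "a": "int64"}.keys() <= target_schema.keys()

print(is_subset, is_subset_via_sets, same_columns_reordered)
```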

IndexSeek · 2024-08-25T18:06:16Z

This is a very nice improvement. Thank you for the assistance here and the explanation, @cpcloud!

cpcloud · 2024-08-25T21:13:25Z

Running the cloud backend test suite, then if that's all green this is good to merge!

cpcloud · 2024-08-25T22:15:13Z

ibis/backends/tests/test_client.py

+    try:
+        db = getattr(con, "current_database", None)
+    except NotImplementedError:
+        db = None


I'm going to put up a separate PR to fix the reason why this is so gross.
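For context on why the snippet needs both a getattr default and a try/except: the default passed to getattr only covers AttributeError, so an attribute that exists but raises NotImplementedError still propagates. The sketch below assumes the attribute is a property that raises on some backends; all three classes and the helper current_database_or_none are hypothetical and are not ibis classes.

```python
class BackendWithoutAttribute:
    """Hypothetical backend with no current_database attribute at all."""


class BackendRaising:
    """Hypothetical backend whose property exists but is not implemented."""

    @property
    def current_database(self):
        raise NotImplementedError("no database concept")


class BackendWithDatabase:
    """Hypothetical backend that reports a current database."""

    @property
    def current_database(self):
        return "main"


def current_database_or_none(con):
    """Mirror the test snippet: fall back to None in both failure modes."""
    try:
        return getattr(con, "current_database", None)
    except NotImplementedError:
        return None


# The getattr default alone does not swallow NotImplementedError.
try:
    getattr(BackendRaising(), "current_database", None)
    propagated = False
except NotImplementedError:
    propagated = True

print(propagated)
```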

IndexSeek force-pushed the feat/insert-provided-columns-only branch from fc06dbe to 5c6d1eb Compare August 16, 2024 01:07

IndexSeek added 3 commits August 22, 2024 20:34

feat(sql): support inserts with default constraints

e493862

test(sql): quote object identifiers in missing column test

c1f595b

test(sql): mark clickhouse as notimpl

dc4a244

IndexSeek force-pushed the feat/insert-provided-columns-only branch from ade767e to dc4a244 Compare August 23, 2024 00:35

cpcloud reviewed Aug 23, 2024

test(sql): refactor sg_expr for parse_one

258271f

IndexSeek force-pushed the feat/insert-provided-columns-only branch from b302ab5 to 258271f Compare August 24, 2024 15:10

IndexSeek added 3 commits August 24, 2024 12:22

test(sql): refactor sg_expr for parse_one

5ec637a

refactor(sql): use subset to compare insert list

4357d9e

Merge branch 'feat/insert-provided-columns-only' of https://github.com/IndexSeek/ibis into feat/insert-provided-columns-only

ae9f298

IndexSeek marked this pull request as ready for review August 24, 2024 17:17

test(sql): not using temp table for flink

1f0a63a

IndexSeek changed the title ~~feat(sql): support inserts with default constraints [WIP]~~ feat(sql): support inserts with default constraints Aug 24, 2024

cpcloud requested changes Aug 24, 2024

test(sql): quote table and column identifiers

cabd583

IndexSeek force-pushed the feat/insert-provided-columns-only branch from 8c0a4b8 to cabd583 Compare August 24, 2024 20:58

test(sql): mark flink test notyet

a6f53b5

IndexSeek requested a review from cpcloud August 25, 2024 00:21

chore: avoid constructing a new set when we already have the schema keys

3fedd16

test(druid): fix xfail marker

187113e

cpcloud added 2 commits August 25, 2024 08:55

style: kill whitespace change

599c9ca

test: reuse temp_table to get automatic cleanup on either success or failure

040d00e

cpcloud approved these changes Aug 25, 2024

cpcloud added this to the 9.4 milestone Aug 25, 2024

cpcloud added the feature, ddl, and sql labels Aug 25, 2024

cpcloud added the ci-run-cloud Run BigQuery, Snowflake, Databricks, and Athena backend tests label Aug 25, 2024

ibis-docs-bot bot removed the ci-run-cloud Run BigQuery, Snowflake, Databricks, and Athena backend tests label Aug 25, 2024

test: ensure a catalog and database because bigquery requires it

f57d69c

IndexSeek commented Aug 25, 2024

test: continue fixing test

0c2a537

cpcloud reviewed Aug 25, 2024

fix(datafusion): raise instead of return

ccf4d02

cpcloud added the ci-run-cloud Run BigQuery, Snowflake, Databricks, and Athena backend tests label Aug 25, 2024

ibis-docs-bot bot removed the ci-run-cloud Run BigQuery, Snowflake, Databricks, and Athena backend tests label Aug 25, 2024

cpcloud merged commit 86a3c06 into ibis-project:main Aug 25, 2024
87 checks passed

IndexSeek deleted the feat/insert-provided-columns-only branch August 26, 2024 01:27
