Flink: SQL support for dynamic iceberg sink #15279
swapna267 wants to merge 6 commits into apache:main
Conversation
mxm left a comment:
Thanks @swapna267! This looks great.
```java
}

@TestTemplate
public void testCreateDynamicIcebergSink() throws DatabaseAlreadyExistException {
```
Could we verify this test works with both the old FlinkSink and the new IcebergSink?
This test in particular is testing the DynamicIcebergSink only, by setting use-dynamic-iceberg-sink to true.
But I also see that TestIcebergConnector is not testing the new IcebergSink code path. It is partially covered in TestFlinkTableSink (where Iceberg tables are created in the Iceberg catalog).
If my understanding is right, I'd prefer to put that into a separate PR.
That makes sense. The test is fine as-is.
```java
String dynamicRecordGeneratorImpl =
    flinkConf.get(FlinkCreateTableOptions.DYNAMIC_RECORD_GENERATOR_IMPL);
Preconditions.checkNotNull(
    dynamicRecordGeneratorImpl,
    "%s must be specified when use-dynamic-iceberg-sink is true",
    FlinkCreateTableOptions.DYNAMIC_RECORD_GENERATOR_IMPL.key());
```
Should we add a test to verify these conditions?
Sure, I can add one. I don't see such detailed tests in general, and I'm also concerned about the time the tests already take to complete.
We have many such tests for the Dynamic Sink. Not specifying the record generator will probably error when it's being created, but it would still be nice to check for the particular error message reported back to the user. I'll leave it up to you to add it or not.
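A minimal sketch of such a negative test, assuming the `sql(...)` helper from the Iceberg Flink test base and AssertJ's `assertThatThrownBy` (both assumptions, as is the table name):

```java
@TestTemplate
public void testDynamicSinkRequiresRecordGenerator() {
  sql(
      "CREATE TABLE t_missing_generator (id INT) WITH ("
          + "'connector'='iceberg', 'use-dynamic-iceberg-sink'='true')");
  // Per the review comment above, the check is expected to fire when the sink
  // is created (i.e. when planning the INSERT), not on CREATE TABLE itself.
  assertThatThrownBy(() -> sql("INSERT INTO t_missing_generator VALUES (1)"))
      .hasMessageContaining("must be specified when use-dynamic-iceberg-sink is true");
}
```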
Guosmilesmile left a comment:
Thanks for the PR! Left some comments.
flink/v2.1/flink/src/main/java/org/apache/iceberg/flink/IcebergTableSink.java (outdated; thread resolved)
```java
private TableCreator createTableCreator() {
  final Map<String, String> tableProperties =
      org.apache.iceberg.util.PropertyUtil.propertiesWithPrefix(writeProps, "table.props.");
```
If I’m not mistaken, if we want to set the table property write.parquet.row-group-size-bytes, do we need to specify it here as table.props.write.parquet.row-group-size-bytes? I think this should be documented and we should add a corresponding test case.
Yes, right. When doing CREATE TABLE in the Flink catalog, we pass the catalog configuration in here.
The table.props. prefix is used to separate out the physical Iceberg table properties.
Basic documentation about the connector is here:
https://iceberg.apache.org/docs/nightly/flink-connector/
Once we have all the functionality (the dynamic record generator impl is coming in the next PR), I will add details there.
I combined this into an existing test case:
https://github.com/swapna267/iceberg/blob/bd2d500f07fb24d05111b6dabc9a8e77637a922c/flink/v2.1/flink/src/test/java/org/apache/iceberg/flink/TestIcebergConnector.java#L393
I can pull it out into a separate one if we think it's required.
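To illustrate the prefix handling (a sketch; the property values are illustrative): PropertyUtil.propertiesWithPrefix keeps only the entries whose keys start with the prefix and strips the prefix from them, so table.props.* connector options become plain Iceberg table properties.

```java
import java.util.Map;
import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
import org.apache.iceberg.util.PropertyUtil;

Map<String, String> writeProps =
    ImmutableMap.of(
        "table.props.write.parquet.row-group-size-bytes", "134217728",
        "write-parallelism", "4");

// Keeps the "table.props."-prefixed entry and strips the prefix:
// {"write.parquet.row-group-size-bytes" -> "134217728"}
Map<String, String> tableProperties =
    PropertyUtil.propertiesWithPrefix(writeProps, "table.props.");
```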
flink/v2.1/flink/src/main/java/org/apache/iceberg/flink/FlinkDynamicTableFactory.java (thread resolved)
Thanks @mxm and @Guosmilesmile for the review. Replied to some comments.
```java
    .getCatalogLoader()
    .loadCatalog()
    .loadTable(TableIdentifier.of(databaseName(), tableName()));
assertThat(table.properties()).containsEntry("key1", "val1");
```
Could we also verify the records written to the table?
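One way to do that (a sketch; the expected row count is illustrative): read the table back with Iceberg's generics API and assert on the rows.

```java
// Assumed imports: org.apache.iceberg.data.IcebergGenerics,
// org.apache.iceberg.data.Record, org.apache.iceberg.io.CloseableIterable.
try (CloseableIterable<Record> rows = IcebergGenerics.read(table).build()) {
  assertThat(rows).hasSize(1);
}
```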
LGTM, just some minor comments.
```java
        tableLoader, resolvedSchema, context.getConfiguration(), writeProps);
return new IcebergTableSink(
    tableLoader,
    resolvedCatalogTable.getResolvedSchema(),
```
Why did we remove the filtering for the physical columns?
```java
    Context context, Configuration flinkConf, Map<String, String> writeProps) {
  String dynamicRecordGeneratorImpl =
      flinkConf.get(FlinkCreateTableOptions.DYNAMIC_RECORD_GENERATOR_IMPL);
  Preconditions.checkNotNull(
```
I have received several comments that, instead of checkNotNull, we should use checkArgument with a message like "Invalid dynamic record generator value: %s. %s must be specified when use-dynamic-iceberg-sink is true".
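A sketch of the suggested form (the message wording is from the comment above; passing the null value into the template is an assumption):

```java
// checkArgument throws IllegalArgumentException rather than NullPointerException,
// which matches the "invalid connector option" semantics.
Preconditions.checkArgument(
    dynamicRecordGeneratorImpl != null,
    "Invalid dynamic record generator value: %s. %s must be specified when use-dynamic-iceberg-sink is true",
    dynamicRecordGeneratorImpl,
    FlinkCreateTableOptions.DYNAMIC_RECORD_GENERATOR_IMPL.key());
```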
```java
return new IcebergTableSink(
    catalogLoader,
    dynamicRecordGeneratorImpl,
    resolvedCatalogTable.getResolvedSchema(),
```
Do we need something like this?
```java
// Keep only physical columns, so computed and metadata columns are not handed to the sink.
ResolvedSchema resolvedSchema =
    ResolvedSchema.of(
        resolvedCatalogTable.getResolvedSchema().getColumns().stream()
            .filter(Column::isPhysical)
            .collect(Collectors.toList()));
```
```java
private static FlinkCatalog createCatalogLoader(
    Map<String, String> tableProps, String catalogName) {
  Preconditions.checkNotNull(
```
Use checkArgument and the "standard error message" here too.
I see that this is only a move for this check. Do you think it would cause any issues if we change this to the new standard?
```java
org.apache.hadoop.conf.Configuration hadoopConf = FlinkCatalogFactory.clusterHadoopConf();
FlinkCatalogFactory factory = new FlinkCatalogFactory();
return (FlinkCatalog) factory.createCatalog(catalogName, tableProps, hadoopConf);
```
Nit: just to get rid of the fully qualified org.apache.hadoop.conf.Configuration:
Suggested change:
```java
// before
org.apache.hadoop.conf.Configuration hadoopConf = FlinkCatalogFactory.clusterHadoopConf();
FlinkCatalogFactory factory = new FlinkCatalogFactory();
return (FlinkCatalog) factory.createCatalog(catalogName, tableProps, hadoopConf);

// after
FlinkCatalogFactory factory = new FlinkCatalogFactory();
return (FlinkCatalog) factory.createCatalog(catalogName, tableProps, FlinkCatalogFactory.clusterHadoopConf());
```
Nice stuff @swapna267! Could you please update the documentation too?
This PR introduces a SQL table connector for using the dynamic Iceberg sink.
Two new configuration options have been added to FlinkCreateTableOptions:
- use-dynamic-iceberg-sink: when true, the connector writes through the DynamicIcebergSink instead of the regular sink.
- DYNAMIC_RECORD_GENERATOR_IMPL: the implementation class used to generate dynamic records; it must be specified when use-dynamic-iceberg-sink is true.
Example SQL:
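A sketch of what the DDL could look like, run here through the Flink Table API. Only use-dynamic-iceberg-sink appears verbatim in this PR's discussion; the dynamic-record-generator-impl key, the catalog options, and the generator class name are assumptions for illustration:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inStreamingMode());
tEnv.executeSql(
    "CREATE TABLE dynamic_sink (data STRING) WITH ("
        + " 'connector' = 'iceberg',"
        + " 'catalog-name' = 'hive_catalog',"  // assumed catalog setup
        + " 'catalog-type' = 'hive',"
        + " 'use-dynamic-iceberg-sink' = 'true',"
        + " 'dynamic-record-generator-impl' = 'com.example.MyRecordGenerator',"  // hypothetical key and class
        + " 'table.props.write.parquet.row-group-size-bytes' = '134217728'"  // Iceberg table property via prefix
        + ")");
```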
Planning to provide a CustomVariantToDynamicRecordGenerator that can handle a Flink VARIANT-typed column and generate records of different schemas, each landing in a table with the corresponding schema.
Will add that in a different PR.