-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add documentation for Iceberg support in PrestoCPP #24741
Add documentation for Iceberg support in PrestoCPP #24741
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the doc! Some nits and suggestions, nothing major.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @agrawalreetika! I see PrestoCPP and Presto C++ were mixed. Let's unify them to Presto C++
are ``HIVE``, ``HADOOP``, and ``NESSIE`` and ``REST``. | ||
|
||
``iceberg.hadoop.config.resources`` The path(s) for Hadoop configuration resources. | ||
``iceberg.hadoop.config.resources`` The path(s) for Hadoop configuration resources. Yes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Presto C++ Support value is missing
as a join with the data of the equality delete files. | ||
|
||
``iceberg.enable-parquet-dereference-pushdown`` Enable parquet dereference pushdown. ``true`` | ||
``iceberg.enable-parquet-dereference-pushdown`` Enable parquet dereference pushdown. ``true`` Yes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Presto C++ Support value is missing
statistics file cache. | ||
======================================================= ============================================================= ============ | ||
======================================================= ============================================================= ================================== =================== ========================================= | ||
|
||
Table Properties |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please also add a section for the C++ support for Table Properties? Just saying it's not suported because write is not implemented yet
@@ -1511,13 +1587,23 @@ schema evolution, such as adding, dropping, and renaming columns. With schema | |||
evolution, users can evolve a table schema with SQL after enabling the Presto | |||
Iceberg connector. | |||
|
|||
Presto C++ Support |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is for Partition Column Transform. Could you please add Presto C++ support as well? Thanks!
|
||
* Supports reading and writing of DWRF and PARQUET file formats, supports reading ORC file format. | ||
Hive Connector - Iceberg Table Support |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the following structure would be clearer
Supported Connectors
- Hive connector
- Hive table support
- Iceberg table support
- TPCH connector
- ....
- Fuzzer connector
aa5b2d3
to
76c9f55
Compare
@steveburnett @yingsu00 Thanks for your review. I have made the changes based on your comments. Please have a look at your convenience. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the doc, it's very helpful. Only a couple of nit and little thing.
|
||
``SELECT`` Yes Yes Read is supported in Presto C++ including those with positional delete files. | ||
|
||
``INSERT INTO`` Yes No |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: duplicate with line 1155.
|
||
* Only read operations are supported for Iceberg tables. | ||
|
||
* The Iceberg connector supports both V1 and V2 tables, including those with positional delete files, but does not support equality delete files. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From a reader's perspective, I'm a little confused about the Iceberg connector
here, do you think it's more appropriate to use something like Hive connector for Iceberg
or something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we need to create an iceberg catalog in Prestissimo as well for accessing Iceberg tables but the underline implementation on Velox is via Hive connector itself which is why I thought of keeping Hive Connector
as main header and then different table support.
I am open for suggestions here whatever we all think is more clear.
@yingsu00 What's your opinion on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I didn't describe it clearly. What I mean is, the content structure Hive connector/Iceberg Table Support
is no problem, just the words Iceberg connector
here may bring a little confusion. If I were a newer, I might wonder whether there is an Iceberg connector
for presto C++ that I haven't noticed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@agrawalreetika : If we use Java side Iceberg connector or catalog code, then its better to call it Iceberg Connector. The reuse of HiveConnector is an internal detail imo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could call it Iceberg connector because it's in Presto C++ point of view, not Velox point of view. But it would be helpful to explain that the Presto C++ Iceberg connector and catalog is backed up by Velox HiveConnector.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hantangwangd Thanks for the clarification. So here as @aditi-pandit & @yingsu00 mentioned from Presto C++ point of view it's going to be a different iceberg connector with something like this -
connector.name=iceberg
That is why I thought calling it iceberg connector would be better in the Presto C++ document.
@hantangwangd @aditi-pandit @yingsu00 Please take a look at the document, its already updated based on the points we discussed here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got the point, thank you all for your perspective @aditi-pandit @yingsu00 @agrawalreetika.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update! I pulled the updated branch, ran a new local doc build, and reviewed again.
I found a few nits that aren't directly related to your new content here that I hope we can fix here, I don't think that fixing them will add a lot of work or delay the PR.
76c9f55
to
ae78e53
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! (docs)
Pull updated branch, new local doc build, looks good. Thank you!
as a join with the data of the equality delete files. | ||
|
||
``iceberg.enable-parquet-dereference-pushdown`` Enable parquet dereference pushdown. ``true`` | ||
``iceberg.enable-parquet-dereference-pushdown`` Enable parquet dereference pushdown. ``true`` Yes NA |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it NA? If it's fails now, just say No
|
||
Example: ``/etc/hadoop/conf/core-site.xml.`` This property | ||
is required if the iceberg.catalog.type is ``hadoop``. | ||
Otherwise, it will be ignored. | ||
|
||
``iceberg.file-format`` The storage file format for Iceberg tables. The available ``PARQUET`` | ||
``iceberg.file-format`` The storage file format for Iceberg tables. The available ``PARQUET`` Yes NA |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a little explanation here why it's NA? We can say "NA, write is not supported yet"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same for other NA items. Thanks!
updates. | ||
|
||
``iceberg.delete-as-join-rewrite-enabled`` When enabled, equality delete row filtering is applied ``true`` | ||
``iceberg.delete-as-join-rewrite-enabled`` When enabled, equality delete row filtering is applied ``true`` Yes Yes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think Presto C++ supports it. Presto C++ do not interpret the equality delete file as joins, but would compile it into domain filters or filter functions. Also, reading equality delete file PR is not merged yet. Just say No for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I missed equality delete here.
|
||
``iceberg.rows-for-metadata-optimization-threshold`` The maximum number of partitions in an Iceberg table to ``1000`` | ||
``iceberg.rows-for-metadata-optimization-threshold`` The maximum number of partitions in an Iceberg table to ``1000`` Yes Yes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this one on coordinator only?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I checked in code its usage is in IcebergMetadataOptimizer so I am assuming this is coordinator only. Do you think, I should check anything more on this?
===================================================== ======================================================================= =================== ================== | ||
Property Name Description Presto Java Support Presto C++ Support | ||
===================================================== ======================================================================= =================== ================== | ||
``iceberg.delete_as_join_rewrite_enabled`` Overrides the behavior of the connector property Yes Yes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be No, or NA
Presto C++ Support | ||
^^^^^^^^^^^^^^^^^^ | ||
|
||
All above extra hidden metadata tables are supported in Presto C++. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just want to double confirm that you have verified they are working, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apart from$changelog
everything works, $changelog
throws an error for C++ which I missed adding here. I opened an issue for the same. #24816
506ae37
ae78e53
to
506ae37
Compare
@yingsu00 I have updated all the |
Presto C++ Support | ||
^^^^^^^^^^^^^^^^^^ | ||
|
||
Table properties are not supported in Presto C++ because write operations have not been implemented. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should consider re-wording this. It's not quite clear what the statement "Table properties are not supported ..." means. Does that mean reading them? writing them? considering them during writes? I think we should be a bit more specific
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should consider re-wording this. It's not quite clear what the statement "Table properties are not supported ..." means. Does that mean reading them? writing them? considering them during writes? I think we should be a bit more specific
I think what @ZacBlanco said makes sense. @agrawalreetika Will you be able to try them and update the doc separating the read and write
Presto C++ Support | ||
^^^^^^^^^^^^^^^^^^ | ||
|
||
All above extra hidden metadata columns are supported in Presto C++. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All above extra hidden metadata columns are supported in Presto C++. | |
All above metadata columns are supported in Presto C++. |
above extra hidden
Three adjectives in a row feels like a lot. I think we can reduce
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here existing extra hidden metadata columns
is taken from the main heading. But if we think we should just rephrase it saying - All above metadata columns are supported in Presto C++
that's fine with me, LMK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feel overly description to have three adjectives in the sentence - I think this gets the point across. Let's remove them
Presto C++ Support | ||
^^^^^^^^^^^^^^^^^^ | ||
|
||
All above extra hidden metadata tables, except `$changelog`, are supported in Presto C++. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All above extra hidden metadata tables, except `$changelog`, are supported in Presto C++. | |
All above metadata tables, except `$changelog`, are supported in Presto C++. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here existing extra hidden metadata tables
is taken from the main heading. But if we think we should just rephrase it saying - All above metadata tables, except
$changelog, are supported in Presto C++.
that's fine with me, LMK.
Presto C++ Support | ||
~~~~~~~~~~~~~~~~~~ | ||
|
||
Read from the tables with Partition column transform is supported in Presto C++. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Read from the tables with Partition column transform is supported in Presto C++. | |
Reads from tables with partition column transforms is supported in Presto C++. |
Presto C++ Support | ||
^^^^^^^^^^^^^^^^^^ | ||
|
||
Schema Evolution is supported in Presto C++. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Schema Evolution is supported in Presto C++. | |
Schema evolution is supported in Presto C++. |
Presto C++ Support | ||
^^^^^^^^^^^^^^^^^^ | ||
|
||
Presto C++ supports Parquet writer versions V1. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a little confusing because earlier the properties section stated that writes are not supported?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing it point.
It's about actually reading, so I think I can rephrase it something like - Presto C++ supports reading Parquet data written with Parquet writer version V1.
?
Presto C++ Support | ||
^^^^^^^^^^^^^^^^^^ | ||
|
||
Time Travel is supported in Presto C++. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Time Travel is supported in Presto C++. | |
Time travel queries are supported in Presto C++. |
95463f3
to
e2dfc98
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One comment on consistency in our grammar, otherwise I think this looks great. Thanks!
Schema evolution is supported in Presto C++. | ||
|
||
Parquet Writer Version | ||
---------------------- | ||
|
||
Presto now supports Parquet writer versions V1 and V2 for the Iceberg catalog. | ||
It can be toggled using the session property ``parquet_writer_version`` and the config property ``hive.parquet.writer.version``. | ||
Valid values for these properties are ``PARQUET_1_0`` and ``PARQUET_2_0``. Default is ``PARQUET_1_0``. | ||
|
||
Presto C++ Support | ||
^^^^^^^^^^^^^^^^^^ | ||
|
||
Presto C++ supports reading Parquet data written with Parquet writer version V1. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that across these Presto C++ Support
sections we have varying forms
... is supported in Presto C++
and
Presto C++ supports ....
Two comments on this
- We should be consistent
- I prefer that we standardize on the 2nd as it's in active voice, which is generally the form we want to write for our documentation (@steveburnett can you confirm the 2nd is the form that we should prefer?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we should prefer the use of active voice, but exceptions where passive voice is better can happen. Sometimes active voice can be more awkward depending on the emphasis you want to communicate.
See
- the Gitlab Documentation Style Guide entry for Active Voice (shorter)
- the Google Developer Documentation Style Guide entry for Active Voice (longer and more examples)
tl;dr: use active voice, but if active voice is awkward then passive voice is okay.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this context of the "Presto C++ Support" subtopics, I would argue that passive voice is better.
-
The statements immediately follow the heading "Presto C++ Support" so "Presto C++ supports" repeat the information without adding to it, and
-
Passive voice here follows the guideline in the Google Developer Documentation Style Guide entry for Active Voice "To emphasize an object over an action."
Using passive voice here emphasizes the object "HIVE, NESSIE, REST, and HADOOP Iceberg catalogs" and makes it easier for the user to find the information they need.
See the screenshots I made in a local doc build to show the difference. I think the second one is the better one.


e2dfc98
to
4c57c31
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the change and the new consistency! One nit.
4c57c31
to
dfb1896
Compare
dfb1896
to
956efe4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! (docs)
Pull updated branch, new local doc build, looks good. Thanks!
Description
Add documentation for Iceberg support in PrestoCPP
Motivation and Context
Add documentation for Iceberg support in PrestoCPP
Impact
Iceberg documentation improvement
Test Plan
Contributor checklist
Release Notes