-
Notifications
You must be signed in to change notification settings - Fork 0
Clp integration fix #6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
…ataProvider and ClpMySqlSplitProvider
…ath to splitPath 2. Fix a bug with array type annotation in ClpSchemaTree 3. Update table schema for ClpMySqlSplitProvider and ClpMySqlMetadataProvider and update tests accordingly
WalkthroughA new "presto-clp" connector module has been added to the Presto project. This includes all connector components, metadata and split providers (with MySQL support), configuration, error codes, planning and filter pushdown logic, and test coverage. Documentation and build/provisioning integration for the connector are also included. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant PrestoEngine
participant ClpConnector
participant ClpMetadataProvider
participant ClpSplitProvider
participant MySQL
User->>PrestoEngine: Submit SQL query on CLP catalog
PrestoEngine->>ClpConnector: beginTransaction()
PrestoEngine->>ClpConnector: getMetadata()
PrestoEngine->>ClpConnector: getSplitManager()
PrestoEngine->>ClpConnector: getRecordSetProvider()
PrestoEngine->>ClpConnector: getConnectorPlanOptimizerProvider()
PrestoEngine->>ClpMetadataProvider: listTableNames(), listColumnHandles()
ClpMetadataProvider->>MySQL: Query metadata tables
MySQL-->>ClpMetadataProvider: Return schema/table/column info
ClpMetadataProvider-->>PrestoEngine: Return metadata
PrestoEngine->>ClpSplitProvider: listSplits()
ClpSplitProvider->>MySQL: Query archive/split tables
MySQL-->>ClpSplitProvider: Return split info
ClpSplitProvider-->>PrestoEngine: Return splits
PrestoEngine->>ClpConnector: shutdown()
sequenceDiagram
participant PrestoEngine
participant ClpPlanOptimizer
participant ClpFilterToKqlConverter
PrestoEngine->>ClpPlanOptimizer: Optimize plan with filter on table scan
ClpPlanOptimizer->>ClpFilterToKqlConverter: Convert filter RowExpression to KQL
ClpFilterToKqlConverter-->>ClpPlanOptimizer: Return KQL and remaining predicate
ClpPlanOptimizer-->>PrestoEngine: Rewrite plan with pushed-down KQL filter
Tip ⚡️ Faster reviews with caching
Enjoy the performance boost—your workflow just got faster. 📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🔇 Additional comments (2)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java
Outdated
Show resolved
Hide resolved
public interface ClpMetadataProvider | ||
{ | ||
/** | ||
* Returns the list of column handles for the given table. | ||
*/ | ||
public List<ClpColumnHandle> listColumnHandles(SchemaTableName schemaTableName); | ||
|
||
/** | ||
* Returns the list of table names in the given schema. | ||
*/ | ||
public List<String> listTableNames(String schema); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public interface ClpMetadataProvider | |
{ | |
/** | |
* Returns the list of column handles for the given table. | |
*/ | |
public List<ClpColumnHandle> listColumnHandles(SchemaTableName schemaTableName); | |
/** | |
* Returns the list of table names in the given schema. | |
*/ | |
public List<String> listTableNames(String schema); | |
} | |
public interface ClpMetadataProvider | |
{ | |
String DEFAULT_SCHEMA_NAME = "default"; | |
/** | |
* Returns the list of column handles for the given table. | |
*/ | |
public List<ClpColumnHandle> listColumnHandles(SchemaTableName schemaTableName); | |
/** | |
* Returns the list of table names in the given schema. | |
*/ | |
public List<String> listTableNames(String schema); | |
List<String> listSchemaNames(); | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DEFAULT_SCHEMA_NAME is in ClpMetadata
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think provider should not be aware of DEFAULT_SCHEMA_NAME, and DEFAULT_SCHEMA_NAME is not used in the provider
private static final String DEFAULT_SCHEMA_NAME = "default"; | ||
private final ClpMetadataProvider clpMetadataProvider; | ||
private final LoadingCache<SchemaTableName, List<ClpColumnHandle>> columnHandleCache; | ||
private final LoadingCache<String, List<String>> tableNameCache; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private final LoadingCache<String, List<String>> tableNameCache; | |
private final LoadingCache<String, List<String>> tableNameCache; | |
private final LoadingCache<String, List<String>> schemaNameCache; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only support one schema, and there's no need to add a cache for it
this.tableNameCache = CacheBuilder.newBuilder() | ||
.expireAfterWrite(clpConfig.getMetadataExpireInterval(), SECONDS) | ||
.refreshAfterWrite(clpConfig.getMetadataRefreshInterval(), SECONDS) | ||
.build(CacheLoader.from(this::loadTableNames)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this.tableNameCache = CacheBuilder.newBuilder() | |
.expireAfterWrite(clpConfig.getMetadataExpireInterval(), SECONDS) | |
.refreshAfterWrite(clpConfig.getMetadataRefreshInterval(), SECONDS) | |
.build(CacheLoader.from(this::loadTableNames)); | |
this.tableNameCache = CacheBuilder.newBuilder() | |
.expireAfterWrite(clpConfig.getMetadataExpireInterval(), SECONDS) | |
.refreshAfterWrite(clpConfig.getMetadataRefreshInterval(), SECONDS) | |
.build(CacheLoader.from(this::loadTableNames)); | |
this.schemaNameCache = CacheBuilder.newBuilder() | |
.expireAfterWrite(clpConfig.getMetadataExpireInterval(), SECONDS) | |
.refreshAfterWrite(clpConfig.getMetadataRefreshInterval(), SECONDS) | |
.build(CacheLoader.from(this::loadSchemaNames)); |
{ | ||
return columnHandleCache.getUnchecked(schemaTableName); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private List<String> loadSchemaNames() | |
{ | |
return clpMetadataProvider.listSchemaNames(); | |
} | |
@Override | ||
public List<SchemaTableName> listTables(ConnectorSession session, Optional<String> schemaName) | ||
{ | ||
String schemaNameValue = schemaName.orElse(DEFAULT_SCHEMA_NAME); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
String schemaNameValue = schemaName.orElse(DEFAULT_SCHEMA_NAME); | |
String schemaNameValue = schemaName.orElse(ClpMetadataProvider.DEFAULT_SCHEMA_NAME); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 49
🔭 Outside diff range comments (1)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java (1)
195-205
: 🧹 Nitpick (assertive)Follow project guideline: prefer
false == <expr>
over!<expr>
Coding-guidelines explicitly require
false == <expression>
for negation in Java code.
!hasDefinition
violates this rule and appears several times in the class (and in other new classes).Example fix:
- if (!expression.getDefinition().isPresent()) { + if (false == expression.getDefinition().isPresent()) {Please run a quick search-replace to stay consistent with the code-base conventions.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (34)
pom.xml
(2 hunks)presto-clp/pom.xml
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnectorFactory.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpErrorCode.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpHandleResolver.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizerProvider.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlugin.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpRecordSetProvider.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplitManager.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableHandle.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTransactionHandle.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpNodeType.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpSplitProvider.java
(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java
(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java
(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java
(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java
(1 hunks)presto-docs/src/main/sphinx/connector.rst
(1 hunks)presto-docs/src/main/sphinx/connector/clp.rst
(1 hunks)presto-server/src/main/provisio/presto.xml
(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ` rather than `!`.
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
: - Preferfalse == <expression>
rather than!<expression>
.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizerProvider.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpHandleResolver.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpSplitProvider.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTransactionHandle.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpRecordSetProvider.java
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpNodeType.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpErrorCode.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlugin.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnectorFactory.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplitManager.java
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableHandle.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
🧬 Code Graph Analysis (3)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpSplitProvider.java (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java (1)
ClpSplit
(30-64)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java (1)
ClpTableLayoutHandle
(23-73)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (1)
ClpColumnHandle
(26-114)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (1)
presto-spi/src/main/java/com/facebook/presto/spi/ColumnMetadata.java (1)
ColumnMetadata
(29-205)
⏰ Context from checks skipped due to timeout of 90000ms (20)
- GitHub Check: hive-dockerized-tests (17.0.13)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-aggregation-queries)
- GitHub Check: test (8, :presto-accumulo)
- GitHub Check: test-other-modules (17.0.13)
- GitHub Check: spark-integration (17.0.13)
- GitHub Check: spark-integration (8)
- GitHub Check: product-tests-specific-environment1 (17.0.13)
- GitHub Check: product-tests-specific-environment2 (17.0.13)
- GitHub Check: product-tests-specific-environment1 (8)
- GitHub Check: product-tests-specific-environment2 (8)
- GitHub Check: singlestore-dockerized-tests (17.0.13)
- GitHub Check: singlestore-dockerized-tests (8)
- GitHub Check: product-tests-basic-environment (17.0.13)
- GitHub Check: product-tests-basic-environment (8)
- GitHub Check: maven-checks (8)
- GitHub Check: maven-checks (17.0.13)
- GitHub Check: test (:presto-docs)
- GitHub Check: arrowflight-java-tests (17.0.13, :presto-base-arrow-flight)
- GitHub Check: arrowflight-java-tests (8, :presto-base-arrow-flight)
- GitHub Check: prestocpp-linux-build-for-test
🔇 Additional comments (28)
presto-docs/src/main/sphinx/connector.rst (1)
17-17
: Looks good!The CLP connector entry has been added to the documentation in the proper alphabetical order, consistent with the existing format.
presto-server/src/main/provisio/presto.xml (1)
236-240
: Proper integration of CLP connector artifactThe artifact definition follows the same structure as other connector plugins and is correctly positioned in the file. This ensures the CLP connector will be properly included in the Presto server distribution.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpSplitProvider.java (1)
21-27
: Interface design looks goodThe interface is well-defined with a clear single responsibility - listing splits for a given table layout handle. The method signature and JavaDoc are appropriate.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTransactionHandle.java (1)
18-22
: Proper transaction handle implementationThe singleton enum pattern is appropriate for a connector transaction handle. This implementation is consistent with other Presto connector transaction handles.
pom.xml (2)
217-217
: Integration of presto-clp module looks good.The new module has been properly added to the list of modules in the root pom.xml file.
823-827
: Dependency management entry for presto-clp module is properly defined.The dependency management section correctly includes the new presto-clp module with the appropriate project version, following the pattern used for other modules.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlugin.java (1)
1-28
: Plugin class implementation looks good.The
ClpPlugin
class correctly implements the Presto SPIPlugin
interface and properly registers the connector factory.presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpErrorCode.java (1)
22-43
: Well-structured error code implementationThe error code implementation follows Presto's conventions well with:
- Descriptive names for each error scenario
- Proper use of the EXTERNAL error type for connector errors
- Good organization with sequential code values and a unique base offset (0x0400_0000)
- Complete implementation of the ErrorCodeSupplier interface
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpHandleResolver.java (1)
23-55
: Complete handle resolver implementationThe implementation correctly maps all the required handle types to their CLP-specific implementations, following the standard pattern for Presto connector handle resolvers.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (2)
39-46
: Appropriate metadata provider configuration and error handlingThe implementation correctly binds the MySQL metadata provider based on configuration and throws appropriate exceptions for unsupported types.
27-37
: Core component bindings look goodThe singleton bindings for core connector components follow Presto's standard practices.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpNodeType.java (2)
32-46
: LGTM: Efficient lookup table initializationThe static initialization block correctly handles potential gaps in byte code values by finding the maximum type code and creating an appropriately sized lookup table.
55-57
: 🧹 Nitpick (assertive)Consider using yoda condition for the null check
The condition check could be rewritten to follow the coding guideline of preferring
false == <expression>
rather than!<expression>
style.- if (type < 0 || type >= LOOKUP_TABLE.length || LOOKUP_TABLE[type] == null) { + if (type < 0 || type >= LOOKUP_TABLE.length || null == LOOKUP_TABLE[type]) {Likely an incorrect or invalid review comment.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplitManager.java (1)
33-37
: LGTM: Proper dependency injection with null checkThe constructor properly injects the required dependency and validates it with
requireNonNull
.presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnectorFactory.java (1)
54-60
: LGTM: Comprehensive binding of required servicesThe code correctly binds all required services from the ConnectorContext to make them available for dependency injection.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java (1)
36-39
: LGTM: Proper JSON serialization annotationsThe class correctly uses Jackson annotations for JSON serialization and deserialization.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java (1)
52-57
: 🧹 Nitpick (assertive)Use preferred null check format
According to the coding guidelines, prefer
false == <expression>
rather than!<expression>
. This applies to the null check in the equals method.- if (o == null || getClass() != o.getClass()) { + if (false == (o != null) || false == (getClass() == o.getClass())) { return false; }Likely an incorrect or invalid review comment.
presto-clp/pom.xml (1)
86-96
:✅ Verification successful
Inconsistent Airlift dependency group IDs
There's an inconsistency in the group IDs for Airlift dependencies. Early dependencies use
com.facebook.airlift
, but these useio.airlift
, which might lead to conflicts.
🏁 Script executed:
#!/bin/bash # Check for potential version conflicts between the different Airlift dependencies # 1. Look for airlift dependencies in the current project echo "Airlift dependencies in this project:" grep -A2 "artifactId>.*airlift" presto-clp/pom.xml # 2. Check parent pom for dependency definitions echo -e "\nAirlift dependency definitions in parent pom:" grep -A10 -n "<artifactId>.*airlift" pom.xml | head -20Length of output: 302
To verify whether there’s truly a mix of
com.facebook.airlift
andio.airlift
dependencies, let’s search for all AirliftgroupId
entries in both the child and parent POMs, and across the project:This will confirm if both group IDs are used and pinpoint where they occur.
🏁 Script executed:
#!/bin/bash set -e echo "Searching for Airlift groupId entries in presto-clp/pom.xml" grep -n "<groupId>.*airlift" presto-clp/pom.xml || true echo -e "\nSearching for Airlift groupId entries in parent pom.xml" grep -R -n "<groupId>.*airlift" pom.xml || true echo -e "\nSearching for any com.facebook.airlift occurrences across the project" grep -R -n "com.facebook.airlift" . || true echo -e "\nSearching for any io.airlift occurrences across the project" grep -R -n "io.airlift" . || trueLength of output: 125540
Airlift groupId mix is expected—no change needed
The project legitimately uses bothcom.facebook.airlift
(for legacy modules) andio.airlift
(for relocated modules likeunits
andslice
). Versions for all Airlift artifacts are controlled in the parent POM, so there’s no conflict at runtime.• presto-clp/pom.xml lines 28–43: legacy
com.facebook.airlift
deps
• presto-clp/pom.xml lines 87–93: newio.airlift
deps
• parent pom.xml: dependencyManagement pins versions for both groupIdspresto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java (5)
29-41
: Well-designed immutable class with proper documentation.The class properly implements an immutable data structure with final fields and comprehensive JavaDoc that clearly explains the three possible states of expression conversion. The constructor correctly initializes all fields.
43-47
: LGTM - constructor delegation is properly implemented.The no-args constructor correctly delegates to the primary constructor with empty optionals for both fields, maintaining immutability.
49-53
: LGTM - single definition constructor is well implemented.This constructor properly handles the case of a fully translatable KQL string, setting the definition and leaving the remaining expression empty.
55-59
: LGTM - remaining expression constructor is well implemented.This constructor properly handles the case of a non-translatable expression, setting the remaining expression and leaving the definition empty.
61-69
: LGTM - getters are properly implemented.The getter methods are correctly implemented to return the Optional fields.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (3)
26-45
: LGTM - Well-designed immutable column handle class.The class properly implements the ColumnHandle interface and follows an immutable pattern with final fields. The JsonCreator and JsonProperty annotations are correctly used for serialization/deserialization.
47-50
: LGTM - Convenience constructor with proper delegation.This convenience constructor properly delegates to the primary constructor, setting the original column name to be the same as the column name.
76-83
: LGTM - ColumnMetadata creation is well implemented.The method correctly uses the builder pattern to create a ColumnMetadata instance with the appropriate properties.
presto-docs/src/main/sphinx/connector/clp.rst (1)
1-182
: Comprehensive connector documentation with clear explanations.The documentation is well-structured and provides comprehensive information about the CLP connector, including:
- Clear overview and configuration instructions
- Detailed explanation of all configuration properties with defaults
- Thorough description of data type mappings
- Explanation of metadata and split providers
- Clear examples for various data structures
- SQL support and limitations
The documentation will be very helpful for users setting up and using the CLP connector.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java (1)
67-70
: Isolation level argument is silently ignored
beginTransaction()
accepts anIsolationLevel
, but the chosenClpTransactionHandle.INSTANCE
doesn’t seem to differentiate based on that parameter. Please confirm that CLP truly has no need to track isolation or read-only flags; otherwise we should create distinct handles or, at minimum, validate/throw on unsupported levels.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpRecordSetProvider.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizerProvider.java
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpNodeType.java
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 13
♻️ Duplicate comments (17)
presto-clp/pom.xml (1)
5-19
: Add<build>
section to pom.xmlThe POM file is missing a
<build>
section which is present in other connector modules. This section is necessary to configure build plugins for thepresto-plugin
packaging type.Consider adding a
<build>
section after the<properties>
block:<properties> <air.main.basedir>${project.parent.basedir}</air.main.basedir> </properties> + <build> + <plugins> + <plugin> + <groupId>org.apache.maven.plugins</groupId> + <artifactId>maven-surefire-plugin</artifactId> + <configuration> + <skipTests>true</skipTests> + </configuration> + </plugin> + </plugins> + </build>presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java (1)
21-32
: 🧹 Nitpick (assertive)Consider adding a method to list available schemas
The interface provides methods to list tables and columns but lacks a method to list available schemas. This would provide a complete metadata API.
public interface ClpMetadataProvider { /** * Returns the list of column handles for the given table. */ List<ClpColumnHandle> listColumnHandles(SchemaTableName schemaTableName); /** * Returns the list of table names in the given schema. */ List<String> listTableNames(String schema); + + /** + * Returns the list of all available schema names. + */ + List<String> listSchemaNames(); }presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (4)
50-50
: Consider using platform-independent temp directoryThe hard-coded path
/tmp/split_testdb
might not work on all operating systems, particularly Windows.- final String metadataDbUrl = "jdbc:h2:file:/tmp/split_testdb;MODE=MySQL;DATABASE_TO_UPPER=FALSE"; + final String tempDir = System.getProperty("java.io.tmpdir"); + final String metadataDbUrl = String.format("jdbc:h2:file:%s/split_testdb;MODE=MySQL;DATABASE_TO_UPPER=FALSE", tempDir);
80-81
: Use platform-independent path constructionHard-coded path separators may cause issues on different operating systems.
- pstmt.setString(2, "/tmp/archives/" + tableName); + pstmt.setString(2, new File(System.getProperty("java.io.tmpdir"), "archives" + File.separator + tableName).getPath());
115-123
: Improve file cleanup handling in tearDown methodThe current implementation has several issues:
- Uses hard-coded paths that aren't platform-independent
- Doesn't handle exceptions during file deletion
- File deletion failures are silently ignored
@AfterMethod public void tearDown() { - File dbFile = new File("/tmp/split_testdb.mv.db"); - File lockFile = new File("/tmp/split_testdb.trace.db"); // Optional, H2 sometimes creates this - if (dbFile.exists()) { - dbFile.delete(); - System.out.println("Deleted database file: " + dbFile.getAbsolutePath()); - } - if (lockFile.exists()) { - lockFile.delete(); - } + String tempDir = System.getProperty("java.io.tmpdir"); + File dbFile = new File(tempDir, "split_testdb.mv.db"); + File lockFile = new File(tempDir, "split_testdb.trace.db"); // Optional, H2 sometimes creates this + try { + if (dbFile.exists() && !dbFile.delete()) { + System.err.println("Failed to delete database file: " + dbFile.getAbsolutePath()); + } + else if (dbFile.exists()) { + System.out.println("Deleted database file: " + dbFile.getAbsolutePath()); + } + + if (lockFile.exists() && !lockFile.delete()) { + System.err.println("Failed to delete lock file: " + lockFile.getAbsolutePath()); + } + } + catch (Exception e) { + System.err.println("Error cleaning up database files: " + e.getMessage()); + } }
129-136
: Enhance test coverage for ClpSplit propertiesThe test only verifies the split path format but doesn't check other properties of the splits.
@Test public void testListSplits() { ClpSplitProvider splitProvider = new ClpMySqlSplitProvider(config); for (String tableName : TABLE_NAME_LIST) { ClpTableLayoutHandle layoutHandle = new ClpTableLayoutHandle(new ClpTableHandle(new SchemaTableName(TABLE_SCHEMA, tableName)), Optional.empty()); List<ClpSplit> splits = splitProvider.listSplits(layoutHandle); assertEquals(splits.size(), NUM_SPLITS); for (int i = 0; i < NUM_SPLITS; i++) { - assertEquals(splits.get(i).getPath(), "/tmp/archives/" + tableName + "/id_" + i); + String expectedPath = "/tmp/archives/" + tableName + "/id_" + i; + assertEquals(splits.get(i).getPath(), expectedPath); + // Verify other important split properties + assertNotNull(splits.get(i).getConnectorId(), "ConnectorId should not be null"); + assertNotNull(splits.get(i).getConnectorSplitIdentity(), "SplitIdentity should not be null"); + // Add more assertions for other properties as needed } } }presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (1)
49-56
: Appropriate split provider configuration and error handlingThe implementation correctly binds the MySQL split provider based on configuration and throws appropriate exceptions for unsupported types.
For future extensibility, if more provider types are added later, you might consider refactoring these conditionals to a more scalable approach like a provider registry or factory pattern.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java (2)
31-37
: Document the meaning of null splitPathThe
path
field is used without documentation explaining what a null value means in this context.Add a comment or Javadoc explaining the significance of a null
path
.
57-61
: Consider providing more specific info in getInfo()The
getInfo()
method returns the instance itself, which might not be the most informative approach if specific split information is needed.- public Object getInfo() - { - return this; - } + public Object getInfo() + { + return ImmutableMap.of("splitPath", path == null ? "NULL" : path); + }This would require adding the following import:
import com.google.common.collect.ImmutableMap;presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (1)
97-99
: 🛠️ Refactor suggestionAlign equality check with coding guidelines
The equality check doesn't follow the coding guideline to prefer
false == <expression>
rather than!<expression>
.- if (obj == null || getClass() != obj.getClass()) { + if (false == (obj instanceof ClpColumnHandle)) { return false; }presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java (1)
85-93
: Swallowed shutdown exception still unresolved (same as previous review)
Theshutdown()
method logs the exception but lets the connector continue shutting down silently, exactly as noted in the prior review. Consider re-throwing aRuntimeException
or propagating a failed future so that coordinators/workers surface the failure.presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (1)
60-65
: 🧹 Nitpick (assertive)Conform to project negation style & guard against NPE
The condition violates the guideline favouring
false == expr
:- if (dbName != null && !dbName.isEmpty()) { + if (null != dbName && false == dbName.isEmpty()) {(You might also consider issuing
USE ?
through a prepared statement rather than string concatenation.)presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java (2)
152-162
: Style guideline breach – usefalse == expr
instead of!expr
Lines 152 and 162 violate the project-wide rule that prefers
false == <expression>
to the logical-NOT operator. Earlier feedback already highlighted the same issue elsewhere in this class, so please sweep the new code as well.- if (!polymorphicTypeEnabled) { + if (false == polymorphicTypeEnabled) { … - else if (!existing.type.equals(newType)) { + else if (false == existing.type.equals(newType)) {
135-146
: 🛠️ Refactor suggestionColumn order is nondeterministic – iterate deterministically
HashMap
iteration makes the order of columns returned bycollectColumnHandles()
JVM-dependent, which can lead to unstable metadata and hard-to-debug mismatches between workers.- for (Map.Entry<String, ClpNode> entry : root.children.entrySet()) { + root.children.entrySet().stream() + .sorted(Map.Entry.comparingByKey()) // deterministic order + .forEach(entry -> { + String name = entry.getKey(); + ClpNode child = entry.getValue(); + if (child.isLeaf()) { + columns.add(new ClpColumnHandle(name, child.originalName, child.type, true)); + } + else { + Type rowType = buildRowType(child); + columns.add(new ClpColumnHandle(name, child.originalName, rowType, true)); + } + }); - … - }presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java (1)
98-113
: Style guideline breach – preferfalse == …
The three
if
statements at lines 98-100 and 112-113 negate predicates with!
. To stay consistent with project conventions, rewrite them usingfalse ==
.- if (!listSchemaNames(session).contains(schemaNameValue)) { + if (false == listSchemaNames(session).contains(schemaNameValue)) { … - if (!listSchemaNames(session).contains(schemaName)) { + if (false == listSchemaNames(session).contains(schemaName)) {presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java (2)
61-66
: Makecity
immutable to avoid accidental state mutationAll other column-handle constants are declared
final
, butcity
is not. Making it mutable is unnecessary and increases the risk that tests mutate shared state between runs.- protected static final ClpColumnHandle city = new ClpColumnHandle("city", RowType.from(ImmutableList.of( + protected static final ClpColumnHandle city = new ClpColumnHandle("city", RowType.from(ImmutableList.of(
98-101
: Reuse a singleSqlParser
instance to reduce object churn
expression()
instantiates a newSqlParser
for every call. These objects are heavyweight; keeping one static parser (thread-safe in Presto) improves test execution time.- return ExpressionUtils.rewriteIdentifiersToSymbolReferences(new SqlParser().createExpression(sql, new ParsingOptions(ParsingOptions.DecimalLiteralTreatment.AS_DECIMAL))); + return ExpressionUtils.rewriteIdentifiersToSymbolReferences(SQL_PARSER.createExpression(sql, PARSING_OPTIONS));And add:
private static final SqlParser SQL_PARSER = new SqlParser(); private static final ParsingOptions PARSING_OPTIONS = new ParsingOptions(ParsingOptions.DecimalLiteralTreatment.AS_DECIMAL);
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (15)
presto-clp/pom.xml
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpRecordSetProvider.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java
(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java
(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java
(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java
(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ` rather than `!`.
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
: - Preferfalse == <expression>
rather than!<expression>
.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpRecordSetProvider.java
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java
🧬 Code Graph Analysis (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (1)
ClpColumnHandle
(26-117)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (3)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1)
ClpConfig
(21-189)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java (1)
ClpSplit
(28-62)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java (1)
ClpTableLayoutHandle
(25-79)
⏰ Context from checks skipped due to timeout of 90000ms (20)
- GitHub Check: test (8, :presto-orc)
- GitHub Check: test (8, :presto-tests -P ci-only-tpch-distributed-queries)
- GitHub Check: product-tests-basic-environment (8)
- GitHub Check: product-tests-basic-environment (17.0.13)
- GitHub Check: test-other-modules (8)
- GitHub Check: test-other-modules (17.0.13)
- GitHub Check: hive-dockerized-tests (17.0.13)
- GitHub Check: hive-tests (8)
- GitHub Check: hive-dockerized-tests (8)
- GitHub Check: hive-tests (17.0.13)
- GitHub Check: kudu (8)
- GitHub Check: kudu (17.0.13)
- GitHub Check: spark-integration (17.0.13)
- GitHub Check: spark-integration (8)
- GitHub Check: maven-checks (8)
- GitHub Check: maven-checks (17.0.13)
- GitHub Check: arrowflight-java-tests (17.0.13, :presto-base-arrow-flight)
- GitHub Check: arrowflight-java-tests (8, :presto-base-arrow-flight)
- GitHub Check: test (:presto-docs)
- GitHub Check: prestocpp-linux-build-for-test
🔇 Additional comments (9)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpRecordSetProvider.java (1)
28-35
: Implementation looks good!The
getRecordSet
method correctly throws anUnsupportedOperationException
with a clear error message, which is better than returningnull
that would cause aNullPointerException
at runtime.presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (2)
33-38
: Class bindings look good!The module correctly binds all core connector components as singletons and configures the ClpConfig.
40-47
: Appropriate metadata provider configuration and error handlingThe implementation correctly builds the configuration object and binds the appropriate metadata provider based on the configured type, with proper error handling for unsupported types.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java (2)
61-78
: Implementation of equals, hashCode, and toString methods is correctThe methods correctly include both fields (
table
andkqlQuery
), ensuring proper equality comparison and string representation.
57-59
: 🛠️ Refactor suggestionAlign equality check with coding guidelines
The equality check doesn't follow the coding guideline to prefer
false == <expression>
rather than!<expression>
.- if (o == null || getClass() != o.getClass()) { + if (false == (o instanceof ClpTableLayoutHandle)) { return false; }Likely an incorrect or invalid review comment.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (1)
110-115
: ToString method implementation looks goodThe toString implementation now correctly includes all fields: columnName, originalColumnName, columnType, and nullable.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java (1)
55-58
: ✔️ Cached optimiser provider – good catchSwitching from per-call construction to a constructor-injected, cached
planOptimizerProvider
eliminates unnecessary allocations and double-registration of optimisers. This addresses the earlier feedback perfectly.presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java (1)
110-116
:originalName
can diverge from map key after conflict renamingWhen an intermediate node is renamed (lines 110-114), its
originalName
field remains the old segment. Down-stream consumers that rely onoriginalName
for display or mapping might become confused.Consider updating
existing.originalName = renamedExisting;
after the move, or add explanatory Javadoc clarifying why the old value is retained.presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java (1)
74-74
: 🧹 Nitpick (assertive)Make typeProvider final
This variable is initialized only once and never reassigned, so it should be marked as
final
to prevent accidental modification.- protected final TypeProvider typeProvider = TypeProvider.fromVariables(variableToColumnHandleMap.keySet()); + protected static final TypeProvider typeProvider = TypeProvider.fromVariables(variableToColumnHandleMap.keySet());Likely an incorrect or invalid review comment.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java
Show resolved
Hide resolved
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java
Show resolved
Hide resolved
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java
Show resolved
Hide resolved
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java
Show resolved
Hide resolved
a824146
to
30cf714
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 7
♻️ Duplicate comments (23)
presto-clp/pom.xml (1)
1-128
: Add<build>
section to presto-clp/pom.xmlThe CLP connector POM omits a
<build>
block. All other connector modules (e.g. presto-accumulo, presto-base-arrow-flight) declare build plugins—such as the Maven Surefire and Enforcer plugins—to ensure thepresto-plugin
packaging is handled correctly.Please add a
<build>
section immediately after the<properties>
block inpresto-clp/pom.xml
. For example:</properties> + + <build> + <plugins> + <plugin> + <groupId>org.apache.maven.plugins</groupId> + <artifactId>maven-surefire-plugin</artifactId> + <configuration> + <skipTests>true</skipTests> + </configuration> + </plugin> + <!-- include any additional plugins (enforcer, packaging, etc.) as in other connectors --> + </plugins> + </build>presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (1)
49-56
: LGTM! Appropriate split provider configuration and error handlingThe implementation correctly binds the MySQL split provider based on configuration and throws appropriate exceptions for unsupported types.
For future extensibility, if more provider types are added later, you might consider refactoring these conditionals to a more scalable approach like a provider registry or factory pattern.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java (1)
57-61
:getInfo()
still returnsthis
– consider exposing useful details.
This mirrors the earlier feedback but has not been addressed. Returning the full object leaks implementation details while providing little actionable information to the coordinator UI.Re-expose only the data that is actually informative, e.g.
- public Object getInfo() - { - return this; - } + public Object getInfo() + { + return ImmutableMap.of("path", path); + }(This requires
import com.google.common.collect.ImmutableMap;
)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (4)
50-56
: Hard-coded/tmp
makes tests OS-specific.Prior review suggested using
java.io.tmpdir
; the literal/tmp
will fail on Windows and some container setups.
80-83
: Path construction is not platform-independent.Concatenate via
File
orPaths
to avoid “/” assumptions.
115-123
: Tear-down still silently ignores deletion failures and hard-codes paths.Re-use
java.io.tmpdir
, handledelete()
return value, and log failures.
135-136
: Assertion uses hard-coded separators.Use
File
/Paths
when buildingexpectedPath
so the test passes on Windows.presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (2)
97-100
: 🛠️ Refactor suggestionUse
instanceof
to respect subclassing & follow style guide.
getClass() != obj.getClass()
prevents equality with subclasses and violates the project’s guideline favouringfalse == (obj instanceof …)
over negated comparisons.- if (obj == null || getClass() != obj.getClass()) { + if (false == (obj instanceof ClpColumnHandle)) { return false; } - ClpColumnHandle other = (ClpColumnHandle) obj; + ClpColumnHandle other = (ClpColumnHandle) obj;
101-105
: 🧹 Nitpick (assertive)Avoid boxing boolean in
Objects.equals
.
nullable
is a primitive; wrapping it forObjects.equals
is wasteful and less readable.- Objects.equals(this.columnType, other.columnType) && - Objects.equals(this.nullable, other.nullable); + Objects.equals(this.columnType, other.columnType) && + this.nullable == other.nullable;presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java (1)
85-92
: Swallowing exceptions on shutdown hides resource-leak issues.If
lifeCycleManager.stop()
throws, the connector reports success even though resources may still be open. Re-throw aRuntimeException
(Presto will surface it) after logging, or return a failed future if this becomes async.catch (Exception e) { log.error(e, "Error shutting down connector"); - } + throw new RuntimeException("Failed to shut down Clp connector", e); + }presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (1)
106-107
: Logger throwable placed in argument list, stack-trace is lostSame issue flagged earlier:
Logger
expects the throwable as the first parameter.- log.warn("Database error while processing splits for %s: %s", tableName, e); + log.warn(e, "Database error while processing splits for %s", tableName);presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java (3)
86-87
: Throwable parameter position in log call
log.warn("…%s: %s", value, e)
drops the stack trace. Passe
first, as already highlighted in previous reviews.Also applies to: 111-112
75-80
:⚠️ Potential issueIdentifier validation not applied – SQL-injection risk repeats
listColumnHandles()
embedsschemaTableName.getTableName()
straight into the query. UnlikelistTableNames()
, no call toisValidIdentifier()
is made.-String query = String.format(QUERY_SELECT_COLUMNS, config.getMetadataTablePrefix(), schemaTableName.getTableName()); +String table = schemaTableName.getTableName(); +if (false == isValidIdentifier(table)) { + throw new IllegalArgumentException("Invalid table identifier: " + table); +} +String query = String.format(QUERY_SELECT_COLUMNS, config.getMetadataTablePrefix(), table);Please apply similar hardening wherever table identifiers are interpolated.
59-63
: 🧹 Nitpick (assertive)Negation style & safer “USE” execution
Lines 61-63 violate the house style (
false == expr
) and build a SQL command with simple string concatenation.-if (dbName != null && !dbName.isEmpty()) { - connection.createStatement().execute("USE " + dbName); +if (null != dbName && false == dbName.isEmpty()) { + try (PreparedStatement stmt = connection.prepareStatement("USE ?")) { + stmt.setString(1, dbName); + stmt.execute(); + } }presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java (1)
135-147
: Iteration order of HashMap causes non-deterministic column ordering
HashMap
does not guarantee iteration order; workers may disagree on column order which can surface as schema mismatches. Prior review suggested sorting; it is still outstanding.-for (Map.Entry<String, ClpNode> entry : root.children.entrySet()) { +root.children.entrySet().stream() + .sorted(Map.Entry.comparingByKey()) + .forEach(entry -> { + String name = entry.getKey(); + ClpNode child = entry.getValue(); + if (child.isLeaf()) { + columns.add(new ClpColumnHandle(name, child.originalName, child.type, true)); + } + else { + Type rowType = buildRowType(child); + columns.add(new ClpColumnHandle(name, child.originalName, rowType, true)); + } + });presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java (1)
60-61
: 🧹 Nitpick (assertive)Consider reusing SessionHolder
The SessionHolder is instantiated separately inside each test method. Creating a single instance as a class field would improve test efficiency by reducing object creation.
- SessionHolder sessionHolder = new SessionHolder(); + // Use shared sessionHolder from class fieldpresto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java (4)
98-101
: 🛠️ Refactor suggestionCoding style deviation: Use
false == expr
instead of!expr
According to the coding guidelines, use
false == expression
rather than!expression
.- if (!listSchemaNames(session).contains(schemaNameValue)) { + if (false == listSchemaNames(session).contains(schemaNameValue)) {
112-118
: 🛠️ Refactor suggestionCoding style deviation: Use
false == expr
instead of!expr
Replace negation with explicit comparison to conform to coding guidelines.
- if (!listSchemaNames(session).contains(schemaName)) { + if (false == listSchemaNames(session).contains(schemaName)) { return null; } - if (!listTables(schemaName).contains(tableName.getTableName())) { + if (false == listTables(schemaName).contains(tableName.getTableName())) {
158-160
: 🛠️ Refactor suggestionCoding style deviation: Use
false == expr
instead of!expr
Replace negation with explicit comparison to conform to coding guidelines.
- if (schemaName != null && !listSchemaNames(session).contains(schemaName)) { + if (schemaName != null && false == listSchemaNames(session).contains(schemaName)) {
166-168
:⚠️ Potential issuePossible NPE when prefix contains only table name
SchemaTableName
constructor rejects null schema names, butprefix.getSchemaName()
could be null.- schemaTableNames = ImmutableList.of(new SchemaTableName(prefix.getSchemaName(), prefix.getTableName())); + String schema = Optional.ofNullable(prefix.getSchemaName()).orElse(DEFAULT_SCHEMA_NAME); + schemaTableNames = ImmutableList.of(new SchemaTableName(schema, prefix.getTableName()));presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java (3)
55-56
: 🧹 Nitpick (assertive)Add class-level documentation
This base class lacks documentation explaining its purpose. Consider adding Javadoc to improve maintainability.
+/** + * Base class for CLP query tests providing common functionality for expression parsing, + * type resolution, and session management. + */ public class TestClpQueryBase
98-101
: 🧹 Nitpick (assertive)Use shared SqlParser instance to reduce object churn
Creating a new SqlParser for every expression parsing is inefficient. These objects are heavyweight; use a single static parser instance.
+ private static final SqlParser SQL_PARSER = new SqlParser(); + private static final ParsingOptions PARSING_OPTIONS = + new ParsingOptions(ParsingOptions.DecimalLiteralTreatment.AS_DECIMAL); public static Expression expression(String sql) { - return ExpressionUtils.rewriteIdentifiersToSymbolReferences(new SqlParser().createExpression(sql, new ParsingOptions(ParsingOptions.DecimalLiteralTreatment.AS_DECIMAL))); + return ExpressionUtils.rewriteIdentifiersToSymbolReferences(SQL_PARSER.createExpression(sql, PARSING_OPTIONS)); }
107-108
: 🧹 Nitpick (assertive)Use shared SqlParser instance here as well
This method also creates a new SqlParser instance. For consistency and efficiency, use the same shared SqlParser instance.
Map<NodeRef<Expression>, Type> expressionTypes = getExpressionTypes( session, metadata, - new SqlParser(), + SQL_PARSER,
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (15)
presto-clp/pom.xml
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpRecordSetProvider.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java
(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java
(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java
(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java
(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ` rather than `!`.
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
: - Preferfalse == <expression>
rather than!<expression>
.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpRecordSetProvider.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java
🧬 Code Graph Analysis (3)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (1)
ClpColumnHandle
(26-117)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (1)
presto-spi/src/main/java/com/facebook/presto/spi/ColumnMetadata.java (1)
ColumnMetadata
(29-205)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java (3)
presto-main-base/src/main/java/com/facebook/presto/Session.java (1)
Session
(75-939)presto-main-base/src/main/java/com/facebook/presto/SystemSessionProperties.java (1)
SystemSessionProperties
(88-3177)presto-spi/src/test/java/com/facebook/presto/spi/TestingSession.java (1)
TestingSession
(34-142)
⏰ Context from checks skipped due to timeout of 90000ms (4)
- GitHub Check: prestocpp-linux-build-for-test
- GitHub Check: test (:presto-docs)
- GitHub Check: maven-checks (17.0.13)
- GitHub Check: changes
🔇 Additional comments (7)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java (1)
57-58
: 🧹 Nitpick (assertive)Use 'false ==' coding pattern for boolean expressions
According to the project coding guidelines, prefer
false == <expression>
rather than!<expression>
or<expression> == false
.- if (o == null || getClass() != o.getClass()) { + if (false == (o != null) || getClass() != o.getClass()) { return false; }Likely an incorrect or invalid review comment.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpRecordSetProvider.java (1)
29-35
: Implementation throws appropriate exceptionThe implementation properly throws an
UnsupportedOperationException
instead of returning null, which would causeNullPointerException
when the method is called.presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (1)
40-47
: LGTM! Appropriate metadata provider configuration and error handlingThe implementation correctly binds the MySQL metadata provider based on configuration and throws appropriate exceptions for unsupported types.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java (3)
44-46
: Good handling of missing KQL expression caseThis check correctly asserts that when no KQL expression is expected, none should be present. This addresses the issue raised in a previous review comment about missing negative assertions.
70-71
: Clear and informative commentThe comment effectively explains that wildcard pattern '%hello%' doesn't get pushed down, which helps understand the test case's purpose and the connector's limitations.
32-38
: Well-organized test helper methodThe
testFilter
helper method is well-designed to make test cases concise and readable. It properly converts SQL expressions to ClpExpressions and verifies both KQL output and remaining expressions.presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java (1)
76-96
: Well-designed SessionHolder classThe SessionHolder class is well-structured to encapsulate session state for tests, providing both connector session and session objects with clean accessor methods.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java
Show resolved
Hide resolved
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java
Outdated
Show resolved
Hide resolved
30cf714
to
5a68d91
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (37)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpRecordSetProvider.java (1)
28-35
: Implementation looks good.The method correctly throws an
UnsupportedOperationException
to explicitly indicate that record set provision is not supported in this connector, rather than returning null which would cause aNullPointerException
.presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java (1)
22-33
: 🛠️ Refactor suggestionAdd a listSchemaNames method for completeness.
The interface is well-defined but is missing a method to retrieve schema names, which would make it more complete and consistent with other connector implementations.
public interface ClpMetadataProvider { /** * Returns the list of column handles for the given table. */ List<ClpColumnHandle> listColumnHandles(SchemaTableName schemaTableName); /** * Returns the list of table names in the given schema. */ List<ClpTableHandle> listTableHandles(String schema); + + /** + * Returns the list of all schema names. + */ + List<String> listSchemaNames(); }presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (1)
49-56
: Appropriate split provider configuration and error handling.The implementation correctly binds the MySQL split provider based on configuration and throws appropriate exceptions for unsupported types.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java (2)
33-38
:⚠️ Potential issueAdd null-safety & clarify contract of
path
.
path
is a fundamental identifier for the split, yet the constructor allowsnull
to slip through and the field is not annotated as non-null. A singlenull
value propagating into the engine will surface later as an NPE that is much harder to diagnose.+import static java.util.Objects.requireNonNull; + @JsonCreator public ClpSplit(@JsonProperty("path") String path) { - this.path = path; + this.path = requireNonNull(path, "path is null"); }If
null
is semantically meaningful (e.g., "unspecified location"), please document that and keep the field annotated with@Nullable
; otherwise, enforce the pre-condition as above.
57-61
: 🧹 Nitpick (assertive)Consider providing more specific info in getInfo().
The
getInfo()
method returns the instance itself, which might not be the most informative approach if specific split information is needed.@Override public Object getInfo() { - return this; + return ImmutableMap.of("splitPath", path == null ? "NULL" : path); }This would require adding the following import:
import com.google.common.collect.ImmutableMap;presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (5)
50-50
: 🛠️ Refactor suggestionConsider using platform-independent temp directory
The hard-coded path
/tmp/split_testdb
might not work on all operating systems, particularly Windows.- final String metadataDbUrl = "jdbc:h2:file:/tmp/split_testdb;MODE=MySQL;DATABASE_TO_UPPER=FALSE"; + final String tempDir = System.getProperty("java.io.tmpdir"); + final String metadataDbUrl = String.format("jdbc:h2:file:%s/split_testdb;MODE=MySQL;DATABASE_TO_UPPER=FALSE", tempDir);
84-84
: 🛠️ Refactor suggestionUse platform-independent path construction
Hard-coded path separators may cause issues on different operating systems.
- pstmt.setString(3, "/tmp/archives/" + tableName); + pstmt.setString(3, new File(System.getProperty("java.io.tmpdir"), "archives" + File.separator + tableName).getPath());
118-126
: 🛠️ Refactor suggestionImprove file cleanup handling in tearDown method
The current implementation has several issues:
- Uses hard-coded paths that aren't platform-independent
- Doesn't handle exceptions during file deletion
- File deletion failures are silently ignored
@AfterMethod public void tearDown() { - File dbFile = new File("/tmp/split_testdb.mv.db"); - File lockFile = new File("/tmp/split_testdb.trace.db"); // Optional, H2 sometimes creates this - if (dbFile.exists()) { - dbFile.delete(); - System.out.println("Deleted database file: " + dbFile.getAbsolutePath()); - } - if (lockFile.exists()) { - lockFile.delete(); - } + String tempDir = System.getProperty("java.io.tmpdir"); + File dbFile = new File(tempDir, "split_testdb.mv.db"); + File lockFile = new File(tempDir, "split_testdb.trace.db"); // Optional, H2 sometimes creates this + try { + if (dbFile.exists() && !dbFile.delete()) { + System.err.println("Failed to delete database file: " + dbFile.getAbsolutePath()); + } + else if (dbFile.exists()) { + System.out.println("Deleted database file: " + dbFile.getAbsolutePath()); + } + + if (lockFile.exists() && !lockFile.delete()) { + System.err.println("Failed to delete lock file: " + lockFile.getAbsolutePath()); + } + } + catch (Exception e) { + System.err.println("Error cleaning up database files: " + e.getMessage()); + } }
139-140
: 🛠️ Refactor suggestionUse platform-independent path for split verification
The expected path in the assertion uses hard-coded Unix-style separators.
- assertEquals(splits.get(i).getPath(), "/tmp/archives/" + tableName + "/id_" + i); + String expectedPath = new File(new File(new File(System.getProperty("java.io.tmpdir"), "archives"), tableName), "id_" + i).getPath(); + assertEquals(splits.get(i).getPath(), expectedPath);
129-143
: 🧹 Nitpick (assertive)Enhance test coverage for ClpSplit properties
The test only verifies the split path format but doesn't check other properties of the splits.
@Test public void testListSplits() { ClpSplitProvider splitProvider = new ClpMySqlSplitProvider(config); for (String tableName : TABLE_NAME_LIST) { ClpTableLayoutHandle layoutHandle = new ClpTableLayoutHandle( new ClpTableHandle(new SchemaTableName(TABLE_SCHEMA, tableName), ClpTableHandle.StorageType.FS), Optional.empty()); List<ClpSplit> splits = splitProvider.listSplits(layoutHandle); assertEquals(splits.size(), NUM_SPLITS); for (int i = 0; i < NUM_SPLITS; i++) { assertEquals(splits.get(i).getPath(), "/tmp/archives/" + tableName + "/id_" + i); + // Verify other important split properties + assertNotNull(splits.get(i).getConnectorId(), "ConnectorId should not be null"); + assertNotNull(splits.get(i).getConnectorSplitIdentity(), "SplitIdentity should not be null"); } } }presto-clp/pom.xml (1)
1-128
: 🛠️ Refactor suggestionAdd
<build>
section to presto-clp/pom.xmlThe CLP connector POM omits a
<build>
block. All other connector modules declare build plugins to ensure thepresto-plugin
packaging is handled correctly.Please add a
<build>
section immediately after the<properties>
block inpresto-clp/pom.xml
:</properties> + + <build> + <plugins> + <plugin> + <groupId>org.apache.maven.plugins</groupId> + <artifactId>maven-surefire-plugin</artifactId> + <configuration> + <skipTests>true</skipTests> + </configuration> + </plugin> + <!-- include any additional plugins (enforcer, packaging, etc.) as in other connectors --> + </plugins> + </build>presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (1)
104-104
: 🧹 Nitpick (assertive)Simplify boolean comparison
Since
nullable
is a primitive boolean, usingObjects.equals()
is unnecessary.- Objects.equals(this.nullable, other.nullable); + this.nullable == other.nullable;presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableHandle.java (2)
72-75
: 🧹 Nitpick (assertive)Use Objects.equals for null-safe field comparison
While fields are likely non-null due to constructor initialization, using Objects.equals provides additional safety and follows best practices.
ClpTableHandle other = (ClpTableHandle) obj; - return this.schemaTableName.equals(other.schemaTableName) && - this.storageType == other.storageType; + return Objects.equals(this.schemaTableName, other.schemaTableName) && + this.storageType == other.storageType;
63-71
: 🛠️ Refactor suggestionFollow coding guidelines for boolean expressions in equals method
The equals method uses
!expression
pattern instead of the preferredfalse == expression
pattern specified in the coding guidelines.@Override public boolean equals(Object obj) { if (this == obj) { return true; } - if (obj == null || getClass() != obj.getClass()) { + if (false == (obj != null) || false == (getClass() == obj.getClass())) { return false; } ClpTableHandle other = (ClpTableHandle) obj;presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java (1)
84-93
: Avoid swallowing exceptions during shutdownIf
lifeCycleManager.stop()
fails, the connector logs the exception but still returns normally. This can mask resource-leak issues in production and complicate graceful shutdown.@Override public final void shutdown() { try { lifeCycleManager.stop(); } catch (Exception e) { log.error(e, "Error shutting down connector"); + throw new RuntimeException("Error shutting down connector", e); } }
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (3)
68-76
: Improve database connection security and reliabilityThe connection method needs several improvements:
- Follows coding guideline using
false == expression
pattern- Uses prepared statement for DB selection
- Adds connection timeout
private Connection getConnection() throws SQLException { - Connection connection = DriverManager.getConnection(config.getMetadataDbUrl(), config.getMetadataDbUser(), config.getMetadataDbPassword()); + Properties connectionProps = new Properties(); + connectionProps.setProperty("user", config.getMetadataDbUser()); + connectionProps.setProperty("password", config.getMetadataDbPassword()); + connectionProps.setProperty("connectTimeout", "10000"); // 10 second timeout + + Connection connection = DriverManager.getConnection(config.getMetadataDbUrl(), connectionProps); String dbName = config.getMetadataDbName(); - if (dbName != null && !dbName.isEmpty()) { - connection.createStatement().execute("USE " + dbName); + if (null != dbName && false == dbName.isEmpty()) { + try (PreparedStatement statement = connection.prepareStatement("USE ?")) { + statement.setString(1, dbName); + statement.execute(); + } } return connection; }
85-86
: Validate table name to prevent SQL injectionThe archivePathQuery still uses string formatting with tableName directly, which could be vulnerable to SQL injection.
String tablePathQuery = String.format(SQL_SELECT_DATASETS_TEMPLATE, config.getMetadataTablePrefix(), tableName); -String archivePathQuery = String.format(SQL_SELECT_ARCHIVES_TEMPLATE, config.getMetadataTablePrefix(), tableName); +// Ensure tableName is a valid identifier +if (false == ClpConfig.SAFE_SQL_IDENTIFIER.matcher(tableName).matches()) { + throw new IllegalArgumentException("Invalid table identifier: " + tableName); +} +String archiveTable = config.getMetadataTablePrefix() + tableName + ARCHIVE_TABLE_SUFFIX; +String archivePathQuery = String.format("SELECT %s FROM %s", ARCHIVES_TABLE_COLUMN_ID, archiveTable);
117-119
: Correct throwable logging signatureLogger expects the throwable as the first argument when you want proper stack-trace output.
- log.warn("Database error while processing splits for %s: %s", tableName, e); + log.warn(e, "Database error while processing splits for %s", tableName);presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java (3)
60-66
: Avoid hard-coded /tmp paths in testsUsing a fixed on-disk file (
/tmp/metadata_testdb
) makes tests non-hermetic and can fail when run concurrently by multiple build agents or on systems without write access to/tmp
. Prefer an in-memory H2 instance orFiles.createTempDirectory
.- final String metadataDbUrl = "jdbc:h2:file:/tmp/metadata_testdb;MODE=MySQL;DATABASE_TO_UPPER=FALSE"; + final String metadataDbUrl = "jdbc:h2:mem:metadata_testdb;MODE=MySQL;DATABASE_TO_UPPER=FALSE";
136-144
: Check return value of File.delete() and avoid System.outThe test silently ignores failed deletions and writes to stdout. Switching to assertion methods will surface issues and keep test output clean.
@AfterMethod public void tearDown() { File dbFile = new File("/tmp/metadata_testdb.mv.db"); File lockFile = new File("/tmp/metadata_testdb.trace.db"); // Optional, H2 sometimes creates this if (dbFile.exists()) { - dbFile.delete(); - System.out.println("Deleted database file: " + dbFile.getAbsolutePath()); + assertTrue(dbFile.delete(), "Failed to delete database file: " + dbFile.getAbsolutePath()); } if (lockFile.exists()) { - lockFile.delete(); + assertTrue(lockFile.delete(), "Failed to delete lock file: " + lockFile.getAbsolutePath()); } }
105-105
: 🧹 Nitpick (assertive)Avoid hard-coded paths in test data
The absolute path
/tmp/archives/
is used for test data, which could cause issues on systems without write access to/tmp
.- pstmt.setString(3, "/tmp/archives/" + TABLE_NAME); + pstmt.setString(3, "./test-archives/" + TABLE_NAME);presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java (3)
135-146
: Column order is still nondeterministicIteration over
root.children.entrySet()
yields a JVM-dependent order, so the resulting list of column handles can vary between workers, jeopardizing metadata consistency.
Please sort the entries or switch to an ordered map.
152-163
: Negation style violates the project guideline
if (!polymorphicTypeEnabled)
andelse if (!existing.type.equals(newType))
should use the prescribedfalse == <expr>
form.- if (!polymorphicTypeEnabled) { + if (false == polymorphicTypeEnabled) { ... - else if (!existing.type.equals(newType)) { + else if (false == existing.type.equals(newType)) {
190-196
: Nested field order remains nondeterministic
HashMap
traversal inbuildRowType()
means innerRowType
field order can fluctuate.
Sort the child entries or use a deterministic map to ensure reproducible type signatures.presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java (4)
71-74
: Style + NPE guard on database name checkThe negation style and null-check order break project conventions and risk an NPE.
Use guideline-compliant ordering:- if (dbName != null && !dbName.isEmpty()) { + if (null != dbName && false == dbName.isEmpty()) {
86-90
: Potential SQL-injection via unvalidated table name
schemaTableName.getTableName()
is interpolated intoquery
without first verifying it withisValidIdentifier()
.
Validate (or quote) the identifier before building the SQL to avoid malicious input.
98-100
: Incorrect log-formatting drops stack tracePassing the throwable as a format argument suppresses the stack trace.
Follow Airlift logging API:log.warn(e, "...")
.- log.warn("Failed to load table schema for %s: %s", schemaTableName.getTableName(), e); + log.warn(e, "Failed to load table schema for %s", schemaTableName.getTableName());
120-126
: Same log-formatting issue when listing tables- log.warn("Failed to load table names: %s", e); + log.warn(e, "Failed to load table names");presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java (1)
60-66
: Minor test optimisation: reuseSessionHolder
Each test method instantiates its own
SessionHolder
, increasing setup time.
Move it to a@BeforeMethod
or class field to reduce overhead.presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java (3)
55-56
: 🧹 Nitpick (assertive)Add descriptive Javadoc to clarify purpose and usage
This is a reusable test-base that will be inherited by multiple suites, yet it still lacks any explanatory header. A short Javadoc block improves discoverability for future contributors.
98-101
: 🧹 Nitpick (assertive)Reuse a single static
SqlParser
instead of instantiating per call
Creating a newSqlParser
andParsingOptions
for every invocation ofexpression()
is unnecessarily expensive. Past feedback already highlighted this; consider caching them asprivate static final
fields.-private static Expression expression(String sql) -{ - return ExpressionUtils.rewriteIdentifiersToSymbolReferences( - new SqlParser().createExpression(sql, - new ParsingOptions(ParsingOptions.DecimalLiteralTreatment.AS_DECIMAL))); -} +private static final SqlParser SQL_PARSER = new SqlParser(); +private static final ParsingOptions PARSING_OPTIONS = + new ParsingOptions(ParsingOptions.DecimalLiteralTreatment.AS_DECIMAL); + +private static Expression expression(String sql) +{ + return ExpressionUtils.rewriteIdentifiersToSymbolReferences( + SQL_PARSER.createExpression(sql, PARSING_OPTIONS)); +}
104-110
: 🧹 Nitpick (assertive)
toRowExpression
repeatsnew SqlParser()
– align with the shared instance
The second instantiation duplicates the overhead warned about above; please switch to the same static parser.presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java (5)
69-73
: 🧹 Nitpick (assertive)Add error handling around provider calls
clpMetadataProvider.listColumnHandles
may throw and poison the cache, propagating asUncheckedExecutionException
. Wrapping in try/catch with meaningful logging will make operational debugging easier.
98-101
: 🧹 Nitpick (assertive)Follow project coding-style: use
false == expr
instead of!expr
The guideline (false == <expression>
) applies to all Java sources in this repo. Please update the condition.- if (!listSchemaNames(session).contains(schemaNameValue)) { + if (false == listSchemaNames(session).contains(schemaNameValue)) {
111-114
: 🧹 Nitpick (assertive)Style consistency: prefer
false == expression
Same style concern as above for thegetTableHandle
guard.- if (!listSchemaNames(session).contains(schemaName)) { + if (false == listSchemaNames(session).contains(schemaName)) {
156-158
: 🧹 Nitpick (assertive)Maintain style guideline in prefix validation
Yet another occurrence of the!expr
pattern—please update.- if (schemaName != null && !listSchemaNames(session).contains(schemaName)) { + if (schemaName != null && false == listSchemaNames(session).contains(schemaName)) {
161-167
:⚠️ Potential issuePossible
NullPointerException
when schema is omitted
prefix.getSchemaName()
can benull
; passing it to theSchemaTableName
constructor will throw. This was previously reported but remains unresolved.- schemaTableNames = ImmutableList.of(new SchemaTableName(prefix.getSchemaName(), prefix.getTableName())); + String schema = Optional.ofNullable(prefix.getSchemaName()).orElse(DEFAULT_SCHEMA_NAME); + schemaTableNames = ImmutableList.of(new SchemaTableName(schema, prefix.getTableName()));
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (17)
presto-clp/pom.xml
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpRecordSetProvider.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableHandle.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java
(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java
(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java
(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java
(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java
(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ` rather than `!`.
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
: - Preferfalse == <expression>
rather than!<expression>
.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpRecordSetProvider.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableHandle.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java
🧬 Code Graph Analysis (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (1)
presto-spi/src/main/java/com/facebook/presto/spi/ColumnMetadata.java (1)
ColumnMetadata
(29-205)
⏰ Context from checks skipped due to timeout of 90000ms (9)
- GitHub Check: spark-integration (8)
- GitHub Check: hive-tests (17.0.13)
- GitHub Check: hive-dockerized-tests (8)
- GitHub Check: hive-tests (8)
- GitHub Check: arrowflight-java-tests (17.0.13, :presto-base-arrow-flight)
- GitHub Check: test (:presto-docs)
- GitHub Check: prestocpp-linux-build-for-test
- GitHub Check: maven-checks (17.0.13)
- GitHub Check: maven-checks (8)
🔇 Additional comments (18)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (1)
40-47
: Good conditional binding based on config, with proper error handling.The implementation correctly binds the MySQL metadata provider based on configuration and throws appropriate exceptions for unsupported types.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (2)
57-62
: LGTM: ClpConfig setup is correctly configuredThe configuration is properly set up with the necessary parameters for metadata database connection.
97-98
: 🧹 Nitpick (assertive)Follow coding guidelines for boolean expressions
According to the coding guidelines, prefer
false == <expression>
rather than negated expressions.- archiveTableName); - stmt.execute(createArchiveTableSQL); + archiveTableName); + stmt.execute(createArchiveTableSQL);Likely an incorrect or invalid review comment.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java (3)
25-49
: Implementation of TableLayoutHandle looks goodThe class correctly implements ConnectorTableLayoutHandle with appropriate fields for table and KQL query.
61-78
: Good implementation of equals, hashCode and toStringThe implementation properly includes all fields in equals, hashCode, and toString methods, ensuring consistent behavior.
57-58
: 🧹 Nitpick (assertive)Follow coding guidelines for boolean expressions
According to the coding guidelines, prefer
false == <expression>
rather than using negated expressions.- if (o == null || getClass() != o.getClass()) { + if (false == (o != null) || getClass() != o.getClass()) { return false; }Likely an incorrect or invalid review comment.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (2)
76-83
: Good implementation of getColumnMetadataThe method correctly builds a ColumnMetadata object with the appropriate properties.
97-98
: 🧹 Nitpick (assertive)Follow coding guidelines for boolean expressions
According to the coding guidelines, prefer
false == <expression>
rather than using negated expressions.- if (obj == null || getClass() != obj.getClass()) { + if (false == (obj != null) || getClass() != obj.getClass()) { return false; }Likely an incorrect or invalid review comment.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableHandle.java (1)
77-84
: LGTM! Good implementation of toString with detailed outputThe toString implementation properly uses toStringHelper to create a detailed and helpful string representation including all fields.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java (2)
35-52
: LGTM! Good dependency injection and null checksThe constructor properly uses dependency injection and applies null checks to all parameters using requireNonNull.
54-58
: LGTM! Good use of injected providerThe connector properly returns the injected optimizer provider instance rather than creating a new one.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (3)
48-53
: LGTM! SQL templates with parameterized query for datasetsThe SQL_SELECT_DATASETS_TEMPLATE correctly uses a parameterized query with ? placeholder for the WHERE clause.
91-100
: LGTM! Good use of prepared statement with parameter bindingThe code correctly uses a prepared statement with proper parameter binding to safely insert the table name.
102-102
: 🧹 Nitpick (assertive)Follow coding guidelines for boolean expressions
Use
false == expression
pattern for boolean expressions per coding guidelines.- if (tablePath == null || tablePath.isEmpty()) { + if (null == tablePath || false == tablePath.isEmpty()) {Likely an incorrect or invalid review comment.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java (3)
147-151
: LGTM! Good test for schema names listingThe test correctly verifies that the expected schema names are returned by the metadata provider.
153-159
: LGTM! Good test for table listingThe test properly verifies that the expected tables are returned by the metadata provider.
161-204
: LGTM! Comprehensive test for table metadataThe test thoroughly verifies complex metadata structures including nested row types and arrays.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java (1)
40-55
: Negative-assertion gap resolved – looks goodThe added
else
branches now assert the absence of unexpected push-downs and residuals.
This closes the regression hole flagged earlier.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java
Show resolved
Hide resolved
5a68d91
to
bf9da2e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
♻️ Duplicate comments (41)
presto-clp/pom.xml (2)
5-19
: Add<build>
section to presto-clp/pom.xmlThe CLP connector POM omits a
<build>
block. All other connector modules declare build plugins—such as the Maven Surefire and Enforcer plugins—to ensure thepresto-plugin
packaging is handled correctly.Please add a
<build>
section immediately after the<properties>
block.
22-25
: Consider specifying a version range for mysql-connector-jThe MySQL connector dependency doesn't specify a version constraint, which might lead to unexpected behavior if the version changes.
<dependency> <groupId>com.mysql</groupId> <artifactId>mysql-connector-j</artifactId> + <version>[8.0.0,)</version> <scope>runtime</scope> </dependency>
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java (1)
22-33
: Consider adding a listSchemaNames method for completenessThe interface is well-defined, but consider adding a
listSchemaNames()
method to retrieve all available schema names. This would make the interface more complete and consistent with other connector implementations.public interface ClpMetadataProvider { /** * Returns the list of column handles for the given table. */ List<ClpColumnHandle> listColumnHandles(SchemaTableName schemaTableName); /** * Returns the list of table names in the given schema. */ List<ClpTableHandle> listTableHandles(String schema); + + /** + * Returns the list of all schema names. + */ + List<String> listSchemaNames(); }presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (1)
49-56
: Appropriate split provider configuration and error handlingThe implementation correctly binds the MySQL split provider based on configuration and throws appropriate exceptions for unsupported types.
For future extensibility, if more provider types are added later, you might consider refactoring these conditionals to a more scalable approach like a provider registry or factory pattern.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java (2)
33-37
:⚠️ Potential issueAdd null-safety to protect against NPEs
The constructor allows null values for the path parameter without validation, which could lead to NullPointerExceptions later in execution.
+import static java.util.Objects.requireNonNull; + @JsonCreator public ClpSplit(@JsonProperty("path") String path) { - this.path = path; + this.path = requireNonNull(path, "path is null"); }
57-61
: 🧹 Nitpick (assertive)Consider providing more specific info in getInfo()
The
getInfo()
method returns the instance itself, which might not be the most informative approach if specific split information is needed.+import com.google.common.collect.ImmutableMap; + @Override public Object getInfo() { - return this; + return ImmutableMap.of("splitPath", path); }presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (5)
50-50
:⚠️ Potential issueUse platform-independent temp directory
The hard-coded path
/tmp/split_testdb
might not work on all operating systems, particularly Windows.- final String metadataDbUrl = "jdbc:h2:file:/tmp/split_testdb;MODE=MySQL;DATABASE_TO_UPPER=FALSE"; + final String tempDir = System.getProperty("java.io.tmpdir"); + final String metadataDbUrl = String.format("jdbc:h2:file:%s/split_testdb;MODE=MySQL;DATABASE_TO_UPPER=FALSE", tempDir);
83-84
: 🛠️ Refactor suggestionUse platform-independent path construction
Hard-coded path separators may cause issues on different operating systems.
- pstmt.setString(3, "/tmp/archives/" + tableName); + pstmt.setString(3, new File(System.getProperty("java.io.tmpdir"), "archives" + File.separator + tableName).getPath());
118-126
: 🛠️ Refactor suggestionImprove file cleanup handling in tearDown method
The current implementation has several issues:
- Uses hard-coded paths that aren't platform-independent
- Doesn't handle exceptions during file deletion
- File deletion failures are silently ignored
@AfterMethod public void tearDown() { - File dbFile = new File("/tmp/split_testdb.mv.db"); - File lockFile = new File("/tmp/split_testdb.trace.db"); // Optional, H2 sometimes creates this - if (dbFile.exists()) { - dbFile.delete(); - System.out.println("Deleted database file: " + dbFile.getAbsolutePath()); - } - if (lockFile.exists()) { - lockFile.delete(); - } + String tempDir = System.getProperty("java.io.tmpdir"); + File dbFile = new File(tempDir, "split_testdb.mv.db"); + File lockFile = new File(tempDir, "split_testdb.trace.db"); // Optional, H2 sometimes creates this + try { + if (dbFile.exists() && !dbFile.delete()) { + System.err.println("Failed to delete database file: " + dbFile.getAbsolutePath()); + } + else if (dbFile.exists()) { + System.out.println("Deleted database file: " + dbFile.getAbsolutePath()); + } + + if (lockFile.exists() && !lockFile.delete()) { + System.err.println("Failed to delete lock file: " + lockFile.getAbsolutePath()); + } + } + catch (Exception e) { + System.err.println("Error cleaning up database files: " + e.getMessage()); + } }
139-140
: 🛠️ Refactor suggestionUse platform-independent path for split verification
The expected path in the assertion uses hard-coded Unix-style separators.
- assertEquals(splits.get(i).getPath(), "/tmp/archives/" + tableName + "/id_" + i); + String expectedPath = new File(new File(new File(System.getProperty("java.io.tmpdir"), "archives"), tableName), "id_" + i).getPath(); + assertEquals(splits.get(i).getPath(), expectedPath);
129-143
: 🧹 Nitpick (assertive)Enhance test coverage for ClpSplit properties
The test only verifies the split path format but doesn't check other properties of the splits.
@Test public void testListSplits() { ClpSplitProvider splitProvider = new ClpMySqlSplitProvider(config); for (String tableName : TABLE_NAME_LIST) { ClpTableLayoutHandle layoutHandle = new ClpTableLayoutHandle( new ClpTableHandle(new SchemaTableName(TABLE_SCHEMA, tableName), ClpTableHandle.StorageType.FS), Optional.empty()); List<ClpSplit> splits = splitProvider.listSplits(layoutHandle); assertEquals(splits.size(), NUM_SPLITS); for (int i = 0; i < NUM_SPLITS; i++) { assertEquals(splits.get(i).getPath(), "/tmp/archives/" + tableName + "/id_" + i); + // Verify other important split properties + assertEquals(splits.get(i).getNodeSelectionStrategy(), NodeSelectionStrategy.NO_PREFERENCE, "Node selection strategy should be NO_PREFERENCE"); + assertTrue(splits.get(i).getPreferredNodes(null).isEmpty(), "Preferred nodes list should be empty"); + assertNotNull(splits.get(i).getInfo(), "Split info should not be null"); } } }presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (1)
104-104
: 🧹 Nitpick (assertive)Simplify boolean comparison
Since
nullable
is a primitive boolean, usingObjects.equals()
is unnecessary.- Objects.equals(this.nullable, other.nullable); + this.nullable == other.nullable;presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableHandle.java (1)
63-75
: Follow project coding guidelines for conditional expressions.The equals method uses the pattern
!expression
instead of the preferredfalse == expression
style mentioned in the coding guidelines. Additionally, consider usingObjects.equals()
for null-safe comparison ofschemaTableName
.@Override public boolean equals(Object obj) { if (this == obj) { return true; } - if (obj == null || getClass() != obj.getClass()) { + if (null == obj || false == (getClass() == obj.getClass())) { return false; } ClpTableHandle other = (ClpTableHandle) obj; - return this.schemaTableName.equals(other.schemaTableName) && + return Objects.equals(this.schemaTableName, other.schemaTableName) && this.storageType == other.storageType; }presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java (1)
85-93
: Swallowing exceptions during shutdown may hide problemsIf
lifeCycleManager.stop()
fails, the connector logs the exception but still returns normally. That can mask resource-leak issues in production and complicates graceful shutdown on coordinators/workers. Consider re-throwing as aRuntimeException
(Presto catches those and surfaces them) after logging, or at least returning a failed future if the lifecycle is managed asynchronously.presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (3)
70-76
: Improve connection handling for security and resiliencySeveral improvements should be made to the database connection handling:
- No connection timeout is specified, which could lead to hanging if the database is unreachable
- The "USE dbName" statement is executed directly without using a prepared statement
- The method doesn't follow the coding guideline - it uses
!dbName.isEmpty()
rather thanfalse == dbName.isEmpty()
private Connection getConnection() throws SQLException { - Connection connection = DriverManager.getConnection(config.getMetadataDbUrl(), config.getMetadataDbUser(), config.getMetadataDbPassword()); + // Set connection properties including timeout + Properties connectionProps = new Properties(); + connectionProps.setProperty("user", config.getMetadataDbUser()); + connectionProps.setProperty("password", config.getMetadataDbPassword()); + connectionProps.setProperty("connectTimeout", "10000"); // 10 second timeout + + Connection connection = DriverManager.getConnection(config.getMetadataDbUrl(), connectionProps); String dbName = config.getMetadataDbName(); - if (dbName != null && !dbName.isEmpty()) { - connection.createStatement().execute("USE " + dbName); + if (null != dbName && false == dbName.isEmpty()) { + try (PreparedStatement statement = connection.prepareStatement("USE ?")) { + statement.setString(1, dbName); + statement.execute(); + } } return connection; }
85-86
: Prevent SQL injection in archive query constructionWhile the datasets table query properly uses a parameterized query (line 92), the archive query still embeds the table name directly via string formatting. This creates a SQL injection risk.
- String archivePathQuery = String.format(SQL_SELECT_ARCHIVES_TEMPLATE, config.getMetadataTablePrefix(), tableName); + // Ensure tableName is a valid identifier + if (false == tableName.matches("[A-Za-z0-9_]+")) { + throw new IllegalArgumentException("Invalid table name: " + tableName); + } + String archiveTable = config.getMetadataTablePrefix() + tableName + ARCHIVE_TABLE_SUFFIX; + String archivePathQuery = "SELECT " + ARCHIVES_TABLE_COLUMN_ID + " FROM " + archiveTable;
118-119
: Incorrect throwable logging signature
Logger
expects the throwable as the first argument when you want stack-trace output:- log.warn("Database error while processing splits for %s: %s", tableName, e); + log.warn(e, "Database error while processing splits for %s", tableName);Without this change the stack trace is lost and
%s
is filled withtoString()
of the exception.presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java (2)
60-66
: Avoid hard-coded/tmp
paths in testsUsing a fixed on-disk location (
/tmp/metadata_testdb
and related directories) makes the test non-hermetic, can break when two builds run concurrently, and fails on systems where/tmp
is not writable (e.g., Windows agents).
Consider switching to an in-memory H2 database (jdbc:h2:mem:
) or a randomly generated temporary directory created viaFiles.createTempDirectory()
.
136-144
: Check the return value ofFile.delete()
and removeSystem.out
usage
File.delete()
returnsfalse
when the deletion fails, but the result is ignored, potentially masking real problems.
Also, emitting tostdout
from unit tests clutters CI logs; prefer logging (AirliftLogger
) or assertions.- if (dbFile.exists()) { - dbFile.delete(); - System.out.println("Deleted database file: " + dbFile.getAbsolutePath()); - } + if (dbFile.exists()) { + assertTrue(dbFile.delete(), "Failed to delete test database file: " + dbFile.getAbsolutePath()); + }presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java (4)
136-146
: Column order is nondeterministic
HashMap
iteration order varies between JVM runs, leading to unstable metadata and potential worker mismatches.
Use a sorted map or sort the entries before adding tocolumns
.
152-154
: Negation style violates project guidelineThe guideline says “prefer
false == <expr>
rather than!<expr>
”.
if (!polymorphicTypeEnabled)
should be rewritten to comply.
162-165
: Negation style & equals usageSame guideline issue: replace
!existing.type.equals(newType)
withfalse == existing.type.equals(newType)
.
191-196
: Nested field order is nondeterministic
buildRowType()
iterates overHashMap
, giving inconsistentRowType
signatures across nodes.
Sort the child entries or use an ordered map for reproducibility.presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java (4)
70-74
: Guideline-compliant negation & null check
if (dbName != null && !dbName.isEmpty())
violates the house style and callsisEmpty()
whendbName
may benull
.- if (dbName != null && !dbName.isEmpty()) { + if (null != dbName && false == dbName.isEmpty()) {
86-90
: Validate table identifier before building query
schemaTableName.getTableName()
is interpolated into the SQL without validation, opening a path for SQL-injection-style exploits from a compromised metadata store.
CallisValidIdentifier()
and reject/escape unsafe names.
99-101
: Fix log formatting to preserve stack traceTreat the throwable as the first argument to Airlift’s logger, not part of the format string:
- log.warn("Failed to load table schema for %s: %s", schemaTableName.getTableName(), e); + log.warn(e, "Failed to load table schema for %s", schemaTableName.getTableName());
125-126
: Same logging issue when listing tablesPass the exception as the first parameter:
- log.warn("Failed to load table names: %s", e); + log.warn(e, "Failed to load table names");presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java (2)
57-66
: 🧹 Nitpick (assertive)Consider reusing SessionHolder to improve test performance
Each test method creates its own instance of SessionHolder. Creating a single instance as a class field or in a @BeforeMethod would reduce object creation overhead and improve test performance.
+ private SessionHolder sessionHolder; + + @BeforeMethod + public void setUp() + { + sessionHolder = new SessionHolder(); + } + @Test public void testStringMatchPushdown() { - SessionHolder sessionHolder = new SessionHolder();
128-132
: 🧹 Nitpick (assertive)Consider separating this test case for clarity
This test case verifies different behavior (when parts of an OR expression can't be pushed down) from other test cases in the method. Moving it to a separate test method would improve readability and test isolation.
+ @Test + public void testOrPushdownWithUnpushableExpression() + { + SessionHolder sessionHolder = new SessionHolder(); + + testFilter("fare > 0 OR city.Name like 'b%' OR lower(city.Region.Name) = 'hello world' OR city.Region.Id != 1", + Optional.empty(), + Optional.of("fare > 0 OR city.Name like 'b%' OR lower(city.Region.Name) = 'hello world' OR city.Region.Id != 1"), + sessionHolder); + }presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (2)
34-43
: 🛠️ Refactor suggestionValidate required DB connection properties to fail fast
When the metadata provider is MYSQL, properties like
metadataDbUrl
,metadataDbName
,metadataDbUser
, andmetadataDbPassword
are essential but can currently remain null, leading to hard-to-trace NPEs deep in the call stack.Add validation to ensure mandatory fields are supplied when using MySQL:
+import static com.facebook.presto.spi.StandardErrorCode.GENERIC_INTERNAL_ERROR; +import static java.util.Objects.requireNonNull; + +import javax.annotation.PostConstruct; public class ClpConfig { + @PostConstruct + public void validate() + { + if (MetadataProviderType.MYSQL.equals(metadataProviderType)) { + requireNonNull(metadataDbUrl, "clp.metadata-db-url must be set when using MYSQL metadata provider"); + requireNonNull(metadataDbName, "clp.metadata-db-name must be set when using MYSQL metadata provider"); + requireNonNull(metadataDbUser, "clp.metadata-db-user must be set when using MYSQL metadata provider"); + requireNonNull(metadataDbPassword, "clp.metadata-db-password must be set when using MYSQL metadata provider"); + } + }
123-132
: 🧹 Nitpick (assertive)Add toLowerCase() for consistent table prefix naming
Users often mix cases in table-prefixes. Converting the validated value to lowercase would prevent later mismatches with case-sensitive backends.
+import java.util.Locale; + @Config("clp.metadata-table-prefix") public ClpConfig setMetadataTablePrefix(String metadataTablePrefix) { if (metadataTablePrefix == null || !SAFE_SQL_IDENTIFIER.matcher(metadataTablePrefix).matches()) { throw new PrestoException(ClpErrorCode.CLP_UNSUPPORTED_CONFIG_OPTION, "Invalid metadataTablePrefix: " + metadataTablePrefix + ". Only alphanumeric characters and underscores are allowed."); } - this.metadataTablePrefix = metadataTablePrefix; + this.metadataTablePrefix = metadataTablePrefix.toLowerCase(Locale.ROOT); return this; }presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java (4)
55-56
: 🧹 Nitpick (assertive)Add class-level documentation
This base class provides important testing infrastructure, but lacks documentation explaining its purpose and usage.
+/** + * Base class for CLP query tests providing common functionality for expression parsing, + * type resolution, and session management. It establishes the test environment for + * validating CLP-specific query components and expression translations. + */ public class TestClpQueryBase {
61-67
: 🧹 Nitpick (assertive)Make 'city' immutable to prevent accidental state mutation
All other column handle constants are declared
final
, butcity
is not. Making it immutable would prevent accidental state mutation between test runs.- protected static final ClpColumnHandle city = new ClpColumnHandle("city", RowType.from(ImmutableList.of( + protected static final ClpColumnHandle city = new ClpColumnHandle("city", RowType.from(ImmutableList.of(
98-101
: 🧹 Nitpick (assertive)Reuse a single SqlParser instance to reduce object churn
Creating a new SqlParser for every method call is inefficient. These objects are heavyweight; using a single static parser instance would improve test execution performance.
+ private static final SqlParser SQL_PARSER = new SqlParser(); + private static final ParsingOptions PARSING_OPTIONS = + new ParsingOptions(ParsingOptions.DecimalLiteralTreatment.AS_DECIMAL); public static Expression expression(String sql) { - return ExpressionUtils.rewriteIdentifiersToSymbolReferences(new SqlParser().createExpression(sql, new ParsingOptions(ParsingOptions.DecimalLiteralTreatment.AS_DECIMAL))); + return ExpressionUtils.rewriteIdentifiersToSymbolReferences(SQL_PARSER.createExpression(sql, PARSING_OPTIONS)); }
108-108
: 🧹 Nitpick (assertive)Use shared SqlParser instance for consistency
This method also creates a new SqlParser instance. For consistency and efficiency, use the same shared SqlParser instance as recommended for the
expression()
method.- new SqlParser(), + SQL_PARSER,presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java (6)
57-65
: 🛠️ Refactor suggestionConsider bounding cache size to avoid unbounded memory growth
Both caches are time-bounded but not size-bounded. In long-running clusters with large schema counts, this can exhaust heap memory.
Add size limits to the caches:
+import static com.facebook.airlift.configuration.ConfigDefaults.getMaxCacheSize; @Inject public ClpMetadata(ClpConfig clpConfig, ClpMetadataProvider clpMetadataProvider) { this.columnHandleCache = CacheBuilder.newBuilder() + .maximumSize(getMaxCacheSize(10000)) .expireAfterWrite(clpConfig.getMetadataExpireInterval(), SECONDS) .refreshAfterWrite(clpConfig.getMetadataRefreshInterval(), SECONDS) .build(CacheLoader.from(this::loadColumnHandles)); this.tableHandleCache = CacheBuilder.newBuilder() + .maximumSize(getMaxCacheSize(1000)) .expireAfterWrite(clpConfig.getMetadataExpireInterval(), SECONDS) .refreshAfterWrite(clpConfig.getMetadataRefreshInterval(), SECONDS) .build(CacheLoader.from(this::loadTableHandles));
99-101
: 🧹 Nitpick (assertive)Use preferred coding style: false == expr rather than !expr
According to the coding guidelines, expressions like
!listSchemaNames(...).contains(...)
should be written asfalse == listSchemaNames(...).contains(...)
.- if (!listSchemaNames(session).contains(schemaNameValue)) { + if (false == listSchemaNames(session).contains(schemaNameValue)) { return ImmutableList.of(); }
112-114
: 🧹 Nitpick (assertive)Use preferred coding style: false == expr rather than !expr
According to the coding guidelines, expressions like
!listSchemaNames(...).contains(...)
should be written asfalse == listSchemaNames(...).contains(...)
.- if (!listSchemaNames(session).contains(schemaName)) { + if (false == listSchemaNames(session).contains(schemaName)) { return null; }
157-159
: 🧹 Nitpick (assertive)Use preferred coding style: false == expr rather than !expr
According to the coding guidelines, expressions like
!listSchemaNames(...).contains(...)
should be written asfalse == listSchemaNames(...).contains(...)
.- if (schemaName != null && !listSchemaNames(session).contains(schemaName)) { + if (schemaName != null && false == listSchemaNames(session).contains(schemaName)) { return ImmutableMap.of(); }
165-167
:⚠️ Potential issuePrevent potential NPE when prefix contains only table name
When
prefix.getSchemaName()
is null, creating a newSchemaTableName
will throw an exception because its constructor rejects null components.else { - schemaTableNames = ImmutableList.of(new SchemaTableName(prefix.getSchemaName(), prefix.getTableName())); + String schema = Optional.ofNullable(prefix.getSchemaName()).orElse(DEFAULT_SCHEMA_NAME); + schemaTableNames = ImmutableList.of(new SchemaTableName(schema, prefix.getTableName())); }
69-72
: 🛠️ Refactor suggestionAdd error handling for cache loading failures
The cache loading methods don't contain any error handling. Without proper error handling, exceptions during metadata loading might be difficult to troubleshoot.
+import com.facebook.presto.spi.PrestoException; +import static com.facebook.presto.plugin.clp.ClpErrorCode.CLP_METADATA_ERROR; private List<ClpColumnHandle> loadColumnHandles(SchemaTableName schemaTableName) { - return clpMetadataProvider.listColumnHandles(schemaTableName); + try { + return clpMetadataProvider.listColumnHandles(schemaTableName); + } + catch (Exception e) { + throw new PrestoException(CLP_METADATA_ERROR, + "Failed to load column metadata for " + schemaTableName, e); + } }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (19)
presto-clp/pom.xml
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpRecordSetProvider.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableHandle.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java
(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java
(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java
(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java
(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java
(1 hunks)presto-docs/src/main/sphinx/connector/clp.rst
(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ` rather than `!`.
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
: - Preferfalse == <expression>
rather than!<expression>
.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpRecordSetProvider.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableHandle.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java
🧬 Code Graph Analysis (3)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (1)
ClpColumnHandle
(26-117)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableHandle.java (1)
ClpTableHandle
(25-85)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (1)
presto-spi/src/main/java/com/facebook/presto/spi/ColumnMetadata.java (1)
ColumnMetadata
(29-205)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (3)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java (1)
ClpMySqlMetadataProvider
(33-129)presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (1)
ClpMySqlSplitProvider
(33-123)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1)
ClpConfig
(21-170)
⏰ Context from checks skipped due to timeout of 90000ms (17)
- GitHub Check: product-tests-basic-environment (17.0.13)
- GitHub Check: singlestore-dockerized-tests (8)
- GitHub Check: spark-integration (17.0.13)
- GitHub Check: test (8, :presto-cassandra -P test-cassandra-integration-smoke-test)
- GitHub Check: test-other-modules (17.0.13)
- GitHub Check: kudu (8)
- GitHub Check: kudu (17.0.13)
- GitHub Check: product-tests-specific-environment1 (8)
- GitHub Check: product-tests-specific-environment2 (17.0.13)
- GitHub Check: product-tests-specific-environment2 (8)
- GitHub Check: product-tests-specific-environment1 (17.0.13)
- GitHub Check: maven-checks (8)
- GitHub Check: maven-checks (17.0.13)
- GitHub Check: test (:presto-docs)
- GitHub Check: arrowflight-java-tests (17.0.13, :presto-base-arrow-flight)
- GitHub Check: arrowflight-java-tests (8, :presto-base-arrow-flight)
- GitHub Check: prestocpp-linux-build-for-test
🔇 Additional comments (4)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpRecordSetProvider.java (1)
28-35
: Implementation correctly throws UnsupportedOperationExceptionThe implementation properly throws an exception rather than returning null, which is a good practice for unsupported operations. This ensures callers receive a clear error message instead of encountering a NullPointerException.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (1)
40-47
: Appropriate error handling for unsupported metadata providersThe implementation correctly checks the metadata provider type and throws an appropriate exception with a descriptive message when an unsupported type is configured.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (1)
97-98
: 🧹 Nitpick (assertive)Follow project coding guidelines for boolean expressions
According to the coding guidelines, prefer
false == <expression>
rather than!<expression>
for boolean expressions.- if (obj == null || getClass() != obj.getClass()) { + if (false == (obj != null) || getClass() != obj.getClass()) { return false; }Likely an incorrect or invalid review comment.
presto-docs/src/main/sphinx/connector/clp.rst (1)
1-179
: Documentation looks comprehensive and clearThe documentation provides a clear explanation of the CLP connector, its configuration options, data type mappings, and usage limitations. It aligns well with the implementation code and provides users with the necessary information to set up and use the connector effectively.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java
Show resolved
Hide resolved
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java
Show resolved
Hide resolved
bf9da2e
to
1b8d043
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 10
🔭 Outside diff range comments (1)
presto-docs/src/main/sphinx/connector/clp.rst (1)
177-179
: 🛠️ Refactor suggestionMissing information about query predicate pushdown capabilities.
The documentation explains that the connector only provides read access, but there's no mention of predicate/filter pushdown capabilities - an important feature for query performance that appears to be implemented in this connector based on the AI summary about
ClpPlanOptimizer
andClpFilterToKqlConverter
.Add a section that documents the connector's predicate pushdown capabilities, which types of filters can be pushed down to the underlying storage, and any limitations of these capabilities.
♻️ Duplicate comments (40)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java (1)
22-33
: MissinglistSchemaNames()
keeps the interface from parity with other Presto connectors
Re-raising prior observation.Many Presto connectors expose a
listSchemaNames()
helper, which simplifies metadata enumeration and avoids unnecessary caller-side filtering. Consider adding it for consistency.presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java (4)
33-37
: Add null-safety for the path parameterThe path parameter is a fundamental identifier for the split, yet the constructor allows null values which could lead to NullPointerExceptions later.
+import static java.util.Objects.requireNonNull; + @JsonCreator public ClpSplit(@JsonProperty("path") String path) { - this.path = path; + this.path = requireNonNull(path, "path is null"); }
39-43
: Document the returned path fieldThe path getter method lacks documentation explaining what this field represents.
@JsonProperty +/** + * Returns the path to the split's data location. + * This path is used by the record set provider to read data from the corresponding location. + */ public String getPath() { return path; }
57-61
: Consider providing more specific information in getInfo()The current implementation returns the split instance itself, which might not be the most informative approach if specific split information is needed.
- public Object getInfo() - { - return this; - } + public Object getInfo() + { + return ImmutableMap.of("splitPath", path); + }This would require adding the following import:
import com.google.common.collect.ImmutableMap;
31-31
: 🧹 Nitpick (assertive)Document the meaning of the path field
Add documentation explaining what this field represents in the context of a CLP split.
- private final String path; + /** + * Path to the data location for this split. + * This is used by the record set provider to read data from the corresponding location. + */ + private final String path;presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (4)
50-50
: Use platform-independent temp directoryThe hard-coded path
/tmp/split_testdb
might not work on all operating systems, particularly Windows.- final String metadataDbUrl = "jdbc:h2:file:/tmp/split_testdb;MODE=MySQL;DATABASE_TO_UPPER=FALSE"; + final String tempDir = System.getProperty("java.io.tmpdir"); + final String metadataDbUrl = String.format("jdbc:h2:file:%s/split_testdb;MODE=MySQL;DATABASE_TO_UPPER=FALSE", tempDir);
83-84
: Use platform-independent path constructionHard-coded path separators may cause issues on different operating systems.
- pstmt.setString(3, "/tmp/archives/" + tableName); + pstmt.setString(3, new File(System.getProperty("java.io.tmpdir"), "archives" + File.separator + tableName).getPath());
118-126
: Improve file cleanup handling in tearDown methodThe current implementation has several issues:
- Uses hard-coded paths that aren't platform-independent
- Doesn't handle exceptions during file deletion
- File deletion failures are silently ignored
- File dbFile = new File("/tmp/split_testdb.mv.db"); - File lockFile = new File("/tmp/split_testdb.trace.db"); // Optional, H2 sometimes creates this - if (dbFile.exists()) { - dbFile.delete(); - System.out.println("Deleted database file: " + dbFile.getAbsolutePath()); - } - if (lockFile.exists()) { - lockFile.delete(); - } + String tempDir = System.getProperty("java.io.tmpdir"); + File dbFile = new File(tempDir, "split_testdb.mv.db"); + File lockFile = new File(tempDir, "split_testdb.trace.db"); // Optional, H2 sometimes creates this + try { + if (dbFile.exists() && !dbFile.delete()) { + System.err.println("Failed to delete database file: " + dbFile.getAbsolutePath()); + } + else if (dbFile.exists()) { + System.out.println("Deleted database file: " + dbFile.getAbsolutePath()); + } + + if (lockFile.exists() && !lockFile.delete()) { + System.err.println("Failed to delete lock file: " + lockFile.getAbsolutePath()); + } + } + catch (Exception e) { + System.err.println("Error cleaning up database files: " + e.getMessage()); + }
142-142
: Use platform-independent path for split verificationThe expected path in the assertion uses hard-coded Unix-style separators.
- assertEquals(splits.get(i).getPath(), "/tmp/archives/" + tableName + "/id_" + i); + String expectedPath = new File(new File(new File(System.getProperty("java.io.tmpdir"), "archives"), tableName), "id_" + i).getPath(); + assertEquals(splits.get(i).getPath(), expectedPath);presto-clp/pom.xml (2)
5-19
: Add<build>
section to pom.xmlThe CLP connector POM omits a
<build>
block. All other connector modules declare build plugins to ensure thepresto-plugin
packaging is handled correctly.Add a
<build>
section immediately after the<properties>
block:</properties> + + <build> + <plugins> + <plugin> + <groupId>org.apache.maven.plugins</groupId> + <artifactId>maven-surefire-plugin</artifactId> + <configuration> + <skipTests>${air.check.skip-tests}</skipTests> + </configuration> + </plugin> + </plugins> + </build>
22-25
: Specify a version constraint for mysql-connector-jThe MySQL connector dependency doesn't specify a version constraint, which might lead to unexpected behavior if the version changes.
<dependency> <groupId>com.mysql</groupId> <artifactId>mysql-connector-j</artifactId> + <version>[8.0.0,)</version> <scope>runtime</scope> </dependency>
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java (1)
57-58
: Follow project coding guidelines for boolean expressionsAccording to the coding guidelines, prefer
false == <expression>
rather than!<expression>
for boolean expressions.- if (o == null || getClass() != o.getClass()) { + if (false == (o != null) || getClass() != o.getClass()) { return false; }presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (1)
104-104
: Simplify boolean comparison.Since
nullable
is a primitive boolean, usingObjects.equals()
is unnecessary.- Objects.equals(this.nullable, other.nullable); + this.nullable == other.nullable;presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (1)
92-93
: Incorrect throwable logging signature.Logger expects the throwable as the first argument when you want stack-trace output:
- log.warn("Database error while processing splits for %s: %s", tableName, e); + log.warn(e, "Database error while processing splits for %s", tableName);presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java (1)
85-93
: Swallowing exceptions during shutdown may hide problems.If
lifeCycleManager.stop()
fails, the connector logs the exception but still returns normally. That can mask resource-leak issues in production and complicates graceful shutdown on coordinators/workers.presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java (3)
60-66
: Hard-coded/tmp
path re-introduces non-hermetic test failuresThe H2 database is still persisted under
/tmp/metadata_testdb
. This is fragile when tests run concurrently, on read-only filesystems, or inside containers that remap/tmp
. Please switch to an in-memory URL (jdbc:h2:mem:
) or create a truly private temp directory withFiles.createTempDirectory(...)
.
109-118
: Remove heavyPair
dependency for trivial tuplesPulling in
commons-math3
solely forPair
bloats the test class-path (>1 MB) and slows the build. A tinyrecord Pair(String name, ClpNodeType type) {}
orMap.Entry
is enough.
136-141
: Check deletion result & avoidSystem.out
in tests
File.delete()
can silently fail, andSystem.out.println
pollutes test logs. Replace with an assertion and a logger, e.g.:- if (dbFile.exists()) { - dbFile.delete(); - System.out.println("Deleted database file: " + dbFile.getAbsolutePath()); - } + if (dbFile.exists()) { + assertTrue(dbFile.delete(), "Failed to delete " + dbFile); + }presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableHandle.java (1)
78-84
: equals() still ignores null-safety adviceThe earlier recommendation to use
Objects.equals
for the nullableschemaTableName
and to remove the raw null check hasn’t been applied. This keeps the method brittle and inconsistent with the rest of the codebase.presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java (3)
135-146
: 🛠️ Refactor suggestionColumn order remains nondeterministic – may break workers
Iterating over
HashMap
yields random order, so two workers can disagree on the column list, producing “cannot resolve symbol” errors at runtime. Sort the entries (or useLinkedHashMap
) before emitting handles.- for (Map.Entry<String, ClpNode> entry : root.children.entrySet()) { + root.children.entrySet().stream() + .sorted(Map.Entry.comparingByKey()) + .forEach(entry -> { String name = entry.getKey(); ClpNode child = entry.getValue(); if (child.isLeaf()) { columns.add(new ClpColumnHandle(name, child.originalName, child.type, true)); } else { Type rowType = buildRowType(child); columns.add(new ClpColumnHandle(name, child.originalName, rowType, true)); } - } + });
190-196
: 🛠️ Refactor suggestionDeterministic field order for nested
RowType
is still missing
HashMap
iteration leaks randomness into nested types as well. Apply the same sorting strategy when buildingRowType
to guarantee stable signatures across the cluster.
152-163
: 🧹 Nitpick (assertive)Style: use guideline-compliant negation & null-safe comparison
The project mandates
false == <expr>
over!<expr>
, andObjects.equals
for null safety. Lines 152 and 162 still violate this rule.- if (!polymorphicTypeEnabled) { + if (false == polymorphicTypeEnabled) { - else if (!existing.type.equals(newType)) { + else if (false == existing.type.equals(newType)) {presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java (2)
60-61
: 🧹 Nitpick (assertive)Consider reusing
SessionHolder
instance to improve performanceEach test method creates a separate
SessionHolder
instance. Creating a single instance as a field or in a@BeforeMethod
would improve test efficiency.+ private SessionHolder sessionHolder; + + @BeforeMethod + public void setup() + { + sessionHolder = new SessionHolder(); + } + @Test public void testStringMatchPushdown() { - SessionHolder sessionHolder = new SessionHolder();
128-132
: 🧹 Nitpick (assertive)Consider separating this OR test case to a dedicated test method
This test case verifies different behavior (OR expressions with unpushable components) than other tests in the method. It would be clearer as a separate test method.
+ @Test + public void testOrPushdownWithUnpushableExpression() + { + SessionHolder sessionHolder = new SessionHolder(); + + testFilter("fare > 0 OR city.Name like 'b%' OR lower(city.Region.Name) = 'hello world' OR city.Region.Id != 1", + Optional.empty(), + Optional.of("fare > 0 OR city.Name like 'b%' OR lower(city.Region.Name) = 'hello world' OR city.Region.Id != 1"), + sessionHolder); + }presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java (4)
102-103
: 🛠️ Refactor suggestionFix logging to properly include exception information
The current logging treats the exception as a format parameter, losing the stack trace information.
- log.warn("Failed to load table schema for %s: %s", schemaTableName.getTableName(), e); + log.warn(e, "Failed to load table schema for %s", schemaTableName.getTableName());
130-131
: 🛠️ Refactor suggestionFix logging to properly include exception information
The current logging treats the exception as a format parameter, losing the stack trace information.
- log.warn("Failed to load table names: %s", e); + log.warn(e, "Failed to load table names");
75-77
: 🛠️ Refactor suggestionFix negation style to conform to coding guidelines
The code uses
!dbName.isEmpty()
which violates the coding guideline that prefersfalse == <expression>
rather than!<expression>
.- if (dbName != null && !dbName.isEmpty()) { + if (null != dbName && false == dbName.isEmpty()) { connection.createStatement().execute(String.format("USE `%s`", dbName)); }
89-91
:⚠️ Potential issueMissing table identifier validation - SQL injection risk
The code directly interpolates
schemaTableName.getTableName()
into the SQL query without validation, creating a potential SQL injection vulnerability.+ String tableName = schemaTableName.getTableName(); + if (false == isValidIdentifier(tableName)) { + throw new IllegalArgumentException("Invalid table identifier: " + tableName); + } String query = String.format(SQL_SELECT_COLUMN_METADATA_TEMPLATE, - config.getMetadataTablePrefix(), schemaTableName.getTableName()); + config.getMetadataTablePrefix(), tableName);presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (2)
126-132
: 🧹 Nitpick (assertive)Add case normalization for consistent table prefix handling
The metadata table prefix validation doesn't normalize case, which could lead to inconsistencies when working with case-sensitive database backends.
+import java.util.Locale; ... if (metadataTablePrefix == null || !SAFE_SQL_IDENTIFIER.matcher(metadataTablePrefix).matches()) { throw new PrestoException(ClpErrorCode.CLP_UNSUPPORTED_CONFIG_OPTION, "Invalid metadataTablePrefix: " + metadataTablePrefix + ". Only alphanumeric characters and underscores are allowed."); } - this.metadataTablePrefix = metadataTablePrefix; + this.metadataTablePrefix = metadataTablePrefix.toLowerCase(Locale.ROOT);
33-42
:⚠️ Potential issueAdd validation for required DB connection properties
When using the MySQL metadata provider, several properties are required but not validated, which could lead to runtime failures.
import com.facebook.airlift.configuration.Config; import com.facebook.presto.spi.PrestoException; +import javax.annotation.PostConstruct; +import static com.facebook.presto.spi.StandardErrorCode.CONFIGURATION_INVALID; +import static java.util.Objects.requireNonNull; ... + @PostConstruct + public void validate() + { + if (MetadataProviderType.MYSQL.equals(metadataProviderType)) { + if (null == metadataDbUrl) { + throw new PrestoException(CONFIGURATION_INVALID, "clp.metadata-db-url must be set when using MYSQL provider"); + } + if (null == metadataDbUser) { + throw new PrestoException(CONFIGURATION_INVALID, "clp.metadata-db-user must be set when using MYSQL provider"); + } + if (null == metadataDbPassword) { + throw new PrestoException(CONFIGURATION_INVALID, "clp.metadata-db-password must be set when using MYSQL provider"); + } + } + }presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java (4)
55-56
: 🧹 Nitpick (assertive)Add class-level documentation
This base class lacks Javadoc explaining its purpose and usage, which would improve maintainability.
+/** + * Base class for CLP query tests providing shared utilities for expression parsing, + * type resolution, and session management to support testing filter pushdown and + * expression translation. + */ public class TestClpQueryBase
61-66
: 🧹 Nitpick (assertive)Make 'city' variable immutable to match other field declarations
All other column handle constants are declared
final
, butcity
is not, creating inconsistency and potential for accidental state mutation.- protected static final ClpColumnHandle city = new ClpColumnHandle("city", RowType.from(ImmutableList.of( + protected static final ClpColumnHandle city = new ClpColumnHandle("city", RowType.from(ImmutableList.of(
98-101
: 🧹 Nitpick (assertive)Reuse a single SqlParser instance to improve performance
Creating a new SqlParser for every call is inefficient. Use a shared instance to reduce object churn and improve test performance.
+ private static final SqlParser SQL_PARSER = new SqlParser(); + private static final ParsingOptions PARSING_OPTIONS = + new ParsingOptions(ParsingOptions.DecimalLiteralTreatment.AS_DECIMAL); public static Expression expression(String sql) { - return ExpressionUtils.rewriteIdentifiersToSymbolReferences(new SqlParser().createExpression(sql, new ParsingOptions(ParsingOptions.DecimalLiteralTreatment.AS_DECIMAL))); + return ExpressionUtils.rewriteIdentifiersToSymbolReferences(SQL_PARSER.createExpression(sql, PARSING_OPTIONS)); }
108-108
: 🧹 Nitpick (assertive)Use the shared SqlParser instance here as well
This method also creates a new SqlParser instance rather than reusing a shared instance.
Map<NodeRef<Expression>, Type> expressionTypes = getExpressionTypes( session, metadata, - new SqlParser(), + SQL_PARSER,presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java (6)
57-65
: 🛠️ Refactor suggestionAdd cache size limits to prevent unbounded memory growth
Both the
columnHandleCache
andtableHandleCache
have time-based expiration but no size limits. This could lead to excessive memory usage in large deployments with many tables.this.columnHandleCache = CacheBuilder.newBuilder() + .maximumSize(clpConfig.getMetadataCacheMaximumEntries()) .expireAfterWrite(clpConfig.getMetadataExpireInterval(), SECONDS) .refreshAfterWrite(clpConfig.getMetadataRefreshInterval(), SECONDS) .build(CacheLoader.from(this::loadColumnHandles)); this.tableHandleCache = CacheBuilder.newBuilder() + .maximumSize(clpConfig.getMetadataCacheMaximumEntries()) .expireAfterWrite(clpConfig.getMetadataExpireInterval(), SECONDS) .refreshAfterWrite(clpConfig.getMetadataRefreshInterval(), SECONDS) .build(CacheLoader.from(this::loadTableHandles));
69-72
: 🛠️ Refactor suggestionAdd error handling for metadata loading operations
The column handle loading method has no error handling, which could lead to unclear error messages when metadata queries fail.
private List<ClpColumnHandle> loadColumnHandles(SchemaTableName schemaTableName) { - return clpMetadataProvider.listColumnHandles(schemaTableName); + try { + return clpMetadataProvider.listColumnHandles(schemaTableName); + } + catch (Exception e) { + throw new PrestoException(ClpErrorCode.CLP_METADATA_ERROR, + "Failed to load column metadata for " + schemaTableName, e); + } }
74-77
: 🛠️ Refactor suggestionAdd error handling for table handle loading
Similar to column handle loading, table handle loading should include error handling for better diagnostics.
private List<ClpTableHandle> loadTableHandles(String schemaName) { - return clpMetadataProvider.listTableHandles(schemaName); + try { + return clpMetadataProvider.listTableHandles(schemaName); + } + catch (Exception e) { + throw new PrestoException(ClpErrorCode.CLP_METADATA_ERROR, + "Failed to load table metadata for schema " + schemaName, e); + } }
98-101
: 🧹 Nitpick (assertive)Follow project coding guideline for boolean expressions
Project guidelines prefer
false == expression
rather than!expression
.- if (!listSchemaNames(session).contains(schemaNameValue)) { + if (false == listSchemaNames(session).contains(schemaNameValue)) { return ImmutableList.of(); }
112-114
: 🧹 Nitpick (assertive)Follow project coding guideline for boolean expressions
Project guidelines prefer
false == expression
rather than!expression
.- if (!listSchemaNames(session).contains(schemaName)) { + if (false == listSchemaNames(session).contains(schemaName)) { return null; }
166-167
:⚠️ Potential issueHandle potential NPE when prefix contains only table name
SchemaTableName
's constructor will throw NullPointerException if schema name is null. This needs to be handled when creating SchemaTableName from a prefix that might only have a table name.- schemaTableNames = ImmutableList.of(new SchemaTableName(prefix.getSchemaName(), prefix.getTableName())); + String schema = Optional.ofNullable(prefix.getSchemaName()).orElse(DEFAULT_SCHEMA_NAME); + schemaTableNames = ImmutableList.of(new SchemaTableName(schema, prefix.getTableName()));
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (19)
presto-clp/pom.xml
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpRecordSetProvider.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableHandle.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java
(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java
(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java
(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java
(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java
(1 hunks)presto-docs/src/main/sphinx/connector/clp.rst
(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ` rather than `!`.
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
: - Preferfalse == <expression>
rather than!<expression>
.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpRecordSetProvider.java
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableHandle.java
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java
🧬 Code Graph Analysis (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (3)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java (1)
ClpMySqlMetadataProvider
(33-134)presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (1)
ClpMySqlSplitProvider
(32-97)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1)
ClpConfig
(21-170)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (1)
presto-spi/src/main/java/com/facebook/presto/spi/ColumnMetadata.java (1)
ColumnMetadata
(29-205)
⏰ Context from checks skipped due to timeout of 90000ms (20)
- GitHub Check: product-tests-specific-environment2 (8)
- GitHub Check: product-tests-specific-environment1 (17.0.13)
- GitHub Check: test-other-modules (8)
- GitHub Check: test (17.0.13, :presto-cassandra -P test-cassandra-integration-smoke-test)
- GitHub Check: hive-dockerized-tests (8)
- GitHub Check: hive-dockerized-tests (17.0.13)
- GitHub Check: hive-tests (17.0.13)
- GitHub Check: hive-tests (8)
- GitHub Check: singlestore-dockerized-tests (8)
- GitHub Check: singlestore-dockerized-tests (17.0.13)
- GitHub Check: spark-integration (17.0.13)
- GitHub Check: spark-integration (8)
- GitHub Check: product-tests-basic-environment (17.0.13)
- GitHub Check: product-tests-basic-environment (8)
- GitHub Check: arrowflight-java-tests (17.0.13, :presto-base-arrow-flight)
- GitHub Check: arrowflight-java-tests (8, :presto-base-arrow-flight)
- GitHub Check: test (:presto-docs)
- GitHub Check: prestocpp-linux-build-for-test
- GitHub Check: maven-checks (8)
- GitHub Check: maven-checks (17.0.13)
🔇 Additional comments (4)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpRecordSetProvider.java (1)
28-35
:UnsupportedOperationException
is a sensible interim guardNice follow-up to the earlier feedback—returning
null
would have blown up later; the explicit exception makes the limitation clear at call-site.
No further action needed unless/until physical row extraction is implemented.presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (1)
78-78
:❌ Incorrect review comment
Potential SQL injection vulnerability in query construction.
The query is constructed using string formatting with user-provided table names, which could be vulnerable to SQL injection attacks. Use prepared statements with proper parameter binding instead.
🏁 Script executed:
#!/bin/bash # Check if ClpConfig has validation for table names grep -A 10 "SAFE_SQL_IDENTIFIER" presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.javaLength of output: 960
🏁 Script executed:
#!/bin/bash # Display the beginning of ClpMySqlSplitProvider to inspect tableName origin and query template sed -n '1,200p' presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.javaLength of output: 3805
🏁 Script executed:
#!/bin/bash # Examine ClpTableHandle to see how schema and table names are handled sed -n '1,200p' presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableHandle.javaLength of output: 2717
Ignore SQL injection warning in ClpMySqlSplitProvider.java
The metadataTablePrefix is already validated against
SAFE_SQL_IDENTIFIER
(only alphanumerics and underscores),tableName
comes from Presto’s parsedSchemaTableName
(also restricted to valid identifiers), and the combined table name is wrapped in backticks. There is no injection vector in this query construction – no changes required.Likely an incorrect or invalid review comment.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java (1)
40-46
: Proper negative assertion for KQL expression - great!The code now correctly asserts that no KQL expression is present when none is expected, addressing a previous concern about hiding regressions.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1)
126-126
: 🛠️ Refactor suggestionFix negation style to conform to coding guidelines
The code uses
metadataTablePrefix == null
which violates the coding guideline that prefersnull == expression
overexpression == null
.- if (metadataTablePrefix == null || !SAFE_SQL_IDENTIFIER.matcher(metadataTablePrefix).matches()) { + if (null == metadataTablePrefix || false == SAFE_SQL_IDENTIFIER.matcher(metadataTablePrefix).matches()) {Likely an incorrect or invalid review comment.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java
Outdated
Show resolved
Hide resolved
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableHandle.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java
Show resolved
Hide resolved
1b8d043
to
30ec77f
Compare
2. rename splitPath to path in ClpSplit 3. add more properties for equals, hashCode, and toString in ClpTableLayoutHandle 4. remove public from methods in ClpMetadataProvider 5. change most fo log.error to log.warn in ClpMySqlSplitProvider and ClpMySqlMetadataProvider and fix one logging statement in ClpMySqlMetadataProvider 6. fix two minor issues in ClpMetadata 7. throw an exception in getRecordSet 8. improve TestClpOptimizer 9. refactor ClpSchemaTree and fix several bugs 10. add StorageType to ClpTableHandle 11. add archive_storage_type to ClpMySqlMetadataProvider and ClpMySqlSplitProvider 12. change listing table names to listing ClpTableHandle in ClpMetadataProvider reflect the changes in ClpMetadata 13. use constants in ClpMySqlMetadataProvider and ClpMySqlSplitProvider 14. update TestClpSplit and TestClpMetadata to include archive_storage_type 15. remove archive-source from ClpConfig and docs 16. fix legacy configuration names in clp docs 17. quote table names and column names in ClpMySqlMetadataProvider and ClpMySqlSplitProvider 18. add tablePath to ClpTableHandle and simplifies the logic in ClpMySqlSplitProvider 19. update docstring for ClpMySqlMetadataProvider
30ec77f
to
7ca1179
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 12
♻️ Duplicate comments (36)
presto-clp/pom.xml (2)
16-19
: Add<build>
section to presto-clp/pom.xmlThe CLP connector POM omits a
<build>
block. All other connector modules (e.g. presto-accumulo, presto-base-arrow-flight) declare build plugins—such as the Maven Surefire and Enforcer plugins—to ensure thepresto-plugin
packaging is handled correctly.Please add a
<build>
section immediately after the<properties>
block inpresto-clp/pom.xml
. For example:</properties> + + <build> + <plugins> + <plugin> + <groupId>org.apache.maven.plugins</groupId> + <artifactId>maven-surefire-plugin</artifactId> + <configuration> + <skipTests>true</skipTests> + </configuration> + </plugin> + <!-- include any additional plugins (enforcer, packaging, etc.) as in other connectors --> + </plugins> + </build>
22-25
: Consider specifying a version range for mysql-connector-jThe MySQL connector dependency doesn't specify a version constraint, which might lead to unexpected behavior if the version changes.
<dependency> <groupId>com.mysql</groupId> <artifactId>mysql-connector-j</artifactId> + <version>[8.0.0,)</version> <scope>runtime</scope> </dependency>
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java (1)
22-33
: 🛠️ Refactor suggestionAdd missing listSchemaNames method
The interface is well-defined, but it should also include a
listSchemaNames()
method to retrieve all available schema names. This would make the interface more complete and consistent with other connector implementations.public interface ClpMetadataProvider { /** * Returns the list of column handles for the given table. */ List<ClpColumnHandle> listColumnHandles(SchemaTableName schemaTableName); /** * Returns the list of table handles in the given schema. */ List<ClpTableHandle> listTableHandles(String schema); + + /** + * Returns the list of all schema names. + */ + List<String> listSchemaNames(); }presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (2)
40-47
: 🧹 Nitpick (assertive)Refactor provider selection logic for better scalability
The current if/else chain with immediate PrestoException will become unwieldy as more provider types are added. Consider using a map-based approach to improve maintainability:
- if (config.getMetadataProviderType() == ClpConfig.MetadataProviderType.MYSQL) { - binder.bind(ClpMetadataProvider.class).to(ClpMySqlMetadataProvider.class).in(Scopes.SINGLETON); - } - else { - throw new PrestoException(ClpErrorCode.CLP_UNSUPPORTED_METADATA_SOURCE, - "Unsupported metadata provider type: " + config.getMetadataProviderType()); - } + Map<ClpConfig.MetadataProviderType, Class<? extends ClpMetadataProvider>> metadataProviders = + ImmutableMap.of(ClpConfig.MetadataProviderType.MYSQL, ClpMySqlMetadataProvider.class); + + Class<? extends ClpMetadataProvider> metadataProviderClass = metadataProviders.get(config.getMetadataProviderType()); + if (metadataProviderClass == null) { + throw new PrestoException(ClpErrorCode.CLP_UNSUPPORTED_METADATA_SOURCE, + "Unsupported metadata provider type: " + config.getMetadataProviderType()); + } + binder.bind(ClpMetadataProvider.class).to(metadataProviderClass).in(Scopes.SINGLETON);
49-55
: 🧹 Nitpick (assertive)Refactor split provider selection logic similarly
Apply the same map-based approach to the split provider selection:
- if (config.getSplitProviderType() == ClpConfig.SplitProviderType.MYSQL) { - binder.bind(ClpSplitProvider.class).to(ClpMySqlSplitProvider.class).in(Scopes.SINGLETON); - } - else { - throw new PrestoException(ClpErrorCode.CLP_UNSUPPORTED_SPLIT_SOURCE, - "Unsupported split provider type: " + config.getSplitProviderType()); - } + Map<ClpConfig.SplitProviderType, Class<? extends ClpSplitProvider>> splitProviders = + ImmutableMap.of(ClpConfig.SplitProviderType.MYSQL, ClpMySqlSplitProvider.class); + + Class<? extends ClpSplitProvider> splitProviderClass = splitProviders.get(config.getSplitProviderType()); + if (splitProviderClass == null) { + throw new PrestoException(ClpErrorCode.CLP_UNSUPPORTED_SPLIT_SOURCE, + "Unsupported split provider type: " + config.getSplitProviderType()); + } + binder.bind(ClpSplitProvider.class).to(splitProviderClass).in(Scopes.SINGLETON);presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java (3)
33-37
: Add null-safety check to constructorThe constructor accepts a path without verifying it's non-null, which could lead to NullPointerExceptions later when the path is used.
@JsonCreator public ClpSplit(@JsonProperty("path") String path) { - this.path = path; + this.path = requireNonNull(path, "path is null"); }This would require adding the following import:
import static java.util.Objects.requireNonNull;
39-43
: Document the returned path fieldThe getter method lacks documentation explaining what this path represents in the context of a CLP split.
@JsonProperty +/** + * Returns the path to the split's data location. + * This path is used by the record set provider to read data from the corresponding location. + */ public String getPath() { return path; }
57-61
: Consider providing more specific info in getInfo()The
getInfo()
method returns the instance itself, which might not be the most informative approach if specific split information is needed.- public Object getInfo() - { - return this; - } + public Object getInfo() + { + return ImmutableMap.of("splitPath", path == null ? "NULL" : path); + }This would require adding the following import:
import com.google.common.collect.ImmutableMap;presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java (1)
57-58
: Follow project coding guidelines for boolean expressionsAccording to the coding guidelines, prefer
false == <expression>
rather than!<expression>
for boolean expressions.- if (o == null || getClass() != o.getClass()) { + if (false == (o != null) || getClass() != o.getClass()) { return false; }presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (5)
50-50
: Consider using platform-independent temp directoryThe hard-coded path
/tmp/split_testdb
might not work on all operating systems, particularly Windows.- final String metadataDbUrl = "jdbc:h2:file:/tmp/split_testdb;MODE=MySQL;DATABASE_TO_UPPER=FALSE"; + final String tempDir = System.getProperty("java.io.tmpdir"); + final String metadataDbUrl = String.format("jdbc:h2:file:%s/split_testdb;MODE=MySQL;DATABASE_TO_UPPER=FALSE", tempDir);
84-84
: Use platform-independent path constructionHard-coded path separators may cause issues on different operating systems.
- pstmt.setString(2, "/tmp/archives/" + tableName); + pstmt.setString(2, new File(System.getProperty("java.io.tmpdir"), "archives" + File.separator + tableName).getPath());
118-126
: Improve file cleanup handling in tearDown methodThe current implementation has several issues:
- Uses hard-coded paths that aren't platform-independent
- Doesn't handle exceptions during file deletion
- File deletion failures are silently ignored
- File dbFile = new File("/tmp/split_testdb.mv.db"); - File lockFile = new File("/tmp/split_testdb.trace.db"); // Optional, H2 sometimes creates this - if (dbFile.exists()) { - dbFile.delete(); - System.out.println("Deleted database file: " + dbFile.getAbsolutePath()); - } - if (lockFile.exists()) { - lockFile.delete(); - } + String tempDir = System.getProperty("java.io.tmpdir"); + File dbFile = new File(tempDir, "split_testdb.mv.db"); + File lockFile = new File(tempDir, "split_testdb.trace.db"); // Optional, H2 sometimes creates this + try { + if (dbFile.exists() && !dbFile.delete()) { + System.err.println("Failed to delete database file: " + dbFile.getAbsolutePath()); + } + else if (dbFile.exists()) { + System.out.println("Deleted database file: " + dbFile.getAbsolutePath()); + } + + if (lockFile.exists() && !lockFile.delete()) { + System.err.println("Failed to delete lock file: " + lockFile.getAbsolutePath()); + } + } + catch (Exception e) { + System.err.println("Error cleaning up database files: " + e.getMessage()); + }
136-137
: Use platform-independent path in table handle constructionHard-coded Unix-style paths might not work on all platforms.
- new ClpTableHandle(new SchemaTableName(TABLE_SCHEMA, tableName), - "/tmp/archives/" + tableName, + new ClpTableHandle(new SchemaTableName(TABLE_SCHEMA, tableName), + new File(System.getProperty("java.io.tmpdir"), "archives" + File.separator + tableName).getPath(),
142-142
: Use platform-independent path for split verificationThe expected path in the assertion uses hard-coded Unix-style separators.
- assertEquals(splits.get(i).getPath(), "/tmp/archives/" + tableName + "/id_" + i); + String expectedPath = new File(new File(new File(System.getProperty("java.io.tmpdir"), "archives"), tableName), "id_" + i).getPath(); + assertEquals(splits.get(i).getPath(), expectedPath);presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (2)
97-98
: Coding guideline violation in the equals methodThe equals method doesn't follow the coding guideline that requires using
false == (expression)
rather than the negation form. The type check should be updated to match the project's style.- if (obj == null || getClass() != obj.getClass()) { + if (false == (obj instanceof ClpColumnHandle)) { return false; }
104-104
: Simplify boolean comparisonSince
nullable
is a primitive boolean, usingObjects.equals()
is unnecessary.- Objects.equals(this.nullable, other.nullable); + this.nullable == other.nullable;presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (3)
61-68
: Still violates coding guideline & risks SQL-injection inUSE
clauseThe
if (dbName != null && !dbName.isEmpty())
check continues to (a) use!expr
instead of the mandatedfalse == expr
, and (b) concatenatesdbName
straight into an SQL statement. These exact issues were raised in earlier reviews.
45-47
: Validate identifiers & avoid string-interpolated SQL
SQL_SELECT_ARCHIVES_TEMPLATE
and the subsequentString.format(...)
embedtableName
into the query without validation. This leaves the provider open to SQL-injection via crafted table names coming from the metastore. Previous reviews already suggested:
- Verify
tableName
against a safe[A-Za-z0-9_]+
regex.- Build the dynamic table identifier only after validation (identifiers cannot be parameterised but must at least be quoted/back-ticked).
- Keep value predicates (
WHERE …
) parameterised.Also applies to: 78-79
80-90
: Logger signature still drops stack-trace
log.warn("Database error while processing splits for %s: %s", tableName, e);
passes the throwable as a formatting argument instead of the first parameter, so no stack trace is emitted. This was already pointed out in an earlier review.presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java (1)
84-92
:shutdown()
still suppresses failures
lifeCycleManager.stop()
exceptions are logged and then ignored, potentially masking resource leaks. This exact issue was raised previously; please propagate the exception (or wrap inRuntimeException
) so Presto can fail the connector cleanly.presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java (3)
60-66
: Avoid fixed/tmp
database path – use a temp directory for hermetic testsHard-coding
/tmp/metadata_testdb
causes flakiness on CI agents running tests in parallel or on systems without write access to/tmp
.
UseFiles.createTempDirectory()
(or H2’sjdbc:h2:mem:
) to obtain an isolated location and clean it up automatically.
110-118
: Drop heavyPair
dependency for simple tuplesPulling in
org.apache.commons.math3
for thePair
class bloats the test class-path.
A minimal replacement isMap.Entry<String, ClpNodeType>
or a localrecord
.
136-145
: Check return values when deleting files and avoidSystem.out
File.delete()
can silently fail andSystem.out
pollutes test logs.
Assert deletion succeeded or usejava.nio.file.Files.deleteIfExists(Path)
with proper exception handling.presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java (1)
135-148
: Column order is nondeterministic – iterate deterministically to avoid inconsistent schemas
HashMap
iteration order varies between JVM runs, so the order of columns returned bycollectColumnHandles()
is unstable.
This can cause different workers to disagree on table metadata, leading to query failures.- for (Map.Entry<String, ClpNode> entry : root.children.entrySet()) { + root.children.entrySet().stream() + .sorted(Map.Entry.comparingByKey()) // deterministic order + .forEach(entry -> { String name = entry.getKey(); ClpNode child = entry.getValue(); if (child.isLeaf()) { columns.add(new ClpColumnHandle(name, child.originalName, child.type, true)); } else { Type rowType = buildRowType(child); columns.add(new ClpColumnHandle(name, child.originalName, rowType, true)); } - } + });presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java (1)
60-61
: 🧹 Nitpick (assertive)Reuse
SessionHolder
to shorten tests & cut object churn
SessionHolder
is created afresh inside every test method, but it is completely stateless after construction.
Declaring a single instance as a field initialised in a@BeforeMethod
(or@BeforeClass
) would eliminate dozens of allocations and shave a few hundred ms off suite runtime.This is exactly the optimisation raised in an earlier review; feel free to ignore if you decided against it for explicit-is-better reasons.
Also applies to: 85-86, 100-101, 121-122, 142-143, 162-163, 182-183, 200-201
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java (3)
74-77
: 🧹 Nitpick (assertive)Follow negation guideline & guard against NPE when selecting DB
Project guideline: prefer
false == expression
over!expression
.
The current code also evaluatesisEmpty()
whendbName
can benull
.- String dbName = config.getMetadataDbName(); - if (dbName != null && !dbName.isEmpty()) { - connection.createStatement().execute(String.format("USE `%s`", dbName)); + String dbName = config.getMetadataDbName(); + if (null != dbName && false == dbName.isEmpty()) { + // Consider a prepared statement to avoid edge-case quoting issues + try (Statement useStmt = connection.createStatement()) { + useStmt.execute(String.format("USE `%s`", dbName)); + } }
102-103
: 🧹 Nitpick (assertive)Logging pattern discards stack-trace – swap argument order
Using the throwable as a format argument loses the stack-trace.
Swap parameters so the exception becomes the first argument tolog.warn/error
.- log.warn("Failed to load table schema for %s: %s", schemaTableName.getTableName(), e); + log.warn(e, "Failed to load table schema for %s", schemaTableName.getTableName()); - log.warn("Failed to load table names: %s", e); + log.warn(e, "Failed to load table names");Also applies to: 130-132
89-93
:⚠️ Potential issueValidate table name before embedding it in SQL
schemaTableName.getTableName()
is interpolated directly into theSELECT
text.
AlthoughlistTableHandles
validates identifiers,listColumnHandles
does not, so a malicious or corrupted metastore entry could lead to SQL-injection-style exploits or runtime errors.Add an explicit
isValidIdentifier()
check and fail fast:- String query = String.format(SQL_SELECT_COLUMN_METADATA_TEMPLATE, - config.getMetadataTablePrefix(), schemaTableName.getTableName()); + String table = schemaTableName.getTableName(); + if (false == isValidIdentifier(table)) { + throw new IllegalArgumentException("Invalid table identifier: " + table); + } + String query = String.format(SQL_SELECT_COLUMN_METADATA_TEMPLATE, + config.getMetadataTablePrefix(), table);presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (2)
70-116
: 🛠️ Refactor suggestionFail fast: validate mandatory MySQL connection properties
When
metadataProviderType == MYSQL
, the URL, user, and password are required.
Currently the first null dereference will surface deep inDriverManager.getConnection
, making diagnosis harder.Consider a
@PostConstruct
validator:@PostConstruct public void validate() { if (MetadataProviderType.MYSQL.equals(metadataProviderType)) { checkState(null != metadataDbUrl, "clp.metadata-db-url must be set"); checkState(null != metadataDbUser, "clp.metadata-db-user must be set"); checkState(null != metadataDbPassword, "clp.metadata-db-password must be set"); } }
124-131
: 🧹 Nitpick (assertive)Normalise table prefix for case-insensitive back-ends
Lower-casing the validated prefix avoids accidental mismatches where users mix cases.
- this.metadataTablePrefix = metadataTablePrefix; + this.metadataTablePrefix = metadataTablePrefix.toLowerCase(Locale.ROOT);presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java (2)
98-101
: Reuse the sharedSqlParser
instance instead of allocating a new one per callThis identical issue was pointed out in an earlier review. Instantiating
SqlParser
every timeexpression()
is invoked creates unnecessary object churn and slows the test-suite. A single, static, thread-safe parser (plus a reusedParsingOptions
) will suffice.
105-109
:toRowExpression
suffers from the same repeatedSqlParser
allocationFor consistency with the fix requested above (and the past review), replace the
new SqlParser()
here with the shared static instance.presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java (4)
57-65
: Caches are still unbounded – risk of un-controlled heap growthPrevious feedback recommended adding a
maximumSize(...)
(or weight) limit to bothcolumnHandleCache
andtableHandleCache
. Without it, long-running clusters that see many tables/schemas can exhaust heap.
98-100
: Preferfalse == expr
over!expr
to follow project style-guideThe conditional negations here (
!listSchemaNames(...)
) deviate from the prescribed coding guideline for*.java
files. Replace withfalse == listSchemaNames(...).contains(...)
for consistency.Also applies to: 112-113, 157-158
166-167
: PossibleNullPointerException
when schema is omitted
prefix.getSchemaName()
may benull
. Passing it directly tonew SchemaTableName(...)
will throw. Guard withOptional.ofNullable(...).orElse(DEFAULT_SCHEMA_NAME)
as suggested previously.
69-78
: Add error handling when loading metadata from provider
loadColumnHandles
andloadTableHandles
bubble up unchecked exceptions from the provider, which can crash planning threads. Wrap the calls intry-catch
and translate to aPrestoException
with a clear error code for easier diagnostics (see prior review).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (19)
presto-clp/pom.xml
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpRecordSetProvider.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableHandle.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java
(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java
(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java
(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java
(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java
(1 hunks)presto-docs/src/main/sphinx/connector/clp.rst
(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ` rather than `!`.
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
: - Preferfalse == <expression>
rather than!<expression>
.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableHandle.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpRecordSetProvider.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java
🧬 Code Graph Analysis (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (1)
ClpColumnHandle
(26-117)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableHandle.java (1)
ClpTableHandle
(25-96)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: spark-integration (8)
- GitHub Check: spark-integration (17.0.13)
🔇 Additional comments (3)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpRecordSetProvider.java (1)
25-36
: Implementation correctly throws UnsupportedOperationExceptionThe implementation correctly throws an UnsupportedOperationException with a clear message, as suggested in previous reviews. This is better than returning null, which would cause a NullPointerException.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java (1)
29-32
: Javadoc should match method signatureThe Javadoc comment mentions "table handles" which correctly matches the return type of
List<ClpTableHandle>
. This is an improvement over previous versions.presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (1)
49-49
: 🧹 Nitpick (assertive)Apply yoda condition here too
Same issue as above:
- if (config.getSplitProviderType() == ClpConfig.SplitProviderType.MYSQL) { + if (ClpConfig.SplitProviderType.MYSQL == config.getSplitProviderType()) {Likely an incorrect or invalid review comment.
binder.bind(ClpConnector.class).in(Scopes.SINGLETON); | ||
binder.bind(ClpMetadata.class).in(Scopes.SINGLETON); | ||
binder.bind(ClpPlanOptimizerProvider.class).in(Scopes.SINGLETON); | ||
binder.bind(ClpRecordSetProvider.class).in(Scopes.SINGLETON); | ||
binder.bind(ClpSplitManager.class).in(Scopes.SINGLETON); | ||
configBinder(binder).bindConfig(ClpConfig.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Consider consolidating singleton bindings for better readability
The five repetitive binder.bind(X.class).in(Scopes.SINGLETON);
lines reduce readability. Consider introducing a helper method to make the code more concise and maintainable.
- binder.bind(ClpConnector.class).in(Scopes.SINGLETON);
- binder.bind(ClpMetadata.class).in(Scopes.SINGLETON);
- binder.bind(ClpPlanOptimizerProvider.class).in(Scopes.SINGLETON);
- binder.bind(ClpRecordSetProvider.class).in(Scopes.SINGLETON);
- binder.bind(ClpSplitManager.class).in(Scopes.SINGLETON);
+ bindSingletons(binder,
+ ClpConnector.class,
+ ClpMetadata.class,
+ ClpPlanOptimizerProvider.class,
+ ClpRecordSetProvider.class,
+ ClpSplitManager.class);
You would need to add a helper method:
private void bindSingletons(Binder binder, Class<?>... types) {
for (Class<?> type : types) {
binder.bind(type).in(Scopes.SINGLETON);
}
}
configBinder(binder).bindConfig(ClpConfig.class); | ||
|
||
ClpConfig config = buildConfigObject(ClpConfig.class); | ||
if (config.getMetadataProviderType() == ClpConfig.MetadataProviderType.MYSQL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Use yoda conditions for better error prevention
Per coding guidelines, prefer false == <expression>
rather than !<expression>
pattern for conditional checks. For enum comparisons, this translates to putting the constant on the left side:
- if (config.getMetadataProviderType() == ClpConfig.MetadataProviderType.MYSQL) {
+ if (ClpConfig.MetadataProviderType.MYSQL == config.getMetadataProviderType()) {
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
if (config.getMetadataProviderType() == ClpConfig.MetadataProviderType.MYSQL) { | |
if (ClpConfig.MetadataProviderType.MYSQL == config.getMetadataProviderType()) { |
private final String path; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add @nullable annotation for the path field
The path
field is not annotated with @Nullable
, but it appears to accept null values based on the constructor implementation. This creates ambiguity about whether null values are allowed.
- private final String path;
+ @Nullable
+ private final String path;
This would require importing:
import javax.annotation.Nullable;
catch (SQLException e) { | ||
log.warn("Database error while processing splits for %s: %s", tableName, e); | ||
} | ||
|
||
return splits; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Swallowing SQLException hides failures
After catching SQLException
the method just logs and returns an (often empty) list. Callers cannot distinguish between “no splits” and “backend failure”, which can lead to silent data loss in production. Consider re-throwing a PrestoException
with an appropriate error code.
- catch (SQLException e) {
- log.warn("Database error while processing splits for %s: %s", tableName, e);
- }
+ catch (SQLException e) {
+ log.warn(e, "Database error while processing splits for %s", tableName);
+ throw new PrestoException(GENERIC_INTERNAL_ERROR,
+ format("Unable to list CLP splits for table %s: %s", tableName, e.getMessage()), e);
+ }
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
catch (SQLException e) { | |
log.warn("Database error while processing splits for %s: %s", tableName, e); | |
} | |
return splits; | |
} | |
catch (SQLException e) { | |
log.warn(e, "Database error while processing splits for %s", tableName); | |
throw new PrestoException(GENERIC_INTERNAL_ERROR, | |
format("Unable to list CLP splits for table %s: %s", tableName, e.getMessage()), e); | |
} | |
return splits; |
The CLP Connector enables SQL-based querying of CLP-S archives from Presto. This document describes how to set up the | ||
CLP Connector to run SQL queries. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Explain what “CLP-S archives” are
The doc still assumes readers know what a “CLP-S archive” is. A two-sentence primer (format, typical producers, why it matters) would improve accessibility.
{ | ||
if (!polymorphicTypeEnabled) { | ||
return baseName; | ||
} | ||
|
||
boolean conflictDetected = false; | ||
if (parent.children.containsKey(baseName)) { | ||
ClpNode existing = parent.children.get(baseName); | ||
if (existing.type == null) { | ||
conflictDetected = true; | ||
} | ||
else if (!existing.type.equals(newType)) { | ||
String existingSuffix = getTypeSuffix(existing.type); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Negation style violates project guideline
The project mandates false == <expression>
instead of !<expression>
(e.g., line 152 and 162).
Please adjust to keep the codebase stylistically uniform.
List<RowType.Field> fields = new ArrayList<>(); | ||
for (Map.Entry<String, ClpNode> entry : node.children.entrySet()) { | ||
String name = entry.getKey(); | ||
ClpNode child = entry.getValue(); | ||
Type fieldType = child.isLeaf() ? child.type : buildRowType(child); | ||
fields.add(new RowType.Field(Optional.of(name), fieldType)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Nested RowType
field order also nondeterministic
buildRowType()
iterates over a HashMap
, producing unpredictable RowType
signatures.
Sort the entries (or use LinkedHashMap
) before adding fields so workers build identical types.
private boolean isValidIdentifier(String identifier) | ||
{ | ||
return identifier != null && ClpConfig.SAFE_SQL_IDENTIFIER.matcher(identifier).matches(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Negation style in isValidIdentifier
violates coding guideline
return identifier != null && ClpConfig.SAFE_SQL_IDENTIFIER.matcher(identifier).matches();
Should read:
-return identifier != null && ClpConfig.SAFE_SQL_IDENTIFIER.matcher(identifier).matches();
+return null != identifier && true == ClpConfig.SAFE_SQL_IDENTIFIER.matcher(identifier).matches();
if (isValidIdentifier(tableName) && archiveStorageDirectory != null && | ||
!archiveStorageDirectory.isEmpty()) { | ||
tableHandles.add(new ClpTableHandle(new SchemaTableName(schemaName, tableName), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Negation guideline in emptiness check
archiveStorageDirectory != null && !archiveStorageDirectory.isEmpty()
Be consistent with the project rule:
- if (isValidIdentifier(tableName) && archiveStorageDirectory != null &&
- !archiveStorageDirectory.isEmpty()) {
+ if (true == isValidIdentifier(tableName) && null != archiveStorageDirectory &&
+ false == archiveStorageDirectory.isEmpty()) {
ConnectorTableHandle table, | ||
Constraint<ColumnHandle> constraint, | ||
Optional<Set<ColumnHandle>> desiredColumns) | ||
{ | ||
ClpTableHandle tableHandle = (ClpTableHandle) table; | ||
ConnectorTableLayout layout = new ConnectorTableLayout(new ClpTableLayoutHandle(tableHandle, Optional.empty())); | ||
return ImmutableList.of(new ConnectorTableLayoutResult(layout, constraint.getSummary())); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Missed opportunity to push predicates into the layout handle
getTableLayouts
ignores the supplied Constraint
and always builds a layout with Optional.empty()
. This forfeits filter push-down – contradicting the intent of the CLP connector to embed KQL predicates at planning time. Consider incorporating constraint.getSummary()
(or the decomposed predicate) into the ClpTableLayoutHandle
, e.g.:
- ConnectorTableLayout layout = new ConnectorTableLayout(new ClpTableLayoutHandle(tableHandle, Optional.empty()));
+ ConnectorTableLayout layout = new ConnectorTableLayout(
+ new ClpTableLayoutHandle(tableHandle, Optional.of(constraint.getSummary())));
This small change enables richer predicate push-down without altering downstream components.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
ConnectorTableHandle table, | |
Constraint<ColumnHandle> constraint, | |
Optional<Set<ColumnHandle>> desiredColumns) | |
{ | |
ClpTableHandle tableHandle = (ClpTableHandle) table; | |
ConnectorTableLayout layout = new ConnectorTableLayout(new ClpTableLayoutHandle(tableHandle, Optional.empty())); | |
return ImmutableList.of(new ConnectorTableLayoutResult(layout, constraint.getSummary())); | |
} | |
ConnectorTableHandle table, | |
Constraint<ColumnHandle> constraint, | |
Optional<Set<ColumnHandle>> desiredColumns) | |
{ | |
ClpTableHandle tableHandle = (ClpTableHandle) table; | |
- ConnectorTableLayout layout = new ConnectorTableLayout(new ClpTableLayoutHandle(tableHandle, Optional.empty())); | |
+ ConnectorTableLayout layout = new ConnectorTableLayout( | |
+ new ClpTableLayoutHandle(tableHandle, Optional.of(constraint.getSummary()))); | |
return ImmutableList.of(new ConnectorTableLayoutResult(layout, constraint.getSummary())); | |
} |
Description
This PR introduces a CLP connector. The background and proposed implementation details are outlined in the associated RFC.
This PR implements one part of phase 1 of the proposed implementation, namely the Java implementation for the coordinator. The worker implementation will leverage Velox as the default query engine, so once the Velox PR is merged, we will submit another PR to this repo to add the necessary changes to
presto-native-execution
.Like other connectors, we have created a
presto-clp
module and implemented all required connector interfaces as well as a few extras to support query push downs.The important classes in the connector are described below.
Core Classes in Java
ClpConfig
The configuration class for CLP. Currently, we support the following properties:
clp.metadata-expire-interval
: Defines the time interval after which metadata entries are considered expired and removed from the cache.clp.metadata-refresh-interval
: Specifies how frequently metadata should be refreshed from the source to ensure up-to-date information.clp.polymorphic-type-enabled
: Enables or disables support for polymorphic types within CLP. This determines whether dynamic type resolution is allowed.clp.metadata-provider-type
: Defines the type of the metadata provider. It could be a database, a file-based store, or another external system. By default, we use MySQL.clp.metadata-db-url
: The connection URL for the metadata database, used whenclp.metadata-provider-type
is configured to use a database.clp.metadata-db-name
: The name of the metadata database.clp.metadata-db-user
: The database user with access to the metadata database.clp.metadata-db-password
: The password for the metadata database user.clp.metadata-table-prefix
: A prefix applied to table names in the metadata database.clp.split-provider-type
: Defines the type of split provider for query execution. By default, we use MySQL, and the connection parameters are the same as those for the metadata database.ClpSchemaTree
A helper class for constructing a nested schema representation from CLP’s column definitions. It supports hierarchical column names (e.g.,
a.b.c
), handles name/type conflicts when theclp.polymorphic-type-enabled
option is enabled, and maps serialized CLP types to Presto types. The schema tree produces a flat list ofClpColumnHandle
instances, includingRowType
for nested structures, making it suitable for dynamic or semi-structured data formats.When polymorphic types are enabled, conflicting fields are given unique names by appending a type-specific suffix to the column name. For instance, if an integer field named "a" and a
Varstring
(CLP type) field named "a" coexist in CLP’s schema tree, they are represented asa_bigint
anda_varchar
in Presto. This approach ensures that such fields remain queryable while adhering to Presto’s constraints.ClpMetadataProvider
An interface responsible for retrieving metadata from a specified source.
We provide a default implementation called
ClpMySqlMetadataProvider
, which uses two MySQL tables. One of these is the datasets table, defined with the schema shown below. Currently, we support only a single Presto schema nameddefault
, and this metadata table stores all table names, paths, and storage types associated with that Presto schema.name
VARCHAR(255)
PRIMARY KEY
archive_storage_type
VARCHAR(4096)
NOT NULL
archive_storage_directory
VARCHAR(4096)
NOT NULL
The second MySQL table contains column metadata, defined by the schema shown below. Each Presto table is associated with a corresponding MySQL table that stores metadata about its columns.
name
VARCHAR(512)
NOT NULL
type
TINYINT
NOT NULL
name
,type
)ClpSplitProvider
In CLP, an archive is the fundamental unit for searching, and we treat each archive as a Presto Split. This allows independent parallel searches across archives. The
ClpSplitProvider
interface, shown below, defines how to retrieve split information from a specified source:We provide a default implementation called
ClpMySqlSplitProvider
. It uses an archive table to store archive IDs associated with each table. The table below shows part of the schema (some irrelevant fields are omitted).pagination_id
BIGINT
AUTO_INCREMENT PRIMARY KRY
id
VARCHAR(128)
NOT NULL
By concatenating the table path (
archive_storage_directory
) and the archive ID (id
), we can retrieve all split paths for a table.ClpMetadata
This interface enables Presto to access various metadata. All requests are delegated to
ClpMetadataProvider
For metadata management, it also maintains two caches and periodically refreshes the metadata.
columnHandleCache
: ALoadingCache<SchemaTableName, List<ClpColumnHandle>>
that maps aSchemaTableName
to its corresponding list ofClpColumnHandle
objects.tableHandleCache
: ALoadingCache<String, List<ClpTableHandle>>
that maps a schema name (String
) to a list ofClpTableHandle
objectsClpConnectorPlanOptimizer
andClpFilterToKqlConverter
Presto exposes
PlanNode
to the connector, allowing the connector to push down relevant filters to CLP for improved query performance.There are three main steps
ClpTableLayoutHandle
and constructing a newTableScanNode
FilterNode
with any remaining predicates and the newTableScanNode
ClpFilterToKqlConverter
implementsRowExpressionVisitor<ClpExpression, Void>
and handles expression transformation and pushdown. Since KQL is not SQL-compatible, only certain types of filters can be converted, including:LIKE
predicate (the"^%[^%_]*%$"
pattern is not pushed down)AND
,OR
,NOT
)IS NULL
substr(a, start, length) = 'abc'
andsubstr(a, start) = 'abc'
formsFor comparison or match expressions, one side must contain a
VariableReferenceExpression
, and the other must be aConstantExpression
--specifically a string or numeric literal. Expressions likea > c
orsubstr(a, 2) = lower(c)
are not eligible for pushdown. In simple cases without logical operators, the SQL plan can be directly translated into a KQL query. However, for expressions involving logical operators, it's critical to ensure that all conditions are handled correctly.A naive approach would be to convert only the pushdown-eligible parts of the SQL query into KQL, letting Presto or Prestissimo handle the rest. But this can lead to incorrect execution plans and unintended behavior.
Consider the following query:
Since CLP doesn’t currently support
regexp_like
, if we simply push downb = 2
to CLP, Presto will only receive results whereb = 2
, effectively changing the query semantics toregexp_like(a, '\d+b') AND b = 2
.To prevent such issues, the pushdown logic follows these rules:
OR
pushdown: AnOR
condition is pushable if and only if all child expressions can be pushed down. In this case, all child expressions are pushed down together with the OR operator.AND
pushdown: AnAND
condition is not pushable if and only if none of its child expressions are pushable. Otherwise, pushable expressions are pushed down with theAND
operator, while non-pushable expressions remain in the original plan.CallExpression
and can be transformed into KQL syntax.Example transformations:
SQL WHERE Clause:
a = 2 AND b = '3'
a: 2 AND b: "3"
FilterNode
is removed.SQL WHERE Clause:
a.b LIKE '%string another_string' OR "c.d" = TRUE
a.b: "*string another_string" OR c.d: true
FilterNode
is removed.SQL WHERE Clause:
a > 2 OR regexp_like(b, '\d+b')
"*"
a > 2 OR regexp_like(b, '\d+b')
remains in theFilterNode
.SQL WHERE Clause:
a > 2 AND regexp_like(b, '\d+b')
a > 2
regexp_like(b, '\d+b')
remains in theFilterNode
.SQL WHERE Clause:
NOT substr(a, 2, 5) = 'Hello' and b IS NULL
NOT a: "?Hello*" AND NOT b: *
FilterNode
is removed.Motivation and Context
See the associated RFC.
Impact
This module is independent from other modules and will not affect any existing functionality.
Test Plan
Unit tests are included in this PR, and we have also done end-to-end tests.
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.