-
Notifications
You must be signed in to change notification settings - Fork 0
feat: Add CLP connector #3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
presto-clp/src/test/java/com/yscope/presto/TestClpPlanOptimizer.java
Outdated
Show resolved
Hide resolved
presto-clp/src/test/java/com/yscope/presto/TestClpPlanOptimizer.java
Outdated
Show resolved
Hide resolved
presto-clp/src/test/java/com/yscope/presto/TestClpPlanOptimizer.java
Outdated
Show resolved
Hide resolved
public List<ClpColumnHandle> loadColumnHandles(SchemaTableName schemaTableName) | ||
{ | ||
List<ClpColumnHandle> columnHandles = clpMetadataProvider.listColumnHandles(schemaTableName); | ||
if (!config.isPolymorphicTypeEnabled()) { | ||
return columnHandles; | ||
} | ||
return handlePolymorphicType(columnHandles); | ||
} | ||
|
||
public List<String> loadTableNames(String schemaName) | ||
{ | ||
return clpMetadataProvider.listTableNames(schemaName); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
presto-clp/src/main/java/com/yscope/presto/ClpFilterToKqlConverter.java
Outdated
Show resolved
Hide resolved
presto-clp/src/test/java/com/yscope/presto/TestClpMetadata.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (11)
presto-clp/src/main/java/com/yscope/presto/ClpExpression.java (1)
46-46
: Minor spacing issue in constructor call.There's an extra space between
this
and the opening parenthesis.- this (Optional.empty(), Optional.empty()); + this(Optional.empty(), Optional.empty());presto-clp/src/main/java/com/yscope/presto/ClpSplitManager.java (3)
26-28
: Add Javadoc documentation to the classAdding class-level documentation would improve code maintainability by clarifying the purpose and usage of this connector component.
+/** + * Split manager for the CLP connector. + * Provides the implementation for retrieving the splits for CLP tables. + */ public class ClpSplitManager implements ConnectorSplitManager {
29-29
: Make the field declaration finalSince the field is only assigned in the constructor and not modified elsewhere, marking it as final would make the immutability explicit and improve code safety.
- private final ClpSplitProvider clpSplitProvider; + private final ClpSplitProvider clpSplitProvider;
44-44
: Add error handling for listSplits operationThe call to clpSplitProvider.listSplits() lacks error handling. Consider wrapping this call in a try-catch block to handle potential exceptions from the split provider and provide more informative error messages.
- return new FixedSplitSource(clpSplitProvider.listSplits(layoutHandle)); + try { + return new FixedSplitSource(clpSplitProvider.listSplits(layoutHandle)); + } + catch (RuntimeException e) { + throw new RuntimeException("Failed to get splits for " + layoutHandle.getTable().getSchemaTableName(), e); + }presto-clp/src/main/java/com/yscope/presto/ClpPlanOptimizer.java (4)
51-58
: Consider using thevariableAllocator
parameter or remove it if unnecessary.The
optimize
method signature requires aVariableAllocator
parameter, yet it is not used within the method. If it is not needed, consider removing it (if permissible by the interface) or explicitly document why it is unused.
73-73
: Usefalse == ...
instead of negation.According to the coding guidelines, prefer checking
false == (node.getSource() instanceof TableScanNode)
rather than using!(node.getSource() instanceof TableScanNode)
.Apply this diff to line 73:
-if (!(node.getSource() instanceof TableScanNode)) { +if (false == (node.getSource() instanceof TableScanNode)) {
86-86
: Replace negation withfalse == ...
.In keeping with the guidelines, prefer
false == kqlQuery.isPresent()
to!kqlQuery.isPresent()
.Apply this diff to line 86:
-if (!kqlQuery.isPresent()) { +if (false == kqlQuery.isPresent()) {
105-105
: Usefalse == remainingPredicate.isPresent()
.Same reasoning applies here. Change
!remainingPredicate.isPresent()
tofalse == remainingPredicate.isPresent()
.Apply this diff to line 105:
-if (!remainingPredicate.isPresent()) { +if (false == remainingPredicate.isPresent()) {presto-clp/src/main/java/com/yscope/presto/ClpConfig.java (1)
133-133
: Follow coding guidelines for negation checks.Instead of using
!SAFE_SQL_IDENTIFIER.matcher(metadataTablePrefix).matches()
, consider usingfalse == SAFE_SQL_IDENTIFIER.matcher(metadataTablePrefix).matches()
to align with the specified coding standard.- if (metadataTablePrefix == null || !SAFE_SQL_IDENTIFIER.matcher(metadataTablePrefix).matches()) { + if (metadataTablePrefix == null || false == SAFE_SQL_IDENTIFIER.matcher(metadataTablePrefix).matches()) {presto-clp/src/main/java/com/yscope/presto/ClpFilterToKqlConverter.java (2)
94-94
: Adhere to coding guidelines for boolean negation.
Within these lines, local style guidelines recommend usingfalse == <expression>
instead of!<expression>
. Consider updating for consistency.- if (expression.getRemainingExpression().isPresent() || !expression.getDefinition().isPresent()) { + if (expression.getRemainingExpression().isPresent() || false == expression.getDefinition().isPresent()) { - else if (!remainingExpressions.isEmpty()) { + else if (false == remainingExpressions.isEmpty()) { - if (expression.getRemainingExpression().isPresent() || !expression.getDefinition().isPresent()) { + if (expression.getRemainingExpression().isPresent() || false == expression.getDefinition().isPresent()) {Also applies to: 120-120, 144-144
100-135
: Refactor repetitive string-building in handleAnd/handleOr.
Both methods perform very similar string operations and then remove trailing operators. Refactoring to a shared helper method would improve readability and remove code duplication.- // Repetitive building, appending " AND " or " OR ", then slicing off the last operator + // Possible shared utility method e.g., buildCompoundExpression(List<ClpExpression> expressions, String operator)Also applies to: 137-152
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
presto-clp/src/main/java/com/yscope/presto/ClpConfig.java
(1 hunks)presto-clp/src/main/java/com/yscope/presto/ClpExpression.java
(1 hunks)presto-clp/src/main/java/com/yscope/presto/ClpFilterToKqlConverter.java
(1 hunks)presto-clp/src/main/java/com/yscope/presto/ClpMetadata.java
(1 hunks)presto-clp/src/main/java/com/yscope/presto/ClpModule.java
(1 hunks)presto-clp/src/main/java/com/yscope/presto/ClpPlanOptimizer.java
(1 hunks)presto-clp/src/main/java/com/yscope/presto/ClpSplitManager.java
(1 hunks)presto-clp/src/main/java/com/yscope/presto/metadata/ClpSchemaTree.java
(1 hunks)presto-clp/src/test/java/com/yscope/presto/TestClpMetadata.java
(1 hunks)presto-clp/src/test/java/com/yscope/presto/TestClpPlanOptimizer.java
(1 hunks)presto-clp/src/test/java/com/yscope/presto/TestClpSplit.java
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
- presto-clp/src/test/java/com/yscope/presto/TestClpSplit.java
- presto-clp/src/test/java/com/yscope/presto/TestClpMetadata.java
- presto-clp/src/test/java/com/yscope/presto/TestClpPlanOptimizer.java
- presto-clp/src/main/java/com/yscope/presto/metadata/ClpSchemaTree.java
- presto-clp/src/main/java/com/yscope/presto/ClpMetadata.java
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ` rather than `!`.
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
: - Preferfalse == <expression>
rather than!<expression>
.
presto-clp/src/main/java/com/yscope/presto/ClpSplitManager.java
presto-clp/src/main/java/com/yscope/presto/ClpExpression.java
presto-clp/src/main/java/com/yscope/presto/ClpFilterToKqlConverter.java
presto-clp/src/main/java/com/yscope/presto/ClpPlanOptimizer.java
presto-clp/src/main/java/com/yscope/presto/ClpConfig.java
presto-clp/src/main/java/com/yscope/presto/ClpModule.java
🧬 Code Definitions (2)
presto-clp/src/main/java/com/yscope/presto/ClpPlanOptimizer.java (1)
presto-spi/src/main/java/com/facebook/presto/spi/VariableAllocator.java (1)
VariableAllocator
(33-167)
presto-clp/src/main/java/com/yscope/presto/ClpModule.java (3)
presto-clp/src/main/java/com/yscope/presto/metadata/ClpMySQLMetadataProvider.java (1)
ClpMySQLMetadataProvider
(30-112)presto-clp/src/main/java/com/yscope/presto/split/ClpMySQLSplitProvider.java (1)
ClpMySQLSplitProvider
(31-108)presto-clp/src/main/java/com/yscope/presto/ClpConfig.java (1)
ClpConfig
(21-189)
⏰ Context from checks skipped due to timeout of 90000ms (20)
- GitHub Check: product-tests-basic-environment (8)
- GitHub Check: kudu (8)
- GitHub Check: spark-integration (17.0.13)
- GitHub Check: spark-integration (8)
- GitHub Check: singlestore-dockerized-tests (8)
- GitHub Check: singlestore-dockerized-tests (17.0.13)
- GitHub Check: hive-tests (17.0.13)
- GitHub Check: hive-tests (8)
- GitHub Check: hive-dockerized-tests (8)
- GitHub Check: hive-dockerized-tests (17.0.13)
- GitHub Check: product-tests-specific-environment2 (17.0.13)
- GitHub Check: product-tests-specific-environment2 (8)
- GitHub Check: product-tests-specific-environment1 (17.0.13)
- GitHub Check: product-tests-specific-environment1 (8)
- GitHub Check: test-other-modules (17.0.13)
- GitHub Check: test-other-modules (8)
- GitHub Check: test (:presto-base-arrow-flight)
- GitHub Check: test (:presto-docs)
- GitHub Check: maven-checks (17.0.13)
- GitHub Check: maven-checks (8)
🔇 Additional comments (18)
presto-clp/src/main/java/com/yscope/presto/ClpExpression.java (3)
31-35
: Good documentation for class variables.The comments for
definition
andremainingExpression
provide clear explanations of what these fields represent, addressing the previous review comment about adding introductory comments for these variables.
20-28
: Well-documented class purpose.The Javadoc clearly explains the purpose of the class and the three different cases it handles when converting Presto RowExpression to KQL, which makes the code more maintainable and understandable.
43-47
: Good use of convenience constructors.The additional constructors provide convenient ways to create ClpExpression objects for each of the three cases described in the class documentation, making the code more readable where this class is used.
Also applies to: 49-53, 55-59
presto-clp/src/main/java/com/yscope/presto/ClpSplitManager.java (2)
16-22
: Missing import for SplitSchedulingContextThe SplitSchedulingContext class is used as a parameter type in the getSplits method (line 41), but there's no import statement for it. This will cause a compilation error.
Add the following import:
import com.facebook.presto.spi.connector.ConnectorSplitManager; import com.facebook.presto.spi.connector.ConnectorTransactionHandle; +import com.facebook.presto.spi.connector.SplitSchedulingContext;
37-45
: Add type safety check before castingThe layout parameter is cast to ClpTableLayoutHandle without verifying its type first. This could lead to ClassCastException if an incorrect handle type is passed.
- ClpTableLayoutHandle layoutHandle = (ClpTableLayoutHandle) layout; + if (!(layout instanceof ClpTableLayoutHandle)) { + throw new IllegalArgumentException("Expected ClpTableLayoutHandle but got " + layout.getClass().getName()); + } + ClpTableLayoutHandle layoutHandle = (ClpTableLayoutHandle) layout;presto-clp/src/main/java/com/yscope/presto/ClpModule.java (6)
1-13
: Licence header looks correct
No concerns with the Apache 2.0 licence block.
14-26
: Imports are consistent
The imported classes and packages appear to align with the Presto framework and your use-cases.
27-29
: Class definition is well-structured
TheClpModule
class correctly extendsAbstractConfigurationAwareModule
.
30-36
: Configuration binding is straightforward
The binder usage and config binding withClpConfig
is aligned with Presto’s typical patterns.
38-45
: Conditional binding for metadata provider
This logic cleanly handles the single supported provider type (MySQL) and raises an exception if unsupported. Future expansions could be handled via a switch statement if more providers are introduced.
47-53
: Conditional binding for split provider
Similar to the metadata provider logic, this is well-handled. The explicit exception for unsupported providers ensures clear error reporting.presto-clp/src/main/java/com/yscope/presto/ClpPlanOptimizer.java (2)
89-89
: Be mindful of potential sensitive data in logs.Logging the raw KQL query might present a privacy or security concern if user-supplied values end up in logs. Confirm that these queries do not contain user PII or confidential information.
60-114
: Well-structured rewrite logic.The Rewriter class cleanly handles table scans and updates the plan accordingly. The approach of returning the original node if conditions do not match ensures minimal disruption.
presto-clp/src/main/java/com/yscope/presto/ClpConfig.java (1)
23-27
: Confirm usage ofArchiveSource
andgetInputSource()
.It appears this functionality may not be used within this file. A previous reviewer recommended verifying whether we genuinely require
ArchiveSource
andgetInputSource()
or if they should be consolidated, since each archive may be treated as a split.Also applies to: 166-176
presto-clp/src/main/java/com/yscope/presto/ClpFilterToKqlConverter.java (4)
52-54
: Looks good – well-structured use of RowExpressionVisitor.
The class implementation and interface usage are clear and provide a cohesive structure for converting filter expressions to KQL.
80-83
: Reintroduce null checks on the column handle.
Castingassignments.get(variable)
toClpColumnHandle
without verifying it can lead toNullPointerException
orClassCastException
. A prior review flagged this issue as well, so kindly reintroduce or confirm the null checks to ensure safety.
254-293
: Validate edge cases for the LIKE operator conversion.
The current path translates%
to*
and_
to?
, yet it may not handle multiple wildcards or boundary cases (e.g. leading/trailing%
). Ensure end-to-end test coverage for patterns such as'%abc%'
,'_abc'
, and'abc%'
.
342-368
: Verify number format handling in parseIntValue.
The code attempts to parse integers and silently ignoresNumberFormatException
. Confirm that silently ignoring the exception is desired behaviour, and ensure test coverage for erroneous or out-of-bounds values (e.g. large integers).
StringBuilder queryBuilder = new StringBuilder(); | ||
queryBuilder.append("("); | ||
for (RowExpression argument : node.getArguments().subList(1, node.getArguments().size())) { | ||
if (!(argument instanceof ConstantExpression)) { | ||
return new ClpExpression(node); | ||
} | ||
ConstantExpression literal = (ConstantExpression) argument; | ||
String literalString = getLiteralString(literal); | ||
queryBuilder.append(variableName).append(": "); | ||
if (literal.getType() instanceof VarcharType) { | ||
queryBuilder.append("\"").append(literalString).append("\""); | ||
} | ||
else { | ||
queryBuilder.append(literalString); | ||
} | ||
queryBuilder.append(" OR "); | ||
} | ||
// Remove the last " OR " from the query | ||
return new ClpExpression(queryBuilder.substring(0, queryBuilder.length() - 4) + ")"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ensure secure handling of user input when constructing KQL queries.
Using raw string concatenation for "variableName": "literalString"
can expose the system to injection or parsing issues if the literal contains KQL metacharacters. Consider adding proper escaping or validation to avoid malicious or malformed inputs.
presto-clp/src/test/java/com/yscope/presto/TestClpMetadata.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/yscope/presto/ClpFilterToKqlConverter.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/yscope/presto/ClpFilterToKqlConverter.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/yscope/presto/ClpFilterToKqlConverter.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/yscope/presto/ClpFilterToKqlConverter.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/yscope/presto/ClpFilterToKqlConverter.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
presto-clp/src/main/java/com/yscope/presto/metadata/ClpSchemaTree.java (3)
37-48
: Add a constructor forClpNode
to enforce clearer node usage.
It might be beneficial to provide a constructor (for example, acceptingtype
andoriginalName
) and explicitly prevent children from being populated in leaf nodes. This can help avert confusion regarding leaf vs. non-leaf usage.
58-77
: Consider supporting more diverse array element types.
Currently,UnstructuredArray
is mapped tonew ArrayType(VarcharType.VARCHAR)
. If there is a likelihood of arrays containing numeric or boolean elements, you could introduce logic to differentiate the inner types.
105-108
: Adhere to the code style preference for boolean expressions.
Per the coding guidelines, preferfalse == existing.type.equals(prestoType)
rather than using!existing.type.equals(prestoType)
.- if (existing.type != null && !existing.type.equals(prestoType)) { + if (existing.type != null && false == existing.type.equals(prestoType)) {
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
presto-clp/src/main/java/com/yscope/presto/metadata/ClpSchemaTree.java
(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ` rather than `!`.
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
: - Preferfalse == <expression>
rather than!<expression>
.
presto-clp/src/main/java/com/yscope/presto/metadata/ClpSchemaTree.java
⏰ Context from checks skipped due to timeout of 90000ms (12)
- GitHub Check: product-tests-basic-environment (8)
- GitHub Check: product-tests-specific-environment2 (17.0.13)
- GitHub Check: singlestore-dockerized-tests (8)
- GitHub Check: singlestore-dockerized-tests (17.0.13)
- GitHub Check: hive-dockerized-tests (17.0.13)
- GitHub Check: spark-integration (8)
- GitHub Check: kudu (17.0.13)
- GitHub Check: kudu (8)
- GitHub Check: test (:presto-docs)
- GitHub Check: test (:presto-base-arrow-flight)
- GitHub Check: maven-checks (8)
- GitHub Check: maven-checks (17.0.13)
🔇 Additional comments (1)
presto-clp/src/main/java/com/yscope/presto/metadata/ClpSchemaTree.java (1)
159-169
: Add a brief method-level comment explaining the DFS approach.
This method implements a depth-first traversal to constructRowType
from nested child nodes. Documenting this will improve clarity.
presto-clp/src/main/java/com/yscope/presto/ClpFilterToKqlConverter.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/yscope/presto/ClpFilterToKqlConverter.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/yscope/presto/ClpFilterToKqlConverter.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/yscope/presto/ClpFilterToKqlConverter.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/yscope/presto/split/ClpMySQLSplitProvider.java
Outdated
Show resolved
Hide resolved
presto-clp/src/main/java/com/yscope/presto/ClpColumnHandle.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (37)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpSplitProvider.java (1)
21-24
: Consider adding Javadoc documentation.The interface and its method lack Javadoc documentation. Adding documentation would improve clarity about the purpose of this interface and the expected behavior of the
listSplits
method, especially regarding any potential side effects or exceptions that implementations might throw.public interface ClpSplitProvider { + /** + * Lists the splits for a given table layout handle. + * + * @param clpTableLayoutHandle the table layout handle + * @return a list of splits + */ List<ClpSplit> listSplits(ClpTableLayoutHandle clpTableLayoutHandle); }presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java (2)
43-47
: Fix spacing in constructor delegationThere's an extra space between "this" and the opening parenthesis on line 46.
- this (Optional.empty(), Optional.empty()); + this(Optional.empty(), Optional.empty());
61-69
: Consider adding additional utility methodsThe getter methods are appropriately implemented. Consider adding helper methods to improve usability:
+ /** + * Returns true if the entire expression was converted to KQL. + */ + public boolean isFullyTranslatable() + { + return definition.isPresent() && !remainingExpression.isPresent(); + } + + /** + * Returns true if part of the expression was converted to KQL. + */ + public boolean isPartiallyTranslatable() + { + return definition.isPresent() && remainingExpression.isPresent(); + } + + /** + * Returns true if none of the expression could be converted to KQL. + */ + public boolean isNonTranslatable() + { + return !definition.isPresent() && remainingExpression.isPresent(); + } + + @Override + public String toString() + { + return "ClpExpression{" + + "definition=" + definition + + ", remainingExpression=" + remainingExpression + + '}'; + } + + @Override + public boolean equals(Object o) + { + if (this == o) { + return true; + } + if (o == null || getClass() != o.getClass()) { + return false; + } + ClpExpression that = (ClpExpression) o; + return Objects.equals(definition, that.definition) && + Objects.equals(remainingExpression, that.remainingExpression); + } + + @Override + public int hashCode() + { + return Objects.hash(definition, remainingExpression); + }Don't forget to add the import for java.util.Objects if you implement these methods.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizerProvider.java (3)
26-28
: Enhance with class documentationConsider adding Javadoc to the class describing its purpose and responsibilities within the CLP connector ecosystem. This would help future maintainers understand how this class fits into the overall architecture.
+/** + * Provides plan optimizers for the CLP connector. + * This class is responsible for creating and providing logical and physical + * plan optimizers that are specific to CLP query execution. + */ public class ClpPlanOptimizerProvider implements ConnectorPlanOptimizerProvider {
40-44
: Add method documentationConsider adding Javadoc to document that this method intentionally returns an empty set of logical plan optimizers.
+/** + * Returns an empty set since no logical plan optimizations are needed for the CLP connector. + */ @Override public Set<ConnectorPlanOptimizer> getLogicalPlanOptimizers() { return ImmutableSet.of(); }
46-50
: Add method documentationConsider adding Javadoc to describe the purpose of the physical plan optimizers being provided.
+/** + * Returns a set of physical plan optimizers for the CLP connector. + * Currently provides a ClpPlanOptimizer that handles operations + * such as filter pushdown or conversion to KQL expressions. + */ @Override public Set<ConnectorPlanOptimizer> getPhysicalPlanOptimizers() { return ImmutableSet.of(new ClpPlanOptimizer(functionManager, functionResolution)); }presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java (5)
41-42
: Potential null return value not properly documented in getter methodThe
archivePath
parameter is marked as@Nullable
in the constructor, but the corresponding getter methodgetArchivePath()
is missing the@Nullable
annotation. This inconsistency could lead to NullPointerExceptions.@JsonProperty +@Nullable public String getArchivePath() { return archivePath; }
39-47
: Add validation to constructor parametersThe constructor accepts nullable parameters but doesn't perform any validation. Consider adding validation checks for the parameters to ensure they meet your requirements, especially if certain formats are expected for
archivePath
.@JsonCreator public ClpSplit(@JsonProperty("schemaTableName") @Nullable SchemaTableName schemaTableName, @JsonProperty("archivePath") @Nullable String archivePath, @JsonProperty("query") Optional<String> query) { + // Optional validation logic here + // Example: if (archivePath != null && !isValidArchivePath(archivePath)) { + // throw new IllegalArgumentException("Invalid archive path format"); + // } this.schemaTableName = schemaTableName; this.archivePath = archivePath; this.query = query; }
32-85
: Add equals, hashCode, and toString methods for better object handlingThe class lacks
equals()
,hashCode()
, andtoString()
methods, which are important for properly handling instances in collections and for debugging purposes.Consider adding these methods to improve usability:
@Override public boolean equals(Object o) { if (this == o) { return true; } if (o == null || getClass() != o.getClass()) { return false; } ClpSplit that = (ClpSplit) o; return Objects.equals(schemaTableName, that.schemaTableName) && Objects.equals(archivePath, that.archivePath) && Objects.equals(query, that.query); } @Override public int hashCode() { return Objects.hash(schemaTableName, archivePath, query); } @Override public String toString() { return "ClpSplit{" + "schemaTableName=" + schemaTableName + ", archivePath='" + archivePath + '\'' + ", query=" + query + '}'; }Remember to add
import java.util.Objects;
to your imports.
32-34
: Add Javadoc comments for better code documentationThe class lacks Javadoc comments that would help users understand its purpose and usage. Consider adding class-level and method-level documentation.
+/** + * Represents a split in the CLP connector. + * A split corresponds to an archive path that can be processed by a Presto worker. + */ public class ClpSplit implements ConnectorSplit {
80-84
: Consider adding type safety to getInfo() return valueWhile returning
this
fromgetInfo()
is technically correct, it might be clearer to explicitly document that this method returns theClpSplit
instance itself by adding a comment explaining this behavior.@Override public Object getInfo() { + // Return this instance to provide information about the split return this; }
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplitManager.java (2)
43-43
: Add type safety check before casting.Direct casting without type checking can lead to ClassCastException at runtime. Consider adding an instanceof check before casting.
- ClpTableLayoutHandle layoutHandle = (ClpTableLayoutHandle) layout; + if (!(layout instanceof ClpTableLayoutHandle)) { + throw new IllegalArgumentException("Expected ClpTableLayoutHandle but got " + layout.getClass().getName()); + } + ClpTableLayoutHandle layoutHandle = (ClpTableLayoutHandle) layout;
44-44
: Consider adding exception handling.The
listSplits
method might throw exceptions that aren't being caught or handled here. This could potentially cause runtime failures without proper error context.- return new FixedSplitSource(clpSplitProvider.listSplits(layoutHandle)); + try { + return new FixedSplitSource(clpSplitProvider.listSplits(layoutHandle)); + } + catch (Exception e) { + throw new RuntimeException("Failed to get splits for " + layoutHandle, e); + }pom.xml (1)
215-215
: Module Addition: New presto-clp Module Registered
The new<module>presto-clp</module>
is added as expected in the<modules>
list. This aligns with the PR objectives to introduce the CLP connector module. Please verify that its ordering is consistent with project conventions and that any associated build or integration configurations (such as plugin registration) are updated accordingly.presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java (1)
21-26
: Interface methods are implicitly publicThe
public
modifier on interface methods is redundant since methods declared in an interface are implicitly public. Consider removing thepublic
modifier for both methods to follow Java best practices.public interface ClpMetadataProvider { - public List<ClpColumnHandle> listColumnHandles(SchemaTableName schemaTableName); + List<ClpColumnHandle> listColumnHandles(SchemaTableName schemaTableName); - public List<String> listTableNames(String schema); + List<String> listTableNames(String schema); }presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableHandle.java (1)
48-57
: Consider using Objects.equals for null safetyIn the
equals
method, you're directly callingequals
onschemaTableName
. While the class check protects againstnull
forobj
, it's generally safer to useObjects.equals()
for comparing object fields to handle potential null values.@Override public boolean equals(Object obj) { if (this == obj) { return true; } if (obj == null || getClass() != obj.getClass()) { return false; } ClpTableHandle other = (ClpTableHandle) obj; - return this.schemaTableName.equals(other.schemaTableName); + return Objects.equals(this.schemaTableName, other.schemaTableName); }presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java (1)
94-94
: Adhere to coding guideline for negation.Your code uses
!expression
in these sections, but the coding guidelines for this repository specify usingfalse == expression
instead of!expression
. Please consider updating for consistency.Also applies to: 145-145, 190-190, 244-244, 262-262, 276-276
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (2)
35-45
: Consider validating constructor arguments for null values.It's a good practice to enforce non-null arguments for critical fields like columnName and columnType to help prevent unexpected runtime errors.
Below is a possible change to add argument checks:
public ClpColumnHandle( @JsonProperty("columnName") String columnName, @JsonProperty("originalColumnName") String originalColumnName, @JsonProperty("columnType") Type columnType, @JsonProperty("nullable") boolean nullable) { - this.columnName = columnName; - this.originalColumnName = originalColumnName; - this.columnType = columnType; + this.columnName = java.util.Objects.requireNonNull(columnName, "columnName is null"); + this.originalColumnName = java.util.Objects.requireNonNull(originalColumnName, "originalColumnName is null"); + this.columnType = java.util.Objects.requireNonNull(columnType, "columnType is null"); this.nullable = nullable; }
47-50
: Validate constructor arguments in the convenience constructor as well.For consistency with the primary constructor, consider adding non-null checks.
public ClpColumnHandle(String columnName, Type columnType, boolean nullable) { - this(columnName, columnName, columnType, nullable); + this( + java.util.Objects.requireNonNull(columnName, "columnName is null"), + columnName, + java.util.Objects.requireNonNull(columnType, "columnType is null"), + nullable + ); }presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java (4)
73-73
: Use 'false == expression' instead of '!expression' per coding guidelines.Replace
!(node.getSource() instanceof TableScanNode)
withfalse == (node.getSource() instanceof TableScanNode)
.-if (!(node.getSource() instanceof TableScanNode)) { +if (false == (node.getSource() instanceof TableScanNode)) { return node; }
86-86
: Use 'false == expression' instead of '!expression' per coding guidelines.Replace
if (!kqlQuery.isPresent()) { ... }
withif (false == kqlQuery.isPresent()) { ... }
.-if (!kqlQuery.isPresent()) { +if (false == kqlQuery.isPresent()) { return node; }
105-105
: Use 'false == expression' instead of '!expression' per coding guidelines.Replace
if (!remainingPredicate.isPresent()) { ... }
withif (false == remainingPredicate.isPresent()) { ... }
.-if (!remainingPredicate.isPresent()) { +if (false == remainingPredicate.isPresent()) { return newTableScanNode; }
89-89
: Potentially sanitize logged content.Printing the entire KQL query to the logs may reveal sensitive application data. Consider masking or sanitizing the query if it might contain private user information.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (1)
38-54
: Conditional provider binding is robust.
Handling unsupported provider types by throwing aPrestoException
is appropriate here. However, if you foresee adding more providers in the future, consider using a pluggable registry-like approach to avoid frequent modifications in this module.presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java (2)
57-74
: Database setup is well-organized.
Creating H2 tables in the setup routine is a good approach for test isolation. Consider using an in-memory database for faster tests, unless file-based persistence is required for your test scenario.
126-202
: Integration tests for table metadata are comprehensive.
You’re validating the returned columns, types, and nullability. Consider adding a negative test case that verifies behaviour when the table does not exist or the metadata is partially missing.presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (1)
126-139
: Split listing test coverage.
Validating the number of splits and ensuring archive paths align with expectations is excellent. Consider an additional test verifying the scenario where no archives exist for a table.presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java (1)
171-174
: Consider extending tests for deeper edge cases
Nested negations (line 171) and more intricate filtering expressions are well covered, but you might consider additional scenarios such as handling empty strings, overly long substrings, or special characters in string expressions for added robustness.presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java (1)
63-66
: Clarify read-only transaction handling
All transactions default to the same handle (ClpTransactionHandle.INSTANCE
), ignoring thereadOnly
flag. If different behaviour is eventually needed for read-write vs. read-only, consider adding appropriate logic.presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (1)
59-60
: Apply style guideline for negation
Per the coding guidelines, prefer usingfalse == expression
rather than!expression
. You can rewrite:-if (dbName != null && !dbName.isEmpty()) { +if (dbName != null && false == dbName.isEmpty()) {presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java (2)
42-52
: Consider updating to the current MySQL driver class name.Using
Class.forName("com.mysql.jdbc.Driver")
may lead to warnings in modern MySQL driver versions. It is often recommended to use"com.mysql.cj.jdbc.Driver"
instead for newer MySQL distributions.
69-86
: Minor concern about the error log statement.At line 83, concatenating log messages (
"Failed to load table schema for %s: %s" + schemaTableName.getTableName()
) could lead to confusing output. Consider using a standard logger format with placeholders instead, for example:log.error("Failed to load table schema for {}: {}", schemaTableName.getTableName(), e);
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnectorFactory.java (1)
48-69
: Initial bootstrap logic looks correct.Dependency injection with the
Bootstrap
approach should integrate well. Make sure to log or track errors for easier debugging if the plugin fails to initialize.presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java (1)
79-133
: Align with the code style guideline by avoiding negation with “!”.At line 108, consider using
false == existing.type.equals(prestoType)
rather than!existing.type.equals(prestoType)
, in keeping with the stated preferences for Java code. This is only a minor style point and not functionally problematic.Otherwise, polymorphic conflict resolution logic is well-designed.
Suffixing with the type display name resolves naming collisions neatly, especially when polymorphic types are enabled.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java (2)
99-101
: Adhere to coding guidelines for negation checks.
In Java code, prefer usingfalse == <expression>
instead of!<expression>
. Consider the following changes for clarity:- if (!listSchemaNames(session).contains(schemaNameValue)) { + if (false == listSchemaNames(session).contains(schemaNameValue)) {- if (!listSchemaNames(session).contains(schemaName)) { + if (false == listSchemaNames(session).contains(schemaName)) {- if (!listTables(schemaName).contains(tableName.getTableName())) { + if (false == listTables(schemaName).contains(tableName.getTableName())) {Also applies to: 112-114, 116-118
158-160
: Apply negation style rule for schema comparison.
The condition uses!schemaName.equals(...)
. As per the Java coding guidelines, prefer:- if (schemaName != null && !schemaName.equals(DEFAULT_SCHEMA_NAME)) { + if (schemaName != null && false == schemaName.equals(DEFAULT_SCHEMA_NAME)) {presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1)
133-135
: Negation style improvement for metadata table prefix check.
Usefalse == SAFE_SQL_IDENTIFIER.matcher(...)
over!SAFE_SQL_IDENTIFIER.matcher(...)
for consistency:- if (metadataTablePrefix == null || !SAFE_SQL_IDENTIFIER.matcher(metadataTablePrefix).matches()) { + if (metadataTablePrefix == null || false == SAFE_SQL_IDENTIFIER.matcher(metadataTablePrefix).matches()) {
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (31)
pom.xml
(2 hunks)presto-clp/pom.xml
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnectorFactory.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpErrorCode.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpHandleResolver.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizerProvider.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlugin.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplitManager.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableHandle.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTransactionHandle.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpNodeType.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpSplitProvider.java
(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java
(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java
(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java
(1 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java
(1 hunks)presto-server/src/main/provisio/presto.xml
(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTransactionHandle.java
🚧 Files skipped from review as they are similar to previous changes (2)
- presto-server/src/main/provisio/presto.xml
- presto-clp/pom.xml
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ` rather than `!`.
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
: - Preferfalse == <expression>
rather than!<expression>
.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlugin.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpSplitProvider.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplitManager.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizerProvider.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableHandle.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpErrorCode.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpHandleResolver.java
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnectorFactory.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpNodeType.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java
🧬 Code Definitions (8)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpSplitProvider.java (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java (1)
ClpSplit
(32-85)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java (1)
ClpTableLayoutHandle
(23-73)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (1)
ClpColumnHandle
(26-114)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (3)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java (1)
ClpMySqlMetadataProvider
(30-112)presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (1)
ClpMySqlSplitProvider
(31-108)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1)
ClpConfig
(21-189)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (3)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1)
ClpConfig
(21-189)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java (1)
ClpSplit
(32-85)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java (1)
ClpTableLayoutHandle
(23-73)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java (2)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java (1)
Test
(49-203)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (1)
Test
(36-140)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (1)
ClpColumnHandle
(26-114)presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1)
ClpConfig
(21-189)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (1)
ClpColumnHandle
(26-114)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (1)
ClpColumnHandle
(26-114)
⏰ Context from checks skipped due to timeout of 90000ms (20)
- GitHub Check: spark-integration (8)
- GitHub Check: product-tests-basic-environment (17.0.13)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-queries)
- GitHub Check: test (8, :presto-iceberg)
- GitHub Check: test (17.0.13, :presto-tests -P presto-tests-general)
- GitHub Check: product-tests-specific-environment1 (8)
- GitHub Check: test (17.0.13, :presto-redis -P test-redis-integration-smoke-test)
- GitHub Check: product-tests-specific-environment2 (8)
- GitHub Check: singlestore-dockerized-tests (8)
- GitHub Check: singlestore-dockerized-tests (17.0.13)
- GitHub Check: kudu (8)
- GitHub Check: kudu (17.0.13)
- GitHub Check: hive-tests (17.0.13)
- GitHub Check: hive-dockerized-tests (8)
- GitHub Check: hive-dockerized-tests (17.0.13)
- GitHub Check: hive-tests (8)
- GitHub Check: test (:presto-docs)
- GitHub Check: test (:presto-base-arrow-flight)
- GitHub Check: maven-checks (8)
- GitHub Check: maven-checks (17.0.13)
🔇 Additional comments (50)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpSplitProvider.java (1)
21-24
: Well-defined interface with clear responsibility.The
ClpSplitProvider
interface has a clear and focused responsibility, following good design principles. The method signature is appropriate for its purpose of retrieving splits based on a table layout handle.presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java (4)
1-28
: License and documentation look goodThe class has a clear and detailed Javadoc that explains the purpose and the three possible scenarios for the conversion results. This is excellent documentation that will help other developers understand how to use this class.
29-35
: Clear class definition with well-explained fieldsThe class structure is clean with immutable fields, which is good for thread safety. The comments for each field clearly explain their purpose.
37-41
: Primary constructor looks goodThis constructor properly initializes both fields and follows an immutable design pattern.
49-59
: Specialized constructors are well implementedThe specialized constructors for fully translatable and non-translatable expressions make the API more convenient to use. They delegate to the primary constructor correctly.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizerProvider.java (1)
29-38
: Implementation looks goodThe dependency injection pattern is correctly implemented with final fields and the @Inject annotation. The constructor properly initializes all fields.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java (1)
68-72
: LGTM: Node selection strategy implementationThe implementation properly returns NO_PREFERENCE for node selection strategy, which is appropriate for connectors that don't have specific node placement requirements.
pom.xml (1)
765-770
: Dependency Addition: presto-clp Dependency Introduced
This new dependency block forpresto-clp
is correctly added under<dependencyManagement>
, with its version set as${project.version}
. Ensure that this internal dependency is properly integrated with the module build and that documentation (if any) reflects its addition.presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpErrorCode.java (1)
22-43
: Error handling implementation looks goodThe
ClpErrorCode
enum properly implements theErrorCodeSupplier
interface with well-defined error codes and follows standard Presto error handling patterns. The error code offset of0x0400_0000
helps avoid conflicts with other error codes in the system.presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpHandleResolver.java (1)
23-55
: Handle resolver implementation is appropriateThe
ClpHandleResolver
correctly implements all required methods from theConnectorHandleResolver
interface, returning the appropriate CLP-specific handle classes. This class is essential for Presto to correctly work with the CLP connector's custom types.presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java (1)
50-60
: Potential oversight inequals
andhashCode
methods.The
equals
andhashCode
methods only compare thetable
field and exclude thequery
field. Ifquery
is expected to participate in equality checks, consider updating these methods to include it.Could you confirm if this is an intentional design choice?
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpNodeType.java (1)
16-66
: All good here!This enumeration is straightforward, and the lookup table logic for byte-based indexing is well implemented.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java (1)
169-169
: Confirm potential KQL injection risk.Literal strings are appended directly to KQL expressions. If any user-controlled input can reach these code paths, confirm whether escaping or sanitization is required to prevent a possible injection vulnerability.
Also applies to: 292-292, 366-366, 403-403
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlugin.java (1)
20-28
: Straightforward plugin implementation looks good.No issues spotted. Implementation is concise and aligns with Presto plugin guidelines.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (4)
1-13
: Licence header is correct and follows standard practice.
No changes required.
14-26
: Imports and package statement are well-organized.
No issues found with naming or usage.
27-29
: Class declaration clarity.
The extension ofAbstractConfigurationAwareModule
aligns well with Presto’s connector module design.
30-36
: Configuration binding is implemented correctly.
UsingconfigBinder(binder).bindConfig(ClpConfig.class)
ensures that properties are injected from the server configuration seamlessly.presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java (3)
1-13
: Licence header is in compliance.
No changes needed.
14-33
: Imports and test annotations.
Imports appear correct, and usage of TestNG’s@BeforeMethod
and@AfterMethod
is appropriate for test lifecycle management.
78-105
: Batch insertion logic.
Batch inserts reduce overhead and improve test setup performance. This is efficient.presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (4)
1-13
: Licence header verified.
No changes required.
14-35
: Imports and test structure.
Your usage of TestNG annotations is consistent, and the test structure is clean.
47-62
: Configuration creation in setup.
Using a dedicated configuration with test-specific settings for the database is good practice.
66-106
: Table creation and batch insertion.
Same as withTestClpMetadata
, this design is effective for a consistent test environment.presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java (1)
25-28
: Comprehensive coverage of pushdown tests
These test cases thoroughly validate various scenarios (substring matching, LIKE patterns, numeric comparisons, AND/OR/NOT logic, etc.), providing strong confidence in pushdown correctness. Good job!presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java (1)
43-54
: Constructor injection and null checks are well implemented
UsingrequireNonNull
in the constructor ensures dependencies are valid, which helps avoid runtime null pointer issues.presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (1)
102-104
: Re-examine partial or empty results handling
When an SQLException occurs, the method logs the error but still returns any splits collected so far. If partial results are not desired, consider rethrowing an exception or returning an empty list for clarity.presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java (3)
54-62
: Connection usage looks good.The connection management and database selection logic are straightforward. Ensure any external usage of
ClpConfig
is well-documented so users configure valid credentials.
64-67
: Identifier validation is concise.This method cleanly enforces the safe identifier check, minimizing the risk of SQL injection.
88-111
: Overall logic is sound.The approach to filtering invalid table names aligns well with the safe identifier pattern. No major concerns here.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnectorFactory.java (2)
36-40
: Naming convention is clear.Returning
"clp"
for the connector name is consistent and self-explanatory.
42-46
: Handle resolver implementation is straightforward.Delegating to
ClpHandleResolver
is correct.presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java (5)
37-48
: Internal node structure is well-defined.Defining a nested
ClpNode
class for the schema tree clarifies the approach to schema nesting.
50-56
: Constructor logic is minimal and effective.Instantiating the root node and setting the polymorphic type flag is clear.
58-77
: Mapping of custom type codes to Presto types is robust.The switch statement covers expected cases. The fallback to an exception on unsupported types is good, ensuring unknown cases are captured.
135-157
: Column collection for leaves and nested rows is sensible.Building
RowType
for children ensures proper representation of nested structures.
159-169
: Recursive row type construction is well-structured.The method correctly handles nested nodes, generating fields with optional names. This aligns well with Presto’s row type model.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java (6)
1-13
: Licence header and package declaration look good.
No issues found with the Apache license header.
14-55
: Static fields and initial declarations are well-defined.
The usage of nestedRowType
fields forcity
is a good approach for testing complex types.
57-96
: SessionHolder encapsulation is clear.
TheSessionHolder
class appropriately bundles both theConnectorSession
andSession
, facilitating simpler testing contexts.
98-101
: SQL expression parsing is correct.
Theexpression(String sql)
method properly rewrites identifiers to symbol references usingExpressionUtils
.
103-114
: RowExpression translation is properly implemented.
The usage ofgetExpressionTypes
andSqlToRowExpressionTranslator
aligns with Presto’s recommended practices.
116-119
: Helper method getRowExpression is concise and coherent.
No issues found.presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java (4)
1-13
: License block validated.
No concerns found with the licensing header.
140-150
: Table metadata retrieval logic is consistent.
Building aConnectorTableMetadata
instance from the cached column handles is appropriate.
176-184
: Column handle mapping is straightforward.
No further issues detected.
186-193
: Column metadata retrieval is correct.
MappingClpColumnHandle
toColumnMetadata
is appropriate.presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (2)
1-13
: License block validated.
No concerns found here.
19-190
: General configuration approach is solid.
Configuration setters and validation logic appear correct and adhere to Presto’s best practices.
public ConnectorSplitSource getSplits(ConnectorTransactionHandle transactionHandle, | ||
ConnectorSession session, | ||
ConnectorTableLayoutHandle layout, | ||
SplitSchedulingContext splitSchedulingContext) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Add missing import for SplitSchedulingContext.
The SplitSchedulingContext
parameter is used in the method signature, but there's no import statement for this class.
🏁 Script executed:
#!/bin/bash
# Verify if SplitSchedulingContext is a top-level class that needs to be imported
rg -g "**/*.java" "class SplitSchedulingContext" --type java
Length of output: 181
Action Required: Add missing import for SplitSchedulingContext
Based on our search, the class SplitSchedulingContext
is defined as an inner class in the file
presto-spi/src/main/java/com/facebook/presto/spi/connector/ConnectorSplitManager.java
.
To properly reference this class in the ClpSplitManager.java
file, please insert the following import:
import com.facebook.presto.spi.connector.ConnectorSplitManager.SplitSchedulingContext;
This change will resolve the missing import issue noted in the method signature.
@Override | ||
public boolean equals(Object obj) | ||
{ | ||
if (this == obj) { | ||
return true; | ||
} | ||
if (obj == null || getClass() != obj.getClass()) { | ||
return false; | ||
} | ||
ClpColumnHandle other = (ClpColumnHandle) obj; | ||
return Objects.equals(this.columnName, other.columnName) && | ||
Objects.equals(this.columnType, other.columnType); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Revisit equality checks to include all relevant fields.
Only columnName and columnType are considered in equality, but originalColumnName and nullable are omitted. This can lead to inconsistent behaviour.
return Objects.equals(this.columnName, other.columnName)
- && Objects.equals(this.columnType, other.columnType);
+ && Objects.equals(this.originalColumnName, other.originalColumnName)
+ && Objects.equals(this.columnType, other.columnType)
+ && this.nullable == other.nullable;
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
@Override | |
public boolean equals(Object obj) | |
{ | |
if (this == obj) { | |
return true; | |
} | |
if (obj == null || getClass() != obj.getClass()) { | |
return false; | |
} | |
ClpColumnHandle other = (ClpColumnHandle) obj; | |
return Objects.equals(this.columnName, other.columnName) && | |
Objects.equals(this.columnType, other.columnType); | |
} | |
@Override | |
public boolean equals(Object obj) | |
{ | |
if (this == obj) { | |
return true; | |
} | |
if (obj == null || getClass() != obj.getClass()) { | |
return false; | |
} | |
ClpColumnHandle other = (ClpColumnHandle) obj; | |
return Objects.equals(this.columnName, other.columnName) | |
&& Objects.equals(this.originalColumnName, other.originalColumnName) | |
&& Objects.equals(this.columnType, other.columnType) | |
&& this.nullable == other.nullable; | |
} |
String tablePathQuery = String.format(QUERY_SELECT_TABLE_METADATA, config.getMetadataTablePrefix(), tableName); | ||
String archivePathQuery = String.format(QUERY_SELECT_ARCHIVE_IDS, config.getMetadataTablePrefix(), tableName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Mitigate potential SQL injection
Concatenating tableName
directly into the SQL string can pose an injection risk if tableName
is not guaranteed to be sanitized. Consider validating or restricting tableName
to mitigate this risk.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java (3)
178-179
: Update Negation According to Coding Guidelines
The coding guidelines for Java code require usingfalse == condition
instead of!condition
. The relevant lines use!<expression>
. Consider applying this change to maintain style consistency:- if (!expression.getRemainingExpression().isPresent() || !expression.getDefinition().isPresent()) { + if (false == expression.getRemainingExpression().isPresent() || false == expression.getDefinition().isPresent()) { - if (!expression.getRemainingExpression().isPresent() || !expression.getDefinition().isPresent()) { + if (false == expression.getRemainingExpression().isPresent() || false == expression.getDefinition().isPresent()) { ... // Apply similarly to all occurrences of !<expression> in lines shown.Also applies to: 244-245, 303-304, 362-363, 383-384, 450-451, 584-585, 658-659
592-628
: Consolidate Comparison Logic
The logic inbuildClpExpression
for building query strings is clear. However, you may benefit from extracting repeated string formatting into helper methods, which can improve readability and maintainability when new comparison types are introduced.
630-687
: Efficient Substring Handling
ThetryInterpretSubstringEquality
andhandleLogicalBinary
methods handle substring equality thoroughly. For large queries, you might need to validate performance. An alternative approach could cache repeated substring computations or pre-check if start/length directly match short-circuit conditions.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java
(1 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpSplitProvider.java
(1 hunks)presto-server/src/main/provisio/presto.xml
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- presto-server/src/main/provisio/presto.xml
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpSplitProvider.java
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ` rather than `!`.
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
: - Preferfalse == <expression>
rather than!<expression>
.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
⏰ Context from checks skipped due to timeout of 90000ms (20)
- GitHub Check: product-tests-basic-environment (17.0.13)
- GitHub Check: product-tests-basic-environment (8)
- GitHub Check: kudu (17.0.13)
- GitHub Check: kudu (8)
- GitHub Check: spark-integration (8)
- GitHub Check: hive-tests (17.0.13)
- GitHub Check: test (8, :presto-tests -P ci-only-resource-manager)
- GitHub Check: test (17.0.13, :presto-tests -P presto-tests-execution-memory)
- GitHub Check: test (17.0.13, :presto-hive -P test-hive-materialized-aggregations)
- GitHub Check: test (17.0.13, :presto-cassandra -P test-cassandra-integration-smoke-test)
- GitHub Check: product-tests-specific-environment1 (17.0.13)
- GitHub Check: product-tests-specific-environment1 (8)
- GitHub Check: product-tests-specific-environment2 (17.0.13)
- GitHub Check: product-tests-specific-environment2 (8)
- GitHub Check: test-other-modules (8)
- GitHub Check: test-other-modules (17.0.13)
- GitHub Check: test (:presto-docs)
- GitHub Check: maven-checks (17.0.13)
- GitHub Check: maven-checks (8)
- GitHub Check: test (:presto-base-arrow-flight)
🔇 Additional comments (3)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java (3)
1-13
: License Header Is Properly Included
No issues observed with the license header.
52-67
: Well-Structured Class Documentation
The class-level JavaDoc is clear and thoroughly explains how expressions are converted to KQL.
262-286
: Potential Unescaped String Injection
Literal strings (e.g. user inputs) are concatenated into KQL queries without apparent sanitization or escaping. Malicious characters might break queries or produce unintended behavior. Consider adding an escaping mechanism or verifying KQL handles all inputs securely.Please run a verification script or add automated tests to ensure that special or control characters inside these literal strings cannot cause KQL injection issues.
Also applies to: 368-414
Description
This PR introduces a CLP connector. The background and proposed implementation details are outlined in the associated RFC.
This PR implements one part of phase 1 of the proposed implementation, namely the Java implementation for the coordinator. The worker implementation will leverage Velox as the default query engine, so once the Velox PR is merged, we will submit another PR to this repo to add the necessary changes to
presto-native-execution
.Like other connectors, we have created a
presto-clp
module and implemented all required connector interfaces as well as a few extras to support query push downs.The important classes in the connector are described below.
Core Classes in Java
ClpConfig
The configuration class for CLP. Currently, we support the following properties:
clp.archive-source
: Specifies the source of the CLP archive. Supported values includelocal
(local storage) ands3
(Amazon S3).clp.metadata-expire-interval
: Defines the time interval after which metadata entries are considered expired and removed from the cache.clp.metadata-refresh-interval
: Specifies how frequently metadata should be refreshed from the source to ensure up-to-date information.clp.polymorphic-type-enabled
: Enables or disables support for polymorphic types within CLP. This determines whether dynamic type resolution is allowed.clp.metadata-provider-type
: Defines the type of the metadata provider. It could be a database, a file-based store, or another external system. By default, we use MySQL.clp.metadata-db-url
: The connection URL for the metadata database, used whenclp.metadata-provider-type
is configured to use a database.clp.metadata-db-name
: The name of the metadata database.clp.metadata-db-user
: The database user with access to the metadata database.clp.metadata-db-password
: The password for the metadata database user.clp.metadata-table-prefix
: A prefix applied to table names in the metadata database.clp.split-provider-type
: Defines the type of split provider for query execution. By default, we use MySQL, and the connection parameters are the same as those for the metadata database.ClpSchemaTree
A helper class for constructing a nested schema representation from CLP’s column definitions. It supports hierarchical column names (e.g.,
a.b.c
), handles name/type conflicts when theclp.polymorphic-type-enabled
option is enabled, and maps serialized CLP types to Presto types. The schema tree produces a flat list ofClpColumnHandle
instances, includingRowType
for nested structures, making it suitable for dynamic or semi-structured data formats.When polymorphic types are enabled, conflicting fields are given unique names by appending a type-specific suffix to the column name. For instance, if an integer field named "a" and a
Varstring
(CLP type) field named "a" coexist in CLP’s schema tree, they are represented asa_bigint
anda_varchar
in Presto. This approach ensures that such fields remain queryable while adhering to Presto’s constraints.ClpMetadataProvider
An interface responsible for retrieving metadata from a specified source.
We provide a default implementation called
ClpMySqlMetadataProvider
, which uses two MySQL tables. One of these is the table metadata table, defined with the schema shown below. Currently, we support only a single Presto schema nameddefault
, and this metadata table stores all table names and paths associated with that Presto schema.table_name
VARCHAR(512)
PRIMARY KEY
table_path
VARCHAR(1024)
NOT NULL
The second MySQL table contains column metadata, defined by the schema shown below. Each Presto table is associated with a corresponding MySQL table that stores metadata about its columns.
name
VARCHAR(512)
NOT NULL
type
TINYINT
NOT NULL
name
,type
)ClpSplitProvider
In CLP, an archive is the fundamental unit for searching, and we treat each archive as a Presto Split. This allows independent parallel searches across archives. The
ClpSplitProvider
interface, shown below, defines how to retrieve split information from a specified source:We provide a default implementation called
ClpMySqlSplitProvider
. It also uses the metadata table mentioned above as well as another set of tables to store archive IDs associated with each tableid
BIGINT
AUTO_INCREMENT PRIMARY KRY
archive_id
VARCHAR(128)
NOT NULL
By combining these two sets of tables, we can retrieve all archive paths for a table by concatenating the table’s path with the archive ID (e.g.,
table_path + "/" + archive_id
).ClpMetadata
This interface enables Presto to access various metadata. All requests are delegated to
ClpMetadataProvider
For metadata management, it also maintains two caches and periodically refreshes the metadata.
columnHandleCache
: ALoadingCache<SchemaTableName, Set<ClpColumnHandle>>
that maps aSchemaTableName
to its corresponding set ofClpColumnHandle
objects.tableNameCache
: ALoadingCache<String, Set<String>>
that maps a schema name (String
) to a set of table names (Set<String>
).ClpConnectorPlanOptimizer
andClpFilterToKqlConverter
Presto exposes
PlanNode
to the connector, allowing the connector to push down relevant filters to CLP for improved query performance.There are three main steps
ClpTableLayoutHandle
and constructing a newTableScanNode
FilterNode
with any remaining predicates and the newTableScanNode
ClpFilterToKqlConverter
implementsRowExpressionVisitor<ClpExpression, Void>
and handles expression transformation and pushdown. Since KQL is not SQL-compatible, only certain types of filters can be converted, including:LIKE
predicate (the"^%[^%_]*%$"
pattern is not pushed down)AND
,OR
,NOT
)IS NULL
substr(a, start, length) = 'abc'
andsubstr(a, start) = 'abc'
formsFor comparison or match expressions, one side must contain a
VariableReferenceExpression
, and the other must be aConstantExpression
--specifically a string or numeric literal. Expressions likea > c
orsubstr(a, 2) = lower(c)
are not eligible for pushdown. In simple cases without logical operators, the SQL plan can be directly translated into a KQL query. However, for expressions involving logical operators, it's critical to ensure that all conditions are handled correctly.A naive approach would be to convert only the pushdown-eligible parts of the SQL query into KQL, letting Presto or Prestissimo handle the rest. But this can lead to incorrect execution plans and unintended behavior.
Consider the following query:
Since CLP doesn’t currently support
regexp_like
, if we simply push downb = 2
to CLP, Presto will only receive results whereb = 2
, effectively changing the query semantics toregexp_like(a, '\d+b') AND b = 2
.To prevent such issues, the pushdown logic follows these rules:
OR
pushdown: AnOR
condition is pushable if and only if all child expressions can be pushed down. In this case, all child expressions are pushed down together with the OR operator.AND
pushdown: AnAND
condition is not pushable if and only if none of its child expressions are pushable. Otherwise, pushable expressions are pushed down with theAND
operator, while non-pushable expressions remain in the original plan.CallExpression
and can be transformed into KQL syntax.Example transformations:
SQL WHERE Clause:
a = 2 AND b = '3'
a: 2 AND b: "3"
FilterNode
is removed.SQL WHERE Clause:
a.b LIKE '%string another_string' OR "c.d" = TRUE
a.b: "*string another_string" OR c.d: true
FilterNode
is removed.SQL WHERE Clause:
a > 2 OR regexp_like(b, '\d+b')
"*"
a > 2 OR regexp_like(b, '\d+b')
remains in theFilterNode
.SQL WHERE Clause:
a > 2 AND regexp_like(b, '\d+b')
a > 2
regexp_like(b, '\d+b')
remains in theFilterNode
.SQL WHERE Clause:
NOT substr(a, 2, 5) = 'Hello' and b IS NULL
NOT a: "?Hello*" AND NOT b: *
FilterNode
is removed.Motivation and Context
See the associated RFC.
Impact
This module is independent from other modules and will not affect any existing functionality.
Test Plan
Unit tests are included in this PR, and we have also done end-to-end tests.
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.
Summary by CodeRabbit
Summary by CodeRabbit
New Features
clp
connector option.Documentation
Tests