Skip to content

Clp integration fix #6

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: master
Choose a base branch
from
Open

Clp integration fix #6

wants to merge 7 commits into from

Conversation

wraymo
Copy link
Member

@wraymo wraymo commented May 9, 2025

Description

This PR introduces a CLP connector. The background and proposed implementation details are outlined in the associated RFC.

This PR implements one part of phase 1 of the proposed implementation, namely the Java implementation for the coordinator. The worker implementation will leverage Velox as the default query engine, so once the Velox PR is merged, we will submit another PR to this repo to add the necessary changes to presto-native-execution.

Like other connectors, we have created a presto-clp module and implemented all required connector interfaces as well as a few extras to support query push downs.

The important classes in the connector are described below.


Core Classes in Java

ClpConfig

The configuration class for CLP. Currently, we support the following properties:

  • clp.metadata-expire-interval: Defines the time interval after which metadata entries are considered expired and removed from the cache.
  • clp.metadata-refresh-interval: Specifies how frequently metadata should be refreshed from the source to ensure up-to-date information.
  • clp.polymorphic-type-enabled: Enables or disables support for polymorphic types within CLP. This determines whether dynamic type resolution is allowed.
  • clp.metadata-provider-type: Defines the type of the metadata provider. It could be a database, a file-based store, or another external system. By default, we use MySQL.
  • clp.metadata-db-url: The connection URL for the metadata database, used when clp.metadata-provider-type is configured to use a database.
  • clp.metadata-db-name: The name of the metadata database.
  • clp.metadata-db-user: The database user with access to the metadata database.
  • clp.metadata-db-password: The password for the metadata database user.
  • clp.metadata-table-prefix: A prefix applied to table names in the metadata database.
  • clp.split-provider-type: Defines the type of split provider for query execution. By default, we use MySQL, and the connection parameters are the same as those for the metadata database.

ClpSchemaTree

A helper class for constructing a nested schema representation from CLP’s column definitions. It supports hierarchical column names (e.g., a.b.c), handles name/type conflicts when the clp.polymorphic-type-enabled option is enabled, and maps serialized CLP types to Presto types. The schema tree produces a flat list of ClpColumnHandle instances, including RowType for nested structures, making it suitable for dynamic or semi-structured data formats.

When polymorphic types are enabled, conflicting fields are given unique names by appending a type-specific suffix to the column name. For instance, if an integer field named "a" and a Varstring (CLP type) field named "a" coexist in CLP’s schema tree, they are represented as a_bigint and a_varchar in Presto. This approach ensures that such fields remain queryable while adhering to Presto’s constraints.


ClpMetadataProvider

An interface responsible for retrieving metadata from a specified source.

public interface ClpMetadataProvider {
    List<ClpColumnHandle> listColumnHandles(SchemaTableName schemaTableName);  
    List<ClpTableHandle> listTableNames(String schema);  
}

We provide a default implementation called ClpMySqlMetadataProvider, which uses two MySQL tables. One of these is the datasets table, defined with the schema shown below. Currently, we support only a single Presto schema named default, and this metadata table stores all table names, paths, and storage types associated with that Presto schema.

Column Name Data Type Constraints
name VARCHAR(255) PRIMARY KEY
archive_storage_type VARCHAR(4096) NOT NULL
archive_storage_directory VARCHAR(4096) NOT NULL

The second MySQL table contains column metadata, defined by the schema shown below. Each Presto table is associated with a corresponding MySQL table that stores metadata about its columns.

Column Name Data Type Constraints
name VARCHAR(512) NOT NULL
type TINYINT NOT NULL
Primary Key (name, type)

ClpSplitProvider

In CLP, an archive is the fundamental unit for searching, and we treat each archive as a Presto Split. This allows independent parallel searches across archives. The ClpSplitProvider interface, shown below, defines how to retrieve split information from a specified source:

public interface ClpSplitProvider {
    List<ClpSplit> listSplits(ClpTableLayoutHandle clpTableLayoutHandle);  
}

We provide a default implementation called ClpMySqlSplitProvider. It uses an archive table to store archive IDs associated with each table. The table below shows part of the schema (some irrelevant fields are omitted).

Column Name Data Type Constraints
pagination_id BIGINT AUTO_INCREMENT PRIMARY KRY
id VARCHAR(128) NOT NULL
... ... ...

By concatenating the table path (archive_storage_directory) and the archive ID (id), we can retrieve all split paths for a table.


ClpMetadata

This interface enables Presto to access various metadata. All requests are delegated to ClpMetadataProvider

For metadata management, it also maintains two caches and periodically refreshes the metadata.

  • columnHandleCache: A LoadingCache<SchemaTableName, List<ClpColumnHandle>> that maps a SchemaTableName to its corresponding list of ClpColumnHandle objects.
  • tableHandleCache: A LoadingCache<String, List<ClpTableHandle>> that maps a schema name (String) to a list of ClpTableHandle objects

ClpConnectorPlanOptimizer and ClpFilterToKqlConverter

Presto exposes PlanNode to the connector, allowing the connector to push down relevant filters to CLP for improved query performance.

There are three main steps

  1. Transforming filter predicates into KQL queries
  2. Adding the generated KQL query to ClpTableLayoutHandle and constructing a new TableScanNode
  3. Reconstructing a FilterNode with any remaining predicates and the new TableScanNode

ClpFilterToKqlConverter implements RowExpressionVisitor<ClpExpression, Void> and handles expression transformation and pushdown. Since KQL is not SQL-compatible, only certain types of filters can be converted, including:

  • String exact match
  • LIKE predicate (the "^%[^%_]*%$" pattern is not pushed down)
  • Numeric comparisons
  • Logical operators (AND, OR, NOT)
  • IS NULL
  • substr(a, start, length) = 'abc' and substr(a, start) = 'abc' forms

For comparison or match expressions, one side must contain a VariableReferenceExpression, and the other must be a ConstantExpression--specifically a string or numeric literal. Expressions like a > c or substr(a, 2) = lower(c) are not eligible for pushdown. In simple cases without logical operators, the SQL plan can be directly translated into a KQL query. However, for expressions involving logical operators, it's critical to ensure that all conditions are handled correctly.

A naive approach would be to convert only the pushdown-eligible parts of the SQL query into KQL, letting Presto or Prestissimo handle the rest. But this can lead to incorrect execution plans and unintended behavior.

Consider the following query:

SELECT * from clp_table WHERE regexp_like(a, '\d+b') OR b = 2

Since CLP doesn’t currently support regexp_like, if we simply push down b = 2 to CLP, Presto will only receive results where b = 2, effectively changing the query semantics to regexp_like(a, '\d+b') AND b = 2.

To prevent such issues, the pushdown logic follows these rules:

  • OR pushdown: An OR condition is pushable if and only if all child expressions can be pushed down. In this case, all child expressions are pushed down together with the OR operator.
  • AND pushdown: An AND condition is not pushable if and only if none of its child expressions are pushable. Otherwise, pushable expressions are pushed down with the AND operator, while non-pushable expressions remain in the original plan.
  • General Pushdown Condition: An expression is considered pushable if it’s a CallExpression and can be transformed into KQL syntax.

Example transformations:

  1. SQL WHERE Clause: a = 2 AND b = '3'

    • KQL: a: 2 AND b: "3"
    • Effect: The FilterNode is removed.
  2. SQL WHERE Clause: a.b LIKE '%string another_string' OR "c.d" = TRUE

    • KQL: a.b: "*string another_string" OR c.d: true
    • Effect: The FilterNode is removed.
  3. SQL WHERE Clause: a > 2 OR regexp_like(b, '\d+b')

    • KQL: "*"
    • Effect: a > 2 OR regexp_like(b, '\d+b') remains in the FilterNode.
  4. SQL WHERE Clause: a > 2 AND regexp_like(b, '\d+b')

    • KQL: a > 2
    • Effect: regexp_like(b, '\d+b') remains in the FilterNode.
  5. SQL WHERE Clause: NOT substr(a, 2, 5) = 'Hello' and b IS NULL

    • KQL: NOT a: "?Hello*" AND NOT b: *
    • Effect: The FilterNode is removed.

Motivation and Context

See the associated RFC.

Impact

This module is independent from other modules and will not affect any existing functionality.

Test Plan

Unit tests are included in this PR, and we have also done end-to-end tests.

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

Connector Changes
* Add coordinator support for the CLP connector :pr:`24868`
* Add documentation for the CLP connector :doc:`../connector/clp` :pr:`24868`

wraymo added 5 commits April 4, 2025 16:22
…ath to splitPath

2. Fix a bug with array type annotation in ClpSchemaTree
3. Update table schema for ClpMySqlSplitProvider and ClpMySqlMetadataProvider and update tests accordingly
Copy link

coderabbitai bot commented May 9, 2025

Walkthrough

A new "presto-clp" connector module has been added to the Presto project. This includes all connector components, metadata and split providers (with MySQL support), configuration, error codes, planning and filter pushdown logic, and test coverage. Documentation and build/provisioning integration for the connector are also included.

Changes

File(s) / Path(s) Change Summary
pom.xml, presto-server/src/main/provisio/presto.xml Added the "presto-clp" module to the build, dependency management, and provisioning configuration.
presto-docs/src/main/sphinx/connector.rst, presto-docs/src/main/sphinx/connector/clp.rst Added CLP connector documentation, detailing configuration, usage, data type mappings, and capabilities. Updated connector list to include CLP.
presto-clp/pom.xml Introduced Maven descriptor for the new "presto-clp" module, declaring dependencies, packaging, and project metadata.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/* Implemented the CLP connector: core classes (ClpConnector, ClpConnectorFactory, ClpPlugin, ClpModule), configuration (ClpConfig), error codes, handle resolver, transaction handle, column/table/layout/split handle classes, record set provider, metadata, plan optimizer and provider, filter-to-KQL converter, and expression wrapper.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/* Added metadata provider interfaces and MySQL-backed implementation, schema tree for column handling, and node type enum for CLP types.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/* Added split provider interface and MySQL-backed implementation for listing data splits.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java Added integration-style tests for metadata retrieval, schema/table enumeration, and column metadata correctness using an embedded H2 database.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java Added comprehensive tests for filter pushdown and KQL translation, covering string, numeric, logical, and complex predicates.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java Added a test utility class for expression parsing, type resolution, and row expression translation, used as a base for query-related tests.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java Added tests for split provider, verifying correct split enumeration and path formatting using an embedded H2 database.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant PrestoEngine
    participant ClpConnector
    participant ClpMetadataProvider
    participant ClpSplitProvider
    participant MySQL

    User->>PrestoEngine: Submit SQL query on CLP catalog
    PrestoEngine->>ClpConnector: beginTransaction()
    PrestoEngine->>ClpConnector: getMetadata()
    PrestoEngine->>ClpConnector: getSplitManager()
    PrestoEngine->>ClpConnector: getRecordSetProvider()
    PrestoEngine->>ClpConnector: getConnectorPlanOptimizerProvider()
    PrestoEngine->>ClpMetadataProvider: listTableNames(), listColumnHandles()
    ClpMetadataProvider->>MySQL: Query metadata tables
    MySQL-->>ClpMetadataProvider: Return schema/table/column info
    ClpMetadataProvider-->>PrestoEngine: Return metadata
    PrestoEngine->>ClpSplitProvider: listSplits()
    ClpSplitProvider->>MySQL: Query archive/split tables
    MySQL-->>ClpSplitProvider: Return split info
    ClpSplitProvider-->>PrestoEngine: Return splits
    PrestoEngine->>ClpConnector: shutdown()
Loading
sequenceDiagram
    participant PrestoEngine
    participant ClpPlanOptimizer
    participant ClpFilterToKqlConverter

    PrestoEngine->>ClpPlanOptimizer: Optimize plan with filter on table scan
    ClpPlanOptimizer->>ClpFilterToKqlConverter: Convert filter RowExpression to KQL
    ClpFilterToKqlConverter-->>ClpPlanOptimizer: Return KQL and remaining predicate
    ClpPlanOptimizer-->>PrestoEngine: Rewrite plan with pushed-down KQL filter
Loading

Tip

⚡️ Faster reviews with caching
  • CodeRabbit now supports caching for code and dependencies, helping speed up reviews. This means quicker feedback, reduced wait times, and a smoother review experience overall. Cached data is encrypted and stored securely. This feature will be automatically enabled for all accounts on May 16th. To opt out, configure Review - Disable Cache at either the organization or repository level. If you prefer to disable all data retention across your organization, simply turn off the Data Retention setting under your Organization Settings.

Enjoy the performance boost—your workflow just got faster.


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7ca1179 and f664d82.

📒 Files selected for processing (1)
  • pom.xml (2 hunks)
🔇 Additional comments (2)
pom.xml (2)

217-217: Add presto-clp to root modules list
The new presto-clp connector has been correctly added to the <modules> section, so it will be included in the aggregate build.


836-840: Declare presto-clp in dependencyManagement
Registering com.facebook.presto:presto-clp:${project.version} under <dependencyManagement> centralizes its version control for downstream modules.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Comment on lines 21 to 32
public interface ClpMetadataProvider
{
/**
* Returns the list of column handles for the given table.
*/
public List<ClpColumnHandle> listColumnHandles(SchemaTableName schemaTableName);

/**
* Returns the list of table names in the given schema.
*/
public List<String> listTableNames(String schema);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public interface ClpMetadataProvider
{
/**
* Returns the list of column handles for the given table.
*/
public List<ClpColumnHandle> listColumnHandles(SchemaTableName schemaTableName);
/**
* Returns the list of table names in the given schema.
*/
public List<String> listTableNames(String schema);
}
public interface ClpMetadataProvider
{
String DEFAULT_SCHEMA_NAME = "default";
/**
* Returns the list of column handles for the given table.
*/
public List<ClpColumnHandle> listColumnHandles(SchemaTableName schemaTableName);
/**
* Returns the list of table names in the given schema.
*/
public List<String> listTableNames(String schema);
List<String> listSchemaNames();
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DEFAULT_SCHEMA_NAME is in ClpMetadata

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think provider should not be aware of DEFAULT_SCHEMA_NAME, and DEFAULT_SCHEMA_NAME is not used in the provider

private static final String DEFAULT_SCHEMA_NAME = "default";
private final ClpMetadataProvider clpMetadataProvider;
private final LoadingCache<SchemaTableName, List<ClpColumnHandle>> columnHandleCache;
private final LoadingCache<String, List<String>> tableNameCache;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private final LoadingCache<String, List<String>> tableNameCache;
private final LoadingCache<String, List<String>> tableNameCache;
private final LoadingCache<String, List<String>> schemaNameCache;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only support one schema, and there's no need to add a cache for it

Comment on lines 61 to 64
this.tableNameCache = CacheBuilder.newBuilder()
.expireAfterWrite(clpConfig.getMetadataExpireInterval(), SECONDS)
.refreshAfterWrite(clpConfig.getMetadataRefreshInterval(), SECONDS)
.build(CacheLoader.from(this::loadTableNames));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
this.tableNameCache = CacheBuilder.newBuilder()
.expireAfterWrite(clpConfig.getMetadataExpireInterval(), SECONDS)
.refreshAfterWrite(clpConfig.getMetadataRefreshInterval(), SECONDS)
.build(CacheLoader.from(this::loadTableNames));
this.tableNameCache = CacheBuilder.newBuilder()
.expireAfterWrite(clpConfig.getMetadataExpireInterval(), SECONDS)
.refreshAfterWrite(clpConfig.getMetadataRefreshInterval(), SECONDS)
.build(CacheLoader.from(this::loadTableNames));
this.schemaNameCache = CacheBuilder.newBuilder()
.expireAfterWrite(clpConfig.getMetadataExpireInterval(), SECONDS)
.refreshAfterWrite(clpConfig.getMetadataRefreshInterval(), SECONDS)
.build(CacheLoader.from(this::loadSchemaNames));

{
return columnHandleCache.getUnchecked(schemaTableName);
}

Copy link
Member

@anlowee anlowee May 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private List<String> loadSchemaNames()
{
return clpMetadataProvider.listSchemaNames();
}

@Override
public List<SchemaTableName> listTables(ConnectorSession session, Optional<String> schemaName)
{
String schemaNameValue = schemaName.orElse(DEFAULT_SCHEMA_NAME);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
String schemaNameValue = schemaName.orElse(DEFAULT_SCHEMA_NAME);
String schemaNameValue = schemaName.orElse(ClpMetadataProvider.DEFAULT_SCHEMA_NAME);

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 49

🔭 Outside diff range comments (1)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java (1)

195-205: 🧹 Nitpick (assertive)

Follow project guideline: prefer false == <expr> over !<expr>

Coding-guidelines explicitly require false == <expression> for negation in Java code.
!hasDefinition violates this rule and appears several times in the class (and in other new classes).

Example fix:

-            if (!expression.getDefinition().isPresent()) {
+            if (false == expression.getDefinition().isPresent()) {

Please run a quick search-replace to stay consistent with the code-base conventions.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c8566b4 and 550e448.

📒 Files selected for processing (34)
  • pom.xml (2 hunks)
  • presto-clp/pom.xml (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnectorFactory.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpErrorCode.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpHandleResolver.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizerProvider.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlugin.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpRecordSetProvider.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplitManager.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableHandle.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTransactionHandle.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpNodeType.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpSplitProvider.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (1 hunks)
  • presto-docs/src/main/sphinx/connector.rst (1 hunks)
  • presto-docs/src/main/sphinx/connector/clp.rst (1 hunks)
  • presto-server/src/main/provisio/presto.xml (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ` rather than `!`.

**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizerProvider.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpHandleResolver.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpSplitProvider.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTransactionHandle.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpRecordSetProvider.java
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpNodeType.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpErrorCode.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlugin.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnectorFactory.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplitManager.java
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableHandle.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpFilterToKqlConverter.java
🧬 Code Graph Analysis (3)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpSplitProvider.java (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java (1)
  • ClpSplit (30-64)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java (1)
  • ClpTableLayoutHandle (23-73)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (1)
  • ClpColumnHandle (26-114)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (1)
presto-spi/src/main/java/com/facebook/presto/spi/ColumnMetadata.java (1)
  • ColumnMetadata (29-205)
⏰ Context from checks skipped due to timeout of 90000ms (20)
  • GitHub Check: hive-dockerized-tests (17.0.13)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-aggregation-queries)
  • GitHub Check: test (8, :presto-accumulo)
  • GitHub Check: test-other-modules (17.0.13)
  • GitHub Check: spark-integration (17.0.13)
  • GitHub Check: spark-integration (8)
  • GitHub Check: product-tests-specific-environment1 (17.0.13)
  • GitHub Check: product-tests-specific-environment2 (17.0.13)
  • GitHub Check: product-tests-specific-environment1 (8)
  • GitHub Check: product-tests-specific-environment2 (8)
  • GitHub Check: singlestore-dockerized-tests (17.0.13)
  • GitHub Check: singlestore-dockerized-tests (8)
  • GitHub Check: product-tests-basic-environment (17.0.13)
  • GitHub Check: product-tests-basic-environment (8)
  • GitHub Check: maven-checks (8)
  • GitHub Check: maven-checks (17.0.13)
  • GitHub Check: test (:presto-docs)
  • GitHub Check: arrowflight-java-tests (17.0.13, :presto-base-arrow-flight)
  • GitHub Check: arrowflight-java-tests (8, :presto-base-arrow-flight)
  • GitHub Check: prestocpp-linux-build-for-test
🔇 Additional comments (28)
presto-docs/src/main/sphinx/connector.rst (1)

17-17: Looks good!

The CLP connector entry has been added to the documentation in the proper alphabetical order, consistent with the existing format.

presto-server/src/main/provisio/presto.xml (1)

236-240: Proper integration of CLP connector artifact

The artifact definition follows the same structure as other connector plugins and is correctly positioned in the file. This ensures the CLP connector will be properly included in the Presto server distribution.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpSplitProvider.java (1)

21-27: Interface design looks good

The interface is well-defined with a clear single responsibility - listing splits for a given table layout handle. The method signature and JavaDoc are appropriate.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTransactionHandle.java (1)

18-22: Proper transaction handle implementation

The singleton enum pattern is appropriate for a connector transaction handle. This implementation is consistent with other Presto connector transaction handles.

pom.xml (2)

217-217: Integration of presto-clp module looks good.

The new module has been properly added to the list of modules in the root pom.xml file.


823-827: Dependency management entry for presto-clp module is properly defined.

The dependency management section correctly includes the new presto-clp module with the appropriate project version, following the pattern used for other modules.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlugin.java (1)

1-28: Plugin class implementation looks good.

The ClpPlugin class correctly implements the Presto SPI Plugin interface and properly registers the connector factory.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpErrorCode.java (1)

22-43: Well-structured error code implementation

The error code implementation follows Presto's conventions well with:

  • Descriptive names for each error scenario
  • Proper use of the EXTERNAL error type for connector errors
  • Good organization with sequential code values and a unique base offset (0x0400_0000)
  • Complete implementation of the ErrorCodeSupplier interface
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpHandleResolver.java (1)

23-55: Complete handle resolver implementation

The implementation correctly maps all the required handle types to their CLP-specific implementations, following the standard pattern for Presto connector handle resolvers.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (2)

39-46: Appropriate metadata provider configuration and error handling

The implementation correctly binds the MySQL metadata provider based on configuration and throws appropriate exceptions for unsupported types.


27-37: Core component bindings look good

The singleton bindings for core connector components follow Presto's standard practices.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpNodeType.java (2)

32-46: LGTM: Efficient lookup table initialization

The static initialization block correctly handles potential gaps in byte code values by finding the maximum type code and creating an appropriately sized lookup table.


55-57: 🧹 Nitpick (assertive)

Consider using yoda condition for the null check

The condition check could be rewritten to follow the coding guideline of preferring false == <expression> rather than !<expression> style.

-        if (type < 0 || type >= LOOKUP_TABLE.length || LOOKUP_TABLE[type] == null) {
+        if (type < 0 || type >= LOOKUP_TABLE.length || null == LOOKUP_TABLE[type]) {

Likely an incorrect or invalid review comment.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplitManager.java (1)

33-37: LGTM: Proper dependency injection with null check

The constructor properly injects the required dependency and validates it with requireNonNull.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnectorFactory.java (1)

54-60: LGTM: Comprehensive binding of required services

The code correctly binds all required services from the ConnectorContext to make them available for dependency injection.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java (1)

36-39: LGTM: Proper JSON serialization annotations

The class correctly uses Jackson annotations for JSON serialization and deserialization.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java (1)

52-57: 🧹 Nitpick (assertive)

Use preferred null check format

According to the coding guidelines, prefer false == <expression> rather than !<expression>. This applies to the null check in the equals method.

-        if (o == null || getClass() != o.getClass()) {
+        if (false == (o != null) || false == (getClass() == o.getClass())) {
            return false;
        }

Likely an incorrect or invalid review comment.

presto-clp/pom.xml (1)

86-96:

✅ Verification successful

Inconsistent Airlift dependency group IDs

There's an inconsistency in the group IDs for Airlift dependencies. Early dependencies use com.facebook.airlift, but these use io.airlift, which might lead to conflicts.


🏁 Script executed:

#!/bin/bash
# Check for potential version conflicts between the different Airlift dependencies

# 1. Look for airlift dependencies in the current project
echo "Airlift dependencies in this project:"
grep -A2 "artifactId>.*airlift" presto-clp/pom.xml

# 2. Check parent pom for dependency definitions
echo -e "\nAirlift dependency definitions in parent pom:"
grep -A10 -n "<artifactId>.*airlift" pom.xml | head -20

Length of output: 302


To verify whether there’s truly a mix of com.facebook.airlift and io.airlift dependencies, let’s search for all Airlift groupId entries in both the child and parent POMs, and across the project:

This will confirm if both group IDs are used and pinpoint where they occur.


🏁 Script executed:

#!/bin/bash
set -e

echo "Searching for Airlift groupId entries in presto-clp/pom.xml"
grep -n "<groupId>.*airlift" presto-clp/pom.xml || true

echo -e "\nSearching for Airlift groupId entries in parent pom.xml"
grep -R -n "<groupId>.*airlift" pom.xml || true

echo -e "\nSearching for any com.facebook.airlift occurrences across the project"
grep -R -n "com.facebook.airlift" . || true

echo -e "\nSearching for any io.airlift occurrences across the project"
grep -R -n "io.airlift" . || true

Length of output: 125540


Airlift groupId mix is expected—no change needed
The project legitimately uses both com.facebook.airlift (for legacy modules) and io.airlift (for relocated modules like units and slice). Versions for all Airlift artifacts are controlled in the parent POM, so there’s no conflict at runtime.

• presto-clp/pom.xml lines 28–43: legacy com.facebook.airlift deps
• presto-clp/pom.xml lines 87–93: new io.airlift deps
• parent pom.xml: dependencyManagement pins versions for both groupIds

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpExpression.java (5)

29-41: Well-designed immutable class with proper documentation.

The class properly implements an immutable data structure with final fields and comprehensive JavaDoc that clearly explains the three possible states of expression conversion. The constructor correctly initializes all fields.


43-47: LGTM - constructor delegation is properly implemented.

The no-args constructor correctly delegates to the primary constructor with empty optionals for both fields, maintaining immutability.


49-53: LGTM - single definition constructor is well implemented.

This constructor properly handles the case of a fully translatable KQL string, setting the definition and leaving the remaining expression empty.


55-59: LGTM - remaining expression constructor is well implemented.

This constructor properly handles the case of a non-translatable expression, setting the remaining expression and leaving the definition empty.


61-69: LGTM - getters are properly implemented.

The getter methods are correctly implemented to return the Optional fields.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (3)

26-45: LGTM - Well-designed immutable column handle class.

The class properly implements the ColumnHandle interface and follows an immutable pattern with final fields. The JsonCreator and JsonProperty annotations are correctly used for serialization/deserialization.


47-50: LGTM - Convenience constructor with proper delegation.

This convenience constructor properly delegates to the primary constructor, setting the original column name to be the same as the column name.


76-83: LGTM - ColumnMetadata creation is well implemented.

The method correctly uses the builder pattern to create a ColumnMetadata instance with the appropriate properties.

presto-docs/src/main/sphinx/connector/clp.rst (1)

1-182: Comprehensive connector documentation with clear explanations.

The documentation is well-structured and provides comprehensive information about the CLP connector, including:

  • Clear overview and configuration instructions
  • Detailed explanation of all configuration properties with defaults
  • Thorough description of data type mappings
  • Explanation of metadata and split providers
  • Clear examples for various data structures
  • SQL support and limitations

The documentation will be very helpful for users setting up and using the CLP connector.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java (1)

67-70: Isolation level argument is silently ignored

beginTransaction() accepts an IsolationLevel, but the chosen ClpTransactionHandle.INSTANCE doesn’t seem to differentiate based on that parameter. Please confirm that CLP truly has no need to track isolation or read-only flags; otherwise we should create distinct handles or, at minimum, validate/throw on unsupported levels.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 13

♻️ Duplicate comments (17)
presto-clp/pom.xml (1)

5-19: Add <build> section to pom.xml

The POM file is missing a <build> section which is present in other connector modules. This section is necessary to configure build plugins for the presto-plugin packaging type.

Consider adding a <build> section after the <properties> block:

    <properties>
        <air.main.basedir>${project.parent.basedir}</air.main.basedir>
    </properties>

+    <build>
+        <plugins>
+            <plugin>
+                <groupId>org.apache.maven.plugins</groupId>
+                <artifactId>maven-surefire-plugin</artifactId>
+                <configuration>
+                    <skipTests>true</skipTests>
+                </configuration>
+            </plugin>
+        </plugins>
+    </build>
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java (1)

21-32: 🧹 Nitpick (assertive)

Consider adding a method to list available schemas

The interface provides methods to list tables and columns but lacks a method to list available schemas. This would provide a complete metadata API.

public interface ClpMetadataProvider
{
    /**
     * Returns the list of column handles for the given table.
     */
    List<ClpColumnHandle> listColumnHandles(SchemaTableName schemaTableName);

    /**
     * Returns the list of table names in the given schema.
     */
    List<String> listTableNames(String schema);
+
+    /**
+     * Returns the list of all available schema names.
+     */
+    List<String> listSchemaNames();
}
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (4)

50-50: Consider using platform-independent temp directory

The hard-coded path /tmp/split_testdb might not work on all operating systems, particularly Windows.

-        final String metadataDbUrl = "jdbc:h2:file:/tmp/split_testdb;MODE=MySQL;DATABASE_TO_UPPER=FALSE";
+        final String tempDir = System.getProperty("java.io.tmpdir");
+        final String metadataDbUrl = String.format("jdbc:h2:file:%s/split_testdb;MODE=MySQL;DATABASE_TO_UPPER=FALSE", tempDir);

80-81: Use platform-independent path construction

Hard-coded path separators may cause issues on different operating systems.

-                    pstmt.setString(2, "/tmp/archives/" + tableName);
+                    pstmt.setString(2, new File(System.getProperty("java.io.tmpdir"), "archives" + File.separator + tableName).getPath());

115-123: Improve file cleanup handling in tearDown method

The current implementation has several issues:

  1. Uses hard-coded paths that aren't platform-independent
  2. Doesn't handle exceptions during file deletion
  3. File deletion failures are silently ignored
    @AfterMethod
    public void tearDown()
    {
-        File dbFile = new File("/tmp/split_testdb.mv.db");
-        File lockFile = new File("/tmp/split_testdb.trace.db"); // Optional, H2 sometimes creates this
-        if (dbFile.exists()) {
-            dbFile.delete();
-            System.out.println("Deleted database file: " + dbFile.getAbsolutePath());
-        }
-        if (lockFile.exists()) {
-            lockFile.delete();
-        }
+        String tempDir = System.getProperty("java.io.tmpdir");
+        File dbFile = new File(tempDir, "split_testdb.mv.db");
+        File lockFile = new File(tempDir, "split_testdb.trace.db"); // Optional, H2 sometimes creates this
+        try {
+            if (dbFile.exists() && !dbFile.delete()) {
+                System.err.println("Failed to delete database file: " + dbFile.getAbsolutePath());
+            }
+            else if (dbFile.exists()) {
+                System.out.println("Deleted database file: " + dbFile.getAbsolutePath());
+            }
+            
+            if (lockFile.exists() && !lockFile.delete()) {
+                System.err.println("Failed to delete lock file: " + lockFile.getAbsolutePath());
+            }
+        }
+        catch (Exception e) {
+            System.err.println("Error cleaning up database files: " + e.getMessage());
+        }
    }

129-136: Enhance test coverage for ClpSplit properties

The test only verifies the split path format but doesn't check other properties of the splits.

    @Test
    public void testListSplits()
    {
        ClpSplitProvider splitProvider = new ClpMySqlSplitProvider(config);
        for (String tableName : TABLE_NAME_LIST) {
            ClpTableLayoutHandle layoutHandle = new ClpTableLayoutHandle(new ClpTableHandle(new SchemaTableName(TABLE_SCHEMA, tableName)), Optional.empty());
            List<ClpSplit> splits = splitProvider.listSplits(layoutHandle);
            assertEquals(splits.size(), NUM_SPLITS);
            for (int i = 0; i < NUM_SPLITS; i++) {
-                assertEquals(splits.get(i).getPath(), "/tmp/archives/" + tableName + "/id_" + i);
+                String expectedPath = "/tmp/archives/" + tableName + "/id_" + i;
+                assertEquals(splits.get(i).getPath(), expectedPath);
+                // Verify other important split properties
+                assertNotNull(splits.get(i).getConnectorId(), "ConnectorId should not be null");
+                assertNotNull(splits.get(i).getConnectorSplitIdentity(), "SplitIdentity should not be null");
+                // Add more assertions for other properties as needed
            }
        }
    }
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (1)

49-56: Appropriate split provider configuration and error handling

The implementation correctly binds the MySQL split provider based on configuration and throws appropriate exceptions for unsupported types.

For future extensibility, if more provider types are added later, you might consider refactoring these conditionals to a more scalable approach like a provider registry or factory pattern.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java (2)

31-37: Document the meaning of null splitPath

The path field is used without documentation explaining what a null value means in this context.

Add a comment or Javadoc explaining the significance of a null path.


57-61: Consider providing more specific info in getInfo()

The getInfo() method returns the instance itself, which might not be the most informative approach if specific split information is needed.

-    public Object getInfo()
-    {
-        return this;
-    }
+    public Object getInfo()
+    {
+        return ImmutableMap.of("splitPath", path == null ? "NULL" : path);
+    }

This would require adding the following import:

import com.google.common.collect.ImmutableMap;
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (1)

97-99: 🛠️ Refactor suggestion

Align equality check with coding guidelines

The equality check doesn't follow the coding guideline to prefer false == <expression> rather than !<expression>.

-        if (obj == null || getClass() != obj.getClass()) {
+        if (false == (obj instanceof ClpColumnHandle)) {
            return false;
        }
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java (1)

85-93: Swallowed shutdown exception still unresolved (same as previous review)
The shutdown() method logs the exception but lets the connector continue shutting down silently, exactly as noted in the prior review. Consider re-throwing a RuntimeException or propagating a failed future so that coordinators/workers surface the failure.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (1)

60-65: 🧹 Nitpick (assertive)

Conform to project negation style & guard against NPE

The condition violates the guideline favouring false == expr:

-        if (dbName != null && !dbName.isEmpty()) {
+        if (null != dbName && false == dbName.isEmpty()) {

(You might also consider issuing USE ? through a prepared statement rather than string concatenation.)

presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java (2)

152-162: Style guideline breach – use false == expr instead of !expr

Lines 152 and 162 violate the project-wide rule that prefers false == <expression> to the logical-NOT operator. Earlier feedback already highlighted the same issue elsewhere in this class, so please sweep the new code as well.

-        if (!polymorphicTypeEnabled) {
+        if (false == polymorphicTypeEnabled) {-            else if (!existing.type.equals(newType)) {
+            else if (false == existing.type.equals(newType)) {

135-146: 🛠️ Refactor suggestion

Column order is nondeterministic – iterate deterministically

HashMap iteration makes the order of columns returned by collectColumnHandles() JVM-dependent, which can lead to unstable metadata and hard-to-debug mismatches between workers.

-        for (Map.Entry<String, ClpNode> entry : root.children.entrySet()) {
+        root.children.entrySet().stream()
+                .sorted(Map.Entry.comparingByKey()) // deterministic order
+                .forEach(entry -> {
+                    String name = entry.getKey();
+                    ClpNode child = entry.getValue();
+                    if (child.isLeaf()) {
+                        columns.add(new ClpColumnHandle(name, child.originalName, child.type, true));
+                    }
+                    else {
+                        Type rowType = buildRowType(child);
+                        columns.add(new ClpColumnHandle(name, child.originalName, rowType, true));
+                    }
+                });
-
-        }
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java (1)

98-113: Style guideline breach – prefer false == …

The three if statements at lines 98-100 and 112-113 negate predicates with !. To stay consistent with project conventions, rewrite them using false ==.

-        if (!listSchemaNames(session).contains(schemaNameValue)) {
+        if (false == listSchemaNames(session).contains(schemaNameValue)) {-        if (!listSchemaNames(session).contains(schemaName)) {
+        if (false == listSchemaNames(session).contains(schemaName)) {
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java (2)

61-66: Make city immutable to avoid accidental state mutation

All other column-handle constants are declared final, but city is not. Making it mutable is unnecessary and increases the risk that tests mutate shared state between runs.

-    protected static final ClpColumnHandle city = new ClpColumnHandle("city", RowType.from(ImmutableList.of(
+    protected static final ClpColumnHandle city = new ClpColumnHandle("city", RowType.from(ImmutableList.of(

98-101: Reuse a single SqlParser instance to reduce object churn

expression() instantiates a new SqlParser for every call. These objects are heavyweight; keeping one static parser (thread-safe in Presto) improves test execution time.

-        return ExpressionUtils.rewriteIdentifiersToSymbolReferences(new SqlParser().createExpression(sql, new ParsingOptions(ParsingOptions.DecimalLiteralTreatment.AS_DECIMAL)));
+        return ExpressionUtils.rewriteIdentifiersToSymbolReferences(SQL_PARSER.createExpression(sql, PARSING_OPTIONS));

And add:

private static final SqlParser SQL_PARSER = new SqlParser();
private static final ParsingOptions PARSING_OPTIONS =
        new ParsingOptions(ParsingOptions.DecimalLiteralTreatment.AS_DECIMAL);
📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 550e448 and a824146.

📒 Files selected for processing (15)
  • presto-clp/pom.xml (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpRecordSetProvider.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ` rather than `!`.

**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpRecordSetProvider.java
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java
🧬 Code Graph Analysis (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (1)
  • ClpColumnHandle (26-117)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (3)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1)
  • ClpConfig (21-189)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java (1)
  • ClpSplit (28-62)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java (1)
  • ClpTableLayoutHandle (25-79)
⏰ Context from checks skipped due to timeout of 90000ms (20)
  • GitHub Check: test (8, :presto-orc)
  • GitHub Check: test (8, :presto-tests -P ci-only-tpch-distributed-queries)
  • GitHub Check: product-tests-basic-environment (8)
  • GitHub Check: product-tests-basic-environment (17.0.13)
  • GitHub Check: test-other-modules (8)
  • GitHub Check: test-other-modules (17.0.13)
  • GitHub Check: hive-dockerized-tests (17.0.13)
  • GitHub Check: hive-tests (8)
  • GitHub Check: hive-dockerized-tests (8)
  • GitHub Check: hive-tests (17.0.13)
  • GitHub Check: kudu (8)
  • GitHub Check: kudu (17.0.13)
  • GitHub Check: spark-integration (17.0.13)
  • GitHub Check: spark-integration (8)
  • GitHub Check: maven-checks (8)
  • GitHub Check: maven-checks (17.0.13)
  • GitHub Check: arrowflight-java-tests (17.0.13, :presto-base-arrow-flight)
  • GitHub Check: arrowflight-java-tests (8, :presto-base-arrow-flight)
  • GitHub Check: test (:presto-docs)
  • GitHub Check: prestocpp-linux-build-for-test
🔇 Additional comments (9)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpRecordSetProvider.java (1)

28-35: Implementation looks good!

The getRecordSet method correctly throws an UnsupportedOperationException with a clear error message, which is better than returning null that would cause a NullPointerException at runtime.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (2)

33-38: Class bindings look good!

The module correctly binds all core connector components as singletons and configures the ClpConfig.


40-47: Appropriate metadata provider configuration and error handling

The implementation correctly builds the configuration object and binds the appropriate metadata provider based on the configured type, with proper error handling for unsupported types.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java (2)

61-78: Implementation of equals, hashCode, and toString methods is correct

The methods correctly include both fields (table and kqlQuery), ensuring proper equality comparison and string representation.


57-59: 🛠️ Refactor suggestion

Align equality check with coding guidelines

The equality check doesn't follow the coding guideline to prefer false == <expression> rather than !<expression>.

-        if (o == null || getClass() != o.getClass()) {
+        if (false == (o instanceof ClpTableLayoutHandle)) {
            return false;
        }

Likely an incorrect or invalid review comment.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (1)

110-115: ToString method implementation looks good

The toString implementation now correctly includes all fields: columnName, originalColumnName, columnType, and nullable.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java (1)

55-58: ✔️ Cached optimiser provider – good catch

Switching from per-call construction to a constructor-injected, cached planOptimizerProvider eliminates unnecessary allocations and double-registration of optimisers. This addresses the earlier feedback perfectly.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java (1)

110-116: originalName can diverge from map key after conflict renaming

When an intermediate node is renamed (lines 110-114), its originalName field remains the old segment. Down-stream consumers that rely on originalName for display or mapping might become confused.

Consider updating existing.originalName = renamedExisting; after the move, or add explanatory Javadoc clarifying why the old value is retained.

presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java (1)

74-74: 🧹 Nitpick (assertive)

Make typeProvider final

This variable is initialized only once and never reassigned, so it should be marked as final to prevent accidental modification.

-    protected final TypeProvider typeProvider = TypeProvider.fromVariables(variableToColumnHandleMap.keySet());
+    protected static final TypeProvider typeProvider = TypeProvider.fromVariables(variableToColumnHandleMap.keySet());

Likely an incorrect or invalid review comment.

@wraymo wraymo force-pushed the clp_integration_fix branch from a824146 to 30cf714 Compare May 11, 2025 14:16
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

♻️ Duplicate comments (23)
presto-clp/pom.xml (1)

1-128: Add <build> section to presto-clp/pom.xml

The CLP connector POM omits a <build> block. All other connector modules (e.g. presto-accumulo, presto-base-arrow-flight) declare build plugins—such as the Maven Surefire and Enforcer plugins—to ensure the presto-plugin packaging is handled correctly.

Please add a <build> section immediately after the <properties> block in presto-clp/pom.xml. For example:

     </properties>
+
+    <build>
+        <plugins>
+            <plugin>
+                <groupId>org.apache.maven.plugins</groupId>
+                <artifactId>maven-surefire-plugin</artifactId>
+                <configuration>
+                    <skipTests>true</skipTests>
+                </configuration>
+            </plugin>
+            <!-- include any additional plugins (enforcer, packaging, etc.) as in other connectors -->
+        </plugins>
+    </build>
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (1)

49-56: LGTM! Appropriate split provider configuration and error handling

The implementation correctly binds the MySQL split provider based on configuration and throws appropriate exceptions for unsupported types.

For future extensibility, if more provider types are added later, you might consider refactoring these conditionals to a more scalable approach like a provider registry or factory pattern.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java (1)

57-61: getInfo() still returns this – consider exposing useful details.
This mirrors the earlier feedback but has not been addressed. Returning the full object leaks implementation details while providing little actionable information to the coordinator UI.

Re-expose only the data that is actually informative, e.g.

-    public Object getInfo()
-    {
-        return this;
-    }
+    public Object getInfo()
+    {
+        return ImmutableMap.of("path", path);
+    }

(This requires import com.google.common.collect.ImmutableMap;)

presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (4)

50-56: Hard-coded /tmp makes tests OS-specific.

Prior review suggested using java.io.tmpdir; the literal /tmp will fail on Windows and some container setups.


80-83: Path construction is not platform-independent.

Concatenate via File or Paths to avoid “/” assumptions.


115-123: Tear-down still silently ignores deletion failures and hard-codes paths.

Re-use java.io.tmpdir, handle delete() return value, and log failures.


135-136: Assertion uses hard-coded separators.

Use File/Paths when building expectedPath so the test passes on Windows.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (2)

97-100: 🛠️ Refactor suggestion

Use instanceof to respect subclassing & follow style guide.

getClass() != obj.getClass() prevents equality with subclasses and violates the project’s guideline favouring false == (obj instanceof …) over negated comparisons.

-        if (obj == null || getClass() != obj.getClass()) {
+        if (false == (obj instanceof ClpColumnHandle)) {
             return false;
         }
-        ClpColumnHandle other = (ClpColumnHandle) obj;
+        ClpColumnHandle other = (ClpColumnHandle) obj;

101-105: 🧹 Nitpick (assertive)

Avoid boxing boolean in Objects.equals.

nullable is a primitive; wrapping it for Objects.equals is wasteful and less readable.

-                Objects.equals(this.columnType, other.columnType) &&
-                Objects.equals(this.nullable, other.nullable);
+                Objects.equals(this.columnType, other.columnType) &&
+                this.nullable == other.nullable;
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java (1)

85-92: Swallowing exceptions on shutdown hides resource-leak issues.

If lifeCycleManager.stop() throws, the connector reports success even though resources may still be open. Re-throw a RuntimeException (Presto will surface it) after logging, or return a failed future if this becomes async.

         catch (Exception e) {
             log.error(e, "Error shutting down connector");
-        }
+            throw new RuntimeException("Failed to shut down Clp connector", e);
+        }
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (1)

106-107: Logger throwable placed in argument list, stack-trace is lost

Same issue flagged earlier: Logger expects the throwable as the first parameter.

-            log.warn("Database error while processing splits for %s: %s", tableName, e);
+            log.warn(e, "Database error while processing splits for %s", tableName);
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java (3)

86-87: Throwable parameter position in log call

log.warn("…%s: %s", value, e) drops the stack trace. Pass e first, as already highlighted in previous reviews.

Also applies to: 111-112


75-80: ⚠️ Potential issue

Identifier validation not applied – SQL-injection risk repeats

listColumnHandles() embeds schemaTableName.getTableName() straight into the query. Unlike listTableNames(), no call to isValidIdentifier() is made.

-String query = String.format(QUERY_SELECT_COLUMNS, config.getMetadataTablePrefix(), schemaTableName.getTableName());
+String table = schemaTableName.getTableName();
+if (false == isValidIdentifier(table)) {
+    throw new IllegalArgumentException("Invalid table identifier: " + table);
+}
+String query = String.format(QUERY_SELECT_COLUMNS, config.getMetadataTablePrefix(), table);

Please apply similar hardening wherever table identifiers are interpolated.


59-63: 🧹 Nitpick (assertive)

Negation style & safer “USE” execution

Lines 61-63 violate the house style (false == expr) and build a SQL command with simple string concatenation.

-if (dbName != null && !dbName.isEmpty()) {
-    connection.createStatement().execute("USE " + dbName);
+if (null != dbName && false == dbName.isEmpty()) {
+    try (PreparedStatement stmt = connection.prepareStatement("USE ?")) {
+        stmt.setString(1, dbName);
+        stmt.execute();
+    }
}
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java (1)

135-147: Iteration order of HashMap causes non-deterministic column ordering

HashMap does not guarantee iteration order; workers may disagree on column order which can surface as schema mismatches. Prior review suggested sorting; it is still outstanding.

-for (Map.Entry<String, ClpNode> entry : root.children.entrySet()) {
+root.children.entrySet().stream()
+        .sorted(Map.Entry.comparingByKey())
+        .forEach(entry -> {
+            String name = entry.getKey();
+            ClpNode child = entry.getValue();
+            if (child.isLeaf()) {
+                columns.add(new ClpColumnHandle(name, child.originalName, child.type, true));
+            }
+            else {
+                Type rowType = buildRowType(child);
+                columns.add(new ClpColumnHandle(name, child.originalName, rowType, true));
+            }
+        });
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java (1)

60-61: 🧹 Nitpick (assertive)

Consider reusing SessionHolder

The SessionHolder is instantiated separately inside each test method. Creating a single instance as a class field would improve test efficiency by reducing object creation.

-        SessionHolder sessionHolder = new SessionHolder();
+        // Use shared sessionHolder from class field
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java (4)

98-101: 🛠️ Refactor suggestion

Coding style deviation: Use false == expr instead of !expr

According to the coding guidelines, use false == expression rather than !expression.

-        if (!listSchemaNames(session).contains(schemaNameValue)) {
+        if (false == listSchemaNames(session).contains(schemaNameValue)) {

112-118: 🛠️ Refactor suggestion

Coding style deviation: Use false == expr instead of !expr

Replace negation with explicit comparison to conform to coding guidelines.

-        if (!listSchemaNames(session).contains(schemaName)) {
+        if (false == listSchemaNames(session).contains(schemaName)) {
             return null;
         }

-        if (!listTables(schemaName).contains(tableName.getTableName())) {
+        if (false == listTables(schemaName).contains(tableName.getTableName())) {

158-160: 🛠️ Refactor suggestion

Coding style deviation: Use false == expr instead of !expr

Replace negation with explicit comparison to conform to coding guidelines.

-        if (schemaName != null && !listSchemaNames(session).contains(schemaName)) {
+        if (schemaName != null && false == listSchemaNames(session).contains(schemaName)) {

166-168: ⚠️ Potential issue

Possible NPE when prefix contains only table name

SchemaTableName constructor rejects null schema names, but prefix.getSchemaName() could be null.

-            schemaTableNames = ImmutableList.of(new SchemaTableName(prefix.getSchemaName(), prefix.getTableName()));
+            String schema = Optional.ofNullable(prefix.getSchemaName()).orElse(DEFAULT_SCHEMA_NAME);
+            schemaTableNames = ImmutableList.of(new SchemaTableName(schema, prefix.getTableName()));
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java (3)

55-56: 🧹 Nitpick (assertive)

Add class-level documentation

This base class lacks documentation explaining its purpose. Consider adding Javadoc to improve maintainability.

+/**
+ * Base class for CLP query tests providing common functionality for expression parsing,
+ * type resolution, and session management.
+ */
 public class TestClpQueryBase

98-101: 🧹 Nitpick (assertive)

Use shared SqlParser instance to reduce object churn

Creating a new SqlParser for every expression parsing is inefficient. These objects are heavyweight; use a single static parser instance.

+    private static final SqlParser SQL_PARSER = new SqlParser();
+    private static final ParsingOptions PARSING_OPTIONS = 
+            new ParsingOptions(ParsingOptions.DecimalLiteralTreatment.AS_DECIMAL);

     public static Expression expression(String sql)
     {
-        return ExpressionUtils.rewriteIdentifiersToSymbolReferences(new SqlParser().createExpression(sql, new ParsingOptions(ParsingOptions.DecimalLiteralTreatment.AS_DECIMAL)));
+        return ExpressionUtils.rewriteIdentifiersToSymbolReferences(SQL_PARSER.createExpression(sql, PARSING_OPTIONS));
     }

107-108: 🧹 Nitpick (assertive)

Use shared SqlParser instance here as well

This method also creates a new SqlParser instance. For consistency and efficiency, use the same shared SqlParser instance.

         Map<NodeRef<Expression>, Type> expressionTypes = getExpressionTypes(
                 session,
                 metadata,
-                new SqlParser(),
+                SQL_PARSER,
📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a824146 and 30cf714.

📒 Files selected for processing (15)
  • presto-clp/pom.xml (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpRecordSetProvider.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ` rather than `!`.

**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpRecordSetProvider.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java
🧬 Code Graph Analysis (3)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (1)
  • ClpColumnHandle (26-117)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (1)
presto-spi/src/main/java/com/facebook/presto/spi/ColumnMetadata.java (1)
  • ColumnMetadata (29-205)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java (3)
presto-main-base/src/main/java/com/facebook/presto/Session.java (1)
  • Session (75-939)
presto-main-base/src/main/java/com/facebook/presto/SystemSessionProperties.java (1)
  • SystemSessionProperties (88-3177)
presto-spi/src/test/java/com/facebook/presto/spi/TestingSession.java (1)
  • TestingSession (34-142)
⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: prestocpp-linux-build-for-test
  • GitHub Check: test (:presto-docs)
  • GitHub Check: maven-checks (17.0.13)
  • GitHub Check: changes
🔇 Additional comments (7)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java (1)

57-58: 🧹 Nitpick (assertive)

Use 'false ==' coding pattern for boolean expressions

According to the project coding guidelines, prefer false == <expression> rather than !<expression> or <expression> == false.

-        if (o == null || getClass() != o.getClass()) {
+        if (false == (o != null) || getClass() != o.getClass()) {
            return false;
        }

Likely an incorrect or invalid review comment.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpRecordSetProvider.java (1)

29-35: Implementation throws appropriate exception

The implementation properly throws an UnsupportedOperationException instead of returning null, which would cause NullPointerException when the method is called.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (1)

40-47: LGTM! Appropriate metadata provider configuration and error handling

The implementation correctly binds the MySQL metadata provider based on configuration and throws appropriate exceptions for unsupported types.

presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java (3)

44-46: Good handling of missing KQL expression case

This check correctly asserts that when no KQL expression is expected, none should be present. This addresses the issue raised in a previous review comment about missing negative assertions.


70-71: Clear and informative comment

The comment effectively explains that wildcard pattern '%hello%' doesn't get pushed down, which helps understand the test case's purpose and the connector's limitations.


32-38: Well-organized test helper method

The testFilter helper method is well-designed to make test cases concise and readable. It properly converts SQL expressions to ClpExpressions and verifies both KQL output and remaining expressions.

presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java (1)

76-96: Well-designed SessionHolder class

The SessionHolder class is well-structured to encapsulate session state for tests, providing both connector session and session objects with clean accessor methods.

@wraymo wraymo force-pushed the clp_integration_fix branch from 30cf714 to 5a68d91 Compare May 12, 2025 02:40
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (37)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpRecordSetProvider.java (1)

28-35: Implementation looks good.

The method correctly throws an UnsupportedOperationException to explicitly indicate that record set provision is not supported in this connector, rather than returning null which would cause a NullPointerException.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java (1)

22-33: 🛠️ Refactor suggestion

Add a listSchemaNames method for completeness.

The interface is well-defined but is missing a method to retrieve schema names, which would make it more complete and consistent with other connector implementations.

public interface ClpMetadataProvider
{
    /**
     * Returns the list of column handles for the given table.
     */
    List<ClpColumnHandle> listColumnHandles(SchemaTableName schemaTableName);

    /**
     * Returns the list of table names in the given schema.
     */
    List<ClpTableHandle> listTableHandles(String schema);
+
+    /**
+     * Returns the list of all schema names.
+     */
+    List<String> listSchemaNames();
}
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (1)

49-56: Appropriate split provider configuration and error handling.

The implementation correctly binds the MySQL split provider based on configuration and throws appropriate exceptions for unsupported types.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java (2)

33-38: ⚠️ Potential issue

Add null-safety & clarify contract of path.

path is a fundamental identifier for the split, yet the constructor allows null to slip through and the field is not annotated as non-null. A single null value propagating into the engine will surface later as an NPE that is much harder to diagnose.

+import static java.util.Objects.requireNonNull;
+
 @JsonCreator
 public ClpSplit(@JsonProperty("path") String path)
 {
-    this.path = path;
+    this.path = requireNonNull(path, "path is null");
 }

If null is semantically meaningful (e.g., "unspecified location"), please document that and keep the field annotated with @Nullable; otherwise, enforce the pre-condition as above.


57-61: 🧹 Nitpick (assertive)

Consider providing more specific info in getInfo().

The getInfo() method returns the instance itself, which might not be the most informative approach if specific split information is needed.

@Override
public Object getInfo()
{
-    return this;
+    return ImmutableMap.of("splitPath", path == null ? "NULL" : path);
}

This would require adding the following import:

import com.google.common.collect.ImmutableMap;
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (5)

50-50: 🛠️ Refactor suggestion

Consider using platform-independent temp directory

The hard-coded path /tmp/split_testdb might not work on all operating systems, particularly Windows.

-        final String metadataDbUrl = "jdbc:h2:file:/tmp/split_testdb;MODE=MySQL;DATABASE_TO_UPPER=FALSE";
+        final String tempDir = System.getProperty("java.io.tmpdir");
+        final String metadataDbUrl = String.format("jdbc:h2:file:%s/split_testdb;MODE=MySQL;DATABASE_TO_UPPER=FALSE", tempDir);

84-84: 🛠️ Refactor suggestion

Use platform-independent path construction

Hard-coded path separators may cause issues on different operating systems.

-                    pstmt.setString(3, "/tmp/archives/" + tableName);
+                    pstmt.setString(3, new File(System.getProperty("java.io.tmpdir"), "archives" + File.separator + tableName).getPath());

118-126: 🛠️ Refactor suggestion

Improve file cleanup handling in tearDown method

The current implementation has several issues:

  1. Uses hard-coded paths that aren't platform-independent
  2. Doesn't handle exceptions during file deletion
  3. File deletion failures are silently ignored
    @AfterMethod
    public void tearDown()
    {
-        File dbFile = new File("/tmp/split_testdb.mv.db");
-        File lockFile = new File("/tmp/split_testdb.trace.db"); // Optional, H2 sometimes creates this
-        if (dbFile.exists()) {
-            dbFile.delete();
-            System.out.println("Deleted database file: " + dbFile.getAbsolutePath());
-        }
-        if (lockFile.exists()) {
-            lockFile.delete();
-        }
+        String tempDir = System.getProperty("java.io.tmpdir");
+        File dbFile = new File(tempDir, "split_testdb.mv.db");
+        File lockFile = new File(tempDir, "split_testdb.trace.db"); // Optional, H2 sometimes creates this
+        try {
+            if (dbFile.exists() && !dbFile.delete()) {
+                System.err.println("Failed to delete database file: " + dbFile.getAbsolutePath());
+            }
+            else if (dbFile.exists()) {
+                System.out.println("Deleted database file: " + dbFile.getAbsolutePath());
+            }
+            
+            if (lockFile.exists() && !lockFile.delete()) {
+                System.err.println("Failed to delete lock file: " + lockFile.getAbsolutePath());
+            }
+        }
+        catch (Exception e) {
+            System.err.println("Error cleaning up database files: " + e.getMessage());
+        }
    }

139-140: 🛠️ Refactor suggestion

Use platform-independent path for split verification

The expected path in the assertion uses hard-coded Unix-style separators.

-                assertEquals(splits.get(i).getPath(), "/tmp/archives/" + tableName + "/id_" + i);
+                String expectedPath = new File(new File(new File(System.getProperty("java.io.tmpdir"), "archives"), tableName), "id_" + i).getPath();
+                assertEquals(splits.get(i).getPath(), expectedPath);

129-143: 🧹 Nitpick (assertive)

Enhance test coverage for ClpSplit properties

The test only verifies the split path format but doesn't check other properties of the splits.

    @Test
    public void testListSplits()
    {
        ClpSplitProvider splitProvider = new ClpMySqlSplitProvider(config);
        for (String tableName : TABLE_NAME_LIST) {
            ClpTableLayoutHandle layoutHandle = new ClpTableLayoutHandle(
                    new ClpTableHandle(new SchemaTableName(TABLE_SCHEMA, tableName), ClpTableHandle.StorageType.FS),
                    Optional.empty());
            List<ClpSplit> splits = splitProvider.listSplits(layoutHandle);
            assertEquals(splits.size(), NUM_SPLITS);
            for (int i = 0; i < NUM_SPLITS; i++) {
                assertEquals(splits.get(i).getPath(), "/tmp/archives/" + tableName + "/id_" + i);
+                // Verify other important split properties
+                assertNotNull(splits.get(i).getConnectorId(), "ConnectorId should not be null");
+                assertNotNull(splits.get(i).getConnectorSplitIdentity(), "SplitIdentity should not be null");
            }
        }
    }
presto-clp/pom.xml (1)

1-128: 🛠️ Refactor suggestion

Add <build> section to presto-clp/pom.xml

The CLP connector POM omits a <build> block. All other connector modules declare build plugins to ensure the presto-plugin packaging is handled correctly.

Please add a <build> section immediately after the <properties> block in presto-clp/pom.xml:

     </properties>
+
+    <build>
+        <plugins>
+            <plugin>
+                <groupId>org.apache.maven.plugins</groupId>
+                <artifactId>maven-surefire-plugin</artifactId>
+                <configuration>
+                    <skipTests>true</skipTests>
+                </configuration>
+            </plugin>
+            <!-- include any additional plugins (enforcer, packaging, etc.) as in other connectors -->
+        </plugins>
+    </build>
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (1)

104-104: 🧹 Nitpick (assertive)

Simplify boolean comparison

Since nullable is a primitive boolean, using Objects.equals() is unnecessary.

-                Objects.equals(this.nullable, other.nullable);
+                this.nullable == other.nullable;
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableHandle.java (2)

72-75: 🧹 Nitpick (assertive)

Use Objects.equals for null-safe field comparison

While fields are likely non-null due to constructor initialization, using Objects.equals provides additional safety and follows best practices.

        ClpTableHandle other = (ClpTableHandle) obj;
-        return this.schemaTableName.equals(other.schemaTableName) &&
-               this.storageType == other.storageType;
+        return Objects.equals(this.schemaTableName, other.schemaTableName) &&
+               this.storageType == other.storageType;

63-71: 🛠️ Refactor suggestion

Follow coding guidelines for boolean expressions in equals method

The equals method uses !expression pattern instead of the preferred false == expression pattern specified in the coding guidelines.

    @Override
    public boolean equals(Object obj)
    {
        if (this == obj) {
            return true;
        }
-        if (obj == null || getClass() != obj.getClass()) {
+        if (false == (obj != null) || false == (getClass() == obj.getClass())) {
            return false;
        }
        ClpTableHandle other = (ClpTableHandle) obj;
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java (1)

84-93: Avoid swallowing exceptions during shutdown

If lifeCycleManager.stop() fails, the connector logs the exception but still returns normally. This can mask resource-leak issues in production and complicate graceful shutdown.

    @Override
    public final void shutdown()
    {
        try {
            lifeCycleManager.stop();
        }
        catch (Exception e) {
            log.error(e, "Error shutting down connector");
+            throw new RuntimeException("Error shutting down connector", e);
        }
    }
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (3)

68-76: Improve database connection security and reliability

The connection method needs several improvements:

  1. Follows coding guideline using false == expression pattern
  2. Uses prepared statement for DB selection
  3. Adds connection timeout
private Connection getConnection() throws SQLException
{
-    Connection connection = DriverManager.getConnection(config.getMetadataDbUrl(), config.getMetadataDbUser(), config.getMetadataDbPassword());
+    Properties connectionProps = new Properties();
+    connectionProps.setProperty("user", config.getMetadataDbUser());
+    connectionProps.setProperty("password", config.getMetadataDbPassword());
+    connectionProps.setProperty("connectTimeout", "10000"); // 10 second timeout
+    
+    Connection connection = DriverManager.getConnection(config.getMetadataDbUrl(), connectionProps);
    String dbName = config.getMetadataDbName();
-    if (dbName != null && !dbName.isEmpty()) {
-        connection.createStatement().execute("USE " + dbName);
+    if (null != dbName && false == dbName.isEmpty()) {
+        try (PreparedStatement statement = connection.prepareStatement("USE ?")) {
+            statement.setString(1, dbName);
+            statement.execute();
+        }
    }
    return connection;
}

85-86: Validate table name to prevent SQL injection

The archivePathQuery still uses string formatting with tableName directly, which could be vulnerable to SQL injection.

String tablePathQuery = String.format(SQL_SELECT_DATASETS_TEMPLATE, config.getMetadataTablePrefix(), tableName);
-String archivePathQuery = String.format(SQL_SELECT_ARCHIVES_TEMPLATE, config.getMetadataTablePrefix(), tableName);
+// Ensure tableName is a valid identifier
+if (false == ClpConfig.SAFE_SQL_IDENTIFIER.matcher(tableName).matches()) {
+    throw new IllegalArgumentException("Invalid table identifier: " + tableName);
+}
+String archiveTable = config.getMetadataTablePrefix() + tableName + ARCHIVE_TABLE_SUFFIX;
+String archivePathQuery = String.format("SELECT %s FROM %s", ARCHIVES_TABLE_COLUMN_ID, archiveTable);

117-119: Correct throwable logging signature

Logger expects the throwable as the first argument when you want proper stack-trace output.

-            log.warn("Database error while processing splits for %s: %s", tableName, e);
+            log.warn(e, "Database error while processing splits for %s", tableName);
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java (3)

60-66: Avoid hard-coded /tmp paths in tests

Using a fixed on-disk file (/tmp/metadata_testdb) makes tests non-hermetic and can fail when run concurrently by multiple build agents or on systems without write access to /tmp. Prefer an in-memory H2 instance or Files.createTempDirectory.

-        final String metadataDbUrl = "jdbc:h2:file:/tmp/metadata_testdb;MODE=MySQL;DATABASE_TO_UPPER=FALSE";
+        final String metadataDbUrl = "jdbc:h2:mem:metadata_testdb;MODE=MySQL;DATABASE_TO_UPPER=FALSE";

136-144: Check return value of File.delete() and avoid System.out

The test silently ignores failed deletions and writes to stdout. Switching to assertion methods will surface issues and keep test output clean.

    @AfterMethod
    public void tearDown()
    {
        File dbFile = new File("/tmp/metadata_testdb.mv.db");
        File lockFile = new File("/tmp/metadata_testdb.trace.db"); // Optional, H2 sometimes creates this
        if (dbFile.exists()) {
-            dbFile.delete();
-            System.out.println("Deleted database file: " + dbFile.getAbsolutePath());
+            assertTrue(dbFile.delete(), "Failed to delete database file: " + dbFile.getAbsolutePath());
        }
        if (lockFile.exists()) {
-            lockFile.delete();
+            assertTrue(lockFile.delete(), "Failed to delete lock file: " + lockFile.getAbsolutePath());
        }
    }

105-105: 🧹 Nitpick (assertive)

Avoid hard-coded paths in test data

The absolute path /tmp/archives/ is used for test data, which could cause issues on systems without write access to /tmp.

-                pstmt.setString(3, "/tmp/archives/" + TABLE_NAME);
+                pstmt.setString(3, "./test-archives/" + TABLE_NAME);
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java (3)

135-146: Column order is still nondeterministic

Iteration over root.children.entrySet() yields a JVM-dependent order, so the resulting list of column handles can vary between workers, jeopardizing metadata consistency.
Please sort the entries or switch to an ordered map.


152-163: Negation style violates the project guideline

if (!polymorphicTypeEnabled) and else if (!existing.type.equals(newType)) should use the prescribed false == <expr> form.

-        if (!polymorphicTypeEnabled) {
+        if (false == polymorphicTypeEnabled) {
...
-        else if (!existing.type.equals(newType)) {
+        else if (false == existing.type.equals(newType)) {

190-196: Nested field order remains nondeterministic

HashMap traversal in buildRowType() means inner RowType field order can fluctuate.
Sort the child entries or use a deterministic map to ensure reproducible type signatures.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java (4)

71-74: Style + NPE guard on database name check

The negation style and null-check order break project conventions and risk an NPE.
Use guideline-compliant ordering:

-        if (dbName != null && !dbName.isEmpty()) {
+        if (null != dbName && false == dbName.isEmpty()) {

86-90: Potential SQL-injection via unvalidated table name

schemaTableName.getTableName() is interpolated into query without first verifying it with isValidIdentifier().
Validate (or quote) the identifier before building the SQL to avoid malicious input.


98-100: Incorrect log-formatting drops stack trace

Passing the throwable as a format argument suppresses the stack trace.
Follow Airlift logging API: log.warn(e, "...").

-            log.warn("Failed to load table schema for %s: %s", schemaTableName.getTableName(), e);
+            log.warn(e, "Failed to load table schema for %s", schemaTableName.getTableName());

120-126: Same log-formatting issue when listing tables

-            log.warn("Failed to load table names: %s", e);
+            log.warn(e, "Failed to load table names");
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java (1)

60-66: Minor test optimisation: reuse SessionHolder

Each test method instantiates its own SessionHolder, increasing setup time.
Move it to a @BeforeMethod or class field to reduce overhead.

presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java (3)

55-56: 🧹 Nitpick (assertive)

Add descriptive Javadoc to clarify purpose and usage
This is a reusable test-base that will be inherited by multiple suites, yet it still lacks any explanatory header. A short Javadoc block improves discoverability for future contributors.


98-101: 🧹 Nitpick (assertive)

Reuse a single static SqlParser instead of instantiating per call
Creating a new SqlParser and ParsingOptions for every invocation of expression() is unnecessarily expensive. Past feedback already highlighted this; consider caching them as private static final fields.

-private static Expression expression(String sql)
-{
-    return ExpressionUtils.rewriteIdentifiersToSymbolReferences(
-            new SqlParser().createExpression(sql,
-                    new ParsingOptions(ParsingOptions.DecimalLiteralTreatment.AS_DECIMAL)));
-}
+private static final SqlParser SQL_PARSER = new SqlParser();
+private static final ParsingOptions PARSING_OPTIONS =
+        new ParsingOptions(ParsingOptions.DecimalLiteralTreatment.AS_DECIMAL);
+
+private static Expression expression(String sql)
+{
+    return ExpressionUtils.rewriteIdentifiersToSymbolReferences(
+            SQL_PARSER.createExpression(sql, PARSING_OPTIONS));
+}

104-110: 🧹 Nitpick (assertive)

toRowExpression repeats new SqlParser() – align with the shared instance
The second instantiation duplicates the overhead warned about above; please switch to the same static parser.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java (5)

69-73: 🧹 Nitpick (assertive)

Add error handling around provider calls
clpMetadataProvider.listColumnHandles may throw and poison the cache, propagating as UncheckedExecutionException. Wrapping in try/catch with meaningful logging will make operational debugging easier.


98-101: 🧹 Nitpick (assertive)

Follow project coding-style: use false == expr instead of !expr
The guideline (false == <expression>) applies to all Java sources in this repo. Please update the condition.

-        if (!listSchemaNames(session).contains(schemaNameValue)) {
+        if (false == listSchemaNames(session).contains(schemaNameValue)) {

111-114: 🧹 Nitpick (assertive)

Style consistency: prefer false == expression
Same style concern as above for the getTableHandle guard.

-        if (!listSchemaNames(session).contains(schemaName)) {
+        if (false == listSchemaNames(session).contains(schemaName)) {

156-158: 🧹 Nitpick (assertive)

Maintain style guideline in prefix validation
Yet another occurrence of the !expr pattern—please update.

-        if (schemaName != null && !listSchemaNames(session).contains(schemaName)) {
+        if (schemaName != null && false == listSchemaNames(session).contains(schemaName)) {

161-167: ⚠️ Potential issue

Possible NullPointerException when schema is omitted
prefix.getSchemaName() can be null; passing it to the SchemaTableName constructor will throw. This was previously reported but remains unresolved.

-            schemaTableNames = ImmutableList.of(new SchemaTableName(prefix.getSchemaName(), prefix.getTableName()));
+            String schema = Optional.ofNullable(prefix.getSchemaName()).orElse(DEFAULT_SCHEMA_NAME);
+            schemaTableNames = ImmutableList.of(new SchemaTableName(schema, prefix.getTableName()));
📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 30cf714 and 5a68d91.

📒 Files selected for processing (17)
  • presto-clp/pom.xml (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpRecordSetProvider.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableHandle.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ` rather than `!`.

**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpRecordSetProvider.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableHandle.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java
🧬 Code Graph Analysis (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (1)
presto-spi/src/main/java/com/facebook/presto/spi/ColumnMetadata.java (1)
  • ColumnMetadata (29-205)
⏰ Context from checks skipped due to timeout of 90000ms (9)
  • GitHub Check: spark-integration (8)
  • GitHub Check: hive-tests (17.0.13)
  • GitHub Check: hive-dockerized-tests (8)
  • GitHub Check: hive-tests (8)
  • GitHub Check: arrowflight-java-tests (17.0.13, :presto-base-arrow-flight)
  • GitHub Check: test (:presto-docs)
  • GitHub Check: prestocpp-linux-build-for-test
  • GitHub Check: maven-checks (17.0.13)
  • GitHub Check: maven-checks (8)
🔇 Additional comments (18)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (1)

40-47: Good conditional binding based on config, with proper error handling.

The implementation correctly binds the MySQL metadata provider based on configuration and throws appropriate exceptions for unsupported types.

presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (2)

57-62: LGTM: ClpConfig setup is correctly configured

The configuration is properly set up with the necessary parameters for metadata database connection.


97-98: 🧹 Nitpick (assertive)

Follow coding guidelines for boolean expressions

According to the coding guidelines, prefer false == <expression> rather than negated expressions.

-                        archiveTableName);
-                stmt.execute(createArchiveTableSQL);
+                        archiveTableName);
+                stmt.execute(createArchiveTableSQL);

Likely an incorrect or invalid review comment.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java (3)

25-49: Implementation of TableLayoutHandle looks good

The class correctly implements ConnectorTableLayoutHandle with appropriate fields for table and KQL query.


61-78: Good implementation of equals, hashCode and toString

The implementation properly includes all fields in equals, hashCode, and toString methods, ensuring consistent behavior.


57-58: 🧹 Nitpick (assertive)

Follow coding guidelines for boolean expressions

According to the coding guidelines, prefer false == <expression> rather than using negated expressions.

-        if (o == null || getClass() != o.getClass()) {
+        if (false == (o != null) || getClass() != o.getClass()) {
            return false;
        }

Likely an incorrect or invalid review comment.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (2)

76-83: Good implementation of getColumnMetadata

The method correctly builds a ColumnMetadata object with the appropriate properties.


97-98: 🧹 Nitpick (assertive)

Follow coding guidelines for boolean expressions

According to the coding guidelines, prefer false == <expression> rather than using negated expressions.

-        if (obj == null || getClass() != obj.getClass()) {
+        if (false == (obj != null) || getClass() != obj.getClass()) {
            return false;
        }

Likely an incorrect or invalid review comment.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableHandle.java (1)

77-84: LGTM! Good implementation of toString with detailed output

The toString implementation properly uses toStringHelper to create a detailed and helpful string representation including all fields.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java (2)

35-52: LGTM! Good dependency injection and null checks

The constructor properly uses dependency injection and applies null checks to all parameters using requireNonNull.


54-58: LGTM! Good use of injected provider

The connector properly returns the injected optimizer provider instance rather than creating a new one.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (3)

48-53: LGTM! SQL templates with parameterized query for datasets

The SQL_SELECT_DATASETS_TEMPLATE correctly uses a parameterized query with ? placeholder for the WHERE clause.


91-100: LGTM! Good use of prepared statement with parameter binding

The code correctly uses a prepared statement with proper parameter binding to safely insert the table name.


102-102: 🧹 Nitpick (assertive)

Follow coding guidelines for boolean expressions

Use false == expression pattern for boolean expressions per coding guidelines.

-            if (tablePath == null || tablePath.isEmpty()) {
+            if (null == tablePath || false == tablePath.isEmpty()) {

Likely an incorrect or invalid review comment.

presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java (3)

147-151: LGTM! Good test for schema names listing

The test correctly verifies that the expected schema names are returned by the metadata provider.


153-159: LGTM! Good test for table listing

The test properly verifies that the expected tables are returned by the metadata provider.


161-204: LGTM! Comprehensive test for table metadata

The test thoroughly verifies complex metadata structures including nested row types and arrays.

presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java (1)

40-55: Negative-assertion gap resolved – looks good

The added else branches now assert the absence of unexpected push-downs and residuals.
This closes the regression hole flagged earlier.

@wraymo wraymo force-pushed the clp_integration_fix branch from 5a68d91 to bf9da2e Compare May 12, 2025 21:46
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

♻️ Duplicate comments (41)
presto-clp/pom.xml (2)

5-19: Add <build> section to presto-clp/pom.xml

The CLP connector POM omits a <build> block. All other connector modules declare build plugins—such as the Maven Surefire and Enforcer plugins—to ensure the presto-plugin packaging is handled correctly.

Please add a <build> section immediately after the <properties> block.


22-25: Consider specifying a version range for mysql-connector-j

The MySQL connector dependency doesn't specify a version constraint, which might lead to unexpected behavior if the version changes.

<dependency>
    <groupId>com.mysql</groupId>
    <artifactId>mysql-connector-j</artifactId>
+   <version>[8.0.0,)</version>
    <scope>runtime</scope>
</dependency>
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java (1)

22-33: Consider adding a listSchemaNames method for completeness

The interface is well-defined, but consider adding a listSchemaNames() method to retrieve all available schema names. This would make the interface more complete and consistent with other connector implementations.

public interface ClpMetadataProvider
{
    /**
     * Returns the list of column handles for the given table.
     */
    List<ClpColumnHandle> listColumnHandles(SchemaTableName schemaTableName);

    /**
     * Returns the list of table names in the given schema.
     */
    List<ClpTableHandle> listTableHandles(String schema);
+
+    /**
+     * Returns the list of all schema names.
+     */
+    List<String> listSchemaNames();
}
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (1)

49-56: Appropriate split provider configuration and error handling

The implementation correctly binds the MySQL split provider based on configuration and throws appropriate exceptions for unsupported types.

For future extensibility, if more provider types are added later, you might consider refactoring these conditionals to a more scalable approach like a provider registry or factory pattern.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java (2)

33-37: ⚠️ Potential issue

Add null-safety to protect against NPEs

The constructor allows null values for the path parameter without validation, which could lead to NullPointerExceptions later in execution.

+import static java.util.Objects.requireNonNull;
+
 @JsonCreator
 public ClpSplit(@JsonProperty("path") String path)
 {
-    this.path = path;
+    this.path = requireNonNull(path, "path is null");
 }

57-61: 🧹 Nitpick (assertive)

Consider providing more specific info in getInfo()

The getInfo() method returns the instance itself, which might not be the most informative approach if specific split information is needed.

+import com.google.common.collect.ImmutableMap;
+
 @Override
 public Object getInfo()
 {
-    return this;
+    return ImmutableMap.of("splitPath", path);
 }
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (5)

50-50: ⚠️ Potential issue

Use platform-independent temp directory

The hard-coded path /tmp/split_testdb might not work on all operating systems, particularly Windows.

-        final String metadataDbUrl = "jdbc:h2:file:/tmp/split_testdb;MODE=MySQL;DATABASE_TO_UPPER=FALSE";
+        final String tempDir = System.getProperty("java.io.tmpdir");
+        final String metadataDbUrl = String.format("jdbc:h2:file:%s/split_testdb;MODE=MySQL;DATABASE_TO_UPPER=FALSE", tempDir);

83-84: 🛠️ Refactor suggestion

Use platform-independent path construction

Hard-coded path separators may cause issues on different operating systems.

-                    pstmt.setString(3, "/tmp/archives/" + tableName);
+                    pstmt.setString(3, new File(System.getProperty("java.io.tmpdir"), "archives" + File.separator + tableName).getPath());

118-126: 🛠️ Refactor suggestion

Improve file cleanup handling in tearDown method

The current implementation has several issues:

  1. Uses hard-coded paths that aren't platform-independent
  2. Doesn't handle exceptions during file deletion
  3. File deletion failures are silently ignored
    @AfterMethod
    public void tearDown()
    {
-        File dbFile = new File("/tmp/split_testdb.mv.db");
-        File lockFile = new File("/tmp/split_testdb.trace.db"); // Optional, H2 sometimes creates this
-        if (dbFile.exists()) {
-            dbFile.delete();
-            System.out.println("Deleted database file: " + dbFile.getAbsolutePath());
-        }
-        if (lockFile.exists()) {
-            lockFile.delete();
-        }
+        String tempDir = System.getProperty("java.io.tmpdir");
+        File dbFile = new File(tempDir, "split_testdb.mv.db");
+        File lockFile = new File(tempDir, "split_testdb.trace.db"); // Optional, H2 sometimes creates this
+        try {
+            if (dbFile.exists() && !dbFile.delete()) {
+                System.err.println("Failed to delete database file: " + dbFile.getAbsolutePath());
+            }
+            else if (dbFile.exists()) {
+                System.out.println("Deleted database file: " + dbFile.getAbsolutePath());
+            }
+            
+            if (lockFile.exists() && !lockFile.delete()) {
+                System.err.println("Failed to delete lock file: " + lockFile.getAbsolutePath());
+            }
+        }
+        catch (Exception e) {
+            System.err.println("Error cleaning up database files: " + e.getMessage());
+        }
    }

139-140: 🛠️ Refactor suggestion

Use platform-independent path for split verification

The expected path in the assertion uses hard-coded Unix-style separators.

-                assertEquals(splits.get(i).getPath(), "/tmp/archives/" + tableName + "/id_" + i);
+                String expectedPath = new File(new File(new File(System.getProperty("java.io.tmpdir"), "archives"), tableName), "id_" + i).getPath();
+                assertEquals(splits.get(i).getPath(), expectedPath);

129-143: 🧹 Nitpick (assertive)

Enhance test coverage for ClpSplit properties

The test only verifies the split path format but doesn't check other properties of the splits.

    @Test
    public void testListSplits()
    {
        ClpSplitProvider splitProvider = new ClpMySqlSplitProvider(config);
        for (String tableName : TABLE_NAME_LIST) {
            ClpTableLayoutHandle layoutHandle = new ClpTableLayoutHandle(
                    new ClpTableHandle(new SchemaTableName(TABLE_SCHEMA, tableName), ClpTableHandle.StorageType.FS),
                    Optional.empty());
            List<ClpSplit> splits = splitProvider.listSplits(layoutHandle);
            assertEquals(splits.size(), NUM_SPLITS);
            for (int i = 0; i < NUM_SPLITS; i++) {
                assertEquals(splits.get(i).getPath(), "/tmp/archives/" + tableName + "/id_" + i);
+                // Verify other important split properties
+                assertEquals(splits.get(i).getNodeSelectionStrategy(), NodeSelectionStrategy.NO_PREFERENCE, "Node selection strategy should be NO_PREFERENCE");
+                assertTrue(splits.get(i).getPreferredNodes(null).isEmpty(), "Preferred nodes list should be empty");
+                assertNotNull(splits.get(i).getInfo(), "Split info should not be null");
            }
        }
    }
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (1)

104-104: 🧹 Nitpick (assertive)

Simplify boolean comparison

Since nullable is a primitive boolean, using Objects.equals() is unnecessary.

-                Objects.equals(this.nullable, other.nullable);
+                this.nullable == other.nullable;
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableHandle.java (1)

63-75: Follow project coding guidelines for conditional expressions.

The equals method uses the pattern !expression instead of the preferred false == expression style mentioned in the coding guidelines. Additionally, consider using Objects.equals() for null-safe comparison of schemaTableName.

@Override
public boolean equals(Object obj)
{
    if (this == obj) {
        return true;
    }
-    if (obj == null || getClass() != obj.getClass()) {
+    if (null == obj || false == (getClass() == obj.getClass())) {
        return false;
    }
    ClpTableHandle other = (ClpTableHandle) obj;
-    return this.schemaTableName.equals(other.schemaTableName) &&
+    return Objects.equals(this.schemaTableName, other.schemaTableName) &&
           this.storageType == other.storageType;
}
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java (1)

85-93: Swallowing exceptions during shutdown may hide problems

If lifeCycleManager.stop() fails, the connector logs the exception but still returns normally. That can mask resource-leak issues in production and complicates graceful shutdown on coordinators/workers. Consider re-throwing as a RuntimeException (Presto catches those and surfaces them) after logging, or at least returning a failed future if the lifecycle is managed asynchronously.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (3)

70-76: Improve connection handling for security and resiliency

Several improvements should be made to the database connection handling:

  1. No connection timeout is specified, which could lead to hanging if the database is unreachable
  2. The "USE dbName" statement is executed directly without using a prepared statement
  3. The method doesn't follow the coding guideline - it uses !dbName.isEmpty() rather than false == dbName.isEmpty()
private Connection getConnection() throws SQLException
{
-    Connection connection = DriverManager.getConnection(config.getMetadataDbUrl(), config.getMetadataDbUser(), config.getMetadataDbPassword());
+    // Set connection properties including timeout
+    Properties connectionProps = new Properties();
+    connectionProps.setProperty("user", config.getMetadataDbUser());
+    connectionProps.setProperty("password", config.getMetadataDbPassword());
+    connectionProps.setProperty("connectTimeout", "10000"); // 10 second timeout
+    
+    Connection connection = DriverManager.getConnection(config.getMetadataDbUrl(), connectionProps);
    String dbName = config.getMetadataDbName();
-    if (dbName != null && !dbName.isEmpty()) {
-        connection.createStatement().execute("USE " + dbName);
+    if (null != dbName && false == dbName.isEmpty()) {
+        try (PreparedStatement statement = connection.prepareStatement("USE ?")) {
+            statement.setString(1, dbName);
+            statement.execute();
+        }
    }
    return connection;
}

85-86: Prevent SQL injection in archive query construction

While the datasets table query properly uses a parameterized query (line 92), the archive query still embeds the table name directly via string formatting. This creates a SQL injection risk.

- String archivePathQuery = String.format(SQL_SELECT_ARCHIVES_TEMPLATE, config.getMetadataTablePrefix(), tableName);
+ // Ensure tableName is a valid identifier
+ if (false == tableName.matches("[A-Za-z0-9_]+")) {
+     throw new IllegalArgumentException("Invalid table name: " + tableName);
+ }
+ String archiveTable = config.getMetadataTablePrefix() + tableName + ARCHIVE_TABLE_SUFFIX;
+ String archivePathQuery = "SELECT " + ARCHIVES_TABLE_COLUMN_ID + " FROM " + archiveTable;

118-119: Incorrect throwable logging signature

Logger expects the throwable as the first argument when you want stack-trace output:

-            log.warn("Database error while processing splits for %s: %s", tableName, e);
+            log.warn(e, "Database error while processing splits for %s", tableName);

Without this change the stack trace is lost and %s is filled with toString() of the exception.

presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java (2)

60-66: Avoid hard-coded /tmp paths in tests

Using a fixed on-disk location (/tmp/metadata_testdb and related directories) makes the test non-hermetic, can break when two builds run concurrently, and fails on systems where /tmp is not writable (e.g., Windows agents).
Consider switching to an in-memory H2 database (jdbc:h2:mem:) or a randomly generated temporary directory created via Files.createTempDirectory().


136-144: Check the return value of File.delete() and remove System.out usage

File.delete() returns false when the deletion fails, but the result is ignored, potentially masking real problems.
Also, emitting to stdout from unit tests clutters CI logs; prefer logging (Airlift Logger) or assertions.

-        if (dbFile.exists()) {
-            dbFile.delete();
-            System.out.println("Deleted database file: " + dbFile.getAbsolutePath());
-        }
+        if (dbFile.exists()) {
+            assertTrue(dbFile.delete(), "Failed to delete test database file: " + dbFile.getAbsolutePath());
+        }
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java (4)

136-146: Column order is nondeterministic

HashMap iteration order varies between JVM runs, leading to unstable metadata and potential worker mismatches.
Use a sorted map or sort the entries before adding to columns.


152-154: Negation style violates project guideline

The guideline says “prefer false == <expr> rather than !<expr>”.
if (!polymorphicTypeEnabled) should be rewritten to comply.


162-165: Negation style & equals usage

Same guideline issue: replace !existing.type.equals(newType) with false == existing.type.equals(newType).


191-196: Nested field order is nondeterministic

buildRowType() iterates over HashMap, giving inconsistent RowType signatures across nodes.
Sort the child entries or use an ordered map for reproducibility.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java (4)

70-74: Guideline-compliant negation & null check

if (dbName != null && !dbName.isEmpty()) violates the house style and calls isEmpty() when dbName may be null.

-        if (dbName != null && !dbName.isEmpty()) {
+        if (null != dbName && false == dbName.isEmpty()) {

86-90: Validate table identifier before building query

schemaTableName.getTableName() is interpolated into the SQL without validation, opening a path for SQL-injection-style exploits from a compromised metadata store.
Call isValidIdentifier() and reject/escape unsafe names.


99-101: Fix log formatting to preserve stack trace

Treat the throwable as the first argument to Airlift’s logger, not part of the format string:

-            log.warn("Failed to load table schema for %s: %s", schemaTableName.getTableName(), e);
+            log.warn(e, "Failed to load table schema for %s", schemaTableName.getTableName());

125-126: Same logging issue when listing tables

Pass the exception as the first parameter:

-            log.warn("Failed to load table names: %s", e);
+            log.warn(e, "Failed to load table names");
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java (2)

57-66: 🧹 Nitpick (assertive)

Consider reusing SessionHolder to improve test performance

Each test method creates its own instance of SessionHolder. Creating a single instance as a class field or in a @BeforeMethod would reduce object creation overhead and improve test performance.

+    private SessionHolder sessionHolder;
+
+    @BeforeMethod
+    public void setUp()
+    {
+        sessionHolder = new SessionHolder();
+    }
+
     @Test
     public void testStringMatchPushdown()
     {
-        SessionHolder sessionHolder = new SessionHolder();

128-132: 🧹 Nitpick (assertive)

Consider separating this test case for clarity

This test case verifies different behavior (when parts of an OR expression can't be pushed down) from other test cases in the method. Moving it to a separate test method would improve readability and test isolation.

+    @Test
+    public void testOrPushdownWithUnpushableExpression()
+    {
+        SessionHolder sessionHolder = new SessionHolder();
+
+        testFilter("fare > 0 OR city.Name like 'b%' OR lower(city.Region.Name) = 'hello world' OR city.Region.Id != 1",
+                Optional.empty(),
+                Optional.of("fare > 0 OR city.Name like 'b%' OR lower(city.Region.Name) = 'hello world' OR city.Region.Id != 1"),
+                sessionHolder);
+    }
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (2)

34-43: 🛠️ Refactor suggestion

Validate required DB connection properties to fail fast

When the metadata provider is MYSQL, properties like metadataDbUrl, metadataDbName, metadataDbUser, and metadataDbPassword are essential but can currently remain null, leading to hard-to-trace NPEs deep in the call stack.

Add validation to ensure mandatory fields are supplied when using MySQL:

+import static com.facebook.presto.spi.StandardErrorCode.GENERIC_INTERNAL_ERROR;
+import static java.util.Objects.requireNonNull;
+
+import javax.annotation.PostConstruct;

public class ClpConfig
{
+    @PostConstruct
+    public void validate()
+    {
+        if (MetadataProviderType.MYSQL.equals(metadataProviderType)) {
+            requireNonNull(metadataDbUrl, "clp.metadata-db-url must be set when using MYSQL metadata provider");
+            requireNonNull(metadataDbName, "clp.metadata-db-name must be set when using MYSQL metadata provider");
+            requireNonNull(metadataDbUser, "clp.metadata-db-user must be set when using MYSQL metadata provider");
+            requireNonNull(metadataDbPassword, "clp.metadata-db-password must be set when using MYSQL metadata provider");
+        }
+    }

123-132: 🧹 Nitpick (assertive)

Add toLowerCase() for consistent table prefix naming

Users often mix cases in table-prefixes. Converting the validated value to lowercase would prevent later mismatches with case-sensitive backends.

+import java.util.Locale;
+
 @Config("clp.metadata-table-prefix")
 public ClpConfig setMetadataTablePrefix(String metadataTablePrefix)
 {
     if (metadataTablePrefix == null || !SAFE_SQL_IDENTIFIER.matcher(metadataTablePrefix).matches()) {
         throw new PrestoException(ClpErrorCode.CLP_UNSUPPORTED_CONFIG_OPTION, "Invalid metadataTablePrefix: " +
                 metadataTablePrefix + ". Only alphanumeric characters and underscores are allowed.");
     }

-    this.metadataTablePrefix = metadataTablePrefix;
+    this.metadataTablePrefix = metadataTablePrefix.toLowerCase(Locale.ROOT);
     return this;
 }
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java (4)

55-56: 🧹 Nitpick (assertive)

Add class-level documentation

This base class provides important testing infrastructure, but lacks documentation explaining its purpose and usage.

+/**
+ * Base class for CLP query tests providing common functionality for expression parsing,
+ * type resolution, and session management. It establishes the test environment for
+ * validating CLP-specific query components and expression translations.
+ */
 public class TestClpQueryBase
 {

61-67: 🧹 Nitpick (assertive)

Make 'city' immutable to prevent accidental state mutation

All other column handle constants are declared final, but city is not. Making it immutable would prevent accidental state mutation between test runs.

-    protected static final ClpColumnHandle city = new ClpColumnHandle("city", RowType.from(ImmutableList.of(
+    protected static final ClpColumnHandle city = new ClpColumnHandle("city", RowType.from(ImmutableList.of(

98-101: 🧹 Nitpick (assertive)

Reuse a single SqlParser instance to reduce object churn

Creating a new SqlParser for every method call is inefficient. These objects are heavyweight; using a single static parser instance would improve test execution performance.

+    private static final SqlParser SQL_PARSER = new SqlParser();
+    private static final ParsingOptions PARSING_OPTIONS = 
+            new ParsingOptions(ParsingOptions.DecimalLiteralTreatment.AS_DECIMAL);

     public static Expression expression(String sql)
     {
-        return ExpressionUtils.rewriteIdentifiersToSymbolReferences(new SqlParser().createExpression(sql, new ParsingOptions(ParsingOptions.DecimalLiteralTreatment.AS_DECIMAL)));
+        return ExpressionUtils.rewriteIdentifiersToSymbolReferences(SQL_PARSER.createExpression(sql, PARSING_OPTIONS));
     }

108-108: 🧹 Nitpick (assertive)

Use shared SqlParser instance for consistency

This method also creates a new SqlParser instance. For consistency and efficiency, use the same shared SqlParser instance as recommended for the expression() method.

-                new SqlParser(),
+                SQL_PARSER,
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java (6)

57-65: 🛠️ Refactor suggestion

Consider bounding cache size to avoid unbounded memory growth

Both caches are time-bounded but not size-bounded. In long-running clusters with large schema counts, this can exhaust heap memory.

Add size limits to the caches:

+import static com.facebook.airlift.configuration.ConfigDefaults.getMaxCacheSize;

 @Inject
 public ClpMetadata(ClpConfig clpConfig, ClpMetadataProvider clpMetadataProvider)
 {
     this.columnHandleCache = CacheBuilder.newBuilder()
+            .maximumSize(getMaxCacheSize(10000))
             .expireAfterWrite(clpConfig.getMetadataExpireInterval(), SECONDS)
             .refreshAfterWrite(clpConfig.getMetadataRefreshInterval(), SECONDS)
             .build(CacheLoader.from(this::loadColumnHandles));
     this.tableHandleCache = CacheBuilder.newBuilder()
+            .maximumSize(getMaxCacheSize(1000))
             .expireAfterWrite(clpConfig.getMetadataExpireInterval(), SECONDS)
             .refreshAfterWrite(clpConfig.getMetadataRefreshInterval(), SECONDS)
             .build(CacheLoader.from(this::loadTableHandles));

99-101: 🧹 Nitpick (assertive)

Use preferred coding style: false == expr rather than !expr

According to the coding guidelines, expressions like !listSchemaNames(...).contains(...) should be written as false == listSchemaNames(...).contains(...).

-    if (!listSchemaNames(session).contains(schemaNameValue)) {
+    if (false == listSchemaNames(session).contains(schemaNameValue)) {
         return ImmutableList.of();
     }

112-114: 🧹 Nitpick (assertive)

Use preferred coding style: false == expr rather than !expr

According to the coding guidelines, expressions like !listSchemaNames(...).contains(...) should be written as false == listSchemaNames(...).contains(...).

-    if (!listSchemaNames(session).contains(schemaName)) {
+    if (false == listSchemaNames(session).contains(schemaName)) {
         return null;
     }

157-159: 🧹 Nitpick (assertive)

Use preferred coding style: false == expr rather than !expr

According to the coding guidelines, expressions like !listSchemaNames(...).contains(...) should be written as false == listSchemaNames(...).contains(...).

-    if (schemaName != null && !listSchemaNames(session).contains(schemaName)) {
+    if (schemaName != null && false == listSchemaNames(session).contains(schemaName)) {
         return ImmutableMap.of();
     }

165-167: ⚠️ Potential issue

Prevent potential NPE when prefix contains only table name

When prefix.getSchemaName() is null, creating a new SchemaTableName will throw an exception because its constructor rejects null components.

     else {
-        schemaTableNames = ImmutableList.of(new SchemaTableName(prefix.getSchemaName(), prefix.getTableName()));
+        String schema = Optional.ofNullable(prefix.getSchemaName()).orElse(DEFAULT_SCHEMA_NAME);
+        schemaTableNames = ImmutableList.of(new SchemaTableName(schema, prefix.getTableName()));
     }

69-72: 🛠️ Refactor suggestion

Add error handling for cache loading failures

The cache loading methods don't contain any error handling. Without proper error handling, exceptions during metadata loading might be difficult to troubleshoot.

+import com.facebook.presto.spi.PrestoException;
+import static com.facebook.presto.plugin.clp.ClpErrorCode.CLP_METADATA_ERROR;

 private List<ClpColumnHandle> loadColumnHandles(SchemaTableName schemaTableName)
 {
-    return clpMetadataProvider.listColumnHandles(schemaTableName);
+    try {
+        return clpMetadataProvider.listColumnHandles(schemaTableName);
+    }
+    catch (Exception e) {
+        throw new PrestoException(CLP_METADATA_ERROR, 
+                "Failed to load column metadata for " + schemaTableName, e);
+    }
 }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5a68d91 and bf9da2e.

📒 Files selected for processing (19)
  • presto-clp/pom.xml (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpRecordSetProvider.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableHandle.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (1 hunks)
  • presto-docs/src/main/sphinx/connector/clp.rst (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ` rather than `!`.

**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpRecordSetProvider.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableHandle.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java
🧬 Code Graph Analysis (3)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (1)
  • ClpColumnHandle (26-117)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableHandle.java (1)
  • ClpTableHandle (25-85)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (1)
presto-spi/src/main/java/com/facebook/presto/spi/ColumnMetadata.java (1)
  • ColumnMetadata (29-205)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (3)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java (1)
  • ClpMySqlMetadataProvider (33-129)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (1)
  • ClpMySqlSplitProvider (33-123)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1)
  • ClpConfig (21-170)
⏰ Context from checks skipped due to timeout of 90000ms (17)
  • GitHub Check: product-tests-basic-environment (17.0.13)
  • GitHub Check: singlestore-dockerized-tests (8)
  • GitHub Check: spark-integration (17.0.13)
  • GitHub Check: test (8, :presto-cassandra -P test-cassandra-integration-smoke-test)
  • GitHub Check: test-other-modules (17.0.13)
  • GitHub Check: kudu (8)
  • GitHub Check: kudu (17.0.13)
  • GitHub Check: product-tests-specific-environment1 (8)
  • GitHub Check: product-tests-specific-environment2 (17.0.13)
  • GitHub Check: product-tests-specific-environment2 (8)
  • GitHub Check: product-tests-specific-environment1 (17.0.13)
  • GitHub Check: maven-checks (8)
  • GitHub Check: maven-checks (17.0.13)
  • GitHub Check: test (:presto-docs)
  • GitHub Check: arrowflight-java-tests (17.0.13, :presto-base-arrow-flight)
  • GitHub Check: arrowflight-java-tests (8, :presto-base-arrow-flight)
  • GitHub Check: prestocpp-linux-build-for-test
🔇 Additional comments (4)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpRecordSetProvider.java (1)

28-35: Implementation correctly throws UnsupportedOperationException

The implementation properly throws an exception rather than returning null, which is a good practice for unsupported operations. This ensures callers receive a clear error message instead of encountering a NullPointerException.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (1)

40-47: Appropriate error handling for unsupported metadata providers

The implementation correctly checks the metadata provider type and throws an appropriate exception with a descriptive message when an unsupported type is configured.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (1)

97-98: 🧹 Nitpick (assertive)

Follow project coding guidelines for boolean expressions

According to the coding guidelines, prefer false == <expression> rather than !<expression> for boolean expressions.

-        if (obj == null || getClass() != obj.getClass()) {
+        if (false == (obj != null) || getClass() != obj.getClass()) {
             return false;
         }

Likely an incorrect or invalid review comment.

presto-docs/src/main/sphinx/connector/clp.rst (1)

1-179: Documentation looks comprehensive and clear

The documentation provides a clear explanation of the CLP connector, its configuration options, data type mappings, and usage limitations. It aligns well with the implementation code and provides users with the necessary information to set up and use the connector effectively.

@wraymo wraymo force-pushed the clp_integration_fix branch from bf9da2e to 1b8d043 Compare May 14, 2025 13:50
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

🔭 Outside diff range comments (1)
presto-docs/src/main/sphinx/connector/clp.rst (1)

177-179: 🛠️ Refactor suggestion

Missing information about query predicate pushdown capabilities.

The documentation explains that the connector only provides read access, but there's no mention of predicate/filter pushdown capabilities - an important feature for query performance that appears to be implemented in this connector based on the AI summary about ClpPlanOptimizer and ClpFilterToKqlConverter.

Add a section that documents the connector's predicate pushdown capabilities, which types of filters can be pushed down to the underlying storage, and any limitations of these capabilities.

♻️ Duplicate comments (40)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java (1)

22-33: Missing listSchemaNames() keeps the interface from parity with other Presto connectors
Re-raising prior observation.

Many Presto connectors expose a listSchemaNames() helper, which simplifies metadata enumeration and avoids unnecessary caller-side filtering. Consider adding it for consistency.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java (4)

33-37: Add null-safety for the path parameter

The path parameter is a fundamental identifier for the split, yet the constructor allows null values which could lead to NullPointerExceptions later.

+import static java.util.Objects.requireNonNull;
+
 @JsonCreator
 public ClpSplit(@JsonProperty("path") String path)
 {
-    this.path = path;
+    this.path = requireNonNull(path, "path is null");
 }

39-43: Document the returned path field

The path getter method lacks documentation explaining what this field represents.

 @JsonProperty
+/**
+ * Returns the path to the split's data location.
+ * This path is used by the record set provider to read data from the corresponding location.
+ */
 public String getPath()
 {
     return path;
 }

57-61: Consider providing more specific information in getInfo()

The current implementation returns the split instance itself, which might not be the most informative approach if specific split information is needed.

-    public Object getInfo()
-    {
-        return this;
-    }
+    public Object getInfo()
+    {
+        return ImmutableMap.of("splitPath", path);
+    }

This would require adding the following import:

import com.google.common.collect.ImmutableMap;

31-31: 🧹 Nitpick (assertive)

Document the meaning of the path field

Add documentation explaining what this field represents in the context of a CLP split.

-    private final String path;
+    /**
+     * Path to the data location for this split.
+     * This is used by the record set provider to read data from the corresponding location.
+     */
+    private final String path;
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (4)

50-50: Use platform-independent temp directory

The hard-coded path /tmp/split_testdb might not work on all operating systems, particularly Windows.

-        final String metadataDbUrl = "jdbc:h2:file:/tmp/split_testdb;MODE=MySQL;DATABASE_TO_UPPER=FALSE";
+        final String tempDir = System.getProperty("java.io.tmpdir");
+        final String metadataDbUrl = String.format("jdbc:h2:file:%s/split_testdb;MODE=MySQL;DATABASE_TO_UPPER=FALSE", tempDir);

83-84: Use platform-independent path construction

Hard-coded path separators may cause issues on different operating systems.

-                    pstmt.setString(3, "/tmp/archives/" + tableName);
+                    pstmt.setString(3, new File(System.getProperty("java.io.tmpdir"), "archives" + File.separator + tableName).getPath());

118-126: Improve file cleanup handling in tearDown method

The current implementation has several issues:

  1. Uses hard-coded paths that aren't platform-independent
  2. Doesn't handle exceptions during file deletion
  3. File deletion failures are silently ignored
-        File dbFile = new File("/tmp/split_testdb.mv.db");
-        File lockFile = new File("/tmp/split_testdb.trace.db"); // Optional, H2 sometimes creates this
-        if (dbFile.exists()) {
-            dbFile.delete();
-            System.out.println("Deleted database file: " + dbFile.getAbsolutePath());
-        }
-        if (lockFile.exists()) {
-            lockFile.delete();
-        }
+        String tempDir = System.getProperty("java.io.tmpdir");
+        File dbFile = new File(tempDir, "split_testdb.mv.db");
+        File lockFile = new File(tempDir, "split_testdb.trace.db"); // Optional, H2 sometimes creates this
+        try {
+            if (dbFile.exists() && !dbFile.delete()) {
+                System.err.println("Failed to delete database file: " + dbFile.getAbsolutePath());
+            }
+            else if (dbFile.exists()) {
+                System.out.println("Deleted database file: " + dbFile.getAbsolutePath());
+            }
+            
+            if (lockFile.exists() && !lockFile.delete()) {
+                System.err.println("Failed to delete lock file: " + lockFile.getAbsolutePath());
+            }
+        }
+        catch (Exception e) {
+            System.err.println("Error cleaning up database files: " + e.getMessage());
+        }

142-142: Use platform-independent path for split verification

The expected path in the assertion uses hard-coded Unix-style separators.

-                assertEquals(splits.get(i).getPath(), "/tmp/archives/" + tableName + "/id_" + i);
+                String expectedPath = new File(new File(new File(System.getProperty("java.io.tmpdir"), "archives"), tableName), "id_" + i).getPath();
+                assertEquals(splits.get(i).getPath(), expectedPath);
presto-clp/pom.xml (2)

5-19: Add <build> section to pom.xml

The CLP connector POM omits a <build> block. All other connector modules declare build plugins to ensure the presto-plugin packaging is handled correctly.

Add a <build> section immediately after the <properties> block:

     </properties>
+
+    <build>
+        <plugins>
+            <plugin>
+                <groupId>org.apache.maven.plugins</groupId>
+                <artifactId>maven-surefire-plugin</artifactId>
+                <configuration>
+                    <skipTests>${air.check.skip-tests}</skipTests>
+                </configuration>
+            </plugin>
+        </plugins>
+    </build>

22-25: Specify a version constraint for mysql-connector-j

The MySQL connector dependency doesn't specify a version constraint, which might lead to unexpected behavior if the version changes.

        <dependency>
            <groupId>com.mysql</groupId>
            <artifactId>mysql-connector-j</artifactId>
+            <version>[8.0.0,)</version>
            <scope>runtime</scope>
        </dependency>
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java (1)

57-58: Follow project coding guidelines for boolean expressions

According to the coding guidelines, prefer false == <expression> rather than !<expression> for boolean expressions.

-        if (o == null || getClass() != o.getClass()) {
+        if (false == (o != null) || getClass() != o.getClass()) {
             return false;
         }
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (1)

104-104: Simplify boolean comparison.

Since nullable is a primitive boolean, using Objects.equals() is unnecessary.

-                Objects.equals(this.nullable, other.nullable);
+                this.nullable == other.nullable;
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (1)

92-93: Incorrect throwable logging signature.

Logger expects the throwable as the first argument when you want stack-trace output:

-            log.warn("Database error while processing splits for %s: %s", tableName, e);
+            log.warn(e, "Database error while processing splits for %s", tableName);
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java (1)

85-93: Swallowing exceptions during shutdown may hide problems.

If lifeCycleManager.stop() fails, the connector logs the exception but still returns normally. That can mask resource-leak issues in production and complicates graceful shutdown on coordinators/workers.

presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java (3)

60-66: Hard-coded /tmp path re-introduces non-hermetic test failures

The H2 database is still persisted under /tmp/metadata_testdb. This is fragile when tests run concurrently, on read-only filesystems, or inside containers that remap /tmp. Please switch to an in-memory URL (jdbc:h2:mem:) or create a truly private temp directory with Files.createTempDirectory(...).


109-118: Remove heavy Pair dependency for trivial tuples

Pulling in commons-math3 solely for Pair bloats the test class-path (>1 MB) and slows the build. A tiny record Pair(String name, ClpNodeType type) {} or Map.Entry is enough.


136-141: Check deletion result & avoid System.out in tests

File.delete() can silently fail, and System.out.println pollutes test logs. Replace with an assertion and a logger, e.g.:

-        if (dbFile.exists()) {
-            dbFile.delete();
-            System.out.println("Deleted database file: " + dbFile.getAbsolutePath());
-        }
+        if (dbFile.exists()) {
+            assertTrue(dbFile.delete(), "Failed to delete " + dbFile);
+        }
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableHandle.java (1)

78-84: equals() still ignores null-safety advice

The earlier recommendation to use Objects.equals for the nullable schemaTableName and to remove the raw null check hasn’t been applied. This keeps the method brittle and inconsistent with the rest of the codebase.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java (3)

135-146: 🛠️ Refactor suggestion

Column order remains nondeterministic – may break workers

Iterating over HashMap yields random order, so two workers can disagree on the column list, producing “cannot resolve symbol” errors at runtime. Sort the entries (or use LinkedHashMap) before emitting handles.

-        for (Map.Entry<String, ClpNode> entry : root.children.entrySet()) {
+        root.children.entrySet().stream()
+                .sorted(Map.Entry.comparingByKey())
+                .forEach(entry -> {
                     String name = entry.getKey();
                     ClpNode child = entry.getValue();
                     if (child.isLeaf()) {
                         columns.add(new ClpColumnHandle(name, child.originalName, child.type, true));
                     }
                     else {
                         Type rowType = buildRowType(child);
                         columns.add(new ClpColumnHandle(name, child.originalName, rowType, true));
                     }
-        }
+                });

190-196: 🛠️ Refactor suggestion

Deterministic field order for nested RowType is still missing

HashMap iteration leaks randomness into nested types as well. Apply the same sorting strategy when building RowType to guarantee stable signatures across the cluster.


152-163: 🧹 Nitpick (assertive)

Style: use guideline-compliant negation & null-safe comparison

The project mandates false == <expr> over !<expr>, and Objects.equals for null safety. Lines 152 and 162 still violate this rule.

-        if (!polymorphicTypeEnabled) {
+        if (false == polymorphicTypeEnabled) {

-            else if (!existing.type.equals(newType)) {
+            else if (false == existing.type.equals(newType)) {
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java (2)

60-61: 🧹 Nitpick (assertive)

Consider reusing SessionHolder instance to improve performance

Each test method creates a separate SessionHolder instance. Creating a single instance as a field or in a @BeforeMethod would improve test efficiency.

+    private SessionHolder sessionHolder;
+
+    @BeforeMethod
+    public void setup()
+    {
+        sessionHolder = new SessionHolder();
+    }
+
     @Test
     public void testStringMatchPushdown()
     {
-        SessionHolder sessionHolder = new SessionHolder();

128-132: 🧹 Nitpick (assertive)

Consider separating this OR test case to a dedicated test method

This test case verifies different behavior (OR expressions with unpushable components) than other tests in the method. It would be clearer as a separate test method.

+    @Test
+    public void testOrPushdownWithUnpushableExpression()
+    {
+        SessionHolder sessionHolder = new SessionHolder();
+
+        testFilter("fare > 0 OR city.Name like 'b%' OR lower(city.Region.Name) = 'hello world' OR city.Region.Id != 1",
+                Optional.empty(),
+                Optional.of("fare > 0 OR city.Name like 'b%' OR lower(city.Region.Name) = 'hello world' OR city.Region.Id != 1"),
+                sessionHolder);
+    }
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java (4)

102-103: 🛠️ Refactor suggestion

Fix logging to properly include exception information

The current logging treats the exception as a format parameter, losing the stack trace information.

-            log.warn("Failed to load table schema for %s: %s", schemaTableName.getTableName(), e);
+            log.warn(e, "Failed to load table schema for %s", schemaTableName.getTableName());

130-131: 🛠️ Refactor suggestion

Fix logging to properly include exception information

The current logging treats the exception as a format parameter, losing the stack trace information.

-            log.warn("Failed to load table names: %s", e);
+            log.warn(e, "Failed to load table names");

75-77: 🛠️ Refactor suggestion

Fix negation style to conform to coding guidelines

The code uses !dbName.isEmpty() which violates the coding guideline that prefers false == <expression> rather than !<expression>.

-        if (dbName != null && !dbName.isEmpty()) {
+        if (null != dbName && false == dbName.isEmpty()) {
             connection.createStatement().execute(String.format("USE `%s`", dbName));
         }

89-91: ⚠️ Potential issue

Missing table identifier validation - SQL injection risk

The code directly interpolates schemaTableName.getTableName() into the SQL query without validation, creating a potential SQL injection vulnerability.

+        String tableName = schemaTableName.getTableName();
+        if (false == isValidIdentifier(tableName)) {
+            throw new IllegalArgumentException("Invalid table identifier: " + tableName);
+        }
         String query = String.format(SQL_SELECT_COLUMN_METADATA_TEMPLATE,
-                config.getMetadataTablePrefix(), schemaTableName.getTableName());
+                config.getMetadataTablePrefix(), tableName);
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (2)

126-132: 🧹 Nitpick (assertive)

Add case normalization for consistent table prefix handling

The metadata table prefix validation doesn't normalize case, which could lead to inconsistencies when working with case-sensitive database backends.

+import java.util.Locale;

...

         if (metadataTablePrefix == null || !SAFE_SQL_IDENTIFIER.matcher(metadataTablePrefix).matches()) {
             throw new PrestoException(ClpErrorCode.CLP_UNSUPPORTED_CONFIG_OPTION, "Invalid metadataTablePrefix: " +
                     metadataTablePrefix + ". Only alphanumeric characters and underscores are allowed.");
         }
 
-        this.metadataTablePrefix = metadataTablePrefix;
+        this.metadataTablePrefix = metadataTablePrefix.toLowerCase(Locale.ROOT);

33-42: ⚠️ Potential issue

Add validation for required DB connection properties

When using the MySQL metadata provider, several properties are required but not validated, which could lead to runtime failures.

import com.facebook.airlift.configuration.Config;
import com.facebook.presto.spi.PrestoException;

+import javax.annotation.PostConstruct;
+import static com.facebook.presto.spi.StandardErrorCode.CONFIGURATION_INVALID;
+import static java.util.Objects.requireNonNull;

...

+    @PostConstruct
+    public void validate()
+    {
+        if (MetadataProviderType.MYSQL.equals(metadataProviderType)) {
+            if (null == metadataDbUrl) {
+                throw new PrestoException(CONFIGURATION_INVALID, "clp.metadata-db-url must be set when using MYSQL provider");
+            }
+            if (null == metadataDbUser) {
+                throw new PrestoException(CONFIGURATION_INVALID, "clp.metadata-db-user must be set when using MYSQL provider");
+            }
+            if (null == metadataDbPassword) {
+                throw new PrestoException(CONFIGURATION_INVALID, "clp.metadata-db-password must be set when using MYSQL provider");
+            }
+        }
+    }
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java (4)

55-56: 🧹 Nitpick (assertive)

Add class-level documentation

This base class lacks Javadoc explaining its purpose and usage, which would improve maintainability.

+/**
+ * Base class for CLP query tests providing shared utilities for expression parsing,
+ * type resolution, and session management to support testing filter pushdown and
+ * expression translation.
+ */
 public class TestClpQueryBase

61-66: 🧹 Nitpick (assertive)

Make 'city' variable immutable to match other field declarations

All other column handle constants are declared final, but city is not, creating inconsistency and potential for accidental state mutation.

-    protected static final ClpColumnHandle city = new ClpColumnHandle("city", RowType.from(ImmutableList.of(
+    protected static final ClpColumnHandle city = new ClpColumnHandle("city", RowType.from(ImmutableList.of(

98-101: 🧹 Nitpick (assertive)

Reuse a single SqlParser instance to improve performance

Creating a new SqlParser for every call is inefficient. Use a shared instance to reduce object churn and improve test performance.

+    private static final SqlParser SQL_PARSER = new SqlParser();
+    private static final ParsingOptions PARSING_OPTIONS = 
+            new ParsingOptions(ParsingOptions.DecimalLiteralTreatment.AS_DECIMAL);

     public static Expression expression(String sql)
     {
-        return ExpressionUtils.rewriteIdentifiersToSymbolReferences(new SqlParser().createExpression(sql, new ParsingOptions(ParsingOptions.DecimalLiteralTreatment.AS_DECIMAL)));
+        return ExpressionUtils.rewriteIdentifiersToSymbolReferences(SQL_PARSER.createExpression(sql, PARSING_OPTIONS));
     }

108-108: 🧹 Nitpick (assertive)

Use the shared SqlParser instance here as well

This method also creates a new SqlParser instance rather than reusing a shared instance.

         Map<NodeRef<Expression>, Type> expressionTypes = getExpressionTypes(
                 session,
                 metadata,
-                new SqlParser(),
+                SQL_PARSER,
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java (6)

57-65: 🛠️ Refactor suggestion

Add cache size limits to prevent unbounded memory growth

Both the columnHandleCache and tableHandleCache have time-based expiration but no size limits. This could lead to excessive memory usage in large deployments with many tables.

 this.columnHandleCache = CacheBuilder.newBuilder()
+        .maximumSize(clpConfig.getMetadataCacheMaximumEntries())
         .expireAfterWrite(clpConfig.getMetadataExpireInterval(), SECONDS)
         .refreshAfterWrite(clpConfig.getMetadataRefreshInterval(), SECONDS)
         .build(CacheLoader.from(this::loadColumnHandles));
 this.tableHandleCache = CacheBuilder.newBuilder()
+        .maximumSize(clpConfig.getMetadataCacheMaximumEntries())
         .expireAfterWrite(clpConfig.getMetadataExpireInterval(), SECONDS)
         .refreshAfterWrite(clpConfig.getMetadataRefreshInterval(), SECONDS)
         .build(CacheLoader.from(this::loadTableHandles));

69-72: 🛠️ Refactor suggestion

Add error handling for metadata loading operations

The column handle loading method has no error handling, which could lead to unclear error messages when metadata queries fail.

 private List<ClpColumnHandle> loadColumnHandles(SchemaTableName schemaTableName)
 {
-    return clpMetadataProvider.listColumnHandles(schemaTableName);
+    try {
+        return clpMetadataProvider.listColumnHandles(schemaTableName);
+    }
+    catch (Exception e) {
+        throw new PrestoException(ClpErrorCode.CLP_METADATA_ERROR,
+                "Failed to load column metadata for " + schemaTableName, e);
+    }
 }

74-77: 🛠️ Refactor suggestion

Add error handling for table handle loading

Similar to column handle loading, table handle loading should include error handling for better diagnostics.

 private List<ClpTableHandle> loadTableHandles(String schemaName)
 {
-    return clpMetadataProvider.listTableHandles(schemaName);
+    try {
+        return clpMetadataProvider.listTableHandles(schemaName);
+    }
+    catch (Exception e) {
+        throw new PrestoException(ClpErrorCode.CLP_METADATA_ERROR,
+                "Failed to load table metadata for schema " + schemaName, e);
+    }
 }

98-101: 🧹 Nitpick (assertive)

Follow project coding guideline for boolean expressions

Project guidelines prefer false == expression rather than !expression.

-        if (!listSchemaNames(session).contains(schemaNameValue)) {
+        if (false == listSchemaNames(session).contains(schemaNameValue)) {
             return ImmutableList.of();
         }

112-114: 🧹 Nitpick (assertive)

Follow project coding guideline for boolean expressions

Project guidelines prefer false == expression rather than !expression.

-        if (!listSchemaNames(session).contains(schemaName)) {
+        if (false == listSchemaNames(session).contains(schemaName)) {
             return null;
         }

166-167: ⚠️ Potential issue

Handle potential NPE when prefix contains only table name

SchemaTableName's constructor will throw NullPointerException if schema name is null. This needs to be handled when creating SchemaTableName from a prefix that might only have a table name.

-            schemaTableNames = ImmutableList.of(new SchemaTableName(prefix.getSchemaName(), prefix.getTableName()));
+            String schema = Optional.ofNullable(prefix.getSchemaName()).orElse(DEFAULT_SCHEMA_NAME);
+            schemaTableNames = ImmutableList.of(new SchemaTableName(schema, prefix.getTableName()));
📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bf9da2e and 1b8d043.

📒 Files selected for processing (19)
  • presto-clp/pom.xml (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpRecordSetProvider.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableHandle.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (1 hunks)
  • presto-docs/src/main/sphinx/connector/clp.rst (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ` rather than `!`.

**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpRecordSetProvider.java
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableHandle.java
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java
🧬 Code Graph Analysis (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (3)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java (1)
  • ClpMySqlMetadataProvider (33-134)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (1)
  • ClpMySqlSplitProvider (32-97)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1)
  • ClpConfig (21-170)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (1)
presto-spi/src/main/java/com/facebook/presto/spi/ColumnMetadata.java (1)
  • ColumnMetadata (29-205)
⏰ Context from checks skipped due to timeout of 90000ms (20)
  • GitHub Check: product-tests-specific-environment2 (8)
  • GitHub Check: product-tests-specific-environment1 (17.0.13)
  • GitHub Check: test-other-modules (8)
  • GitHub Check: test (17.0.13, :presto-cassandra -P test-cassandra-integration-smoke-test)
  • GitHub Check: hive-dockerized-tests (8)
  • GitHub Check: hive-dockerized-tests (17.0.13)
  • GitHub Check: hive-tests (17.0.13)
  • GitHub Check: hive-tests (8)
  • GitHub Check: singlestore-dockerized-tests (8)
  • GitHub Check: singlestore-dockerized-tests (17.0.13)
  • GitHub Check: spark-integration (17.0.13)
  • GitHub Check: spark-integration (8)
  • GitHub Check: product-tests-basic-environment (17.0.13)
  • GitHub Check: product-tests-basic-environment (8)
  • GitHub Check: arrowflight-java-tests (17.0.13, :presto-base-arrow-flight)
  • GitHub Check: arrowflight-java-tests (8, :presto-base-arrow-flight)
  • GitHub Check: test (:presto-docs)
  • GitHub Check: prestocpp-linux-build-for-test
  • GitHub Check: maven-checks (8)
  • GitHub Check: maven-checks (17.0.13)
🔇 Additional comments (4)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpRecordSetProvider.java (1)

28-35: UnsupportedOperationException is a sensible interim guard

Nice follow-up to the earlier feedback—returning null would have blown up later; the explicit exception makes the limitation clear at call-site.
No further action needed unless/​until physical row extraction is implemented.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (1)

78-78:

❌ Incorrect review comment

Potential SQL injection vulnerability in query construction.

The query is constructed using string formatting with user-provided table names, which could be vulnerable to SQL injection attacks. Use prepared statements with proper parameter binding instead.


🏁 Script executed:

#!/bin/bash
# Check if ClpConfig has validation for table names
grep -A 10 "SAFE_SQL_IDENTIFIER" presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java

Length of output: 960


🏁 Script executed:

#!/bin/bash
# Display the beginning of ClpMySqlSplitProvider to inspect tableName origin and query template
sed -n '1,200p' presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java

Length of output: 3805


🏁 Script executed:

#!/bin/bash
# Examine ClpTableHandle to see how schema and table names are handled
sed -n '1,200p' presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableHandle.java

Length of output: 2717


Ignore SQL injection warning in ClpMySqlSplitProvider.java

The metadataTablePrefix is already validated against SAFE_SQL_IDENTIFIER (only alphanumerics and underscores), tableName comes from Presto’s parsed SchemaTableName (also restricted to valid identifiers), and the combined table name is wrapped in backticks. There is no injection vector in this query construction – no changes required.

Likely an incorrect or invalid review comment.

presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java (1)

40-46: Proper negative assertion for KQL expression - great!

The code now correctly asserts that no KQL expression is present when none is expected, addressing a previous concern about hiding regressions.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1)

126-126: 🛠️ Refactor suggestion

Fix negation style to conform to coding guidelines

The code uses metadataTablePrefix == null which violates the coding guideline that prefers null == expression over expression == null.

-        if (metadataTablePrefix == null || !SAFE_SQL_IDENTIFIER.matcher(metadataTablePrefix).matches()) {
+        if (null == metadataTablePrefix || false == SAFE_SQL_IDENTIFIER.matcher(metadataTablePrefix).matches()) {

Likely an incorrect or invalid review comment.

@wraymo wraymo force-pushed the clp_integration_fix branch from 1b8d043 to 30ec77f Compare May 14, 2025 14:47
2. rename splitPath to path in ClpSplit
3. add more properties for equals, hashCode, and toString in ClpTableLayoutHandle
4. remove public from methods in ClpMetadataProvider
5. change most fo log.error to log.warn in ClpMySqlSplitProvider and ClpMySqlMetadataProvider and fix one logging statement in ClpMySqlMetadataProvider
6. fix two minor issues in ClpMetadata
7. throw an exception in getRecordSet
8. improve TestClpOptimizer
9. refactor ClpSchemaTree and fix several bugs
10. add StorageType to ClpTableHandle
11. add archive_storage_type to ClpMySqlMetadataProvider and ClpMySqlSplitProvider
12. change listing table names to listing ClpTableHandle in ClpMetadataProvider reflect the changes in ClpMetadata
13. use constants in ClpMySqlMetadataProvider and ClpMySqlSplitProvider
14. update TestClpSplit and TestClpMetadata to include archive_storage_type
15. remove archive-source from ClpConfig and docs
16. fix legacy configuration names in clp docs
17. quote table names and column names in ClpMySqlMetadataProvider and ClpMySqlSplitProvider
18. add tablePath to ClpTableHandle and simplifies the logic in ClpMySqlSplitProvider
19. update docstring for ClpMySqlMetadataProvider
@wraymo wraymo force-pushed the clp_integration_fix branch from 30ec77f to 7ca1179 Compare May 14, 2025 14:47
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 12

♻️ Duplicate comments (36)
presto-clp/pom.xml (2)

16-19: Add <build> section to presto-clp/pom.xml

The CLP connector POM omits a <build> block. All other connector modules (e.g. presto-accumulo, presto-base-arrow-flight) declare build plugins—such as the Maven Surefire and Enforcer plugins—to ensure the presto-plugin packaging is handled correctly.

Please add a <build> section immediately after the <properties> block in presto-clp/pom.xml. For example:

     </properties>
+
+    <build>
+        <plugins>
+            <plugin>
+                <groupId>org.apache.maven.plugins</groupId>
+                <artifactId>maven-surefire-plugin</artifactId>
+                <configuration>
+                    <skipTests>true</skipTests>
+                </configuration>
+            </plugin>
+            <!-- include any additional plugins (enforcer, packaging, etc.) as in other connectors -->
+        </plugins>
+    </build>

22-25: Consider specifying a version range for mysql-connector-j

The MySQL connector dependency doesn't specify a version constraint, which might lead to unexpected behavior if the version changes.

        <dependency>
            <groupId>com.mysql</groupId>
            <artifactId>mysql-connector-j</artifactId>
+            <version>[8.0.0,)</version>
            <scope>runtime</scope>
        </dependency>
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java (1)

22-33: 🛠️ Refactor suggestion

Add missing listSchemaNames method

The interface is well-defined, but it should also include a listSchemaNames() method to retrieve all available schema names. This would make the interface more complete and consistent with other connector implementations.

public interface ClpMetadataProvider
{
    /**
     * Returns the list of column handles for the given table.
     */
    List<ClpColumnHandle> listColumnHandles(SchemaTableName schemaTableName);

    /**
     * Returns the list of table handles in the given schema.
     */
    List<ClpTableHandle> listTableHandles(String schema);
+
+    /**
+     * Returns the list of all schema names.
+     */
+    List<String> listSchemaNames();
}
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (2)

40-47: 🧹 Nitpick (assertive)

Refactor provider selection logic for better scalability

The current if/else chain with immediate PrestoException will become unwieldy as more provider types are added. Consider using a map-based approach to improve maintainability:

-        if (config.getMetadataProviderType() == ClpConfig.MetadataProviderType.MYSQL) {
-            binder.bind(ClpMetadataProvider.class).to(ClpMySqlMetadataProvider.class).in(Scopes.SINGLETON);
-        }
-        else {
-            throw new PrestoException(ClpErrorCode.CLP_UNSUPPORTED_METADATA_SOURCE,
-                    "Unsupported metadata provider type: " + config.getMetadataProviderType());
-        }
+        Map<ClpConfig.MetadataProviderType, Class<? extends ClpMetadataProvider>> metadataProviders =
+                ImmutableMap.of(ClpConfig.MetadataProviderType.MYSQL, ClpMySqlMetadataProvider.class);
+        
+        Class<? extends ClpMetadataProvider> metadataProviderClass = metadataProviders.get(config.getMetadataProviderType());
+        if (metadataProviderClass == null) {
+            throw new PrestoException(ClpErrorCode.CLP_UNSUPPORTED_METADATA_SOURCE,
+                    "Unsupported metadata provider type: " + config.getMetadataProviderType());
+        }
+        binder.bind(ClpMetadataProvider.class).to(metadataProviderClass).in(Scopes.SINGLETON);

49-55: 🧹 Nitpick (assertive)

Refactor split provider selection logic similarly

Apply the same map-based approach to the split provider selection:

-        if (config.getSplitProviderType() == ClpConfig.SplitProviderType.MYSQL) {
-            binder.bind(ClpSplitProvider.class).to(ClpMySqlSplitProvider.class).in(Scopes.SINGLETON);
-        }
-        else {
-            throw new PrestoException(ClpErrorCode.CLP_UNSUPPORTED_SPLIT_SOURCE,
-                    "Unsupported split provider type: " + config.getSplitProviderType());
-        }
+        Map<ClpConfig.SplitProviderType, Class<? extends ClpSplitProvider>> splitProviders =
+                ImmutableMap.of(ClpConfig.SplitProviderType.MYSQL, ClpMySqlSplitProvider.class);
+        
+        Class<? extends ClpSplitProvider> splitProviderClass = splitProviders.get(config.getSplitProviderType());
+        if (splitProviderClass == null) {
+            throw new PrestoException(ClpErrorCode.CLP_UNSUPPORTED_SPLIT_SOURCE,
+                    "Unsupported split provider type: " + config.getSplitProviderType());
+        }
+        binder.bind(ClpSplitProvider.class).to(splitProviderClass).in(Scopes.SINGLETON);
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java (3)

33-37: Add null-safety check to constructor

The constructor accepts a path without verifying it's non-null, which could lead to NullPointerExceptions later when the path is used.

 @JsonCreator
 public ClpSplit(@JsonProperty("path") String path)
 {
-    this.path = path;
+    this.path = requireNonNull(path, "path is null");
 }

This would require adding the following import:

import static java.util.Objects.requireNonNull;

39-43: Document the returned path field

The getter method lacks documentation explaining what this path represents in the context of a CLP split.

 @JsonProperty
+/**
+ * Returns the path to the split's data location.
+ * This path is used by the record set provider to read data from the corresponding location.
+ */
 public String getPath()
 {
     return path;
 }

57-61: Consider providing more specific info in getInfo()

The getInfo() method returns the instance itself, which might not be the most informative approach if specific split information is needed.

-    public Object getInfo()
-    {
-        return this;
-    }
+    public Object getInfo()
+    {
+        return ImmutableMap.of("splitPath", path == null ? "NULL" : path);
+    }

This would require adding the following import:

import com.google.common.collect.ImmutableMap;
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java (1)

57-58: Follow project coding guidelines for boolean expressions

According to the coding guidelines, prefer false == <expression> rather than !<expression> for boolean expressions.

-        if (o == null || getClass() != o.getClass()) {
+        if (false == (o != null) || getClass() != o.getClass()) {
             return false;
         }
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (5)

50-50: Consider using platform-independent temp directory

The hard-coded path /tmp/split_testdb might not work on all operating systems, particularly Windows.

-        final String metadataDbUrl = "jdbc:h2:file:/tmp/split_testdb;MODE=MySQL;DATABASE_TO_UPPER=FALSE";
+        final String tempDir = System.getProperty("java.io.tmpdir");
+        final String metadataDbUrl = String.format("jdbc:h2:file:%s/split_testdb;MODE=MySQL;DATABASE_TO_UPPER=FALSE", tempDir);

84-84: Use platform-independent path construction

Hard-coded path separators may cause issues on different operating systems.

-                    pstmt.setString(2, "/tmp/archives/" + tableName);
+                    pstmt.setString(2, new File(System.getProperty("java.io.tmpdir"), "archives" + File.separator + tableName).getPath());

118-126: Improve file cleanup handling in tearDown method

The current implementation has several issues:

  1. Uses hard-coded paths that aren't platform-independent
  2. Doesn't handle exceptions during file deletion
  3. File deletion failures are silently ignored
-        File dbFile = new File("/tmp/split_testdb.mv.db");
-        File lockFile = new File("/tmp/split_testdb.trace.db"); // Optional, H2 sometimes creates this
-        if (dbFile.exists()) {
-            dbFile.delete();
-            System.out.println("Deleted database file: " + dbFile.getAbsolutePath());
-        }
-        if (lockFile.exists()) {
-            lockFile.delete();
-        }
+        String tempDir = System.getProperty("java.io.tmpdir");
+        File dbFile = new File(tempDir, "split_testdb.mv.db");
+        File lockFile = new File(tempDir, "split_testdb.trace.db"); // Optional, H2 sometimes creates this
+        try {
+            if (dbFile.exists() && !dbFile.delete()) {
+                System.err.println("Failed to delete database file: " + dbFile.getAbsolutePath());
+            }
+            else if (dbFile.exists()) {
+                System.out.println("Deleted database file: " + dbFile.getAbsolutePath());
+            }
+            
+            if (lockFile.exists() && !lockFile.delete()) {
+                System.err.println("Failed to delete lock file: " + lockFile.getAbsolutePath());
+            }
+        }
+        catch (Exception e) {
+            System.err.println("Error cleaning up database files: " + e.getMessage());
+        }

136-137: Use platform-independent path in table handle construction

Hard-coded Unix-style paths might not work on all platforms.

-                    new ClpTableHandle(new SchemaTableName(TABLE_SCHEMA, tableName),
-                            "/tmp/archives/" + tableName,
+                    new ClpTableHandle(new SchemaTableName(TABLE_SCHEMA, tableName),
+                            new File(System.getProperty("java.io.tmpdir"), "archives" + File.separator + tableName).getPath(),

142-142: Use platform-independent path for split verification

The expected path in the assertion uses hard-coded Unix-style separators.

-                assertEquals(splits.get(i).getPath(), "/tmp/archives/" + tableName + "/id_" + i);
+                String expectedPath = new File(new File(new File(System.getProperty("java.io.tmpdir"), "archives"), tableName), "id_" + i).getPath();
+                assertEquals(splits.get(i).getPath(), expectedPath);
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (2)

97-98: Coding guideline violation in the equals method

The equals method doesn't follow the coding guideline that requires using false == (expression) rather than the negation form. The type check should be updated to match the project's style.

-        if (obj == null || getClass() != obj.getClass()) {
+        if (false == (obj instanceof ClpColumnHandle)) {
             return false;
         }

104-104: Simplify boolean comparison

Since nullable is a primitive boolean, using Objects.equals() is unnecessary.

-                Objects.equals(this.nullable, other.nullable);
+                this.nullable == other.nullable;
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (3)

61-68: Still violates coding guideline & risks SQL-injection in USE clause

The if (dbName != null && !dbName.isEmpty()) check continues to (a) use !expr instead of the mandated false == expr, and (b) concatenates dbName straight into an SQL statement. These exact issues were raised in earlier reviews.


45-47: Validate identifiers & avoid string-interpolated SQL

SQL_SELECT_ARCHIVES_TEMPLATE and the subsequent String.format(...) embed tableName into the query without validation. This leaves the provider open to SQL-injection via crafted table names coming from the metastore. Previous reviews already suggested:

  1. Verify tableName against a safe [A-Za-z0-9_]+ regex.
  2. Build the dynamic table identifier only after validation (identifiers cannot be parameterised but must at least be quoted/back-ticked).
  3. Keep value predicates (WHERE …) parameterised.

Also applies to: 78-79


80-90: Logger signature still drops stack-trace

log.warn("Database error while processing splits for %s: %s", tableName, e); passes the throwable as a formatting argument instead of the first parameter, so no stack trace is emitted. This was already pointed out in an earlier review.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java (1)

84-92: shutdown() still suppresses failures

lifeCycleManager.stop() exceptions are logged and then ignored, potentially masking resource leaks. This exact issue was raised previously; please propagate the exception (or wrap in RuntimeException) so Presto can fail the connector cleanly.

presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java (3)

60-66: Avoid fixed /tmp database path – use a temp directory for hermetic tests

Hard-coding /tmp/metadata_testdb causes flakiness on CI agents running tests in parallel or on systems without write access to /tmp.
Use Files.createTempDirectory() (or H2’s jdbc:h2:mem:) to obtain an isolated location and clean it up automatically.


110-118: Drop heavy Pair dependency for simple tuples

Pulling in org.apache.commons.math3 for the Pair class bloats the test class-path.
A minimal replacement is Map.Entry<String, ClpNodeType> or a local record.


136-145: Check return values when deleting files and avoid System.out

File.delete() can silently fail and System.out pollutes test logs.
Assert deletion succeeded or use java.nio.file.Files.deleteIfExists(Path) with proper exception handling.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java (1)

135-148: Column order is nondeterministic – iterate deterministically to avoid inconsistent schemas

HashMap iteration order varies between JVM runs, so the order of columns returned by collectColumnHandles() is unstable.
This can cause different workers to disagree on table metadata, leading to query failures.

-        for (Map.Entry<String, ClpNode> entry : root.children.entrySet()) {
+        root.children.entrySet().stream()
+                .sorted(Map.Entry.comparingByKey())     // deterministic order
+                .forEach(entry -> {
                     String name = entry.getKey();
                     ClpNode child = entry.getValue();
                     if (child.isLeaf()) {
                         columns.add(new ClpColumnHandle(name, child.originalName, child.type, true));
                     }
                     else {
                         Type rowType = buildRowType(child);
                         columns.add(new ClpColumnHandle(name, child.originalName, rowType, true));
                     }
-            }
+                });
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java (1)

60-61: 🧹 Nitpick (assertive)

Reuse SessionHolder to shorten tests & cut object churn

SessionHolder is created afresh inside every test method, but it is completely stateless after construction.
Declaring a single instance as a field initialised in a @BeforeMethod (or @BeforeClass) would eliminate dozens of allocations and shave a few hundred ms off suite runtime.

This is exactly the optimisation raised in an earlier review; feel free to ignore if you decided against it for explicit-is-better reasons.

Also applies to: 85-86, 100-101, 121-122, 142-143, 162-163, 182-183, 200-201

presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java (3)

74-77: 🧹 Nitpick (assertive)

Follow negation guideline & guard against NPE when selecting DB

Project guideline: prefer false == expression over !expression.
The current code also evaluates isEmpty() when dbName can be null.

-        String dbName = config.getMetadataDbName();
-        if (dbName != null && !dbName.isEmpty()) {
-            connection.createStatement().execute(String.format("USE `%s`", dbName));
+        String dbName = config.getMetadataDbName();
+        if (null != dbName && false == dbName.isEmpty()) {
+            // Consider a prepared statement to avoid edge-case quoting issues
+            try (Statement useStmt = connection.createStatement()) {
+                useStmt.execute(String.format("USE `%s`", dbName));
+            }
         }

102-103: 🧹 Nitpick (assertive)

Logging pattern discards stack-trace – swap argument order

Using the throwable as a format argument loses the stack-trace.
Swap parameters so the exception becomes the first argument to log.warn/error.

-            log.warn("Failed to load table schema for %s: %s", schemaTableName.getTableName(), e);
+            log.warn(e, "Failed to load table schema for %s", schemaTableName.getTableName());

-            log.warn("Failed to load table names: %s", e);
+            log.warn(e, "Failed to load table names");

Also applies to: 130-132


89-93: ⚠️ Potential issue

Validate table name before embedding it in SQL

schemaTableName.getTableName() is interpolated directly into the SELECT text.
Although listTableHandles validates identifiers, listColumnHandles does not, so a malicious or corrupted metastore entry could lead to SQL-injection-style exploits or runtime errors.

Add an explicit isValidIdentifier() check and fail fast:

-        String query = String.format(SQL_SELECT_COLUMN_METADATA_TEMPLATE,
-                config.getMetadataTablePrefix(), schemaTableName.getTableName());
+        String table = schemaTableName.getTableName();
+        if (false == isValidIdentifier(table)) {
+            throw new IllegalArgumentException("Invalid table identifier: " + table);
+        }
+        String query = String.format(SQL_SELECT_COLUMN_METADATA_TEMPLATE,
+                config.getMetadataTablePrefix(), table);
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (2)

70-116: 🛠️ Refactor suggestion

Fail fast: validate mandatory MySQL connection properties

When metadataProviderType == MYSQL, the URL, user, and password are required.
Currently the first null dereference will surface deep in DriverManager.getConnection, making diagnosis harder.

Consider a @PostConstruct validator:

@PostConstruct
public void validate()
{
    if (MetadataProviderType.MYSQL.equals(metadataProviderType)) {
        checkState(null != metadataDbUrl,  "clp.metadata-db-url must be set");
        checkState(null != metadataDbUser, "clp.metadata-db-user must be set");
        checkState(null != metadataDbPassword, "clp.metadata-db-password must be set");
    }
}

124-131: 🧹 Nitpick (assertive)

Normalise table prefix for case-insensitive back-ends

Lower-casing the validated prefix avoids accidental mismatches where users mix cases.

-        this.metadataTablePrefix = metadataTablePrefix;
+        this.metadataTablePrefix = metadataTablePrefix.toLowerCase(Locale.ROOT);
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java (2)

98-101: Reuse the shared SqlParser instance instead of allocating a new one per call

This identical issue was pointed out in an earlier review. Instantiating SqlParser every time expression() is invoked creates unnecessary object churn and slows the test-suite. A single, static, thread-safe parser (plus a reused ParsingOptions) will suffice.


105-109: toRowExpression suffers from the same repeated SqlParser allocation

For consistency with the fix requested above (and the past review), replace the new SqlParser() here with the shared static instance.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java (4)

57-65: Caches are still unbounded – risk of un-controlled heap growth

Previous feedback recommended adding a maximumSize(...) (or weight) limit to both columnHandleCache and tableHandleCache. Without it, long-running clusters that see many tables/schemas can exhaust heap.


98-100: Prefer false == expr over !expr to follow project style-guide

The conditional negations here (!listSchemaNames(...)) deviate from the prescribed coding guideline for *.java files. Replace with false == listSchemaNames(...).contains(...) for consistency.

Also applies to: 112-113, 157-158


166-167: Possible NullPointerException when schema is omitted

prefix.getSchemaName() may be null. Passing it directly to new SchemaTableName(...) will throw. Guard with Optional.ofNullable(...).orElse(DEFAULT_SCHEMA_NAME) as suggested previously.


69-78: Add error handling when loading metadata from provider

loadColumnHandles and loadTableHandles bubble up unchecked exceptions from the provider, which can crash planning threads. Wrap the calls in try-catch and translate to a PrestoException with a clear error code for easier diagnostics (see prior review).

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1b8d043 and 7ca1179.

📒 Files selected for processing (19)
  • presto-clp/pom.xml (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpRecordSetProvider.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableHandle.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (1 hunks)
  • presto-docs/src/main/sphinx/connector/clp.rst (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ` rather than `!`.

**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableHandle.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpPlanOptimizer.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpRecordSetProvider.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpSchemaTree.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java
🧬 Code Graph Analysis (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java (1)
  • ClpColumnHandle (26-117)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableHandle.java (1)
  • ClpTableHandle (25-96)
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: spark-integration (8)
  • GitHub Check: spark-integration (17.0.13)
🔇 Additional comments (3)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpRecordSetProvider.java (1)

25-36: Implementation correctly throws UnsupportedOperationException

The implementation correctly throws an UnsupportedOperationException with a clear message, as suggested in previous reviews. This is better than returning null, which would cause a NullPointerException.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java (1)

29-32: Javadoc should match method signature

The Javadoc comment mentions "table handles" which correctly matches the return type of List<ClpTableHandle>. This is an improvement over previous versions.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (1)

49-49: 🧹 Nitpick (assertive)

Apply yoda condition here too

Same issue as above:

-        if (config.getSplitProviderType() == ClpConfig.SplitProviderType.MYSQL) {
+        if (ClpConfig.SplitProviderType.MYSQL == config.getSplitProviderType()) {

Likely an incorrect or invalid review comment.

Comment on lines +33 to +38
binder.bind(ClpConnector.class).in(Scopes.SINGLETON);
binder.bind(ClpMetadata.class).in(Scopes.SINGLETON);
binder.bind(ClpPlanOptimizerProvider.class).in(Scopes.SINGLETON);
binder.bind(ClpRecordSetProvider.class).in(Scopes.SINGLETON);
binder.bind(ClpSplitManager.class).in(Scopes.SINGLETON);
configBinder(binder).bindConfig(ClpConfig.class);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Consider consolidating singleton bindings for better readability

The five repetitive binder.bind(X.class).in(Scopes.SINGLETON); lines reduce readability. Consider introducing a helper method to make the code more concise and maintainable.

-        binder.bind(ClpConnector.class).in(Scopes.SINGLETON);
-        binder.bind(ClpMetadata.class).in(Scopes.SINGLETON);
-        binder.bind(ClpPlanOptimizerProvider.class).in(Scopes.SINGLETON);
-        binder.bind(ClpRecordSetProvider.class).in(Scopes.SINGLETON);
-        binder.bind(ClpSplitManager.class).in(Scopes.SINGLETON);
+        bindSingletons(binder,
+                ClpConnector.class,
+                ClpMetadata.class,
+                ClpPlanOptimizerProvider.class,
+                ClpRecordSetProvider.class,
+                ClpSplitManager.class);

You would need to add a helper method:

private void bindSingletons(Binder binder, Class<?>... types) {
    for (Class<?> type : types) {
        binder.bind(type).in(Scopes.SINGLETON);
    }
}

configBinder(binder).bindConfig(ClpConfig.class);

ClpConfig config = buildConfigObject(ClpConfig.class);
if (config.getMetadataProviderType() == ClpConfig.MetadataProviderType.MYSQL) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Use yoda conditions for better error prevention

Per coding guidelines, prefer false == <expression> rather than !<expression> pattern for conditional checks. For enum comparisons, this translates to putting the constant on the left side:

-        if (config.getMetadataProviderType() == ClpConfig.MetadataProviderType.MYSQL) {
+        if (ClpConfig.MetadataProviderType.MYSQL == config.getMetadataProviderType()) {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (config.getMetadataProviderType() == ClpConfig.MetadataProviderType.MYSQL) {
if (ClpConfig.MetadataProviderType.MYSQL == config.getMetadataProviderType()) {

Comment on lines +31 to +32
private final String path;

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add @nullable annotation for the path field

The path field is not annotated with @Nullable, but it appears to accept null values based on the constructor implementation. This creates ambiguity about whether null values are allowed.

-    private final String path;
+    @Nullable
+    private final String path;

This would require importing:

import javax.annotation.Nullable;

Comment on lines +91 to +96
catch (SQLException e) {
log.warn("Database error while processing splits for %s: %s", tableName, e);
}

return splits;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Swallowing SQLException hides failures

After catching SQLException the method just logs and returns an (often empty) list. Callers cannot distinguish between “no splits” and “backend failure”, which can lead to silent data loss in production. Consider re-throwing a PrestoException with an appropriate error code.

-        catch (SQLException e) {
-            log.warn("Database error while processing splits for %s: %s", tableName, e);
-        }
+        catch (SQLException e) {
+            log.warn(e, "Database error while processing splits for %s", tableName);
+            throw new PrestoException(GENERIC_INTERNAL_ERROR,
+                    format("Unable to list CLP splits for table %s: %s", tableName, e.getMessage()), e);
+        }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
catch (SQLException e) {
log.warn("Database error while processing splits for %s: %s", tableName, e);
}
return splits;
}
catch (SQLException e) {
log.warn(e, "Database error while processing splits for %s", tableName);
throw new PrestoException(GENERIC_INTERNAL_ERROR,
format("Unable to list CLP splits for table %s: %s", tableName, e.getMessage()), e);
}
return splits;

Comment on lines +13 to +15
The CLP Connector enables SQL-based querying of CLP-S archives from Presto. This document describes how to set up the
CLP Connector to run SQL queries.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Explain what “CLP-S archives” are

The doc still assumes readers know what a “CLP-S archive” is. A two-sentence primer (format, typical producers, why it matters) would improve accessibility.

Comment on lines +151 to +163
{
if (!polymorphicTypeEnabled) {
return baseName;
}

boolean conflictDetected = false;
if (parent.children.containsKey(baseName)) {
ClpNode existing = parent.children.get(baseName);
if (existing.type == null) {
conflictDetected = true;
}
else if (!existing.type.equals(newType)) {
String existingSuffix = getTypeSuffix(existing.type);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Negation style violates project guideline

The project mandates false == <expression> instead of !<expression> (e.g., line 152 and 162).
Please adjust to keep the codebase stylistically uniform.

Comment on lines +190 to +196
List<RowType.Field> fields = new ArrayList<>();
for (Map.Entry<String, ClpNode> entry : node.children.entrySet()) {
String name = entry.getKey();
ClpNode child = entry.getValue();
Type fieldType = child.isLeaf() ? child.type : buildRowType(child);
fields.add(new RowType.Field(Optional.of(name), fieldType));
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Nested RowType field order also nondeterministic

buildRowType() iterates over a HashMap, producing unpredictable RowType signatures.
Sort the entries (or use LinkedHashMap) before adding fields so workers build identical types.

Comment on lines +81 to +84
private boolean isValidIdentifier(String identifier)
{
return identifier != null && ClpConfig.SAFE_SQL_IDENTIFIER.matcher(identifier).matches();
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Negation style in isValidIdentifier violates coding guideline

return identifier != null && ClpConfig.SAFE_SQL_IDENTIFIER.matcher(identifier).matches();

Should read:

-return identifier != null && ClpConfig.SAFE_SQL_IDENTIFIER.matcher(identifier).matches();
+return null != identifier && true == ClpConfig.SAFE_SQL_IDENTIFIER.matcher(identifier).matches();

Comment on lines +119 to +121
if (isValidIdentifier(tableName) && archiveStorageDirectory != null &&
!archiveStorageDirectory.isEmpty()) {
tableHandles.add(new ClpTableHandle(new SchemaTableName(schemaName, tableName),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Negation guideline in emptiness check

archiveStorageDirectory != null && !archiveStorageDirectory.isEmpty()

Be consistent with the project rule:

-                if (isValidIdentifier(tableName) && archiveStorageDirectory != null &&
-                        !archiveStorageDirectory.isEmpty()) {
+                if (true == isValidIdentifier(tableName) && null != archiveStorageDirectory &&
+                        false == archiveStorageDirectory.isEmpty()) {

Comment on lines +124 to +131
ConnectorTableHandle table,
Constraint<ColumnHandle> constraint,
Optional<Set<ColumnHandle>> desiredColumns)
{
ClpTableHandle tableHandle = (ClpTableHandle) table;
ConnectorTableLayout layout = new ConnectorTableLayout(new ClpTableLayoutHandle(tableHandle, Optional.empty()));
return ImmutableList.of(new ConnectorTableLayoutResult(layout, constraint.getSummary()));
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Missed opportunity to push predicates into the layout handle

getTableLayouts ignores the supplied Constraint and always builds a layout with Optional.empty(). This forfeits filter push-down – contradicting the intent of the CLP connector to embed KQL predicates at planning time. Consider incorporating constraint.getSummary() (or the decomposed predicate) into the ClpTableLayoutHandle, e.g.:

-        ConnectorTableLayout layout = new ConnectorTableLayout(new ClpTableLayoutHandle(tableHandle, Optional.empty()));
+        ConnectorTableLayout layout = new ConnectorTableLayout(
+                new ClpTableLayoutHandle(tableHandle, Optional.of(constraint.getSummary())));

This small change enables richer predicate push-down without altering downstream components.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
ConnectorTableHandle table,
Constraint<ColumnHandle> constraint,
Optional<Set<ColumnHandle>> desiredColumns)
{
ClpTableHandle tableHandle = (ClpTableHandle) table;
ConnectorTableLayout layout = new ConnectorTableLayout(new ClpTableLayoutHandle(tableHandle, Optional.empty()));
return ImmutableList.of(new ConnectorTableLayoutResult(layout, constraint.getSummary()));
}
ConnectorTableHandle table,
Constraint<ColumnHandle> constraint,
Optional<Set<ColumnHandle>> desiredColumns)
{
ClpTableHandle tableHandle = (ClpTableHandle) table;
- ConnectorTableLayout layout = new ConnectorTableLayout(new ClpTableLayoutHandle(tableHandle, Optional.empty()));
+ ConnectorTableLayout layout = new ConnectorTableLayout(
+ new ClpTableLayoutHandle(tableHandle, Optional.of(constraint.getSummary())));
return ImmutableList.of(new ConnectorTableLayoutResult(layout, constraint.getSummary()));
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants