-
Notifications
You must be signed in to change notification settings - Fork 298
JDBC: Use PrepareStatement #1802
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
...tional-jdbc/src/main/java/org/apache/polaris/persistence/relational/jdbc/QueryGenerator.java
Outdated
Show resolved
Hide resolved
...tional-jdbc/src/main/java/org/apache/polaris/persistence/relational/jdbc/QueryGenerator.java
Outdated
Show resolved
Hide resolved
} | ||
} | ||
|
||
public static <T> PreparedQuery generateSelectQuery( | ||
@Nonnull Converter<T> entity, @Nonnull Map<String, Object> whereClause) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally, SQL generation should be based on object type and use case, but not depend on runtime parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: keep SQL scoped to classes (basically static
data). Also, mind IJ warnings about "unsafe" SQL - I believe IJ's SQL analysis is fairly good. If it flags and issue, there usually is an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IJ is flagging this as well : PreparedStatement statement = connection.prepareStatement(query.getSql()) ideally this should not be flagged right since we are using ?
and setting values later ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as Eric pointed out - Map
keys go into the SQL text - it's all still risky.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm suggesting to keep SQL at class level, then column names will be fixed, no runtime input -> safer SQL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PreparedStatement
by itself is not very meaningful unless the SQL that goes into it is detached from the runtime parameters. The PreparedStatement
JDBC interface ensures that parameters are separated at execution time. I believe it is also important to isolate runtime parameters at SQL text generation time (I do not think all SQL in this Persistence can be constant, but we should make the SQL generation work without using runtime entities as input).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with static SQL generation, though I would also request, to take up avoid generating the SQL at run-time as fast follow-up, this requires some more work and discussion since if now the QueryGenerator is shared across models and some things like INSERT, DELETE are all in common place, never the less the DBOperations class is not aware of what to bind the PreparedStatement against, the parameters to bind is also decided in the QueryGenerator itself, all the DbOps sees a PreparedStatement and Parameters and it opens preparedStatement and feeds both stuff in, now if we go this ways we need to make DbOperation understand what type of query it is and what to bind against. I am still thinking of clean way to do this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generating SQL text in runtime if fine from my POV as long as entities are not uses as input for the generating code (using entity types as input is ok).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also on board the static prepared statement bandwagon here (although maybe not necessarily at compile time). But also agree probably is not required for this this release. I also think statically defining the prepared statements but doing that programmatically is also fine. IE, for each class a statement is defined statically on class initialization but I haven't thought this through. The key thing would be differentiating between binding of runtime values vs class entities.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Update] I refactored the QueryGenerator to now containing only 5 ways to generate the query so now it has only
- SELECT and where
- UPDATE all columns and where
- DELETE and where
- SELECT with IN query with dynamic list
- SELECT with one special disjunction
These are all single table queries, all the projection are taken from the static member and the filters column reference has an assertion that keys in filter are subset of the columns from the static member of the model.
All this functions return a PreparedQuery -> SQL string with ?
and the ordered list of parameters which we can feed to the JDBC, it now doesn't need entity obj to create these.
Need to refactor this more to completely decouple the query generation and parameter feeding, working on it, it gonna take some more time.
map.put("properties", this.getProperties()); | ||
map.put("internal_properties", this.getInternalProperties()); | ||
if (databaseType.equals(DatabaseType.POSTGRES)) { | ||
map.put("properties", toPGobject(this.getProperties())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to do PG specializations in a separate PR after getting prepared statements working in general.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PG specialization is now required for the CI to make happy as what happens unlinke Statements prepareStatement expects the PG type hence, i added the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok. fair enough.
7d7f737
to
108a87b
Compare
...tional-jdbc/src/main/java/org/apache/polaris/persistence/relational/jdbc/QueryGenerator.java
Outdated
Show resolved
Hide resolved
...tional-jdbc/src/main/java/org/apache/polaris/persistence/relational/jdbc/QueryGenerator.java
Outdated
Show resolved
Hide resolved
...bc/src/main/java/org/apache/polaris/persistence/relational/jdbc/JdbcBasePersistenceImpl.java
Outdated
Show resolved
Hide resolved
} | ||
} | ||
|
||
public static <T> PreparedQuery generateSelectQuery( | ||
@Nonnull Converter<T> entity, @Nonnull Map<String, Object> whereClause) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tend to agree with @dimas-b, SQL should be statically defined at compile time for maximum security and efficiency.
It should be possible to achieve that by declaring a few constants in each model class. e.g. for ModelEntity
:
public class ModelEntity implements Converter<PolarisBaseEntity> {
public static final String TABLE_NAME = "ENTITIES";
public static final List<String> PK_COLUMNS = List.of("realm_id", "id");
public static final List<String> ALL_COLUMNS =
List.of(
"realm_id",
"id",
"catalog_id",
"parent_id",
"type_code",
"name",
"entity_version",
"sub_type_code",
"create_timestamp",
"drop_timestamp",
"purge_timestamp",
"to_purge_timestamp",
"last_update_timestamp",
"properties",
"internal_properties",
"grant_records_version");
public static final String LOOKUP_QUERY =
"SELECT "
+ String.join(", ", ALL_COLUMNS)
+ " FROM "
+ TABLE_NAME
+ " WHERE "
+ String.join(" = ? AND ", PK_COLUMNS)
+ " = ?";
public static final String LIST_QUERY =
"SELECT "
+ String.join(", ", ALL_COLUMNS)
+ " FROM "
+ TABLE_NAME
+ " WHERE realm_id = ? AND catalog_id = ? AND parent_id = ? AND type_code = ?";
public static void bindLookupQuery(PreparedStatement stmt, ModelEntity entity, String realmId) throws SQLException {
stmt.setObject(1, realmId);
stmt.setObject(2, entity.getId());
}
public static void bindListQuery(PreparedStatement stmt, ModelEntity entity, String realmId) throws SQLException {
stmt.setObject(1, realmId);
stmt.setObject(2, entity.getCatalogId());
stmt.setObject(3, entity.getParentId());
stmt.setObject(4, entity.getTypeCode());
}
...
a177831
to
1d76852
Compare
68140d1
to
97be8e1
Compare
97be8e1
to
18dd2ec
Compare
...tional-jdbc/src/main/java/org/apache/polaris/persistence/relational/jdbc/QueryGenerator.java
Outdated
Show resolved
Hide resolved
...tional-jdbc/src/main/java/org/apache/polaris/persistence/relational/jdbc/QueryGenerator.java
Outdated
Show resolved
Hide resolved
...tional-jdbc/src/main/java/org/apache/polaris/persistence/relational/jdbc/QueryGenerator.java
Outdated
Show resolved
Hide resolved
...tional-jdbc/src/main/java/org/apache/polaris/persistence/relational/jdbc/QueryGenerator.java
Show resolved
Hide resolved
try { | ||
PreparedQuery query = | ||
queryGenerator.generateSelectQuery( | ||
ModelEntity.ALL_COLUMNS, ModelEntity.TABLE_NAME, params); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it would be preferable to completely isolate SQL generation from input values... E.g. use params.keySet()
instead of all params
.
Taking this one step further, since the key set is known for this use case, we could have a specialized generator like LookupEntitiesQuery.sql()
.
Currently, SQL specializations are contained in methods that call queryGenerator.generateSelectQuery()
. I'm proposing to make sub-classes for those cases. I believe the amount of code is going to be roughly the same, but we can achieve static SQL generation is most cases. In the remaining cases, I'm sure we can still take runtime entities out of the SQL generation methods (perhaps the only non-static case is going to be IN
queries). WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it would be preferable to completely isolate SQL generation from input values... E.g. use params.keySet() instead of all params.
I agree with you, but presently, we just use params.keySet even today for query generation the user values are just feed to the PreparedQuery record's parameter (which we need to do anyways), i can't think off any way of this being manipulated, specially since we assert the keys in params are strict subset of the columns of the table. checking if i can stream line it more and do the PreparedQuery in JDBC persistence it self
Taking this one step further, since the key set is known for this use case, we could have a specialized generator like LookupEntitiesQuery.sql().
There are 33 sql's presently, I agree static would be preferable, I am parallely trying to do the same as well, here is a gist in which i have drilled it down into all static queries late last night https://gist.github.com/singhpk234/a63e7236a570892fcaf46bbf4cb89f34 except IN (?) but i haven't tested it E2E, would need some more time for that, just putting this pr out to address immediate concerns.
@@ -206,6 +215,17 @@ public void runWithinTransaction(TransactionCallback callback) throws SQLExcepti | |||
}); | |||
} | |||
|
|||
public Integer execute(Connection connection, QueryGenerator.PreparedQuery preparedQuery) | |||
throws SQLException { | |||
try (PreparedStatement statement = connection.prepareStatement(preparedQuery.sql())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code appears to be called one per entity in writeEntities
. I believe we should try and refactor the code to do prepareStatement
once, execute many in that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that in general prepared statements should be created at application startup, and stored for later retrieval. That being said, I believe the Postgres JDBC driver has an internal prepared statements cache, so this wouldn't generate a PREPARE
instruction each time it's executed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not mean preparing at startup, but at least reusing the PreparedStatement
inside loops 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... but good point about lack of expected runtime benefits here. I think it should ok to defer it (perhaps indefinitely).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I was able to follow the proposed new code, generated SQL appears to be a function of object types and use cases. No runtime entity values leak into SQL text that is used to make PreparedStatement
instances.
I still think SQL generation can be improved to be more isolated from runtime values (it can be done statically in many cases), but I'd be ok doing that in a follow up PR.
...tional-jdbc/src/main/java/org/apache/polaris/persistence/relational/jdbc/QueryGenerator.java
Outdated
Show resolved
Hide resolved
if (databaseType.equals(DatabaseType.POSTGRES)) { | ||
map.put("properties", toJsonbPGobject(this.getProperties())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: use an interface for converting values to driver-specific types and provide different implementations per DatabaseType
(it could even be obtained from DatabaseType.converter()
perhaps) instead of if
s.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the late review. Thanks a lot for the quick fix, @singhpk234! LGTM overall. Left minor comments.
PreparedStatement statement = connection.prepareStatement(query.sql())) { | ||
List<Object> params = query.parameters(); | ||
for (int i = 0; i < params.size(); i++) { | ||
statement.setObject(i + 1, params.get(i)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was recommend to use three parameters if we knew the type to make it safer. It will make the programming a bit harder as we have to pass the model entity here to infer the SQLType. I don't think it's a blocker for this PR though.
default void setObject(int parameterIndex, Object x, SQLType targetSqlType)
String realmId) { | ||
List<String> finalColumns = new ArrayList<>(allColumns); | ||
List<Object> finalValues = new ArrayList<>(values); | ||
finalColumns.add("realm_id"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we include realm_id in model classes? In that case,
- We don't have to deal with it as a special column like here.
- The model class are more consistent with the SQL table schema.
Not a blocker for this PR thought.
Map<String, Object> toMap(); | ||
Map<String, Object> toMap(DatabaseType databaseType); | ||
|
||
default PGobject toJsonbPGobject(String props) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: this seems more a util method only used by Postgres impl. I don't think it belongs here. We may create a class like PostgresUtil to hold it.
...tional-jdbc/src/main/java/org/apache/polaris/persistence/relational/jdbc/QueryGenerator.java
Outdated
Show resolved
Hide resolved
...tional-jdbc/src/main/java/org/apache/polaris/persistence/relational/jdbc/QueryGenerator.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just did a review and it looks good to me.
There are potentially some improvements that we can address later. I'm in favor of merging this to fix the initial issue and cut 0.10.0-beta rc5 including this fix.
Thank you so much every one for the quick review, I tried addressing most of the feedback here : I hope this addresses major concern around this PR and i sincerly thank the whole community to jump in and help here ! |
DatabaseType databaseType; | ||
try { | ||
databaseType = getDatabaseType(); | ||
} catch (SQLException e) { | ||
throw new RuntimeException(e); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: we could push this into method getDatasourceOperations()
|
||
private final Random random = new Random(); | ||
|
||
public DatasourceOperations( | ||
DataSource datasource, RelationalJdbcConfiguration relationalJdbcConfiguration) { | ||
DataSource datasource, | ||
DatabaseType databaseType, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we don't need to pass the database type, as we can compute it from the datasource
.
@singhpk234 awesome, I'm cherry picking on |
No description provided.