-
Notifications
You must be signed in to change notification settings - Fork 8
s3cluster joins, part 1 #972
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
s3cluster joins, part 1 #972
Conversation
Don't forget to change branch to merge on |
src/Storages/IStorageCluster.cpp
Outdated
auto table_function_node = table_function_searcher.getNode(); | ||
|
||
if (!table_function_node) | ||
throw Exception(ErrorCodes::LOGICAL_ERROR, "Can't fiund table function node"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
throw Exception(ErrorCodes::LOGICAL_ERROR, "Can't fiund table function node"); | |
throw Exception(ErrorCodes::LOGICAL_ERROR, "Can't find table function node"); |
|
||
- `local` — Replaces the database and table in the subquery with local ones for the destination server (shard), leaving the normal `IN`/`JOIN.` | ||
- `global` — Unsupported for now. Replaces the `IN`/`JOIN` query with `GLOBAL IN`/`GLOBAL JOIN.` | ||
- `allow` — Default value. Allows the use of these types of subqueries. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- `allow` — Default value. Allows the use of these types of subqueries. | |
- `allow` — Default value. Allows the use of these types of subqueries. The join will be executed on the initiator. |
should we also support 'forbid' ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW - what will happen when joining xxxCluster with non-distributed table? Maybe a test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it's a default old behavour - query executed on remote nodes "as is" and join executed there.
I guess this is optimal for casual, non-swarm mode, when right table is present on all cluster nodes.
using Base = InDepthQueryTreeVisitorWithContext<SearcherVisitor>; | ||
using Base::Base; | ||
|
||
explicit SearcherVisitor(QueryTreeNodeType type_, ContextPtr context) : Base(context), type(type_) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it looks like the code is only for analyzer (i.e. build on the top of QueryTree, not on the top of AST). It's ok for new features, but maybe it's better to document it clearly. I was also thinking about adding throw when analizer is disabled. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Full support for non-anylyzer code can be more challenging (see for example JoinToSubqueryTransformVisitor)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added exception
passed_node = node; | ||
} | ||
|
||
QueryTreeNodePtr getNode() const { return passed_node; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually i never looked close on the anazyzer code, thought it should be simpler (i.e. avoiding tree traversal, but i can be wrong).
query_tree_distributed->setAlias(table_function_ast.alias); | ||
|
||
// Find add used columns from table function to make proper projection list | ||
CollectUsedColumnsForSourceVisitor collector(table_function_node, context); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe things like tryResolveIdentifierFromJoinTreeNode / tryResolveIdentifierFromJoinTree / tryResolveIdentifierFromTableExpression can be reused.
Possible values: | ||
|
||
- `local` — Replaces the database and table in the subquery with local ones for the destination server (shard), leaving the normal `IN`/`JOIN.` | ||
- `global` — Unsupported for now. Replaces the `IN`/`JOIN` query with `GLOBAL IN`/`GLOBAL JOIN.` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if it is unsupported, then why was it added? I suggest ditching that value from enum
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's for part 2, when right table is calculated on initiator and is sent as part of the query to the swarm nodes.
src/Core/Settings.cpp
Outdated
@@ -1713,6 +1713,22 @@ Possible values: | |||
- `global` — Replaces the `IN`/`JOIN` query with `GLOBAL IN`/`GLOBAL JOIN.` | |||
- `allow` — Allows the use of these types of subqueries. | |||
)", IMPORTANT) \ | |||
DECLARE(ObjectStorageClusterJoinMode, object_storage_cluster_join_mode, ObjectStorageClusterJoinMode::ALLOW, R"( | |||
Changes the behaviour of object storage cluster function ot table. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes the behaviour of object storage cluster function ot table. | |
Changes the behaviour of object storage cluster function of table. |
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'or' :)
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
JOIN with *Cluster table functions
Documentation entry for user-facing changes
Solved #819
Queries like
execute JOIN's on remote nodes.
This behavior is with default value for new setting
object_storage_cluster_join_mode='allow'
.With this PR with setting
object_storage_cluster_join_mode='local'
query rewrited onand JOIN's executed on initiator.
'global' mode (select from right table and send result to swarm nodes to make join on swarm nodes) somewhere later in separate PR.
Exclude tests: