-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: DSL parser + AST implementation #23
Conversation
Signed-off-by: Yuchen Liang <[email protected]>
Signed-off-by: Yuchen Liang <[email protected]>
Signed-off-by: Yuchen Liang <[email protected]>
Signed-off-by: Yuchen Liang <[email protected]>
Signed-off-by: Yuchen Liang <[email protected]>
Signed-off-by: Yuchen Liang <[email protected]>
Signed-off-by: Yuchen Liang <[email protected]>
String, | ||
Bool, | ||
Float64, | ||
Array(Box<Type>), // Array types like [T] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: remember to change these to rust comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, seems like we can parse the DSL now, and we need some more work to make the interpreter to work with it.
@@ -0,0 +1,258 @@ | |||
/*use std::collections::HashSet; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rm unused files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah damn i pushed name analyzer in here.
@@ -0,0 +1,325 @@ | |||
use pest::iterators::Pair; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure if this could be simplified with https://github.com/pest-parser/ast in the future, basically it's a conversion from pest::Pair -> AST
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
⬇️ more reviews on the rule correctness
schema_len = input.schema_len | ||
} | ||
|
||
Logical Aggregate(child: Logical, group_keys: [Scalar], aggs: [(Scalar, String)]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why aggs [Scalar, String]? should be something like [Scalar] or [AggFunction]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
string is the aggfunction for now
we don't support UDTs yet
// Rules | ||
|
||
// memo(alexis): support member function for operators like apply_children | ||
def rewrite_column_refs(predicate: Scalar, map: Map[Int64, Int64]): Scalar = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need handle the case where map doesn't contain a mapping of a column referenced, throw an error that it can't rewrite in such cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i was planning to do that in the map implementation
@rule(Logical) | ||
def join_commute(expr: Logical): Logical = | ||
match expr | ||
case Join("Inner", left, right, cond) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
grammar wise, is the order required here? i.e., would Join("Inner", right, left, cond) produce a correct match?
proposing syntex,
Join { type: "Inner", left, right, cond }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes order is required
val left_indices = 0..left_len; | ||
|
||
val remapping = (left_indices.map(i => (i, i + right_len)) ++ | ||
right_indices.map(i => (left_len + i, i))).to_map(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right_indices don't seem to be set correctly, should be i -> i - left_len?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no this is correct, the pair is reversed
|
||
Project( | ||
Join("Inner", right, left, rewrite_column_refs(cond, remapping)), | ||
left_indices.map(i => ColumnRef(i + right_len)) ++ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be right -> right + left_len, left -> left - right_len?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah there seems to be smt wrong here
case op @ Join("Inner", Join("Inner", a, b, cond_inner), c, cond_outer) => | ||
val a_len = a.schema_len; | ||
if !has_refs_in_range(cond_outer, 0, a_len) then | ||
val remap_inner = (a.schema_len..op.schema_len).map(i => (i, i - a_len)).to_map(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/remap_inner/remap_outer?
Problem
We want a DSL to be able to write rules & operators in a declarative fashion. This makes the code more maintainable, maximizes compatibility, and speeds up the writing of newer rules.
Summary of changes
Wrote a parser of the OPTD-DSL using Pest. The syntax is highly functional and inspired from Scala. Some snippets below:
Next steps
Semantic analysis, type checking, and low-level IR generation.
Also switching to Chumsky as parser, since Pest is not very maintainable.