[FR] Add white space checking for KQL parse #3789
This should probably be handled in the grammar instead.
The current grammar requires white space to be ignored. I think the approach you are suggesting would require a refactor of both the grammar and the parsing. That would be less a refactor than a full replacement, since most if not all of the code would need to be updated, compared to the relatively minor change I am suggesting.
Enhancement - Guidelines
These guidelines serve as a reminder set of considerations when adding a feature to the code.
Documentation and Context
Code Standards and Practices
Testing
Additional Checks
Overall this is functional and looks like it addresses the immediate issue. What do you think about a small refactor to break up some of the compact logic?
column=column,
source=line,
width=len(token),
trailer=None
can we (do we need to) provide more specific debugging information here?
I do not think we need more specific information. I am not sure what else we would provide beyond the line and column where the required whitespace is missing.
That being said, if we include `not` as a binary operator, then we would want the error message to be a warning, as `not` is allowed without whitespace.
Potentially add rule_name, or path or something in the event that we run this against many rules at one time.
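One way to carry that extra context is to let the error accept an optional rule name or path and prefix the message with whichever is available. The sketch below is hypothetical: `WhitespaceParseError` and its parameters are illustrative names that stand in for the `KqlParseError` raised in this PR, and positions are assumed to be 0-based.

```python
from typing import Optional


class WhitespaceParseError(ValueError):
    """Hypothetical error type; stands in for the PR's KqlParseError."""

    def __init__(self, message: str, line: int, column: int, source: str,
                 width: int = 1, rule_name: Optional[str] = None,
                 path: Optional[str] = None) -> None:
        self.line = line
        self.column = column
        self.source = source
        self.width = width
        self.rule_name = rule_name
        self.path = path
        super().__init__(self._render(message))

    def _render(self, message: str) -> str:
        # Prefix with whatever context is available so batch runs stay traceable.
        context = ", ".join(part for part in (self.rule_name, self.path) if part)
        prefix = f"[{context}] " if context else ""
        # Underline the offending token beneath the source line.
        caret = " " * self.column + "^" * self.width
        location = f"line {self.line + 1}, column {self.column + 1}"
        return f"{prefix}{message} ({location})\n{self.source}\n{caret}"
```

When a single query is parsed, both `rule_name` and `path` can be left as `None` and the message reads as it would without the prefix.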
check_whitespace(collect_token_positions(tree, "and"), 'and', lines)
check_whitespace(collect_token_positions(tree, "or"), 'or', lines)
It might be more efficient to collect all the token positions in a single traversal instead of traversing the tree twice.
Agreed, it would save the second (or potentially third) traversal, no particular need to parse twice as far as I can recall.
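A single-pass version could group positions by token value as it walks the tree. The sketch below is an assumption about how that might look, not the PR's `collect_token_positions`: `Token` and `Tree` are minimal stand-ins for the lark classes, `collect_all_token_positions` is a hypothetical name, and positions are 0-based for simplicity.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple, Union


@dataclass
class Token:
    """Minimal stand-in for a lark Token (0-based line and column here)."""
    value: str
    line: int
    column: int


@dataclass
class Tree:
    """Minimal stand-in for a lark Tree."""
    data: str
    children: List[Union["Tree", Token]] = field(default_factory=list)


def collect_all_token_positions(
    tree: Tree, wanted: Iterable[str]
) -> Dict[str, List[Tuple[int, int]]]:
    """Walk the tree once and group (line, column) positions by token value."""
    wanted_set = set(wanted)
    positions: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    stack: List[Union[Tree, Token]] = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, Token):
            if node.value in wanted_set:
                positions[node.value].append((node.line, node.column))
        else:
            # Reverse so children are visited left to right.
            stack.extend(reversed(node.children))
    return dict(positions)
```

The caller would then loop over the returned dictionary and run the whitespace check once per operator, so adding a third operator later costs no extra traversal.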
@@ -103,3 +103,10 @@ def test_optimization(self):
            "{'match': {'destination.ip': '169.254.169.254'}}]}}]}}"
        )
        self.assertEqual(dsl_str, good_case, "DSL string does not match the good case, optimization failed.")

    def test_blank_space(self):
what about multi-line queries with tokens spanning lines or queries with leading/trailing whitespace on lines themselves?
Just to make sure I am following: is '"Test-ServiceDaclPermission" or\n "Update-ExeFunctions"' not a multi-line query with a token spanning lines? It would be rendered as:
"Test-ServiceDaclPermission" or
 "Update-ExeFunctions"
I think that is a query with the token spanning lines, since the `or` needs both lines and its required second whitespace character is the leading space on the second line. Could you give me an example? I am not sure I am following.
def test_blank_space(self):
    with self.assertRaises(kql.KqlParseError):
        kql.lark_parse('"Test-ServiceDaclPermission" or"Update-ExeFunctions"')
Add a failure case with the space missing on the left side, and a failure case where `\n` is involved.
Add a test case where `and` or `or` is part of a field name.
Related Issues
Resolves #2700
Summary
This addresses an issue where lark parses KQL queries that lack whitespace around certain tokens, whereas Kibana does not.
E.g. "Get-NetComputerSiteName" or "Get-NetLocalGroup" vs "Get-NetComputerSiteName" or"Get-NetLocalGroup". Both parse via lark/ANTLR, but the second fails in Kibana.
Some notes about alternative implementations:
This approach adds a post-processing step to the lark parsing that tells us where the `and` and `or` tokens are in the original string, then checks whether those token locations have the appropriate spacing.
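The post-processing check described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: positions are taken to be 0-based `(line, column)` pairs, the start or end of a line is treated as whitespace, and `MissingWhitespaceError` is a hypothetical stand-in for `KqlParseError`.

```python
from typing import Iterable, List, Tuple


class MissingWhitespaceError(ValueError):
    """Hypothetical error; the PR raises KqlParseError instead."""


def check_whitespace(
    positions: Iterable[Tuple[int, int]], token: str, lines: List[str]
) -> None:
    """Raise if `token` at any (line, column) lacks whitespace on either side.

    Assumes 0-based positions. The start or end of a line counts as
    whitespace here, on the assumption that a newline separates tokens
    just as a space does.
    """
    for line_no, column in positions:
        source = lines[line_no]
        end = column + len(token)
        before_ok = column == 0 or source[column - 1].isspace()
        after_ok = end >= len(source) or source[end].isspace()
        if not (before_ok and after_ok):
            raise MissingWhitespaceError(
                f"Missing whitespace around '{token}' at line {line_no + 1}, "
                f"column {column + 1}: {source}"
            )
```

Because the positions come from the parse tree rather than a substring search, an `or` or `and` embedded in a field name or quoted value is never inspected; only real operator tokens are.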
Note since this PR updates the KQL lib please make sure to update the KQL lib version appropriately.
Contributor checklist