Skip to content

[FR] Add white space checking for KQL parse #3789

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

eric-forte-elastic
Copy link
Contributor

@eric-forte-elastic eric-forte-elastic commented Jun 14, 2024

Related Issues

Resolves #2700

Summary

This addresses an issue where lark parses KQL queries without whitespace around certain tokens, where KQL does not.

E.g. "Get-NetComputerSiteName" or "Get-NetLocalGroup" vs "Get-NetComputerSiteName" or"Get-NetLocalGroup". Both of which parse via lark/ANTLR, but the second fails in Kibana.

Some notes about alternative implementations:

  1. We don't want to update the grammar because we need to ignore white space
  2. We don't want to use regex pre-processing because it will catch things like "category" as invalid unavoidably

This approach adds a post-processing step to the lark parsing to tell us where the and and or tokens are in the original string, then compare to see if those tokens locations have the appropriate spacing.

Note since this PR updates the KQL lib please make sure to update the KQL lib version appropriately.

Contributor checklist

@brokensound77
Copy link
Contributor

This should probably be handled in the grammar instead

@eric-forte-elastic
Copy link
Contributor Author

eric-forte-elastic commented Jun 20, 2024

This should probably be handled in the grammar instead

The current grammar requires white space to be ignored. I think the way you are suggesting would require a refactor of both the grammar and the parsing to handle this. This would not only be a refactor/overhaul but in effect a full replacement as most of not all of the code would need to be updated compared to the relatively minor change I am suggesting.

@Mikaayenson Mikaayenson added the kql related to the kql module label Jan 27, 2025
@eric-forte-elastic eric-forte-elastic added enhancement New feature or request python Internal python for the repository patch and removed backlog labels Jul 11, 2025
Copy link
Contributor

Enhancement - Guidelines

These guidelines serve as a reminder set of considerations when addressing adding a feature to the code.

Documentation and Context

  • Describe the feature enhancement in detail (alternative solutions, description of the solution, etc.) if not already documented in an issue.
  • Include additional context or screenshots.
  • Ensure the enhancement includes necessary updates to the documentation and versioning.

Code Standards and Practices

  • Code follows established design patterns within the repo and avoids duplication.
  • Ensure that the code is modular and reusable where applicable.

Testing

  • New unit tests have been added to cover the enhancement.
  • Existing unit tests have been updated to reflect the changes.
  • Provide evidence of testing and validating the enhancement (e.g., test logs, screenshots).
  • Validate that any rules affected by the enhancement are correctly updated.
  • Ensure that performance is not negatively impacted by the changes.
  • Verify that any release artifacts are properly generated and tested.
  • Conducted system testing, including fleet, import, and create APIs (e.g., run make test-cli, make test-remote-cli, make test-hunting-cli)

Additional Checks

  • Verify that the enhancement works across all relevant environments (e.g., different OS versions).
  • Confirm that the proper version label is applied to the PR patch, minor, major.

@eric-forte-elastic eric-forte-elastic marked this pull request as ready for review July 11, 2025 15:46
Copy link
Contributor

@Mikaayenson Mikaayenson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this is functional and looks like it addresses the immediate issue. What do you think about a small refactor to break up some of the compact logic?

column=column,
source=line,
width=len(token),
trailer=None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we (do we need to) provide more specific debugging information here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think we do need more specific information, not sure what else we would provide other than the line and column of where the needed whitespace would be missing.

That being said, if we include not as a binary operator, then we would want the error message to be a warning as it is allowed without whitespace.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potentially add rule_name, or path or something in the event that we run this against many rules at one time.

Comment on lines +384 to +385
check_whitespace(collect_token_positions(tree, "and"), 'and', lines)
check_whitespace(collect_token_positions(tree, "or"), 'or', lines)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it might be more optimized to do a single traversal to collect all the token values (instead of traversing twice)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, it would save the second (or potentially third) traversal, no particular need to parse twice as far as I can recall.

@@ -103,3 +103,10 @@ def test_optimization(self):
"{'match': {'destination.ip': '169.254.169.254'}}]}}]}}"
)
self.assertEqual(dsl_str, good_case, "DSL string does not match the good case, optimization failed.")

def test_blank_space(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about multi-line queries with tokens spanning lines or queries with leading/trailing whitespace on lines themselves?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to make sure I am following, is not '"Test-ServiceDaclPermission" or\n "Update-ExeFunctions"' a multi-line query with a token spanning the query? It would be rendered as:

"Test-ServiceDaclPermission" or
 "Update-ExeFunctions"

Which I think would be a query with the token spanning the line as the or requires both lines, and the needed second whitespace operator is leading on the second line? Could you give me an example, I'm not sure I am following?


def test_blank_space(self):
with self.assertRaises(kql.KqlParseError):
kql.lark_parse('"Test-ServiceDaclPermission" or"Update-ExeFunctions"')
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add failure case with the space missing on the left side and failure case when \n is involved.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a test case where and or or is part of a field name.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport: auto enhancement New feature or request kql related to the kql module patch python Internal python for the repository
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] Missing Spaces Between Logic Operators Does Not Raise Error
4 participants