[FR] Refactor Schema Validation & Support Multi-Dataset Sequence Validation #5059
Conversation
Enhancement - Guidelines
These guidelines serve as a reminder set of considerations when adding a feature to the code.
- Documentation and Context
- Code Standards and Practices
- Testing
- Additional Checks
@Mikaayenson have we tried a rule or query that spans truly separate data sources (separate integrations), like Okta and Azure Activity logs? The rule mentioned is the Azure integration, but with Entra ID Protection logs and Entra ID Audit logs as separate data streams. Similar to how we correlate Entra ID Sign-ins to Microsoft Graph activity here, but that is an ES|QL rule. The closest I believe to truly separate data sources is this Okta rule, which looks at Okta system logs and any logs reported by a Windows endpoint, but does not use
@terrancedejesus Did you see the unit tests?

rgr, thanks for sharing. I see from the testing we do the following: 1 integration : 2+ datastreams. That covers my question. Thank you!
Verified code changes across backports. Unit tests pass.
🟢 Tested the rule validation plan and rule execution. The only rule with a target that was not validated was a development rule, which was expected. See details for additional testing.

Details: test_loader_output.txt

Use the following to test:

```python
from detection_rules.rule_loader import RuleCollection

rc = RuleCollection.default()
print("Done")
```
🟢 Peer review, looks good to me! 👍
Remote CLI Tests also work
Pull Request
Issue link(s):
Summary - What I changed
In #4688, @Samirbous is adding the first multi-dataset query to the repo. His PR leverages EQL to correlate across different data sources per subquery. This PR refactors the integration validation to support multiple data sources used within an EQL sequence query (multiple packages within a single integration, or multiple integrations).
Important
Instead of validating an entire EQL sequence query with a single merged schema, we now validate subqueries individually with the proper schemas.
To clean up some of the
# type: ignore[reportUnknownVariableType]
litter, it might be good to use @typing.no_type_check
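As a small illustration (the function name here is hypothetical), `typing.no_type_check` marks a function so type checkers skip its body entirely, which removes the need for per-line ignore comments:

```python
import typing

@typing.no_type_check
def load_raw_schemas(path):
    # Type checkers skip this body entirely, so per-line
    # "# type: ignore[...]" comments are no longer needed.
    data = {"path": path}
    return data

# The decorator sets a marker attribute that checkers look for.
print(load_raw_schemas.__no_type_check__)
```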
As part of this large refactor, the major change is in how validation is orchestrated: previously, we had several branching conditions and multiple validation calls per query just to double-check validation. Now, for each rule, we build a validation plan by pulling all the right schemas needed, then execute that plan.
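A minimal sketch of the plan-then-execute shape described above (class names and the `extract_fields` helper are hypothetical, not the actual detection-rules implementation):

```python
import re
from dataclasses import dataclass, field

@dataclass
class ValidationStep:
    """One subquery paired with the schema for its own integration."""
    subquery: str
    schema: dict  # flat field -> type mapping

@dataclass
class ValidationPlan:
    steps: list = field(default_factory=list)

    def execute(self):
        """Run every step; collect errors instead of stopping early."""
        errors = []
        for step in self.steps:
            for fld in extract_fields(step.subquery):
                if fld not in step.schema:
                    errors.append(f"{fld!r} not in schema for: {step.subquery}")
        return errors

def extract_fields(subquery):
    # Hypothetical helper: pull dotted field names out of a subquery.
    return re.findall(r"\b[a-z_]+(?:\.[a-z_]+)+\b", subquery)

plan = ValidationPlan(steps=[
    ValidationStep('[process where process.name == "x"]', {"process.name": "keyword"}),
    ValidationStep('[network where destination.ip == "y"]', {"destination.ip": "ip"}),
])
print(plan.execute())  # no errors: each subquery matches its own schema
```

Building the full plan first means every subquery's errors surface in one pass, instead of bailing out on the first failed validation call.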
Warning
Bumping this to a minor version as it may break validation for users (now that we're identifying new potential errors).
How To Test
Failing Rules
Each has to be manually checked: double_check_siem_rules.txt
Note
Unit tests will fail until these rules are tuned.
#5072
Additional Context
EQL’s parser accepts a single flat schema per parse. It has no concept of “schema scoped by dataset per subquery.” If you pass the whole sequence with a merged schema, you lose the ability to enforce that each subquery uses only the fields from its own integration/package.
Why not validate once with a merged schema
Why per-subquery validation is necessary
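The difference can be shown concretely (field names are hypothetical): a merged schema silently accepts an Okta field inside an Azure subquery, while dataset-scoped validation flags it.

```python
# Hypothetical flat schemas for two separate integrations.
okta_schema = {"okta.event_type"}
azure_schema = {"azure.activitylogs.operation_name"}

# Each subquery tagged with the integration it targets; the second
# one mistakenly references an Okta field in an Azure subquery.
subqueries = [
    ("okta", ["okta.event_type"]),
    ("azure", ["okta.event_type"]),  # bug: wrong integration's field
]

merged = okta_schema | azure_schema
scoped = {"okta": okta_schema, "azure": azure_schema}

merged_errors = [f for _, fields in subqueries for f in fields if f not in merged]
scoped_errors = [f for src, fields in subqueries for f in fields if f not in scoped[src]]

print(merged_errors)  # the merged schema misses the cross-integration bug
print(scoped_errors)  # per-subquery schemas catch it
```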
Checklist
- Added one of the bug, enhancement, schema, maintenance, Rule: New, Rule: Deprecation, Rule: Tuning, Hunt: New, or Hunt: Tuning labels so guidelines can be generated
- Added the meta:rapid-merge label if planning to merge within 24 hours