[FR] Refactor Schema Validation & Support Multi-Dataset Sequence Validation #5059
Conversation
Enhancement - Guidelines
These guidelines serve as a reminder set of considerations when adding a feature to the code.
- Documentation and Context
- Code Standards and Practices
- Testing
- Additional Checks
@Mikaayenson have we tried a rule or query that spans truly separate data sources (separate integrations), like Okta and Azure Activity logs? The rule mentioned is the Azure integration, but with Entra ID Protection logs and Entra ID Audit logs as separate data streams. Similar to how we correlate Entra ID Sign-ins to Microsoft Graph activity here, but that is an ES|QL rule. The closest I believe to truly separate data sources is this Okta rule, which looks at Okta system logs and any logs reported by a Windows endpoint, but does not use
@terrancedejesus Did you see the unit tests?

rgr, thanks for sharing. I see from the testing we do the following: 1 integration : 2+ datastreams. That covers my question. Thank you!
Verified code changes across backports. Unit tests pass.
🟢 Tested the rule validation plan and rule execution. The only rule with a target that was not validated was a development rule, which was expected. See details for additional testing.

Details: test_loader_output.txt

Use the following to test:

```python
from detection_rules.rule_loader import RuleCollection

rc = RuleCollection.default()
print("Done")
```
🟢 Peer review, looks good to me! 👍
Remote CLI Tests also work
Pull Request
Issue link(s):
Summary - What I changed
In #4688, @Samirbous is adding the first multi-dataset query to the repo. His PR leverages EQL to correlate across different data sources per subquery. This PR refactors the integration validation to support multiple data sources used within an EQL sequence query (multiple packages within a single integration, or multiple integrations).
Important
Instead of validating an entire EQL sequence query with a single merged schema, we now validate subqueries individually with the proper schemas.
To clean up some of the
# type: ignore[reportUnknownVariableType]
litter, it might be good to use @typing.no_type_check
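As a small illustration (the function name here is hypothetical), `typing.no_type_check` marks a function so type checkers skip its body entirely, which removes the need for per-line ignore comments:

```python
import typing

@typing.no_type_check
def load_raw_schemas(path):
    # Type checkers skip this body entirely, so per-line
    # "# type: ignore[...]" comments are no longer needed.
    data = {"path": path}
    return data

# The decorator sets a marker attribute that checkers look for.
print(load_raw_schemas.__no_type_check__)
```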
As part of this large refactor, the major change is in how validation is orchestrated: previously, we had several branching conditions and multiple validation calls per query just to double-check validation. Now, for each rule, we build a validation plan by pulling all the right schemas needed, then execute that plan.
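A minimal sketch of the plan-then-execute shape described above (class names and the `extract_fields` helper are hypothetical, not the actual detection-rules implementation):

```python
import re
from dataclasses import dataclass, field

@dataclass
class ValidationStep:
    """One subquery paired with the schema for its own integration."""
    subquery: str
    schema: dict  # flat field -> type mapping

@dataclass
class ValidationPlan:
    steps: list = field(default_factory=list)

    def execute(self):
        """Run every step; collect errors instead of stopping early."""
        errors = []
        for step in self.steps:
            for fld in extract_fields(step.subquery):
                if fld not in step.schema:
                    errors.append(f"{fld!r} not in schema for: {step.subquery}")
        return errors

def extract_fields(subquery):
    # Hypothetical helper: pull dotted field names out of a subquery.
    return re.findall(r"\b[a-z_]+(?:\.[a-z_]+)+\b", subquery)

plan = ValidationPlan(steps=[
    ValidationStep('[process where process.name == "x"]', {"process.name": "keyword"}),
    ValidationStep('[network where destination.ip == "y"]', {"destination.ip": "ip"}),
])
print(plan.execute())  # no errors: each subquery matches its own schema
```

Building the full plan first means every subquery's errors surface in one pass, instead of bailing out on the first failed validation call.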
Warning
Bumping this to a minor version as it may break validation for users (now that we're identifying new potential errors).
How To Test
Failing Rules
Each has to be manually checked: double_check_siem_rules.txt
Note
Unit tests will fail until these rules are tuned.
#5072
Additional Context
EQL’s parser accepts a single flat schema per parse. It has no concept of “schema scoped by dataset per subquery.” If you pass the whole sequence with a merged schema, you lose the ability to enforce that each subquery uses only the fields from its own integration/package.
Why not validate once with a merged schema
Why per-subquery validation is necessary
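The difference can be shown concretely (field names are hypothetical): a merged schema silently accepts an Okta field inside an Azure subquery, while dataset-scoped validation flags it.

```python
# Hypothetical flat schemas for two separate integrations.
okta_schema = {"okta.event_type"}
azure_schema = {"azure.activitylogs.operation_name"}

# Each subquery tagged with the integration it targets; the second
# one mistakenly references an Okta field in an Azure subquery.
subqueries = [
    ("okta", ["okta.event_type"]),
    ("azure", ["okta.event_type"]),  # bug: wrong integration's field
]

merged = okta_schema | azure_schema
scoped = {"okta": okta_schema, "azure": azure_schema}

merged_errors = [f for _, fields in subqueries for f in fields if f not in merged]
scoped_errors = [f for src, fields in subqueries for f in fields if f not in scoped[src]]

print(merged_errors)  # the merged schema misses the cross-integration bug
print(scoped_errors)  # per-subquery schemas catch it
```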
Checklist
- Added one of the bug, enhancement, schema, maintenance, Rule: New, Rule: Deprecation, Rule: Tuning, Hunt: New, or Hunt: Tuning labels so guidelines can be generated
- Added the meta:rapid-merge label if planning to merge within 24 hours