Mikaayenson
Contributor

@Mikaayenson Mikaayenson commented Sep 3, 2025

Pull Request

Issue link(s):

Summary - What I changed

In #4688, @Samirbous is adding the first multi-dataset query to the repo. His PR uses EQL to correlate across different data sources, one per subquery. This PR refactors the integration validation to support multiple data sources within an EQL sequence query (multiple packages within a single integration, or multiple integrations).

Important

Instead of validating an entire EQL sequence query with a single merged schema, we're now validating subqueries individually with the proper schemas.

To clean up some of the # type: ignore[reportUnknownVariableType] litter, it might be good to use @typing.no_type_check.

The major change in this large refactor: previously, we had several branching conditions and multiple validation calls per query just to double-check validation. Now, for each rule, we build a validation plan by pulling all the schemas it needs, then execute that plan.
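To make the "build a plan, then execute it" idea concrete, here is a minimal sketch. It is not the code in this PR: `ValidationStep`, `ValidationPlan`, `build_plan`, and the package and field names are all hypothetical, and fields are checked with simple set membership rather than the EQL parser.

```python
from dataclasses import dataclass, field


@dataclass
class ValidationStep:
    """One unit of work: a labelled query fragment and the schema it must satisfy."""
    target: str                 # e.g. "subquery 1" (hypothetical label)
    fields_used: set[str]
    schema: dict[str, str]      # field name -> type


@dataclass
class ValidationPlan:
    steps: list[ValidationStep] = field(default_factory=list)

    def execute(self) -> list[str]:
        """Run each step exactly once and collect the errors."""
        errors: list[str] = []
        for step in self.steps:
            unknown = sorted(step.fields_used - set(step.schema))
            errors.extend(f"{step.target}: unknown field {name}" for name in unknown)
        return errors


def build_plan(rule_parts, schemas_by_dataset, base_schema) -> ValidationPlan:
    """Resolve the schema for every fragment up front, before validating anything."""
    plan = ValidationPlan()
    for label, dataset, fields_used in rule_parts:
        # Base fields (ECS-like) plus only the fields of this fragment's dataset.
        schema = {**base_schema, **schemas_by_dataset.get(dataset, {})}
        plan.steps.append(ValidationStep(label, set(fields_used), schema))
    return plan


plan = build_plan(
    rule_parts=[
        ("subquery 1", "pkg_a.audit", {"user.name", "a.operation"}),
        ("subquery 2", "pkg_b.signin", {"user.name", "a.operation"}),
    ],
    schemas_by_dataset={
        "pkg_a.audit": {"a.operation": "keyword"},
        "pkg_b.signin": {"b.risk_level": "keyword"},
    },
    base_schema={"user.name": "keyword"},
)
errors = plan.execute()
print(errors)
```

Separating plan construction from execution means schema selection happens in one place, and each fragment is validated a single time against the schema chosen for it.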

Warning

Making this a minor version bump, as it may break validation for users (we now identify new potential errors).

How To Test

  • Added a new unit test class to validate this, so CI should pass. I also refactored the tests in tests/test_python_library.py for more consistent formatting.
  • Ensure we're not inadvertently breaking sequence validation
  • Ensure we didn't introduce regressions in DaC auto schema generation
  • In testing, I identified several rules now failing validation (often when beats schemas were added to rules but the fields were not present in those beats schemas).
Failing Rules

Each has to be manually checked: double_check_siem_rules.txt

Note

Unit tests will fail until these rules are tuned.

#5072

Additional Context

EQL’s parser accepts a single flat schema per parse. It has no concept of “schema scoped by dataset per subquery.” If you pass the whole sequence with a merged schema, you lose the ability to enforce that each subquery uses only the fields from its own integration/package.

Why not validate once with a merged schema

  • Superset masking: A field from integration A will exist in the merged schema even when you’re in a subquery whose dataset is integration B. The parse will succeed, and you won’t catch the misuse.
  • Type conflicts: Different packages can define the same field name with different types. A merged map can pick one type arbitrarily or last-wins, producing wrong acceptance or wrong errors.
  • Ambiguous errors: Even if you detect an error, you can’t attribute it cleanly to “subquery X vs package Y” because the validation had no subquery boundary.
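The first two pitfalls can be reproduced with plain dictionaries. This is only an illustration: the package names, field names, and types below are invented, and a dict merge stands in for however a merged schema would actually be assembled.

```python
# Hypothetical per-package field maps (field name -> type); names are illustrative only.
package_a = {"a.session.id": "keyword", "shared.count": "long"}
package_b = {"b.actor.name": "keyword", "shared.count": "keyword"}

# A naive merge: later packages overwrite earlier ones on key collisions.
merged = {**package_a, **package_b}

# Superset masking: a subquery scoped to package B that uses a package A field
# still passes against the merged map, although package B never defines it.
field_used_in_b_subquery = "a.session.id"
passes_merged = field_used_in_b_subquery in merged
passes_scoped = field_used_in_b_subquery in package_b

# Type conflict: the merged map silently keeps only the last definition.
merged_type = merged["shared.count"]

print(passes_merged, passes_scoped, merged_type)  # True False keyword
```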

Why per-subquery validation is necessary

  • Flat-schema constraint: EQL validates against one field-type map at a time. To emulate “dataset scoping,” we parse each subquery with only the fields from the dataset’s integration (plus ECS/index/custom as needed).
  • Correctness by construction: If a subquery references a field from another package, it won’t be present in that subquery’s schema, and the parser raises “Unknown field” (or “Field not recognized” with proper trailer).
  • Clear attribution: You get an error bound to the specific subquery and its intended package, which is actionable.

Checklist

  • Added a label for the type of pr: bug, enhancement, schema, maintenance, Rule: New, Rule: Deprecation, Rule: Tuning, Hunt: New, or Hunt: Tuning so guidelines can be generated
  • Added the meta:rapid-merge label if planning to merge within 24 hours
  • Secret and sensitive material has been managed correctly
  • Automated testing was updated or added to match the most common scenarios
  • Documentation and comments were added for features that require explanation

@Mikaayenson Mikaayenson self-assigned this Sep 3, 2025
@Mikaayenson Mikaayenson added the enhancement, test-suite, and python labels Sep 3, 2025
Contributor

github-actions bot commented Sep 3, 2025

Enhancement - Guidelines

These guidelines serve as a reminder of considerations to address when adding a feature to the code.

Documentation and Context

  • Describe the feature enhancement in detail (alternative solutions, description of the solution, etc.) if not already documented in an issue.
  • Include additional context or screenshots.
  • Ensure the enhancement includes necessary updates to the documentation and versioning.

Code Standards and Practices

  • Code follows established design patterns within the repo and avoids duplication.
  • Ensure that the code is modular and reusable where applicable.

Testing

  • New unit tests have been added to cover the enhancement.
  • Existing unit tests have been updated to reflect the changes.
  • Provide evidence of testing and validating the enhancement (e.g., test logs, screenshots).
  • Validate that any rules affected by the enhancement are correctly updated.
  • Ensure that performance is not negatively impacted by the changes.
  • Verify that any release artifacts are properly generated and tested.
  • Conducted system testing, including fleet, import, and create APIs (e.g., run make test-cli, make test-remote-cli, make test-hunting-cli)

Additional Checks

  • Verify that the enhancement works across all relevant environments (e.g., different OS versions).
  • Confirm that the proper version label is applied to the PR patch, minor, major.

@Mikaayenson Mikaayenson marked this pull request as draft September 3, 2025 21:13
@Mikaayenson Mikaayenson marked this pull request as ready for review September 3, 2025 21:41
@terrancedejesus
Contributor

terrancedejesus commented Sep 4, 2025

@Mikaayenson have we tried a rule or query that uses truly separate data sources (separate integrations), like Okta and Azure Activity logs? The rule mentioned uses the Azure integration, with Entra ID Protection logs and Entra ID Audit logs as separate data streams. This is similar to how we correlate Entra ID sign-ins to Microsoft Graph activity here, but that is an ESQL rule. I believe the closest to truly separate data sources is this Okta rule, which looks at Okta system logs and any logs reported by a Windows endpoint, but it does not use event.dataset, so we did not run into this support issue.

@Mikaayenson
Contributor Author

Mikaayenson commented Sep 4, 2025

@terrancedejesus Did you see the unit tests?

@terrancedejesus
Contributor

rgr, thanks for sharing. I see from the testing we do the following:

1 integration: 2+ data streams
2 integrations: 2+ data streams

That covers my question. Thank you!

@Mikaayenson Mikaayenson marked this pull request as draft September 5, 2025 03:47
@Mikaayenson Mikaayenson marked this pull request as ready for review September 6, 2025 09:32
Contributor

@shashank-elastic shashank-elastic left a comment

Verified code changes across backports. Unit tests pass.

@eric-forte-elastic
Contributor

🟢 Tested the rule validation plan and rule execution. The only rule with a target that was not validated was a development rule, which was expected. See details for additional testing.

Details

test_loader_output.txt
testing_validator.patch

Use the following to test:

from detection_rules.rule_loader import RuleCollection

# Load every rule from the default rule directories; rules are validated as they load.
rc = RuleCollection.default()

print("Done")

Contributor

@eric-forte-elastic eric-forte-elastic left a comment

🟢 Peer review, looks good to me! 👍

@shashank-elastic
Contributor

Remote CLI Tests also work

❯ ./detection_rules/etc/test_remote_cli.bash 
Running detection-rules remote CLI tests...
Performing a quick rule alerts search...
Requires .detection-rules-cfg.json credentials file set.
Loaded config file: /Users/shashankks/elastic_workspace/detection-rules/.detection-rules-cfg.json

█▀▀▄ ▄▄▄ ▄▄▄ ▄▄▄ ▄▄▄ ▄▄▄ ▄▄▄ ▄▄▄ ▄   ▄      █▀▀▄ ▄  ▄ ▄   ▄▄▄ ▄▄▄
█  █ █▄▄  █  █▄▄ █    █   █  █ █ █▀▄ █      █▄▄▀ █  █ █   █▄▄ █▄▄
█▄▄▀ █▄▄  █  █▄▄ █▄▄  █  ▄█▄ █▄█ █ ▀▄█      █ ▀▄ █▄▄█ █▄▄ █▄▄ ▄▄█

==================================================================================================================================================
                                                                                      kibana                                                      
                                                                                      alert                                                       
 host                           rule                                                                                                              
 hostname                       name                                                                            status   original_time            
==================================================================================================================================================
 trade-test-local-vm.local      Malicious Behavior Detection Alert: DARKRADIATION Ransomware Infection          active   2025-09-04T04:49:46.901Z 
 e2e-release-test-instance-2    Malicious Behavior Detection Alert: DARKRADIATION Ransomware Infection          active   2025-09-04T04:48:31.526Z 
 e2e-release-windows-server-2   Malicious Behavior Detection Alert: Suspicious Bitsadmin Activity               active   2025-09-04T04:46:55.780Z 
 e2e-release-windows-server-2   Malicious Behavior Detection Alert: Suspicious Microsoft Office Child Process   active   2025-09-04T04:46:55.843Z 
==================================================================================================================================================
Setting Up Custom Directory...
Loaded config file: /Users/shashankks/elastic_workspace/detection-rules/.detection-rules-cfg.json

█▀▀▄ ▄▄▄ ▄▄▄ ▄▄▄ ▄▄▄ ▄▄▄ ▄▄▄ ▄▄▄ ▄   ▄      █▀▀▄ ▄  ▄ ▄   ▄▄▄ ▄▄▄
█  █ █▄▄  █  █▄▄ █    █   █  █ █ █▀▄ █      █▄▄▀ █  █ █   █▄▄ █▄▄
█▄▄▀ █▄▄  █  █▄▄ █▄▄  █  ▄█▄ █▄█ █ ▀▄█      █ ▀▄ █▄▄█ █▄▄ █▄▄ ▄▄█

Created directory: tmp-custom/actions
Created directory: tmp-custom/action_connectors
Created directory: tmp-custom/exceptions
Created directory: tmp-custom/rules
Created directory: tmp-custom/rules_building_block
Created directory: tmp-custom/etc
Created file with default content: tmp-custom/etc/deprecated_rules.json
Created file with default content: tmp-custom/etc/version.lock.json
Created file with default content: tmp-custom/etc/packages.yaml
Created file with default content: tmp-custom/etc/stack-schema-map.yaml
Created file with default content: tmp-custom/etc/test_config.yaml
Created file with default content: tmp-custom/_config.yaml

# For details on how to configure the _config.yaml file,
# consult: /Users/shashankks/elastic_workspace/detection-rules/detection_rules/etc/_config.yaml
# or the docs: /Users/shashankks/elastic_workspace/detection-rules/docs-dev/custom-rules-management.md
Performing a rule conversion from ndjson to toml files...
Loaded config file: /Users/shashankks/elastic_workspace/detection-rules/.detection-rules-cfg.json

█▀▀▄ ▄▄▄ ▄▄▄ ▄▄▄ ▄▄▄ ▄▄▄ ▄▄▄ ▄▄▄ ▄   ▄      █▀▀▄ ▄  ▄ ▄   ▄▄▄ ▄▄▄
█  █ █▄▄  █  █▄▄ █    █   █  █ █ █▀▄ █      █▄▄▀ █  █ █   █▄▄ █▄▄
█▄▄▀ █▄▄  █  █▄▄ █▄▄  █  ▄█▄ █▄█ █ ▀▄█      █ ▀▄ █▄▄█ █▄▄ █▄▄ ▄▄█

[+] Building rule for tmp-custom/rules/test_kql_rule.toml
[+] Building rule for tmp-custom/rules/test_kql_with_alert_supprestion_and_investigation_fileds.toml
[+] Building rule for tmp-custom/rules/test_kql_with_alert_suppression.toml
[+] Building rule for tmp-custom/rules/test_eql_rule.toml
[+] Building rule for tmp-custom/rules/test_esql_rule_with_shared_rule_exception.toml
[+] Building rule for tmp-custom/rules/test_new_terms_rule_with_shared_rule_exception.toml
[+] Building rule for tmp-custom/rules/test_indicator_match_rule_with_email_actions.toml
[+] Building rule for tmp-custom/rules/test_threshold_with_rule_exception.toml
[+] Building rule for tmp-custom/rules/test_machine_learning_rule_with_index_action_connector.toml
[+] Building exception(s) for /Users/shashankks/elastic_workspace/detection-rules/tmp-custom/exceptions/1c8a1378-8f0d-4565-9ae0-abeeaf3981ca_exceptions.toml
[+] Building exception(s) for /Users/shashankks/elastic_workspace/detection-rules/tmp-custom/exceptions/0a4124f8-2074-450b-8689-d7dee319c666_exceptions.toml
[+] Building action connector(s) for /Users/shashankks/elastic_workspace/detection-rules/tmp-custom/action_connectors/e1b418e7-78df-4042-bfb0-1cc5fb6f7a4e_actions.toml
14 results exported
9 rules converted
4 exceptions exported
1 actions connectors exported
Performing a rule import to kibana...
Loaded config file: /Users/shashankks/elastic_workspace/detection-rules/.detection-rules-cfg.json

█▀▀▄ ▄▄▄ ▄▄▄ ▄▄▄ ▄▄▄ ▄▄▄ ▄▄▄ ▄▄▄ ▄   ▄      █▀▀▄ ▄  ▄ ▄   ▄▄▄ ▄▄▄
█  █ █▄▄  █  █▄▄ █    █   █  █ █ █▀▄ █      █▄▄▀ █  █ █   █▄▄ █▄▄
█▄▄▀ █▄▄  █  █▄▄ █▄▄  █  ▄█▄ █▄█ █ ▀▄█      █ ▀▄ █▄▄█ █▄▄ █▄▄ ▄▄█

9 rule(s) successfully imported
 - eql-outbound-rundll32-connections
 - 7e0f6dae-5847-465f-89e9-a6de0e9ef918
 - 4c589d81-2622-4036-8cc7-372ea8f0e038
 - process_started_by_ms_office_program
 - process_started_by_ms_office_program_supression
 - 742feb36-ac4c-45e0-b8a5-3b3cfa66b6d2
 - ml_linux_network_high_threshold
 - 2390c9dd-ad90-4af6-97a4-1d607ba0f092
 - liv-win-ser-logins
Performing a rule export...
Loaded config file: /Users/shashankks/elastic_workspace/detection-rules/.detection-rules-cfg.json

█▀▀▄ ▄▄▄ ▄▄▄ ▄▄▄ ▄▄▄ ▄▄▄ ▄▄▄ ▄▄▄ ▄   ▄      █▀▀▄ ▄  ▄ ▄   ▄▄▄ ▄▄▄
█  █ █▄▄  █  █▄▄ █    █   █  █ █ █▀▄ █      █▄▄▀ █  █ █   █▄▄ █▄▄
█▄▄▀ █▄▄  █  █▄▄ █▄▄  █  ▄█▄ █▄█ █ ▀▄█      █ ▀▄ █▄▄█ █▄▄ █▄▄ ▄▄█

14 results exported
9 rules converted
2 exceptions exported
1 action connectors exported
9 rules saved to tmp-custom
2 exception lists saved to /Users/shashankks/elastic_workspace/detection-rules/tmp-custom/exceptions
1 action connectors saved to /Users/shashankks/elastic_workspace/detection-rules/tmp-custom/action_connectors
Removing generated files...
Detection-rules Remote CLI tests completed!

detection-rules on  support-multidatasource-eql-integration-queries [$?] is 📦 v1.4.0 via 🐍 v3.12.8 (.venv) on ☁️  [email protected] took 19s 
❯ 

@Mikaayenson Mikaayenson merged commit f0f7d21 into main Sep 10, 2025
17 of 20 checks passed
@Mikaayenson Mikaayenson deleted the support-multidatasource-eql-integration-queries branch September 10, 2025 18:11
Labels
backport: auto, enhancement, minor, python, schema, test-suite
Development

Successfully merging this pull request may close these issues.

[Bug] Integration Validation Missing Dataset Specific Schemas
[Bug] EQL Sequence Multi-Data Source Schema Validation
6 participants