[ENH] Implementation of Extended Isolation Forest (EIF) anomaly detector #2679

Open
wants to merge 6 commits into
base: main

Conversation

Akhil-Jasson
Contributor

@Akhil-Jasson Akhil-Jasson commented Mar 23, 2025

This PR implements the Extended Isolation Forest (EIF) algorithm.

Reference Issues/PRs

Fixes #2113

What does this implement/fix? Explain your changes.

Does your contribution introduce a new dependency? If yes, which one?

Yes, it introduces H2O.ai as a new dependency.

Any other comments?

PR checklist

For all contributions
  • I've added myself to the list of contributors. Alternatively, you can use the @all-contributors bot to do this for you after the PR has been merged.
  • The PR title starts with either [ENH], [MNT], [DOC], [BUG], [REF], [DEP] or [GOV] indicating whether the PR topic is related to enhancement, maintenance, documentation, bugs, refactoring, deprecation or governance.
For new estimators and functions
  • I've added the estimator/function to the online API documentation.
  • (OPTIONAL) I've added myself as a __maintainer__ at the top of relevant files and want to be contacted regarding its maintenance. Unmaintained files may be removed. This is for the full file, and you should not add yourself if you are just making minor changes or do not want to help maintain its contents.
For developers with write access
  • (OPTIONAL) I've updated aeon's CODEOWNERS to receive notifications about future changes to these files.

@aeon-actions-bot
Contributor

Thank you for contributing to aeon

I did not find any labels to add based on the title. Please add the [ENH], [MNT], [BUG], [DOC], [REF], [DEP] and/or [GOV] tags to your pull requests titles. For now you can add the labels manually.
I have added the following labels to this PR based on the changes made: [ anomaly detection ]. Feel free to change these if they do not properly represent the PR.

The Checks tab will show the status of our automated tests. You can click on individual test runs in the tab or "Details" in the panel below to see more information if there is a failure.

If our pre-commit code quality check fails, any trivial fixes will automatically be pushed to your PR unless it is a draft.

Don't hesitate to ask questions on the aeon Slack channel if you have any.

PR CI actions

These checkboxes will add labels to enable/disable CI functionality for this PR. This may not take effect immediately, and a new commit may be required to run the new configuration.

  • Run pre-commit checks for all files
  • Run mypy typecheck tests
  • Run all pytest tests and configurations
  • Run all notebook example tests
  • Run numba-disabled codecov tests
  • Stop automatic pre-commit fixes (always disabled for drafts)
  • Disable numba cache loading
  • Push an empty commit to re-run CI checks

@MatthewMiddlehurst
Member

Please fill out the template and use the correct title format.

@MatthewMiddlehurst MatthewMiddlehurst added the enhancement New feature, improvement request or other non-bug code enhancement label Mar 23, 2025
@Akhil-Jasson Akhil-Jasson changed the title 2113 Implementation of Extended Isolation Forest (EIF) anomaly detector [ENH] Implementation of Extended Isolation Forest (EIF) anomaly detector Mar 23, 2025
@Akhil-Jasson
Contributor Author

I've fixed the title format. Shall I proceed with adding EIF to the online API documentation?

@Ramana-Raja

Ramana-Raja commented Mar 24, 2025

Hi @Akhil-Jasson,

I just saw your code, and it looks great! That said, I don’t think using H2O is the best approach since Aeon doesn’t rely on it. It might be better to have our own implementation instead.

A few updates to consider:

1. Could you update the section "Does your contribution introduce a new dependency?" and mention H2O there?

2. The test cases seem to be missing. Could you add them?

3. Instead of importing the entire H2O module, it’s better to import only what’s needed to keep things lightweight.

@MatthewMiddlehurst
Member

New dependencies should be put in pyproject.toml, otherwise this won't be tested. There are still bits missing from the template.

@Akhil-Jasson
Contributor Author

I've added the h2o dependency to pyproject.toml, but I'm encountering errors when running the test files. The test attempts to import aeon.anomaly_detection._eif but fails, indicating that the module doesn’t exist yet.

Is there a step I'm missing for adding new modules to aeon? What could be the possible issue?

@MatthewMiddlehurst
Member

Your import is incorrect.

Member

@SebastianSchmidl SebastianSchmidl left a comment

H2O looks like a massive package. Do we actually want to include it as a dependency? The issue clearly states that we are looking for an implementation in aeon directly.
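
For orientation, a dependency-free version could look roughly like the NumPy sketch below. It is not the code in this PR and not an aeon API: the function names (`eif_scores`, `_build_tree`, `_path_length`, `_c`) and all defaults are hypothetical, and it assumes a 2D input with at least two samples. Each node splits on a random hyperplane whose normal vector has `d - extension_level - 1` entries zeroed, with the intercept drawn from the bounding box of the node's data.

```python
import numpy as np


def _c(n: int) -> float:
    """Average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (np.log(n - 1) + np.euler_gamma) - 2.0 * (n - 1) / n


def _build_tree(X, depth, max_depth, extension_level, rng):
    n, d = X.shape
    if depth >= max_depth or n <= 1:
        return {"size": n}  # leaf: remember how many points ended up here
    # Random hyperplane normal; zero out (d - extension_level - 1) entries.
    normal = rng.normal(size=d)
    n_zero = d - extension_level - 1
    if n_zero > 0:
        normal[rng.choice(d, size=n_zero, replace=False)] = 0.0
    # Intercept point drawn uniformly from the bounding box of this node.
    point = rng.uniform(X.min(axis=0), X.max(axis=0))
    left = (X - point) @ normal <= 0
    return {
        "normal": normal,
        "point": point,
        "left": _build_tree(X[left], depth + 1, max_depth, extension_level, rng),
        "right": _build_tree(X[~left], depth + 1, max_depth, extension_level, rng),
    }


def _path_length(x, node, depth=0):
    if "size" in node:
        return depth + _c(node["size"])
    child = "left" if (x - node["point"]) @ node["normal"] <= 0 else "right"
    return _path_length(x, node[child], depth + 1)


def eif_scores(X, n_trees=100, sample_size=256, extension_level=None, seed=0):
    """Return anomaly scores in (0, 1); higher means more anomalous."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    if extension_level is None:
        extension_level = d - 1  # fully extended: no entries zeroed
    rng = np.random.default_rng(seed)
    psi = min(sample_size, n)
    max_depth = int(np.ceil(np.log2(psi)))
    trees = [
        _build_tree(
            X[rng.choice(n, size=psi, replace=False)],
            0,
            max_depth,
            extension_level,
            rng,
        )
        for _ in range(n_trees)
    ]
    mean_depth = np.array([np.mean([_path_length(x, t) for t in trees]) for x in X])
    return 2.0 ** (-mean_depth / _c(psi))
```

With `extension_level=0` the splits become axis-parallel, which corresponds to the standard isolation forest behaviour. A production version would presumably avoid the Python-level recursion, for example with numba.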

I know this is not mentioned in the corresponding issue, but I think it makes sense to work with sliding windows in EIF as well. We can always get the original behavior back by setting the window size to 1.
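
A minimal sketch of the windowing idea, using only NumPy. The helper name `to_windows` and its `stride` parameter are hypothetical, and the input is assumed to have shape `(n_timepoints, n_channels)`; aeon may already provide a utility for this that should be preferred.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def to_windows(X, window_size=1, stride=1):
    """Turn a (n_timepoints, n_channels) series into flattened windows."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # treat a 1D series as a single channel
    # Shape after the view: (n_windows, n_channels, window_size).
    windows = sliding_window_view(X, window_size, axis=0)[::stride]
    # One row per window, channel-major, ready for a tabular detector.
    return windows.reshape(windows.shape[0], -1)
```

With `window_size=1` every row is a single time point, so the detector sees exactly the original data. Per-window scores would still need to be mapped back to per-point scores, which this sketch leaves out.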

self.contamination = contamination
self.extension_level = extension_level
self.random_state = random_state
self.scaler = StandardScaler()
Member

scaler is not a parameter, so cannot be instantiated here. Please read the sklearn estimator development docs on how to name and where to place parameters, fitted and non-fitted attributes, etc.
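
The convention referred to here can be sketched as follows, assuming plain scikit-learn rather than aeon's base classes. The class name `SketchDetector` and the `standardize` flag are hypothetical. `__init__` only stores its arguments unchanged, and anything learned from data is created in `fit` with a trailing underscore.

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.preprocessing import StandardScaler


class SketchDetector(BaseEstimator):
    """Hypothetical estimator illustrating parameter placement only."""

    def __init__(
        self,
        contamination=0.1,
        extension_level=None,
        random_state=None,
        standardize=False,
    ):
        # Only store constructor arguments, unmodified and under the same name.
        self.contamination = contamination
        self.extension_level = extension_level
        self.random_state = random_state
        self.standardize = standardize

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # Fitted attributes are created here and end with an underscore.
        self.scaler_ = StandardScaler().fit(X) if self.standardize else None
        self.n_features_in_ = X.shape[1]
        return self
```

Keeping `__init__` free of other objects is what lets `get_params`, `set_params`, and `sklearn.base.clone` work as expected.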

Comment on lines +96 to +98
# Fit the scaler
self.scaler.fit(X)
X_scaled = self.scaler.transform(X)
Member

The paper does not explicitly mention that scaling the data provides better results. In anomaly detection scaling might hide some types of anomalies. Why do you include it?


return self

def _predict(self, X) -> np.ndarray:
Member

To make the usage of EIF similar to our other models, we want it to be usable as a semi-supervised (as implemented already) and an unsupervised algorithm. The current implementation of _predict does not allow that.
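
The two usage modes could be sketched like this. The class `TwoModeDetector` is hypothetical and does not use aeon's base class. scikit-learn's `IsolationForest` stands in for EIF here, purely so that the example runs.

```python
import numpy as np
from sklearn.ensemble import IsolationForest


class TwoModeDetector:
    """Hypothetical wrapper; IsolationForest is a stand-in for EIF."""

    def __init__(self, random_state=None):
        self.random_state = random_state

    def fit(self, X):
        # Semi-supervised mode: learn from (mostly) normal reference data.
        self.model_ = IsolationForest(random_state=self.random_state).fit(X)
        return self

    def predict(self, X):
        # score_samples is lower for anomalies, so flip the sign:
        # higher output means more anomalous.
        return -self.model_.score_samples(X)

    def fit_predict(self, X):
        # Unsupervised mode: fit and score the same series in one call.
        return self.fit(X).predict(X)
```

In the semi-supervised case the caller runs `fit(train)` and then `predict(test)`. In the unsupervised case a single `fit_predict(test)` is enough and no separate training data is needed.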

Labels
anomaly detection Anomaly detection package enhancement New feature, improvement request or other non-bug code enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ENH] Implement Extended Isolation Forest
4 participants