Multivariate detector #52
Merged: abaranov25 merged 20 commits into sintel-dev:master from abaranov25:Multivariate-Detector on Mar 12, 2026.
| Commit | Message |
|---|---|
| 088592f | Added multivariate detector pipeline with formatting methods |
| df34b4d | Add verbose flag to formatting methods and clean up comments |
| 24c96d1 | Added multi-step-ahead predictions and disentangled this branch from … |
| 317620e | Addressing comments for PR |
| 8159622 | Addressing PR Comments |
| f4ea7f1 | Tutorial Notebook + trunc behavior |
| cf4c192 | Added multivariate dataset to tutorial notebook |
| 8864f14 | Fixed lints |
| 6468a71 | Merge remote-tracking branch 'origin/master' into Multivariate-Detector |
| 46ed9ce | Added unit tests |
| 6ef46d7 | Fixed lints |
| 7cb05ba | Removing unrelated tutorial from PR |
| d5653cf | Addressing PR comments: |
| 57b5325 | Addressing PR comments |
| b998546 | restoring detector pipeline |
| 2c015f0 | Fixing lint issues |
| 056ffa9 | Ran tutorial notebook to completion |
| 36b82b3 | Slight change to docstrings |
| 0f6370b | Addressing comments |
| bf3825e | fix lint |
sigllm/pipelines/detector/multivariate_mistral_detector_jsonformat.json (116 additions, 0 deletions)
@@ -0,0 +1,116 @@

```json
{
    "primitives": [
        "mlstars.custom.timeseries_preprocessing.time_segments_aggregate",
        "sklearn.impute.SimpleImputer",
        "sigllm.primitives.transformation.Float2Scalar",
        "mlstars.custom.timeseries_preprocessing.rolling_window_sequences",
        "sigllm.primitives.formatting.json_format.format_as_string",
        "sigllm.primitives.forecasting.huggingface.HF",
        "sigllm.primitives.formatting.json_format.format_as_integer",
        "sigllm.primitives.transformation.Scalar2Float",
        "sigllm.primitives.transformation.Scalar2Float",
        "sigllm.primitives.postprocessing.aggregate_rolling_window",
        "numpy.reshape",
        "orion.primitives.timeseries_errors.regression_errors",
        "orion.primitives.timeseries_anomalies.find_anomalies"
    ],
    "init_params": {
        "mlstars.custom.timeseries_preprocessing.time_segments_aggregate#1": {
            "time_column": "timestamp",
            "interval": 21600,
            "method": "mean"
        },
        "sigllm.primitives.transformation.Float2Scalar#1": {
            "decimal": 2,
            "rescale": true
        },
        "mlstars.custom.timeseries_preprocessing.rolling_window_sequences#1": {
            "target_column": 0,
            "window_size": 140,
            "target_size": 1,
            "step_size": 1
        },
        "sigllm.primitives.forecasting.huggingface.HF#1": {
            "name": "mistralai/Mistral-7B-Instruct-v0.2",
            "steps": 5,
            "multivariate_allowed_symbols": [
                "d",
                ":",
                ","
            ]
        },
        "sigllm.primitives.formatting.json_format.format_as_integer#1": {
            "trunc": 1,
            "target_column": 0
        },
        "sigllm.primitives.postprocessing.aggregate_rolling_window#1": {
            "agg": "median"
        },
        "orion.primitives.timeseries_anomalies.find_anomalies#1": {
            "window_size_portion": 0.3,
            "window_step_size_portion": 0.1,
            "fixed_threshold": true
        }
    },
    "input_names": {
        "sigllm.primitives.transformation.Float2Scalar#1": {
            "X": "y"
        },
        "mlstars.custom.timeseries_preprocessing.rolling_window_sequences#1": {
            "X": "y_scaled"
        },
        "sigllm.primitives.formatting.json_format.format_as_integer#1": {
            "X": "y_hat"
        },
        "sigllm.primitives.transformation.Scalar2Float#1": {
            "X": "y_hat",
            "minimum": "minimum",
            "decimal": "decimal"
        },
        "sigllm.primitives.transformation.Scalar2Float#2": {
            "X": "y",
            "minimum": "minimum",
            "decimal": "decimal"
        },
        "sigllm.primitives.postprocessing.aggregate_rolling_window#1": {
            "y": "y_hat"
        },
        "numpy.reshape#1": {
            "X": "y_hat"
        },
        "orion.primitives.timeseries_anomalies.find_anomalies#1": {
            "index": "target_index"
        }
    },
    "output_names": {
        "sklearn.impute.SimpleImputer#1": {
            "X": "y"
        },
        "sigllm.primitives.transformation.Float2Scalar#1": {
            "X": "y_scaled",
            "minimum": "minimum",
            "decimal": "decimal"
        },
        "sigllm.primitives.forecasting.huggingface.HF#1": {
            "y": "y_hat"
        },
        "sigllm.primitives.formatting.json_format.format_as_integer#1": {
            "X": "y_hat"
        },
        "sigllm.primitives.transformation.Scalar2Float#1": {
            "X": "y_hat"
        },
        "sigllm.primitives.transformation.Scalar2Float#2": {
            "X": "y"
        },
        "sigllm.primitives.postprocessing.aggregate_rolling_window#1": {
            "y": "y_hat"
        },
        "numpy.reshape#1": {
            "X": "y_hat"
        },
        "orion.primitives.timeseries_anomalies.find_anomalies#1": {
            "y": "anomalies"
        }
    }
}
```
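Because `Scalar2Float` appears twice in `primitives`, the config addresses the two copies as `Scalar2Float#1` and `Scalar2Float#2`. A small sanity check for this kind of file is sketched below, assuming the MLBlocks convention (which the keys above follow) that the n-th occurrence of a primitive is named `<primitive>#<n>`. `block_names` and `unknown_blocks` are hypothetical helpers for illustration, not part of sigllm or MLBlocks.

```python
from collections import Counter


def block_names(primitives):
    """Name blocks as '<primitive>#<n>', where n counts repeats of the same primitive."""
    counts = Counter()
    names = []
    for primitive in primitives:
        counts[primitive] += 1
        names.append(f'{primitive}#{counts[primitive]}')
    return names


def unknown_blocks(pipeline):
    """Return (section, key) pairs whose key does not name any block in the pipeline."""
    valid = set(block_names(pipeline['primitives']))
    return [
        (section, key)
        for section in ('init_params', 'input_names', 'output_names')
        for key in pipeline.get(section, {})
        if key not in valid
    ]


# Excerpt of the pipeline above: the full primitive list plus a few
# block-specific entries (values trimmed for brevity).
pipeline = {
    'primitives': [
        'mlstars.custom.timeseries_preprocessing.time_segments_aggregate',
        'sklearn.impute.SimpleImputer',
        'sigllm.primitives.transformation.Float2Scalar',
        'mlstars.custom.timeseries_preprocessing.rolling_window_sequences',
        'sigllm.primitives.formatting.json_format.format_as_string',
        'sigllm.primitives.forecasting.huggingface.HF',
        'sigllm.primitives.formatting.json_format.format_as_integer',
        'sigllm.primitives.transformation.Scalar2Float',
        'sigllm.primitives.transformation.Scalar2Float',
        'sigllm.primitives.postprocessing.aggregate_rolling_window',
        'numpy.reshape',
        'orion.primitives.timeseries_errors.regression_errors',
        'orion.primitives.timeseries_anomalies.find_anomalies',
    ],
    'init_params': {
        'sigllm.primitives.forecasting.huggingface.HF#1': {'steps': 5},
    },
    'input_names': {
        'sigllm.primitives.transformation.Scalar2Float#1': {'X': 'y_hat'},
        'sigllm.primitives.transformation.Scalar2Float#2': {'X': 'y'},
    },
}

print(unknown_blocks(pipeline))  # []
```

To check the full file, load it with `json.load` and pass the resulting dict to `unknown_blocks`; a typo such as `Scalar2Float#3` would show up as `('input_names', 'sigllm.primitives.transformation.Scalar2Float#3')`.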
@@ -0,0 +1,19 @@

```python
"""Multivariate formatting methods for time series data."""

from sigllm.primitives.formatting.multivariate_formatting import MultivariateFormattingMethod
from sigllm.primitives.formatting.json_format import JSONFormat
from sigllm.primitives.formatting.univariate_control import UnivariateControl
from sigllm.primitives.formatting.persistence_control import PersistenceControl
from sigllm.primitives.formatting.value_concatenation import ValueConcatenation
from sigllm.primitives.formatting.value_interleave import ValueInterleave
from sigllm.primitives.formatting.digit_interleave import DigitInterleave

__all__ = [
    'MultivariateFormattingMethod',
    'JSONFormat',
    'UnivariateControl',
    'PersistenceControl',
    'ValueConcatenation',
    'ValueInterleave',
    'DigitInterleave',
]
```
@@ -0,0 +1,103 @@

```python
import numpy as np

from sigllm.primitives.formatting.multivariate_formatting import MultivariateFormattingMethod


class DigitInterleave(MultivariateFormattingMethod):
    """Formatting method that interleaves digits from multiple values."""

    def __init__(self, verbose: bool = False, **kwargs):
        super().__init__('digit_interleave', verbose=verbose, **kwargs)

    def format_as_string(
        self, X: np.ndarray, digits_per_timestamp=3, separator=',', **kwargs
    ) -> list[str]:
        """Format each window as a string of timestamps with interleaved digits."""
        max_digits = max(len(str(abs(int(v)))) for window in X for ts in window for v in ts)
        width_used = max(digits_per_timestamp, max_digits)
        self.metadata['width_used'] = width_used

        def interleave_digits(timestamp):
            str_values = [str(int(val)) for val in timestamp]
            padded_values = [s.zfill(width_used) for s in str_values]
            result_str = ''
            for digit_pos in range(width_used):
                for padded_val in padded_values:
                    result_str += padded_val[digit_pos]

            return result_str

        result = [
            separator.join(interleave_digits(timestamp) for timestamp in window) + separator
            for window in X
        ]
        return result

    def format_as_integer(
        self,
        X: list[list[str]],
        separator=',',
        trunc=None,
        digits_per_timestamp=3,
        target_column=None,
        **kwargs,
    ) -> np.ndarray:
        """Parse interleaved digit strings back to integer arrays for the target column.

        Args:
            X (list[list[str]]):
                For each window, a list of generated samples; each sample is a string
                of interleaved-digit timestamps joined by ``separator``.
            separator (str):
                Separator between timestamps.
            trunc (int):
                Number of timestamps to extract from each sample.
                If None, all timestamps are extracted.
            digits_per_timestamp (int):
                Unused here; the width stored in ``self.metadata['width_used']`` is used.
            target_column (int):
                Which column to extract. If None, ``self.config`` is checked, then 0.

        Returns:
            np.ndarray:
                Object array holding the int values of the target column
                for each sample in each window.
        """
        width_used = self.metadata['width_used']
        if target_column is None:
            target_column = self.config.get('target_column', 0)

        def deinterleave_timestamp_target_column(interleaved_str):
            """Convert interleaved digits back to original values and extract target dimension."""
            total_digits = len(interleaved_str)
            num_values = total_digits // width_used

            if target_column >= num_values:
                return np.array([None])

            value_digits = []
            for digit_pos in range(width_used):
                pos = digit_pos * num_values + target_column
                if pos < total_digits:
                    value_digits.append(interleaved_str[pos])

            if value_digits:
                return np.array([int(''.join(value_digits))])
            return np.array([None])

        result = np.array(
            [
                [
                    deinterleave_timestamp_target_column(timestamp)
                    for sample in entry
                    for timestamp in sample
                    .lstrip(separator)
                    .rstrip(separator)
                    .split(separator)[:trunc]
                    if timestamp.strip()
                ]
                for entry in X
            ],
            dtype=object,
        )
        return result
```
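The digit layout used by `DigitInterleave` can be checked in isolation. The sketch below re-implements the same idea as plain functions so it runs without sigllm installed; `interleave`, `deinterleave`, `width_for`, `format_window`, and `parse_window` are hypothetical names that mirror the logic of `interleave_digits`, `deinterleave_timestamp_target_column`, and the width computation above, not the class's actual API. Each value is zero-padded to the common width, then the first digit of every value is emitted, then the second digit of every value, and so on, so digit `k` of column `c` lands at index `k * num_values + c`.

```python
def width_for(windows, digits_per_timestamp=3):
    """Padding width: the larger of the requested width and the longest value."""
    longest = max(len(str(abs(int(v)))) for window in windows for ts in window for v in ts)
    return max(digits_per_timestamp, longest)


def interleave(values, width):
    """Zero-pad each value, then emit digit 0 of every value, digit 1 of every value, ..."""
    padded = [str(int(v)).zfill(width) for v in values]
    return ''.join(p[pos] for pos in range(width) for p in padded)


def deinterleave(interleaved, width, target_column):
    """Recover one column: its k-th digit sits at index k * num_values + target_column."""
    num_values = len(interleaved) // width
    if target_column >= num_values:
        return None
    digits = [interleaved[pos * num_values + target_column] for pos in range(width)]
    return int(''.join(digits))


def format_window(window, width, separator=','):
    """Join the interleaved timestamps of one window, with a trailing separator."""
    return separator.join(interleave(ts, width) for ts in window) + separator


def parse_window(text, width, target_column, separator=',', trunc=None):
    """Split a window string and decode the target column of each timestamp."""
    parts = text.strip(separator).split(separator)[:trunc]
    return [deinterleave(p, width, target_column) for p in parts if p.strip()]


window = [[12, 345, 7], [8, 90, 100]]
width = width_for([window])             # 3
text = format_window(window, width)
print(text)                             # 030140257,001090800,
print(parse_window(text, width, 1))     # [345, 90]
```

When a value has more digits than `digits_per_timestamp`, the width grows to fit it (for example `[1234, 5]` is padded to width 4 and interleaved as `10203045`), which is why the decoder must reuse the width chosen at encoding time rather than its own `digits_per_timestamp` argument.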
Review comment: please use 4 spaces for indentation, as in all the other pipelines.
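One way to apply this request, assuming the pipeline file contains only ASCII text (as the JSON above does), is to re-serialize it with Python's standard `json` module. `reindent_json` is a hypothetical helper name; key order is preserved because `json.loads` returns insertion-ordered dicts, and `json.dumps` writes empty objects as `{}` regardless of the indent.

```python
import json


def reindent_json(text, indent=4):
    """Re-serialize a JSON document with `indent` spaces per level, plus a trailing newline."""
    return json.dumps(json.loads(text), indent=indent) + '\n'


two_space = '{\n  "primitives": [\n    "numpy.reshape"\n  ],\n  "init_params": {}\n}\n'
print(reindent_json(two_space), end='')
```

Note that this rewrites the whole document, so any hand-made layout (for example, short lists kept on one line) is replaced by one element per line.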