Add new grouping method: AttractorsViaPairwiseComparison #97
Merged
Changes from 7 commits
Commits
10 commits
c4dde42 Add grouping of attractors via pairwise comparison -- WIP (KalelR)
e738ec7 add supp for generic distance metric (KalelR)
861a08b add tests for vector and matrix features (KalelR)
1004c56 renamed optiaml radius method to distance threshold (mcuh better) (KalelR)
ac1b16d remove par_weight keywords (KalelR)
f704bda improve doc with more info and some tips (KalelR)
9f90d1e improve docs, incremented version (KalelR)
bb2575d rename threshold and metric (KalelR)
3cb90ff small improvements to docs (KalelR)
3889e7a Merge branch 'main' into attractors_via_pairwise_comparison (Datseris)
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2,7 +2,7 @@ name = "Attractors" | |
uuid = "f3fd9213-ca85-4dba-9dfd-7fc91308fec7" | ||
authors = ["George Datseris <[email protected]>", "Kalel Rossi", "Alexandre Wagemakers"] | ||
repo = "https://github.com/JuliaDynamics/Attractors.jl.git" | ||
version = "1.11.0" | ||
version = "1.12.0" | ||
|
||
[deps] | ||
BlackBoxOptim = "a134a8b2-14d6-55f6-9291-3336d3ab0209" | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,116 @@ | ||
export GroupViaPairwiseComparison | ||
|
||
""" | ||
GroupViaPairwiseComparison(; distance_threshold::Real, kwargs...) | ||
|
||
Initialize a struct that contains instructions on how to group features in | ||
[`AttractorsViaFeaturizing`](@ref). `GroupViaPairwiseComparison` groups features and | ||
identifies clusters by considering the pairwise distance between features. It can be used | ||
as an alternative to the clustering method in `GroupViaClustering`, having the | ||
advantage that it is simpler, typically faster and uses less memory. | ||
|
||
## Keyword arguments | ||
|
||
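* `distance_threshold`: a real number giving the maximum distance at which two features are | |
still considered to be in the same cluster; features farther apart are assigned to | |
different clusters. It must exceed the spread of features within one attractor while | |
staying below the distance between features of distinct attractors. It has no default | |
value, as a suitable choice depends on the features. | |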
* `distance_metric = Euclidean()`: the metric used to compute the distance between two | |
features. It can be any metric from Distances.jl or any function `f(a, b)` that returns | |
the distance between two features (usually vectors or matrices of reals). It must be | |
consistent with the output of the `featurizer` function. | |
* `rescale_features = false`: if `true`, rescale each dimension of the extracted features | |
separately into the range `[0,1]`. This typically leads to more accurate clustering. | |
|
||
## Description | |
This algorithm assumes that the features are well separated into distinct clouds, with the | |
maximum radius of each cloud controlled by `distance_threshold`. Since the systems are | |
deterministic, this is achievable with a good-enough `featurizer` function, by removing | |
transients, and by running the trajectories for sufficiently long. The algorithm then | |
considers that features belong to the same attractor when their pairwise distance, | |
computed using `distance_metric`, is smaller than or equal to `distance_threshold`, and | |
that they belong to different attractors when the distance is larger. Each group of | |
similar features corresponds to one attractor. The key parameter `distance_threshold` is | |
therefore the amount of variation permitted among features belonging to the same | |
attractor. If the features are well chosen, the value can be relatively small and does | |
not need to be fine-tuned. | |
|
||
The `distance_threshold` should strike a balance. On one hand, it should be large enough | |
to account for variations in the features from the same attractor; if it is too small, | |
the algorithm will find duplicate attractors. On the other hand, it should be small | |
enough not to group together features from distinct attractors, which requires some | |
knowledge of how spread out the features are; if it is too large, the algorithm will miss | |
some attractors, as it merges two or more distinct attractors into one. Therefore, as a | |
rule of thumb, one can repeat the procedure a few times, starting with a relatively large | |
value and reducing it until no more attractors are found and no duplicates appear. | |
|
||
The method uses relatively little memory, as it only stores vectors whose size is on the | |
order of the number of attractors of the system. | |
""" | ||
struct GroupViaPairwiseComparison{R<:Real, M} <: GroupingConfig | ||
distance_threshold::R | ||
distance_metric::M | ||
rescale_features::Bool | ||
end | ||
|
||
function GroupViaPairwiseComparison(; | ||
distance_threshold, #impossible to set a good default value, depends on the features | ||
distance_metric=Euclidean(), rescale_features=false, | ||
) | ||
return GroupViaPairwiseComparison( | ||
distance_threshold, | ||
distance_metric, rescale_features, | ||
) | ||
end | ||
|
||
function group_features( | ||
features, config::GroupViaPairwiseComparison; | ||
kwargs... | ||
) | ||
if config.rescale_features | ||
features = _rescale_to_01(features) | ||
end | ||
|
||
labels = _cluster_features_into_labels(features, config, config.distance_threshold; kwargs...) | ||
return labels | ||
end | ||
|
||
# TODO: add support for par_weight,plength and spp in the computation of the distance metric? | ||
function _cluster_features_into_labels(features, config::GroupViaPairwiseComparison, distance_threshold::Real; kwargs...) | ||
labels_features = Vector{Int64}(undef, length(features)) #labels of all features | ||
metric = config.distance_metric | ||
|
||
# Assign feature 1 as a new attractor | ||
labels_features[1] = 1 | ||
cluster_idxs = [1] # idxs of the features that define each cluster | ||
cluster_labels = [1] # labels for the clusters, going from 1 : num_clusters | ||
next_cluster_label = 2 | ||
|
||
for idx_feature = 2:length(features) | ||
feature = features[idx_feature] | ||
dist_to_clusters = _distance_dict(feature, features, cluster_idxs, cluster_labels, metric; kwargs...) | ||
min_dist, closest_cluster_label = findmin(dist_to_clusters) | ||
|
||
if min_dist > distance_threshold #bigger than threshold => new attractor | ||
feature_label = next_cluster_label | ||
push!(cluster_idxs, idx_feature) | ||
push!(cluster_labels, next_cluster_label) | ||
# @info "New attractor $next_cluster_label, min dist was $min_dist > $distance_threshold" #TODO: allow this when debugging verbose mode on! | ||
next_cluster_label += 1 | ||
else # at or below threshold => assign to closest cluster | |
feature_label = closest_cluster_label | ||
end | ||
|
||
labels_features[idx_feature] = feature_label | ||
end | ||
return labels_features | ||
end | ||
|
||
function _distance_dict(feature, features, cluster_idxs, cluster_labels, metric; kwargs...) | ||
if metric isa Metric | ||
dist_to_clusters = Dict(cluster_label => evaluate(metric, feature, features[cluster_idxs[idx_cluster]]) for (idx_cluster, cluster_label) in enumerate(cluster_labels)) | ||
else | ||
dist_to_clusters = Dict(cluster_label => metric(feature, features[cluster_idxs[idx_cluster]]) for (idx_cluster, cluster_label) in enumerate(cluster_labels)) | ||
end | ||
|
||
return dist_to_clusters | ||
end |
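The grouping loop in the diff above can be hard to follow through the diff-table formatting, so here is a minimal standalone sketch of the same greedy idea: each new feature is compared against the first feature of every cluster found so far, and a new cluster is opened only when the smallest distance exceeds the threshold. This is an illustration under stated assumptions, not the code from this PR: the names `group_by_threshold` and `euclidean` are hypothetical, it uses a plain vector instead of a `Dict`, and it does not depend on Attractors.jl or Distances.jl.

```julia
# Standalone sketch of greedy threshold-based grouping; no packages required.
# `group_by_threshold` and `euclidean` are hypothetical names, not Attractors.jl API.

# Plain Euclidean distance between two equally sized arrays.
euclidean(a, b) = sqrt(sum(abs2, a .- b))

function group_by_threshold(features, threshold; metric = euclidean)
    labels = Vector{Int}(undef, length(features))
    representatives = Int[]  # index of the feature that founded each cluster

    for (i, feature) in enumerate(features)
        if isempty(representatives)
            # The first feature always founds cluster 1.
            push!(representatives, i)
            labels[i] = 1
            continue
        end

        # Distance from this feature to the founding feature of every known cluster.
        dists = [metric(feature, features[j]) for j in representatives]
        min_dist, closest = findmin(dists)

        if min_dist > threshold
            # Farther than the threshold from every cluster: open a new one.
            push!(representatives, i)
            labels[i] = length(representatives)
        else
            # Within the threshold: assign to the closest existing cluster.
            labels[i] = closest
        end
    end
    return labels
end

features = [[0.0, 0.0], [0.05, 0.0], [1.0, 1.0], [1.0, 0.95], [0.0, 0.02]]
labels = group_by_threshold(features, 0.1)
println(labels)  # prints [1, 1, 2, 2, 1]
```

With a much larger threshold (say `10.0`) every feature falls into cluster 1, which is the "missed attractors" failure mode the docstring warns about. Note also that in this sketch each cluster is represented only by its founding feature, so the resulting labels can depend on the order of the features.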