Skip to content

Commit 55e3876

Browse files
KalelRDatseris
andauthored
Add new grouping method: AttractorsViaPairwiseComparison (#97)
* Add grouping of attractors via pairwise comparison -- WIP * add supp for generic distance metric * add tests for vector and matrix features * renamed optiaml radius method to distance threshold (mcuh better) * remove par_weight keywords * improve doc with more info and some tips * improve docs, incremented version * rename threshold and metric * small improvements to docs --------- Co-authored-by: George Datseris <[email protected]>
1 parent 3606b62 commit 55e3876

File tree

5 files changed

+156
-9
lines changed

5 files changed

+156
-9
lines changed

CHANGELOG.md

+3
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,6 @@
1+
# v1.12
2+
- New algorithm `GroupViaPairwiseComparison` to group features in `AttractorsViaFeaturizing`. Simpler, typically faster and using less memory than DBSCAN, it can be useful in well-behaved systems.
3+
14
# v1.11
25
- New algorithm `subdivision_based_grid`. Allows user to automatically constuct the grid which simulates subdivision into regions with different discretization levels in accordance with state space flow speed.
36

Project.toml

+1-1
Original file line numberDiff line numberDiff line change
@@ -2,7 +2,7 @@ name = "Attractors"
22
uuid = "f3fd9213-ca85-4dba-9dfd-7fc91308fec7"
33
authors = ["George Datseris <[email protected]>", "Kalel Rossi", "Alexandre Wagemakers"]
44
repo = "https://github.com/JuliaDynamics/Attractors.jl.git"
5-
version = "1.11.1"
5+
version = "1.12.0"
66

77
[deps]
88
BlackBoxOptim = "a134a8b2-14d6-55f6-9291-3336d3ab0209"

src/mapping/grouping/all_grouping_configs.jl

+2
Original file line numberDiff line numberDiff line change
@@ -12,6 +12,7 @@ Currently available grouping configurations are:
1212
- [`GroupViaClustering`](@ref)
1313
- [`GroupViaNearestFeature`](@ref)
1414
- [`GroupViaHistogram`](@ref)
15+
- [`GroupViaPairwiseComparison`](@ref)
1516
1617
`GroupingConfig` defines an extendable interface.
1718
The only thing necessary for a new grouping configuration is to:
@@ -54,3 +55,4 @@ end
5455
include("cluster_config.jl")
5556
include("histogram_config.jl")
5657
include("nearest_feature_config.jl")
58+
include("pairwise_comparison.jl")
+115
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,115 @@
1+
export GroupViaPairwiseComparison
2+
3+
"""
4+
GroupViaPairwiseComparison(; threshold::Real, kwargs...)
5+
6+
Initialize a struct that contains instructions on how to group features in
7+
[`AttractorsViaFeaturizing`](@ref). `GroupViaPairwiseComparison` groups features and
8+
identifies clusters by considering the pairwise distance between features. It can be used
9+
as an alternative to the clustering method in `GroupViaClustering`, having the
10+
advantage that it is simpler, typically faster and uses less memory.
11+
12+
## Keyword arguments
13+
14+
* `threshold` (mandatory): A real number defining the maximum distance two features can be to
15+
be considered in the same cluster - above the threshold, features are different. This
16+
value simply needs to be large enough to differentiate clusters.
17+
* `metric = Euclidean()`: A function `metric(a, b)` that returns the distance between two
18+
features `a` and `b`, outputs of `featurizer`. Any `Metric` from Distances.jl can be used
19+
here.
20+
* `rescale_features = true`: if true, rescale each dimension of the extracted features
21+
separately into the range `[0,1]`. This typically leads to more accurate grouping.
22+
23+
## Description
24+
This algorithm assumes that the features are well-separated into distinct clouds, with the
25+
maximum radius of the cloud controlled by `threshold`. Since the systems are
26+
deterministic, this is achievable with a good-enough `featurizer` function, by removing
27+
transients, and running the trajectories for sufficiently long. It then considers that
28+
features belong to the same attractor when their pairwise distance, computed using
29+
`metric`, is smaller than or equal to `threshold`, and that they belong
30+
to different attractors when the distance is bigger. Attractors correspond to each
31+
grouping of similar features. In this way, the key parameter `threshold` is
32+
basically the amount of variation permissible in the features belonging to the same
33+
attractor. If they are well-chosen, the value can be relatively small and does not need to
34+
be fine tuned.
35+
36+
The `threshold` should achieve a balance: one one hand, it should be large enough
37+
to account for variations in the features from the same attractor - if it's not large
38+
enough, the algorithm will find duplicate attractors. On the other hand, it should be
39+
small enough to not group together features from distinct attractors. This requires some
40+
knowledge of how spread the features are. If it's too big, the algorithm will miss some
41+
attractors, as it groups 2+ distinct attractors together. Therefore, as a rule of thumb,
42+
one can repeat the procedure a few times, starting with a relatively large value and
43+
reducing it until no more attractors are found and no duplicates appear.
44+
45+
The method uses relatively little memory, as it only stores vectors whose size is on order
46+
of the number of attractors of the system.
47+
"""
48+
struct GroupViaPairwiseComparison{R<:Real, M} <: GroupingConfig
49+
threshold::R
50+
metric::M
51+
rescale_features::Bool
52+
end
53+
54+
function GroupViaPairwiseComparison(;
55+
threshold, #impossible to set a good default value, depends on the features
56+
metric=Euclidean(), rescale_features=false,
57+
)
58+
return GroupViaPairwiseComparison(
59+
threshold,
60+
metric, rescale_features,
61+
)
62+
end
63+
64+
function group_features(
65+
features, config::GroupViaPairwiseComparison;
66+
kwargs...
67+
)
68+
if config.rescale_features
69+
features = _rescale_to_01(features)
70+
end
71+
72+
labels = _cluster_features_into_labels(features, config, config.threshold; kwargs...)
73+
return labels
74+
end
75+
76+
# TODO: add support for par_weight,plength and spp in the computation of the distance metric?
77+
function _cluster_features_into_labels(features, config::GroupViaPairwiseComparison, threshold::Real; kwargs...)
78+
labels_features = Vector{Int64}(undef, length(features)) #labels of all features
79+
metric = config.metric
80+
81+
# Assign feature 1 as a new attractor
82+
labels_features[1] = 1
83+
cluster_idxs = [1] # idxs of the features that define each cluster
84+
cluster_labels = [1] # labels for the clusters, going from 1 : num_clusters
85+
next_cluster_label = 2
86+
87+
for idx_feature = 2:length(features)
88+
feature = features[idx_feature]
89+
dist_to_clusters = _distance_dict(feature, features, cluster_idxs, cluster_labels, metric; kwargs...)
90+
min_dist, closest_cluster_label = findmin(dist_to_clusters)
91+
92+
if min_dist > threshold #bigger than threshold => new attractor
93+
feature_label = next_cluster_label
94+
push!(cluster_idxs, idx_feature)
95+
push!(cluster_labels, next_cluster_label)
96+
# @info "New attractor $next_cluster_label, min dist was $min_dist > $threshold" #TODO: allow this when debugging verbose mode on!
97+
next_cluster_label += 1
98+
else #smaller than threshold => assign to closest cluster
99+
feature_label = closest_cluster_label
100+
end
101+
102+
labels_features[idx_feature] = feature_label
103+
end
104+
return labels_features
105+
end
106+
107+
function _distance_dict(feature, features, cluster_idxs, cluster_labels, metric; kwargs...)
108+
if metric isa Metric
109+
dist_to_clusters = Dict(cluster_label => evaluate(metric, feature, features[cluster_idxs[idx_cluster]]) for (idx_cluster, cluster_label) in enumerate(cluster_labels))
110+
else
111+
dist_to_clusters = Dict(cluster_label => metric(feature, features[cluster_idxs[idx_cluster]]) for (idx_cluster, cluster_label) in enumerate(cluster_labels))
112+
end
113+
114+
return dist_to_clusters
115+
end

test/mapping/attractor_mapping.jl

+35-8
Original file line numberDiff line numberDiff line change
@@ -5,7 +5,6 @@
55
# If not, a small, but representative subset of mappers and dynamical systems is used.
66

77
DO_EXTENSIVE_TESTS = get(ENV, "ATTRACTORS_EXTENSIVE_TESTS", "false") == "true"
8-
98
using Test
109
using Attractors
1110
using LinearAlgebra
@@ -16,7 +15,8 @@ using Statistics
1615
# Define generic testing framework
1716
function test_basins(ds, u0s, grid, expected_fs_raw, featurizer;
1817
rerr = 1e-3, ferr = 1e-3, aerr = 1e-15, ε = nothing, max_distance = Inf,
19-
proximity_test = true,
18+
proximity_test = true, pairwise_comparison_matrix_test = false, featurizer_matrix = nothing,
19+
threshold_pairwise = 1,
2020
kwargs... # kwargs are propagated to recurrences
2121
)
2222
# u0s is Vector{Pair}
@@ -89,6 +89,29 @@ function test_basins(ds, u0s, grid, expected_fs_raw, featurizer;
8989
)
9090
end
9191

92+
@testset "Featurizing, pairwise comparison" begin
93+
config = GroupViaPairwiseComparison(; threshold=threshold_pairwise,
94+
metric=Euclidean(), rescale_features=false)
95+
mapper = AttractorsViaFeaturizing(ds, featurizer, config; Ttr = 500)
96+
test_basins_fractions(mapper;
97+
err = ferr, single_u_mapping = false, known_ids = [-1, 1, 2, 3]
98+
)
99+
end
100+
101+
if pairwise_comparison_matrix_test
102+
@testset "Featurizing, pairwise comparison, matrix features" begin
103+
function metric_hausdorff(A,B)
104+
set_distance(A, B, Hausdorff())
105+
end
106+
config = GroupViaPairwiseComparison(; threshold=threshold_pairwise,
107+
metric=metric_hausdorff, rescale_features=false)
108+
mapper = AttractorsViaFeaturizing(ds, featurizer_matrix, config; Ttr = 500)
109+
test_basins_fractions(mapper;
110+
err = ferr, single_u_mapping = false, known_ids = [-1, 1, 2, 3]
111+
)
112+
end
113+
end
114+
92115
@testset "Featurizing, nearest feature" begin
93116
# First generate the templates
94117
function features_from_u(u)
@@ -134,7 +157,7 @@ end
134157
return any(isinf, x) ? SVector(200.0, 200.0) : x
135158
end
136159
test_basins(ds, u0s, grid, expected_fs_raw, featurizer;
137-
max_distance = 20, ε = 1e-3, proximity_test = false)
160+
max_distance = 20, ε = 1e-3, proximity_test = false, threshold_pairwise=1)
138161
end
139162

140163
@testset "Magnetic pendulum: projected system" begin
@@ -181,7 +204,11 @@ end
181204
return SVector(A[end][1], A[end][2])
182205
end
183206

184-
test_basins(ds, u0s, grid, expected_fs_raw, featurizer; ε = 0.2, Δt = 1.0, ferr=1e-2)
207+
function featurizer_matrix(A, t)
208+
return A
209+
end
210+
211+
test_basins(ds, u0s, grid, expected_fs_raw, featurizer; ε = 0.2, Δt = 1.0, ferr=1e-2, featurizer_matrix, pairwise_comparison_matrix_test=true, threshold_pairwise=1)
185212
end
186213

187214
# Okay, all of these aren't fundamentally new tests.
@@ -218,9 +245,9 @@ if DO_EXTENSIVE_TESTS
218245
g = exp(entropy(Renyi(; q = 0), probs))
219246
return SVector(g, minimum(A[:,1]))
220247
end
221-
248+
222249
test_basins(ds, u0s, grid, expected_fs_raw, featurizer;
223-
ε = 0.01, ferr=1e-2, Δt = 0.2, mx_chk_att = 20)
250+
ε = 0.01, ferr=1e-2, Δt = 0.2, mx_chk_att = 20, threshold_pairwise=100) #threshold is very high because features haven't really converged yet here
224251
end
225252

226253
@testset "Duffing oscillator: stroboscopic map" begin
@@ -247,7 +274,7 @@ if DO_EXTENSIVE_TESTS
247274
end
248275

249276
test_basins(ds, u0s, grid, expected_fs_raw, featurizer;
250-
ε = 1.0, ferr=1e-2, rerr = 1e-2, aerr = 5e-3)
277+
ε = 1.0, ferr=1e-2, rerr = 1e-2, aerr = 5e-3, threshold_pairwise=1)
251278
end
252279

253280

@@ -282,5 +309,5 @@ if DO_EXTENSIVE_TESTS
282309

283310
test_basins(pmap, u0s, grid, expected_fs_raw, thomas_featurizer; ε = 1.0, ferr=1e-2)
284311
end
285-
312+
286313
end

0 commit comments

Comments
 (0)