Add new grouping method: AttractorsViaPairwiseComparison #97
Conversation
Codecov Report
@@ Coverage Diff @@
## main #97 +/- ##
==========================================
+ Coverage 84.27% 84.62% +0.35%
==========================================
Files 20 21 +1
Lines 1151 1184 +33
==========================================
+ Hits 970 1002 +32
- Misses 181 182 +1
What do you mean that you need to add support for
? Is this for the continuation? If so,
it is not clear to me how one would choose the distance threshold. I guess with the scaling to 0-1 turned on, a good value could be 0.1 or something like that?
Is this method significantly faster than the clustering method? You should do formal benchmarks and post the results here. Compute a list of features, and then benchmark the
As I understand it, they included this parameter distance in the computation of distances for the clustering. The idea was to "ensure that similar asymptotic states with strongly different parameter values are distinguished from each other", which I never really understood to be honest haha - isn't the whole point of continuation tracking an attractor across different parameters? But anyways, I thought that if people wanted this in the continuation method via clustering, they might also want it in the continuation method using this PairwiseComparison - the latter doesn't cluster per se, but we still compute distances. What do you think? TLDR: This can be easily implemented in the future if anyone wishes. For now, I don't think it's needed, but wanted to check with you.
The choice is a balance between two points
So what I've been doing is to ensure the deviations in the features are super small, which can be achieved with a good choice of
Not sure if it's significantly faster, but I don't think the speed really matters to be honest. It's so much simpler, more reliable and easier to use than the clustering methods that I wouldn't mind even if it were slower. But as it is, I think at worst it's comparable in speed. DBSCAN at least requires the computation of the full distance matrix, which is order
I don't agree with this statement. I agree that it is simpler, but I can't see a way that it is more reliable. You need to pick this distance number; in the clustering you don't pick anything, it is done automatically. Besides, if the localization of points in clusters is so good, there is no way DBSCAN would fail. And this method targets exactly such cases. It would be great if you could provide a concrete MWE where this method outperforms the Clustering one. Not only that, but the quality of your clustering depends on the order of the initial conditions, hence each time you call it you may get a different grouping. DBSCAN, as well as all other grouping algorithms we have so far, is invariant to the sequence of the initial conditions. Note that this doesn't stop us from merging this in, but I am just pointing out that I personally wouldn't use this one instead of the Clustering version. If I knew the system so well that I knew what the features would do, this means that I know the attractors. Then, I would just use
Exactly. Hence, I would argue, this whole
Sure, but please do a formal benchmark and report the results. It shouldn't take too much effort, and performance matters in this context. A lot. For me, it is the prime reason to change algorithms. The clustering is also done 100s of times to optimize the clustering radius.
You should document tips for choosing the distance threshold by summarizing what you wrote here.
DBSCAN also requires a distance parameter, which is much less intuitive (it leads to the black-box behavior we discussed so many times). The only reason the clustering method as a whole doesn't require this parameter is that we brute-force search for an "optimal" value, which is very debatable. In the use cases I've tried (a system of 2 coupled 2d oscillators and 100d networks) this led to problems very often. Sometimes the "optimal" parameter was way too small and the algorithm found lots of duplicates. Sometimes it was way too big and missed some attractors. Sometimes it worked great for one system parameter and then terribly for another with similar dynamics. This simple comparison came as a way to simplify the procedure and give me enough clarity to know when the algorithm really is working, in cases where I don't know the attractors of the system but can probe around and change parameters. I agree that in the well-behaved cases both it and DBSCAN should work, and I admit I oversold this algorithm, probably out of frustration with so much time wasted on DBSCAN haha. But the point is also that in well-behaved cases using DBSCAN is like killing a fly with a cannon, and an unintuitive cannon at that. Maybe we should think of some procedures to validate the clustering better. If the worry is having an automatic procedure, we could think of a search algorithm here as well. Something like reducing
Yeah, excellent point. There are no guarantees, but in the use case I have in mind, when the data isn't "noisy", the algorithm will be invariant. When the data is noisy, though, DBSCAN should be used. Or the features should be improved.
The whole point of course is that you don't know the attractors, but you can play around enough to ensure the features are good (this should always be done anyway, with or without DBSCAN) and that the clustering is working. Then the question is with which algorithm this is more easily and reliably done, and I've had way too many problems with DBSCAN personally. By the way, one point I haven't thought much about, just mentioned here, is that the attractors themselves could be used as features. Then one would separate attractors by some distance metric in state space directly. This is maybe the simplest / most naive attractor-finding algorithm one could think of haha, but could it work, if the metric is good enough?
You would need to find the attractors first. The only method we have that finds attractors is the recurrences one. And if you use this method, you need no other method to group attractors, as the recurrences anyways maps individual i.c. to attractors correctly.
That's fine, but then you should somehow summarize this discussion in the docstring of the new method for when it should be preferred over the Clustering method.
Given the amount of papers on optimizing the clustering of DBSCAN, and the fact that to this day an even more optimal way hasn't been presented, this could take a lot of time. You should open an Issue for this after we merge this PR in.
I benchmarked the two methods using two sets of features: one that is just 5 randomly generated features repeated N times, another extracted from Henon (like in the tests). The distance threshold was the same for both, and they find the correct clusters. As expected from the difference in scaling, this pairwise comparison is faster and uses less memory than DBSCAN. For some reason in the Henon case the number of allocations in DBSCAN is smaller, but the memory usage is bigger. I don't know why. It is somewhat unfair to compare them though, as the pairwise comparison is so much simpler than DBSCAN. Code is here https://gist.github.com/KalelR/1c3a66a157a8e347dcd6d6ad28f633cc, also in our FrameworkGlobalStability repo.
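(The linked gist contains the actual benchmark code; below is only a minimal, hedged sketch of how the "5 random features repeated N times" set can be built and timed with BenchmarkTools, using a stand-in grouping function rather than the code of this PR.)

```julia
using BenchmarkTools, Random

Random.seed!(1)
base = [rand(3) for _ in 1:5]                                  # 5 distinct random features
N = 1000
features = [b .+ 1e-4 .* rand(3) for _ in 1:N for b in base]   # each repeated N times with tiny noise

# Stand-in grouping by pairwise comparison (not this PR's implementation)
function group_stub(features; threshold = 0.1)
    reps = similar(features, 0)
    labels = zeros(Int, length(features))
    for (i, f) in enumerate(features)
        dists = [sqrt(sum(abs2, f .- r)) for r in reps]
        if isempty(dists) || minimum(dists) > threshold
            push!(reps, f); labels[i] = length(reps)           # new group
        else
            labels[i] = argmin(dists)                          # closest existing group
        end
    end
    return labels
end

@btime group_stub($features);  # time and memory grow ~linearly with length(features) for a fixed number of groups
```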
Great, thanks, you should state the difference in performance scaling in the docstring. Please also address the following comment in the docstring:
(That's fine, but then you should somehow summarize this discussion in the docstring of the new method for when it should be preferred over the Clustering method.) Lastly, please add an example in the examples list that utilizes this method. Would be fantastic if you used a real-world example, i.e., something from your own work, as you said that this method helped you there a lot. Then, I can merge! (And don't forget to increment the minor version and add a changelog entry.)
I misspoke, I meant that the trajectories can be used as features themselves. Meaning the featurizer function
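For example, a purely hypothetical sketch of what such a featurizer might look like (none of these names are the actual API of this PR or of the featurizing method): it simply packs the sampled trajectory into a matrix, so the "feature" is the trajectory itself and the grouping metric then acts on whole trajectories.

```julia
# Hypothetical sketch: the featurizer returns the sampled trajectory itself,
# stored as a (dimension x time) matrix, instead of a vector of scalar statistics.
featurizer_trajectory(states, t) = reduce(hcat, states)

# Toy usage with two short "trajectories" of a 2D system:
traj1 = [rand(2) for _ in 1:50]          # states sampled on one attractor
traj2 = [rand(2) .+ 10.0 for _ in 1:50]  # states sampled on another attractor
F1 = featurizer_trajectory(traj1, 1:50)
F2 = featurizer_trajectory(traj2, 1:50)
size(F1)  # (2, 50); F1 and F2 can now be compared with a matrix-valued distance metric
```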
This would be quite nice, I'll think of a nice example where it helped. But since the work is still unpublished, I'd rather wait a month or so before putting this online with all the methodology and parameters correctly chosen, just to be sure. Is that alright?
Sorry only minor comments and then I'll merge!
We don't have to rescale to [0, 1]. We can use
As we had discussed in Issue #84, this PR adds a simpler grouping method to "clusterize" features, particularly useful of course when using AttractorsViaFeaturizing.

The idea is that clustering algorithms aren't really necessary for grouping features. The systems are deterministic, and we can usually integrate them long enough to have the features converge. If the user knows good features, and for how long to integrate, then the features in feature space will be very well behaved, with attractors corresponding to small clouds well separated from each other. Therefore, there is no need to use clustering algorithms. Attractors can be found directly by simply grouping together features close to each other within some epsilon, and separating those farther away.
The algorithm is thus very simple (see the sketch below). Basically:
1. Compute the distance from the current feature to the previously found attractors -> dist(feature, clusters_idxs).
2. Find the minimum distance between the feature and the cluster idxs (i.e. find the cluster that is closest to the current feature).
3. If min dist > threshold, consider the feature a new attractor; if min dist < threshold, assign it to the closest cluster.
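For concreteness, here is a minimal sketch of this loop in plain Julia. It is only an illustration of the idea above, not the implementation in this PR; the function name group_features_pairwise and its keyword arguments are made up for the example.

```julia
using LinearAlgebra: norm

# Illustrative sketch of grouping by pairwise comparison (hypothetical names).
# Each group keeps one representative feature; a new feature either joins the
# closest group (if within `distance_threshold`) or starts a new group.
function group_features_pairwise(features; distance_threshold = 0.1,
                                 metric = (a, b) -> norm(a - b))
    representatives = similar(features, 0)  # one representative feature per group
    labels = zeros(Int, length(features))
    for (i, f) in enumerate(features)
        if isempty(representatives)
            push!(representatives, f)
            labels[i] = 1
            continue
        end
        dmin, k = findmin([metric(f, r) for r in representatives])
        if dmin < distance_threshold
            labels[i] = k                      # assign to the closest existing group
        else
            push!(representatives, f)          # too far from all groups: new group
            labels[i] = length(representatives)
        end
    end
    return labels
end

# Toy usage: three tight, well-separated clouds of 2D features -> labels 1, 2, 3
features = [c .+ 1e-3 .* rand(2) for c in ([0.0, 0.0], [1.0, 0.0], [0.0, 1.0]) for _ in 1:10]
group_features_pairwise(features; distance_threshold = 0.1)
```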
This is nicer than simple histograms mainly because it allows for more general features, beyond just vectors of numbers. They can be, for instance, matrices, which allows the features to be the entire attractors (see the hypothetical sketch at the end of this description). The distance metric just needs to take in feature_A and feature_B and return a number. Plus, given that the features are well behaved, the distance_threshold doesn't really need to be fine-tuned much.

This is way more reliable, understandable, and easier to use than DBSCAN. Since I've implemented this I never went back, and to be honest I don't see any future scenario where I would.

One part left to do, if needed, is to implement support for computing distances using these parameters:
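(Aside, as a purely hypothetical illustration of the point above that features can be matrices, e.g. whole sampled trajectories: the metric only has to take two such matrices and return a number. The names below are made up and this is not the API added in the PR.)

```julia
using LinearAlgebra: norm

# Hypothetical "closest point" distance between two trajectory matrices,
# each of size (dimension x number_of_samples): for every state in A find
# the nearest state in B, and take the worst case.
function trajectory_distance(A::AbstractMatrix, B::AbstractMatrix)
    return maximum(minimum(norm(a .- b) for b in eachcol(B)) for a in eachcol(A))
end

A = rand(2, 200)                       # samples of one attractor
B = rand(2, 200) .+ 5.0                # samples of a distant attractor
trajectory_distance(A, B)              # large -> would be grouped separately
trajectory_distance(A, A .+ 1e-4 .* rand(2, 200))  # tiny -> same group
```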