Optimize feature selection for clustering (or in general finding attractors)

I have an idea for a function that optimizes feature selection when clustering attractors. I will be writing this function as part of a new research project, but I think it is useful to discuss how it could be done.

My idea is as follows:

- We have some "function" that returns **a lot of features**, e.g. 20 or so.
- From these features we want to find the 3 or 4 that are **most effective** at separating the data into distinct attractors when clustering.
- We could run an optimization loop: pick 3 features at random, cluster the data using only these features, and then compute a **clustering quality metric**. Keep the three features that give the highest quality.
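A minimal sketch of that loop, in Python purely for illustration. The clustering step (`cluster_fn`) and the quality metric (`quality_fn`) are hypothetical placeholders passed in as callables, and the feature matrix is assumed to have one row per initial condition and one column per feature. Note that with 20 features there are only C(20, 3) = 1140 subsets of size 3 and C(20, 4) = 4845 of size 4, so the sketch evaluates every subset when that is cheaper than `n_trials` random draws.

```python
import itertools
import random
from typing import Callable, Optional, Tuple

import numpy as np


def optimize_feature_subset(
    features: np.ndarray,
    n_pick: int,
    cluster_fn: Callable[[np.ndarray], np.ndarray],
    quality_fn: Callable[[np.ndarray, np.ndarray], float],
    n_trials: int = 500,
    seed: int = 0,
) -> Tuple[Optional[Tuple[int, ...]], float]:
    """Search feature subsets of size `n_pick` and return the best one.

    features:   (n_samples, n_features) array, one row per initial condition.
    cluster_fn: hypothetical clustering step; maps data to integer labels.
    quality_fn: hypothetical quality metric; higher means better clustering.
    """
    n_features = features.shape[1]
    subsets = list(itertools.combinations(range(n_features), n_pick))
    # Few enough subsets: evaluate them all. Otherwise draw n_trials of them
    # at random (without replacement), i.e. the random search from the list.
    if len(subsets) > n_trials:
        subsets = random.Random(seed).sample(subsets, n_trials)

    best_subset, best_score = None, -np.inf
    for subset in subsets:
        data = features[:, list(subset)]  # keep only the selected columns
        labels = cluster_fn(data)
        score = quality_fn(data, labels)
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, float(best_score)
```

Keeping the clustering and the metric as arguments means the same loop can be reused while the choice of quality metric is still open.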

The main question, therefore, is how to define a "cluster quality metric". On one hand, this metric needs a contribution from cluster separation: the more separated the clusters are, the clearer the clustering. This could be done by fitting a pdf to every cluster, computing the pairwise distances between these pdfs (using some distance between distributions), and keeping the median of these distances as the metric.
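One way the separation part could look, as a sketch under explicit assumptions: each cluster's pdf is assumed to be a multivariate Gaussian, and the Bhattacharyya distance is used as the distance between distributions because it has a closed form for Gaussians. Both are swapped-in choices, not something settled in this issue. Points labelled `-1` are treated as unclustered, which is also an assumed convention (it is the one scikit-learn's DBSCAN uses).

```python
import itertools

import numpy as np


def fit_gaussian(points: np.ndarray):
    """Fit a multivariate Gaussian (mean, covariance) to one cluster."""
    mean = points.mean(axis=0)
    cov = np.atleast_2d(np.cov(points, rowvar=False))
    # Tiny ridge so nearly degenerate clusters still have an invertible covariance.
    cov = cov + 1e-9 * np.eye(points.shape[1])
    return mean, cov


def bhattacharyya_gaussian(m1, c1, m2, c2) -> float:
    """Closed-form Bhattacharyya distance between two Gaussians."""
    c = 0.5 * (c1 + c2)
    diff = m1 - m2
    mahalanobis_term = 0.125 * diff @ np.linalg.solve(c, diff)
    _, logdet_c = np.linalg.slogdet(c)
    _, logdet_1 = np.linalg.slogdet(c1)
    _, logdet_2 = np.linalg.slogdet(c2)
    covariance_term = 0.5 * (logdet_c - 0.5 * (logdet_1 + logdet_2))
    return float(mahalanobis_term + covariance_term)


def separation_score(data: np.ndarray, labels: np.ndarray) -> float:
    """Median pairwise Bhattacharyya distance between per-cluster Gaussian fits."""
    dim = data.shape[1]
    fits = [
        fit_gaussian(data[labels == k])
        for k in np.unique(labels)
        # Skip unclustered points (label -1, assumed convention) and clusters
        # too small to estimate a full covariance matrix.
        if k != -1 and np.sum(labels == k) > dim
    ]
    if len(fits) < 2:
        return 0.0  # separation is undefined with fewer than two clusters
    distances = [
        bhattacharyya_gaussian(m1, c1, m2, c2)
        for (m1, c1), (m2, c2) in itertools.combinations(fits, 2)
    ]
    return float(np.median(distances))
```

`separation_score(data, labels)` returns a single float where larger means better separated clusters. Using the median rather than the minimum keeps one pair of close clusters from dominating the score; whether that is desirable is part of the open question.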

However, this cluster quality metric should also have some sort of contribution from the number of clusters found. The more clusters the better...?
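If more clusters should indeed count as better, one possible form is sketched below: add a bonus that grows with the logarithm of the number of clusters, so each extra cluster is worth less than the previous one. The logarithm, the `weight` and the `min_size` cutoff (which ignores tiny spurious clusters) are all hypothetical choices for illustration, and label `-1` is again assumed to mean "unclustered".

```python
import math
from collections import Counter
from typing import Sequence


def count_clusters(labels: Sequence[int], min_size: int = 1) -> int:
    """Number of clusters with at least `min_size` members (label -1 = unclustered)."""
    sizes = Counter(label for label in labels if label != -1)
    return sum(1 for size in sizes.values() if size >= min_size)


def combined_quality(separation: float, labels: Sequence[int],
                     weight: float = 0.5, min_size: int = 5) -> float:
    """Separation score plus a diminishing bonus for finding more clusters.

    `weight` and `min_size` are hypothetical tuning parameters.
    """
    k = count_clusters(labels, min_size)
    if k == 0:
        return float("-inf")  # nothing was clustered at all
    # log(k) rewards going from 1 to 2 clusters more than from 10 to 11.
    # Splitting one attractor into overlapping pieces is expected to be
    # penalized by the separation term, not by this bonus.
    return separation + weight * math.log(k)
```

With a single cluster the bonus is `log(1) = 0`, so the score reduces to the separation term alone.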

cc also @KalelR I think this is relevant for you too.

Optimize feature selection for clustering (or in general finding attractors) #85

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Optimize feature selection for clustering (or in general finding attractors) #85

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions