Optimize feature selection for clustering (or in general finding attractors) #85

@Datseris

I have an idea for a function that optimizes feature selection for clustering attractors. I will be writing this function while working on a new research project, but I think it is useful to first discuss how to make it happen.

My idea is as follows:

  • We have some "function" that returns a lot of features, say 20 or so.
  • From these features we want to find the 3 or 4 that are most effective at clustering the data into separate attractors.
  • We could perform an optimization loop: pick 3 features at random, cluster the data with these features, and then compute a clustering quality metric. Keep the three features that give the highest quality.
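A minimal sketch of the loop above, written in Python for illustration only. `select_features`, `cluster_fn` and `quality_fn` are hypothetical names: the clustering step and the quality metric are passed in as plain functions because the metric is exactly what is still open here. Note that with 20 features there are only C(20, 3) = 1140 triplets (and C(20, 4) = 4845 quadruplets), so replacing the random draws with an exhaustive `itertools.combinations(range(20), 3)` may well be affordable.

```python
import random
from typing import Callable, Sequence

# Hypothetical signatures: cluster_fn maps the projected data (one row per
# sample) to integer cluster labels; quality_fn scores that labelling.
ClusterFn = Callable[[list[list[float]]], list[int]]
QualityFn = Callable[[list[list[float]], list[int]], float]


def select_features(
    data: Sequence[Sequence[float]],
    n_select: int,
    cluster_fn: ClusterFn,
    quality_fn: QualityFn,
    n_trials: int = 100,
    seed: int = 0,
) -> tuple[tuple[int, ...], float]:
    """Random search over feature subsets of size n_select.

    Returns the best-scoring subset (as sorted feature indices) and its score.
    """
    n_features = len(data[0])
    rng = random.Random(seed)
    best_subset: tuple[int, ...] = ()
    best_score = float("-inf")
    tried: set[tuple[int, ...]] = set()
    for _ in range(n_trials):
        # Draw a random subset of feature indices, e.g. 3 out of 20.
        subset = tuple(sorted(rng.sample(range(n_features), n_select)))
        if subset in tried:
            continue  # skip subsets that were already scored
        tried.add(subset)
        # Keep only the selected features (columns) of every sample.
        projected = [[row[i] for i in subset] for row in data]
        labels = cluster_fn(projected)
        score = quality_fn(projected, labels)
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score
```

Any clustering routine that returns one integer label per sample can be plugged in as `cluster_fn`; the loop itself does not care how the labels are produced.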

The main question, therefore, is how to define a "cluster quality metric". On the one hand, this metric needs a contribution from cluster separation: the more separated the clusters are, the clearer the clustering. This could be done by fitting a PDF to every cluster, computing the pairwise distances between these PDFs, and using the median of these distances as the metric.
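One way the separation part could look, as a sketch under two assumptions that are mine rather than part of the proposal: each cluster's PDF is approximated by a Gaussian with diagonal covariance, and the distance between two PDFs is the Bhattacharyya distance. For diagonal Gaussians that distance is just the sum of the per-feature 1D distances, which keeps the code short. Any other PDF fit or distributional distance could be swapped in.

```python
import math
import statistics
from itertools import combinations


def fit_diag_gaussian(points):
    """Fit an axis-aligned (diagonal-covariance) Gaussian to one cluster."""
    dims = range(len(points[0]))
    means = [statistics.fmean(p[d] for p in points) for d in dims]
    variances = [
        statistics.pvariance([p[d] for p in points], mu=means[d]) for d in dims
    ]
    return means, variances


def bhattacharyya_diag(g1, g2, eps=1e-12):
    """Bhattacharyya distance between two diagonal Gaussians.

    For diagonal covariances it is the sum of the 1D distances per feature.
    eps keeps the formula finite for zero-variance features.
    """
    (m1, v1), (m2, v2) = g1, g2
    total = 0.0
    for a, va, b, vb in zip(m1, v1, m2, v2):
        va, vb = va + eps, vb + eps
        s = va + vb
        total += 0.25 * (a - b) ** 2 / s + 0.5 * math.log(s / (2 * math.sqrt(va * vb)))
    return total


def separation_score(points, labels):
    """Median pairwise Bhattacharyya distance between the fitted cluster PDFs."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    fits = [fit_diag_gaussian(c) for c in clusters.values()]
    if len(fits) < 2:
        return 0.0  # a single cluster has no separation to measure
    distances = [bhattacharyya_diag(f, g) for f, g in combinations(fits, 2)]
    return statistics.median(distances)
```

`separation_score` has the `quality_fn(points, labels)` shape, so it can be used directly as the quality metric in a feature-selection loop. Clusters with a single member get a near-zero variance and therefore very large distances, which is one reason the metric probably needs more than separation alone.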

However, this cluster quality metric should also have some sort of contribution from the number of clusters found. The more clusters, the better...?
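To make the open question concrete, here is one hypothetical way to fold the cluster count into the score. The multiplicative form, the `weight` exponent and the `min_size` cutoff are arbitrary illustrative choices, not a worked-out proposal. Counting only clusters above a minimum size guards against the obvious failure of "more clusters is better", where a clustering gets rewarded for splitting off a few stray points.

```python
from collections import Counter


def cluster_aware_score(separation, labels, weight=0.5, min_size=5):
    """Hypothetical combined score: separation scaled by a power of the cluster count.

    Clusters with fewer than min_size members are not counted, so splitting off
    a handful of stray points does not inflate the score. weight=0 ignores the
    number of clusters entirely; larger weights reward more clusters.
    """
    sizes = Counter(labels)
    n_clusters = sum(1 for size in sizes.values() if size >= min_size)
    return separation * n_clusters ** weight
```

Whether a large `weight` helps or hurts likely depends on whether the system really has many attractors or the clustering is over-splitting one of them, which is exactly the ambiguity raised above.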

cc also @KalelR I think this is relevant for you too.
