for the same set of data, the centroids vary for new run #1

krishnakumar85 · 2013-09-25T07:40:51Z

For each new run of node-kmeans on the same set of data, the clusters and centroids vary. Is there any way to make the results reproducible, for example by starting with a constant seed?

listonb · 2014-11-07T16:21:50Z

I'm also seeing this problem. It appears to generate new centroids on every run with identical data.

Philmod · 2014-11-07T17:24:56Z

Are there a lot of local minima in your data set?

listonb · 2014-11-07T18:55:20Z

Yes. This is pixel RGB color data from an image.

Philmod · 2014-11-07T19:00:54Z

Yes, that's linked to your problem.

Finding the global minimum of the k-means problem is NP-hard in general.
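To illustrate the local-minimum point, here is a minimal sketch (not node-kmeans code; `sqDist`, `nearest`, and `lloyd` are hypothetical helpers written for this example). It runs plain Lloyd iterations on four points at the corners of a 4 x 1 rectangle, once from each of two different starting centroid pairs. Both runs converge, but to different clusterings with different costs.

```javascript
// Squared Euclidean distance between two points of equal dimension.
function sqDist(a, b) {
  return a.reduce((sum, ai, i) => sum + (ai - b[i]) ** 2, 0);
}

// Index of the centroid closest to point p (ties go to the lower index).
function nearest(p, centroids) {
  let best = 0;
  for (let j = 1; j < centroids.length; j++) {
    if (sqDist(p, centroids[j]) < sqDist(p, centroids[best])) best = j;
  }
  return best;
}

// Plain Lloyd iterations from explicit starting centroids.
// Hypothetical helper for illustration; not part of node-kmeans.
function lloyd(points, init, maxIter = 100) {
  let centroids = init.map((c) => c.slice());
  let labels = [];
  for (let iter = 0; iter < maxIter; iter++) {
    const next = points.map((p) => nearest(p, centroids));
    const changed = next.some((l, i) => l !== labels[i]);
    labels = next;
    if (!changed) break; // assignments are stable: converged
    centroids = centroids.map((c, j) => {
      const members = points.filter((_, i) => labels[i] === j);
      if (members.length === 0) return c; // keep an empty cluster's centroid
      return c.map((_, d) => members.reduce((s, m) => s + m[d], 0) / members.length);
    });
  }
  // Cost: sum of squared distances of each point to its assigned centroid.
  const cost = points.reduce((s, p, i) => s + sqDist(p, centroids[labels[i]]), 0);
  return { centroids, labels, cost };
}

const rect = [[0, 0], [0, 1], [4, 0], [4, 1]];
const leftRight = lloyd(rect, [[0, 0], [4, 0]]); // splits left vs. right
const topBottom = lloyd(rect, [[0, 0], [0, 1]]); // gets stuck splitting top vs. bottom
console.log(leftRight.cost, topBottom.cost);
```

The second start settles on the top/bottom split, which is stable under further iterations even though the left/right split has a much lower cost. With random initialization, which of these you get depends on the draw.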

listonb · 2014-11-07T19:03:55Z

Any easy fix?

Philmod · 2014-11-07T19:12:48Z

This is on the to-do list.

I think it can be solved in a few ways:

Replicates: try many random starting points and merge the results.
Add some randomness.

I'm happy if you create a Pull Request with a solution.

Thanks,
Philmod

listonb · 2014-11-07T19:17:18Z

Appreciate the time. I'll try to look into it after next week if I have some time!

Morikko · 2018-08-30T13:41:06Z

One of the solutions used in sklearn is to use the inertia:

Run k-means many times with different initializations.
For each result, compute the inertia.
Keep the result with the lowest inertia.

Note about inertia (from sklearn): Sum of squared distances of samples to their closest cluster center.
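
A rough sketch of those steps in plain JavaScript, combined with a fixed seed so that repeated runs on the same data give the same answer. Everything here is an assumption for illustration: `mulberry32` is a small public-domain seedable PRNG, and `randomInit`, `runOnce`, `kmeansBest`, and the `restarts` and `seed` options are hypothetical names, not part of the node-kmeans API.

```javascript
// Small seedable PRNG (mulberry32); returns a function yielding floats in [0, 1).
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Squared Euclidean distance between two points of equal dimension.
function sqDist(a, b) {
  return a.reduce((sum, ai, i) => sum + (ai - b[i]) ** 2, 0);
}

// Pick k distinct data points as starting centroids (partial Fisher-Yates shuffle).
function randomInit(points, k, rand) {
  const idx = points.map((_, i) => i);
  for (let i = 0; i < k; i++) {
    const j = i + Math.floor(rand() * (idx.length - i));
    [idx[i], idx[j]] = [idx[j], idx[i]];
  }
  return idx.slice(0, k).map((i) => points[i].slice());
}

// One k-means run (Lloyd iterations) from the given starting centroids.
function runOnce(points, init, maxIter = 100) {
  let centroids = init;
  let labels = [];
  for (let iter = 0; iter < maxIter; iter++) {
    // Assign each point to its closest centroid.
    const next = points.map((p) => {
      let best = 0;
      for (let j = 1; j < centroids.length; j++) {
        if (sqDist(p, centroids[j]) < sqDist(p, centroids[best])) best = j;
      }
      return best;
    });
    const changed = next.some((l, i) => l !== labels[i]);
    labels = next;
    if (!changed) break;
    // Move each centroid to the mean of its members.
    centroids = centroids.map((c, j) => {
      const members = points.filter((_, i) => labels[i] === j);
      if (members.length === 0) return c; // keep an empty cluster's centroid
      return c.map((_, d) => members.reduce((s, m) => s + m[d], 0) / members.length);
    });
  }
  // Inertia: sum of squared distances of each point to its assigned centroid.
  const inertia = points.reduce((s, p, i) => s + sqDist(p, centroids[labels[i]]), 0);
  return { centroids, labels, inertia };
}

// Run k-means `restarts` times and keep the result with the lowest inertia.
function kmeansBest(points, k, { restarts = 10, seed = 1 } = {}) {
  const rand = mulberry32(seed); // same seed => same sequence of initializations
  let best = null;
  for (let r = 0; r < restarts; r++) {
    const result = runOnce(points, randomInit(points, k, rand));
    if (best === null || result.inertia < best.inertia) best = result;
  }
  return best;
}

const data = [[0, 0], [0, 1], [4, 0], [4, 1]];
console.log(kmeansBest(data, 2, { restarts: 20, seed: 42 }));
```

Calling `kmeansBest` twice with the same data and seed returns identical centroids, because every random choice comes from the seeded generator. More restarts cost proportionally more time but make it less likely that the kept result is a poor local minimum; this does not guarantee the global minimum.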

@Philmod I can do a PR
