Custom Kmeans #7

Open
cveaux opened this issue Jun 8, 2021 · 0 comments

cveaux commented Jun 8, 2021

Hi, I think there is an over-simplification in the way Custom Kmeans estimates the centroids:

centres[each_center] = np.mean(X[each_center_samples], axis=0)

doesn't actually yield the point that minimises the average custom distance within a cluster.
The mean is the optimal solution for the (squared) Euclidean distance, but not for an arbitrary distance. For instance, with cosine distance, the mean computed as above gives the optimal cluster centre only if the rows of X are L2-normalised.
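To illustrate the cosine case, here is a minimal sketch on a toy cluster. The helper `cosine_distances_to` and the data are made up for this example and are not taken from the repository. It compares the total cosine distance obtained with the plain mean against the mean of the L2-normalised rows.

```python
import numpy as np

def cosine_distances_to(center, X):
    """Cosine distance from each row of X to a single center vector."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    cn = center / np.linalg.norm(center)
    return 1.0 - Xn @ cn

# Toy cluster (hypothetical data): one long vector dominates the plain mean.
X = np.array([[10.0, 0.0], [0.0, 1.0], [0.0, 1.0]])

# Centre as currently computed: mean of the raw rows.
plain_mean = X.mean(axis=0)

# Centre computed after L2-normalising each row.
normalised_mean = (X / np.linalg.norm(X, axis=1, keepdims=True)).mean(axis=0)

cost_plain = cosine_distances_to(plain_mean, X).sum()
cost_normalised = cosine_distances_to(normalised_mean, X).sum()

print(f"total cosine distance, plain mean:      {cost_plain:.3f}")
print(f"total cosine distance, normalised mean: {cost_normalised:.3f}")
```

On this data the normalised mean gives a clearly lower total cosine distance, because the long first row no longer pulls the centre towards its own direction.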

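A more general solution would be to use sklearn_extra.cluster.KMedoids, since a medoid (the cluster member with the smallest total distance to the other members) is well defined for any distance.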
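As a rough sketch of what a medoid-based centre update could look like for an arbitrary distance, here is a plain NumPy version. The names `medoid_index` and `cosine_distance` are hypothetical helpers for this example; this is neither the repository's code nor the KMedoids implementation, only the underlying idea.

```python
import numpy as np

def medoid_index(X, distance):
    """Index of the row of X with the smallest total distance to all rows."""
    n = len(X)
    # Full pairwise distance matrix; O(n^2) calls, fine for a small cluster.
    D = np.array([[distance(X[i], X[j]) for j in range(n)] for i in range(n)])
    return int(np.argmin(D.sum(axis=1)))

def cosine_distance(a, b):
    """Cosine distance between two non-zero vectors."""
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Samples assigned to one cluster (hypothetical data).
X = np.array([[10.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 5.0]])

idx = medoid_index(X, cosine_distance)
centre = X[idx]  # the centre is always an actual sample
print(idx, centre)
```

The trade-off is that the centre is restricted to existing samples and the update costs a pairwise distance computation per cluster, but it stays valid for any distance function.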
