Does kmeans support INNER_PRODUCT distance? #2363

Closed
zhanghan9797 opened this issue Jun 22, 2022 · 4 comments
@zhanghan9797

Hi, I would like to know whether the k-means step that faiss.IndexIVFFlat runs during training can only use L2 distance, or whether INNER_PRODUCT is also possible.
Can I use INNER_PRODUCT as the distance for k-means?

I want to implement it this way. Is this correct?

quantizer = faiss.IndexFlatIP(emb_size)
index = faiss.IndexIVFFlat(quantizer, emb_size, ivf_centers_num, faiss.METRIC_INNER_PRODUCT)

I would also like to know what quantizer and faiss.METRIC_INNER_PRODUCT mean here.

Looking forward to your reply!

@mdouze
Contributor

mdouze commented Jun 28, 2022

Inner product is supported.
At search time the index uses inner product rather than L2, as expected.

@meghbhalerao

Hi, I am not sure this is the right place to ask, but I would like to know exactly where in the source code the k-means clustering uses inner product as the similarity metric rather than the Euclidean (L2) distance.
If inner product is used as the similarity metric, how are the cluster centers calculated? I don't think it makes sense to use the arithmetic mean as the cluster center, since that is specific to Euclidean space and the L2 distance. Thanks, and please let me know if I am missing something.

--Megh

@mdouze
Contributor

mdouze commented Jul 26, 2022

Yes, the arithmetic mean is used to compute centroids.

Indeed, the guarantee that the mean squared error decreases at each iteration does not hold with anything other than L2 assignment. However, inner product assignment is useful, especially in combination with L2 normalization of the centroids after each iteration.
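
To make the iteration described above concrete, here is a toy NumPy sketch: assign by maximum inner product, update each centroid with the arithmetic mean, then L2-normalize it. This is an illustration only and not the Faiss implementation; the function name spherical_kmeans, the initialization, and the empty-cluster handling are assumptions made for the example.

```python
import numpy as np

def spherical_kmeans(x, k, niter=10, seed=0):
    """Toy k-means with inner-product assignment and L2-normalized centroids."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct training points (an assumption of this sketch)
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    centroids /= np.linalg.norm(centroids, axis=1, keepdims=True)
    for _ in range(niter):
        # Assignment step: each point goes to the centroid with the largest inner product
        assign = np.argmax(x @ centroids.T, axis=1)
        for j in range(k):
            members = x[assign == j]
            if len(members) == 0:
                continue  # keep the previous centroid for an empty cluster
            # Update step: arithmetic mean of the assigned points
            mean = members.mean(axis=0)
            norm = np.linalg.norm(mean)
            if norm > 0:
                # L2 normalization puts the centroid back on the unit sphere
                centroids[j] = mean / norm
    # Final assignment against the last set of centroids
    assign = np.argmax(x @ centroids.T, axis=1)
    return centroids, assign

# Random unit vectors stand in for real data
rng = np.random.default_rng(0)
x = rng.standard_normal((500, 16)).astype("float32")
x /= np.linalg.norm(x, axis=1, keepdims=True)
centroids, assign = spherical_kmeans(x, k=4)
```

Without the normalization step, a centroid with a large norm would attract points simply because its inner products are larger, which is one reason the two are usually combined.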

@mdouze mdouze closed this as completed Aug 31, 2022
@ghost

ghost commented Sep 23, 2022

Could you briefly describe how we can do inner product assignment in the Python interface?



3 participants