Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding KNN codec that is based on Lucene92 codec #444

Merged

Conversation

martin-gaievski
Copy link
Member

@martin-gaievski martin-gaievski commented Jul 13, 2022

Signed-off-by: Martin Gaievski gaievski@amazon.com

Description

Adding new KNN codec that is based on Lucene92 and making it new default.
This is required because 1. we follow core OpenSearch 2.x that is using Lucene92 and 2. hnsw implementation improved comparing to 9.1, it is required if we want to open Lucene hnsw through plugin.

As part of this change I've done a code cleanup and simplified the codec factory. We need some simple mechanism to get only the latest version of KNN codec, and we don't need to support creation of previous KNN codec versions

Check List

  • All tests pass
  • Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@martin-gaievski martin-gaievski requested a review from a team July 13, 2022 01:20
@martin-gaievski martin-gaievski added 2.2.0 backport 2.x Enhancements Increases software capabilities beyond original client specifications labels Jul 13, 2022
Signed-off-by: Martin Gaievski <gaievski@amazon.com>
@codecov-commenter
Copy link

codecov-commenter commented Jul 13, 2022

Codecov Report

Merging #444 (634a8f7) into main (2fc09ba) will decrease coverage by 0.14%.
The diff coverage is 100.00%.

@@             Coverage Diff              @@
##               main     #444      +/-   ##
============================================
- Coverage     83.97%   83.82%   -0.15%     
- Complexity      902      933      +31     
============================================
  Files           131      133       +2     
  Lines          3863     3828      -35     
  Branches        359      349      -10     
============================================
- Hits           3244     3209      -35     
  Misses          457      457              
  Partials        162      162              
Impacted Files Coverage Δ
...earch/knn/index/codec/KNN920Codec/KNN920Codec.java 100.00% <100.00%> (ø)
...rg/opensearch/knn/index/codec/KNNCodecFactory.java 75.00% <100.00%> (-3.58%) ⬇️
...g/opensearch/knn/index/codec/KNNFormatFactory.java 88.88% <100.00%> (+8.88%) ⬆️
...index/codec/KNN80Codec/KNN80DocValuesConsumer.java 88.67% <0.00%> (-0.42%) ⬇️
.../java/org/opensearch/knn/index/util/KNNEngine.java 100.00% <0.00%> (ø)
...java/org/opensearch/knn/index/util/KNNLibrary.java
...main/java/org/opensearch/knn/index/util/Faiss.java 74.48% <0.00%> (ø)
...a/org/opensearch/knn/index/util/NativeLibrary.java 100.00% <0.00%> (ø)
...ain/java/org/opensearch/knn/index/util/Nmslib.java 83.33% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2fc09ba...634a8f7. Read the comment docs.

@@ -8,6 +8,7 @@
import org.apache.lucene.codecs.Codec;
import org.opensearch.index.mapper.MapperService;
import org.opensearch.knn.index.codec.KNN910Codec.KNN910Codec;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to keep 910 codec around in this builder?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each codec has it's own builder as an inner class in CodecBuilder, as those builders are short I think it makes sense to have them inside the abstract class at least for now.
Those builders are used by codec factory, it should be able to build any codec from set of supported versions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After taking a second thought it may be a good point, customer can build codec directly using no arg constructor as having such constructor is a requirement from core OS. In such case the only place that uses/requires codec factory is our own code, and we can simplify factory logic and return only the latest version of codec. I've pushed commit for this, @jmazanec15 please check if that is something you had in mind

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me.

Signed-off-by: Martin Gaievski <gaievski@amazon.com>
Signed-off-by: Martin Gaievski <gaievski@amazon.com>
Copy link
Member

@vamshin vamshin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks

@martin-gaievski martin-gaievski merged commit 7499375 into opensearch-project:main Jul 13, 2022
opensearch-trigger-bot bot pushed a commit that referenced this pull request Jul 13, 2022
* Adding KNN codec that is based on Lucene92 codec

Signed-off-by: Martin Gaievski <gaievski@amazon.com>
(cherry picked from commit 7499375)
martin-gaievski pushed a commit that referenced this pull request Jul 14, 2022
* Adding KNN codec that is based on Lucene92 codec

Signed-off-by: Martin Gaievski <gaievski@amazon.com>
(cherry picked from commit 7499375)
@heemin32 heemin32 added v2.2.0 and removed 2.2.0 labels Nov 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Enhancements Increases software capabilities beyond original client specifications v2.2.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants