Handles truncated boundary keys #56
Signed-off-by: Randy Hu <ruweih@gmail.com>
6658cc9 to 5a7c2fd
I'm not very familiar with boundary keys (so I'll ask one of the suggested reviewers to take a look), but would it be possible to add a unit test for this change?
This is very difficult to reproduce, as mentioned in the original issue: apple/foundationdb#3608. We did not see the problem until the data volume grew very large, and even then it is not guaranteed to be reproducible. This code path is not currently used by the JG storage adapter in Gremlin queries, only by bulk operations.
it.forEachRemaining(key -> keys.add(getBuffer(db.unpack(key).getBytes(0))));
it.forEachRemaining(key -> {
    if (key[key.length - 1] != 0x00) {
        key = Arrays.copyOf(key, key.length + 1);
How did you come up with the idea of adding padding here? I mean, it looks good, but was it mentioned anywhere by the FDB devs, or did you debug it yourself and find out that keys have to be 0-terminated?
Yes, this is the line that tries to find the 0x00 terminator when unpacking the byte array:
https://github.com/apple/foundationdb/blob/release-6.2/bindings/java/src/main/com/apple/foundationdb/tuple/TupleUtil.java#L438
and it fails with an exception if the terminator does not exist:
https://github.com/apple/foundationdb/blob/release-6.2/bindings/java/src/main/com/apple/foundationdb/tuple/TupleUtil.java#L98
It's not an issue in the 5.2 client:
https://github.com/apple/foundationdb/blob/release-5.2/bindings/java/src/main/com/apple/foundationdb/tuple/TupleUtil.java#L357
https://github.com/apple/foundationdb/blob/release-5.2/bindings/java/src/main/com/apple/foundationdb/tuple/ByteArrayUtil.java#L326-L328
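To illustrate why a truncated key trips the decoder, here is a simplified, hypothetical sketch of a terminator scan. It is not the actual `TupleUtil` code, and the class name, method name, and exception message are made up for illustration. It assumes the tuple layer's byte-string encoding, where a value ends at a 0x00 byte and an embedded null is escaped as 0x00 0xFF:

```java
// Hypothetical, simplified illustration; NOT the FoundationDB TupleUtil code.
final class TerminatorScan {
    private TerminatorScan() {}

    // Returns the index of the first 0x00 at or after `from` that is not
    // followed by 0xFF (0x00 0xFF is the escape for an embedded null byte).
    // Throws if the array ends before such a terminator is found, which is
    // what happens when a boundary key has been truncated.
    static int findTerminator(byte[] bytes, int from) {
        for (int i = from; i < bytes.length; i++) {
            if (bytes[i] == 0x00) {
                if (i + 1 < bytes.length && bytes[i + 1] == (byte) 0xFF) {
                    i++; // skip the 0xFF of the escaped null
                    continue;
                }
                return i;
            }
        }
        throw new IllegalArgumentException(
                "no 0x00 terminator at or after offset " + from);
    }
}
```

With a complete key such as `{0x01, 'a', 'b', 0x00}` the scan from offset 1 returns 3; with the truncated `{0x01, 'a', 'b'}` it throws, which matches the failure mode described above.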
OK, thanks for the clarification. Now your changes make more sense to me.
Would you like to point out that difference in the FDB issue as well? To me it looks like a step back that they removed automatic termination. Perhaps that was by accident, but there may also be a purpose to it. If there is, then boundary keys are not meant to be decoded into a StaticBuffer, and thus we shouldn't do our own decoding.
I'm not saying that your implementation doesn't work, because I think it does! I'm just concerned that we shouldn't use it that way.
It's likely an accident, since the API expects keys to be real ones. The FDB API does not have very good support for bulk processing like this. The boundary keys returned are encoded, but the APIs on the JG FDB adapter expect decoded ones when provided as a KVQuery:
Lines 40 to 43 in c04fcff
byte[] startKey = (keyStart == null) ?
    db.range().begin : db.pack(keyStart.as(FoundationDBKeyValueStore.ENTRY_FACTORY));
byte[] endKey = (keyEnd == null) ?
    db.range().end : db.pack(keyEnd.as(FoundationDBKeyValueStore.ENTRY_FACTORY));
So we have to unpack the key first in order to use it later to get slices, and we also need to tweak the key to make sure one record is not split across different slices. The JG FDB adapter is meant for runtime use, and we don't have a bulk input format similar to what exists when HBase or Cassandra is the backend.
The addition of boundary key support is the first step toward adding similar integration for bulk processing.
If it is indeed by accident, it's a good idea to report that issue to the FDB community so they can fix it. Until then, I think your workaround does a good job avoiding that bug.
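For reference, a minimal standalone sketch of the padding workaround, modeled on the condition in the diff excerpt above. The class and method names are hypothetical, and the empty-key guard is an addition that is not part of the excerpt:

```java
import java.util.Arrays;

// Hypothetical helper modeled on the diff excerpt; names are illustrative only.
final class BoundaryKeyPadding {
    private BoundaryKeyPadding() {}

    // If the key does not end in 0x00, return a copy that is one byte longer.
    // Arrays.copyOf fills the extra position with 0, so the copy is
    // 0x00-terminated. Keys that already end in 0x00, and empty keys
    // (a guard added in this sketch), are returned unchanged.
    static byte[] padIfTruncated(byte[] key) {
        if (key.length > 0 && key[key.length - 1] != 0x00) {
            return Arrays.copyOf(key, key.length + 1);
        }
        return key;
    }
}
```

A padded key can then be passed to the unpack call without hitting the missing-terminator failure. Whether the padded value still identifies a sensible split point is a separate question that this sketch does not address.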
Could someone please merge this? It has been almost 5 months since it was approved.
I completely forgot about this PR. Merging it now.
Boundary keys might not be real keys:
apple/foundationdb#3608
We need to handle those truncated keys, or unpacking them will fail in the FoundationDB 6.x client: