Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handles truncated boundary keys #56

Merged
merged 1 commit into from
Jun 29, 2021

Conversation

ruweih
Copy link
Contributor

@ruweih ruweih commented Jan 29, 2021

Boundary keys might not be real keys:

apple/foundationdb#3608

We need handle those truncated keys, or it will fail in FoundationDB 6.x client:

java.lang.IllegalArgumentException: No terminator found for bytes starting at 1
	at com.apple.foundationdb.tuple.TupleUtil$DecodeState.findNullTerminator(TupleUtil.java:98)
	at com.apple.foundationdb.tuple.TupleUtil.decode(TupleUtil.java:438)
	at com.apple.foundationdb.tuple.TupleUtil.unpack(TupleUtil.java:676)
	at com.apple.foundationdb.tuple.Tuple.fromBytes(Tuple.java:526)
	at com.apple.foundationdb.subspace.Subspace.unpack(Subspace.java:231)
	at org.janusgraph.diskstorage.foundationdb.FoundationDBKeyValueStore.lambda$getBoundaryKeys$1(FoundationDBKeyValueStore.java:264)
	at java.util.Iterator.forEachRemaining(Iterator.java:116)
	at org.janusgraph.diskstorage.foundationdb.FoundationDBKeyValueStore.getBoundaryKeys(FoundationDBKeyValueStore.java:264)

Signed-off-by: Randy Hu <ruweih@gmail.com>
Copy link
Member

@mbrukman mbrukman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not very familiar with boundary keys (so I'll ask one of the suggested reviewers to take a look), but would it be possible to add a unit test for this change?

@ruweih
Copy link
Contributor Author

ruweih commented Feb 1, 2021

This is very difficult to be reproduced as mentioned in the original post: apple/foundationdb#3608

We did not see the issue until when the data volume grows to really huge, and it still not guarantee to be reproducible. This is not used by JG storage adapter in Gremlin queries currently, only used by bulk operations.

it.forEachRemaining(key -> keys.add(getBuffer(db.unpack(key).getBytes(0))));
it.forEachRemaining(key -> {
if (key[key.length - 1] != 0x00) {
key = Arrays.copyOf(key, key.length + 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did you come up with the idea of adding padding here? I mean it looks good but was it mentioned anywhere from the FDB devs or did you debug it yourself and found out that found out that keys have to be 0-terminated?

Copy link
Contributor Author

@ruweih ruweih Feb 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok thanks for clarification. Now your changes make more sense to me.
Would you like to point out that difference in the FDB issue as well? For me it looks like a step back that they removed automatic termination. Perhaps that was by accident but there may also be a purpose to it. Because if there is, then boundary keys are not meant to be decoded into a StaticBuffer and thus we shouldn't do our own decoding.

I'm not saying that your implementation doesn't work, because I think it does! It's just concerned that we shouldn't use it that way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's likely by accident since the API expect keys are real one. FDB API does not have very good support for bulk processing like this. The boundary keys returned are encoded, but the APIs on JG FDB adapter expect decoded one when provided as KVQuery:

byte[] startKey = (keyStart == null) ?
db.range().begin : db.pack(keyStart.as(FoundationDBKeyValueStore.ENTRY_FACTORY));
byte[] endKey = (keyEnd == null) ?
db.range().end : db.pack(keyEnd.as(FoundationDBKeyValueStore.ENTRY_FACTORY));

So we have to unpack it first in order to use it later to get slices, plus we need tweaking the key to make sure one record not split into different slices. The JG FDB adapter is for runtime, and we don't have similar bulk input format when HBase or Cassandra as backend:

https://www.javadoc.io/doc/org.janusgraph/janusgraph-hadoop-core/latest/org/janusgraph/hadoop/formats/hbase/HBaseInputFormat.html

The addition of boundary key support is the first step to add similar integration for bulk processing.

Copy link
Contributor

@rngcntr rngcntr Feb 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is indeed by accident, it's a good idea to report that issue to the FDB community so they can fix it. Until then, I think your workaround does a good job avoiding that bug.

@ruweih
Copy link
Contributor Author

ruweih commented Jun 28, 2021

Could someone please merge this? It's almost 5 month since approved.

@rngcntr
Copy link
Contributor

rngcntr commented Jun 29, 2021

I completely forgot about this PR. Merging it now

@rngcntr rngcntr merged commit 87cc864 into JanusGraph:master Jun 29, 2021
@ruweih ruweih deleted the rangekeys-ruweih branch February 6, 2023 17:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: external Externally-managed CLA
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants