Handles truncated boundary keys #56

ruweih · 2021-01-29T06:41:25Z

Boundary keys might not be real keys:

We need to handle these truncated keys; otherwise decoding fails in the FoundationDB 6.x client:

java.lang.IllegalArgumentException: No terminator found for bytes starting at 1
	at com.apple.foundationdb.tuple.TupleUtil$DecodeState.findNullTerminator(TupleUtil.java:98)
	at com.apple.foundationdb.tuple.TupleUtil.decode(TupleUtil.java:438)
	at com.apple.foundationdb.tuple.TupleUtil.unpack(TupleUtil.java:676)
	at com.apple.foundationdb.tuple.Tuple.fromBytes(Tuple.java:526)
	at com.apple.foundationdb.subspace.Subspace.unpack(Subspace.java:231)
	at org.janusgraph.diskstorage.foundationdb.FoundationDBKeyValueStore.lambda$getBoundaryKeys$1(FoundationDBKeyValueStore.java:264)
	at java.util.Iterator.forEachRemaining(Iterator.java:116)
	at org.janusgraph.diskstorage.foundationdb.FoundationDBKeyValueStore.getBoundaryKeys(FoundationDBKeyValueStore.java:264)

Signed-off-by: Randy Hu <ruweih@gmail.com>

mbrukman

I'm not very familiar with boundary keys (so I'll ask one of the suggested reviewers to take a look), but would it be possible to add a unit test for this change?

ruweih · 2021-02-01T16:07:23Z

This is very difficult to reproduce, as mentioned in the original report: apple/foundationdb#3608

We did not see the issue until the data volume grew very large, and even then it is not guaranteed to be reproducible. This code path is not currently used by the JG storage adapter for Gremlin queries, only by bulk operations.

rngcntr · 2021-02-02T06:37:12Z

src/main/java/org/janusgraph/diskstorage/foundationdb/FoundationDBKeyValueStore.java

-            it.forEachRemaining(key -> keys.add(getBuffer(db.unpack(key).getBytes(0))));
+            it.forEachRemaining(key -> {
+                if (key[key.length - 1] != 0x00) {
+                    key = Arrays.copyOf(key, key.length + 1);


How did you come up with the idea of adding padding here? It looks good, but was it mentioned anywhere by the FDB devs, or did you debug it yourself and find out that keys have to be 0-terminated?

ruweih · 2021-02-02

Yes, this is the line that tries to find 0x00 as the terminator when unpacking the byte array:
https://github.com/apple/foundationdb/blob/release-6.2/bindings/java/src/main/com/apple/foundationdb/tuple/TupleUtil.java#L438
and it fails with the exception if no terminator exists:
https://github.com/apple/foundationdb/blob/release-6.2/bindings/java/src/main/com/apple/foundationdb/tuple/TupleUtil.java#L98
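To make the failure mode concrete, here is a minimal sketch of a terminator scan over a tuple-encoded byte string. It is a simplified stand-in written for illustration, not the FDB source: the class name `TerminatorScan` is hypothetical, and only the error message text is taken from the stack trace above. It relies on the tuple layer's encoding rule that a byte string ends at a 0x00 that is not followed by 0xFF.

```java
class TerminatorScan {
    // Simplified illustration, NOT the FoundationDB implementation.
    // A byte string ends at the first 0x00 that is not followed by 0xFF;
    // the pair 0x00 0xFF is the escape for an embedded zero byte.
    static int findNullTerminator(byte[] bytes, int from) {
        for (int i = from; i < bytes.length; i++) {
            if (bytes[i] == 0x00) {
                boolean escaped = i + 1 < bytes.length && bytes[i + 1] == (byte) 0xFF;
                if (!escaped) {
                    return i;
                }
                i++; // skip the 0xFF half of the escape pair
            }
        }
        throw new IllegalArgumentException(
            "No terminator found for bytes starting at " + from);
    }

    public static void main(String[] args) {
        byte[] complete = {0x01, 'a', 'b', 0x00}; // type code, payload, terminator
        byte[] truncated = {0x01, 'a', 'b'};      // terminator was cut off

        System.out.println(findNullTerminator(complete, 1)); // prints 3
        try {
            findNullTerminator(truncated, 1);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A truncated boundary key looks like the second array: the payload is present but the closing 0x00 is missing, so the scan runs off the end and throws.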

It's not an issue in the 5.2 client:
https://github.com/apple/foundationdb/blob/release-5.2/bindings/java/src/main/com/apple/foundationdb/tuple/TupleUtil.java#L357
https://github.com/apple/foundationdb/blob/release-5.2/bindings/java/src/main/com/apple/foundationdb/tuple/ByteArrayUtil.java#L326-L328
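The padding step from the diff can be looked at in isolation. The sketch below is an assumption-laden illustration rather than the merged code: `BoundaryKeyPadding` and `padIfTruncated` are hypothetical names, and the empty-key guard is an addition that the diff excerpt does not show. In the actual change the padded key is then handed to `db.unpack(...)`.

```java
import java.util.Arrays;

class BoundaryKeyPadding {
    // Hypothetical helper: append one 0x00 when the key does not already end
    // with one, so that a terminator scan can succeed on a truncated key.
    static byte[] padIfTruncated(byte[] key) {
        // The length check is an extra guard not present in the diff excerpt.
        if (key.length == 0 || key[key.length - 1] != 0x00) {
            // Arrays.copyOf zero-fills the extra trailing byte.
            return Arrays.copyOf(key, key.length + 1);
        }
        return key;
    }

    public static void main(String[] args) {
        byte[] truncated = {0x01, 'a', 'b'};
        byte[] complete = {0x01, 'a', 'b', 0x00};

        System.out.println(Arrays.toString(padIfTruncated(truncated)));
        System.out.println(padIfTruncated(complete) == complete); // already terminated, returned unchanged
    }
}
```

Note that this only restores a terminator; it cannot recover payload bytes that were cut off, so the decoded value is a prefix of some real key rather than a real key itself.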

rngcntr · 2021-02-03

OK, thanks for the clarification. Now your changes make more sense to me.
Would you like to point out that difference in the FDB issue as well? To me it looks like a step back that they removed automatic termination. Perhaps that was by accident, but there may also be a purpose to it. If there is, then boundary keys are not meant to be decoded into a StaticBuffer, and thus we shouldn't do our own decoding.

I'm not saying that your implementation doesn't work, because I think it does! I'm just concerned that we shouldn't use it that way.

ruweih · 2021-02-03

It's likely by accident, since the API expects keys to be real ones. The FDB API does not have very good support for bulk processing like this. The boundary keys returned are encoded, but the APIs of the JG FDB adapter expect decoded ones when they are provided as a KVQuery:

janusgraph-foundationdb/src/main/java/org/janusgraph/diskstorage/foundationdb/FoundationDBRangeQuery.java

Lines 40 to 43 in c04fcff

byte[] startKey = (keyStart == null) ?
    db.range().begin : db.pack(keyStart.as(FoundationDBKeyValueStore.ENTRY_FACTORY));
byte[] endKey = (keyEnd == null) ?
    db.range().end : db.pack(keyEnd.as(FoundationDBKeyValueStore.ENTRY_FACTORY));

So we have to unpack it first in order to use it later to get slices, and we also need to tweak the key to make sure one record is not split across different slices. The JG FDB adapter is for runtime use, and we don't have a bulk input format similar to the ones available when HBase or Cassandra is the backend:

https://www.javadoc.io/doc/org.janusgraph/janusgraph-hadoop-core/latest/org/janusgraph/hadoop/formats/hbase/HBaseInputFormat.html

The addition of boundary key support is the first step toward adding similar integration for bulk processing.
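As a rough illustration of how decoded boundary keys could later be turned into slices for bulk reads, the sketch below pairs consecutive boundaries into half-open ranges. Everything in it is hypothetical (`BoundarySlices`, `Slice`, `toSlices`, and the `rangeEnd` parameter are invented for this example), it assumes the boundaries are already sorted, and it does not cover the key tweaking mentioned above that keeps one record inside a single slice.

```java
import java.util.ArrayList;
import java.util.List;

class BoundarySlices {
    // Hypothetical value type: a half-open key range [start, end).
    static final class Slice {
        final byte[] start;
        final byte[] end;

        Slice(byte[] start, byte[] end) {
            this.start = start;
            this.end = end;
        }
    }

    // Pair each boundary with the next one; the last slice runs to rangeEnd.
    // Assumes boundaries are sorted in key order.
    static List<Slice> toSlices(List<byte[]> boundaries, byte[] rangeEnd) {
        List<Slice> slices = new ArrayList<>();
        for (int i = 0; i < boundaries.size(); i++) {
            byte[] end = (i + 1 < boundaries.size()) ? boundaries.get(i + 1) : rangeEnd;
            slices.add(new Slice(boundaries.get(i), end));
        }
        return slices;
    }

    public static void main(String[] args) {
        List<byte[]> boundaries = new ArrayList<>();
        boundaries.add(new byte[]{0x01});
        boundaries.add(new byte[]{0x05});
        boundaries.add(new byte[]{0x09});

        List<Slice> slices = toSlices(boundaries, new byte[]{(byte) 0xFF});
        System.out.println(slices.size()); // prints 3
    }
}
```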

rngcntr · 2021-02-04

If it is indeed by accident, it's a good idea to report that issue to the FDB community so they can fix it. Until then, I think your workaround does a good job avoiding that bug.

ruweih · 2021-06-28T15:58:41Z

Could someone please merge this? It's been almost 5 months since it was approved.

rngcntr · 2021-06-29T06:00:30Z

I completely forgot about this PR. Merging it now

Handles truncated bundary keys

5a7c2fd

Signed-off-by: Randy Hu <ruweih@gmail.com>

ruweih force-pushed the rangekeys-ruweih branch from 6658cc9 to 5a7c2fd Compare January 29, 2021 06:42

janusgraph-bot added the cla: external Externally-managed CLA label Jan 29, 2021

ruweih mentioned this pull request Jan 30, 2021

Handles potential truncated boundary keys #55

Closed

mbrukman reviewed Feb 1, 2021

mbrukman requested review from rngcntr and farodin91 February 1, 2021 15:31

rngcntr reviewed Feb 2, 2021

rngcntr approved these changes Feb 4, 2021

rngcntr merged commit 87cc864 into JanusGraph:master Jun 29, 2021

rngcntr mentioned this pull request Jun 29, 2021

Upgrade dependencies to fix image pull error #61

Merged

ruweih deleted the rangekeys-ruweih branch February 6, 2023 17:50
