
Add tests and resolve issue running SparkGraphComputer on HBase #81

Merged 1 commit into JanusGraph:master from sjudeng:hbase-hadoop-fix on Feb 6, 2017

Conversation

sjudeng
Contributor

@sjudeng sjudeng commented Feb 3, 2017

Includes refactoring of CassandraInputFormatIT to pull out common tests. Fixes issue in HBaseBinaryInputFormat and HBaseBinaryRecordReader that caused the following error:

06:47:57.278 [Executor task launch worker-0] ERROR org.apache.spark.executor.Executor - Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.ClassCastException: org.apache.hadoop.hbase.mapreduce.TableInputFormatBase$1 cannot be cast to org.apache.hadoop.hbase.mapreduce.TableRecordReader
    at com.thinkaurelius.titan.hadoop.formats.hbase.HBaseBinaryInputFormat.createRecordReader(HBaseBinaryInputFormat.java:47) ~[titan-hadoop-core-1.1.0-SNAPSHOT.jar:na]
    at com.thinkaurelius.titan.hadoop.formats.util.GiraphInputFormat.createRecordReader(GiraphInputFormat.java:53) ~[titan-hadoop-core-1.1.0-SNAPSHOT.jar:na]
    at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:151) ~[spark-core_2.10-1.5.2.jar:1.5.2]
    at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:124) ~[spark-core_2.10-1.5.2.jar:1.5.2]
    at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:65) ~[spark-core_2.10-1.5.2.jar:1.5.2]
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300) ~[spark-core_2.10-1.5.2.jar:1.5.2]
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) ~[spark-core_2.10-1.5.2.jar:1.5.2]
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) ~[spark-core_2.10-1.5.2.jar:1.5.2]
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300) ~[spark-core_2.10-1.5.2.jar:1.5.2]
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) ~[spark-core_2.10-1.5.2.jar:1.5.2]
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73) ~[spark-core_2.10-1.5.2.jar:1.5.2]
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) ~[spark-core_2.10-1.5.2.jar:1.5.2]
    at org.apache.spark.scheduler.Task.run(Task.scala:88) ~[spark-core_2.10-1.5.2.jar:1.5.2]
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) ~[spark-core_2.10-1.5.2.jar:1.5.2]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_60]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_60]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
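The root cause of the ClassCastException above is that TableInputFormatBase.createRecordReader() returns an anonymous inner subclass of the record-reader base type (the `TableInputFormatBase$1` in the trace), which is a sibling of TableRecordReader rather than an instance of it, so the downcast in HBaseBinaryInputFormat fails at runtime. A minimal, dependency-free sketch of the same failure mode (RecordReaderBase, TableRecordReaderLike, and CastDemo are hypothetical stand-ins for the HBase classes):

```java
// Sketch of the cast failure: an anonymous subclass of a base type
// cannot be cast to a sibling concrete subclass.
abstract class RecordReaderBase {
    abstract String next();
}

// Stand-in for TableRecordReader: a concrete sibling subclass.
class TableRecordReaderLike extends RecordReaderBase {
    String next() { return "table-row"; }
}

public class CastDemo {
    // Mirrors TableInputFormatBase.createRecordReader(): returns an
    // anonymous subclass of the base (compiled as CastDemo$1).
    static RecordReaderBase createAnonymousReader() {
        return new RecordReaderBase() {
            String next() { return "anon-row"; }
        };
    }

    public static void main(String[] args) {
        RecordReaderBase reader = createAnonymousReader();
        // Safe: program against the abstract base type, as the fix does.
        System.out.println(reader.next());
        try {
            // Fails at runtime, just like the Spark task: CastDemo$1
            // is not a TableRecordReaderLike.
            TableRecordReaderLike bad = (TableRecordReaderLike) reader;
            System.out.println(bad.next());
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: sibling subclasses cannot be cast");
        }
    }
}
```

The fix follows the same idea: hold the reader as the generic RecordReader type instead of casting it to TableRecordReader.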

Note when merging this PR with #79, the hbase-read.properties test resource in janusgraph-hadoop-core needs to be updated to use JanusGraphKryoRegistrator (see this commit).

References:

thinkaurelius/titan#1268
thinkaurelius/titan#1269

<optional>true</optional>
<scope>test</scope>
<exclusions>
Member

com.lmax.disruptor is only in hbase-server. No need to exclude for hbase-client.

Member

@jerryjch jerryjch left a comment


LGTM.

Would you combine the commits into one before delivering into master?

@mbrukman
Member

mbrukman commented Feb 3, 2017

@jerryjch – any particular concern with multiple commits? If they're logically distinct, it may be better to keep them separate for readability and future readers, and use a merge commit so that there's only a single point on master at which time they've all landed.

@jerryjch
Member

jerryjch commented Feb 3, 2017

The 5 commits here could be combined into something more logically clean and descriptive.
Let's keep our commit history clear and clean.

@sjudeng
Contributor Author

sjudeng commented Feb 3, 2017

Removed the unnecessary exclusion (good catch) and squashed into a single commit.

gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphInputFormat=org.janusgraph.hadoop.formats.hbase.HBaseInputFormat
gremlin.hadoop.graphOutputFormat=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat
#gremlin.hadoop.graphOutputFormat=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
Member

Do we need this commented-out line? Is there a use case to prefer this over the NullOutputFormat on the previous line? Does this help with debugging test failures somehow?

Contributor Author

I just copied this file from cassandra-read.properties, which includes the commented-out line (probably for no good reason). Let me know if I should remove it.


Let's remove it?

@jerryjch jerryjch added this to the 0.1.0 milestone Feb 4, 2017

@amcp amcp left a comment


lgtm, minor questions

gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphInputFormat=org.janusgraph.hadoop.formats.hbase.HBaseInputFormat
gremlin.hadoop.graphOutputFormat=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat
#gremlin.hadoop.graphOutputFormat=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat

Let's remove it?

assertEquals(numV, (long) t.V().count().next());
propertiesOnVertex = t.V().valueMap().next();
valuesOnP = (List)propertiesOnVertex.values().iterator().next();
assertEquals(numProps, valuesOnP.size());

Is it enough to only check the number of properties?

Contributor Author

This was the version that was in CassandraInputFormatIT and was not modified in this PR. I think changing it would be out of scope here, but maybe create an issue for it?


please create an issue.
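A stronger version of the test discussed above could assert the property values themselves rather than only their count. A minimal, self-contained sketch of the idea (PropertyAssertionDemo, vertexValueMap, and the "value-i" naming scheme are hypothetical stand-ins for the traversal result of t.V().valueMap().next()):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PropertyAssertionDemo {
    // Hypothetical stand-in for t.V().valueMap().next(): one vertex's
    // property map, keyed by property name.
    static Map<String, List<Object>> vertexValueMap(int numProps) {
        Map<String, List<Object>> props = new HashMap<>();
        List<Object> values = new ArrayList<>();
        for (int i = 0; i < numProps; i++) {
            values.add("value-" + i);
        }
        props.put("p", values);
        return props;
    }

    static void checkProperties(Map<String, List<Object>> propertiesOnVertex,
                                int numProps) {
        List<Object> valuesOnP = propertiesOnVertex.values().iterator().next();
        // Existing check: the count only.
        if (valuesOnP.size() != numProps) {
            throw new AssertionError(
                "expected " + numProps + " values, got " + valuesOnP.size());
        }
        // Stronger check: assert each expected value is actually present.
        for (int i = 0; i < numProps; i++) {
            if (!valuesOnP.contains("value-" + i)) {
                throw new AssertionError("missing value-" + i);
            }
        }
    }

    public static void main(String[] args) {
        checkProperties(vertexValueMap(3), 3);
        System.out.println("all property checks passed");
    }
}
```

Checking contents as well as size would catch a reader that returns the right number of wrong values, which a count-only assertion cannot.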


public abstract class AbstractInputFormatIT extends JanusGraphBaseTest {



Nit: extra line here?

@sjudeng
Contributor Author

sjudeng commented Feb 4, 2017

Should be good now on the requested formatting updates.

@mbrukman
Member

mbrukman commented Feb 5, 2017

@sjudeng – please rebase on master and re-push to see how the tests perform on Travis.

@sjudeng
Contributor Author

sjudeng commented Feb 5, 2017

Done, but the tests in janusgraph-hadoop that this PR affects are not currently passing under Travis on master.

…n HBase

Signed-off-by: sjudeng <sjudeng@users.noreply.github.com>
@mbrukman mbrukman merged commit 00592f8 into JanusGraph:master Feb 6, 2017
@sjudeng sjudeng deleted the hbase-hadoop-fix branch February 7, 2017 02:06
micpod pushed a commit to micpod/janusgraph that referenced this pull request Nov 5, 2019
Add tests and resolve issue running SparkGraphComputer on HBase
Labels: cla: yes, storage/hbase