
Add tests and resolve issue running SparkGraphComputer on HBase #81

Merged 1 commit into JanusGraph:master from sjudeng:hbase-hadoop-fix on Feb 6, 2017

Conversation

sjudeng
Contributor

@sjudeng sjudeng commented Feb 3, 2017

Includes refactoring of CassandraInputFormatIT to pull out common tests. Fixes issue in HBaseBinaryInputFormat and HBaseBinaryRecordReader that caused the following error:

06:47:57.278 [Executor task launch worker-0] ERROR org.apache.spark.executor.Executor - Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.ClassCastException: org.apache.hadoop.hbase.mapreduce.TableInputFormatBase$1 cannot be cast to org.apache.hadoop.hbase.mapreduce.TableRecordReader
    at com.thinkaurelius.titan.hadoop.formats.hbase.HBaseBinaryInputFormat.createRecordReader(HBaseBinaryInputFormat.java:47) ~[titan-hadoop-core-1.1.0-SNAPSHOT.jar:na]
    at com.thinkaurelius.titan.hadoop.formats.util.GiraphInputFormat.createRecordReader(GiraphInputFormat.java:53) ~[titan-hadoop-core-1.1.0-SNAPSHOT.jar:na]
    at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:151) ~[spark-core_2.10-1.5.2.jar:1.5.2]
    at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:124) ~[spark-core_2.10-1.5.2.jar:1.5.2]
    at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:65) ~[spark-core_2.10-1.5.2.jar:1.5.2]
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300) ~[spark-core_2.10-1.5.2.jar:1.5.2]
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) ~[spark-core_2.10-1.5.2.jar:1.5.2]
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) ~[spark-core_2.10-1.5.2.jar:1.5.2]
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300) ~[spark-core_2.10-1.5.2.jar:1.5.2]
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) ~[spark-core_2.10-1.5.2.jar:1.5.2]
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73) ~[spark-core_2.10-1.5.2.jar:1.5.2]
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) ~[spark-core_2.10-1.5.2.jar:1.5.2]
    at org.apache.spark.scheduler.Task.run(Task.scala:88) ~[spark-core_2.10-1.5.2.jar:1.5.2]
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) ~[spark-core_2.10-1.5.2.jar:1.5.2]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_60]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_60]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
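The root cause of the ClassCastException above is that TableInputFormatBase.createRecordReader() returns an anonymous inner subclass of the record-reader base type (the `TableInputFormatBase$1` in the trace), which is a sibling of TableRecordReader rather than an instance of it, so the downcast in HBaseBinaryInputFormat fails at runtime. A minimal, dependency-free sketch of the same failure mode (RecordReaderBase, TableRecordReaderLike, and CastDemo are hypothetical stand-ins for the HBase classes):

```java
// Sketch of the cast failure: an anonymous subclass of a base type
// cannot be cast to a sibling concrete subclass.
abstract class RecordReaderBase {
    abstract String next();
}

// Stand-in for TableRecordReader: a concrete sibling subclass.
class TableRecordReaderLike extends RecordReaderBase {
    String next() { return "table-row"; }
}

public class CastDemo {
    // Mirrors TableInputFormatBase.createRecordReader(): returns an
    // anonymous subclass of the base (compiled as CastDemo$1).
    static RecordReaderBase createAnonymousReader() {
        return new RecordReaderBase() {
            String next() { return "anon-row"; }
        };
    }

    public static void main(String[] args) {
        RecordReaderBase reader = createAnonymousReader();
        // Safe: program against the abstract base type, as the fix does.
        System.out.println(reader.next());
        try {
            // Fails at runtime, just like the Spark task: CastDemo$1
            // is not a TableRecordReaderLike.
            TableRecordReaderLike bad = (TableRecordReaderLike) reader;
            System.out.println(bad.next());
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: sibling subclasses cannot be cast");
        }
    }
}
```

The fix follows the same idea: hold the reader as the generic RecordReader type instead of casting it to TableRecordReader.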

Note when merging this PR with #79, the hbase-read.properties test resource in janusgraph-hadoop-core needs to be updated to use JanusGraphKryoRegistrator (see this commit).

References:

thinkaurelius/titan#1268
thinkaurelius/titan#1269

<optional>true</optional>
<scope>test</scope>
<exclusions>
Member

com.lmax.disruptor is only in hbase-server. No need to exclude for hbase-client.

Member

@jerryjch jerryjch left a comment


LGTM.

Would you combine the commits into one before delivering into master?

@mbrukman
Member

mbrukman commented Feb 3, 2017

@jerryjch – any particular concern with multiple commits? If they're logically distinct, it may be better to keep them separate for readability and future readers, and use a merge commit so that there's only a single point on master at which time they've all landed.

@jerryjch
Member

jerryjch commented Feb 3, 2017

The 5 commits here could be combined into something more logically clean and descriptive.
Let's keep our commit history clear and clean.

@sjudeng
Contributor Author

sjudeng commented Feb 3, 2017

Removed the unnecessary exclusion (good catch) and squashed into a single commit.

gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphInputFormat=org.janusgraph.hadoop.formats.hbase.HBaseInputFormat
gremlin.hadoop.graphOutputFormat=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat
#gremlin.hadoop.graphOutputFormat=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
Member

Do we need this commented-out line? Is there a use case to prefer this over the NullOutputFormat on the previous line? Does this help with debugging test failures somehow?

Contributor Author

I just copied this file from cassandra-read.properties, which includes the commented-out line (probably for no good reason). Let me know if I should remove it.


Let's remove it?

@jerryjch jerryjch added this to the 0.1.0 milestone Feb 4, 2017

@amcp amcp left a comment


lgtm, minor questions

gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphInputFormat=org.janusgraph.hadoop.formats.hbase.HBaseInputFormat
gremlin.hadoop.graphOutputFormat=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat
#gremlin.hadoop.graphOutputFormat=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat

Let's remove it?

assertEquals(numV, (long) t.V().count().next());
propertiesOnVertex = t.V().valueMap().next();
valuesOnP = (List)propertiesOnVertex.values().iterator().next();
assertEquals(numProps, valuesOnP.size());

Is it enough to only check the number of properties?

Contributor Author

This was the version that was in CassandraInputFormatIT and was not modified in this PR. I think changing it would be out of scope here, but maybe create an issue for it?


please create an issue.
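A stronger version of the test discussed above could assert the property values themselves rather than only their count. A minimal, self-contained sketch of the idea (PropertyAssertionDemo, vertexValueMap, and the "value-i" naming scheme are hypothetical stand-ins for the traversal result of t.V().valueMap().next()):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PropertyAssertionDemo {
    // Hypothetical stand-in for t.V().valueMap().next(): one vertex's
    // property map, keyed by property name.
    static Map<String, List<Object>> vertexValueMap(int numProps) {
        Map<String, List<Object>> props = new HashMap<>();
        List<Object> values = new ArrayList<>();
        for (int i = 0; i < numProps; i++) {
            values.add("value-" + i);
        }
        props.put("p", values);
        return props;
    }

    static void checkProperties(Map<String, List<Object>> propertiesOnVertex,
                                int numProps) {
        List<Object> valuesOnP = propertiesOnVertex.values().iterator().next();
        // Existing check: the count only.
        if (valuesOnP.size() != numProps) {
            throw new AssertionError(
                "expected " + numProps + " values, got " + valuesOnP.size());
        }
        // Stronger check: assert each expected value is actually present.
        for (int i = 0; i < numProps; i++) {
            if (!valuesOnP.contains("value-" + i)) {
                throw new AssertionError("missing value-" + i);
            }
        }
    }

    public static void main(String[] args) {
        checkProperties(vertexValueMap(3), 3);
        System.out.println("all property checks passed");
    }
}
```

Checking contents as well as size would catch a reader that returns the right number of wrong values, which a count-only assertion cannot.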


public abstract class AbstractInputFormatIT extends JanusGraphBaseTest {



Nit: extra line here?

@sjudeng
Contributor Author

sjudeng commented Feb 4, 2017

Should be good now on the requested formatting updates.

@mbrukman
Member

mbrukman commented Feb 5, 2017

@sjudeng – please rebase on master and re-push to see how the tests perform on Travis.

@sjudeng
Contributor Author

sjudeng commented Feb 5, 2017

Done, but the tests in janusgraph-hadoop that this PR affects are not currently passing under Travis on master.

…n HBase

Signed-off-by: sjudeng <sjudeng@users.noreply.github.com>
@mbrukman mbrukman merged commit 00592f8 into JanusGraph:master Feb 6, 2017
@sjudeng sjudeng deleted the hbase-hadoop-fix branch February 7, 2017 02:06
micpod pushed a commit to micpod/janusgraph that referenced this pull request Nov 5, 2019
Add tests and resolve issue running SparkGraphComputer on HBase
Labels: cla: yes, storage/hbase