
How does Azkaban run a Spark job? #267

Open
zhoujian319 opened this issue May 25, 2017 · 10 comments

@zhoujian319

zhoujian319 commented May 25, 2017

In the current Azkaban 3.0.0, the Spark jobtype source includes HadoopSparkJob, but there is no explanation of how to use it, in particular which parameters to set in the .job configuration file. How do I configure a type=spark job?

@chenruiSundun

I have the same problem.

@raghud

raghud commented Jul 24, 2017

I am facing the same issue.

@ColaCoffe

I can use the jobtype plugin to run a Spark job in yarn-cluster mode.
This is my job file:
#spark_demo_wordCount.job
type=spark
user.to.proxy=hdfs
class=com.spark.example.WordCount
master=yarn-cluster
num-executors=1
executor-memory=512M
execution-jar=lib/original-sparkExample-1.0-SNAPSHOT.jar
force.output.overwrite=true
dependencies=start

And these are my system properties:
private.properties
jobtype.class=azkaban.jobtype.HadoopSparkJob
jobtype.classpath=/usr/hdp/2.3.4.0-3485/hadoop:/usr/hdp/2.3.4.0-3485/hadoop/conf:/usr/hdp/2.3.4.0-3485/spark/conf:/usr/hdp/2.3.4.0-3485/spark/lib/*
azkaban.should.proxy=true
execute.as.user=true
proxy.user=hdfs
azkaban.group.name=hadoop

plugin.properties
queue=default
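For reference, the job properties above roughly correspond to a spark-submit invocation. A minimal sketch of that mapping (plain Python; the flag names and the exact translation done by HadoopSparkJob are assumptions for illustration):

```python
# Sketch: map the Azkaban spark-job properties above to spark-submit flags.
# The mapping shown here is an assumption; HadoopSparkJob's actual
# translation may differ in detail.

def build_spark_submit(props):
    args = ["spark-submit"]
    # pass-through flags that share their name with the job property
    for key in ("master", "deploy-mode", "num-executors", "executor-memory"):
        if key in props:
            args += ["--" + key, props[key]]
    if "class" in props:
        args += ["--class", props["class"]]
    # the application jar goes last
    args.append(props["execution-jar"])
    return args

props = {
    "class": "com.spark.example.WordCount",
    "master": "yarn-cluster",
    "num-executors": "1",
    "executor-memory": "512M",
    "execution-jar": "lib/original-sparkExample-1.0-SNAPSHOT.jar",
}
print(" ".join(build_spark_submit(props)))
```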

@Jamirkhan50

Hi,

@ColaCoffe

Can you share your sample Spark program?
I tried extending HadoopSparkJob in my sample file, but I don't understand how to initialize the Spark master and the other settings.

Thanks,
Jamirkhan Pathan

@Jamirkhan50

@ColaCoffe

I am trying to run my sample program and it gives me:
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.yarn.conf.YarnConfiguration
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 15 more

Can you help me resolve this?
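A ClassNotFoundException for YarnConfiguration usually means the YARN jars are not on the jobtype classpath. One way to find which jar (if any) provides a class is to scan the jars directly; a minimal sketch in Python (the example directory path is a placeholder for your own install):

```python
import os
import zipfile

def find_class_in_jars(jar_dir, class_name):
    """Return the jars under jar_dir that contain the given class."""
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for root, _dirs, files in os.walk(jar_dir):
        for name in files:
            if not name.endswith(".jar"):
                continue
            path = os.path.join(root, name)
            with zipfile.ZipFile(path) as jar:
                if entry in jar.namelist():
                    hits.append(path)
    return hits

# Example (placeholder path):
# find_class_in_jars("/usr/hdp/2.3.4.0-3485/hadoop-yarn",
#                    "org.apache.hadoop.yarn.conf.YarnConfiguration")
```

Whichever directory the class turns up in needs to be on jobtype.classpath.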

@ColaCoffe

@Jamirkhan50
It looks like you didn't set the YARN classpath.
Check the private.properties under the Spark jobtype directory; you should add the YARN directories to the classpath.
This is my private.properties:
jobtype.class=azkaban.jobtype.HadoopSparkJob
jobtype.classpath=/usr/hdp/2.3.4.0-3485/hadoop/,/usr/hdp/2.3.4.0-3485/hadoop/lib/,/usr/hdp/2.3.4.0-3485/hadoop/conf,/usr/hdp/2.3.4.0-3485/hadoop-yarn/,/usr/hdp/2.3.4.0-3485/hadoop-yarn/lib/,/usr/hdp/2.3.4.0-3485/spark/conf,/usr/hdp/2.3.4.0-3485/spark/lib/*
azkaban.should.proxy=true
execute.as.user=true
azkaban.group.name=hadoop
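A missing or mistyped entry in that comma-separated jobtype.classpath fails silently at load time and only surfaces later as a ClassNotFoundException. A small sketch to sanity-check each entry, treating a trailing `*` like Java's `dir/*` jar expansion:

```python
import glob
import os

def check_classpath(classpath):
    """Return the entries of a comma-separated classpath that do not resolve."""
    missing = []
    for entry in classpath.split(","):
        entry = entry.strip()
        if not entry:
            continue
        if entry.endswith("*"):
            # dir/* expands to the jars in that directory, like java -cp
            if not glob.glob(entry + ".jar") and not glob.glob(entry):
                missing.append(entry)
        elif not os.path.exists(entry):
            missing.append(entry)
    return missing
```

Running it against the jobtype.classpath value above (on the executor host) would flag any path that doesn't exist.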

@Jamirkhan50

@ColaCoffe

Thanks for the reply.
In my case, though, the fix was to copy the Hadoop dependency files into azkabanbasepath/lib.

Thanks,
Jamirkhan

@Jamirkhan50

@ColaCoffe

Now I am stuck on this issue:

  • Exception in thread "main" java.lang.IllegalStateException: Library directory '/home/ambari/softwares/azkaban-solo-server-3.32.2/executions/120/Spark/assembly/target/scala-2.11/jars' does not exist; make sure Spark is built.

Can you help me get past this?

@ColaCoffe

@Jamirkhan50
Show me your Spark job file; I guess you didn't set the jar path correctly.

@Jamirkhan50

@ColaCoffe
It's working now. We had to add deploy-mode=cluster to run in yarn-cluster mode.
Thanks for the help.
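For anyone landing here later, a sketch of the job file with that fix applied (class and jar names taken from the example earlier in the thread; note that since Spark 2.0 the yarn-cluster master value is deprecated in favor of master=yarn plus deploy-mode=cluster):

```
#spark_demo_wordCount.job
type=spark
master=yarn
deploy-mode=cluster
class=com.spark.example.WordCount
num-executors=1
executor-memory=512M
execution-jar=lib/original-sparkExample-1.0-SNAPSHOT.jar
```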
