
Azkaban Job Type: Pig

cjyu edited this page Mar 24, 2013 · 1 revision

Introduction

Azkaban2 is a ground-up redesign of the old Azkaban. One of its design goals is to make Azkaban robust and flexible. In the old Azkaban, the job executors that actually run user jobs got in the way: any change to any job executor required upgrading the whole package.

So in Azkaban2, the job executors are split out as plugins. This lets us add as many job executor plugins as we want (for Hive, for Pig, and so on, and for different versions of each). We can also add job executors that work with different versions of Hadoop without touching the Azkaban2 core.

Here is an existing job type re-introduced in Azkaban2:

Pig

For the most part, this is the same "pig" type as in the old Azkaban; the main difference is security. For a description of Hadoop delegation tokens, refer to the "HadoopJava" job type page.

In the old Azkaban, the keytab information was handed to the user process: the pig job wrapper performed the keytab-based login and then proxied as the user to call Pig's main. This is clearly dangerous on an enterprise cluster such as LinkedIn's.

In Azkaban2, pig jobs are no longer handed the keytab information. Instead, each pig job is granted Hadoop delegation tokens. Fortunately for users, a pig job package that worked in the old Azkaban runs in Azkaban2 with no extra action required. In addition, new settings let pig jobs take in more parameters.

How To Use

One needs to set the job type to pig:

type=pig

One must also tell Azkaban where the pig script is:

pig.script=WHERE_YOUR_PIG_SCRIPT_ON_AZKABAN_MACHINE

Pig runs on a Hadoop cluster, so one also needs to name the Hadoop user to proxy as:

user.to.proxy=YOUR_HADOOP_USER_NAME

The proxy user must be added as one of the project's permissions on the permissions page. Azkaban2 checks this so that it never requests delegation tokens on anyone's behalf without that permission.
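The permission rule above can be pictured as a simple gate before any token request. The following is a minimal sketch under stated assumptions: the class name, the method name, and representing the project's allowed proxy users as a plain Set<String> are all hypothetical, and this is not Azkaban2's actual implementation.

```java
import java.util.Properties;
import java.util.Set;

// Hypothetical sketch: class/method names and the Set-based permission model
// are illustrative assumptions, not Azkaban2's real code.
public class ProxyUserCheck {

    // Returns the user to request delegation tokens for, or throws if the
    // project has not been granted permission to proxy as that user.
    public static String resolveProxyUser(Properties jobProps, Set<String> projectProxyUsers) {
        String user = jobProps.getProperty("user.to.proxy");
        if (user == null || user.isBlank()) {
            throw new IllegalArgumentException("user.to.proxy is not set");
        }
        if (!projectProxyUsers.contains(user)) {
            // No token request is made for users the project may not proxy as.
            throw new SecurityException("Project is not permitted to proxy as " + user);
        }
        return user;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("type", "pig");
        props.setProperty("user.to.proxy", "alice");
        // Users added on the project's permissions page (made-up data).
        Set<String> allowed = Set.of("alice", "bob");
        System.out.println(resolveProxyUser(props, allowed)); // prints "alice"
    }
}
```

Failing closed (throwing rather than silently falling back to another user) keeps the rule simple: a job either runs as a permitted proxy user or does not run at all.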

pig.additional.jars=PRE_REGISTER_YOUR_UDF_JARS

udf.import.list=IMPORT_YOUR_UDF_NAME_SPACE

param.YOUR_PIG_PARAMS_NAME=YOUR_PIG_PARAMS

This is equivalent to "-param YOUR_PIG_PARAMS_NAME=YOUR_PIG_PARAMS" on the pig command line.

param_file=YOUR_PIG_PARAM_FILE
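To make the correspondence between these job settings and the pig command line concrete, here is a small sketch that loads a sample .job file with java.util.Properties and turns it into Pig arguments: each param.NAME=VALUE becomes "-param NAME=VALUE" and param_file becomes "-param_file PATH". It assumes Pig's standard -param and -param_file options; the class name, the sorted argument order, and all sample values are hypothetical, and it is not Azkaban2's actual implementation.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.TreeSet;

// Hypothetical sketch mapping pig job settings to Pig command-line arguments.
public class PigArgsSketch {

    static final String PARAM_PREFIX = "param.";

    public static List<String> toPigArgs(Properties job) {
        List<String> args = new ArrayList<>();
        // param.NAME=VALUE  ->  -param NAME=VALUE
        // TreeSet gives a stable, sorted order for the generated arguments.
        for (String key : new TreeSet<>(job.stringPropertyNames())) {
            if (key.startsWith(PARAM_PREFIX)) {
                String name = key.substring(PARAM_PREFIX.length());
                args.add("-param");
                args.add(name + "=" + job.getProperty(key));
            }
        }
        // param_file=PATH  ->  -param_file PATH
        String paramFile = job.getProperty("param_file");
        if (paramFile != null) {
            args.add("-param_file");
            args.add(paramFile);
        }
        // The script path goes last, as in "pig [options] script.pig".
        String script = job.getProperty("pig.script");
        if (script == null) {
            throw new IllegalArgumentException("pig.script is required");
        }
        args.add(script);
        return args;
    }

    public static void main(String[] argv) throws IOException {
        // A sample .job file; every value here is made up for illustration.
        String jobFile = String.join("\n",
            "type=pig",
            "pig.script=src/wordcount.pig",
            "user.to.proxy=alice",
            "param.inData=/data/input",
            "param.outData=/data/output",
            "param_file=params.txt");
        Properties job = new Properties();
        job.load(new StringReader(jobFile));
        System.out.println(toPigArgs(job));
    }
}
```

Running main prints [-param, inData=/data/input, -param, outData=/data/output, -param_file, params.txt, src/wordcount.pig]. Note that "param_file" does not match the "param." prefix, so it is handled only by the dedicated -param_file branch.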

Additionally, the pig job type is based on the JavaProcessJob class, so it supports that class's settings, such as jvm.args and classpath.
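As a rough illustration of what settings like jvm.args and classpath mean for a process-based job, the sketch below assembles a child "java" command from them. Everything beyond the two property names is an assumption: the whitespace splitting of jvm.args, passing classpath through unchanged (Azkaban may expect a different separator), the main-class argument, and the class and method names are hypothetical, not JavaProcessJob's real logic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical sketch of a JavaProcessJob-style command builder.
public class JavaCommandSketch {

    public static List<String> buildCommand(Properties job, String mainClass) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        String jvmArgs = job.getProperty("jvm.args", "").trim();
        if (!jvmArgs.isEmpty()) {
            // Naive whitespace split; quoted arguments containing spaces are not handled.
            for (String arg : jvmArgs.split("\\s+")) {
                cmd.add(arg);
            }
        }
        String classpath = job.getProperty("classpath");
        if (classpath != null && !classpath.isEmpty()) {
            // Passed through unchanged (an assumption about the expected format).
            cmd.add("-cp");
            cmd.add(classpath);
        }
        cmd.add(mainClass);
        return cmd;
    }

    public static void main(String[] argv) {
        Properties job = new Properties();
        job.setProperty("jvm.args", "-Xmx1g -Dfoo=bar");
        job.setProperty("classpath", "lib/*");
        // "com.example.PigWrapper" is a made-up main class.
        System.out.println(buildCommand(job, "com.example.PigWrapper"));
    }
}
```

The resulting list is in the shape java.lang.ProcessBuilder accepts, which is one natural way to launch such a command from Java.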

Example Job Package:

See plugins/jobtype/examples/pig-wc.

From inside that directory, run zip -r wordcountpig.zip ./* to create the zip package, then upload it to Azkaban.
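Where the zip tool is not available, the same package can be built from Java with java.util.zip. This is a rough equivalent of the zip -r step, not an Azkaban utility; the class name is hypothetical. One difference to be aware of: ./* in the shell skips hidden top-level files, while this sketch includes every regular file under the directory (except the output zip itself).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: a rough Java equivalent of "zip -r package.zip ./*".
public class ZipJobPackage {

    public static void zipDirectory(Path sourceDir, Path zipFile) throws IOException {
        Path target = zipFile.toAbsolutePath().normalize();
        List<Path> files;
        try (Stream<Path> walk = Files.walk(sourceDir)) {
            files = walk.filter(Files::isRegularFile)
                        // Skip a previously built zip inside the same directory.
                        .filter(p -> !p.toAbsolutePath().normalize().equals(target))
                        .sorted()
                        .collect(Collectors.toList());
        }
        try (ZipOutputStream zip = new ZipOutputStream(Files.newOutputStream(zipFile))) {
            for (Path file : files) {
                // Entry names are relative to the package root and always use '/'.
                String name = sourceDir.relativize(file).toString().replace('\\', '/');
                zip.putNextEntry(new ZipEntry(name));
                Files.copy(file, zip);
                zip.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ZipJobPackage <package-dir> <output.zip>
        zipDirectory(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

Keeping entry names relative to the package directory means the .job files end up at the root of the archive, matching what zip -r ./* produces when run from inside the package directory.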
