Oozie: Pig Action – Running Pig Jobs Using Oozie

Hi All,

In my upcoming posts, let's discuss Oozie and how to implement various actions using it…

What are Oozie and Pig?

Apache Oozie is a system for running workflows of dependent jobs. It is composed of
two main parts: a workflow engine that stores and runs workflows composed of different
types of Hadoop jobs (MapReduce, Pig, Hive, and so on), and a coordinator engine that
runs workflow jobs based on predefined schedules and data availability.

Please visit my SlideShare (Shalishvj/ApacheOozie) for more details on Oozie.

Pig is a high-level platform for creating MapReduce programs used with Hadoop. The language for this platform is called Pig Latin. Pig Latin abstracts the programming from the Java MapReduce idiom into a notation that makes MapReduce programming high level, similar to what SQL does for relational database systems.
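As a tiny illustration of that abstraction (the file paths and field names below are hypothetical), a Pig Latin script can express a filter-and-store pipeline that would otherwise need a full Java MapReduce program:

```pig
-- Load a tab-separated file from HDFS (hypothetical path and schema)
logs = LOAD '/user/myuser/input/logs.txt'
       AS (level:chararray, msg:chararray);

-- Keep only the error records
errors = FILTER logs BY level == 'ERROR';

-- Write the result back to HDFS
STORE errors INTO '/user/myuser/output/errors';
```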

Pig Action

The pig action starts a Pig job. The workflow job will wait until the Pig job completes before continuing to the next action.

Folder structure

[Screenshot: folder structure of the workflow package]
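A minimal sketch of the package layout (the names pig-app and script.pig are assumptions for illustration):

```
pig-app/
├── workflow.xml
├── script.pig
└── lib/
    └── extra JARs, if any
```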

Oozie workflow definition (workflow.xml)

This file goes into HDFS as part of the application package.

[Screenshot: workflow.xml]
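A minimal workflow.xml sketch for a Pig action, assuming the script is named script.pig and that ${jobTracker} and ${nameNode} are supplied from job.properties (the schema version may differ in your Oozie release):

```xml
<workflow-app xmlns="uri:oozie:workflow:0.4" name="pig-wf">
    <start to="pig-node"/>

    <action name="pig-node">
        <pig>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <script>script.pig</script>
        </pig>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>Pig action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```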

job.properties file. This file resides in the local directory from which the oozie command is triggered.

[Screenshot: job.properties]
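A minimal job.properties sketch; the host names, ports, and application path are assumptions to adapt to your cluster:

```properties
nameNode=hdfs://localhost:8020
jobTracker=localhost:8032
queueName=default
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/${user.name}/pig-app
```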

ShareLib

  • Oozie needs JAR files to execute the workflow.
  • Oozie will take any JARs that you put in the package's lib folder and automatically add them to your workflow's classpath when it is executed.
  • You can use the oozie.libpath property in your job.properties file to specify additional HDFS directories (multiple directories can be separated by commas) that contain JARs; see the snippet after this list.
  • The default location of the ShareLib is /user/oozie/share/lib.
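For example (the extra directories below are hypothetical):

```properties
# Use the system ShareLib
oozie.use.system.libpath=true
# Additional HDFS directories containing JARs, comma-separated (hypothetical paths)
oozie.libpath=${nameNode}/user/myuser/extra-jars,${nameNode}/user/myuser/udf-jars
```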

[Screenshot: ShareLib directory in HDFS]
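You can inspect the ShareLib contents directly in HDFS, for example:

```bash
# List the per-action libraries (pig, hive, sqoop, ...) under the default ShareLib location
hdfs dfs -ls /user/oozie/share/lib
```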

Move the package containing workflow.xml, the lib folder, and the Pig script file into HDFS.

[Screenshot: copying the package into HDFS]
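For example, assuming the package directory is named pig-app as sketched above:

```bash
# Copy the whole application package (workflow.xml, script.pig, lib/) into HDFS
hdfs dfs -put pig-app /user/myuser/pig-app
```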

Execute the Oozie command from the local directory containing the job.properties file, which references the workflow.xml in HDFS.

[Screenshot: executing the oozie command]
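A typical invocation looks like the following (the Oozie server URL is an assumption; 11000 is the default port):

```bash
# Submit and start the workflow; prints a job ID such as 0000001-...-oozie-oozi-W
oozie job -oozie http://localhost:11000/oozie -config job.properties -run

# Check the job status from the CLI using the printed ID
oozie job -oozie http://localhost:11000/oozie -info <job-id>
```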

The Oozie web console is available to track the status of the workflow job.

[Screenshot: Oozie web console]

You can drill down to get more information about the workflow job.

[Screenshot: workflow job details]

Further down…

[Screenshot: drilling down into the workflow job]

Clicking the search button next to "Console URL" takes you to the JobTracker UI, which has the job details.

[Screenshot: JobTracker UI with job details]

Thank you… Stay tuned 🙂
