In my upcoming posts, let's discuss Oozie and how to implement various actions using it.
What are Oozie and Pig?
Apache Oozie is a system for running workflows of dependent jobs. It is composed of
two main parts: a workflow engine that stores and runs workflows composed of different
types of Hadoop jobs (MapReduce, Pig, Hive, and so on), and a coordinator engine that
runs workflow jobs based on predefined schedules and data availability.
Please visit my SlideShare (Shalishvj/ApacheOozie) for more details on Oozie.
Pig is a high-level platform for creating MapReduce programs used with Hadoop. The language for this platform is called Pig Latin. Pig Latin abstracts the programming from the Java MapReduce idiom into a higher-level notation, much as SQL does for RDBMS systems.
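To give a feel for that SQL-like notation, here is a minimal sketch of a Pig Latin word count, written to a file from the shell. The directory name pig-app, the file name script.pig, and the INPUT and OUTPUT parameters are my own placeholders, not anything Oozie or Pig requires.

```shell
# Create a hypothetical application directory and write a sample Pig script into it.
mkdir -p pig-app

# The quoted 'EOF' stops the shell from expanding $INPUT and $OUTPUT,
# so they stay in the script as Pig parameters.
cat > pig-app/script.pig <<'EOF'
-- Count how often each word occurs in the input.
lines   = LOAD '$INPUT' AS (line:chararray);
words   = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
grouped = GROUP words BY word;
counts  = FOREACH grouped GENERATE group AS word, COUNT(words) AS total;
STORE counts INTO '$OUTPUT';
EOF
```

Five short statements replace what would be a full mapper, reducer, and driver class in Java MapReduce.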
The pig action starts a Pig job.
The workflow job waits until the Pig job completes before continuing to the next action.
Oozie workflow definition (workflow.xml)
This file goes into HDFS as part of the application package.
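A minimal sketch of such a workflow.xml with a single pig action could look like the one below. The workflow name, the node names, the schema version 0.4, the script name script.pig, and the inputDir and outputDir properties are assumptions for illustration; adjust them to your own setup.

```shell
# Write a sample workflow.xml into a hypothetical application directory.
mkdir -p pig-app

# The quoted 'EOF' keeps ${...} intact so that Oozie, not the shell, resolves it.
cat > pig-app/workflow.xml <<'EOF'
<workflow-app xmlns="uri:oozie:workflow:0.4" name="pig-wf">
    <start to="pig-node"/>
    <action name="pig-node">
        <pig>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <!-- Pig script, relative to the application directory in HDFS -->
            <script>script.pig</script>
            <!-- Parameters handed to the script as $INPUT and $OUTPUT -->
            <param>INPUT=${inputDir}</param>
            <param>OUTPUT=${outputDir}</param>
        </pig>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Pig failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
EOF
```

The ok and error transitions are what make the workflow wait: control moves to the next node only after the Pig job has finished one way or the other.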
job.properties file
This file resides in the local directory from which the oozie command is run.
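As a rough sketch, a job.properties for a Pig workflow might contain the entries below. The host names, ports, and HDFS paths are placeholders that I made up for the example; only oozie.wf.application.path is a property name that Oozie itself defines.

```shell
# Write a sample job.properties into the current (local) directory.
# The quoted 'EOF' keeps ${...} intact so that Oozie resolves it at submit time.
cat > job.properties <<'EOF'
# Cluster endpoints (placeholders; replace with your own NameNode and JobTracker)
nameNode=hdfs://localhost:8020
jobTracker=localhost:8021

# HDFS directory that contains workflow.xml
oozie.wf.application.path=${nameNode}/user/${user.name}/pig-app

# Values the workflow passes on to the Pig script
inputDir=${nameNode}/user/${user.name}/input
outputDir=${nameNode}/user/${user.name}/output
EOF
```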
- Oozie needs JAR files to execute the workflow.
- Oozie takes any JARs that you put in the lib folder of the workflow package and automatically adds them to your workflow’s classpath when it’s executed.
- You can use the oozie.libpath property in your job.properties file to specify additional HDFS directories (multiple directories can be separated by a comma) that contain JARs.
- The default location to install the ShareLib is /user/oozie/share/lib in HDFS.
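The library settings above could be added to job.properties roughly like this. The two directories under oozie.libpath are hypothetical, and the snippet assumes nameNode is defined elsewhere in the same file. oozie.use.system.libpath is the switch that tells Oozie to pick up the ShareLib.

```shell
# Append library settings to job.properties (the file is created if it does not exist).
cat >> job.properties <<'EOF'
# Use the ShareLib installed on the cluster (by default under /user/oozie/share/lib)
oozie.use.system.libpath=true
# Extra HDFS directories with JARs, comma-separated (both paths are placeholders)
oozie.libpath=${nameNode}/user/${user.name}/common-libs,${nameNode}/user/${user.name}/udf-libs
EOF
```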
Move the package containing workflow.xml, the lib folder, and the Pig script file into HDFS.
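One way to do the upload is with the hdfs dfs commands, wrapped here in a small helper function. The function name deploy_app and the paths in the example call are placeholders of mine.

```shell
# Upload a local application directory to HDFS.
#   $1 = local directory holding workflow.xml, lib/ and the Pig script
#   $2 = HDFS parent directory to copy it into
deploy_app() {
    hdfs dfs -mkdir -p "$2"      # make sure the parent directory exists
    hdfs dfs -put -f "$1" "$2"   # copy the package, overwriting an older copy
}

# Example call (placeholder paths):
# deploy_app pig-app "/user/$USER"
```

The resulting HDFS path has to match oozie.wf.application.path in job.properties, otherwise Oozie will not find workflow.xml.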
Run the oozie command from the local directory containing the job.properties file, which references the workflow.xml in HDFS.
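The submission itself is a single oozie job call. The sketch below wraps it in a helper; the function name submit_workflow and the server URL in the example are assumptions, so use the address of your own Oozie server.

```shell
# Submit and start a workflow job.
#   $1 = Oozie server URL
#   $2 = path to the local job.properties
submit_workflow() {
    oozie job -oozie "$1" -config "$2" -run
}

# Example call (placeholder URL):
# submit_workflow http://localhost:11000/oozie job.properties
```

On success the command prints the ID of the new workflow job, which you can then look up in the web console.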
The Oozie web console is available to track the status of the workflow job.
You can drill down to get more info about the workflow job.
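If you prefer the terminal, the same kind of information is also available from the oozie command line. This is only a small sketch; the function names and the server URL are placeholders, and the job ID is whatever the submit command printed.

```shell
# Show the status of a workflow job and its actions.
#   $1 = Oozie server URL, $2 = workflow job ID
job_info() {
    oozie job -oozie "$1" -info "$2"
}

# Show the log of a workflow job.
job_log() {
    oozie job -oozie "$1" -log "$2"
}

# Example calls (placeholder URL; replace <job-id> with your own):
# job_info http://localhost:11000/oozie <job-id>
# job_log  http://localhost:11000/oozie <job-id>
```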
Clicking the Search button next to “Console URL” takes you to the JobTracker UI, which shows the job details.
Thank you. Stay tuned 🙂