Author Archives: shalishvj : My Experience with BigData

About shalishvj : My Experience with BigData

6+ years of experience with Big Data technologies in Architect, Developer and Administrator roles for various clients.
• Experience with the Hortonworks, Cloudera and AWS distributions.
• Cloudera Certified Developer for Hadoop.
• Cloudera Certified Administrator for Hadoop.
• Spark Certification from Big Data Spark Foundations.
• SCJP, OCWCD.
• Experience setting up Hadoop clusters in PROD, DR, UAT and DEV environments.

Spark: Transformations with Examples

FILTER
List<String> strList = Arrays.asList("qqqqqqqqwwwww", "eeeeerrrrrrrrr", "ttttttttyyyyyyyyy");
static SparkConf conf = new SparkConf();
static JavaSparkContext jsc = new JavaSparkContext(conf);
JavaRDD<String> filterStrRDD = strRDD.filter(new Function<String, Boolean>() {
    @Override
    public …
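The excerpt is cut off before the predicate, so the condition used here (keep strings that contain "y") is a hypothetical one. This is a Spark-free sketch: it applies a predicate to each element and keeps the input order, which is what filter does within each partition of an RDD. The equivalent Spark call is shown in a comment; the class and constant names are my own.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class FilterSketch {

    // Hypothetical predicate: keep only strings that contain the letter "y".
    static final Predicate<String> CONTAINS_Y = s -> s.contains("y");

    // Apply the predicate element by element, preserving input order.
    // In Spark (Java 8+) the same logic would be:
    //   JavaRDD<String> filterStrRDD = strRDD.filter(s -> s.contains("y"));
    static List<String> filter(List<String> input, Predicate<String> keep) {
        return input.stream().filter(keep).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> strList = Arrays.asList("qqqqqqqqwwwww", "eeeeerrrrrrrrr", "ttttttttyyyyyyyyy");
        System.out.println(filter(strList, CONTAINS_Y)); // [ttttttttyyyyyyyyy]
    }
}
```

Because Spark's org.apache.spark.api.java.function.Function is a single-method interface, a lambda can replace the anonymous inner class from the excerpt.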

Posted in Uncategorized

Spark: Actions with Examples

REDUCE
Reduces the elements of this RDD using the specified commutative and associative binary operator. In the Java API, reduce takes a Function2<T, T, T>: a function that accepts two elements of the RDD's element type T and returns a value of the same type.
Example: …
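Why must the operator be commutative and associative? Each partition is reduced on its own, and the partial results are then combined; Spark may merge those partial results in whatever order the tasks finish. The plain-Java sketch below (no Spark dependency; class and method names are hypothetical) simulates that two-level process. Addition gives the same answer however the data is partitioned, while subtraction, which is neither associative nor commutative, does not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

public class PartitionedReduceSketch {

    // Split a list into numPartitions contiguous chunks, loosely like RDD partitions.
    static <T> List<List<T>> partition(List<T> data, int numPartitions) {
        List<List<T>> parts = new ArrayList<>();
        int size = data.size();
        for (int p = 0; p < numPartitions; p++) {
            int from = p * size / numPartitions;
            int to = (p + 1) * size / numPartitions;
            parts.add(new ArrayList<>(data.subList(from, to)));
        }
        return parts;
    }

    // Reduce each non-empty partition locally, then combine the partial results.
    static <T> T reduce(List<T> data, int numPartitions, BinaryOperator<T> op) {
        List<T> partials = new ArrayList<>();
        for (List<T> part : partition(data, numPartitions)) {
            if (part.isEmpty()) {
                continue;
            }
            T acc = part.get(0);
            for (int i = 1; i < part.size(); i++) {
                acc = op.apply(acc, part.get(i));
            }
            partials.add(acc);
        }
        if (partials.isEmpty()) {
            throw new UnsupportedOperationException("empty collection");
        }
        T result = partials.get(0);
        for (int i = 1; i < partials.size(); i++) {
            result = op.apply(result, partials.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> nums = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        System.out.println(reduce(nums, 1, Integer::sum));      // 55
        System.out.println(reduce(nums, 4, Integer::sum));      // 55
        System.out.println(reduce(nums, 1, (a, b) -> a - b));   // -53
        System.out.println(reduce(nums, 2, (a, b) -> a - b));   // 15
    }
}
```

With Spark's Java API, the safe version of this would be something like rdd.reduce((a, b) -> a + b) on a JavaRDD<Integer>.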

Posted in spark, Uncategorized

Spark: Resolving Errors

java.lang.OutOfMemoryError: Java heap space
Fix: Increase the driver memory with the --driver-memory option of spark-submit.
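To confirm that a new --driver-memory value actually reached the driver, you can log the JVM's maximum heap at the start of your driver's main method. In client mode, spark.driver.memory has to be supplied through --driver-memory (or the default properties file) rather than set in SparkConf inside the application, because the driver JVM has already started by then. The class below is a hypothetical helper; Runtime.maxMemory() usually reports a little less than the configured -Xmx.

```java
public class DriverHeapCheck {

    // Maximum heap the current JVM will try to use, in MiB.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        // Call this (or log the value) early in the Spark driver.
        System.out.println("Driver max heap: " + maxHeapMiB() + " MiB");
    }
}
```

For example, after submitting with spark-submit --driver-memory 4g --class <your main class> <your jar>, the logged value should be close to 4096 MiB.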

Posted in spark, Uncategorized

Use Case: Automate data flow into HDFS / Hive using Oozie

I am planning to publish a few Big Data use cases that should be helpful in any industry.
Intro: Automate the data load process into HDFS, Hive and Hive ORC tables! Suppose you are receiving daily feeds from any …

Posted in Uncategorized, use case

Oozie: CRON style scheduling in Oozie

Cron scheduling adds a lot of flexibility when scheduling jobs with the Oozie coordinator. It is a bit tricky, but once you are familiar with it, it pays off. Here, just focus on the frequency attribute in your coordinator.xml:
<coordinator-app name="oozie-coordinator" frequency="0/10 …
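As I understand Oozie's cron-style syntax, the frequency holds five fields (minute, hour, day-of-month, month, day-of-week), and a value such as 0/10 in the minute field means "start at minute 0, then every 10 minutes". The hypothetical helper below expands a single minute field to show which minutes of the hour match. It only handles "*", a plain number, and the start/step form; lists and ranges are out of scope.

```java
import java.util.ArrayList;
import java.util.List;

public class CronMinuteField {

    // Expand a cron minute field ("*", "N", "start/step" or "*/step") into matching minutes.
    static List<Integer> expand(String field) {
        List<Integer> minutes = new ArrayList<>();
        if (field.equals("*")) {
            for (int m = 0; m < 60; m++) {
                minutes.add(m);
            }
        } else if (field.contains("/")) {
            String[] parts = field.split("/");
            int start = parts[0].equals("*") ? 0 : Integer.parseInt(parts[0]);
            int step = Integer.parseInt(parts[1]);
            for (int m = start; m < 60; m += step) {
                minutes.add(m);
            }
        } else {
            minutes.add(Integer.parseInt(field));
        }
        return minutes;
    }

    public static void main(String[] args) {
        System.out.println(expand("0/10")); // [0, 10, 20, 30, 40, 50]
    }
}
```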

Posted in oozie, Uncategorized

Oozie: How to pass the current date to a workflow

Coordinator.xml
<coordinator-app name="oozie-coordinator" frequency="1440" start="2017-10-06T15:00Z" end="2099-09-19T13:00Z" timezone="Canada/Eastern" xmlns="uri:oozie:coordinator:0.1">
  <action>
    <workflow>
      <app-path>${workflowxml}</app-path>
      <configuration>
        <property>
          <name>currentDate</name>
          <value>${coord:formatTime(coord:dateOffset(coord:nominalTime(), 0, 'DAY'), "yyyyMMdd")}</value>
        </property>
      </configuration>
    </workflow>
  </action>
</coordinator-app>
Change the 0 in dateOffset to -1 if you prefer the previous date.
Access currentDate in your workflow.xml and pass …
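frequency="1440" is in minutes, so this coordinator runs once a day. To preview what currentDate will contain, the sketch below mimics coord:dateOffset(nominalTime, n, 'DAY') followed by formatting with "yyyyMMdd", using java.time. It is an approximation under my own assumptions: the nominal time is an ISO-8601 UTC timestamp like the start time above, and formatting happens in UTC. The class and method names are hypothetical.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class NominalDateSketch {

    private static final DateTimeFormatter YYYYMMDD = DateTimeFormatter.ofPattern("yyyyMMdd");

    // Shift the nominal time by offsetDays (0 = same day, -1 = previous day),
    // then format it as yyyyMMdd in UTC (assumption, see lead-in).
    static String formatOffset(String nominalTimeIso, int offsetDays) {
        OffsetDateTime nominal = OffsetDateTime.parse(nominalTimeIso);
        return nominal.plusDays(offsetDays)
                .withOffsetSameInstant(ZoneOffset.UTC)
                .format(YYYYMMDD);
    }

    public static void main(String[] args) {
        System.out.println(formatOffset("2017-10-06T15:00Z", 0));  // 20171006
        System.out.println(formatOffset("2017-10-06T15:00Z", -1)); // 20171005
    }
}
```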

Posted in oozie, Uncategorized

Tips: Hive

Mask a Column
Create a table and insert values into it:
CREATE TABLE IF NOT EXISTS employee_test1 (eid String, name String)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
INSERT INTO TABLE employee_test1 VALUES …
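The excerpt stops before the masking step, so the approach below is not necessarily the one used in the full post. It is a Java sketch of one common masking rule: hide everything except the last n characters, turning upper-case letters into X, lower-case letters into x and digits into n, and leaving other characters as they are. As far as I recall, Hive 2.1+ ships a built-in mask_show_last_n with similar defaults; check the behaviour on your Hive version. The class and method names are hypothetical.

```java
public class MaskSketch {

    // Mask all but the last n characters: upper-case -> 'X', lower-case -> 'x', digit -> 'n'.
    // Characters that are none of these (spaces, punctuation) are kept unchanged.
    static String maskShowLastN(String value, int n) {
        if (value == null) {
            return null;
        }
        int keepFrom = Math.max(0, value.length() - n);
        StringBuilder out = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (i >= keepFrom) {
                out.append(c);
            } else if (Character.isUpperCase(c)) {
                out.append('X');
            } else if (Character.isLowerCase(c)) {
                out.append('x');
            } else if (Character.isDigit(c)) {
                out.append('n');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(maskShowLastN("E1234", 2)); // Xnn34
    }
}
```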

Posted in Tips, Uncategorized