Tag Archives: spark
Spark: Transformations with Examples
FILTER List<String> strList = Arrays.asList("qqqqqqqqwwwww", "eeeeerrrrrrrrr", "ttttttttyyyyyyyyy"); static SparkConf conf = new SparkConf(); static JavaSparkContext jsc = new JavaSparkContext(conf); JavaRDD<String> filterStrRDD = strRDD.filter( new Function<String, Boolean>(){ @Override public … Continue reading
Posted in Uncategorized
Tagged filter, flatmap, map, RDD, spark, spark transformations
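A minimal runnable sketch of the filter transformation from the excerpt above, assuming the RDD is built with parallelize from the same strList; the class name and the filter condition (keeping strings that contain "w") are illustrative, not from the original post.

import java.util.Arrays;
import java.util.List;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class FilterExample {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("FilterExample");
        JavaSparkContext jsc = new JavaSparkContext(conf);

        List<String> strList = Arrays.asList("qqqqqqqqwwwww", "eeeeerrrrrrrrr", "ttttttttyyyyyyyyy");
        JavaRDD<String> strRDD = jsc.parallelize(strList);

        // filter keeps only the elements for which the function returns true
        JavaRDD<String> filterStrRDD = strRDD.filter(new Function<String, Boolean>() {
            @Override
            public Boolean call(String s) {
                return s.contains("w");   // illustrative condition
            }
        });

        System.out.println(filterStrRDD.collect());   // [qqqqqqqqwwwww]
        jsc.stop();
    }
}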
Spark: Actions with Examples
REDUCE Reduces the elements of this RDD using the specified commutative and associative binary operator. reduce(<function type>) takes a function that accepts two elements of the RDD's element type as arguments and returns an element of the same type. Example: … Continue reading
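A minimal sketch of reduce on an integer RDD built with parallelize; the numbers, app name, and class name are illustrative, not from the original post.

import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function2;

public class ReduceExample {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("ReduceExample");
        JavaSparkContext jsc = new JavaSparkContext(conf);

        JavaRDD<Integer> numsRDD = jsc.parallelize(Arrays.asList(1, 2, 3, 4, 5));

        // Addition is commutative and associative, so it is safe to use with reduce
        Integer sum = numsRDD.reduce(new Function2<Integer, Integer, Integer>() {
            @Override
            public Integer call(Integer a, Integer b) {
                return a + b;
            }
        });

        System.out.println("Sum = " + sum);   // Sum = 15
        jsc.stop();
    }
}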
Spark: Resolving Errors
java.lang.OutOfMemoryError: Java heap space Fix: Increase the driver memory using the --driver-memory flag of spark-submit
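For example, a hypothetical spark-submit invocation that raises the driver memory to 4 GB (the class name, jar, and memory size are placeholders):

$ bin/spark-submit --class com.example.MyApp --master yarn-cluster --driver-memory 4g myapp.jar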
Spark -> Parquet, ORC
Create Java RDDs String filePath = "hdfs://<HDFSName>:8020/user…" String outFile = "hdfs://<HDFSName>:8020/user…" SparkConf conf = new SparkConf().setAppName("appname"); JavaSparkContext jsc = new JavaSparkContext(conf); JavaRDD<String> inFileRDD = jsc.textFile(filePath); Remove initial empty lines from a file import org.apache.spark.api.java.function.Function2; Function2 removeSpace = new Function2<Integer, Iterator<String>, Iterator<String>>(){ … Continue reading
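A minimal sketch of the "remove initial empty lines" step above, using mapPartitionsWithIndex, the Java API method that accepts a Function2<Integer, Iterator<String>, Iterator<String>> as in the excerpt; the paths, app name, and class name are placeholders, and trimming only partition 0 is an assumption about where the leading empty lines sit.

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function2;

public class RemoveLeadingEmptyLines {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("appname");
        JavaSparkContext jsc = new JavaSparkContext(conf);

        String filePath = "hdfs://<HDFSName>:8020/user/input.txt";   // placeholder path
        String outFile = "hdfs://<HDFSName>:8020/user/output";       // placeholder path
        JavaRDD<String> inFileRDD = jsc.textFile(filePath);

        // Drop empty lines at the start of the first partition only
        Function2<Integer, Iterator<String>, Iterator<String>> removeSpace =
            new Function2<Integer, Iterator<String>, Iterator<String>>() {
                @Override
                public Iterator<String> call(Integer partitionIndex, Iterator<String> lines) {
                    List<String> kept = new ArrayList<String>();
                    boolean seenContent = partitionIndex != 0;   // only trim partition 0
                    while (lines.hasNext()) {
                        String line = lines.next();
                        if (!seenContent && line.trim().isEmpty()) {
                            continue;   // skip a leading empty line
                        }
                        seenContent = true;
                        kept.add(line);
                    }
                    return kept.iterator();
                }
            };

        JavaRDD<String> cleanedRDD = inFileRDD.mapPartitionsWithIndex(removeSpace, true);
        cleanedRDD.saveAsTextFile(outFile);
        jsc.stop();
    }
}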
Spark: Configuration, Execution, Performance
Configuration Pass configuration values from a property file: spark-submit supports loading configuration values from a file; it reads whitespace-delimited key/value pairs from this file; the exact location of the file can be customized using the --properties-file flag to spark-submit. $ bin/spark-submit \ --class … Continue reading
Posted in spark, Tips
Tagged apache spark, bigdata, execution, fold, Hadoop, hdfs, spark, spark configuration, spark errors, spark performance, spark serialization
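A minimal sketch of the properties-file approach from the excerpt above: a whitespace-delimited key/value file and a hypothetical spark-submit invocation that loads it (the file name, class, jar, and values are placeholders, not from the original post).

# my-app.conf: whitespace-delimited key/value pairs
spark.driver.memory      1g
spark.executor.memory    512m
spark.serializer         org.apache.spark.serializer.KryoSerializer

$ bin/spark-submit \
    --class com.example.MyApp \
    --properties-file my-app.conf \
    myapp.jar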
Developer’s template: Spark
The Developer's template series is intended to ease the life of Bigdata developers with their application development and leave behind the headache of starting from scratch. The following program helps you develop and execute an application using Apache Spark with Java. Prerequisites Hadoop … Continue reading
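A minimal sketch of such a Spark-with-Java starting point, assuming the input path is passed as the first program argument; the class name is a placeholder and the full post presumably contains a more complete template.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class SparkDeveloperTemplate {
    public static void main(String[] args) {
        // Master URL and other settings are expected to come from spark-submit
        SparkConf conf = new SparkConf().setAppName("SparkDeveloperTemplate");
        JavaSparkContext jsc = new JavaSparkContext(conf);

        JavaRDD<String> lines = jsc.textFile(args[0]);   // input path from first argument
        System.out.println("Line count: " + lines.count());

        jsc.stop();
    }
}

Packaged into a jar, it could be launched in the same way as the SparkPi example below, e.g. ./bin/spark-submit --class SparkDeveloperTemplate --master yarn-cluster myapp.jar <hdfs input path>.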
Tips: Spark
Execute a Spark Pi From the Spark directory (usually /usr/hdp/current/spark-client in the case of Hortonworks HDP 2.3.2) run ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 3 --driver-memory 512m --executor-memory 512m --executor-cores 1 lib/spark-examples*.jar 10 stay tuned..