Tag Archives: spark
Spark: Transformations with Examples
FILTER List<String> strList = Arrays.asList("qqqqqqqqwwwww", "eeeeerrrrrrrrr", "ttttttttyyyyyyyyy"); static SparkConf conf = new SparkConf(); static JavaSparkContext jsc = new JavaSparkContext(conf); JavaRDD<String> filterStrRDD = strRDD.filter( new Function<String, Boolean>(){ @Override public … Continue reading
Posted in Uncategorized
Tagged filter, flatmap, map, RDD, spark, spark transformations
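A minimal runnable sketch of the filter transformation from the excerpt above, assuming the RDD is built with parallelize from the same strList; the class name and the filter condition (keeping strings that contain "w") are illustrative, not from the original post.

import java.util.Arrays;
import java.util.List;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class FilterExample {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("FilterExample");
        JavaSparkContext jsc = new JavaSparkContext(conf);

        List<String> strList = Arrays.asList("qqqqqqqqwwwww", "eeeeerrrrrrrrr", "ttttttttyyyyyyyyy");
        JavaRDD<String> strRDD = jsc.parallelize(strList);

        // filter keeps only the elements for which the function returns true
        JavaRDD<String> filterStrRDD = strRDD.filter(new Function<String, Boolean>() {
            @Override
            public Boolean call(String s) {
                return s.contains("w");   // illustrative condition
            }
        });

        System.out.println(filterStrRDD.collect());   // [qqqqqqqqwwwww]
        jsc.stop();
    }
}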
Spark: Actions with Examples
REDUCE Reduces the elements of this RDD using the specified commutative and associative binary operator. reduce(<function type>) takes a function that accepts two elements of the RDD's element type as arguments and returns an element of the same type. Example: … Continue reading
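A minimal sketch of reduce on an integer RDD built with parallelize; the numbers, app name, and class name are illustrative, not from the original post.

import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function2;

public class ReduceExample {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("ReduceExample");
        JavaSparkContext jsc = new JavaSparkContext(conf);

        JavaRDD<Integer> numsRDD = jsc.parallelize(Arrays.asList(1, 2, 3, 4, 5));

        // Addition is commutative and associative, so it is safe to use with reduce
        Integer sum = numsRDD.reduce(new Function2<Integer, Integer, Integer>() {
            @Override
            public Integer call(Integer a, Integer b) {
                return a + b;
            }
        });

        System.out.println("Sum = " + sum);   // Sum = 15
        jsc.stop();
    }
}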
Spark: Resolving Errors
java.lang.OutOfMemoryError: Java heap space Fix: Increase the driver memory using the --driver-memory flag of spark-submit
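For example, a hypothetical spark-submit invocation that raises the driver memory to 4 GB (the class name, jar, and memory size are placeholders):

$ bin/spark-submit --class com.example.MyApp --master yarn-cluster --driver-memory 4g myapp.jar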
Spark -> Parquet, ORC
Create Java RDDs String filePath = "hdfs://<HDFSName>:8020/user…" String outFile = "hdfs://<HDFSName>:8020/user…" SparkConf conf = new SparkConf().setAppName("appname"); JavaSparkContext jsc = new JavaSparkContext(conf); JavaRDD<String> inFileRDD = jsc.textFile(filePath); Remove initial empty lines from a file import org.apache.spark.api.java.function.Function2; Function2 removeSpace = new Function2<Integer, Iterator<String>, Iterator<String>>(){ … Continue reading
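A minimal sketch of the "remove initial empty lines" step above, using mapPartitionsWithIndex, the Java API method that accepts a Function2<Integer, Iterator<String>, Iterator<String>> as in the excerpt; the paths, app name, and class name are placeholders, and trimming only partition 0 is an assumption about where the leading empty lines sit.

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function2;

public class RemoveLeadingEmptyLines {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("appname");
        JavaSparkContext jsc = new JavaSparkContext(conf);

        String filePath = "hdfs://<HDFSName>:8020/user/input.txt";   // placeholder path
        String outFile = "hdfs://<HDFSName>:8020/user/output";       // placeholder path
        JavaRDD<String> inFileRDD = jsc.textFile(filePath);

        // Drop empty lines at the start of the first partition only
        Function2<Integer, Iterator<String>, Iterator<String>> removeSpace =
            new Function2<Integer, Iterator<String>, Iterator<String>>() {
                @Override
                public Iterator<String> call(Integer partitionIndex, Iterator<String> lines) {
                    List<String> kept = new ArrayList<String>();
                    boolean seenContent = partitionIndex != 0;   // only trim partition 0
                    while (lines.hasNext()) {
                        String line = lines.next();
                        if (!seenContent && line.trim().isEmpty()) {
                            continue;   // skip a leading empty line
                        }
                        seenContent = true;
                        kept.add(line);
                    }
                    return kept.iterator();
                }
            };

        JavaRDD<String> cleanedRDD = inFileRDD.mapPartitionsWithIndex(removeSpace, true);
        cleanedRDD.saveAsTextFile(outFile);
        jsc.stop();
    }
}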
Spark: Configuration, Execution, Performance
Configuration Pass configuration values from a property file: spark-submit supports loading configuration values from a file; it reads whitespace-delimited key/value pairs from this file; the exact location of the file can be customized using the --properties-file flag to spark-submit. $ bin/spark-submit \ --class … Continue reading
Posted in spark, Tips
Tagged apache spark, bigdata, execution, fold, Hadoop, hdfs, spark, spark configuration, spark errors, spark performance, spark serialization
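A minimal sketch of the properties-file approach from the excerpt above: a whitespace-delimited key/value file and a hypothetical spark-submit invocation that loads it (the file name, class, jar, and values are placeholders, not from the original post).

# my-app.conf: whitespace-delimited key/value pairs
spark.driver.memory      1g
spark.executor.memory    512m
spark.serializer         org.apache.spark.serializer.KryoSerializer

$ bin/spark-submit \
    --class com.example.MyApp \
    --properties-file my-app.conf \
    myapp.jar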
Developer’s template: Spark
The Developer's template series is intended to ease the life of Bigdata developers with their application development and leave behind the headache of starting from scratch. The following program helps you develop and execute an application using Apache Spark with Java. Prerequisites Hadoop … Continue reading
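A minimal sketch of such a Spark-with-Java starting point, assuming the input path is passed as the first program argument; the class name is a placeholder and the full post presumably contains a more complete template.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class SparkDeveloperTemplate {
    public static void main(String[] args) {
        // Master URL and other settings are expected to come from spark-submit
        SparkConf conf = new SparkConf().setAppName("SparkDeveloperTemplate");
        JavaSparkContext jsc = new JavaSparkContext(conf);

        JavaRDD<String> lines = jsc.textFile(args[0]);   // input path from first argument
        System.out.println("Line count: " + lines.count());

        jsc.stop();
    }
}

Packaged into a jar, it could be launched in the same way as the SparkPi example below, e.g. ./bin/spark-submit --class SparkDeveloperTemplate --master yarn-cluster myapp.jar <hdfs input path>.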
Tips: Spark
Execute a Spark Pi From the Spark directory (usually /usr/hdp/current/spark-client in the case of Hortonworks HDP 2.3.2) run ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 3 --driver-memory 512m --executor-memory 512m --executor-cores 1 lib/spark-examples*.jar 10 stay tuned..