Category Archives: Hadoop

Hadoop / SPARK on Windows

Hadoop on Windows

1. Download the binaries required to run Hadoop (e.g., winutils.exe). Download link: https://github.com/srccodes/hadoop-common-2.2.0-bin/archive/master.zip
2. Add them to $HADOOP_HOME/bin.
3. Set $HADOOP_HOME and $JAVA_HOME under environment variables.

Reference: http://stackoverflow.com/questions/19620642/failed-to-locate-the-winutils-binary-in-the-hadoop-binary-path

Spark on Windows

While running Spark, you can refer to a local path in …
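The Hadoop on Windows steps above can be sanity-checked with a short script. This is only a minimal sketch: the function name check_hadoop_env is hypothetical, and it assumes the only things worth checking are that both variables are set and that winutils.exe sits under the bin folder of HADOOP_HOME.

```python
import os
from pathlib import Path


def check_hadoop_env(env=None):
    """Return a list of problems; an empty list means the basic setup looks right.

    `env` is a mapping of environment variables (defaults to os.environ).
    This helper is illustrative and not part of Hadoop itself.
    """
    env = os.environ if env is None else env
    problems = []

    # Both variables are expected to be defined.
    for name in ("HADOOP_HOME", "JAVA_HOME"):
        if not env.get(name):
            problems.append(f"{name} is not set")

    # winutils.exe is expected inside the bin folder of HADOOP_HOME.
    hadoop_home = env.get("HADOOP_HOME")
    if hadoop_home:
        winutils = Path(hadoop_home) / "bin" / "winutils.exe"
        if not winutils.is_file():
            problems.append(f"winutils.exe not found at {winutils}")

    return problems
```

Calling check_hadoop_env() with no arguments inspects the current process environment, so it should be run from a new console opened after the environment variables were changed.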

Posted in Hadoop, spark, Uncategorized

Hadoop File Formats

Text – RCFiles – Parquet – ORC

Compression

Based on a study conducted, file sizes relative to Text were:

Text: original size
RCFiles: 14% smaller
Parquet: 62% smaller
ORC: 78% smaller

Considerations for ORC over Parquet are: 1. ORC …
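To make the compression figures above concrete, here is a rough sketch that converts the quoted percentages into estimated on-disk sizes. The function name estimated_size_gb and the 100 GB input are illustrative assumptions; the percentages come from the study cited in the excerpt, and real results will vary with the data and codec.

```python
# Size reduction relative to plain Text, in percent, as quoted in the excerpt.
REDUCTION_PCT = {"Text": 0, "RCFiles": 14, "Parquet": 62, "ORC": 78}


def estimated_size_gb(text_size_gb, file_format):
    """Estimate the stored size of a data set, given its size as plain Text."""
    return text_size_gb * (100 - REDUCTION_PCT[file_format]) / 100


# Example: a hypothetical data set that takes 100 GB as plain Text.
for fmt in REDUCTION_PCT:
    print(f"{fmt}: {estimated_size_gb(100, fmt)} GB")
```

For the 100 GB example this works out to 86 GB as RCFiles, 38 GB as Parquet, and 22 GB as ORC.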

Posted in Hadoop, hive