Tag Archives: hdfs

Tips: Spark

Configuration: pass configuration values from a property file. spark-submit supports loading configuration values from a file; it reads whitespace-delimited key/value pairs from that file. You can customize the exact location of the file using the --properties-file flag to spark-submit: $ bin/spark-submit \ --class … Continue reading
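The teaser above describes loading Spark configuration from a properties file. A minimal sketch of what that looks like (file name, master URL, class, and jar are hypothetical placeholders, and running it requires a Spark installation):

```
# conf/my-defaults.conf -- hypothetical file of whitespace-delimited key/value pairs
spark.master            spark://mycluster:7077
spark.executor.memory   2g
spark.app.name          MyApp

# Point spark-submit at the file instead of the default conf/spark-defaults.conf:
$ bin/spark-submit \
    --class com.example.MyApp \
    --properties-file conf/my-defaults.conf \
    my-app.jar
```

Values given directly on the spark-submit command line take precedence over values loaded from the properties file.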

Posted in spark, Tips

Restrict write permissions on hive external directories

It is possible to restrict the write permissions on Hive external table directories, which in turn improves the security of data in Hive. We would just need to grant Read + Execute permissions on the directory. It is also possible to … Continue reading
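A minimal sketch of the permission change described above, run against HDFS (the directory path is hypothetical, and the commands require a running HDFS cluster):

```
# Grant read + execute but remove write for everyone on the external table's directory,
# so Hive queries can still read the data but nothing can modify it through HDFS:
hdfs dfs -chmod -R 555 /data/external/sales

# Verify the resulting mode:
hdfs dfs -ls -d /data/external/sales
```

With write removed, INSERT/LOAD operations against the external table will fail, while SELECT continues to work, since reading only needs r+x on the directory tree.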

Posted in hive

Java-Hadoop : Create a file in HDFS programmatically and write data into it

Here is a Java program, with its pom file, which lets you create a file in HDFS and write data into it. The pom file builds 2 jar files, one of which has all the dependencies included in it. … Continue reading
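The full program is behind the "Continue reading" link; a minimal sketch of the core HDFS write using the Hadoop FileSystem API (the NameNode URI and file path are hypothetical, and compiling requires hadoop-client on the classpath):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriter {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020"); // hypothetical NameNode URI

        try (FileSystem fs = FileSystem.get(conf)) {
            Path file = new Path("/user/demo/hello.txt");      // hypothetical path
            try (FSDataOutputStream out = fs.create(file, true)) { // true = overwrite
                out.writeBytes("Hello, HDFS!\n");
            }
        }
    }
}
```

FileSystem.create() returns an FSDataOutputStream, which extends java.io.DataOutputStream, so the usual write methods are available; closing the stream flushes the data to the DataNodes.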

Posted in Java-Maven-Hadoop

Sqoop : Incremental Imports using Last-Modified mode

As discussed in my previous post, Sqoop is a tool designed to transfer data between Hadoop and relational databases. Incremental import mode can be used to retrieve only rows newer than some previously imported set of rows. Why & When Last-Modified … Continue reading
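A minimal sketch of a last-modified incremental import (the connection string, table, and column names are hypothetical, and the command requires Sqoop and a reachable database):

```
sqoop import \
  --connect jdbc:mysql://dbhost/sales \
  --username sqoop_user -P \
  --table orders \
  --incremental lastmodified \
  --check-column last_updated \
  --last-value "2016-01-01 00:00:00" \
  --merge-key order_id \
  --target-dir /user/demo/orders
```

Rows whose last_updated timestamp is newer than --last-value are pulled, and --merge-key lets Sqoop reconcile re-imported rows with earlier copies; at the end of the run Sqoop prints the new --last-value to use next time.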

Posted in sqoop

Sqoop : Incremental Imports using Append mode

As you all know, Sqoop is a tool designed to transfer data between Hadoop and relational databases. Incremental import mode can be used to retrieve only rows newer than some previously imported set of rows. Why Append mode? It works for numerical data … Continue reading
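A minimal sketch of an append-mode incremental import (the connection string, table, and column names are hypothetical, and the command requires Sqoop and a reachable database):

```
sqoop import \
  --connect jdbc:mysql://dbhost/sales \
  --username sqoop_user -P \
  --table orders \
  --incremental append \
  --check-column order_id \
  --last-value 1000 \
  --target-dir /user/demo/orders
```

Append mode only fetches rows where the check column (typically a monotonically increasing numeric id) is greater than --last-value, which is why it suits tables that only ever receive new rows, never updates to existing ones.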

Posted in sqoop