Category Archives: hive

Hadoop File Formats

Text, RCFile, Parquet, ORC

Compression: based on a study, with Text as the original size, RCFile was 14% smaller, Parquet 62% smaller and ORC 78% smaller.

Considerations for ORC over Parquet are: 1. ORC … Continue reading
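To make the percentages above concrete, here is a minimal sketch that applies them to a hypothetical 100 GB text dataset. The 100 GB figure and the function name are assumptions made for illustration only; real ratios depend on the data and the codec used.

```python
# Size reduction relative to plain text, in percent, as quoted in the excerpt.
REDUCTION_PCT = {"Text": 0, "RCFile": 14, "Parquet": 62, "ORC": 78}


def estimated_size_gb(text_size_gb: float, fmt: str) -> float:
    """Estimate the stored size of a dataset once converted to `fmt`."""
    return text_size_gb * (100 - REDUCTION_PCT[fmt]) / 100


# Hypothetical 100 GB text dataset, used only for illustration.
for fmt in REDUCTION_PCT:
    print(f"{fmt:8s} {estimated_size_gb(100, fmt):6.1f} GB")
```

Under these assumptions, the same data would occupy roughly 86 GB as RCFile, 38 GB as Parquet and 22 GB as ORC.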

Posted in Hadoop, hive

Tips: Sqoop-Hive

Import data from a database (e.g. SQL Server) into Hive and create an ORC table out of that:

    sqoop import -Dmapred.job.queue.name=default \
      --connect 'jdbc:sqlserver://<host>:<port>;username=<user>;password=<pwd>;database=<dbname>' \
      --hive-import --hive-table <db.tablename> \
      --table <sqlservertable> --split-by <splitcolumn> --as-textfile

    CREATE TABLE <db.tablename_orc> LIKE <db.tablename> … Continue reading

Posted in hive, sqoop, Tips

Restrict write permissions on hive external directories

It is possible to restrict write permissions on Hive external directories, which in turn improves the security of the data in Hive. We only need to grant Read + Execute permissions on the directory. It is also possible to … Continue reading
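As a rough illustration of what "Read + Execute" means in permission bits, the sketch below builds the mode and the matching `hdfs dfs -chmod` command without running it. The directory path and the helper names are hypothetical, and applying the mode to owner, group and others alike is an assumption; the full post may restrict a narrower set of users.

```python
import stat

# Read + execute for owner, group and others; no write bit anywhere.
# (Assumption: the same bits are applied to all three classes.)
READ_EXECUTE = (
    stat.S_IRUSR | stat.S_IXUSR
    | stat.S_IRGRP | stat.S_IXGRP
    | stat.S_IROTH | stat.S_IXOTH
)


def chmod_command(directory: str, mode: int = READ_EXECUTE) -> list[str]:
    """Build (but do not run) an `hdfs dfs -chmod` command for `directory`."""
    return ["hdfs", "dfs", "-chmod", format(mode, "o"), directory]


def is_writable_by_anyone(mode: int) -> bool:
    """Return True if any write bit (owner, group or other) is set."""
    return bool(mode & (stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))


# Hypothetical external table location, used only for illustration.
EXAMPLE_DIR = "/data/external/example_table"

print(stat.filemode(stat.S_IFDIR | READ_EXECUTE))  # symbolic form of the mode
print(" ".join(chmod_command(EXAMPLE_DIR)))
```

The command is only assembled as a list here so it can be reviewed before being run on a cluster by a user who owns the directory.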

Posted in hive

Developer’s template: Hive using JDBC

The Developer’s template series is intended to make life easier for big data developers by sparing them the headache of starting application development from scratch. Here is a Java program with its pom file which lets you connect to Hive and … Continue reading

Posted in hive, Java-Maven-Hadoop, Uncategorized | 2 Comments