Tips: Sqoop

Override Cluster properties

E.g., disable compression for Sqoop output when compression is enabled cluster-wide:

sqoop import -Dmapred.job.queue.name=default \
-Dmapreduce.map.output.compress=false \
-Dmapreduce.output.fileoutputformat.compress=false \
--driver com.ibm.db2.jcc.DB2Driver --connect jdbc:db2://<host>/<db> \
--username <user> --password <pwd> \
--table <db2 table> --target-dir <hdfs path> \
--split-by <col name> \
--as-textfile

Import data

**Ensure that all the JDBC driver jars are placed in Sqoop's lib folder.
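A minimal sketch of staging the vendor JDBC jars. The directory paths and variable names here are assumptions for illustration, not part of any standard Sqoop layout; substitute the driver jars shipped by your database vendor (DB2, SQL Server, Vertica, ...).

```shell
# Assumed locations -- adjust to your installation.
SQOOP_LIB="${SQOOP_HOME:-$HOME/sqoop}/lib"   # Sqoop's lib folder
DRIVER_DIR="$HOME/jdbc-drivers"              # wherever you downloaded the vendor jars
mkdir -p "$SQOOP_LIB" "$DRIVER_DIR"

# Copy every vendor jar into Sqoop's lib folder so sqoop import can load the driver.
for jar in "$DRIVER_DIR"/*.jar; do
  [ -e "$jar" ] && cp "$jar" "$SQOOP_LIB/"
done

# List what Sqoop will see on its classpath.
ls "$SQOOP_LIB"
```

After this, the `--driver` class named on the command line (e.g. com.ibm.db2.jcc.DB2Driver) can be resolved at import time.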

From SQL Server

sqoop import -Dmapred.job.queue.name=default \
--connect 'jdbc:sqlserver://<host>:<port>;username=<user>;password=<pwd>;database=<dbname>' \
--hive-import --hive-table <db.tablename> \
--table <sqlservertable> --split-by <splitcolumn> --as-textfile

From DB2

sqoop import -Dmapred.job.queue.name=default \
--driver com.ibm.db2.jcc.DB2Driver --connect jdbc:db2://<host>:<port>/<db> --username <user> --password <pwd> \
--hive-import --hive-table <db.tablename> \
--table <db2table> --split-by <splitcolumn> --as-textfile

From Vertica

sqoop import -Dmapred.job.queue.name=default \
--driver com.vertica.jdbc.Driver --connect jdbc:vertica://<host>:5433/<dbname> --username <user> --password <pwd> \
--hive-import --hive-table <hiveDb.hiveTablename> \
--query 'SELECT t.col1, t.col2, ... FROM VerticaTable t WHERE $CONDITIONS' \
--target-dir <hdfsDirPath> \
--split-by <verticaCol> \
--as-textfile -m <numMappers>

**Sqoop import with --table doesn't work for Vertica; it often fails with the error below. Use a free-form --query import instead, as in the Vertica example above.

Error:
Error: java.io.IOException: SQLException in nextKeyValue
......
Caused by: com.vertica.support.exceptions.SyntaxErrorException: [Vertica][VJDBC](4856) ERROR: Syntax error at or near "."
... 24 more


stay tuned..


About shalishvj : My Experience with BigData

6+ years of experience using Big Data technologies in Architect, Developer and Administrator roles for various clients.
• Experience using Hortonworks, Cloudera, AWS distributions.
• Cloudera Certified Developer for Hadoop.
• Cloudera Certified Administrator for Hadoop.
• Spark Certification from Big Data Spark Foundations.
• SCJP, OCWCD.
• Experience in setting up Hadoop clusters in PROD, DR, UAT, DEV environments.
