Sqoop : Incremental Imports using Append mode

As you all know, Sqoop is a tool designed to transfer data between Hadoop and relational databases. Its incremental import mode can be used to retrieve only rows newer than some previously imported set of rows.

Why Append mode?

  • It works for numeric data that increases over time, such as auto-increment keys.
  • Use it when importing a table where new rows are continually added with increasing row-id values.
  • The other mode, “lastmodified”, works on time-stamped data instead.

Arguments that control incremental imports:

  • --check-column (col): specifies the column to be examined when determining which rows to import. The column should not be of type CHAR/NCHAR/VARCHAR/VARNCHAR/LONGVARCHAR/LONGNVARCHAR.
  • --incremental (mode): specifies how Sqoop determines which rows are new. Legal values for mode are append and lastmodified.
  • Use append mode when importing a table where new rows are continually added with increasing row-id values. Specify the column containing the row’s id with --check-column; Sqoop imports rows whose check-column value is greater than the one specified with --last-value.
  • --last-value (value): specifies the maximum value of the check column from the previous import.
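To make these arguments concrete, a first append-mode import might look like the sketch below. The connection URL, credentials, and target directory are hypothetical placeholders (adjust them to your environment); the script only builds and echoes the command so it can be reviewed before running.

```shell
#!/bin/sh
# Hypothetical Oracle connection details -- replace with your own.
CONNECT="jdbc:oracle:thin:@//dbhost:1521/ORCL"

# First run: import every EMP row whose ID (an assumed
# auto-increment primary key) is greater than 0.
SQOOP_CMD="sqoop import \
 --connect $CONNECT \
 --username scott -P \
 --table EMP \
 --target-dir /user/hadoop/EMP \
 --check-column ID \
 --incremental append \
 --last-value 0"

# Echo for review; execute with: eval "$SQOOP_CMD"
echo "$SQOOP_CMD"
```

Starting from --last-value 0 makes the first run behave like a full import while still producing the "next last value" needed for later incremental runs.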

In this example, I am importing data from Oracle into HDFS.

Consider the table EMP shown below (SQL Developer view).

[Screenshot 1: EMP table in SQL Developer]

Execute the command to import the data from Oracle to HDFS for the FIRST time.

Here, I have chosen the ID column as the --check-column.

[Screenshot 2: First sqoop import command]

The last value is printed on the screen towards the end of the command’s output.

This value is used in the next incremental import.

[Screenshot 3: Last value printed at the end of the import output]

HDFS web console view which shows the successful import of Oracle data into HDFS

[Screenshot 4: HDFS web console after the first import]

Now let us add a new row to the table.

[Screenshot 5: New row added to EMP]

Execute the Sqoop command to import only the newly added rows to HDFS.

The last value from the previous import acts as the argument for --last-value.
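For example, if the first run reported a last value of 7 (the real number comes from your own import’s output; 7 is only an assumed example), the incremental command would pass it to --last-value. As before, the connection details are placeholders and the command is only echoed for review:

```shell
#!/bin/sh
# Same hypothetical connection as in the first import.
CONNECT="jdbc:oracle:thin:@//dbhost:1521/ORCL"

# LAST_VAL is whatever the previous run printed as its
# last value; 7 here is only an illustrative example.
LAST_VAL=7

# Only rows with ID > 7 will be imported this time.
SQOOP_CMD="sqoop import \
 --connect $CONNECT \
 --username scott -P \
 --table EMP \
 --target-dir /user/hadoop/EMP \
 --check-column ID \
 --incremental append \
 --last-value $LAST_VAL"

echo "$SQOOP_CMD"   # execute with: eval "$SQOOP_CMD"
```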

[Screenshot 6: Incremental sqoop import command]

HDFS web console view which shows the addition of a new part file in HDFS

[Screenshot 7: New part file in HDFS]

HDFS web console view which shows the successful import of new record into HDFS

[Screenshot 8: New record visible in HDFS]
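As a side note, tracking the last value by hand can be avoided entirely: a saved Sqoop job records the last imported value in its metastore and updates it after every run, which also makes the import easy to schedule. A minimal sketch, with a made-up job name and the same hypothetical connection details as above (commands are echoed for review):

```shell
#!/bin/sh
# Create the saved job once; Sqoop then stores and bumps
# --last-value automatically in its metastore on each run.
CREATE_CMD="sqoop job --create emp_append -- import \
 --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
 --username scott -P \
 --table EMP \
 --target-dir /user/hadoop/EMP \
 --check-column ID \
 --incremental append \
 --last-value 0"

# Each scheduled run (cron, Oozie, ...) then just executes:
EXEC_CMD="sqoop job --exec emp_append"

echo "$CREATE_CMD"
echo "$EXEC_CMD"
```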

That’s it... stay tuned!

Have a Nice Day 😉

 



7 Responses to Sqoop : Incremental Imports using Append mode

  1. Abhinay says:

    Nice post Shalishvj,
    Is there any way we can automate the incremental append and run it as a script at a regular time interval?
    I am able to store the latest imported value in a text file, but I am unable to read that value in the sqoop import command.

  2. Pronab Jyoti nath says:

    It works fine for Oracle as well. Can you please tell me how to do the same in SQL Server?

  3. Yuva Kumar says:

    Suppose I do not know the last value, since the sqoop import is scheduled as a daily batch process. Can I use a query to fetch MAX(ID) for --last-value? Something like --last-value select MAX(empid) from emp?
    Thank you

  4. Hi, how do you append in the case of a source table where data is not incremented but refreshed? Say the RDBMS holds 10 rows and I have imported them to an HDFS location with Sqoop, which in turn is used by an external Hive table. Tomorrow those 10 records in the RDBMS are replaced with another 12 records, all unique. Now I want to append those 12 new records to the part-m-00000 file in the same location, which will then hold 22 records.

  5. Murali Mohan says:

    If we don’t know the last value, how do we do an incremental append?
