As you all know, Sqoop is a tool designed to transfer data between Hadoop and relational databases.
Incremental imports mode can be used to retrieve only rows newer than some previously-imported set of rows.
Why Append mode ??
- works for numerical data that is incrementing over time, such as auto-increment keys,
- when importing a table where new rows are continually being added with increasing row id values
- whereas “lastmodified,” works on time-stamped data.
Arguments which control incremental imports :-
- –check-column (col) Specifies the column to be examined when determining which rows to import.
- (the column should not be of type CHAR/NCHAR/VARCHAR/VARNCHAR/ LONGVARCHAR/LONGNVARCHAR)
- –incremental (mode) Specifies how Sqoop determines which rows are new.
- Legal values for mode include append and lastmodified.
- append mode when importing a table where new rows are continually being added with increasing row id values
- specify the column containing the row’s id with –check-column
- Sqoop imports rows where the check column has a value greater than the one specified with –last-value
- –last-value (value) Specifies the maximum value of the check column from the previous import
In this case I am importing data from Oracle to HDFS.
Consider a table – EMP – shown below. ( SQL Developer view )
Execute the command to import the data from Oracle to HDFS for the FIRST time.
Here, I have opted ID column as
Last Value is printed on the screen towards the end of the execution of the command.
This value is used in the next incremental import.
HDFS web console view which shows the successful import of Oracle data into HDFS
Now let us add a new row to the table.
Execute the Sqoop command to import only the newly added rows to HDFS.
Last Value from the previous import acts as the argument for –last-value
HDFS web console view which shows the addition of a new part file in HDFS
HDFS web console view which shows the successful import of new record into HDFS
That’s it..stay tuned..
Have a Nice Day 😉