As discussed in my previous post, Sqoop is a tool designed to transfer data between Hadoop and relational databases.
Incremental imports mode can be used to retrieve only rows newer than some previously-imported set of rows.
Why & When Last-Modified mode ??
- “lastmodified,” works on time-stamped data.
- use this when rows of the source table may be updated
- and each such update will set the value of a last-modified column to the current timestamp
- Rows where the check column holds a timestamp more recent than the timestamp specified with –last-value are imported
- Oracle Timestamp format : ‘DD-Mon-RR HH24:MI:SS.FF’
- Sqoop Timestamp format : ‘YYYY-MM-DD HH24:MI:SS.FF’
- Specify –last-value in Double Quotes. ie, timestamp in this case.
Here I am importing data from Oracle to HDFS.
Consider a table – EMP – shown below. ( SQL Developer view )
Execute the command to import the data from Oracle to HDFS for the FIRST time.
Here, I have opted CTIME (timestamp) column as the check-column
Last Value timestamp is printed on the screen towards the end of the execution of the command.
This value is used in the next incremental import.
HDFS web console view which shows the successful import of Oracle data into HDFS
Now let us modify Row 1 in the table EMP
As you modify the Row, the timestamp column (here CTIME ) should to be modified to the current timestamp
If a new row is added, it should have the current timestamp as the value in the CTIME column.
Execute the Sqoop command to import only the MODIFIED rows to HDFS.
Last Value from the previous import acts as the argument for –last-value
Sqoop checks for changes in data between the last value timestamp (Lower bound value) and Current timestamp (Upper bound value) and imports the modified or newly added rows.
HDFS web console view which shows the successful import of the modified record into HDFS
That’s it..stay tuned..
Have a Nice Day 😉