Undoubtedly, when you begin to investigate Hadoop and the possibility of adding it to your enterprise ecosystem, the interaction between this newfangled technology and your existing RDBMS may pose some challenges. If, like me, you're coming from a relational database background and looking to bridge the gap between the old and new technologies, some natural initial questions will arise: "How do I import from my RDBMS into Hadoop?" and "How do I export from Hadoop to my RDBMS?" The answer to both, of course, is Apache Sqoop, which provides these functionalities in a pretty easy-to-use command-line format.
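To give a flavor of that command-line format, here is a minimal sketch of the two directions mentioned above. The connection string, credentials, table name, and HDFS path are all placeholder assumptions for illustration; you would substitute your own.

```shell
# Import a relational table into HDFS (hypothetical connection details)
sqoop import \
  --connect jdbc:mysql://dbhost/sales \
  --username etl_user -P \
  --table orders \
  --target-dir /data/orders

# Export HDFS data back into a relational table (same placeholder names)
sqoop export \
  --connect jdbc:mysql://dbhost/sales \
  --username etl_user -P \
  --table orders_summary \
  --export-dir /data/orders_summary
```

Both commands need a running Hadoop cluster and a reachable database, so treat them as a shape to adapt rather than something to copy verbatim.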
I was lucky enough to get a copy of Instant Apache Sqoop to review and found it to be a useful reference for my needs and learning style. It jumps right in and shows you different ways to accomplish tasks immediately. "Short, fast, and focused" describes the contents well, which suits me perfectly: I learn by doing, and I prefer to dive in and get my hands dirty as quickly as possible. This book certainly helps with that goal.
The book is broken down into 9 sections covering some critical functions of Sqoop, including importing/exporting with Hive, importing/exporting with HBase, and incremental importing. Up until now, I've only used Sqoop to import directly into HDFS, and I expect I'll return to this book when I try importing into Hive or doing an incremental load for the first time.
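For reference, those two features I haven't tried yet look roughly like the following. This is a hedged sketch with made-up connection details, table names, and column names, not an excerpt from the book:

```shell
# Import straight into a Hive table instead of plain HDFS files
sqoop import \
  --connect jdbc:mysql://dbhost/sales \
  --username etl_user -P \
  --table orders \
  --hive-import \
  --hive-table orders

# Incremental load: only pull rows whose id exceeds the last imported value
sqoop import \
  --connect jdbc:mysql://dbhost/sales \
  --username etl_user -P \
  --table orders \
  --incremental append \
  --check-column order_id \
  --last-value 10000
```

The `--incremental append` mode assumes a monotonically increasing key like `order_id`; Sqoop also offers a `lastmodified` mode keyed on a timestamp column for tables whose rows get updated in place.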