4.a) Explain Apache Sqoop Import and Export methods.
Answer:
Apache Sqoop:
Sqoop is a tool designed to transfer data between Hadoop and relational databases.
Sqoop is used to:
- import data from a relational database management system (RDBMS) into the Hadoop Distributed File System (HDFS),
- transform the data in Hadoop, and
- export the data back into an RDBMS.
Sqoop import method:
The data import is done in two steps:
- Sqoop examines the database to gather the necessary metadata (such as column names and types) for the data to be imported.
- Sqoop submits a map-only Hadoop job that transfers the actual data using that metadata.
The imported data is stored in an HDFS directory. By default, Sqoop names this directory after the imported table, or the user can specify an alternative directory where the files should be written. By default, these files contain comma-delimited fields, with newlines separating records.
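Both import steps are triggered by a single `sqoop import` invocation. The following is a minimal sketch, not a definitive recipe, that assembles such a command as an argument list. The flags (`--connect`, `--username`, `--table`, `--target-dir`, `--num-mappers`, `--fields-terminated-by`) are standard Sqoop options, but the helper `build_import_command`, the JDBC URL, the user name, the table name, and the target directory are all hypothetical values chosen for illustration. Authentication options are omitted.

```python
import shlex


def build_import_command(jdbc_url, username, table, target_dir=None,
                         num_mappers=4, field_delimiter=","):
    """Return a `sqoop import` command as an argument list (hypothetical helper)."""
    args = [
        "sqoop", "import",
        "--connect", jdbc_url,                      # database to read metadata and rows from
        "--username", username,
        "--table", table,                           # table to import
        "--num-mappers", str(num_mappers),          # number of parallel map tasks
        "--fields-terminated-by", field_delimiter,  # comma is also Sqoop's default
    ]
    if target_dir is not None:
        # Override the default HDFS output directory.
        args += ["--target-dir", target_dir]
    return args


# All connection details below are assumptions made for this example.
cmd = build_import_command(
    "jdbc:mysql://dbhost/salesdb",
    "sqoop_user",
    "orders",
    target_dir="/user/hadoop/orders_import",
)
print(shlex.join(cmd))
```

The argument list can be passed to `subprocess.run(cmd, check=True)` on a machine where Sqoop and the matching JDBC driver are installed.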
Sqoop export method:
Data export from the cluster works in a similar fashion. The export is done in two steps:
- Sqoop examines the database to gather metadata about the target table.
- Sqoop runs a map-only Hadoop job to write the data to the database.
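As with import, both export steps are started by one `sqoop export` invocation. The sketch below assembles such a command under stated assumptions: the flags (`--connect`, `--username`, `--table`, `--export-dir`, `--num-mappers`, `--input-fields-terminated-by`) are standard Sqoop options, while the helper `build_export_command`, the JDBC URL, the user name, the table name, and the HDFS path are hypothetical. Authentication options are omitted, and the target table is assumed to exist already in the database.

```python
import shlex


def build_export_command(jdbc_url, username, table, export_dir,
                         num_mappers=4, input_delimiter=","):
    """Return a `sqoop export` command as an argument list (hypothetical helper)."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,                # database that receives the rows
        "--username", username,
        "--table", table,                     # existing target table
        "--export-dir", export_dir,           # HDFS directory holding the input files
        "--num-mappers", str(num_mappers),    # number of parallel map tasks
        "--input-fields-terminated-by", input_delimiter,
    ]


# All connection details below are assumptions made for this example.
export_cmd = build_export_command(
    "jdbc:mysql://dbhost/salesdb",
    "sqoop_user",
    "order_summary",
    "/user/hadoop/order_summary",
)
print(shlex.join(export_cmd))
```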
Sqoop divides the input data set into splits, then uses individual map tasks to push the splits to the database.
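The idea of splitting the input and letting each map task push its own split can be illustrated with a small simulation. This is only a conceptual sketch, not Sqoop's actual implementation: real splits are computed by Hadoop from the files in HDFS, and the rows are written over JDBC. The functions `make_splits` and `map_task`, the sample records, and the table name `people` are all hypothetical.

```python
def make_splits(records, num_map_tasks):
    """Divide records into at most num_map_tasks contiguous, near-equal splits."""
    if num_map_tasks < 1:
        raise ValueError("num_map_tasks must be at least 1")
    size, extra = divmod(len(records), num_map_tasks)
    splits, start = [], 0
    for i in range(num_map_tasks):
        # The first `extra` splits each take one additional record.
        end = start + size + (1 if i < extra else 0)
        if end > start:
            splits.append(records[start:end])
        start = end
    return splits


def map_task(split, table):
    """Turn each comma-delimited record into a parameterized INSERT statement."""
    statements = []
    for line in split:
        values = line.split(",")
        placeholders = ", ".join(["?"] * len(values))
        statements.append((f"INSERT INTO {table} VALUES ({placeholders})", values))
    return statements


# Hypothetical comma-delimited records, as they might appear in HDFS files.
records = ["1,alice,30", "2,bob,25", "3,carol,41", "4,dave,19", "5,eve,33"]
splits = make_splits(records, 2)

# Each split is handled by its own (simulated) map task.
for task_id, split in enumerate(splits):
    for sql, params in map_task(split, "people"):
        print(task_id, sql, params)
```

Because each split is processed independently, the tasks can run in parallel, which is why the number of map tasks controls how many concurrent writers reach the database.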