7.B) What is Hive? Explain Hive architecture. – 10 Marks
Answer:-
Hive was created by Facebook. It is a data warehousing tool and a data store built on top of Hadoop. An enterprise uses a data warehouse as a large data repository designed to enable tracking, managing, and analyzing data.
Components of Hive architecture are as follows:
- Hive Server (Thrift):
- An optional service that allows remote clients to submit requests to Hive and retrieve results.
- It supports multiple programming languages for interacting with Hive via the Thrift protocol.
- Thrift Server:
- Exposes a simple client API to execute HiveQL (Hive Query Language) statements.
- Acts as an interface between Hive and external clients.
- Hive CLI (Command Line Interface):
- A popular interface to interact with Hive.
- It allows you to run Hive queries from the command line.
- When the CLI is run on a Hadoop cluster, Hive can run in local mode, which uses local storage instead of HDFS.
- Web Interface:
- Hive can also be accessed through a web browser.
- This requires the Hive Web Interface (HWI) server to be running on a designated node.
- You can access it through a URL of the form: http://<hadoop_host>:<port_number>/hwi
- Metastore:
- The system catalog for Hive.
- Stores metadata about tables, databases, columns, data types, and their mappings to HDFS.
- The Metastore is critical for other Hive components to interact with the actual data.
- Hive Driver:
- Manages the lifecycle of a HiveQL statement.
- It oversees the compilation, optimization, and execution of queries.
- Acts as the central controller for executing queries in Hive.
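The way the Driver and the Metastore work together can be sketched with a small toy model. This is only an illustration of the roles described above, not Hive's actual implementation: the names `TableMeta`, `ToyMetastore`, and `ToyDriver` are hypothetical, the `employees` table and its HDFS path are made-up sample values, and the "optimization" step is reduced to removing duplicate column requests.

```python
from dataclasses import dataclass, field


@dataclass
class TableMeta:
    """Metadata kept for one table (illustrative fields only)."""
    columns: dict[str, str]   # column name -> data type
    hdfs_location: str        # where the table's files live in HDFS


@dataclass
class ToyMetastore:
    """Hypothetical stand-in for the Metastore: a catalog of table metadata."""
    tables: dict[str, TableMeta] = field(default_factory=dict)

    def lookup(self, name: str) -> TableMeta:
        if name not in self.tables:
            raise KeyError(f"table not found in metastore: {name}")
        return self.tables[name]


class ToyDriver:
    """Hypothetical stand-in for the Driver: walks one request through its lifecycle."""

    def __init__(self, metastore: ToyMetastore) -> None:
        self.metastore = metastore
        self.stages: list[str] = []   # records the lifecycle stages passed so far

    def run(self, table: str, columns: list[str]) -> dict:
        # Compile: resolve the table and column names against the metastore.
        meta = self.metastore.lookup(table)
        unknown = [c for c in columns if c not in meta.columns]
        if unknown:
            raise ValueError(f"unknown columns: {unknown}")
        self.stages.append("compile")

        # Optimize: in this toy version, only drop duplicate column requests
        # while preserving their order.
        projected = list(dict.fromkeys(columns))
        self.stages.append("optimize")

        # Execute: a real driver would launch work on the cluster; here we
        # only return the plan that would have been executed.
        self.stages.append("execute")
        return {"read_from": meta.hdfs_location, "project": projected}


# Sample usage with made-up metadata.
metastore = ToyMetastore()
metastore.tables["employees"] = TableMeta(
    columns={"id": "INT", "name": "STRING"},
    hdfs_location="/user/hive/warehouse/employees",
)
driver = ToyDriver(metastore)
plan = driver.run("employees", ["name", "name", "id"])
print(plan)
print(driver.stages)
```

The point of the sketch is the dependency between the two components: the driver cannot compile a request without first asking the metastore where the table lives and which columns it has, which is why the Metastore is described as critical for the other components.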