What is Hive? Explain Hive architecture

7.B) What is Hive? Explain Hive architecture. – 10 Marks

Answer:-

Hive was created by Facebook. It is a data warehousing tool and a data store built on top of Hadoop. An enterprise uses a data warehouse as a large data repository designed for tracking, managing, and analyzing data.

Components of Hive architecture are as follows:

  1. Hive Server (Thrift):
    • An optional service that allows remote clients to submit requests to Hive and retrieve results.
    • It supports multiple programming languages for interacting with Hive via the Thrift protocol.
  2. Thrift Server:
    • Exposes a simple client API to execute HiveQL (Hive Query Language) statements.
    • Acts as an interface between Hive and external clients.
  3. Hive CLI (Command Line Interface):
    • A popular interface to interact with Hive.
    • It allows you to run Hive queries from the command line.
    • Can also run in local mode, in which it uses local storage instead of the HDFS of a Hadoop cluster.
  4. Web Interface:
    • Hive can also be accessed through a web browser.
    • This requires the Hive Web Interface (HWI) server to be running on a designated node.
    • You can access it through a URL: http://<hadoop_host>:<port_number>/hwi.
  5. Metastore:
    • The system catalog for Hive.
    • Stores metadata about tables, databases, columns, data types, and their mappings to HDFS.
    • The Metastore is critical for other Hive components to interact with the actual data.
  6. Hive Driver:
    • Manages the lifecycle of a HiveQL statement.
    • It oversees the compilation, optimization, and execution of queries.
    • Acts as the central controller for executing queries in Hive.
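
To make the roles of the Metastore and the Driver more concrete, here is a minimal toy sketch in Python. It is not Hive's actual code or API: the class names (`Metastore`, `Driver`, `TableMeta`), the table `page_views`, its columns, and the warehouse path are all hypothetical and chosen only for illustration. The sketch shows a catalog that maps a table name to its columns and HDFS location, and a driver that takes a request through the compile, optimize, and execute stages in order.

```python
from dataclasses import dataclass, field


@dataclass
class TableMeta:
    """Metadata kept for one table (illustrative fields only)."""
    columns: dict   # column name -> data type
    location: str   # HDFS path where the table's data lives


@dataclass
class Metastore:
    """Toy system catalog: maps table names to their metadata."""
    tables: dict = field(default_factory=dict)

    def register(self, name, columns, location):
        self.tables[name] = TableMeta(columns, location)

    def lookup(self, name):
        return self.tables[name]


class Driver:
    """Toy driver: walks a request through compile -> optimize -> execute."""

    def __init__(self, metastore):
        self.metastore = metastore
        self.stages = []  # records the lifecycle stages in the order they ran

    def compile(self, table, columns):
        self.stages.append("compile")
        # Compilation consults the catalog to validate the referenced columns
        meta = self.metastore.lookup(table)
        unknown = [c for c in columns if c not in meta.columns]
        if unknown:
            raise ValueError(f"unknown columns: {unknown}")
        return {"scan": meta.location, "project": list(columns)}

    def optimize(self, plan):
        self.stages.append("optimize")
        # Stand-in for real optimization: drop duplicate projected columns
        plan["project"] = list(dict.fromkeys(plan["project"]))
        return plan

    def execute(self, plan):
        self.stages.append("execute")
        # A real engine would launch jobs; here we only describe the work
        return f"read {', '.join(plan['project'])} from {plan['scan']}"

    def run(self, table, columns):
        return self.execute(self.optimize(self.compile(table, columns)))


# Hypothetical table and path, used only to exercise the sketch
metastore = Metastore()
metastore.register(
    "page_views",
    {"user_id": "BIGINT", "url": "STRING"},
    "/user/hive/warehouse/page_views",
)
driver = Driver(metastore)
result = driver.run("page_views", ["url", "url", "user_id"])
print(result)
```

The point of the sketch is the dependency between the two components: the driver cannot compile a request without first asking the catalog where the table lives and which columns it has.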
