3.B) Explain with a neat diagram the components of HDFS. – 8 Marks
Answer:-
HDFS (Hadoop Distributed File System) Components
HDFS is a core component of Hadoop, designed to store and manage vast amounts of data reliably across distributed clusters. It follows a master-slave architecture comprising:
Components of HDFS:
- NameNode (Master):
- Stores metadata about the file system (e.g., file paths, permissions, and block locations).
- Handles file system namespace operations such as opening, closing, and renaming files and directories.
- Maps blocks to DataNodes and manages replication.
- Detects DataNode failures through missed heartbeats and re-replicates the blocks those nodes held.
- DataNodes (Slaves):
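- Store the actual data blocks (files are split into fixed-size blocks, 128 MB by default in Hadoop 2 and later).
- Serve read and write requests from clients, and create, delete, and replicate blocks when instructed by the NameNode.
- Regularly send heartbeats (to show they are alive) and block reports (listing the blocks they hold) to the NameNode.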
- Client:
- Contacts the NameNode for metadata operations, such as creating a file or looking up the DataNodes that hold its blocks.
- Reads and writes block data directly with the DataNodes, so file contents never pass through the NameNode.
- Replication:
- Ensures fault tolerance by replicating data blocks across multiple DataNodes.
- The default replication factor is 3, so a block remains available even if a node holding one of its copies fails.
- Racks:
- HDFS uses rack awareness when placing replicas: copies of a block are spread across more than one rack when possible, so the loss of a whole rack (for example, through a switch failure) does not make the block unavailable.
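The replication and rack-awareness points above can be illustrated with a small sketch. It loosely follows the default HDFS placement idea for a replication factor of 3 (first replica on the writer's node, second on a node in a different rack, third on another node in that same remote rack). The function name `place_replicas`, the topology dictionary, and the node and rack names are hypothetical and are not part of any Hadoop API.

```python
import random


def place_replicas(topology, writer_node, replication=3, rng=None):
    """Choose DataNodes for one block in a toy cluster.

    topology maps a rack name to the list of DataNode names on that rack.
    All names are made up for illustration.
    """
    rng = rng or random.Random(0)
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}

    # Replica 1: the writer's own node if it is a DataNode, else a random node.
    first = writer_node if writer_node in rack_of else rng.choice(sorted(rack_of))
    chosen = [first]

    # Replica 2: a node on a different rack, so one rack failure cannot
    # remove every copy of the block.
    remote_racks = [r for r in sorted(topology)
                    if r != rack_of[first] and topology[r]]
    if replication > 1 and remote_racks:
        remote_rack = rng.choice(remote_racks)
        second = rng.choice(topology[remote_rack])
        chosen.append(second)

        # Replica 3: a different node on the same remote rack, which keeps
        # cross-rack traffic lower than using a third rack.
        if replication > 2:
            peers = [n for n in topology[remote_rack] if n != second]
            if peers:
                chosen.append(rng.choice(peers))

    # Fill any replicas still missing from nodes not yet used.
    remaining = [n for n in sorted(rack_of) if n not in chosen]
    rng.shuffle(remaining)
    while len(chosen) < replication and remaining:
        chosen.append(remaining.pop())

    return chosen[:replication]


if __name__ == "__main__":
    cluster = {"rack1": ["dn1", "dn2"], "rack2": ["dn3", "dn4"]}
    print(place_replicas(cluster, writer_node="dn1"))
```

With two racks, the three replicas always span both racks, so the block survives the failure of either a single node or a single rack.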
Data Write Process:
- File Creation:
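- The client asks the NameNode to create the file, and the NameNode records it in the namespace.
- As the client writes, the data is split into blocks; for each block the NameNode allocates a block ID and selects the DataNodes that will hold its replicas.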
- Block Writing:
- The client streams each block to the first of the assigned DataNodes.
- That DataNode forwards the data to the next DataNode in the pipeline, and so on, until every replica is written, ideally across different racks.
- Acknowledgment:
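- Each DataNode acknowledges the data back along the pipeline, so the client learns when all replicas have been written.
- After the last block is acknowledged, the client notifies the NameNode that the file is complete.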
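The write steps above can be traced with a toy in-memory model. This is only a sketch under simplifying assumptions: real HDFS streams data in small packets over the network and the NameNode learns block locations from DataNode reports, whereas here whole blocks are passed by method call and the pipeline is chosen round-robin. The class and function names (`NameNode`, `DataNode`, `write_file`) are illustrative and do not correspond to the Hadoop client API.

```python
class DataNode:
    """Stores blocks and forwards them to the next node in the pipeline."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block id -> bytes

    def write(self, block_id, data, downstream):
        # Store locally, then pass the block on; the return value stands in
        # for the acknowledgment travelling back up the pipeline.
        self.blocks[block_id] = data
        if downstream:
            return downstream[0].write(block_id, data, downstream[1:])
        return True


class NameNode:
    """Keeps only metadata: file -> blocks and block -> DataNode names."""

    def __init__(self, datanodes, replication=3):
        self.datanodes = datanodes
        self.replication = replication
        self.files = {}       # path -> list of block ids
        self.locations = {}   # block id -> list of DataNode names
        self.closed = set()   # paths whose write has completed
        self._next_id = 0

    def create(self, path):
        if path in self.files:
            raise FileExistsError(path)
        self.files[path] = []

    def add_block(self, path):
        # Allocate a block id and pick a pipeline (round-robin, an assumption
        # made here for simplicity; HDFS itself uses rack-aware placement).
        block_id = self._next_id
        self._next_id += 1
        n = len(self.datanodes)
        count = min(self.replication, n)
        pipeline = [self.datanodes[(block_id + i) % n] for i in range(count)]
        self.files[path].append(block_id)
        self.locations[block_id] = [dn.name for dn in pipeline]
        return block_id, pipeline

    def complete(self, path):
        self.closed.add(path)


def write_file(namenode, path, data, block_size):
    """Client side: create the file, write each block, then close it."""
    namenode.create(path)                                   # file creation
    for offset in range(0, len(data), block_size):
        chunk = data[offset:offset + block_size]
        block_id, pipeline = namenode.add_block(path)       # block allocation
        acked = pipeline[0].write(block_id, chunk, pipeline[1:])
        if not acked:                                       # acknowledgment
            raise IOError(f"block {block_id} was not acknowledged")
    namenode.complete(path)                                 # completion


if __name__ == "__main__":
    nodes = [DataNode(f"dn{i}") for i in range(1, 5)]
    nn = NameNode(nodes)
    write_file(nn, "/logs/a.txt", b"0123456789", block_size=4)
    print(nn.files, nn.locations)
```

Note that the NameNode object never holds file contents; only the DataNode objects do, which mirrors the separation of metadata and data described above.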