The data flow of running a MapReduce job across TaskTrackers using the Hadoop library

Hadoop is an open-source framework that supports the MapReduce programming model for processing large-scale data across a distributed cluster.
The job execution involves three major components:

  • User Node
  • JobTracker (Master Node)
  • TaskTrackers (Worker Nodes)

Step-by-Step Explanation of the Data Flow:

1. Job Submission (User Node → JobTracker):

  • The user calls runJob(conf) from the user node.
  • The user node:
    • Requests a new Job ID from the JobTracker.
    • Computes input file splits based on the HDFS input data.
    • Uploads the following to the JobTracker’s file system:
      • The job’s JAR file.
      • The configuration (conf) file.
      • The computed input splits.
  • Then, the job is submitted to the JobTracker using submitJob(); a minimal driver sketch follows this list.
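
A minimal sketch of a driver that triggers this submission path, using the classic (MRv1) org.apache.hadoop.mapred API, is shown below. The class names WordCountDriver, WordCountMapper, and WordCountReducer are illustrative, not part of Hadoop itself, and the input/output paths are assumed to come from the command line:

  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapred.FileInputFormat;
  import org.apache.hadoop.mapred.FileOutputFormat;
  import org.apache.hadoop.mapred.JobClient;
  import org.apache.hadoop.mapred.JobConf;

  public class WordCountDriver {
      public static void main(String[] args) throws Exception {
          // The "conf" object the text refers to: job name, classes, I/O types.
          JobConf conf = new JobConf(WordCountDriver.class);
          conf.setJobName("word-count");

          conf.setOutputKeyClass(Text.class);
          conf.setOutputValueClass(IntWritable.class);

          // Mapper and Reducer classes (sketched under Step 3 below).
          conf.setMapperClass(WordCountMapper.class);
          conf.setReducerClass(WordCountReducer.class);

          // HDFS input and output paths, taken from the command line.
          FileInputFormat.setInputPaths(conf, new Path(args[0]));
          FileOutputFormat.setOutputPath(conf, new Path(args[1]));

          // runJob() drives the submission described above: it obtains a job ID,
          // computes the input splits, uploads the JAR/conf/splits, calls
          // submitJob(), and then polls the job until it finishes.
          JobClient.runJob(conf);
      }
  }

Note that JobClient.runJob(conf) blocks and prints progress until the job completes, while submitJob() is the non-blocking call it uses internally.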

2. Task Assignment (JobTracker → TaskTrackers):

  • The JobTracker:
    • Creates Map Tasks for each input split.
    • Assigns each Map Task to a TaskTracker, preferably one on or near the node holding that split’s data (data-locality optimization).
    • Creates Reduce Tasks (the number is set by the user in the job configuration).
    • Assigns Reduce Tasks to any available TaskTrackers (no data locality is considered for reduce tasks); a short configuration sketch follows this list.
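
As a rough sketch (continuing the JobConf from the Step 1 example), the reduce-task count is set explicitly, whereas the map-task count follows from the number of input splits; the values below are arbitrary example figures:

  // User-chosen number of reduce tasks.
  conf.setNumReduceTasks(4);

  // The number of map tasks is not set directly; it follows from the input
  // splits computed at submission time. A split-size hint can influence it,
  // e.g. raising the minimum split size (classic MRv1 property name):
  conf.setLong("mapred.min.split.size", 256L * 1024 * 1024);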

3. Task Execution (At TaskTrackers):

  • Each TaskTracker:
    • Copies the job JAR file from the JobTracker’s file system.
    • Launches a separate child JVM (Java Virtual Machine) for the task.
    • Executes the assigned Map or Reduce task inside that JVM; a sketch of such task code follows this list.
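
Below is a minimal sketch of what the task code might look like with the classic (MRv1) API. WordCountMapper and WordCountReducer are the illustrative class names referenced in the Step 1 driver, not classes provided by Hadoop:

  import java.io.IOException;
  import java.util.Iterator;
  import java.util.StringTokenizer;

  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapred.MapReduceBase;
  import org.apache.hadoop.mapred.Mapper;
  import org.apache.hadoop.mapred.OutputCollector;
  import org.apache.hadoop.mapred.Reducer;
  import org.apache.hadoop.mapred.Reporter;

  // Map task body: runs inside the child JVM launched by a TaskTracker.
  public class WordCountMapper extends MapReduceBase
          implements Mapper<LongWritable, Text, Text, IntWritable> {
      private static final IntWritable ONE = new IntWritable(1);
      private final Text word = new Text();

      public void map(LongWritable key, Text value,
                      OutputCollector<Text, IntWritable> output,
                      Reporter reporter) throws IOException {
          StringTokenizer tokens = new StringTokenizer(value.toString());
          while (tokens.hasMoreTokens()) {
              word.set(tokens.nextToken());
              output.collect(word, ONE);   // emit (word, 1) for each token
          }
      }
  }

  // Reduce task body: sums the counts emitted for each word.
  class WordCountReducer extends MapReduceBase
          implements Reducer<Text, IntWritable, Text, IntWritable> {
      public void reduce(Text key, Iterator<IntWritable> values,
                         OutputCollector<Text, IntWritable> output,
                         Reporter reporter) throws IOException {
          int sum = 0;
          while (values.hasNext()) {
              sum += values.next().get();
          }
          output.collect(key, new IntWritable(sum));
      }
  }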

4. Task Running Check (Heartbeat Monitoring):

  • TaskTrackers send periodic heartbeat signals to the JobTracker. Each heartbeat:
    • Confirms that the TaskTracker is alive.
    • Updates the JobTracker on the status of its tasks (finished, running, or failed).
    • Indicates whether the TaskTracker is ready to accept a new task; a purely illustrative sketch follows this list.
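
The sketch below only mirrors the heartbeat contract described above; the classes and method names are hypothetical and are not Hadoop’s real internal RPC interfaces:

  import java.util.Arrays;
  import java.util.Collections;
  import java.util.List;

  public class HeartbeatSketch {

      // Hypothetical status record, not a Hadoop class.
      static class TaskStatus {
          final String taskId;
          final String state;   // "running", "finished", or "failed"
          TaskStatus(String taskId, String state) { this.taskId = taskId; this.state = state; }
          public String toString() { return taskId + "=" + state; }
      }

      // Stand-in for the JobTracker side: note that the tracker is alive,
      // record its task statuses, and hand back new work if it has capacity.
      static List<String> jobTrackerHeartbeat(String trackerId,
                                              List<TaskStatus> tasks,
                                              boolean readyForNewTask) {
          System.out.println(trackerId + " is alive, tasks: " + tasks);
          return readyForNewTask ? Arrays.asList("map_0042")
                                 : Collections.<String>emptyList();
      }

      // Stand-in for the TaskTracker side: report status periodically.
      public static void main(String[] args) throws InterruptedException {
          for (int beat = 0; beat < 3; beat++) {            // a real tracker loops until shutdown
              List<TaskStatus> local = Arrays.asList(new TaskStatus("map_0001", "running"));
              List<String> newlyAssigned = jobTrackerHeartbeat("tracker-1", local, true);
              System.out.println("newly assigned: " + newlyAssigned);
              Thread.sleep(3000);                           // heartbeat interval (a few seconds)
          }
      }
  }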
