Yet Another Resource Negotiator (YARN)
In Hadoop 1.0, resource management was handled by the MapReduce layer itself. A single master, the JobTracker, kept track of all jobs, which made it a bottleneck for both processing and scheduling. This limited the scalability and the efficiency of the Hadoop architecture. To solve this, YARN was introduced in Hadoop 2.0 as a separate layer responsible for resource management and the scheduling of job tasks. Separating these duties from MapReduce improved scalability, fault tolerance, and speed.
YARN has four main components:
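-- Container: A container is a bundle of physical resources, such as CPU, RAM, and disk, on a single node that is made available to run a task. A Container Launch Context (CLC) describes how to launch the container, including the commands to run, the environment, and the files the task depends on. The amount of memory and CPU is specified in the resource request for the container.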
-- Application Master: It is responsible for tracking the progress of a single application, negotiating resources from the Resource Manager, and creating the CLCs used to start and stop containers. It also sends heartbeat signals to the Resource Manager to report the status of the application.
-- Node Manager: It manages the containers on its node. It sends heartbeat signals to the Resource Manager reporting the health of the node and whether each of its containers is running or not. The Node Manager launches containers when asked to by the Application Master, and it monitors the work performed in them.
-- Resource Manager: It is responsible for starting and stopping applications at the request of clients. It manages the allocation of resources across the cluster and communicates with the Node Managers. In case of failure, it can allocate new resources and restart jobs as required. It aims for efficient use of resources and timely status updates to the client. It has two main components: the Scheduler and the Applications Manager.
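The interaction between these components can be illustrated with a small toy model. This is only a sketch in plain Python and does not use the real YARN API: the class names mirror the components above, but every method, field, and value (allocate, launch, heartbeat, run_tasks, the node names, the first-fit placement rule) is a hypothetical simplification chosen for illustration. In particular, a plain command string stands in for the Container Launch Context.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Container:
    """A slice of one node's resources (hypothetical, simplified)."""
    container_id: int
    node: str
    memory_mb: int
    vcores: int


@dataclass
class NodeManager:
    """Tracks free resources and running containers on a single node."""
    name: str
    free_memory_mb: int
    free_vcores: int
    running: Dict[int, str] = field(default_factory=dict)

    def launch(self, container: Container, command: str) -> None:
        # The command string is a stand-in for a Container Launch Context.
        self.running[container.container_id] = command

    def heartbeat(self) -> dict:
        # Status report that a real Node Manager would send to the Resource Manager.
        return {"node": self.name,
                "free_memory_mb": self.free_memory_mb,
                "running": sorted(self.running)}


class ResourceManager:
    """Hands out containers using a naive first-fit rule (an assumption, not YARN's scheduler)."""

    def __init__(self, nodes: List[NodeManager]) -> None:
        self.nodes = {n.name: n for n in nodes}
        self._next_id = 1

    def allocate(self, memory_mb: int, vcores: int) -> Optional[Container]:
        for node in self.nodes.values():
            if node.free_memory_mb >= memory_mb and node.free_vcores >= vcores:
                # Reserve the resources on the first node that can fit the request.
                node.free_memory_mb -= memory_mb
                node.free_vcores -= vcores
                container = Container(self._next_id, node.name, memory_mb, vcores)
                self._next_id += 1
                return container
        return None  # no node has enough free resources


class ApplicationMaster:
    """Negotiates containers for one application and asks nodes to launch them."""

    def __init__(self, rm: ResourceManager) -> None:
        self.rm = rm
        self.containers: List[Container] = []

    def run_tasks(self, commands: List[str], memory_mb: int, vcores: int) -> List[str]:
        pending = []
        for command in commands:
            container = self.rm.allocate(memory_mb, vcores)
            if container is None:
                pending.append(command)  # would be retried later in a real system
                continue
            self.rm.nodes[container.node].launch(container, command)
            self.containers.append(container)
        return pending


# Two nodes, four tasks that each need 2048 MB and 1 vcore.
nodes = [NodeManager("node-1", 4096, 2), NodeManager("node-2", 2048, 2)]
rm = ResourceManager(nodes)
am = ApplicationMaster(rm)
pending = am.run_tasks(["task-a", "task-b", "task-c", "task-d"], memory_mb=2048, vcores=1)

print("pending:", pending)
for node in nodes:
    print(node.heartbeat())
```

In this run the cluster has room for three of the four tasks, so the last one stays pending. The model leaves out much of what YARN actually does, such as queues, container release, and failure recovery.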