Mastering Hadoop 3

Introduction to YARN job scheduling

In the previous sections, we talked about the YARN architecture and its components. The Resource Manager has two major components: the application manager and the scheduler. The scheduler is responsible for allocating the required resources to an application based on the configured scheduling policy. Before YARN, Hadoop divided each node's available memory into a fixed number of map slots and reduce slots, so a reduce task could not run in a slot allocated for map tasks, and vice versa. YARN does not define map and reduce slots up front. Instead, it launches containers for tasks on request, which means that any free container can be used for either a map or a reduce task. As previously discussed in this chapter, the scheduler does not monitor or track the status of any application. It receives requests from the per-application application masters, along with the details of their resource requirements, and performs its scheduling function.
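To make the difference between fixed slots and generic containers concrete, the following is a minimal sketch rather than actual Hadoop code. The class and method names are hypothetical, and it assumes for simplicity that every task needs exactly one slot or one equally sized container, whereas real YARN containers are sized by each request.

```java
// Hypothetical illustration only: not part of Hadoop or YARN.
public class SlotVsContainerDemo {

    // Pre-YARN model: map tasks may only use map slots,
    // and reduce tasks may only use reduce slots.
    static int runnableWithSlots(int mapSlots, int reduceSlots,
                                 int mapTasks, int reduceTasks) {
        return Math.min(mapSlots, mapTasks) + Math.min(reduceSlots, reduceTasks);
    }

    // YARN model (simplified): a free container can host either kind of task.
    static int runnableWithContainers(int containers,
                                      int mapTasks, int reduceTasks) {
        return Math.min(containers, mapTasks + reduceTasks);
    }

    public static void main(String[] args) {
        int mapTasks = 6;
        int reduceTasks = 0;

        // Eight units of capacity split into 4 map slots and 4 reduce slots:
        // the 4 reduce slots sit idle while 2 map tasks wait.
        System.out.println("Slot model runs:      "
                + runnableWithSlots(4, 4, mapTasks, reduceTasks));

        // The same capacity as 8 generic containers: all 6 map tasks run.
        System.out.println("Container model runs: "
                + runnableWithContainers(8, mapTasks, reduceTasks));
    }
}
```

In this map-heavy example, the slot model runs only four of the six map tasks, while the container model runs all six on the same total capacity.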
Hadoop provides us with the opportunity to run many applications at a time, so it is important to utilize the cluster's memory effectively. Selecting the correct scheduling strategy is not easy, and YARN provides a configurable scheduling policy that allows us to choose the right strategy based on an application's needs. Three schedulers are available in YARN by default:

  • FIFO scheduler 
  • Capacity scheduler
  • Fair scheduler
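
The scheduler is typically selected through the yarn.resourcemanager.scheduler.class property in yarn-site.xml. The following is a small sketch that maps each scheduler in the list above to its implementation class in Apache Hadoop and prints the corresponding property entry. The helper class SchedulerClassLookup and its short keys (fifo, capacity, fair) are hypothetical names introduced for illustration, and it is worth checking the class names against the documentation of the Hadoop version in use.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; only the property name and the
// scheduler class names come from Apache Hadoop.
public class SchedulerClassLookup {

    static final String PROPERTY = "yarn.resourcemanager.scheduler.class";

    static final Map<String, String> SCHEDULERS = new LinkedHashMap<>();
    static {
        String base = "org.apache.hadoop.yarn.server.resourcemanager.scheduler.";
        SCHEDULERS.put("fifo", base + "fifo.FifoScheduler");
        SCHEDULERS.put("capacity", base + "capacity.CapacityScheduler");
        SCHEDULERS.put("fair", base + "fair.FairScheduler");
    }

    // Builds the <property> element that would go into yarn-site.xml.
    static String toYarnSiteProperty(String shortName) {
        String className = SCHEDULERS.get(shortName);
        if (className == null) {
            throw new IllegalArgumentException("Unknown scheduler: " + shortName);
        }
        return "<property>\n"
                + "  <name>" + PROPERTY + "</name>\n"
                + "  <value>" + className + "</value>\n"
                + "</property>";
    }

    public static void main(String[] args) {
        // Print the entry that would select the Fair scheduler.
        System.out.println(toYarnSiteProperty("fair"));
    }
}
```

A change to this property generally takes effect only after the Resource Manager is restarted.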

We will study each of them in the following sections.