
Opportunistic containers in Hadoop 3.x
Containers are allocated to a node by the scheduler only when there are sufficient unallocated resources on that node. YARN guarantees that, once the application master dispatches a container to a node, its execution will start immediately. A container will run to completion as long as there is no violation of fairness or capacity; that is, unless other containers cause resources on the node to be preempted, the container is guaranteed to finish.
The current container execution design allows efficient task execution, but it has two primary limitations, which are as follows:
- Heartbeat delay: The Node Manager sends heartbeats to its Resource Manager at regular intervals, and each heartbeat request also carries the resource metrics of the Node Manager. If a container running on a Node Manager finishes its execution, that information is only sent as part of the next heartbeat; only then does the Resource Manager learn that resources are available on that Node Manager to launch a new container. The Resource Manager then schedules a new container on that node and notifies the application master that requested the resources, and the application master in turn launches the container on the node. The delay between these steps can be long, and until they complete, the resources sit idle.
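The heartbeat interval itself is configurable: a shorter interval reduces this delay at the cost of more load on the Resource Manager. A minimal `yarn-site.xml` sketch of the relevant property (the value shown is the usual default of one second):

```xml
<!-- yarn-site.xml: how often each Node Manager sends a heartbeat
     to the Resource Manager, in milliseconds -->
<property>
  <name>yarn.resourcemanager.nodemanagers.heartbeat-interval-ms</name>
  <value>1000</value>
</property>
```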
- Resource allocation and utilization: The resources allocated to a container by the Resource Manager can be significantly higher than what the container actually utilizes. For example, a container allocated 6 GB may be using only 3 GB, although that does not mean it will never need more than 3 GB. Ideally, only the memory a container actually utilizes would count against the node, with more granted whenever the container requires it in the future.
To address the preceding limitations, YARN has introduced a new type of container called an opportunistic container. Opportunistic containers can be sent to a Node Manager even if there are not sufficient resources available on the Node Manager to process the request. The opportunistic container is queued until resources become available for its execution. Opportunistic containers have lower priority than guaranteed containers, and because of this an opportunistic container can be killed when resources are requested for guaranteed containers. Applications can be configured to use both opportunistic and guaranteed containers to execute their tasks.
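Opportunistic container allocation is disabled by default. A minimal `yarn-site.xml` sketch to enable it and let each Node Manager queue a bounded number of opportunistic containers (the queue length of 10 is an illustrative value, not a recommendation):

```xml
<!-- yarn-site.xml: enable opportunistic container allocation -->
<property>
  <name>yarn.resourcemanager.opportunistic-container-allocation.enabled</name>
  <value>true</value>
</property>
<!-- maximum number of opportunistic containers a Node Manager may queue;
     the default of 0 effectively disables queuing -->
<property>
  <name>yarn.nodemanager.opportunistic-containers-max-queue-length</name>
  <value>10</value>
</property>
```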
There are two ways in which opportunistic containers are allocated to an application, which are as follows:
- Centralized: The containers are allocated through the YARN Resource Manager. The application master requests containers from the Resource Manager. A request for a guaranteed container goes to the ApplicationMasterService and is handled by the scheduler, while a request for an opportunistic container is handled by the OpportunisticContainerAllocator, which schedules the container on a node.
- Distributed: The resources are allocated through a local scheduler that runs on each Node Manager. The current version of YARN has an AMRMProxyService at every node, which acts as a proxy between the Resource Manager and the application master. An application master does not interact with the Resource Manager directly; instead, it interacts with the AMRMProxyService on the same node where it is running. In the event of an application master failure, a new application master is launched and YARN allocates a new AMRMToken to it.
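Distributed allocation relies on the AMRMProxyService sitting between the application master and the Resource Manager, so both the proxy and distributed scheduling have to be switched on at the Node Managers. A sketch of the `yarn-site.xml` settings involved:

```xml
<!-- yarn-site.xml: route application master traffic through the
     per-node AMRMProxyService -->
<property>
  <name>yarn.nodemanager.amrmproxy.enabled</name>
  <value>true</value>
</property>
<!-- enable the local (distributed) scheduler at each Node Manager -->
<property>
  <name>yarn.nodemanager.distributed-scheduling.enabled</name>
  <value>true</value>
</property>
```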
During each heartbeat, the Node Manager reports to the Resource Manager the number of guaranteed containers currently running, the number of opportunistic containers running, and the number of opportunistic containers queued at the node. The Resource Manager collects this information from every node to determine the least busy nodes. Centralized allocation is the default.
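At the job level, MapReduce can be told what percentage of its map tasks may run in opportunistic containers. A hedged sketch of the property (set in `mapred-site.xml`, or per job with `-D`; the value 40 is illustrative):

```xml
<!-- mapred-site.xml: percentage of a job's map tasks that may be
     launched in opportunistic containers -->
<property>
  <name>mapreduce.job.num-opportunistic-maps-percent</name>
  <value>40</value>
</property>
```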