Mastering Hadoop 3

Docker containers in YARN

Docker has been widely used as a lightweight container for various applications. YARN is now widely used as a resource manager for diverse applications, and it launches its containers on Linux. YARN has added support for Docker containerization: a Docker image can be specified for a YARN container, and that Docker container carries the custom libraries needed to run the application.

The environment inside a Docker container is completely separate from that of the Node Manager. The user does not need to install the additional software or modules the application requires on the Node Manager host, and can instead focus on running and fine-tuning the application. Different versions of the same application can run in parallel, completely isolated from one another.
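As a minimal sketch, assuming the Hadoop 2.x DockerContainerExecutor convention in which the image name is passed through the task environment under the key `yarn.nodemanager.docker-container-executor.image-name`, the Python below builds the `-D` options that pin one MapReduce job's ApplicationMaster, map, and reduce containers to a single image. The image names `example/analytics:1.0` and `example/analytics:2.0` are hypothetical; submitting two jobs with different tags is how two versions of the same application could run side by side.

```python
# Sketch: build -D options that point a MapReduce job's containers at a
# Docker image under the Hadoop 2.x DockerContainerExecutor.
# Assumption: the image is passed in the task environment using this key.
IMAGE_ENV_VAR = "yarn.nodemanager.docker-container-executor.image-name"

# MapReduce properties whose values are comma-separated KEY=VALUE env entries.
ENV_PROPERTIES = (
    "yarn.app.mapreduce.am.env",  # ApplicationMaster container
    "mapreduce.map.env",          # map task containers
    "mapreduce.reduce.env",       # reduce task containers
)


def docker_image_options(image: str) -> list[str]:
    """Return one -D option per env property so every container of a job uses `image`."""
    # A comma would be read as a separator between env entries, so reject it.
    if not image or "," in image:
        raise ValueError("image must be a non-empty name without commas")
    return [f"-D{prop}={IMAGE_ENV_VAR}={image}" for prop in ENV_PROPERTIES]


if __name__ == "__main__":
    # Two hypothetical jobs pinned to different versions of the same image.
    for image in ("example/analytics:1.0", "example/analytics:2.0"):
        print(" ".join(docker_image_options(image)))
```

Each returned string is a single command-line argument, so it can be appended to a `hadoop jar ...` invocation; quote it if you paste it into a shell.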

The ContainerExecutor abstraction has four implementations. Each is responsible for providing the resources required to run the application, setting up the environment, and managing the life cycle of containers. They are as follows:

  • DefaultContainerExecutor 
  • LinuxContainerExecutor
  • WindowsSecureContainerExecutor
  • DockerContainerExecutor

The DockerContainerExecutor allows the Node Manager to launch YARN containers inside Docker containers. YARN has added support for Docker commands so that the Node Manager can launch, monitor, and clean up Docker containers, just as it does for any other YARN container. However, using the DockerContainerExecutor is not recommended: only one ContainerExecutor can be specified per Node Manager, so a Node Manager configured with it cannot launch any other job, such as Spark, Tez, or MapReduce, as an ordinary (non-Docker) YARN container. DockerContainerExecutor will be removed in a future Hadoop release.
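The one-executor-per-Node-Manager rule comes from the setting `yarn.nodemanager.container-executor.class`, which takes exactly one class name. The Python sketch below renders a minimal `yarn-site.xml` for one chosen executor. The package name follows the Hadoop source tree. The extra `yarn.nodemanager.docker-container-executor.exec-name` property and its `/usr/bin/docker` default are assumed here to be the Hadoop 2.x settings for DockerContainerExecutor, so verify both against the `yarn-default.xml` of the release you run.

```python
import xml.etree.ElementTree as ET

# Fully qualified names of the four ContainerExecutor implementations,
# all in the NodeManager package of YARN's server code.
PACKAGE = "org.apache.hadoop.yarn.server.nodemanager"
EXECUTORS = {
    name: f"{PACKAGE}.{name}"
    for name in (
        "DefaultContainerExecutor",
        "LinuxContainerExecutor",
        "WindowsSecureContainerExecutor",
        "DockerContainerExecutor",
    )
}


def yarn_site(executor: str, docker_exec: str = "/usr/bin/docker") -> str:
    """Render a minimal yarn-site.xml that selects exactly one executor."""
    if executor not in EXECUTORS:
        raise ValueError(f"unknown executor: {executor}")
    # A dict holds one value per key, mirroring the single-valued
    # yarn.nodemanager.container-executor.class setting.
    props = {"yarn.nodemanager.container-executor.class": EXECUTORS[executor]}
    if executor == "DockerContainerExecutor":
        # Assumed Hadoop 2.x property: path of the docker binary on the host.
        props["yarn.nodemanager.docker-container-executor.exec-name"] = docker_exec
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(yarn_site("DockerContainerExecutor"))
```

Because the Node Manager reads a single class name, switching a node to DockerContainerExecutor replaces whichever executor it used before; there is no way to list two.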