
Checkpoint using a secondary NameNode
Checkpointing is the process of merging the fsimage with the edit log by applying all the transactions recorded in the edit log to the fsimage. This process is necessary to keep the edit log from growing too large. Let's go into further detail about how the checkpoint process works in Hadoop.
In the previous section, we discussed the fsimage and the edit log file. The NameNode loads the fsimage into memory when it starts and then applies the edits from the edit log file to it. Once this process is complete, it writes a new fsimage file to disk, and the edit log is left empty. This merge happens only during NameNode startup; the NameNode does not perform it while it is running and busy serving requests. If the NameNode stays up for a long period of time, the edit log file can therefore grow very large, so we need a service that periodically merges the edit log into the fsimage file. The Secondary NameNode does this job of merging the fsimage and the edit log file.
The checkpoint operation is controlled by two configuration parameters: dfs.namenode.checkpoint.period sets the interval between checkpoints, and dfs.namenode.checkpoint.txns sets the maximum number of uncheckpointed transactions allowed in the edit log. If the transaction limit is reached, a checkpoint is started immediately, even if the interval has not yet elapsed. The Secondary NameNode also keeps a copy of the latest fsimage file so that it can be used to restore the NameNode's metadata if required.
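As a concrete sketch, both parameters are set in hdfs-site.xml on the NameNode and the Secondary NameNode hosts. The values shown here match the usual Hadoop defaults of one hour and one million transactions; tune them for your cluster:

<!-- hdfs-site.xml: checkpoint tuning -->
<property>
  <name>dfs.namenode.checkpoint.period</name>
  <!-- Run a checkpoint at least once per hour (value in seconds) -->
  <value>3600</value>
</property>
<property>
  <name>dfs.namenode.checkpoint.txns</name>
  <!-- ...or sooner, once one million uncheckpointed transactions accumulate -->
  <value>1000000</value>
</property>

For testing the configuration, a checkpoint can also be triggered by hand on the Secondary NameNode host with the hdfs secondarynamenode -checkpoint force command.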