
Points to remember

We have covered HDFS in detail. The following are a few points to remember:

HDFS consists of two main components: NameNode and DataNode. NameNode is a master node that stores metadata information, whereas DataNodes are slave nodes that store file blocks.
The Secondary NameNode is responsible for performing checkpoint operations, in which edit log changes are applied to the fsimage. For this reason, it is also referred to as a checkpoint node.
Files in HDFS are split into blocks, and each block is replicated across a number of DataNodes to ensure fault tolerance. Both the replication factor and the block size are configurable.
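
The block and replication arithmetic can be made concrete with a small calculation. The following is a minimal sketch, not HDFS code: the function name `block_layout` is hypothetical, and the defaults assume the stock Hadoop settings of a 128 MB block size and a replication factor of 3. It also assumes that the last block of a file only occupies as much disk space as the data it actually holds.

```python
MIB = 1024 * 1024


def block_layout(file_size_bytes, block_size_bytes=128 * MIB, replication=3):
    """Estimate how a file is laid out in HDFS (illustrative helper only).

    Returns a tuple of:
      - number of logical blocks the file is split into
      - total number of block replicas stored across DataNodes
      - raw bytes consumed on disk across the cluster
    """
    if file_size_bytes < 0 or block_size_bytes <= 0 or replication < 1:
        raise ValueError("sizes must be non-negative and replication >= 1")

    # Ceiling division: a partially filled last block still counts as a block.
    num_blocks = (file_size_bytes + block_size_bytes - 1) // block_size_bytes

    # Every block is stored `replication` times on different DataNodes.
    total_replicas = num_blocks * replication

    # Assumption: a partial last block uses only its actual size on disk,
    # so raw usage is the file size multiplied by the replication factor.
    raw_bytes = file_size_bytes * replication

    return num_blocks, total_replicas, raw_bytes


if __name__ == "__main__":
    blocks, replicas, raw = block_layout(300 * MIB)
    print(blocks, replicas, raw // MIB)
```

Under these assumptions, a 300 MB file is split into three blocks (128 MB, 128 MB, and 44 MB), stored as nine block replicas, and consumes 900 MB of raw cluster storage.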

HDFS Balancer is used to distribute data evenly across all DataNodes. It is good practice to run the balancer whenever a new DataNode is added, and to schedule a job that runs the balancer at regular intervals.
In Hadoop 3, a high availability setup can have more than two NameNodes running at a time. If the active NameNode fails, one of the standby NameNodes is elected and becomes the new active NameNode.
Quorum Journal Manager writes namespace modifications to multiple JournalNodes. These changes are then read by the Standby NameNodes, which apply them to their own fsimage.
Erasure coding is a new feature introduced in Hadoop 3 that brings storage overhead down to no more than 50% in typical setups. By comparison, the default replication factor of 3 costs us 200% extra space, because every block is stored three times. Erasure coding provides a comparable durability guarantee using far less disk storage.
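
The 200% and 50% figures in the last point follow from simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the erasure coding example assumes a Reed-Solomon layout with 6 data units and 3 parity units, which is the layout used by the default `RS-6-3-1024k` policy in Hadoop 3.

```python
def replication_overhead_percent(replication=3):
    """Extra storage, as a percentage of the original data size,
    when every block is stored `replication` times."""
    if replication < 1:
        raise ValueError("replication must be >= 1")
    # One copy is the data itself; every additional copy is overhead.
    return 100.0 * (replication - 1)


def erasure_coding_overhead_percent(data_units=6, parity_units=3):
    """Extra storage, as a percentage of the original data size,
    for a Reed-Solomon layout with the given data and parity units."""
    if data_units < 1 or parity_units < 0:
        raise ValueError("data_units must be >= 1 and parity_units >= 0")
    # Only the parity units are overhead; the data units hold the data itself.
    return 100.0 * parity_units / data_units


def tolerated_failures(replication=None, parity_units=None):
    """Number of simultaneous block or unit losses that can be survived."""
    if replication is not None:
        return replication - 1  # one surviving replica is enough
    return parity_units         # any `parity_units` units may be lost


if __name__ == "__main__":
    print(replication_overhead_percent(3))        # 3x replication
    print(erasure_coding_overhead_percent(6, 3))  # RS(6,3)
    print(tolerated_failures(replication=3))
    print(tolerated_failures(parity_units=3))
```

With these inputs, 3x replication yields 200% overhead and tolerates the loss of two replicas, while RS(6,3) yields 50% overhead and tolerates the loss of any three units.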