Monday, 17 October 2016


HDFS Architecture:

           HDFS has a master-slave architecture. The master node is called the name node, and the slave nodes are called data nodes. An HDFS cluster has one name node and multiple data nodes. The actual data is stored on the data nodes, while the name node keeps the information about the data stored on them.

           This section covers only the high-level details. A detailed explanation will follow in later posts.


HDFS Architecture


Master Node:

           HDFS has one name node, which keeps track of the state of all data nodes. The name node is responsible for the following tasks:
  1. Maintaining the status (running or down) of the data nodes.
  2. Storing the metadata of the files stored on the data nodes.
  3. Sending requests to data nodes to replicate files.
  4. Responding to client requests about file metadata and job status.
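
The bookkeeping in tasks 1 to 3 can be pictured with a small toy model. This is only a sketch and not Hadoop code: the class `ToyNameNode`, its method names, the 30-second timeout and the idea of tracking status through timestamped heartbeats are all assumptions made for illustration.

```python
import time


class ToyNameNode:
    """Toy in-memory model of the name node's bookkeeping (hypothetical, not Hadoop code)."""

    def __init__(self, heartbeat_timeout=30.0):
        self.heartbeat_timeout = heartbeat_timeout  # assumed value, in seconds
        self.last_heartbeat = {}   # data node id -> time of its last heartbeat
        self.block_locations = {}  # (file name, part index) -> set of data node ids

    def heartbeat(self, node_id, now=None):
        """Record that a data node reported in."""
        self.last_heartbeat[node_id] = time.time() if now is None else now

    def is_alive(self, node_id, now=None):
        """A node counts as running if it reported in within the timeout."""
        now = time.time() if now is None else now
        seen = self.last_heartbeat.get(node_id)
        return seen is not None and now - seen <= self.heartbeat_timeout

    def add_block(self, file_name, index, node_ids):
        """Store metadata: which data nodes hold this part of the file."""
        self.block_locations[(file_name, index)] = set(node_ids)

    def under_replicated(self, replication=3, now=None):
        """Return the parts that have fewer live copies than the target."""
        result = []
        for key, nodes in self.block_locations.items():
            live = {n for n in nodes if self.is_alive(n, now)}
            if len(live) < replication:
                result.append(key)
        return result
```

Anything returned by `under_replicated` is what this toy name node would ask the remaining data nodes to copy again.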
Data node:

           Each data node stores data on the disks of the machine it runs on. It is responsible for reading and writing data on those disks.

Client: 

          The client connects to both the name node and the data nodes. The client submits programs to the name node, which in turn passes them on to the data nodes.

      To read or write a file, the client first sends a request to the name node. The name node checks its metadata (which data node is responsible for the file) and returns those details to the client. The client then reads the data from, or writes it to, the data nodes directly.
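
The read path described above can be sketched in a few lines. The classes `SketchNameNode` and `SketchDataNode` and the function `read_file` are hypothetical names for illustration only; real clients go through the Hadoop client libraries, which this sketch does not use.

```python
class SketchNameNode:
    """Holds only metadata: which data node stores which part of a file."""

    def __init__(self):
        self.metadata = {}  # file name -> ordered list of (block id, data node id)

    def locate(self, file_name):
        return self.metadata[file_name]


class SketchDataNode:
    """Holds the actual bytes of the blocks assigned to it."""

    def __init__(self):
        self.blocks = {}  # block id -> bytes

    def write_block(self, block_id, data):
        self.blocks[block_id] = data

    def read_block(self, block_id):
        return self.blocks[block_id]


def read_file(file_name, name_node, data_nodes):
    """Step 1: ask the name node where the parts are.
    Step 2: fetch each part directly from the data node that holds it."""
    parts = []
    for block_id, node_id in name_node.locate(file_name):
        parts.append(data_nodes[node_id].read_block(block_id))
    return b"".join(parts)
```

Note that the name node never touches the file contents here; it only answers the lookup.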

File parts:

     Files are divided into parts, which are stored on different disks. The name node knows how many parts each file is divided into and where those parts are stored.
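
As a rough illustration of splitting a file into parts and recording where each part goes, here is a minimal sketch. The function names are hypothetical, and the round-robin placement is an assumption chosen for simplicity; it is not how HDFS actually picks data nodes.

```python
def split_into_parts(data, part_size):
    """Cut the file contents into fixed-size parts; the last part may be shorter."""
    return [data[i:i + part_size] for i in range(0, len(data), part_size)]


def place_parts(parts, node_ids):
    """Assign part i to a data node in round-robin order (assumed policy).

    The returned mapping is the kind of information the name node keeps:
    part index -> data node that stores it."""
    return {index: node_ids[index % len(node_ids)] for index in range(len(parts))}
```

For example, splitting 10 bytes with a part size of 4 gives three parts of 4, 4 and 2 bytes, spread over the available data nodes.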

Blocks:

         Like any other file system, HDFS stores data in blocks, but its block size is much larger than that of other file systems. The default block size is 64 MB in Apache Hadoop 1.x (128 MB from 2.x onward) and 128 MB in Cloudera's Hadoop distribution.

          When a file is smaller than the block size, it still gets a block of its own, but that block takes up only as much disk space as the data it holds, so no disk space is wasted. The real cost is on the name node, which stores metadata for every block. With a smaller block size, the same data is spread over more blocks, and the name node has to store a much larger amount of metadata.
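
To see how the block size affects the number of blocks the name node has to track, a quick calculation helps. The helper `block_count` is a hypothetical name, and the 1 GB file is just an example value.

```python
MB = 1024 * 1024
GB = 1024 * MB


def block_count(file_size, block_size):
    """Number of blocks needed for a file (ceiling division on byte counts)."""
    return -(-file_size // block_size)


one_gb_at_64 = block_count(1 * GB, 64 * MB)    # 16 blocks
one_gb_at_128 = block_count(1 * GB, 128 * MB)  # 8 blocks
one_gb_at_1 = block_count(1 * GB, 1 * MB)      # 1024 blocks
```

The same 1 GB file needs 1024 metadata entries with 1 MB blocks but only 8 with 128 MB blocks, which is why a large block size keeps the name node's metadata small.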

