HDFS

- Suitable for:

  •  very large files (terabytes, petabytes)
  •  write-once, read-many access patterns
  •  handling node failure without noticeable interruption

- Not suitable for applications that need:

  •  low-latency data access (HBase is a better fit)
  •  lots of small files (each file's metadata is held in namenode memory)
  •  multiple writers or arbitrary in-place modification; HDFS allows only a single writer, which can only append to the end of a file


- Block

  •  files are split into large blocks (128 MB by default in Hadoop 2)
  •  replication is done per block (default factor 3); block size and replication factor are per-file attributes fixed at create time, as in the sketch below
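
A minimal sketch of overriding both per file, assuming a reachable cluster whose settings are on the classpath; the path and values are illustrative:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockDemo {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Block size and replication factor are per-file attributes,
        // fixed when the file is created.
        FSDataOutputStream out = fs.create(
                new Path("/demo/big.dat"), // hypothetical path
                true,                      // overwrite if it exists
                4096,                      // io buffer size in bytes
                (short) 2,                 // replication factor
                256L * 1024 * 1024);       // block size: 256 MB
        out.writeBytes("payload\n");
        out.close();
    }
}
```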

- Namenode and datanode

  • namenode - keeps the filesystem namespace image and the edit log; these are critical, must not be lost, and are therefore written synchronously to multiple filesystems, typically a local disk plus an NFS mount (config sketch below)
  • namenode - also holds the block pool: the block-to-datanode mapping, which is not persisted but rebuilt from block reports sent by datanodes at startup
  • secondary namenode - periodically merges the edit log into the namespace image. If the primary namenode dies, copy the namespace image and edit log from NFS to the secondary and run it as the new primary. It then has to:
  1. load the image into memory
  2. replay the edit log
  3. receive enough block reports from datanodes to leave safe mode
  • blocks can be cached in a datanode's memory on a per-file basis, directed by cache directives
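
A minimal hdfs-site.xml sketch of those redundant metadata directories; the mount points are made up:

```xml
<!-- dfs.namenode.name.dir takes a comma-separated list; the namenode
     writes its metadata to every directory synchronously.
     dfs.namenode.edits.dir (the edit log location) defaults to the
     same list. -->
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/disk1/hdfs/name,/remote/nfs/hdfs/name</value>
</property>
```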

- HDFS Federation

  • partitions the filesystem namespace across namenodes, e.g. one namenode manages /user and another manages /share
  • datanode storage is not partitioned: each datanode registers with every namenode and stores blocks from multiple block pools, sending block reports to each namenode it registered with
  • clients use a client-side mount table (viewfs) to map file paths to namenodes, as sketched below
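
A mount-table sketch in core-site.xml, assuming two federated namenodes; the cluster name and hosts are made up:

```xml
<property>
  <name>fs.defaultFS</name>
  <value>viewfs://ClusterX</value>
</property>
<!-- Each link maps a path prefix in the client's view to a namenode. -->
<property>
  <name>fs.viewfs.mounttable.ClusterX.link./user</name>
  <value>hdfs://nn1.example.com:8020/user</value>
</property>
<property>
  <name>fs.viewfs.mounttable.ClusterX.link./share</name>
  <value>hdfs://nn2.example.com:8020/share</value>
</property>
```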

- HA (High Availability)

  • the active and standby namenodes share highly available storage for the edit log (an NFS filer or a quorum journal manager); each keeps its own copy of the namespace image
  • datanodes send block reports to both namenodes, because the block mapping lives in memory, not on disk
  • the standby namenode takes the checkpoints, merging the edit log into the namespace image (it replaces the secondary namenode)
  • ZooKeeper elects the active namenode; a failover controller manages failover, and fencing stops the previously active namenode from corrupting the shared state
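
A sketch of the key hdfs-site.xml properties, assuming a quorum journal manager; the nameservice id and hosts are made up:

```xml
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<!-- Both namenodes write and read edits through the journal quorum. -->
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://jn1:8485;jn2:8485;jn3:8485/mycluster</value>
</property>
<!-- ZooKeeper-based election via the ZKFC failover controller. -->
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
```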

- Java API

  • key classes: FileSystem, Path, FSDataInputStream, FSDataOutputStream, FileStatus
  • fis = fs.open(path), fos = fs.create(path), fos = fs.append(path) - see the sketch below
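
A minimal end-to-end sketch of those calls, assuming the cluster address comes from the classpath configuration; the path is made up:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsApiDemo {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path p = new Path("/demo/notes.txt"); // hypothetical path

        // fos = fs.create: open a new file for writing (single writer).
        FSDataOutputStream fos = fs.create(p);
        fos.writeBytes("line 1\n");
        fos.close();

        // fos = fs.append: reopen the file and append to its end,
        // the only modification HDFS allows.
        fos = fs.append(p);
        fos.writeBytes("line 2\n");
        fos.close();

        // fis = fs.open: read the file back.
        FSDataInputStream fis = fs.open(p);
        IOUtils.copyBytes(fis, System.out, 4096, true); // closes fis

        // FileStatus: per-file metadata served by the namenode.
        FileStatus st = fs.getFileStatus(p);
        System.out.println(st.getLen() + " bytes, replication "
                + st.getReplication());
    }
}
```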

- Anatomy of file read

  • the client reads block by block, asking the namenode for the locations of the next batch of blocks as needed (the seek sketch below makes this visible)
  • the connection to a datanode is closed once its block has been read
  • the network topology must be configured for Hadoop; distance grows in the order same node -> same rack -> same data center -> different data centers, and each block is read from the closest datanode holding a replica
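
FSDataInputStream is also seekable; a sketch (imports as in HdfsApiDemo, path and offset illustrative):

```java
FileSystem fs = FileSystem.get(new Configuration());
FSDataInputStream in = fs.open(new Path("/demo/big.dat"));
// Jump past the first block (assuming a 128 MB block size); the client
// transparently connects to the best datanode for the new block.
in.seek(128L * 1024 * 1024);
byte[] buf = new byte[4096];
in.readFully(buf);
in.close();
```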

- Anatomy of file write

  • the DataStreamer asks the namenode to allocate each block and its replica locations: the first replica goes on the client's local node (or a random node when the client is outside the cluster), the second on a node in a different rack, the third on a different node in the same rack as the second
  • packets move from a data queue into the pipeline and onto an ack queue until every datanode has acknowledged them; the write succeeds once the minimum replica count (dfs.namenode.replication.min, default 1) is written, and the remaining replicas are completed asynchronously; progress can be observed as in the sketch below
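
Pipeline progress can be watched with the create overload that takes a Progressable; a sketch (imports as above plus org.apache.hadoop.util.Progressable, path made up):

```java
FileSystem fs = FileSystem.get(new Configuration());
FSDataOutputStream out = fs.create(new Path("/demo/progress.dat"),
        new Progressable() {
            @Override
            public void progress() {
                // Invoked by the client as packets are written to the
                // datanode pipeline.
                System.out.print(".");
            }
        });
for (int i = 0; i < 100_000; i++) {
    out.writeBytes("some payload line\n");
}
out.close();
```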

- Coherency

  • content being written is not guaranteed to be visible to new readers until the writer calls hflush() (flushes to datanode memory) or hsync() (also forces the data to disk); close() implicitly performs an hflush(); see the sketch below
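
A sketch (imports as above; path made up):

```java
FileSystem fs = FileSystem.get(new Configuration());
FSDataOutputStream out = fs.create(new Path("/demo/events.log"));
out.writeBytes("event 1\n");
out.hflush(); // visible to new readers, but may still sit in datanode memory
out.writeBytes("event 2\n");
out.hsync();  // additionally forces datanodes to sync to disk
out.close();  // implies a final hflush
```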
- Parallel copy with distcp, e.g. hadoop distcp hdfs://nn1/foo hdfs://nn2/bar. It runs as a map-only MapReduce job: each file is copied by a single map, and files are bucketed so that every map copies roughly the same amount of data. Using too few maps can unbalance the cluster, since the first replica of every block lands on the node running the map.