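Hadoop Source Code Analysis Notes (4): Introduction to the Hadoop File System

Introduction to the Hadoop File System

The Hadoop file system comprises the Hadoop abstract file system and a large number of concrete file systems built on that abstraction. Together they meet the varied data access needs of the applications built on Hadoop, and they represent a new stage in the evolution of file systems.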

File System Implementation

1. Block Management

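The physical structure of a file describes how the file is stored on a storage device such as a disk. To simplify management, a device usually organizes its storage space into units with a defined structure. A disk, for example, is logically divided into tracks, cylinders, and sectors. The sector is the unit of disk reads and writes and also the smallest addressable unit on the disk. A sector is typically 512 bytes; disks with 4096-byte sectors were introduced after 2009.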

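Block management records which storage blocks belong to which file. For random-access storage devices there are generally three ways to implement it: (1) contiguous allocation, (2) a linked list, and (3) an indexed linked table.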
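The three approaches can be compared with a small sketch. It is only an illustration under stated assumptions: the class and method names are hypothetical, block numbers are plain integers, and the "indexed linked table" is read here as a table of next-block pointers kept apart from the data blocks, which is one plausible interpretation rather than something the notes spell out.

```java
import java.util.ArrayList;
import java.util.List;

public class BlockAllocationSketch {

    /** A disk block that stores the number of the next block next to its data. */
    static final class Block {
        final byte[] data = new byte[512];
        int next = -1; // -1 marks the last block of a file
    }

    /** Contiguous allocation: a file is fully described by (first block, block count). */
    static List<Integer> contiguous(int firstBlock, int blockCount) {
        List<Integer> blocks = new ArrayList<>();
        for (int i = 0; i < blockCount; i++) {
            blocks.add(firstBlock + i);
        }
        return blocks;
    }

    /** Linked list: every block must be visited to learn where the next one is. */
    static List<Integer> linkedList(Block[] disk, int firstBlock) {
        List<Integer> blocks = new ArrayList<>();
        for (int b = firstBlock; b != -1; b = disk[b].next) {
            blocks.add(b);
        }
        return blocks;
    }

    /** Link table (assumed reading): the pointers live in one table, apart from the data. */
    static List<Integer> linkTable(int[] table, int firstBlock) {
        List<Integer> blocks = new ArrayList<>();
        for (int b = firstBlock; b != -1; b = table[b]) {
            blocks.add(b);
        }
        return blocks;
    }

    public static void main(String[] args) {
        System.out.println(contiguous(4, 3)); // [4, 5, 6]

        Block[] disk = new Block[10];
        for (int i = 0; i < disk.length; i++) {
            disk[i] = new Block();
        }
        disk[5].next = 2;
        disk[2].next = 7;
        System.out.println(linkedList(disk, 5)); // [5, 2, 7]

        int[] table = {-1, -1, 7, -1, -1, 2, -1, 9, -1, -1};
        System.out.println(linkTable(table, 5)); // [5, 2, 7, 9]
    }
}
```

Contiguous allocation makes lookups trivial but files are hard to grow. The two linked variants let a file occupy scattered blocks, and keeping the pointers in a separate table avoids touching the data blocks just to follow the chain.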

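2. Directory Management

A directory acts as a container for files and subdirectories. Its data consists of a set of structured records, each describing one file or subdirectory in the collection. Each record carries enough information for the file system to find the entry it describes.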
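As a rough illustration of such records, the sketch below models a directory as a map from names to entries and resolves a path one component at a time. The Entry class and its fields (length, firstBlock, children) are assumptions made for the example; real file systems store different and richer metadata.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DirectorySketch {

    /** One directory record; the fields are illustrative assumptions. */
    static final class Entry {
        final String name;
        final long length;                  // file size in bytes (0 for a directory)
        final int firstBlock;               // where the data starts (-1 for a directory)
        final Map<String, Entry> children;  // non-null only for directories

        Entry(String name, long length, int firstBlock, boolean directory) {
            this.name = name;
            this.length = length;
            this.firstBlock = firstBlock;
            this.children = directory ? new LinkedHashMap<String, Entry>() : null;
        }

        boolean isDirectory() {
            return children != null;
        }

        /** Adds a record to this directory and returns it for chaining. */
        Entry add(Entry child) {
            children.put(child.name, child);
            return child;
        }
    }

    /** Resolves an absolute path such as "/user/data.txt", or returns null if absent. */
    static Entry lookup(Entry root, String path) {
        Entry current = root;
        for (String part : path.split("/")) {
            if (part.isEmpty()) {
                continue; // skip the empty component before the leading slash
            }
            if (current == null || !current.isDirectory()) {
                return null;
            }
            current = current.children.get(part);
        }
        return current;
    }

    public static void main(String[] args) {
        Entry root = new Entry("", 0, -1, true);
        Entry user = root.add(new Entry("user", 0, -1, true));
        user.add(new Entry("data.txt", 1024, 42, false));

        Entry found = lookup(root, "/user/data.txt");
        System.out.println(found.name + " starts at block " + found.firstBlock);
        System.out.println(lookup(root, "/user/missing.txt")); // null
    }
}
```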

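Hadoop Abstract File System

In Hadoop, it is often necessary to write computation results to a stream or to read results back from one. Java's data streams, DataOutputStream and DataInputStream, provide methods for writing and reading all Java primitive types. Data streams are used widely in Hadoop's implementation, for example in the Writable serialization mechanism.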
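A minimal sketch of this round trip, using only the JDK, is shown below. The class name, the method names, and the three sample fields are made up for the example; the point is only that values must be read back in the same order and with the same types in which they were written.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class DataStreamSketch {

    /** Serializes three values in a fixed order: int, long, then a UTF string. */
    static byte[] write(int count, long total, String label) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(count);   // 4 bytes
            out.writeLong(total);  // 8 bytes
            out.writeUTF(label);   // 2-byte length prefix plus the encoded characters
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads the values back in exactly the order they were written. */
    static String read(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int count = in.readInt();
            long total = in.readLong();
            String label = in.readUTF();
            return count + " " + total + " " + label;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = write(3, 42L, "ok");
        System.out.println(data.length); // 16
        System.out.println(read(data));  // 3 42 ok
    }
}
```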

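To provide a uniform interface to different kinds of data access, Hadoop borrows the concept of the Linux virtual file system and introduces the Hadoop abstract file system. On top of this abstraction, Hadoop provides a large number of concrete file system implementations that meet the data access needs of the applications built on it. The abstract class is org.apache.hadoop.fs.FileSystem.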

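As with the Linux and Java file APIs, the methods of the Hadoop abstract file system fall into two groups: one handles operations on files and directories, and the other reads and writes file data.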
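The idea of one abstract type with two groups of methods and interchangeable implementations can be sketched without any Hadoop dependency. Everything in the block below is hypothetical: MiniFileSystem and InMemoryFs are toy stand-ins invented for this example and are far smaller than org.apache.hadoop.fs.FileSystem, whose real method signatures differ.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class AbstractFsSketch {

    /** Hypothetical, heavily reduced stand-in for an abstract file system. */
    abstract static class MiniFileSystem {
        // Group 1: file and directory management.
        abstract boolean exists(String path);
        abstract boolean delete(String path);

        // Group 2: reading and writing file data.
        abstract OutputStream create(String path);
        abstract InputStream open(String path);
    }

    /** One concrete implementation that keeps every file in a map. */
    static final class InMemoryFs extends MiniFileSystem {
        private final Map<String, byte[]> files = new HashMap<>();

        @Override boolean exists(String path) { return files.containsKey(path); }

        @Override boolean delete(String path) { return files.remove(path) != null; }

        @Override OutputStream create(final String path) {
            files.put(path, new byte[0]);
            return new ByteArrayOutputStream() {
                @Override public void close() {
                    files.put(path, toByteArray()); // publish the bytes on close
                }
            };
        }

        @Override InputStream open(String path) {
            byte[] data = files.get(path);
            if (data == null) {
                throw new IllegalArgumentException("No such file: " + path);
            }
            return new ByteArrayInputStream(data);
        }
    }

    /** Writes text and reads it back using only the abstract type. */
    static String roundTrip(MiniFileSystem fs, String path, String text) {
        try {
            try (OutputStream out = fs.create(path)) {
                out.write(text.getBytes(StandardCharsets.UTF_8));
            }
            ByteArrayOutputStream copy = new ByteArrayOutputStream();
            try (InputStream in = fs.open(path)) {
                byte[] buffer = new byte[256];
                for (int n = in.read(buffer); n != -1; n = in.read(buffer)) {
                    copy.write(buffer, 0, n);
                }
            }
            return new String(copy.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        MiniFileSystem fs = new InMemoryFs();
        System.out.println(roundTrip(fs, "/tmp/result.txt", "hello"));
        System.out.println(fs.exists("/tmp/result.txt"));
    }
}
```

Because roundTrip depends only on MiniFileSystem, a second implementation backed by local disk or a remote store could be passed in without changing the calling code, which is the benefit the abstraction is meant to deliver.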

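In the Hadoop abstract file system, file data is read through the stream FSDataInputStream and, correspondingly, written through the class FSDataOutputStream. These classes and the interfaces they build on are defined as follows: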

public interface Seekable {
  /**
   * Seek to the given offset from the start of the file.
   * The next read() will be from that location.  Can't
   * seek past the end of the file.
   */
  void seek(long pos) throws IOException;
  
  /**
   * Return the current offset from the start of the file
   */
  long getPos() throws IOException;

  /**
   * Seeks a different copy of the data.  Returns true if 
   * found a new source, false otherwise.
   */
  boolean seekToNewSource(long targetPos) throws IOException;
}


/** Stream that permits positional reading. */
public interface PositionedReadable {
  /**
   * Read up to the specified number of bytes, from a given
   * position within a file, and return the number of bytes read. This does not
   * change the current offset of a file, and is thread-safe.
   */
  public int read(long position, byte[] buffer, int offset, int length)
    throws IOException;
  
  /**
   * Read the specified number of bytes, from a given
   * position within a file. This does not
   * change the current offset of a file, and is thread-safe.
   */
  public void readFully(long position, byte[] buffer, int offset, int length)
    throws IOException;
  
  /**
   * Read a number of bytes equal to the length of the buffer, from a given
   * position within a file. This does not
   * change the current offset of a file, and is thread-safe.
   */
  public void readFully(long position, byte[] buffer) throws IOException;
}

public interface Closeable extends AutoCloseable {

    /**
     * Closes this stream and releases any system resources associated
     * with it. If the stream is already closed then invoking this
     * method has no effect.
     *
     * @throws IOException if an I/O error occurs
     */
    public void close() throws IOException;
}

public class FSDataInputStream extends DataInputStream
    implements Seekable, PositionedReadable, Closeable {
......
}

public class FSDataOutputStream extends DataOutputStream implements Syncable {
  ......
}
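The difference between a sequential read and a positional read is easy to miss in the interface comments, so here is a small in-memory sketch of the semantics. InMemoryStream is a hypothetical class written for this example; it does not implement the Hadoop interfaces and omits seekToNewSource and readFully.

```java
import java.nio.charset.StandardCharsets;

public class PositionedReadSketch {

    /** Hypothetical in-memory stand-in for a seekable, positionally readable stream. */
    static final class InMemoryStream {
        private final byte[] data;
        private long pos;

        InMemoryStream(byte[] data) {
            this.data = data.clone();
        }

        /** Moves the current offset; seeking past the end is rejected. */
        void seek(long newPos) {
            if (newPos < 0 || newPos > data.length) {
                throw new IllegalArgumentException("Cannot seek to " + newPos);
            }
            pos = newPos;
        }

        long getPos() {
            return pos;
        }

        /** Sequential read: returns the next byte and advances the offset, or -1 at the end. */
        int read() {
            return pos < data.length ? (data[(int) pos++] & 0xff) : -1;
        }

        /** Positional read: copies up to length bytes and leaves the offset untouched. */
        int read(long position, byte[] buffer, int offset, int length) {
            if (position >= data.length) {
                return -1;
            }
            int n = (int) Math.min(length, data.length - position);
            System.arraycopy(data, (int) position, buffer, offset, n);
            return n;
        }
    }

    public static void main(String[] args) {
        InMemoryStream in = new InMemoryStream("hadoop".getBytes(StandardCharsets.US_ASCII));
        in.seek(2);

        byte[] buffer = new byte[3];
        int n = in.read(0, buffer, 0, 3); // positional read from the start of the data
        System.out.println(new String(buffer, 0, n, StandardCharsets.US_ASCII)); // had
        System.out.println(in.getPos());      // still 2
        System.out.println((char) in.read()); // d
        System.out.println(in.getPos());      // 3
    }
}
```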

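The main concrete file systems implemented by Hadoop include the local fs.LocalFileSystem and fs.RawLocalFileSystem, the HDFS implementation hdfs.DistributedFileSystem, and the in-memory rs.RamInMemoryFileSystem and fs.InMemoryFileSystem. This range of implementations lets Hadoop applications access data in many different environments.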

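Copyright notice: parts of this article are excerpted from the book 【Hadoop技術內幕深入解析Hadoop Common和HDFS架構設計與實現原理】 by 【蔡斌、陳湘萍】. It serves only as study notes for technical exchange, and the commercial rights remain with the original authors. Readers are encouraged to buy the book for further study. When reposting, please credit the original authors. Thank you!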

劍邑龍泉
