大數據技術思想入門(四):分佈式文件的元數據是怎麼存儲的

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":14}}],"text":"如果你不喜歡閱讀文字的話,可以選擇滑到最後看 視頻講解 喲~~~"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":14}}],"text":"我們知道主節點主要存儲的元數據包括:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":null,"normalizeStart":1},"content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":14}}],"text":"所有從節點的元數據信息,包括從節點的數量、每個節點的 IP 地址以及使用情況等信息"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":14}}],"text":"所有分佈式文件的元數據信息,包括文件名、大小等基礎信息,還有文件對應的數據塊的元數據信息"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":14}}],"text":"對於存儲的從節點的元數據信息很好理解。就是當從節點啓動的時候,會將自己的 IP 地址、自己的磁盤總大小以及使用情況告訴主節點。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":14}}],"text":"對於文件的元數據的存儲和管理相對來說複雜點,這篇文章我們就要說明白主節點中存儲的文件元數據。"}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"文件元數據數據結構"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":12}}],"text":"首先我們需要搞明白的是文件元數據是以什麼樣的數據結構組織起來的。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":12}}],"text":"分佈式文件的元數據主要包括兩個信息:"}]},{"type":"numberedlist","attrs":{"start":null,"normalizeStart":1},"content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":12}}],"text":"文件名"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":12}}],"text":"文件對應的基本信息:文件大小、文件對應哪些數據塊"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":12}}],"text":"每個數據塊的元數據又包括:"}]},{"type":"numberedlist","attrs":{"start":null,"normalizeStart":1},"content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":12}}],"text":"數據塊的唯一 id"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":12}}],"text":"數據塊的基本信息:數據塊大小、數據塊的備份數、每個數據塊存儲在哪臺 slave 服務器中"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":12}}],"text":"所以在主節點中肯定需要兩個數據結構來存儲兩個部分的信息:"}]},{"type":"numberedlist","attrs":{"start":null,"normalizeStart":1},"content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":12}}],"text":"文件名及其對應的文件基本信息"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":12}}],"text":"數據塊唯一 id 及其對應的數據塊的基本信息"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":12}}],"text":"我們來舉一個具體的例子,我們現在將下面的兩個文件存儲在集羣中:"}]},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":12}}],"text":"/douma/tmp/test/text.txt  這個文件被劃分成 3 個數據塊,分別是 b1、b2、b3"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":12}}],"text":"/douma/data/product/word.txt 這個文件被劃分成 2 個數據塊 ,分別是 b4、b5"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":12}}],"text":"那麼在主節點中的元數據的存儲形式應該如下圖所示:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/eb/eb435a9fce888be7c06fc97f80f2a91c.png","alt":"","title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":14}}],"text":"從上圖中可以看出,文件名實際上被組織成了一棵樹,這棵樹我們一般稱爲文件目錄樹,其實這個和 Linux 中的文件目錄樹是一個意思的,在目錄樹中分爲兩種類型的節點:"}]},{"type":"numberedlist","attrs":{"start":null,"normalizeStart":1},"content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":14}}],"text":"NodeDirectory (表示文件目錄),它一般會有若干個孩子節點"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":14}}],"text":"NodeFile (表示文件),它具有幾個屬性,那就是這個文件的基本信息(文件大小等)和文件對應的數據塊信息"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":14}}],"text":"如果要查詢文件對應的數據塊信息的話,就可以根據這個文件對應的數據塊 id 去數據塊元數據中查找了。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":14}}],"text":"總結,在主節點中關於文件元數據的管理主要就是兩種數據的存儲:"}]},{"type":"numberedlist","attrs":{"start":null,"normalizeStart":1},"content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":14}}],"text":"文件目錄樹,以及文件數據塊的索引,即每個文件對應的數據塊列表"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":14}}],"text":"數據塊與 slave 服務器的關係,即某一個數據塊存在哪些 slave 節點上"}]}]}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"文件元數據存儲在主節點內存"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":14}}],"text":"搞明白了文件元數據的存儲結構,接下來我們來看下這個文件元數據到底存儲在主節點的內存好點,還是存儲在主節點的磁盤中好點呢?要搞明白這個問題,我們需要弄明白文件元數據本身的特點及其被訪問的特點。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":14}}],"text":"文件元數據本身的特點很簡單,那就是數據量不大。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":14}}],"text":"而文件元數據被訪問的特點:"}]},{"type":"numberedlist","attrs":{"start":null,"normalizeStart":1},"content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":14}}],"text":"訪問很頻繁,不僅客戶端會訪問文件元數據,slave 節點也是會訪問文件元數據的"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":14}}],"text":"會被實時的更新,因爲文件會實時的創建、刪除等,所以文件元數據也是需要實時被更新的"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":14}}],"text":"鑑於以上的特點,那麼文件元數據最好就是存儲在主節點的內存中,因爲查詢內存中的數據是很快的,而且內存中的數據的更新也是實時的。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":14}}],"text":"所以在主節點的內存中會存在兩個 Map :"}]},{"type":"numberedlist","attrs":{"start":null,"normalizeStart":1},"content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":14}}],"text":"Map 這個用於存儲文件名及其對應的文件的基本信息"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":14}}],"text":"Map 這個用於存儲數據塊唯一 id 及其對應的數據塊的基本信息"}]}]}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"保證元數據不丟失"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":14}}],"text":"上面我們說到,文件元數據是存儲在主節點的內存中的,內存的查詢和更新速度是非常快,但是數據存儲在內存有個缺點,那就是一旦主節點掛了,那麼存儲在主節點內存中的所有文件元數據都會丟失了,根本就找不到了,這個問題就非常的嚴重了。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":14}}],"text":"我們可以通過將主節點內存中的文件元數據保存一份到磁盤中,也就是說對於文件元數據,內存中存儲一份,磁盤中保持一份,這樣既能保證文件元數據的訪問性能,又能保證這些元數據不會丟失。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":14}}],"text":"怎麼做到將內存中的元數據保存到磁盤中呢?爲了減輕主節點的壓力,我們可以引入另一個輔助服務器,這個輔助服務器可以將主節點的內存中元數據拉取過來,然後將這些數據組織成可以存儲到磁盤中的數據結構,最後將轉換後的數據結構保存到主節點的磁盤中。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":14}}],"text":"以上只是一個解決元數據丟失這個問題的思路,至於細節我們後續會再詳細講解。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"video","attrs":{"videoHTML":""}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"系統學習大數據技術:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https://link.zhihu.com/?target=http%3A//douma.ke.qq.com/","title":null},"content":[{"type":"text","text":"抖碼課堂騰訊課堂官網​douma.ke.qq.com"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章