大數據技術思想入門(三):分佈式文件存儲的流程

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":14}}],"text":"如果你不喜歡閱讀文字的話,可以選擇滑到最後看 "},{"type":"text","marks":[{"type":"size","attrs":{"size":14}},{"type":"strong"}],"text":"視頻講解"},{"type":"text","marks":[{"type":"size","attrs":{"size":14}}],"text":" 喲~~~"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"進程和 RPC"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":14}}],"text":"在上一篇文章中,我們講解了要解決好大數據集的存儲問題,需要引入一個主從結構的集羣,其中,主服務器用於存儲元數據,從服務器用於存儲真正的數據塊數據。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":14}}],"text":"在這裏,我們還需要了解兩點:"}]},{"type":"numberedlist","attrs":{"start":null,"normalizeStart":1},"content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":14}}],"text":"既然 master 和 slave 服務器要管理存儲的服務器,那麼,必須在服務器中啓動相應的進程,比如在主服務器上需要啓動一個管理元數據的進程,在從服務器中需要啓動一個管理數據塊數據存儲的進程。"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":14}}],"text":"master 服務器和 slave 服務器上的進程間的通訊是由 RPC 完成,RPC 本質上就是 Socket,所以在不同服務器上的兩個進程間的通訊是由 Socket 完成的,比如 slave 服務器上的進程需要將當前的 slave 機器的元數據通過 Socket 傳遞給 master 服務器上的進程,這樣 master 服務器上的進程才能知道有哪些 slave 服務器,每臺服務器的情況是什麼樣的。"}]}]}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"分佈式文件存儲的流程"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":14}}],"text":"假設現在有一個主從集羣,從節點有 3 個,每個節點的磁盤容量是 10 TB,每個數據塊的大小假設設置爲 128 MB,每個數據塊備份存儲 3 份。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":14}}],"text":"服務器也可以稱爲節點"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":14}}],"text":"接下來我們想將一個大數據集存儲到上面的主從集羣中,那麼這個存儲的流程步驟是什麼樣的呢?爲了能講清楚這個流程步驟,我們假設這個大數據集存儲在服務器 A 中,它的大小是 422 MB,我們將這個大數據集存儲在文件名爲/douma/data/text.txt 的分佈式文件中。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":14}}],"text":"相對於分佈式主從集羣,這裏的服務器 A 可以稱爲客戶端 (也就是向主從集羣發起請求的節點機器)。"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":14}}],"text":"首先,客戶端向主節點發起創建文件的請求,這個請求中包含的主要信息是:"}]},{"type":"numberedlist","attrs":{"start":null,"normalizeStart":1},"content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":14}}],"text":"文件的名稱,即 /douma/data/text.txt"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":14}}],"text":"文件的大小,即 422 MB"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":14}}],"text":"主節點上的進程收到這個請求後,會做如下的處理:"}]},{"type":"numberedlist","attrs":{"start":null,"normalizeStart":1},"content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":14}}],"text":"主節點上的進程收到請求後,就判斷這個文件是否已經存在,如果不存在則創建這個文件。"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":14}}],"text":"根據文件的大小和設置好的每個數據塊大小計算這個文件需要切分成多少個數據塊,這裏的數據塊數量等於 422 / 128 = 4 塊(向上取整的結果)"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":14}}],"text":"主節點進程根據現在的所有從節點的磁盤剩餘情況,來確定上面計算出來的 4 個數據塊及其備份數據塊的分別存儲在哪些從節點上"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":14}}],"text":"主節點做完上面的事情之後,就會將以下的信息告訴客戶端:"}]},{"type":"numberedlist","attrs":{"start":null,"normalizeStart":1},"content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":14}}],"text":"文件需要被切割成 4 個數據塊"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":14}}],"text":"數據塊 1 存儲在 slave1 上,它的其他兩個備份數據塊分別存儲在 slave2 和 slave3"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":14}}],"text":"數據塊 2 存儲在 slave1 上,它的其他兩個備份數據塊分別存儲在 slave3 和 slave4"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":4,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":14}}],"text":"數據塊 3 存儲在 slave2 上,它的其他兩個備份數據塊分別存儲在 slave1 和 slave3"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":5,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":14}}],"text":"數據塊 4 存儲在 slave4 上,它的其他兩個備份數據塊分別存儲在 slave2 和 slave3"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":14}}],"text":"客戶端接到主節點的響應信息後,就將數據切分成數據塊,然後將對應的數據塊直接寫到對應的 slave 節點上,比如數據塊 1 的的寫入流程是:"}]},{"type":"numberedlist","attrs":{"start":null,"normalizeStart":1},"content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":14}}],"text":"因爲數據塊 1 是需要存儲在 slave1 上的,所以客戶端先和 slave1 建立連接,然後通過 Socket 將數據塊的數據傳輸到 slave1 上進行存儲,這個數據塊的數據最終存儲在 slave1 的磁盤中,當數據塊存儲好後,需要向主節點報備這個數據塊已經在 slave1 上存儲好了。"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":14}}],"text":"當數據塊 1 的第一個備份在 slave1 上存儲好後,然後將這個數據塊拷貝一份通過 Socket 分別傳輸給 slave2 和 slave3 ,並且向主節點報備這兩個備份數據塊已經分別在 slave2 和 slave3 上存儲好。"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":14}}],"text":"當數據塊所有的備份 (即 3 份) 都存儲好後,將成功的結果信息告訴客戶端,這樣客戶端可以關閉數據流,從而結束了分佈式文件的存儲"}]}]}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"分佈式文件的讀取"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":14}}],"text":"以上說的是分佈式文件寫的流程,接下來,我們看下分佈式文件的讀取過程。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":14}}],"text":"首先客戶端向主節點發起讀取請求,請求的信息就是需要讀取的文件名稱,比如讀取文件 /douma/data/text.txt 。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":14}}],"text":"主節點接收到客戶端的讀取請求後,會做如下的事情:"}]},{"type":"numberedlist","attrs":{"start":null,"normalizeStart":1},"content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":14}}],"text":"現根據文件名判斷這個文件是否存在,如果不存在則直接告訴客戶端,說文件不存在"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":14}}],"text":"文件存在後,則拿到這個文件對應的數據塊的元數據信息,包括有多少個數據塊,每個數據塊存儲在哪一臺 slave 節點上"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":14}}],"text":"主節點將文件對應的數據塊的元數據返回給客戶端,客戶端接收到了這些響應信息後,就分別從對應的 slave 服務器中讀取對應的數據塊,然後將數據塊組合起來就是這個文件的所有數據了。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"video","attrs":{"videoHTML":""}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"系統學習大數據技術:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https://link.zhihu.com/?target=http%3A//douma.ke.qq.com/","title":null},"content":[{"type":"text","text":"抖碼課堂騰訊課堂官網​douma.ke.qq.com"}]}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ba/ba25ee8f31d406f532b9d307fa3f99e7.png","alt":"圖標","title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https://link.zhihu.com/?target=http%3A//douma.ke.qq.com/","title":null}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章