This BigData,Hadoop組成及生態

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#009688","name":"user"}}],"text":"引言","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"隨着科技的發展,我們在網上留下的數據越來越多,大到網上購物、商品交易,小到瀏覽網頁、微信聊天、手機自動記錄日常行程等,可以說,在如今的生活裏,只要你還在,你就會每時每刻產生數據,但是這些數據能稱爲大數據麼?不,這些還不能稱爲大數據。那麼大數據數據到底是什麼呢?","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/99/992fbf928758c3d011d94eca2bd41ac3.gif","alt":null,"title":"","style":[{"key":"width","value":"50%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#009688","name":"user"}}],"text":"大數據概述","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"定義","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"百度百科的定義","attrs":{}},{"type":"text","text":":大數據是指","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"無法在一定時間範圍內","attrs":{}}],"marks":[{"type":"color","attrs":{"color":"#009688","name":"user"}}],"attrs":{}},{"type":"text","text":"用常規的軟件工具進行捕捉、管理和處理的","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"大數據集合","attrs":{}}],"marks":[{"type":"color","attrs":{"color":"#009688","name":"user"}}],"attrs":{}},{"type":"text","text":",是需要新處理模式才能具有更強的決策力、洞察發現力和流程優化能力的海量、高增長率和多樣化的信息資產。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"可以總結出大數據的特點:","attrs":{}}]},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"數據量大,需用採取某些工具才能採集,進而做分析計算。","attrs":{}}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"舉個例子來解釋下:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Simon是個成龍影片的愛好者,他在網站上搜集到了100G的成龍經典電影**(採集)","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":",爲了收藏這些電影,他決定把這些電影都存儲起起來","attrs":{}},{"type":"text","text":"(存儲)","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":",有一天Simon突然想看成龍的主演的“十二生肖”,這部電影就在他存儲的100G影片中,所以Simon按照上映年份","attrs":{}},{"type":"text","text":"(分析計算)**找出了這部電影並觀看了它。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"因此,我們可以把大數據的定義凝練如下:","attrs":{}}]},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"大數據主要解決的是海量數據的","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"採集","attrs":{}},{"type":"text","text":"、","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"存儲","attrs":{}},{"type":"text","text":"和","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"分析計算問題","attrs":{}},{"type":"text","text":"。","attrs":{}}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"所謂大數據,數據的規模是怎麼衡量呢?","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/19/19d8c29c27fb31f293b8d7175a7f5491.gif","alt":null,"title":"","style":[{"key":"width","value":"50%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"數據單位","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"衡量數據的規模,需要先認識數據的單位。按照數據單位的從小到大劃分,依次爲:bit、Byte、KB、MB、GB、","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"TB","attrs":{}},{"type":"text","text":"、","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"PB","attrs":{}},{"type":"text","text":"、","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"EB","attrs":{}},{"type":"text","text":"、ZB、YB、BB、NB、DB","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"單位之間的換算如下:","attrs":{}}]},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1 Byte =8 bit","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1 KB = 1,024 Bytes = 8192 bit","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1 MB = 1,024 KB = 1,048,576 Bytes","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"1 GB = 1,024 MB = 1,048,576 KB","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"1 TB = 1,024 GB = 1,048,576 MB","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"1 PB = 1,024 TB = 1,048,576 GB","attrs":{}}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對於我們來講,接觸最多的可能是KB、MB、GB等單位。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"但是由於計算機計算性能的不同,如果放在上古時代的計算機,讓它們處理","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"GB","attrs":{}},{"type":"text","text":"級別的數據就已經算是極限了;對於現在內存普遍是128G的服務器,多臺並行處理","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"EB","attrs":{}},{"type":"text","text":"級別的數據也不在話下。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"數據意義與價值","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"每個時代都有對數據的定義,關鍵的目標是要挖掘出數據背後的","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"意義和價值","attrs":{}},{"type":"text","text":"。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"數據量那麼大,格式又各不相同,我們處理數據的意義何在呢?","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"試想下,再大的數據也是由許許多多的小數據組成,","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"大數據","attrs":{}}],"marks":[{"type":"color","attrs":{"color":"#009688","name":"user"}}],"attrs":{}},{"type":"text","text":"可以認爲是","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"數據的集合","attrs":{}}],"marks":[{"type":"color","attrs":{"color":"#009688","name":"user"}}],"attrs":{}},{"type":"text","text":",我們可以從這些數據中推理出一個","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"近似客觀的規律","attrs":{}}],"marks":[{"type":"color","attrs":{"color":"#009688","name":"user"}}],"attrs":{}},{"type":"text","text":",利用這個規律可以預測產生數據的本體下一次要發生的","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"概率","attrs":{}}],"marks":[{"type":"color","attrs":{"color":"#009688","name":"user"}}],"attrs":{}},{"type":"text","text":"。如一個用戶經常在某電影網站上觀看成龍的電影,那麼當該用戶下一次訪問電影網站時,將有關成龍的電影放在推薦列表中比較靠前的位置,因爲我們通過用戶的瀏覽的數據發現他很喜歡成龍的電影,並且相信相信該用戶的興趣短時內不會發生變化。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這是大數據在生活中的一個簡單應用--","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"挖掘用戶的偏好,建立推薦模型","attrs":{}},{"type":"text","text":"。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"數據時代,我們的數據具有","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"海量,多樣性,實時性,不確定性","attrs":{}},{"type":"text","text":"等特點。那麼,我們要存儲,處理這些特點的海量數據,用什麼樣的方式或說什麼樣的平臺比較適合呢,經過多年的技術發展和自然選擇,Hadoop分佈式模型脫穎而出。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"因此,學習大數據肯定繞不開","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"Hadoop","attrs":{}}],"marks":[{"type":"color","attrs":{"color":"#009688","name":"user"}}],"attrs":{}},{"type":"text","text":",但是對於接觸大數據時間較短或者尚未接觸過大數據的同學來說,如果問他們我們應該學習Hadoop的那些內容,","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"分佈式存儲","attrs":{}}],"marks":[{"type":"color","attrs":{"color":"#009688","name":"user"}}],"attrs":{}},{"type":"text","text":"和","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"計算","attrs":{}}],"marks":[{"type":"color","attrs":{"color":"#009688","name":"user"}}],"attrs":{}},{"type":"text","text":"一定會說出來,但是僅僅這兩個概念還是太籠統了,那麼我們應該怎樣把控Hadoop的學習呢,莫慌,且聽","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"Simon郎","attrs":{}}],"marks":[{"type":"color","attrs":{"color":"#009688","name":"user"}}],"attrs":{}},{"type":"text","text":"慢慢道來。","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/39/39e95f95862e47771bb143f454827ccd.gif","alt":null,"title":"","style":[{"key":"width","value":"50%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#009688","name":"user"}}],"text":"Hadoop概述","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Hadoop是一個由Apache基金會所開發的","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"分佈式系統基礎架構","attrs":{}},{"type":"text","text":",它主要解決的是海量數據的","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"存儲","attrs":{}},{"type":"text","text":"和海量數據的","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"分析計算","attrs":{}},{"type":"text","text":"問題,從廣義上來說,Hadoop通常是指","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Hadoop生態圈","attrs":{}},{"type":"text","text":"。","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/a4/a4464576c5dce9b8ad321dc8f239367f.png","alt":"image-20210322164226320","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們先看","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Hadoop的組成結構","attrs":{}},{"type":"text","text":",然後介紹","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Hadoop生態圈","attrs":{}},{"type":"text","text":"。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Hadoop組成","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Hadoop的組成結構在","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"1.x","attrs":{}},{"type":"text","text":"和","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"2/3.x","attrs":{}},{"type":"text","text":"有所不同,如下圖所示","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/84/84a66a77b0614209644aff993ebcd3e2.png","alt":"image-20210520215604023","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Hadoop主要是由:就","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"計算","attrs":{}},{"type":"text","text":"、","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"資源調度","attrs":{}},{"type":"text","text":"、","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"數據存儲","attrs":{}},{"type":"text","text":"和","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"輔助工具","attrs":{}},{"type":"text","text":"組成。","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"在Hadoop1.x時代,Hadoop中的MapReduce","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"同時處理業務邏輯運算和資源調度","attrs":{}}],"marks":[{"type":"color","attrs":{"color":"#009688","name":"user"}},{"type":"strong"}],"attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":",耦合性較大。","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"在Hadoop2/3.x時代,增加了","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"Yarn","attrs":{}}],"marks":[{"type":"color","attrs":{"color":"#009688","name":"user"}},{"type":"strong"}],"attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":",","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"Yarn只負責資源的調度,MapReduce只負責運算","attrs":{}}],"marks":[{"type":"color","attrs":{"color":"#009688","name":"user"}},{"type":"strong"}],"attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"。","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Note","attrs":{}},{"type":"text","text":":資源調度指的是CPU、內存、服務器計算的選擇等","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"接下來,我們分別介紹:","attrs":{}}]},{"type":"blockquote","content":[{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"用於存儲的HDFS","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"用於資源調度的YARN","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"用於計算的MapReduce","attrs":{}}]}]}],"attrs":{}}],"attrs":{}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"HDFS架構概述","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Hadoop Distributed File System,簡稱","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"HDFS","attrs":{}},{"type":"text","text":",是一個分佈式文件系統 ,HDFS的結構圖如下:","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/9c/9c7e669d96e518146cf574f820e0cc61.png","alt":"image-20210520221117080","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"HDFS架構包含一個","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"NameNode、DataNode和備用SecondaryNode","attrs":{}},{"type":"text","text":"。","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"NameNode(nn)","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"NameNode(nn)就是Master,它是一個主管,管理者,它主要有以下功能:","attrs":{}}]},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"①管理HDFS的名稱空間","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"②配置副本策略(如數據配置幾個副本)","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"③管理數據塊(block)的映射信息","attrs":{}}]},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"數據存在Datanode的哪些數據塊中,分佈式存儲的","attrs":{}}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"④處理客戶端的請求","attrs":{}}]}],"attrs":{}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"DataNode(dn)","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"DataNode就是Slave,NameNode下達命令,DataNode執行實際的操作。","attrs":{}}]},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"①存儲實際的數據塊","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"②執行數據塊的讀/寫操作","attrs":{}}]}],"attrs":{}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Secondary NameNode(2nn)","attrs":{}}]}]}],"attrs":{}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"①輔助NameNode,分擔其工作量,比如定期合併Fsimage和Edits,並推送給NameNode","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"②在緊急情況下,可輔助恢復NameNode。","attrs":{}}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"NOTE","attrs":{}},{"type":"text","text":":Secondary NameNode並非NameNode的熱備,即當NameNode掛掉的時候,它並不 能馬上替換NameNode並提供服務。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"YARN架構概述","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Yet Another Resource Negotiator 簡稱","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"YARN","attrs":{}},{"type":"text","text":",它是Hadoop的資源管理器,負責爲運算程序提供服務器運算資源,相當於一個分佈式的","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"操作系統平臺","attrs":{}},{"type":"text","text":",而 ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"MapReduce","attrs":{}},{"type":"text","text":" 等運算程序則相當於運行於","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"操作系統之上的應用程序","attrs":{}},{"type":"text","text":"。如下圖所示:","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/17/173d45121f7e5872f101ce59cd03bc51.png","alt":"image-20210520230922949","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"YARN 主要由 ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"ResourceManager","attrs":{}},{"type":"text","text":"、","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"NodeManager","attrs":{}},{"type":"text","text":"、","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"ApplicationMaster","attrs":{}},{"type":"text","text":" 和 ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Container","attrs":{}},{"type":"text","text":" 等組件構成。","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"ResourceManager(RM):整個集羣資源(內存、cpu等)的老大","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"NodeManager(NM):單個節點服務器資源的老大","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"ApplicationMaster(AM):單個任務運行的老大","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Container:容器,相當於一臺獨立的服務器,裏面封裝了任務運行所需的資源,如內存、CPU、磁盤、網絡等。","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"NOTE","attrs":{}},{"type":"text","text":":","attrs":{}}]},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"ResourceManager是一個Master,在每一個子節點以下都有一個NodeManager,由RM給NM分配資源。在每一個節點中還會有ApplicationMaster(後面簡稱AM)的東西。他會負責與RM通信以獲取資源,還會與NM通信來啓動或者是停止任務。","attrs":{}}]}],"attrs":{}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"MapReduce架構概述","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"MapReduce是一個","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"分佈式運算程序","attrs":{}},{"type":"text","text":"的編程框架,是用戶開發“基於Hadoop的數據分析應用”的核心框架。他的核心功能是將用","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"戶編寫的業務邏輯代碼","attrs":{}},{"type":"text","text":"和","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"自帶默認組件","attrs":{}},{"type":"text","text":"整合成一個完整的分佈式運算程序,併發運行在一個Hadoop集羣上。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"MapReduce","attrs":{}},{"type":"text","text":"將計算過程分爲兩個階段:Map和Reduce","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/8f/8f99aaff7b82f0d0fa5770932178803b.png","alt":"image-20210520231933220","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Map階段並行處理輸入數據","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Reduce階段對Map結果進行彙總","attrs":{}}]}]}],"attrs":{}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"三者之間的關係","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"HDFS、YARN、MapReduce三者之間的關係","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/6d/6d4f7846ce53ad9371c7af5d10778770.png","alt":"image-20210520233912648","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"從HDFS中讀取數據","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Yarn資源調度處理這些數據","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"MapReduce收到Yarn的指令後開啓相應的MapTask任務和ReduceTask任務","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"處理後的數據被存儲在HDFS中","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"你以爲Hadoop結束了,NO! NO! NO! Hadoop生態圈瞭解一波~","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"好吧,繼續學!","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/c8/c860aaabcc9d5a1beb6aa40ff962ca23.gif","alt":null,"title":"","style":[{"key":"width","value":"50%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Hadoop生態圈","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"先看一張","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"Hadoop生態體系","attrs":{}}],"marks":[{"type":"color","attrs":{"color":"#009688","name":"user"}}],"attrs":{}},{"type":"text","text":"的腦圖。","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/e0/e08706d2ecee2e12acca9398737b0dda.png","alt":"image-20201201224043405","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"媽耶,咋那麼多內容啊,快把我幹懵逼了。千萬別懵,雖然看起來很多,但是可以用一句總結:","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Hadoop","attrs":{}},{"type":"text","text":"是一個","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"分佈式計算開源框架","attrs":{}}],"marks":[{"type":"color","attrs":{"color":"#009688","name":"user"}}],"attrs":{}},{"type":"text","text":",提供了一個","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"分佈式系統","attrs":{}}],"marks":[{"type":"color","attrs":{"color":"#009688","name":"user"}}],"attrs":{}},{"type":"text","text":"子項目(HDFS)和支持MapReduce","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"分佈式計算軟件架構","attrs":{}}],"marks":[{"type":"color","attrs":{"color":"#009688","name":"user"}}],"attrs":{}},{"type":"text","text":"。既然腦圖的內容有點多,咱們就介紹幾個在Hadoop生態圈中佔有地位較高的幾個組件,如果小夥伴對其它組件感興趣,可以自行查閱。","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Hive","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Hive是基於Hadoop的一個數據倉庫工具,可以將結構化的數據文件映射爲一張數據庫表,通過類SQL語句快速實現簡單的MapReduce統計,不必開發專門的MapReduce應用,十分適合數據倉庫的統計分析。","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Hbase","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Hbase是一個高可靠性、高性能、面向列、可伸縮的分佈式存儲系統,利用Hbase技術可在廉價PC Server上搭建起大規模結構化存儲集羣。","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Sqoop","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Sqoop是一個用來將Hadoop和關係型數據庫中的數據相互轉移的工具,可以將一個關係型數據庫(MySQL ,Oracle ,Postgres等)中的數據導進到Hadoop的HDFS中,也可以將HDFS的數據導進到關係型數據庫中","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Zookeeper","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Zookeeper 是一個爲分佈式應用所設計的分佈的、開源的協調服務,它主要是用來解決分佈式應用中經常遇到的一些數據管理問題,簡化分佈式應用協調及其管理的難度,提供高性能的分佈式服務。","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Ambari","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Ambari是一種基於Web的工具,支持Hadoop集羣的供應、管理和監控。","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Oozie","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Oozie是一個工作流引擎服務器, 用於管理和協調運行在Hadoop平臺上(HDFS、Pig和MapReduce)的任務。","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Hue","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Hue是一個基於WEB的監控和管理系統,實現對HDFS,MapReduce/YARN, HBase, Hive, Pig的web化操作和管理。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":".........","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Hadoop生態體系先介紹這麼多,對其它內容感興趣的同學自行補充。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#009688","name":"user"}}],"text":"學習資源","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"爲了方便大家學習大數據,我整理了一份大數據學習資源,包含了","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"學習路線","attrs":{}},{"type":"text","text":"、","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"視頻","attrs":{}},{"type":"text","text":"和","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"項目","attrs":{}},{"type":"text","text":"。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"私信我或者加我微信:langyakun9768","attrs":{}}]}]}
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章