Spark Made Simple (Part 4): The Storage System

About this series

Spark was born in 2009 at the AMP Lab (the Algorithms, Machines and People lab) of the University of California, Berkeley, and was open-sourced in 2010. In 2013 it was donated to the Apache Software Foundation, and in 2014 it became an Apache top-level project.

A decade on, Spark is a standard tool at companies and research institutions large and small, and developers remain fond of it. If you are a newcomer who wants to understand and learn Spark, the series written by InfoQ together with FreeWheel technical expert Wu Lei, "Spark Made Simple: Principles Explained and Development in Practice" (https://www.infoq.cn/theme/84), is meant for you.

This article is the fourth in the series.

First of all, thank you, dear readers, for taking time out of your busy day to hear me tell this tale. You do me a great honor! In the previous installment, "The Spark Scheduling System: A Game of Thrones" (https://www.infoq.cn/article/5aOHzQIaXX6NlHriLtSI), we said that initializing SparkContext is like opening Pandora's box, as if the thirty-six Heavenly Spirits and the seventy-two Earthly Fiends had descended to earth, and that the top three seats in the Hall of Righteousness belong to the three chiefs of the Spark scheduling system. Working together, the three chiefs ultimately dispatch tasks (code) to the Executors, and each Executor wraps its distributed tasks as TaskRunner instances and hands them to a thread pool for execution.

Broadly speaking, executing a task means transforming data from one form into another. For data forms that are expensive to compute, Spark relies on its caching mechanism to keep the job running smoothly to completion. Today we turn to Spark's storage system and look at how it provides that underlying support for task execution.

[Figure: SparkContext initialization (https://static001.infoq.cn/resource/image/11/8f/11f55f2c71613366094e6d5b7347b38f.png)]

The key problems any storage system has to solve come down to storing and retrieving data, and receiving and sending it. Before exploring how the Spark storage system works, though, let us first be clear about what it mainly stores. In general, the Spark storage system holds three kinds of data:

- RDD caches
- Shuffle intermediate results
- Broadcast variables
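To make the three kinds of data a little more tangible, here is a toy sketch in plain Python of one store holding all three under different keys. It is not Spark's actual storage code: `ToyBlockStore` and the helpers `rdd_block`, `shuffle_block` and `broadcast_block` are hypothetical names invented for this illustration, and the key formats only loosely follow the way Spark names its blocks.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class ToyBlockStore:
    """A toy in-memory store (an illustration, not Spark's storage system)."""
    blocks: Dict[str, Any] = field(default_factory=dict)

    def put(self, block_id: str, data: Any) -> None:
        """Store data under the given block ID, overwriting any old value."""
        self.blocks[block_id] = data

    def get(self, block_id: str) -> Optional[Any]:
        """Return the stored data, or None if the block is absent."""
        return self.blocks.get(block_id)

    def kinds(self) -> Dict[str, int]:
        """Count stored blocks by the prefix of their ID."""
        counts: Dict[str, int] = {}
        for block_id in self.blocks:
            kind = block_id.split("_", 1)[0]
            counts[kind] = counts.get(kind, 0) + 1
        return counts


# Hypothetical helpers that build keys for the three kinds of data.
def rdd_block(rdd_id: int, partition: int) -> str:
    return f"rdd_{rdd_id}_{partition}"


def shuffle_block(shuffle_id: int, map_id: int, reduce_id: int) -> str:
    return f"shuffle_{shuffle_id}_{map_id}_{reduce_id}"


def broadcast_block(broadcast_id: int) -> str:
    return f"broadcast_{broadcast_id}"


store = ToyBlockStore()
store.put(rdd_block(0, 0), [1, 2, 3])          # a cached RDD partition
store.put(rdd_block(0, 1), [4, 5, 6])          # another partition of the same RDD
store.put(shuffle_block(0, 0, 1), [("a", 1)])  # a shuffle intermediate result
store.put(broadcast_block(0), {"lookup": 42})  # a broadcast variable

print(store.kinds())
```

The point of the sketch is only that all three kinds of data can live behind one put/get interface and be told apart by their keys; how Spark really stores, locates and ships them is the subject of the rest of this article.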