深入浅出Spark(四):存储系统

{"type":"doc","content":[{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"专题介绍"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2009 年,Spark 诞生于加州大学伯克利分校的 AMP 实验室(the Algorithms, Machines and People lab),并于 2010 年开源。2013 年,Spark 捐献给阿帕奇软件基金会(Apache Software Foundation),并于 2014 年成为 Apache 顶级项目。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如今,十年光景已过,Spark 成为了大大小小企业与研究机构的常用工具之一,依旧深受不少开发人员的喜爱。如果你是初入江湖且希望了解、学习 Spark 的“小虾米”,那么 InfoQ 与 FreeWheel 技术专家吴磊合作的专题系列文章——"},{"type":"link","attrs":{"href":"https:\/\/www.infoq.cn\/theme\/84","title":"","type":null},"content":[{"type":"text","text":"《深入浅出 Spark:原理详解与开发实践》"}]},{"type":"text","text":"一定适合你!"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"本文系专题系列第四篇。"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"首先感谢各位看官在百忙之中来听我说书,真是太给面子啦!在前文书"},{"type":"link","attrs":{"href":"https:\/\/www.infoq.cn\/article\/5aOHzQIaXX6NlHriLtSI","title":"xxx","type":null},"content":[{"type":"text","text":"《Spark调度系统之权力的游戏》"}]},{"type":"text","text":"中咱们提到SparkContext的初始化就像是打开了潘多拉的盒子,宛如三十六天罡临凡、七十二地煞降世,稳坐聚义厅头三把交椅的是Spark调度系统的三位大佬。三位大佬通力配合最终将任务(代码)分发到Executor,Executor则将分布式任务封装为TaskRunner并交由线程池执行。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"笼统地说,任务执行的过程通常是将数据从一种形态转换为另一种形态,对于计算成本较高的数据形态,Spark通过缓存机制来保证作业的顺利完成,今天咱们就来说说Spark的存储系统,看看Spark存储系统如何为任务的执行提供基础保障。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/11\/8f\/11f55f2c71613366094e6d5b7347b38f.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"SparkContext初始化"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"任何一个存储系统要解决的关键问题无非是数据的存与取、收与发,不过,在去探讨Spark存储系统如何工作之前,咱们先来搞清楚Spark存储系统中“存”的主要是什么内容?总的来说,Spark存储系统用于存储3个方面的数据:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"RDD缓存"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Shuffle中间结果"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"广播变量"}]}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章