SparkStreaming 运行架构

原創

大数据开发

2020-06-30 22:10

SparkStreaming 进行数据的处理大致分为四个步骤：启动流处理引擎、接受以及存储数据、处理数据、输出结果等。

（1）初始化StreamingContext对象，在该对象启动过程中实例化DStreamGraph和JobGenrator,其中DStreamGraph用于存放DStream以及之间的依赖关系等信息，而jobscher中ReceiverTracker和JobGentator。其中ReceiverTracker为Driver端流数据接收器（Recevicer）的管理者，JobGentator启动过程中，根据流数据接收器分发策略通知对应的executor中的流数据接收器（ReciverSupervisor）启动，再由ReciverSupervisor启动流数据接收器。

（2）当数据接收器Recevicer启动后，持续不断地接受实时流数据，根据传过来的数据的大小进行判断，如果数据量很小，则等到多条数据成一块，在进行块存储；如果数据量大直接块存储。对于这些数据Receiver直接交给ReciverSupervisor，由其进行数据转存操作。

發表評論

所有評論

還沒有人評論，想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.

相關文章

第四范式OpenMLDB: 拓展Spark源码实现高性能Join

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","marks":[{"type":"

第四范式技术团队

2021-09-18 17:23:51

伴鱼数仓演进

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"typ

伴鱼技术团队

2021-08-14 08:03:57

Apache Kyuubi PPMC燕青：为什么说这是开源最好的时代？

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"typ

2021-08-04 09:33:50

如何从Pandas迁移到Spark？这8个问答解决你所有疑问

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"typ

2021-06-18 08:03:55

伴鱼实时计算平台 Palink 的设计与实现

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"typ

伴鱼技术团队

2021-06-13 07:03:55

提效7倍，Apache Spark 自适应查询优化在网易的深度实践及改进

{"type":"doc","content":[{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null

2021-05-19 11:08:57

大数据技术升级脉络及认知陷阱 | InfoQ 大咖说

直播內容：多年來，大數據技術經歷了幾輪更迭，在計算、存儲、大規模落地等層面均取得了不錯的進展，並在不斷的成長和成熟，整個生態領域也得到了快速發展。目前，基於分析的大數據計算平臺在各大公司發揮着非常重要的基礎設施的作用。本期，網易數據科學

InfoQ 中文站

2021-04-26 10:43:51

实时数据仓库的发展、架构和趋势

{"type":"doc","content":[{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null

2021-04-02 09:43:51

大数据+云：Kylin/Spark/Clickhouse/Hudi 的大佬们怎么看？

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"typ

2021-03-22 18:35:29

如何用Spark计算引擎执行FATE联邦学习任务？

{"type":"doc","content":[{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null

2021-03-22 18:34:37

估值突破280亿美元！大数据独角兽公司Databricks再获10亿美元融资

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"typ

2021-02-02 03:03:58

数据倾斜？Spark 3.0 AQE专治各种不服

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"typ

2021-01-21 19:33:54

英雄惜英雄-当Spark遇上Zeppelin之实战案例

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"typ

2021-01-18 18:53:58

Apache Spark 3.0新特性在FreeWheel核心业务数据团队的应用与实战

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"引言"}]},{"t

2021-01-06 15:53:58

深入浅出Spark（四）：存储系统

{"type":"doc","content":[{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null

2020-12-28 09:03:52

24小時熱門文章

最新文章

最新評論文章