🌏【架构师指南】分布式技术知识点总结(数据处理)

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"数据分析","attrs":{}}]},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":10}}],"text":"从传统的基于关系型数据库并行处理集群、用于内存计算近实时的,到目前的基于hadoop的海量数据的分析,数据的分析在大型电子商务网站中应用非常广泛,包括流量统计、推荐引擎、趋势分析、用户行为分析、数据挖掘分类器、分布式索引等等。","attrs":{}}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":10}}],"text":"并行处理集群有商业的EMC Greenplum,Greenplum的架构采用了MPP(大规模并行处理),基于postgresql的大数据量存储的分布式数据库。","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":10}}],"text":"内存计算方面有SAP的HANA,开源的nosql内存型的数据库mongodb也支持mapreduce进行数据的分析。","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":10}}],"text":"海量数据的离线分析目前互联网公司大量的使用Hadoop、Spark、Blink、Flink。Hadoop在可伸缩性、健壮性、计算性能和成本上具有无可替代的优势,事实上已成为当前互联网企业主流的大数据分析平台。","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Hadoop通过MapReduce的分布式处理框架,用于处理大规模的数据,伸缩性也非常好;但是MapReduce最大的不足是不能满足实时性的场景,主要用于离线的分析。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"基于MapRduce模型编程做数据的分析,开发上效率不高,位于hadoop之上Hive的出现使得数据的分析可以类似编写sql的方式进行,sql经过语法分析、生成执行计划后最终生成MapReduce任务进行执行,这样大大提高了开发的效率,做到以ad-hoc(计算在query发生时)方式进行的分析。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"基于MapReduce模型的分布式数据的分析都是离线分析,执行上是暴力扫描,无法利用类似索引的机制;","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"开源的Cloudera Impala是基于MPP的并行编程模型的,底层是Hadoop存储的高性能的实时分析平台,可以大大降低数据分析的延迟。","attrs":{}}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"目前Hadoop使用的版本是Hadoop1.0,一方面原有的MapReduce框架存在JobTracker单点的问题,另外一方面JobTracker在做资源管理的同时又做任务的调度工作,随着数据量的增大和Job任务的增多,明显存在可扩展性、内存消耗、线程模型、可靠性和性能上的缺陷瓶颈;","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"Hadoop2.0 yarn对整个框架进行了重构,分离了资源管理和任务调度,从架构设计上解决了这个问题。","attrs":{}}]}]}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"实时计算","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在互联网领域,实时计算被广泛实时","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"监控分析、流控、风险控制","attrs":{}},{"type":"text","text":"等领域。电商平台系统或者应用对日常产生的大量日志和异常信息,需要经过实时过滤、分析,以判定是否需要预警;","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"同时需要对系统做自我保护机制,比如对模块做流量的控制,以防止非预期的对系统压力过大而引起的系统瘫痪,流量过大时,可以采取拒绝或者引流等机制;","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"有些业务需要进行风险的控制,比如彩票中有些业务需要根据系统的实时销售情况进行限号与放号。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"原始基於单节点的计算,随着系统信息量爆炸式产生以及计算的复杂度的增加,单个节点的计算已不能满足实时计算的要求,需要进行多节点的分布式的计算,分布式实时计算平台就出现了。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":10}},{"type":"strong","attrs":{}}],"text":"这里所说的实时计算,其实是流式计算,概念前身其实是CEP复杂事件处理,相关的开源产品如Esper,业界分布式的流计算产品Yahoo S4,Twitter storm、flink、blink等,以storm和blink和flink开源产品使用最为广泛。","attrs":{}}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"对于实时计算平台,从架构设计上需要考虑以下几个因素:","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"伸缩性","attrs":{}}]},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"随着业务量的增加,计算量的增加,通过增加节点处理,就可以处理。","attrs":{}}]}],"attrs":{}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"高性能、低延迟","attrs":{}}]},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"从数据流入计算平台数据,到计算输出结果,需要性能高效且低延迟,保证消息得到快速的处理,做到实时计算。","attrs":{}}]}],"attrs":{}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"可靠性","attrs":{}}]},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"保证每个数据消息得到一次完整处理。","attrs":{}}]}],"attrs":{}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"容错性","attrs":{}}]},{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 系统可以自动管理节点的宕机失效,对应用来说,是透明的。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 整个集群的管理是通过zookeeper来进行的。客户端提交拓扑到nimbus。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" Nimbus针对该拓扑建立本地的目录根据topology的配置计算task,分配task,在zookeeper上建立assignments节点存储task和supervisor机器节点中woker的对应关系。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" zookeeper上创建taskbeats节点来监控task的心跳;启动topology。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" Supervisor去zookeeper上获取分配的tasks,启动多个woker进行,每个woker生成task,一个task一个线程;根据topology信息初始化建立task之间的连接;Task和Task之间是通过zeroMQ管理的;之后整个拓扑运行起来。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Tuple是流的基本处理单元,也就是一个消息,Tuple在task中流转,Tuple的发送和接收过程如下:","attrs":{}}]},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":1,"number":1,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"发送Tuple,Worker提供了一个transfer的功能,用于当前task把tuple发到到其他的task中。以目的taskid和tuple参数,序列化tuple数据并放到transfer queue中","attrs":{}},{"type":"text","text":"。","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":1,"number":2,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"在0.8版本之前,这个queue是LinkedBlockingQueue,0.8之后是DisruptorQueue","attrs":{}},{"type":"text","text":"。","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":1,"number":3,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"在0.8版本之后,每一个worker绑定inbound transfer queue和outbound queue,inbound queue用于接收message,outbound queue用于发送消息","attrs":{}},{"type":"text","text":"。","attrs":{}}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 发送消息时,单个线程从transferqueue中拉取数据,把这个tuple通过zeroMQ发送到其他worker中。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 接收Tuple,每个worker都会监听zeroMQ的tcp端口来接收消息,消息放到DisruptorQueue中后,后从queue中获取message(taskid,tuple),根据目的taskid,tuple的值路由到task中执行。每个tuple可以emit到direct steam中,也可以发送到regular stream中,在Reglular方式下,由Stream Group(stream id-->component id -->outbound tasks)功能完成当前tuple将要发送的Tuple的目的地。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 通过以上分析可以看到,Storm在伸缩性、容错性、高性能方面的从架构设计的角度得以支撑;同时在可靠性方面,Storm的ack组件利用异或xor算法在不失性能的同时,保证每一个消息得到完整处理的同时。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"实时推送","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 实时推送应用场景非常多,比如系统的监控动态的实时曲线绘制,手机消息的推送,web实时聊天等。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 实时推送有很多技术可以实现,有Comet方式,有websocket方式等。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Comet基于服务器长连接的“服务器推”技术,包含两种:","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Long Polling:服务器端在接到请求后挂起,有更新时返回连接即断掉,然后客户端再发起新的连接","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Stream方式:每次服务端数据传送不会关闭连接,连接只会在通信出现错误时,或是连接重建时关闭(一些防火墙常被设置为丢弃过长的连接, 服务器端可以设置一个超时时间, 超时后通知客户端重新建立连接,并关闭原来的连接)。","attrs":{}}]}]}],"attrs":{}},{"type":"heading","attrs":{"align":null,"level":5},"content":[{"type":"text","text":"Websocket:长连接,全双工通信","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 是HTML5的一种新的协议。它实现了浏览器与服务器的双向通讯。webSocket API中,浏览器和服务器端只需要通过一个握手的动作,便能形成浏览器与客户端之间的快速双向通道,使得数据可以快速的双向传播。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" Socket.io是一个NodeJS websocket库,包括客户端的JS和服务端的的nodejs,用于快速构建实时的web应用。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"数据存储","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 数据库存储大体分为以下几类,有关系型(事务型)的数据库,以oracle、mysql为代表,有keyvalue数据库,以redis和memcached db为代表,有文档型数据库如mongodb,有列式分布式数据库以HBase,cassandra,dynamo为代表,还有其他的图形数据库、对象数据 库、xml数据库等。每种类型的数据库应用的业务领域是不一样的,下面从内存型、关系型、分布式三个维度针对相关的产品做性能可用性等方面的考量分析。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"内存型数据库","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 内存型的数据库,以高并发高性能为目标,在事务性方面没那么严格,以开源nosql数据库mongodb、redis为例。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":5},"content":[{"type":"text","text":"Mongodb","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"通信方式","attrs":{}}]},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"多线程方式,主线程监听新的连接,连接后,启动新的线程做数据的操作(IO切换)。","attrs":{}}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"数据结构","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"数据库-->collection-->record","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"MongoDB在数据存储上按命名空间来划分,一个collection是一个命名空间,一个索引也是一个命名空间。","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"同一个命名空间的数据被分成很多个Extent,Extent之间使用双向链表连接。","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在每一个Extent中,保存了具体每一行的数据,这些数据也是通过双向链接连接的。","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"每一行数据存储空间不仅包括数据占用空间,还可能包含一部分附加空间,这使得在数据update变大后可以不移动位置。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"索引以BTree结构实现。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果你开启了jorunaling日志,那么还会有一些文件存储着你所有的操作记录。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"持久化存储","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"MMap方式把文件地址映射到内存的地址空间,直接操作内存地址空间就可以操作文件,不用再调用write,read操作,性能比较高。","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"mongodb调用mmap把磁盘中的数据映射到内存中的,所以必须有一个机制时刻的刷数据到硬盘才能保证可靠性,多久刷一次是与syncdelay参数相关的。","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"journal(进行恢复用)是Mongodb中的redo log,而Oplog则是负责复制的binlog。如果打开journal,那么即使断电也只会丢失100ms的数据,这对大多数应用来说都可以容忍了。从1.9.2+,mongodb都会默认打开journal功能,以确保数据安全。而且journal的刷新时间是可以改变的,2-300ms的范围,使用--journalCommitInterval命令。Oplog和数据刷新到磁盘的时间是60s,对于复制来说,不用等到oplog刷新磁盘,在内存中就可以直接复制到Sencondary节点。","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"事务支持","attrs":{}}]},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Mongodb只支持对单行记录的原子操作","attrs":{}}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"HA集群","attrs":{}}]},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"用的比较多的是Replica Sets,采用选举算法,自动进行leader选举,在保证可用性的同时,可以做到强一致性要求。","attrs":{}}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"当然对于大量的数据,mongodb也提供了数据的切分架构Sharding.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"部署平台","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"监控、统计","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"大型分布式系统涉及各种设备,比如网络交换机,普通PC机,各种型号的网卡,硬盘,内存等等,还有应用业务层次的监控,数量非常多的时候,出现错误的概率也会变大,并且有些监控的时效性要求比较高,有些达到秒级别;在大量的数据流中需要过滤异常的数据,有时候也对数据会进行上下文相关的复杂计算,进而决定是否需要告警。因此监控平台的性能、吞吐量、已经可用性就比较重要,需要规划统一的一体化的监控平台对系统进行各个层次的监控。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"平台的数据分类","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"应用业务级别:应用事件、业务日志、审计日志、请求日志、异常、请求业务metrics、性能度量","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"系统级别:CPU、内存、网络、IO","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"时效性要求","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"阀值,告警:","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"实时计算:","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"近实时分钟计算","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"按小时、天的离线分析","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"实时查询","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"节点中Agent代理可以接收日志、应用的事件以及通过探针的方式采集数据,agent采集数据的一个原则是和业务应用的流程是异步隔离的,不影响交易流程。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"数据统一通过collector集群进行收集,按照数据的不同类型分发到不同的计算集群进行处理;","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"有些数据时效性不是那么高,比如按小时进行统计,放入hadoop集群;","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"有些数据是请求流转的跟踪数据,需要可以查询的,那么就可以放入solr集群\\ES集群进行索引;","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"有些数据需要进行实时计算的进而告警的,需要放到storm集群中进行处理。","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"数据经过计算集群处理后,结果存储到Mysql或者HBase中。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"监控的web应用可以把监控的实时结果推送到浏览器中,也可以提供API供结果的展现和搜索。","attrs":{}}]}]}
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章