RocketMQ Streams:将轻量级实时计算引擎融合进消息系统

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"作者 | 袁小栋、程君杰"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"审核校对 | 杜恒、岁月、白玙、不周"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"随着各行各业移动互联和云计算技术的普及发展,大数据计算已深入人心,最常见的比如flink、spark等。这些大数据框架,采用中心化的Master-Slave架构,依赖和部署比较重,每个任务也有较大开销,有较大的使用成本。RocketMQ Streams着重打造轻量计算引擎"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#171A1D","name":"user"}}],"text":",除了消息队列,无额外依赖,对过滤场景做了大量优化,性能提升3-5倍,资源节省50%-80%。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"RocketMQ Streams适合大数据量->高过滤->轻窗口计算的场景,核心打造轻资源,高性能优势,在资源敏感场景中有很大优势,最低1core,1g可部署,建议的应用场景(安全,风控,边缘计算,消息队列流计算)。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"RocketMQ Streams兼容Blink(Flink的阿里内部版本) 的SQL,UDF\/UDTF\/UDAF,多数Blink任务可以直接迁移成RocketMQ Streams任务。将来还会发布和Flink的融合版本,RocketMQ Streams可以直接发布成Flink任务,既可以享有RocketMQ Streams带来的高性能,轻资源,还可以和现有的Flink任务统一运维和管理。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"本篇文章主要从五个方面来介绍RocketMQ Streams实时计算平台:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"首先简单先介绍一下什么是RocketMQ Streams;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"第二部分,基于RocketMQ Streams的SDK,来了解下它是怎么去使用的;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"第三部分,RocketMQ Streams整体的架构以及它的原理实现;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"第四部分,在云安全的场景下该怎么使用RocketMQ Streams;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"第五部分,RocketMQ Streams的未来规划。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"一、什么是RocketMQ Streams?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"本章节从基础简介、设计思路和特点三方面对RocketMQ streams进行整体介绍。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1.1 RocketMQ Streams简介"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1)首先,它是一个Lib包,启动即运行,和业务直接集成;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2)然后,它具备SQL引擎能力,兼容Blink SQL语法,兼容Blink UDF\/UDTF\/UDAF;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3)其次,它包含ETL引擎,可以无编码实现数据的ETL,过滤和转存;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"4)最后,它基于数据开发SDK,大量实用组件可直接使用,如:Source、sink、script、filter、lease、scheduler、configurable不局限流的场景。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/aa\/aa244c6ab90502ab0237c57e662295e7.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1.2 RocketMQ Streams的特点"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"RocketMQ streams基于上述的实现思路,可以看到它有以下几个特点:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"1.2.1 轻量"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1核1g就可以部署,依赖较轻,在测试场景下用Jar包直接写个main方法就可以运行,在正式环境下最多依赖消息队列和存储(其中存储是可选的,主要是为了分片切换时的容错)。"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"1.2.2 高性能"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"实现高过滤优化器,包括前置指纹过滤,同源规则自动归并,hyperscan加速,表达式指纹等,比优化前性能提升3-5倍,资源节省50%以上。"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"1.2.3 维表 JOIN(千万数据量维表支持)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"设计高压缩内存存储数据,无java头部和对齐的开销,存储接近原始数据大小,纯内存操作,性能最大化,同时对于Mysql提供了多线程并发加载,提高加载维表的速度。"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"1.2.4 高扩展的能力"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Source可按需扩展,已实现:RocketMQ,File,Kafka;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Sink可按需扩展,已实现:RocketMQ,File,Kafka,Mysql,ES;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"可按Blink规范扩展 UDF\/UDTF\/UDAF;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"提供了更轻的UDF\/UDTF扩展能力,不需要任何依赖就可以完成函数的扩展。"}]}]}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"1.2.5 提供了丰富的大数据的能力"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"包括精确计算一次灵活的窗口,双流join,统计,开窗,各种转换过滤,满足大数据开发的各种场景,支持弹性容错的能力。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"二、RocketMQ Streams的使用"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"RocketMQ Streams 对外提供两种SDK,一种是DSL SDK,一种是SQL SDK,用户可以按需选择; DSL SDK支持实时场景DSL语义; SQL SDK 兼容Blink(Flink的阿里内部版本) SQL的语法,多数Blink SQL可以通过RocketMQ Streams运行;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"接下来,我们详细的介绍一下这两种SDK。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2.1 环境要求"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" JDK1.8 版本以上;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Maven 3.2版本以上。"}]}]}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2.2 DSL SDK"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"利用DSL SDK开发实时任务时,需要做如下的一些准备工作:"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"2.2.1 依赖准备"}]},{"type":"codeblock","attrs":{"lang":"xml"},"content":[{"type":"text","text":"\n org.apache.rocketmq\n rocketmq-streams-clients\n 1.0.0-SNAPSHOT\n\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"准备工作完成后,就可以直接开发自己的实时程序。"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"2.2.2 代码开发"}]},{"type":"codeblock","attrs":{"lang":"java"},"content":[{"type":"text","text":"DataStreamSource source=StreamBuilder.dataStream(\"namespace\",\"pipeline\");\n\nsource.fromFile(\"~\/admin\/data\/text.txt\",false)\n .map(message->message + \"--\")\n .toPrint(1)\n .start();\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"其中:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1)Namespace是业务隔离的,相同的业务可以写成相同的Namespace。相同的Namespace在任务调度里可以跑在进程里,也可以共享一些配置;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2)pipelineName可以理解成就是job name ,唯一区分job;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3)DataStreamSource主要是创建Source,然后这个程序运行起来,最终的结果就是在原始的消息里面会加\"--\",然后把它打印出来。"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"2.2.3 丰富的算子"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"RocketMQ streams提供了丰富的算子, 包括:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"source算子:包括fromFile, fromRocketMQ, fromKafka 以及可以自定义source来源的from算子;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"sink 算子: 包括toFile, toRocketMQ, toKafka,toDB,toPrint, toES 以及可以自定义sink的to算子;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"action算子:包括Filter,Expression,Script,selectFields,Union,forEach,Split,Select,Join,Window 等多个算子。"}]}]}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"2.2.4 部署执行"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"基于DSL SDK完成开发,通过下面命令打成jar包,执行jar,或直接执行任务的main方法。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"java"},"content":[{"type":"text","text":"mvn -Prelease-all -DskipTests clean install -U\njava -jar jarName mainClass &\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2.3 SQL SDK"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"2.3.1 依赖准备"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#595959","name":"user"}}],"text":"  "}]},{"type":"codeblock","attrs":{"lang":"xml"},"content":[{"type":"text","text":" \n com.alibaba\n rsqldb-clients\n 1.0.0-SNAPSHOT\n\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"2.3.2 代码开发"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"首先开发业务逻辑代码, 可以保存为文件也可以直接使用文本;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"CREATE FUNCTION json_concat as 'xxx.xxx.JsonConcat';\n\nCREATE TABLE `table_name` (\n `scan_time` VARCHAR,\n `file_name` VARCHAR,\n `cmdline` VARCHAR,\n) WITH (\n type='file',\n filePath='\/tmp\/file.txt',\n isJsonData='true',\n msgIsJsonArray='false'\n);\n\n\n-- 数据标准化\n\ncreate view data_filter as\nselect\n *\nfrom (\n select\n scan_time as logtime\n , lower(cmdline) as lower_cmdline\n , file_name as proc_name\n from\n table_name\n)x\nwhere\n (\n lower(proc_name) like '%.xxxxxx'\n or lower_cmdline like 'xxxxx%'\n or lower_cmdline like 'xxxxxxx%'\n or lower_cmdline like 'xxxx'\n or lower_cmdline like 'xxxxxx'\n )\n;\n\nCREATE TABLE `output` (\n `logtime` VARCHAR\n , `lower_cmdline` VARCHAR\n , `proc_name` VARCHAR\n) WITH (\n type = 'print'\n);\n\ninsert into output\nselect\n *\nfrom\n aegis_log_proc_format_raw\n;\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"其中"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" CREATE FUNCTION:引入外部的函数来支持业务逻辑, 包括flink以及系统函数;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"CREATE Table:创建source\/sink;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"CREATE VIEW:执行字段转化,拆分,过滤;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"INSERT INTO:数据写入sink;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"函数:内置函数,udf函数。"}]}]}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"2.3.3 SQL扩展"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"RocketMQ streams支持三种SQL扩展能力,具体实现细节请看:"},{"type":"link","attrs":{"href":"rsqldb","title":null,"type":null},"content":[{"type":"text","text":"https:\/\/github.com\/alibaba\/rsqldb"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1)通过Blink UDF\/UDTF\/UDAF扩展SQL能力;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2)通过RocketMQ streams扩展SQL能力,只要实现函数名是eval的java bean即可;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3)通过现有java代码扩展SQL能力,create function 函数名就是java类的方法名。"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"2.3.4 SQL执行"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"你可以从"},{"type":"link","attrs":{"href":"https:\/\/github.com\/apache\/rocketmq-streams","title":null,"type":null},"content":[{"type":"text","text":"这里"}]},{"type":"text","text":"下载最新的Rocketmq Streams代码并构建。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"java"},"content":[{"type":"text","text":"cd rsqldb\/\nmvn -Prelease-all -DskipTests clean install -U\ncp rsqldb-runner\/target\/rocketmq-streams-sql-{版本号}-distribution.tar.gz 部署的目录\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"解压tar.gz包, 进入目录结构"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"java"},"content":[{"type":"text","text":"tar -xvf rocketmq-streams-{版本号}-distribution.tar.gz\ncd rocketmq-streams-{版本号}\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"其目录结构如下 "}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" bin 指令目录,包括启动和停止指令"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"conf 配置目录,包括日志配置以及应用的相关配置文件"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"jobs 存放sql,可以两级目录存储"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"ext 存放扩展的UDF\/UDTF\/UDAF\/Source\/Sink"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"lib 依赖包目录"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"log 日志目录"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"2.3.4.1 执行SQL"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"#指定sql的路径,启动实时任务\nbin\/start-sql.sh sql_file_path\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"2.3.4.2 执行多个SQL"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果想批量执行一批SQL,可以把SQL放到jobs目录,最多可以有两层,把sql放到对应目录中,通过start指定子目录或sql执行任务。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"2.3.4.3 任务停止"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"# 停止过程不加任何参数,则会将目前所有运行的任务同时停止\nbin\/stop.sh\n\n# 停止过程添加了任务名称, 则会将目前运行的所有同名的任务都全部停止\nbin\/stop.sh sqlname\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"2.3.4.4 日志查看"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"目前所有的运行日志都会存储在 log\/catalina.out文件中。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"三、架构设计及原理分析"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3.1 RocketMQ Streams设计思路"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在了解完RocketMQ streams的基本简介,接下来,我们看下RocketMQ streams的设计思路,设计思路主要从设计目标和策略两个方面来介绍:"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"3.1.1 设计目标"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"依赖少,部署简单,1核1g单实例可部署,可随意扩展规模;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"打造场景优势,重点打造大数据量->高过滤->轻窗口计算的场景,功能覆盖度要全,实现需要的大数据特性:Exactly-ONCE、灵活的窗口(滚动、滑动、会话窗口);"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"要在保持低资源的前提下,对高过滤有性能突破,打造性能优势;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"兼容Blink SQL,UDF\/UDTF\/UDAF,让非技术人员更容易上手。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"3.1.2 策略(适配场景:大数据量>高过滤\/ETL>低窗口计算)"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"采用shared-nothing的分布式架构设计,依赖消息队列做负载均衡和容错机制,单实例可启动,增加实例实现能力扩展,并发能力取决于分片数;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"利用消息队列的分片做shuffle,利用消息队列负载均衡实现容错;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"利用存储实现状态备份,实现Exactly-ONCE的语义。用结构化远程存储实现快速启动,不等本地存储恢复。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"重力打造过滤优化器,通过前置指纹过滤,同源规则自动归并,hyperscan加速,表达式指纹提高过滤性能"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/e7\/e76d95a0030f0a2019289afdc75e3d0f.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3.2 RocketMQ Streams Source的实现"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1)Source要求实现最少消费一次的语义,系统通过checkpoint系统消息实现,在提交offset前发送checkpoint消息,通知所有算子刷新内存。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2)Source支持分片的自动负载和容错"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"数据源在分片移除时,发送移除系统消息,让算子完成分片清理工作;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"当有新分片时,发送新增分片消息,让算子完成分片初始化。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3)数据源通过start方法,启动consuemr获取消息;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"4)原始消息经过编码,附加头部信息包装成Message投递给后续算子。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/1c\/1cca76f0483c7f580841a527f0cd2407.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3.3 RocketMQ Streams Sink的实现"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1)Sink是实时性和吞吐的一个结合;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2)实现一个sink只要继承AbstractSink类实现batchInsert方法即可。batchInsert的含义是一批数据写入存储,需要子类调用存储接口实现,尽量应用存储的批处理接口,提高吞吐;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3)常规的使用方式是写message->cache->flush->存储的方式,系统会严格保证每次批次写入存储的量不超过batchsize的量,如果超过了,会拆分成多批写入;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/51\/51f673ce31b340b21dee51196efe9441.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"4)Sink有一个cache,数据默认写cache,批次写入存储,提高吞吐(一个分片一个cache);"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"5)可以开启自动刷新,每个分片会有一个线程,定时刷新cache数据到存储,提高实时性。实现类:DataSourceAutoFlushTask;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"6)通过调用flush方法刷新cache到存储;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"7)Sink的cache会有内存保护,当cache的消息条数>batchSize,会强制刷新,释放内存。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3.4 RocketMQ Streams Exactly-ONCE实现"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1)Source确保在commit offset时,会发送checkpoint系统消息,收到消息的组件会完成存盘操作,消息至少消费一次;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2)每条消息会有消息头部,里面封装了queueld和offset;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2)组件在存储数据时,会把queueld和处理的最大offset存储下来,当有消息重复时,根据maxoffset去重;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3)内存保护,一个checkpoint周期可能有多次flush(条数触发),保障内存占用可控。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/27\/27debc75c6657bf3f0a341799a68a434.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3.5 RocketMQ Streams Window"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"实现方式:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1)支持滚动、滑动和会话窗口,支持事件时间和自然时间(消息进入算子的时间);"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2)支持Emit语法,可以在触发前或触发后,每隔n段时间,更新一次数据;比如1小时窗口,窗口触发前希望每分钟看到最新结果,窗口触发后希望不丢失迟到一天内的数据,且每10分钟更新数据。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3)支持高性能模式和高可靠模式,高性能模式不依赖远程存储,但在分片切换时,有丢失窗数据的风险;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"4)快速启动,无需等待本地存储恢复,在发生错误或分片切换时,异步从远程存储恢复数据,同时直接访问远程存储计算;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"5)利用消息队列负载均衡,实现扩容缩容容,每个queue是一份组,一个分组同一刻只被一台机器消费;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"6)正常计算依赖本地存储,具备flink相似的计算性能。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/22\/224a8430a195476c08baf70a13b49b32.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"四、RocketMQ Streams在安全场景的最佳实践"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"4.1 背景"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"从公共云转战专有云,遇到了新的问题。因为专有云像大数据这种saas服务是非必须输出的,且最小输出规模也比较大,用户成本会增加很多,难落地,导致安全能力无法快速同步到专有云。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/cd\/cd110af3ee715c3fe204f55115bf88ba.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"4.2 解决办法"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"RocketMQ Streams在云安全的应用-流计算"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"基于安全场景打造轻量级计算引擎,基于安全高过滤的场景特点,可以针对高过滤场景优化,然后再做较重的统计、窗口、join操作,因为过滤率比较高,可以用更轻的方案实现统计和join操作;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"SQL和引擎都可热升级"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/95\/95079fc049905f6f608e698480d6e7a5.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","marks":[{"type":"italic"}],"text":"业务结果"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"}],"text":"1)规则覆盖:自建引擎,覆盖100%规则(正则,join,统计);"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"}],"text":"2)轻资源,内存是公共云引擎的1\/24,cpu是1\/6,依赖过滤优化器,资源不随规则线性增加,新增规则无资源压力,通过高压缩表,支持千万情报;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"}],"text":"3)SQL发布,通过c\/s部署模式,SQL引擎热发布,尤其护网场景,可快速上线规则;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"}],"text":"4)性能优化,对核心组件进行专题性能优化,保持高性能,每实例(2g,4核,41规则)5000qps以上。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"五、RocketMQ Streams的未来规划"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"5.1 打造RocketMQ一体化计算能力"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1)和RocketMQ整合,去除DB依赖,融合RocketMQ KV;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2)和RocketMQ混部,支持本地计算,利用本地特点,打造高性能;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3)打造边缘计算最佳实践"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"5.2 Connector增强"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1)支持pull消费方式,checkpoint异步刷新;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2)兼容blink\/flink connector。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"5.3 ETL能力建设"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1)增加文件,syslog的数据接入能力"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2)兼容Grok解析,增加常用日志的解析能力;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3)打造日志ETL 的最佳实践"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"5.4 稳定性和易用性打造"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1)Window多场景测试,提升稳定性,性能优化;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2)补充测试用例,文档,应用场景。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"六、开源地址"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"RocketMQ-Streams: "},{"type":"link","attrs":{"href":"https:\/\/github.com\/apache\/rocketmq-streams","title":null,"type":null},"content":[{"type":"text","text":"https:\/\/github.com\/apache\/rocketmq-streams"}]},{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"RocketMQ-Streams-SQL:"},{"type":"link","attrs":{"href":"https:\/\/github.com\/alibaba\/rsqldb","title":null,"type":null},"content":[{"type":"text","text":"https:\/\/github.com\/alibaba\/rsqldb"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"以上是本次对RocketMQ stream的整体介绍,希望对大家有所帮助和启发。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]}]}
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章