Kylin: Five Years of Growth and Plans for the Future

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Apache Kylin 5 Year Milestone"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"At the end of 2015, Kylin officially graduated from the Apache Software Foundation's Incubator and became the first Apache top-level project to originate from China. Over the past five years the Kylin community has kept growing into a very active technical community, whose collaborative development has turned Kylin into a high-performance analytical data warehouse for big data."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/ce\/72\/ce4ca6ec56dc80c7875230029d427c72.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Major milestones in Kylin's development over the past five years"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1) Kylin technology overview"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Early 2016: Kylin 1.5 introduces a plug-in architecture"}]}]},{"type":"listitem","attrs":{"listStyle":"none"},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This laid a solid foundation for adding more data sources, build engines, and storage engines later."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Late 2016: Kylin 1.6 supports Kafka as a real-time data source"}]}]},{"type":"listitem","attrs":{"listStyle":"none"},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Data is loaded in micro-batches, which cuts data-loading latency sharply, from days or several hours down to minutes."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"April 2017: Kylin 2.0 is released"}]}]},{"type":"listitem","attrs":{"listStyle":"none"},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"It officially supports Spark as a build engine and adds support for snowflake schemas."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"March 2018: the eBay team contributes Cube Planner and Dashboard"}]}]},{"type":"listitem","attrs":{"listStyle":"none"},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"These make Kylin Cube optimization far more convenient than before. Users can sample and analyze their data and let an algorithm decide which cuboids need to be computed. Combined with the query history collected in Dashboard, the Cube can be slimmed down further, which greatly improves both performance and storage."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"September 2018: Kylin 2.5 supports Hadoop 3.0 and HBase 2.0"}]}]},{"type":"listitem","attrs":{"listStyle":"none"},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Latency can be reduced further, from minutes to seconds. From this point on, Kylin can serve historical, near-real-time, and real-time queries at the same time."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Mid-2020: Kylin 3.1 is released"}]}]},{"type":"listitem","attrs":{"listStyle":"none"},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"It supports Flink as a compute engine. MapReduce, Spark, and Flink are now all integrated with Kylin, so users can pick whichever engine they prefer for Cube computation."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"September 2020: Kylin 4.0 Alpha is officially released"}]}]},{"type":"listitem","attrs":{"listStyle":"none"},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A brand-new build engine and query engine greatly improve build and query performance and resolve pain points such as the single-point bottleneck in queries. Removing the HBase dependency goes a long way toward solving Kylin's operational difficulties and makes it possible to separate Kylin's compute from its storage."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2) Growth of the Kylin community"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Over the past five years Kylin has won a great many users and is widely used for data analysis and reporting. Its users include Internet companies such as eBay, Yahoo! JAPAN, Meituan, and NetEase, manufacturers such as Xiaomi, Huawei, and vivo, and a group of overseas users such as Visa, Cisco, Decathlon, and Vodafone."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/71\/3c\/712532d07fd8c92bd4959ba49380103c.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The Kylin community has also kept growing during these five years. In 2015 there were only 16 initial developers, and Kylin shipped 5 releases that year. The community has grown steadily since then: as of December 2020 there are 44 committers, 24 of whom are PMC members, along with more than 180 code contributors."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"From 2015 to 2020 Kylin published 36 Apache releases, an average of about 6 to 7 per year. Jira issues and GitHub stars also give a rough picture of how active the community is."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/ed\/8b\/ed11ed103691cf991dc043a229ff028b.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"The curves show that the number of JIRA issues has kept rising, which indicates that many users are active in the Kylin community every year, and the number of GitHub stars shows the same sustained growth."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Latest release: Apache Kylin 4.0"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1) Kylin 4.0 development timeline"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"On timing, the Kylin 4.0 Beta will be released at the end of this month at the earliest, or in early 2021, and the GA release is planned for mid-2021."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/fc\/e3\/fcc2f56f70c481520806d7eafac7ffe3.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A major goal of the Kylin 4.0 redesign is to replace HBase storage with Parquet storage. HBase is a complex system that makes Kylin much harder to operate, which in turn raises maintenance and scale-out costs for enterprises. Going forward we want Kylin to evolve into an architecture that separates storage from compute, using lightweight columnar storage to make operations easier for users. Both the Cube compute engine and the query engine are being replaced with Spark. The whole design still builds on Kylin's pluggable architecture, but because storage is such a fundamental module, the change requires many modifications in the layers above and below it."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The 4.0 Alpha release is already available on the Kylin website for anyone who would like to download and preview it. Basic functionality, including Cube computation and querying, is relatively complete."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The 4.0 Beta is expected at the end of this month or in early 2021. Work is under way on advanced features and their integration with the new storage and query engines, including Cube Planner and Dashboard, as well as on performance tuning for the storage and query engines."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"By the 4.0 GA in mid-2021, Kafka data source integration and an upgrade to Spark 3.0 will be added, and other advanced features still being planned will also arrive in the GA release."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2) Kylin 4.0 performance overview"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This section compares Kylin 4.0 with 3.0 on offline Cube build performance and online SQL query performance."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"Offline Cube build performance"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/d4\/c2\/d48e5807726eccdf09300677683f9fc2.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"First, offline Cube build performance: "},{"type":"text","marks":[{"type":"strong"}],"text":"the chart on the left compares build time, using the MapReduce engine in 3.0 against the Spark engine in 4.0. Build time in 4.0 is 30% to 50% lower than in 3.0, which means that in the best case 4.0 builds a Cube twice as fast as 3.0."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"On storage, the chart on the right shows that "},{"type":"text","marks":[{"type":"strong"}],"text":"a Cube in 4.0 is almost 40% smaller than in 3.0"},{"type":"text","text":", because 3.0 stores Cube data twice: once as intermediate results on HDFS for future segment merges, and once in HBase for queries. In 4.0 a single copy of Parquet data serves both merging and querying, so the storage footprint of 4.0 is roughly 1\/3 of that of 3.0."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"Online SQL query performance"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/02\/c5\/02c1e1acc69a2141d51c8bd9712bc8c5.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The chart above uses two typical benchmark datasets. On the left is SSB, a relatively simple analytical scenario. For most of these simple queries, 4.0 performs close to 3.0 but slightly slower: query time in 3.0 is around 0.5 seconds, and 4.0 is about 0.2 to 0.3 seconds slower."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The chart on the right uses the more complex TPC-H model, where queries are more involved. Many queries on 3.0 take around 10 seconds, or 20 to 30 seconds. Kylin 4.0 relies on Spark's distributed computation and distributed aggregation to spread the load off the Kylin query node (and it no longer loads or decodes dictionaries), which improves query performance further. The chart clearly shows that on some of the slow queries Kylin 4.0 is roughly 10 times faster than 3.0."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In short, Kylin 4.0 is roughly on par with 3.0 for small, simple queries and dramatically faster for complex, slow queries."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"The future of Apache Kylin"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1) Cloud-Native"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As cloud computing becomes widespread, many things that used to be possible only on Hadoop now have new options. At the same time, the cloud brings major changes to enterprise applications and especially to IT architecture."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/b1\/20\/b1825fb5c615e3003ca2b61519a48320.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"More and more enterprises are embracing cloud-native architecture so that they are better suited to deploying on the cloud. Kylin will likewise separate compute from storage, so that compute resources and storage resources can scale elastically and independently, which makes resource use more efficient."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In addition, to isolate applications from one another better and push them toward higher performance and stability, we want to adjust Kylin's overall architecture so that it deploys well on the cloud and can use cloud-native storage and compute frameworks as well as cloud-native monitoring and operations frameworks. The most popular technologies here are container orchestration, represented by Kubernetes, and the new generation of distributed object storage on the cloud, represented by S3."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2) Real-time analytics"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In the past, most big data processing used batch jobs to transform, cleanse, and aggregate data, because batch processing is relatively simple and its high throughput is cost-efficient. Its biggest problem is data latency, typically T+1: analysis results become visible only a day, or at best several hours, later."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Today more and more enterprises demand fresher data, and the business wants processing time brought down to minutes or even seconds. Real-time computing technologies have become increasingly popular in recent years, including Spark Structured Streaming, Flink, and other stream-processing frameworks. For OLAP, we want Kylin to keep supporting batch data loading while also supporting stream processing, unifying stream and batch."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/cc\/5a\/cca2a11c20bbefe8bbc8461fe0afac5a.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As Kylin 3.0 already shows, we have unified stream and batch computing, so in the future users will only need to operate a single OLAP platform to serve different scenarios."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3) Artificial intelligence and machine learning"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Artificial intelligence means teaching machines to think the way people do: through model training, machines can make judgments and decisions automatically, which reduces manual effort and makes people more productive. How can AI be made smarter, with more accurate and more mature models? That takes more training data, and this is where the value of combining big data with machine learning lies."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/a9\/ca\/a94ddf53a0d020d5d21e7bf4b8d0d0ca.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Until now Kylin has mainly served BI: data is collected and connected to BI tools so that analysts can see what fundamental changes are taking place in the data. In the future we want Kylin to use this data to empower AI, so that value is mined directly from the data and presented to people, rather than people having to extract it through BI."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Summary"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Finally, here is a summary of what we expect from Kylin in the future."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/86\/94\/865195bf15e8b3f7b56d3cc2265d1394.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We want Kylin to grow into a cloud-native analytical data warehouse that supports both batch and stream processing. It should serve BI on the one hand and support AI on the other. Compared with other technologies and engines, it has a clear advantage in high performance and high concurrency. In the future its underlying architecture should be able to run directly on lightweight distributed computing frameworks or be deployed directly in containers, and it should connect to many kinds of data sources, including file collections, real-time streams, traditional RDBMSs, and the data lakes that are so popular today. We also want deploying, operating, monitoring, scaling out, and scaling in Kylin to become easier, so that users' total cost ends up substantially lower than before."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/80\/75\/8055f6e6cbdcfd0399d3510822645a75.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"By Kylin 5.0, we hope it will once again unify stream and batch OLAP analysis, this time on a cloud-native architecture, and add support for machine learning and AI."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"About the author"},{"type":"text","text":":"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Shaofeng Shi (史少锋) is a partner and chief software architect at Kyligence, a core developer of Apache Kylin, and chair of its Project Management Committee (PMC Chair), focusing on big data analytics and cloud computing. He was previously a senior big data engineer in eBay's global analytics infrastructure department and a cloud computing software architect at IBM."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"This article is reposted from the WeChat official account apachekylin (ID: ApacheKylin)."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Original link"},{"type":"text","text":":"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/mp.weixin.qq.com\/s?__biz=MzAwODE3ODU5MA==&mid=2653081989&idx=1&sn=7bf3941ef5f63eebf3f800cf92299862&chksm=80a4af74b7d32662914b25b8197633a8d708637eebb5d2a39fb88301583d80a5c676d6fa5434&token=1340822333&lang=zh_CN#rd","title":"","type":null},"content":[{"type":"text","text":"Kylin: Five Years of Growth and Plans for the Future"}]}]}]}