同程旅行大数据集群在 Kubernetes 上的服务化实践

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"前言"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"同程旅行大数据集群从 2017 年开始容器化改造,经历了自研调度 docker 容器 ,到现在的"},{"type":"codeinline","content":[{"type":"text","text":"云舱"}]},{"type":"text","text":"平台,采用 "},{"type":"codeinline","content":[{"type":"text","text":"Kubernetes"}]},{"type":"text","text":" 调度编排工具管理大数据集群服务。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/fd\/d2\/fd07cba5e9553f999d2c3c5f240d5ed2.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在这个过程中遇到很多问题和难点,本文会向大家介绍上云过程中总结的经验和教训。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"今天的议题主要分下面几点来阐述:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"为什么要将大数据集群服务搬到Kubernetes上"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在上云的过程遇到哪些痛点"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"大数据服务上云攻略"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"现状和未来发展"}]}]}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"集群即服务的理念"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"部门内部很早就提出集群即服务的理念,作为基础组件研发,希望从产品的角度来看待组件或者集群,让业务研发能直接触达底层集群,可以包含节点、日志、监控等功能,让集群使用更简单。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"推行小集群化"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"以前组件研发部署一个组件集群,这个集群会陆续承接一些业务,时常会遇到A业务影响B业务,集群负责人会开始考虑拆分,搭建出一个新集群将消耗资源的业务拆分出去。这种是以人工介入的方式去评估业务体量并分配资源。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"现在部门开始推行小集群模式,每个业务研发组都可以申请一个或者多个集群,在物理层面做到资源隔离,互不影响,不会因为A业务的流量上升而影响其他业务。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"自动化运维建设"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"小集群化会导致集群数量成倍的上升,如果不做自动化运维,人力会远远跟不上业务增长,到那时组件研发会淹没在救火和运维的海洋。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"所以需要构建一个集群全流程自动化平台。这里面包含服务申请,服务部署,服务运维等功能。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/89\/a7\/897a240def1127cb6b6cdfed2ffec4a7.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"如何利用 Kubernetes 利器"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"起初自研编排工具去调度容器,但是实现的东西太多,在人力有限的情况下,认为这条路不可行。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2019年开始采用 "},{"type":"codeinline","content":[{"type":"text","text":"Kubernetes"}]},{"type":"text","text":" 调度编排容器,先后采取过用"},{"type":"codeinline","content":[{"type":"text","text":"Helm"}]},{"type":"text","text":" 工具编写模板部署组件,用"},{"type":"codeinline","content":[{"type":"text","text":"Operator"}]},{"type":"text","text":"的方式管理服务,用Statefulset\/Deployment 部署大数据集群。这些方式最后都被放弃。Helm 只是解决了部署的问题,想要基于 "},{"type":"codeinline","content":[{"type":"text","text":"Helm"}]},{"type":"text","text":" 做平台精细化运维比较麻烦。Operator的理念是针对某个组件做自定义CRD,大数据服务有十几种组件,为每个组件专门定制Operator,运维和开发成本过大,基于此还要解决Operator和平台层的交互逻辑,这个也不适合同程的人力配比。"},{"type":"codeinline","content":[{"type":"text","text":"Statefulset"}]},{"type":"text","text":"和"},{"type":"codeinline","content":[{"type":"text","text":"Deployment"}]},{"type":"text","text":" 没法做到精细化运维,比如业务提出关闭某个指定的点,当业务逻辑和底层运维逻辑耦合在一起的时候,已经封装好的 Workload 并不能拿来即用。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"由于是大数据生态,同程选择采用"},{"type":"codeinline","content":[{"type":"text","text":"Java Client"}]},{"type":"text","text":" 和 "},{"type":"codeinline","content":[{"type":"text","text":"Kubernetes"}]},{"type":"text","text":" 进行交互,在"},{"type":"codeinline","content":[{"type":"text","text":"Kuberentes"}]},{"type":"text","text":" 上自研 "},{"type":"codeinline","content":[{"type":"text","text":"云舱"}]},{"type":"text","text":" 调度器,将运维侧业务逻辑和平台交互代码放在一起,构建了一套适合自己的大数据服务自动化运维框架,当前覆盖了几乎所有的大数据服务,计算组件有Hive、Presto、Yarn,存储组件有 HDFS、ClickHouse、Kafka、Kudu等。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/7a\/e2\/7af40a43862647261e9a2fc5d76ba0e2.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"上云过程遇到了哪些痛点"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Kubernetes 环境问题"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"由于大数据组件有很多是分布式存储系统,组件本身会要求客户端和服务端能够网络互通,端到端的建立连接。这就需要"},{"type":"codeinline","content":[{"type":"text","text":"Kubernetes"}]},{"type":"text","text":"容器网络要和外部物理网络打通,当然也可以采用"},{"type":"codeinline","content":[{"type":"text","text":"Proxy"}]},{"type":"text","text":"层来屏蔽底层存储。同程大数据选择构建 "},{"type":"codeinline","content":[{"type":"text","text":"Underlay"}]},{"type":"text","text":" 的容器网络,做到IP保持,容器IP提前分配,IP自动回收等功能。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"将"},{"type":"codeinline","content":[{"type":"text","text":"Service"}]},{"type":"text","text":"层网络和公司四层负载 "},{"type":"codeinline","content":[{"type":"text","text":"TVS"}]},{"type":"text","text":" 服务做到很好的集成,利用Endpoints和Service 事件监听来保证负载数据的一致性。由于网络环境的限制,一个机房没有办法只搭建一个Kuberntes集群,需要支持一个应用跨多"},{"type":"codeinline","content":[{"type":"text","text":"Kubernetes"}]},{"type":"text","text":"集群部署,负载服务要支持跨多个"},{"type":"codeinline","content":[{"type":"text","text":"Kubernetes"}]},{"type":"text","text":"集群的应用负载。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"DNS"}]},{"type":"text","text":" 层采用子域的方式做到Kubernetes 内部"},{"type":"codeinline","content":[{"type":"text","text":"CoreDns"}]},{"type":"text","text":" 和公司"},{"type":"codeinline","content":[{"type":"text","text":"DNS"}]},{"type":"text","text":"服务器数据同步,保证一致性,保证内外部域名通信一致。由于一些组件迁移的需求,需要提供在容器拉起来之前预先配置"},{"type":"codeinline","content":[{"type":"text","text":"DNS"}]},{"type":"text","text":"和"},{"type":"codeinline","content":[{"type":"text","text":"IP"}]},{"type":"text","text":"映射的功能,所以只好根据已知的"},{"type":"codeinline","content":[{"type":"text","text":"Pod"}]},{"type":"text","text":"标识,提前分配IP。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"基于Pod的方式管理容器"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"刚开始的时候采用"},{"type":"codeinline","content":[{"type":"text","text":"Statefulset"}]},{"type":"text","text":"来部署一些服务,一些开源的Operator也是基于"},{"type":"codeinline","content":[{"type":"text","text":"STS"}]},{"type":"text","text":"管理服务,比如我正在持续贡献的 "},{"type":"codeinline","content":[{"type":"text","text":"TiDB Operator"}]},{"type":"text","text":" 、"},{"type":"codeinline","content":[{"type":"text","text":"Prometheus Operator"}]},{"type":"text","text":"。虽然可以复用已有Workload的功能,但是当场景复杂,这么做反而会缝缝补补。大数据组件就是这样一个复杂的场景,所以决定采用纯"},{"type":"codeinline","content":[{"type":"text","text":"Pod"}]},{"type":"text","text":"管理容器,基于Pod去组装成 "},{"type":"codeinline","content":[{"type":"text","text":"Group"}]},{"type":"text","text":"。比如HDFS组件,会拆分成 "},{"type":"codeinline","content":[{"type":"text","text":"namenode"}]},{"type":"text","text":" 、"},{"type":"codeinline","content":[{"type":"text","text":"journalnode"}]},{"type":"text","text":"、"},{"type":"codeinline","content":[{"type":"text","text":"datanode"}]},{"type":"text","text":" 这三个"},{"type":"codeinline","content":[{"type":"text","text":"Group"}]},{"type":"text","text":",每个"},{"type":"codeinline","content":[{"type":"text","text":"Group"}]},{"type":"text","text":"可以理解为是同一种节点类型的容器。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/6d\/17\/6d9e26daedab4a1a67ab2344324a6b17.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Pod 配置有状态"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"存储组件有个明显的特性就是配置文件中会有一个唯一标识,比如"},{"type":"codeinline","content":[{"type":"text","text":"Zookeeper"}]},{"type":"text","text":"的 "},{"type":"codeinline","content":[{"type":"text","text":"myid"}]},{"type":"text","text":" , "},{"type":"codeinline","content":[{"type":"text","text":"Kafka"}]},{"type":"text","text":" 的 "},{"type":"codeinline","content":[{"type":"text","text":"broker id"}]},{"type":"text","text":"。将老集群逐步迁移到"},{"type":"codeinline","content":[{"type":"text","text":"Kubernetes"}]},{"type":"text","text":"上的时候,这些配置项需要自定义且持久化。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/6b\/bc\/6b5e35997a54d35bf7ac782dec5fe3bc.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果组件本身的配置文件格式比较固定,会做成模板化,将特定的配置项抽出来提供给组件研发配置,通过环境变量的方式注入到容器中。对于自定义特别强的组件,会基于"},{"type":"codeinline","content":[{"type":"text","text":"ConfigMap"}]},{"type":"text","text":"做配置的版本控制,让组件研发可以很方便的填写配置并推送配置,"},{"type":"codeinline","content":[{"type":"text","text":"ClickHouse"}]},{"type":"text","text":" 就是非常自定义配置的组件。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"以虚拟机的方式启动容器"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"用"},{"type":"codeinline","content":[{"type":"text","text":"Kubernetes"}]},{"type":"text","text":"部署有状态服务的时候,由于配置错误会导致容器反复"},{"type":"codeinline","content":[{"type":"text","text":"crash"}]},{"type":"text","text":",这个时候组件研发只希望快速进入现场排查问题,所以针对存储类组件均采用"},{"type":"codeinline","content":[{"type":"text","text":"tail -F"}]},{"type":"text","text":"的方式启动容器,让服务进程作为后台进程启动,配置完善的健康检查,快速发现节点的不健康性。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"这种方式虽然违反了"},{"type":"codeinline","content":[{"type":"text","text":"Kubernetes"}]},{"type":"text","text":"的设计原则,但是易用性会显著提升。在部署Yarn组件的时候,由于"},{"type":"codeinline","content":[{"type":"text","text":"tail -F"}]},{"type":"text","text":"命令为主进程,导致大量僵尸进程,最后改用"},{"type":"codeinline","content":[{"type":"text","text":"bash"}]},{"type":"text","text":"命令启动。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"资源异构问题和多盘挂载问题"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在部署 Yarn 组件过程中,由于机器规格的问题,导致同一个应用节点之间的资源配置不一样,我们设计采用划分资源池,将相同规格的机器分为一个资源池,一个应用根据资源池的配置来调整合适的资源。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在"},{"type":"codeinline","content":[{"type":"text","text":"Kubernetes"}]},{"type":"text","text":"中使用本地盘,一般会推荐"},{"type":"codeinline","content":[{"type":"text","text":"localpv"}]},{"type":"text","text":"的方式,大数据某些组件会采用多盘写入的方式部署,"},{"type":"codeinline","content":[{"type":"text","text":"local pv"}]},{"type":"text","text":"的方式并不能解决这个问题。同程大数据选择采用"},{"type":"codeinline","content":[{"type":"text","text":"hostpath"}]},{"type":"text","text":"+"},{"type":"codeinline","content":[{"type":"text","text":"nodeselector"}]},{"type":"text","text":"的方式来做到多盘绑定且节点不漂移。在提交给"},{"type":"codeinline","content":[{"type":"text","text":"Kubernetes Scheduler"}]},{"type":"text","text":"之前,会在"},{"type":"codeinline","content":[{"type":"text","text":"云舱Scheduler"}]},{"type":"text","text":"基于资源池和节点信息对容器提前做一层调度。起初准备用"},{"type":"codeinline","content":[{"type":"text","text":"hostpath"}]},{"type":"text","text":"+"},{"type":"codeinline","content":[{"type":"text","text":"nodename"}]},{"type":"text","text":"的方式来做到节点不漂移,但是"},{"type":"codeinline","content":[{"type":"text","text":"nodename"}]},{"type":"text","text":" 会跳过 Scheduler update 步骤,并不会进行 "},{"type":"codeinline","content":[{"type":"text","text":"bind"}]},{"type":"text","text":" pvc等步骤。详情可以参考 "},{"type":"link","attrs":{"href":"https:\/\/github.com\/kubernetes\/kubernetes\/issues\/93145","title":"","type":null},"content":[{"type":"text","text":"issue 93145"}]},{"type":"text","text":"。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"DNS 问题"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"大数据里面很多组件节点都采用 "},{"type":"codeinline","content":[{"type":"text","text":"hostname"}]},{"type":"text","text":" 作为节点标识,比如"},{"type":"codeinline","content":[{"type":"text","text":"NodeManager"}]},{"type":"text","text":"采用"},{"type":"codeinline","content":[{"type":"text","text":"hostname"}]},{"type":"text","text":"注册,"},{"type":"codeinline","content":[{"type":"text","text":"Hbase"}]},{"type":"text","text":"组件要支持域名反解,"},{"type":"codeinline","content":[{"type":"text","text":"Kudu"}]},{"type":"text","text":"的master节点依赖自身的域名提前通信。这些都违背了Kubernetes的设计理念,"},{"type":"codeinline","content":[{"type":"text","text":"Kubernetes"}]},{"type":"text","text":" 创建容器,CNI分配得到IP,进程启动OK,容器变成Ready状态,Pod的Service域名才能通信。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"同程大数据选择用"},{"type":"codeinline","content":[{"type":"text","text":"Host"}]},{"type":"text","text":"网络部署大部分的存储组件,沿用宿主机网络,除了"},{"type":"codeinline","content":[{"type":"text","text":"Kubernetes"}]},{"type":"text","text":"集群子域外再创建一个子域用于组件本身标识,这样组件迁移会很方便,也不有网络损耗的烦恼。但是要做好宿主机端口的管理划分。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"调度问题"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"为了提升资源利用率,"},{"type":"codeinline","content":[{"type":"text","text":"云舱"}]},{"type":"text","text":" 平台会有很多分时段的部署任务和资源销毁任务。比如某个"},{"type":"codeinline","content":[{"type":"text","text":"Yarn"}]},{"type":"text","text":"集群,晚上的时候,对可以混部的资源池打上标签,在晚高峰的时候尽可能的扩容"},{"type":"codeinline","content":[{"type":"text","text":"NodeManager"}]},{"type":"text","text":"。这个类似于"},{"type":"codeinline","content":[{"type":"text","text":"HPA"}]},{"type":"text","text":",由于业务逻辑的复杂性,同程基于自研 "},{"type":"codeinline","content":[{"type":"text","text":"云舱Scheduler"}]},{"type":"text","text":" 做到这一点。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"大数据服务基于Kubernetes的架构体系"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"从"},{"type":"codeinline","content":[{"type":"text","text":"2019"}]},{"type":"text","text":"年开始转向 "},{"type":"codeinline","content":[{"type":"text","text":"Kubernetes"}]},{"type":"text","text":" 到现在,同程已经建立了一套成熟的大数据服务"},{"type":"codeinline","content":[{"type":"text","text":"PAAS"}]},{"type":"text","text":"体系。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/71\/7c\/71a6f060bd2bef106043cdd9369e547c.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"基于Kubernetes屏蔽底层的基础设施,支持多机房多"},{"type":"codeinline","content":[{"type":"text","text":"Kubernetes"}]},{"type":"text","text":"集群的应用部署,除了要考虑各种大数据服务如何迁移上云,也要考虑整个平台的易用性,让组件研发无需登录机器进行运维和迁移等操作。同程自研了"},{"type":"codeinline","content":[{"type":"text","text":"云舱"}]},{"type":"text","text":"平台,主要承担这一职责。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"考虑到业务研发的接入成本,学习成本,研发"},{"type":"codeinline","content":[{"type":"text","text":"控制台"}]},{"type":"text","text":"平台,让只读的集群信息和集群管理结合起来。改变以前底层信息触摸不到的情景,让业务研发也能在平台层获取更多的信息,可以对自己的服务做出一些合理的判断。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"监控收集"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"使用"},{"type":"codeinline","content":[{"type":"text","text":"Thanos"}]},{"type":"text","text":"+"},{"type":"codeinline","content":[{"type":"text","text":"Prometheus Operator"}]},{"type":"text","text":"框架部署收集各个组件集群的监控,按照以下原则来做到监控的可扩展。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"一个组件集群对应一个"},{"type":"codeinline","content":[{"type":"text","text":"Prometheus"}]},{"type":"text","text":"节点"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"每个组件都对应一套独立的Thanos集群,"},{"type":"codeinline","content":[{"type":"text","text":"Thanos Query"}]},{"type":"text","text":" 聚合同一组件的所有集群,"},{"type":"codeinline","content":[{"type":"text","text":"Thanos Rule"}]},{"type":"text","text":" 通过自研的"},{"type":"codeinline","content":[{"type":"text","text":"Sidecar"}]},{"type":"text","text":"同步组件报警规则,部署独立的"},{"type":"codeinline","content":[{"type":"text","text":"AlterManager"}]},{"type":"text","text":",独立的"},{"type":"codeinline","content":[{"type":"text","text":"Grafana"}]},{"type":"text","text":"应用。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"每个组件都有一个ceph bucket,将历史监控数据存储到"},{"type":"codeinline","content":[{"type":"text","text":"Ceph"}]},{"type":"text","text":"中。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/d9\/e6\/d95b82a4f5ab721b4ca6cd4004a270e6.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"监控域名规则配置如下:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Prometheus: "},{"type":"codeinline","content":[{"type":"text","text":"\/prometheus\/\/"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Thanos Query:"},{"type":"codeinline","content":[{"type":"text","text":" \/thanos\/"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Thanos Rule:"},{"type":"codeinline","content":[{"type":"text","text":" \/thanos\/rule\/\/alerts"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"AlertManger: "},{"type":"codeinline","content":[{"type":"text","text":" \/thanos\/alert\/\/#\/alerts"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Grafana: "},{"type":"codeinline","content":[{"type":"text","text":" \/grafana\/"}]}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"集群服务日志收集"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"使用"},{"type":"codeinline","content":[{"type":"text","text":"Filebeat"}]},{"type":"text","text":"采集集群节点的服务日志,将"},{"type":"codeinline","content":[{"type":"text","text":"Filebeat"}]},{"type":"text","text":"容器和服务容器放在一个"},{"type":"codeinline","content":[{"type":"text","text":"Pod"}]},{"type":"text","text":"中,用富容器的方式来启动服务。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/1e\/9f\/1e5fcedb5e3a52e5a469b3072825a39f.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在"},{"type":"codeinline","content":[{"type":"text","text":"Flink"}]},{"type":"text","text":"计算层做日志诊断,提供配置规则动态更新,便于更快速发现集群的故障问题。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"集群生命周期平台化"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"一个组件的集群从申请创建到服务销毁中间包含很多环节,应该将这些环节程序并平台化,让基础技术能以平台代码的形式沉淀下来。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"下图是用户申请"},{"type":"codeinline","content":[{"type":"text","text":"Hbase"}]},{"type":"text","text":"集群服务的工单,用户在申请的时候只需要填写少量配置。简单就是让业务少思考。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/59\/89\/590c44858e804178214574b719f8d789.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"组件"},{"type":"codeinline","content":[{"type":"text","text":"控制台"}]},{"type":"text","text":"为业务研发侧提供只读信息,例如集群信息、监控、日志、报警等功能,和组件本身管控平台相结合,不提供操作或者运维集群的功能。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/2a\/3e\/2ab2a6bbfyyf3722a9a35f627f1ff23e.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"云舱平台"}]},{"type":"text","text":"会为组件研发提供完善的运维和诊断功能,让他们无需关心底层基础设施层。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/72\/81\/7247ab33b6167f3954bbbe427981e381.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"集群服务化后,计费,报警配置,日志诊断能功能都能轻松的集成起来。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/c5\/8b\/c5fb3d5f6a9d37250cb3baefb38ab98b.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"自研大数据云原生服务框架"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"云舱"}]},{"type":"text","text":"平台将服务分为单个容器和多个容器,用数量来区分,在此之上用组装的方式支持多节点类型,一个节点类型对应一个"},{"type":"codeinline","content":[{"type":"text","text":"Group"}]},{"type":"text","text":",这个Group就是一组相同规格的容器。比如Kudu组件就分成两个"},{"type":"codeinline","content":[{"type":"text","text":"Group"}]},{"type":"text","text":",master和tserver两个"},{"type":"codeinline","content":[{"type":"text","text":"Group"}]},{"type":"text","text":"。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"用一个UML图来简单描述代码层结构:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/b3\/3c\/b3caf2f7f04bdd70221fcbdbdc1d9d3c.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"对"},{"type":"codeinline","content":[{"type":"text","text":"Kubernetes"}]},{"type":"text","text":"集群的操作会分解成多个"},{"type":"codeinline","content":[{"type":"text","text":"Task"}]},{"type":"text","text":","},{"type":"codeinline","content":[{"type":"text","text":"Task"}]},{"type":"text","text":"之间有依赖关系,组装成"},{"type":"codeinline","content":[{"type":"text","text":"Job"}]},{"type":"text","text":"发送给"},{"type":"codeinline","content":[{"type":"text","text":"Kafka"}]},{"type":"text","text":",云舱Scheduler进行消费和处理。比如部署一个"},{"type":"codeinline","content":[{"type":"text","text":"Zookeeper"}]},{"type":"text","text":"集群,先创建容器,再创建"},{"type":"codeinline","content":[{"type":"text","text":"Service"}]},{"type":"text","text":"负载,配置"},{"type":"codeinline","content":[{"type":"text","text":"DNS"}]},{"type":"text","text":"策略,配置监控,这是一个完整的部署任务。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/32\/33\/32a181ede3e9a9bfeb9934d9282dc633.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"现状"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"当前同程将几乎所有的大数据服务都采用 "},{"type":"codeinline","content":[{"type":"text","text":"Kubernetes"}]},{"type":"text","text":" 工具部署和调度,有近"},{"type":"codeinline","content":[{"type":"text","text":"400+"}]},{"type":"text","text":"集群服务跑在"},{"type":"codeinline","content":[{"type":"text","text":"Kubernetes"}]},{"type":"text","text":"上, 一个新的组件集群可以在15分钟之内完成交付,极大地减少组件部署消耗的时间。当所有的集群服务被平台化管理后,对于机器资源层的调度和利用率提升的需求越来越明显,同程基于资源监控对组件做混合部署,利用率提升"},{"type":"codeinline","content":[{"type":"text","text":"30%"}]},{"type":"text","text":"。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"大数据底层一般会分为计算和存储,但是随着机器资源越来越多,资源层的研发也是很关键的一环。同程希望将数据,资源,算法流程打通,让数据使用更简单,让数据处理更快更稳定。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"业界有很多公司会考虑将大数据计算任务 "},{"type":"codeinline","content":[{"type":"text","text":"native on Kubernetes"}]},{"type":"text","text":",同程也进行调研和尝试,当前大家都只是解决了部署的问题,任务的完整生命周期还需要研发和测试。所以同程还是着重于 "},{"type":"codeinline","content":[{"type":"text","text":"Yarn"}]},{"type":"text","text":" on "},{"type":"codeinline","content":[{"type":"text","text":"Kubernetes"}]},{"type":"text","text":",一些算法和分析类的"},{"type":"codeinline","content":[{"type":"text","text":"Python"}]},{"type":"text","text":"任务会采用容器调度方式运行。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"未来方向"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"同程大数据上云还有很多问题没有去优雅的解决,比如已有服务如何平滑的通过平台的方式迁移上云,现在还有很多中间过程需要资源研发介入。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"未来的方向主要分为:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"采用混部和分时调度,提升集群资源整体利用率。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"用混沌工程的方式提升组件稳定性。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"计算任务 "},{"type":"codeinline","content":[{"type":"text","text":"native on Kubernetes"}]},{"type":"text","text":",提供高优保障。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"持续提升PAAS平台易用性。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"让底层资源触手可及。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"作者简介"},{"type":"text","text":":"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"程威,同程旅行数据中心集群研发部,云原生爱好者,github ID: "},{"type":"codeinline","content":[{"type":"text","text":"mikechengwei"}]},{"type":"text","text":",TiDB Operator ,Prometheus Operator 核心贡献者。"}]}]}
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章