大促突围:京东到家基于Canal的数据异构设计

{"type":"doc","content":[{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"今天的内容分享将主要包含以下四个方面:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"介绍京东到家的订单履约业务背景"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在业务背景的基础上说明订单的底层存储方案"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"基于存储方案的数据异构设计与实践"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"面向复杂度的架构设计方法论"}]}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"简单来看,面向复杂度的架构设计方法论与上面3个部分没有直接关系,把这部分内容放到文章中来讲,主要是因为数据异构本质上也是解决了软件的写入复杂度问题。在这个基础上,我们向上抽象一层,来讨论一下面向复杂度的架构设计方法论。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"一、京东到家订单履约业务背景"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/6f\/6fc3617ff74bc59dd5667ee1c55b1632.png","alt":"图片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"从用户提交订单到服务履约系统,我们大致经历了支付、下发商家、商家确认、订单打印、拣货、下发物流、配送、妥投等环节。这是一个基本的新零售履约流程。这里,我标蓝了一些流程。比如:下发商家、订单打印等环节。主要是因为这些环节是我们要和商家交互的功能点,当我们把订单下发给商家时,首当其冲的环节是商家要确认这个订单,并且开始履约流程。但是,在我们的实际业务中,商家在大促期间往往会出现履约瓶颈,忙到看不到我们下发的订单,甚至不忙的时候也会看不到我们的订单已经下发到他们的系统中,商家需要一个提示功能。这也就是我们的提示音需求。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"提示音需求需要不断的查询底层存储ES,并提示给商家有订单到达了,需要他们去履约,如果商家没有看到,就不断查询,不断提示。就是这样的一个循环查询量级,在大促期间,订单量级增大,查询量级增大。基本上每次大促都会把我们的ES查到CPU飙高,甚至出现不可用的情况。为了保护履约系统,我们做的临时方案是做一个功能开关,在大促期间对提示音功能降级。可是这样的降级并不是我们想要的。因为最终商家还是收不到提示。导致履约质量下降。于是我们就面临一个问题“存储组件无法支撑大促时提示音业务的查询请求量。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"二、底层数据源的职责分工"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/d4\/d4f009bf65bba287fade744db1584ba8.png","alt":"图片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"要解决我们面临的查询量级问题,就必须首先分析一下底层的存储方案。以上,是我对到家订单履约系统底层存储的一个整体概括。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1.Redis"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Redis在履约系统中主要承载的一个职责是worker跑批任务的存储和查询。因我们在系统中大量运用了跑批任务来实现最终一致性的一个设计,而Redis的Zset结构正好满足了这样的需求,将时间作为分值,不断的提供近期任务的查询是Redis充当的根本职能。这里解释一下Redis为什么没有承载过多的查询职能。Redis虽然性能更好一些,但是,在数据量和查询复杂度上,没有ES支持的好,关键点是我们的查询条件复杂度是比较高的,所以,Redis没有承载过多的查询职能。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2.MySQL"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"MySQL在履约系统存储中的职能是持久化存储订单数据,这里主要还是使用其强大的事务机制,以保障我们的数据是正确写入的。这是其他的两个组件所不支持的。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"从履约流程上来看,我们将数据做了冷热分离,热点数据是我们在履约中的订单(也就是未完成的订单),而完成的订单,由于其使用率不会太高,所以,我们称之为冷数据。这样的一个拆分也就是上图中对应的业务库和历史库。业务库是热库,而历史库则是冷库。这样的一个冷热分离思想,使我们的单库单表数据量级维持在千万级别。从而避免了对应的分库分表复杂度。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"从部署架构上看,我们对业务库进行了大量的主从分割。其中biz slave是我们的业务库从库,它也会承载一些履约中的订单查询职能。接下来的big data slave集群则是大数据抽数据用做统计分析。最后的delay slave设置延迟一定时间消费binlog则是为了防止master被误操作而兜底的。比如有人错误执行了删除db的命令,这样的一个延迟消费的机制就可以利用binlog进行兜底回滚。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3.Elasticsearch"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"ES在数据存储中承担了几乎所有的查询职责,这主要取决于它支持复杂查询,并有天然分布式的特点。在数据量复杂度解决方案上,避免了MySQL分库分表的复杂度。这里我们一共有3个ES集群。其中HOT ES和Full ES也是进行了冷热分离,这样对我们的查询流量进行拆分。有助于保证履约系统的稳定性。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"而第三套集群Remind Elastic Cluster则是为了解决我们上述提示音的问题。在有提示音集群之前,我们所有的提示音查询流量都是打到热集群的。也正是这样的一个访问量需求,导致了我们的热集群时有发生CPU飙高,接口响应缓慢,卡顿业务线程。所以,我们对热集群进行了进一步的拆分,于是就正式提出了提示音单独集群的方案。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"三、写入复杂度问题"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/7b\/7b0c642dc6e498094c0aace7952c7bed.png","alt":"图片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"当确定冗余一套提示音集群以后,我们面临的问题就是上述这样的一个写入复杂度问题,从图上来看,我们在拆分这套集群之前,订单中心每次操作一次订单写入。面临的是3个数据源的写入工作,这对研发人员是非常不友好的,维护难度过大。于是,我们就开始考虑用异构中间件的方式来去写入这套ES数据。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"异构中间件的优势是屏蔽了数据同步的复杂度,但是随之而来的是数据写入链路可靠性、及时性等问题。而且,数据传输本身一般都具有高可用的需求,之前高可用在业务应用上,因为业务应用的集群方式本身是计算高可用的。但异构中间件则要在这高可用、可靠性、及时性三个维度上满足我们的要求。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"四、数据异构产品选型"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/d6\/d6835cbefd2f23eddc10c166b1905ccf.png","alt":"图片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在上述分析之后,我们也陆续调研了一些异构产品,在数据类型支撑上没有太大差别,常用的存储组件,这些异构中间件都是支持的。所以,我们更在意以上3个指标。社区活跃度代表了后续的维护性以及开源产品快速的问题响应,可用性方面的需求是非常强烈的,最终采用Canal的根本原因还是在学习成本和熟练度上。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"五、Canal简介"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"这里简单说一下我对于Canal的理解,以便于后续有意向应用Canal的同学有一个简单的了解。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/3b\/3bb879cab90c27daf4506cb28cabddae.png","alt":"图片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Step1 Load&Store:"},{"type":"text","text":" Connection从Zookeeper 获取到当前消费的binlog filename和position信息。随后将该信息附带到dump协议里,mysql master开始推送binlog数据。Binlog经过Parser解析投递到Sink,Sink则承载了过滤消息的作用,过滤掉没有订阅的binlog事件,最终把消息存储到Store中。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Step2:Send&Ack:"},{"type":"text","text":"用任务worker的方式,不断扫描store,最终将store中的数据发送到目的地,目的地可以是具体的存储,也可以是mq产品。图中,我用了kafka也主要是因为我们的实践方案。投递消息完成之后将消息ACK给Store组件。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Step3:Update MetaInfo:"},{"type":"text","text":"这个时候数据虽然发送了。但是,我们的元信息binlog的filename和position仍然没有更新,在这个操作上,Canal仍然采取了异步的方式去同步该信息。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Canal这种异步通信的设计要求你的系统必须具备可回溯、重试、幂等、延迟特点。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"以上,是我对整个Canal的一个理解,图中的两个HA,后续将会和大家说到。这里,我讲Canal的工作的角色、运作规则都是从一个4R视角来说明的,这是为了后续来讲复杂度方法论的时候大家也好理解一些。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"六、提示音功能基于Canal数据异构实践"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/75\/75e6c8281f099783bc4feaa0d5f076f0.png","alt":"图片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"提示音异构生产部署方式如上图。我们部署了两台Deployer用于数据传输的高可用。同时把消息投递到了kafka,利用adapter的集群部署进行批量消费,插入到提示音集群的ES中。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在顺序性保障上采用了订单id hash的策略,保证在partition上是有序的。这样也就保证了在业务操作上是整体有序的。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在链路上采用kafka来传输,主要还是应对大促期间binlog数据量级的特点,保证插入到ES之前有缓冲buffer的一个作用。这也是直连方式的弱点,直连方式在大数据量短时间写入时,对目的地存储组件有可能会造成瞬间的大量插入,从而损耗目的地存储组件的资源,可能影响到业务使用。但是,长链路也有数据延迟的缺点,如果对数据时效要求比较高的业务。还是建议用直连方式来搭建对应的异构方案。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在META Manager上使用Zookeeper来存储,与Deployer的HA形成有效配合。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"问题一:(网络环境问题)kafka不可用"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/0c\/0c070192442cb0f110c25a77de6fdad1.png","alt":"图片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在实践中,我们遇到第一个比较有代表性的问题是kafka集群不可用,直接导致ES数据断层,从而影响到商家的履约体验。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"首先,kafka集群所在的网络环境和机器主机发生问题,deployer的store数据存满,直接导致delay了8个小时。提示音没有提示,也会有电脑端的管理系统同步订单,但是需要人工刷新,所以,过了很久我们才发现这样一个问题。紧急把访问切到之前的ES热集群,之后,我们重新把kafka服务部署到可用状态,数据虽然慢慢追上了,但是原来在kafka中没有被adapter消费的一部分数据却丢掉了,这主要还是因为设置的kafka落盘频率问题。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/36\/36654543b75e902dd31c2534ffdc7606.png","alt":"图片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"丢数据在数据异构的需求中是不可容忍的事情,索性这次事故基本上锁定了丢数据的原因,所以,我们将Zookeeper中的jouralName和position设置到对应的事故之前的位置,将数据重新跑到ES中,至此问题解决。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"总结一"}]}]}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/43\/43326dbbec2fb5cb8514f6bb5a39bd53.png","alt":"图片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"总结二"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/a8\/a8e182280d2ff869322ec114c929fd56.png","alt":"图片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"至此,总结以上两点,数据异构的实践在问题监控、报警、及时降级方面是非常重要的。希望这样的总结经验能够让大家少走弯路。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"问题二:Deployer故障,自动HA"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/42\/423222dc53dbf76c815bb419feb6af58.png","alt":"图片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Deployer机器发生故障,系统自动HA到备机,任务得以继续消费。总起来说,问题二并没有给我们的业务带来任何的损耗,但是,还是比较经典的一个案例。这主要反应出来,对于数据异构这样一个需求。它的链路上所有环节,基本上都是有高可用的要求的。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/d7\/d7922acc62d9de90cdead79e3858b437.png","alt":"图片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Canal一共提供两种HA,其中Deployer的HA是靠Zookeeper的临时节点和重试机制实现的,而Mysql的HA则是靠一个单独的线程不断的Detect来实现的。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"但是MySQL的HA,只能用GTID的模式,这是因为Mysql master和slave的binlogfile name、position是不一样的。如果用master的binlogfilename和position去slave发送dump协议,这会出现无法匹配的问题。但是GTID是全局有序的,这也就保证了Mysql的HA只在GTID模式下才可用。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/ce\/ce6697a4ef643a0cea2470096a7843f0.png","alt":"图片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"谈到高可用,提出上述总结。这里我与大家互动了一个问题:“单机器部署两台Canal实例是否算是高可用?”答案是:“不算高可用,原因是单机部署了两台Deployer,但是机器如果故障,两台Deployer均不可用。”"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"问题比较有代表性,也有一些同学掉进了坑里。这里我与大家一起回顾一下高可用的范围:多机器、多机房、多地区、多国家。范围越大,高可用自然越是稳定。但是带来的成本和数据传输要求也越高,一般都是根据业务量级和重要程序进行取舍的。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"总结三"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/db\/dbb72e5c3eb42b128144058e4438f762.png","alt":"图片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"以上就是我在数据异构中的一些经验教训。下面我们将问题向上抽象,聊一聊面向复杂度的架构设计方法论。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"七、面向复杂度的架构设计方法论"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1.4R模型"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/03\/03f087fc228437f0e445ca1c0070d1ef.png","alt":"图片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"大家是否发现,我在和大家聊Canal或是到家的数据异构方案时,更多的都是以角色、关系、规则这种描述方法。相信大家也不是第一次碰到这种描述方式,在很多的架构中,都是这样的一种描述方法。就像上图中,说到的Parser、Sink、Store。这些角色的职责是什么?他们是如何配合完成Canal这样的一个产品功能的呢?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"4R模型本质上就是一个视角,它是Rank、Role、Relation、Rule这4个单词的首字母构成,它强调了描述方法、也强调了我们要用这样的视角来看待我们的系统。这样整体来看,系统会更加清晰和简洁。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2.区分复杂度"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/ec\/ecab787a914f8ba3164c344d9fe91b10.png","alt":"图片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如上图,将复杂度问题分为技术方向和业务方向两个部分,其中灰色的部分,一般都有一些开源软件来帮我们解决,比如Dubbo、Spring、Canal等。而红色的部门正是我们日常工作中所不能避免的复杂度。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"这些复杂度问题,如果平时不加以重视,忽视掉的复杂度问题最终则会演变成为不可维护的技术债务,最终打掉系统的可维护性,只能重新推倒重来了。很多重构行为都是因为复杂度的忽视累积而成的后果。所以,学会如何区分复杂度就是比较重要的点了。比如这次Canal的数据异构,同时面临了数量级复杂度和写入复杂度两个。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3.复杂度的架构设计环"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/0c\/0c2c88e2ea68eb5a8823958868fdf975.png","alt":"图片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"同样,面向复杂度的架构设计方法论,最终会归结到业务实现上。下面描述一下具体的步骤含义:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"1-需求:"},{"type":"text","text":"产品同学提出需求描述,或一句话需求、或完善的PRD文档"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"2-判断:"},{"type":"text","text":"对需求进行判断,需要什么样的数据量,什么样的峰值,是否要高可用等等,如果不能理解清楚,则找对应的需求人员不断澄清,直到清晰为止"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"3-复杂度识别:"},{"type":"text","text":"将需求精确化以后,对需求的复杂度问题进行识别,比如业务复杂度问题、数据量复杂度问题"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"4-拆解到备选架构:"},{"type":"text","text":"针对识别出来的复杂度设计出多个具体的架构方案。比如采用ES存储数据屏蔽分库分表的数据量复杂度、采用数据异构的方式写入数据,从而屏蔽数据写入的复杂度。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"5-取舍:"},{"type":"text","text":"对备选架构进行取舍,任何的架构方案都有好的一面和坏的一面。在不同的时间都有不同的选择,这里建议大家从简单、合适、演进三个架构原则来进行方案取舍,选择最适合自己的那一套方案。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"6-架构方案4R:"},{"type":"text","text":"用4R视角来设计系统分层、角色、关系、规则。以这种视角设计出来一套抽象模型"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"7-实现需求:"},{"type":"text","text":"将4R架构模型实践即可。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/7b\/7b1c28ba813f33652dc4b9bafb7877d4.png","alt":"图片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"同样,将本需求的一个架构设计环案例呈现给大家。(由于部分设计有保密性,4R此处用Canal4R代替)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"以上,就是我和大家分享的全部内容啦,谢谢大家!"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Q&A"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Q1:订单表中,如果有一些商品id,那么同步到ES中也是id吗,不会关联出name打成宽表存到ES吗?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"A1:"},{"type":"text","text":"具体的映射字段需要在Adapter中配置映射即可,存入到ES中的情况也与配置的映射是直接且唯一关系。是否宽表要在实际应用中把控字段的个数。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Q2:Canal部署deploy主从和canal-adapter有没有遇到官方的bug?有,改动了哪些?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"A2:"},{"type":"text","text":"遇到过Column not match的异常.具体看Canal的TSDB来解决。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Q3:这套复杂度方法论如何落地到实际应用?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"A3:"},{"type":"text","text":"需要对系统进行4R视角拆分、识别复杂度类型并按照架构设计环的方式来评定需求。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Q4:平时的Canal有消息延迟吗?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"A4:"},{"type":"text","text":"有一定延迟的,binlog的数量、网络等因素,都会造成一定的延迟,所以,建议异构还是要建立在业务数据可延迟的基础上的。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Q5:我主要用canal-adapter读取Kafka中的binlog日志然后写到数据库中,Kafka中有多个表的日志,我rdb目录下的yml文件只配置了一个表的为什么其他的表也会同步?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"A5:"},{"type":"text","text":"Yml的作用是配置映射关系,具体的过滤职能在Deployer的Sink配置。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Q6:异构数据是直接同步原表吗,还是做了关联?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"A6: "},{"type":"text","text":"做了关联,直接在Adapter中配置对应映射关系即可。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Q7:请问为什么不直接增加热集群的节点和分片,而是重新建一套ES集群呢?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"A7: "},{"type":"text","text":"这里主要还是一个数据拆分的思想,如果通过提高配置来解决访问量问题,那么,随着业务量级增加,流量混在一起,对应的ES集群流程会呈现不可评估的情况。本质上还是一个数据存储职责的问题。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Q8:如何保证Zookeeper的高可用?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"A8: "},{"type":"text","text":"Zookeeper本身就是高可用的,如果想在机房或异地方面做高可用,建议做主备同步、多集群部署等手段。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Q9:新集群的查询请求峰值是多少?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"A9: "},{"type":"text","text":"大约2000-4000 QPS。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Q10:怎么把握冗余尺度呢?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"A10: "},{"type":"text","text":"冗余的维度在机器、机房、地区、国家是不断增加的。维度越大,对应的高可用方案越可靠,但是,对应的费用以及实现复杂度也会变高。因为这种冗余方案肯定会有数据copy。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"嘉宾介绍:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"张磊,"},{"type":"text","text":"京东到家高级研发工程师。"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"8年+软件研发经验,曾先后就职于链家地产、互动吧、寺库网等公司,任研发人员或团队leader,在解决业务系统设计落地方面拥有丰富经验;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"现就职于达达-京东到家,主要负责订单履约、金额拆分、计费相关业务域的研发设计工作。业余时间主攻软件复杂度优化。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"本文转载自:dbaplus社群(ID:dbaplus)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"原文链接:"},{"type":"link","attrs":{"href":"https:\/\/mp.weixin.qq.com\/s\/juMj7eUKbFKX0xbyQ8781g","title":"xxx","type":null},"content":[{"type":"text","text":"大促突围:京东到家基于Canal的数据异构设计"}]}]}]}
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章