Migrating from MySQL to AWS DynamoDB: Lessons from Practice

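{"type":"doc","content":[{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"FreeWheel's core business system stores its data in MySQL. As data volume kept growing, the original database could no longer meet our business needs. After extensive research, we decided to migrate some of the MySQL tables to AWS DynamoDB. This article describes our experience migrating smoothly from a relational database to a non-relational one."}]}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Business Challenges"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"We originally used the asset table to store our customers' video inventory, but over time the table kept growing. Today the asset table and its related tables account for more than 50% of the storage of the entire database. The joins and complex SQL statements used by our services sharply degrade database performance, which in turn degrades application performance. We therefore had to consider either splitting the tables or migrating to another database, and splitting the tables would not solve the problem in the long run. To improve performance and scalability and to reduce cost, we ultimately chose to move asset and its related tables out of MySQL."}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Comparing and Selecting a Mainstream Non-Relational Database"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Our workload needs fast reads and writes under high concurrency and good scalability, and it does not need strong consistency, so we decided to store asset and its related data in a non-relational database."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"We compared several mainstream non-relational databases. The table below compares two of the most widely used, MongoDB and DynamoDB."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"embedcomp","attrs":{"type":"table","data":{"content":"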

Comparison basis | MongoDB | DynamoDB
Overview | MongoDB is one of the best-known document stores. | DynamoDB is a scalable, managed NoSQL database service from Amazon that stores data in the Amazon cloud.
Data structure | MongoDB stores schemaless data in JSON-like documents. A collection of documents needs no predefined structure. | In DynamoDB, a table is a collection of items, and each item is a collection of attributes. A primary key uniquely identifies each item in a table, and secondary indexes provide additional query flexibility.
High availability | Fault-tolerant clustering with automated disaster recovery. | Comprehensive disaster recovery, fault tolerance, and monitoring built on the cloud service.
Security | By default MongoDB is installed without authentication enabled; user authentication has to be enabled with a username and a strong password. | Security in DynamoDB is generally provided by the AWS security measures that are available."}}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Based on this comparison, DynamoDB offers more complete security services and better disaster-recovery and fault-tolerance capabilities, and it fits FreeWheel's existing AWS cloud services, so we chose DynamoDB as the migration target. The next section introduces DynamoDB's technical characteristics."}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"DynamoDB Technical Characteristics"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"AWS DynamoDB is a fully managed, serverless NoSQL database that is accessed through an HTTP API. It also offers a managed in-memory cache (DynamoDB Accelerator, DAX), and it suits applications that need to store large amounts of data with low latency."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"DynamoDB is built on a few key concepts: tables, items, and the attributes of each item. A table is a collection of items, and items of different types can live in the same table. The figure below shows how these concepts relate."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/b8\/b8409afedb0c9dd1d49244c06064af36.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Each item is one record in a table and contains multiple attributes. Within a table, each item is uniquely identified by its primary key. An item is comparable to one row, or a set of several rows, in a relational table, and an attribute is comparable to a column. DynamoDB requires every item to contain at least the attributes that make up its primary key."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"The attributes that make up the primary key must be defined when the table is created. Beyond the mandatory primary key, DynamoDB provides secondary indexes to support other query patterns. One that we use often is the global secondary index (GSI)"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":","},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" which builds an index from a different set of attributes to make queries more efficient."}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Migration Design"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Moving from a relational to a non-relational database requires a new data model. The main design questions are which attributes each item in the new table should have and whether the migrated model still supports the existing business requirements."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Unlike a table in a relational database, a DynamoDB table is more like a collection of tables and is often used to store different types of data. Combining DynamoDB's characteristics with the shape of our existing data and our business requirements, we consolidated dozens of MySQL tables into a single table and reorganized the columns of the original tables into attributes of the new table, as shown in the figure below."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/ec\/ec67a7dae6864c2432a57532eb4186d3.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"For each table being migrated, we first collected all of the SQL statements that touch it in MySQL, and then used the primary key and secondary indexes we had designed to map each statement to the corresponding DynamoDB API. The following example uses some fields of the asset table."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/dd\/dd09cca04e875bb96d1fe65074b45c1f.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"As the figure above shows, in MySQL the asset table has columns such as name and description, and the asset_group_assignment table has columns such as assetId and groupId. After migration to DynamoDB, these columns become attributes of each item. The figure also shows how the storage types change: the name column of the asset table was a varchar, and groupId and assetId were both bigint, and in DynamoDB they map to the String type and the Number Set type respectively."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Once the new table structure and data model are defined, we still need to define the primary key and, based on our business requirements, the secondary indexes. For example, MySQL serves queries such as select * from asset where xx_id = '123'. If xx_id is not the primary key, we need to define a secondary index on the xx_id attribute to support that query."}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Implementing a Smooth, User-Transparent Migration"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"To maintain service quality during rollout and keep the migration invisible to users, we put considerable effort into dividing the migration into roughly three steps, as shown in the figure below:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/53\/53bb0c4ec342da10e1003a89a4177f45.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Data migration:"}]}]}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"First, we migrated the data in MySQL to DynamoDB. At this point all traffic still read from and wrote to the original MySQL database;"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Data synchronization:"}]}]}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Next, we deployed a background job dedicated to synchronizing updates from MySQL to DynamoDB, which kept the data on both sides consistent;"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Traffic switching:"}]}]}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"After that, we let some read-only services switch traffic between DynamoDB and MySQL for testing, to verify that the data had been migrated correctly. Finally, the read-write services switched their traffic. We added a runtime switch to the program so that traffic could be moved gradually and in real time."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"To avoid any downtime during the migration, we kept all of the legacy MySQL business logic, and the program uses the runtime switch to decide whether to read from and write to MySQL or DynamoDB. Every upper-layer service supports this logic and checks the state of the switch to determine its data source. Developers can update the switch in real time, so when a problem occurs they can promptly switch between MySQL and DynamoDB and keep the problem from affecting users."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Traffic switching goes through three states:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/84\/8409d2bde7eb3bad19d86fdd4f5db603.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"The first state is before any traffic is switched. All reads and writes still go to MySQL, and DynamoDB can be regarded as a backup database. In this stage, all data written to MySQL is synchronized to DynamoDB."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Next, we gradually move traffic from MySQL to DynamoDB. For traffic with the switch off, services still read from and write to MySQL, and the MySQL data is synchronized to DynamoDB. For traffic with the switch on, services read from and write to DynamoDB, and the DynamoDB data is synchronized back to MySQL. This keeps MySQL and DynamoDB consistent and makes it possible to roll the migration back if a problem occurs."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Finally, once the migration has been tested and verified, all service traffic has been switched to DynamoDB. DynamoDB data is still synchronized to MySQL, so MySQL now serves as a backup database in case it is needed. At this point the data migration is complete."}]}]}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Problems Encountered During Migration and Their Solutions"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Relational and non-relational databases differ greatly both in how data is stored and in how it is manipulated, so the interfaces that implement our database operations had to change noticeably."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"The table below lists the main changes we found in practice that stem from the differing characteristics of the two databases."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"embedcomp","attrs":{"type":"table","data":{"content":"

Difference | MySQL | DynamoDB
Data types | BigInt, TinyInt, Int, varchar, enum... | number, string, set, map, list...
SQL | Supported | Not supported
Default values | Supported | Not supported
Case sensitivity | Insensitive | Sensitive
Auto-increment ID | Supported | Not supported
Unique key | Supported | Not supported"}}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Storage type changes"}]}]}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Our core business system is written in Golang. Because the storage types changed in the move from MySQL to DynamoDB, the microservices had to redefine their data structures to match DynamoDB's data types."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"The move to NoSQL"}]}]}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"In the implementation, we first collected all of the SQL statements for the MySQL tables being migrated, and then used the primary key and secondary indexes we had designed to map them to the corresponding DynamoDB APIs. We found that NoSQL brought a substantial performance gain in this process. For example, an update that touched several tables in MySQL might need several database connections or more, whereas in DynamoDB a single operation can update the whole record."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Default value changes"}]}]}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"MySQL supports column default values, but DynamoDB does not: if an attribute is not supplied on write, the item simply has no such attribute. To preserve the default-value behavior that the business logic relied on in MySQL, we handle this on the DynamoDB write path, as shown in the figure below. If a string attribute is not supplied, we write a Null value by default; if an int attribute is not supplied, we write 0 by default."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/e8\/e8c5c005f092b34b2bd5481a41d80cd7.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Case sensitivity changes"}]}]}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Before the migration, queries in our business system were case-insensitive (with MySQL's default collations, string comparisons ignore case). DynamoDB is case-sensitive. To keep supporting case-insensitive queries, for each attribute that needs them we store an additional attribute holding the all-lowercase form of the value"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Auto-increment ID changes"}]}]}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"DynamoDB does not support auto-increment IDs, but our existing business logic depends on them, so we added a table at the business layer to implement auto-increment IDs."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Besides the implementation changes caused by the different characteristics of the two databases, we also ran into several problems caused by DynamoDB's limits."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},
{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Data consistency"}]}]}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"During concurrency testing we observed the following. Taking the figure below as an example, when two requests operate on the same record asset1 at the same time, we expect asset1's groups to end up containing the values added by both requests, but in fact only one of them was added."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/8c\/8c95f889486a01ef5d83530519696ba8.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"This happens because request 2 should have read the record as updated by request 1, but the two concurrent requests both read the record as it was before either update, so the final value is not what we expected. In essence, we wanted the effect of strongly consistent reads but got eventual consistency. DynamoDB reads are eventually consistent by default. It does provide a ConsistentRead parameter for strongly consistent reads, but only reads against the base table (and local secondary indexes) support it; global secondary indexes do not. We therefore added a version attribute to the table to control the ordering of concurrent writes."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/74\/7419e1adadd433d6993e79b5f560e8cc.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},
{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Problems caused by GSI delay"}]}]}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"During stress testing after development was complete, we found that calls to the create-record API failed intermittently. When a client requests a new record, the server first creates the item in the base table and then fetches the new record through a GSI. In about 5 out of 10,000 cases the new record could not be found, because replication from the base table to its GSI is delayed (see the figure below); according to AWS, the delay is on the order of milliseconds. To address this, we added retry logic on the server: if the new record is not found, the server retries up to three times."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/71\/719d56cf6f04e2e23eacbe5e7fe71d60.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text"
,"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"DynamoDB data size limits"}]}]}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"In boundary testing we found that updating an asset's alias attribute, whose type is an array, fails once the array holds more than 1,000 entries. The DynamoDB documentation explains why: DynamoDB limits the size of each item, including all of its attribute names and values, to 400 KB. This only surfaced when testing extreme values and does not occur in real business traffic, but to be safe we added validation to the business logic and notified customers of the change in advance."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},
{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"DynamoDB transactions"}]}]}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"At first we used DynamoDB's TransactWriteItems API to handle transactional updates across multiple tables; the figure below shows sample code. During concurrency testing, however, the service returned errors when a single request operated on a very large number of records, because at the time a DynamoDB transaction could not write more than 25 items (AWS has since raised this limit to 100). For writes that exceed the limit, we gave up the native transaction API and met the requirement with pessimistic locking plus compensation."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/d6\/d6fbed5a601feb8e8bd285a7c510b649.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},
{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"DynamoDB cost"}]}]}]},
{"type":"embedcomp","attrs":{"type":"table","data":{"content":"

Type | Price | Special case
WCU (write capacity unit) | $1.25 per million WCUs | Doubled for transactions
RCU (read capacity unit) | $0.25 per million RCUs | Doubled for strongly consistent reads"}}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Keep a close eye on cost when using DynamoDB. As the table above shows, one million write "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"capacity units"},{"type":"text","text":" (WCUs) cost $1.25, each 1 KB written consumes 1 WCU, and transactional writes cost double. One million read capacity units (RCUs) cost $0.25, each read of up to 4 KB consumes 0.5 RCU, and strongly consistent reads cost double. So unless they are truly required, avoid strongly consistent reads, and merge multiple writes into a single operation where possible to reduce write cost."}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Conclusion"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Through the team's joint effort, we completed the storage migration from MySQL to DynamoDB within a few months and saw a large improvement in application and database performance. The figure below compares request times for the same API before and after the migration: the average Duration was 90 ms before the migration and dropped to about 50 ms afterwards, a reduction of roughly 44%."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/8c\/8c90cefdaaf9dddfc8ff4f95deb15a8d.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Since the migration we have continued to find issues, such as handling transactions across databases and running complex queries on DynamoDB data. We will keep exploring solutions to these problems and improving."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"About the author:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Yue Jingdian (岳京典) graduated from Beijing University of Posts and Telecommunications and currently works on FreeWheel's core business team, focusing on Golang system development and microservice architecture, with a keen interest in exploring and sharing new technologies."}]}]}

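The version attribute described under the data-consistency problem amounts to optimistic locking: a writer re-reads and retries when the version it read is no longer the stored one. The sketch below is a minimal in-memory illustration in Python, not the authors' Golang implementation. `InMemoryTable`, `put_if_version`, and `add_group` are hypothetical names, and the dictionary merely stands in for a DynamoDB table; against the real service the version check would typically be attached to the write as a condition rather than done in application code.

```python
import copy


class ConditionalCheckFailed(Exception):
    """Raised when the stored version no longer matches the expected one."""


class InMemoryTable:
    """Hypothetical stand-in for a DynamoDB table, for illustration only."""

    def __init__(self):
        self._items = {}

    def put_new(self, key, item):
        # New items start at version 1.
        self._items[key] = {**copy.deepcopy(item), "version": 1}

    def get(self, key):
        return copy.deepcopy(self._items[key])

    def put_if_version(self, key, item, expected_version):
        # Mimics a conditional write: apply the update only if nobody
        # has bumped the version since this writer read the item.
        if self._items[key]["version"] != expected_version:
            raise ConditionalCheckFailed(key)
        self._items[key] = {**copy.deepcopy(item), "version": expected_version + 1}


def add_group(table, key, group_id, max_attempts=3, before_write=None):
    """Read-modify-write guarded by the version; re-read and retry on conflict."""
    for _ in range(max_attempts):
        item = table.get(key)
        item["groups"] = sorted(set(item["groups"]) | {group_id})
        if before_write is not None:
            before_write()  # hook used only to simulate a concurrent writer
        try:
            table.put_if_version(key, item, item["version"])
            return True
        except ConditionalCheckFailed:
            continue  # another writer got in first; start over from a fresh read
    return False
```

If a second writer updates asset1 between the first writer's read and write, the first write is rejected, and the retry merges both group IDs instead of silently dropping one.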
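The capacity-unit arithmetic from the cost section can be worked through in a few lines. This is a rough sketch in Python that uses only the figures quoted in the article ($1.25 per million WCUs, $0.25 per million RCUs, 1 WCU per 1 KB written, 0.5 RCU per 4 KB read, doubled for transactions and strongly consistent reads). The function names are made up for illustration, rounding sizes up to whole 1 KB and 4 KB blocks is an assumption of the sketch, and real prices vary by region and change over time.

```python
import math

# Figures quoted in the article's cost table; treat them as illustrative.
USD_PER_MILLION_WCU = 1.25
USD_PER_MILLION_RCU = 0.25


def write_units(item_bytes, transactional=False):
    """1 WCU per 1 KB written (rounded up), doubled for transactional writes."""
    units = max(1, math.ceil(item_bytes / 1024))
    return units * 2 if transactional else units


def read_units(item_bytes, strongly_consistent=False):
    """0.5 RCU per 4 KB read (rounded up), doubled for strongly consistent reads."""
    blocks = max(1, math.ceil(item_bytes / 4096))
    return blocks * 1.0 if strongly_consistent else blocks * 0.5


def estimate_cost_usd(writes, reads, item_bytes,
                      transactional=False, strongly_consistent=False):
    """Estimated cost in USD for a number of writes and reads of one item size."""
    wcu = writes * write_units(item_bytes, transactional)
    rcu = reads * read_units(item_bytes, strongly_consistent)
    return wcu / 1e6 * USD_PER_MILLION_WCU + rcu / 1e6 * USD_PER_MILLION_RCU
```

Under these figures, one million 1 KB writes plus one million eventually consistent 1 KB reads come to $1.25 + $0.125 = $1.375, while making the same writes transactional and the reads strongly consistent doubles both terms.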