业界前所未有:10分钟部署十万量级资源、1小时完成微博后端异地重建

{"type":"doc","content":[{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"机房断电、数据中心着火,极端情况下全站持续不可用已经成为很多公司不得不直面的现实问题。微博的目标是在遭受极端情况下在线数据完全损毁时,1 个小时内在异地重新构建完整的微博服务,同时确保数据完整性。这在整个业界都是一个前所未有的巨大挑战。"}]}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"大数据时代数据至关重要"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"数据时代全球每天新产生的数据达到2.3EB,存量数据达到"},{"type":"link","attrs":{"href":"https:\/\/www.datanami.com\/2020\/09\/04\/10-big-data-statistics-that-will-blow-your-mind\/","title":null,"type":null},"content":[{"type":"text","text":"33ZB"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":",无论是传统企业还是新晋独角兽企业,都在基于大数据进行更快、更好的决策支持,从数据中孵化新的产品与服务,同时降低成本。可以说,"},{"type":"link","attrs":{"href":"https:\/\/www.seagate.com\/files\/www-content\/our-story\/trends\/files\/data-age-2025-white-paper-simplified-chinese.pdf","title":null,"type":null},"content":[{"type":"text","text":"数据就是生产力"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"。一旦出现数据丢失问题,对于企业来说是毁灭性的,据IDC统计数据,有高达"},{"type":"link","attrs":{"href":"http:\/\/www.zaibei.net\/ziliao\/0F350212016.html","title":null,"type":null},"content":[{"type":"text","text":"84%"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"的企业在遭遇严重数据丢失后的2~3年内退出了市场,随着企业对数据依赖程度的递增,这个比例会变得更高。在数字化信息化时代,没有哪个组织能够从无法快速恢复的数据灾难中全身而退。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/57\/57f341b7f929af9fdc3ad1d1ef14f132.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/medium.com\/@syedjunaid.h47\/what-is-big-data-why-is-big-data-important-in-todays-era-8dbc9314fb0a","title":null,"type":null},"content":[{"type":"text","text":"图片来源于medium"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"在过去几年里,黑天鹅事件层出不穷,"},{"type":"link","attrs":{"href":"https:\/\/www.sohu.com\/a\/150656574_813379","title":null,"type":null},"content":[{"type":"text","text":"机房大面积断电"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"、整个可用区不可用等意外时有发生。就在2021年3月9日欧洲最大公有云服务提供商"},{"type":"link","attrs":{"href":"https:\/\/www.ovh.com\/","title":null,"type":null},"content":[{"type":"text","text":"OVH Cloud"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"由于一场大火导致整个IDC被毁灭,"},{"type":"link","attrs":{"href":"https:\/\/www.reuters.com\/article\/us-france-ovh-fire-idUSKBN2B20NU","title":null,"type":null},"content":[{"type":"text","text":"数百万的网站不可用"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"。在2015年的天津大爆炸事件中,腾讯亚洲最大的IDC"},{"type":"link","attrs":{"href":"https:\/\/www.sohu.com\/a\/78018744_259978","title":null,"type":null},"content":[{"type":"text","text":"离毁灭仅仅只有1.5公里"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":",更不用提因为各种人为操作失败导致的数据丢失。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"极端情况下全站持续不可用已经成为一个现实问题,在数据容灾领域,唯一能确定的就是数据的易失性。所以,各大公司都在构建自己的数据容灾体系。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/9a\/9ae513ec5edc4ad180a20c08f4d5e457.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"BACKUP "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"是数据容灾的黄金法则,一切数据都是通过备份提升冗余度来容灾。数据的重要程度、恢复的时效性,决定了数据的备份策略。低级别的日志类数据一般采取单机离线冷备,重要数据则采用多副本热备,而影响公司命脉的核心数据通常采用321备份策略,即:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"至少3个副本"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"2个不同的存储介质"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"1个offsite"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"2012年,美国计算机应急响应组(US-CERT)"},{"type":"link","attrs":{"href":"https:\/\/us-cert.cisa.gov\/sites\/default\/files\/publications\/data_backup_options.pdf","title":null,"type":null},"content":[{"type":"text","text":"推荐321备份策略"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":",里面特别提到了异地备份对于从自然灾害或者严重故障恢复的重要性。于同城多活、异地多活、冷热结合等备份策略,都是321规则的实现或者变体。但多活策略一方面大多是onsite的热备设计,另外一方面业界缺乏产品化的解决方案,微博部分核心业务实现了同城与异地多活,投入了巨大的人力与资源成本,所以在全站级别容灾时,微博选择了异地快速重建的方案。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"微博数据容灾1小时异地构建方案"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"由于微博的社交媒体属性等业务特点,其对于数据丢失带来的不可用的容忍度远低于一般公司。对于社交媒体属性非常强的公司来说,数天不可用,基本就等同关站。因此,对于微博而言,备份只是手段,快速恢复才是实际需求。我们需要在遭受极端情况下在线数据完全损毁时,"},{"type":"text","marks":[{"type":"underline"},{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"1个小时"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"内在异地重新构建完整的微博服务,同时确保数据完整性。这在整个业界都是一个前所未有的巨大挑战。为此,微博构建了数据恢复中心,为全站数据提供数据备份与极速恢复服务。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"截止到2020年10月,微博月活跃用户达"},{"type":"link","attrs":{"href":"https:\/\/finance.sina.cn\/chanjing\/gdxw\/2020-10-20\/detail-iiznezxr7023868.d.html?from=wap","title":null,"type":null},"content":[{"type":"text","text":"5.23亿"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":",每天产生的各类数据已达PB级,核类在线业务存量数据达到100PB级别。这些数据分散在微博的数百个独立的业务当中,为了满足不同用户的数据展示场景,这些数据会以各种形式存在,包括图片、视频、镜像文件、MySQL、Redis、原始的二进制文件等。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"面对微博复杂的业务场景、超大规模的数据量,微博数据恢复中心需要同时权衡可用性、经济成本、安全性、效率。在整个设计过程中,所有的设计策略都围绕恢复时效性展开,把数据与恢复链路中涉及的所有环节都进行标准化与自动化。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1 数据分级标准化"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"按照80\/20原则,数据的重要性是有差异的。如果不加区分的恢复100PB级别的数据,无论是从成本上还是效率上都是无法承受的。微博从垂直与水平两个方向对数据的优先级进行拆分:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"垂直层面:"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"按业务的重要程度划分核心与非核心。核心业务的确认相对复杂,通常需要公司层面的领导拍板,但核心业务变更的频率非常低。微博上千个对外提供的API,核心API只有十几个。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"水平层面:"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"数据的访问是有时效性的,在微博场景下表现更加明显,7天内的数据访问量超过98%。部分超过1年的数据被访问的吞吐基本维持在个位数甚至是零,简单的使用吞吐量作为数据的访问热力值,通过热力值对数据进行二次分级。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"通过垂直与水平数据分级之后,核心热门数据的数据规模下降两个数量级到PB级别,这使得整个数据在1小时内重建成为了可能。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2 数据资源服务拓朴构建自动化"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"有了数据分级标准后,需要按照分级标准找出备份哪些资源。数据是由服务生产的,所有的数据都会归约到一个特定的服务,因此数据的备份转化成了服务的备份,这样就可以通过追踪流量路径依赖的方式,发现流量路径中的服务节点,从而完成个服务网络拓朴图的构建。拓扑中服务依赖关系的一个核心准则是:核心业务不依赖非核心业务。避免核心业务拓扑扇出不可控,影响数据备份的范围与可用性。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"微博服务的注册通过"},{"type":"link","attrs":{"href":"https:\/\/github.com\/weibo-mesh","title":null,"type":null},"content":[{"type":"text","text":"weibo-mesh"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"启动完成后自动注册构建,形成服务依赖关系拓扑,开源或者第三方资源类的服务包括MySQL、Redis类的数据,微博通过resource mesh agent发现并自动注册到所属的服务池。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3 数据备份标准化与自动化"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"不同的数据类型,备份方式各有不同。包括流量最开始入口四七层的配置、RPC服务需要备份的二进制版本(镜像URL)和MySQL、Redis等业务数据。所有接入数据恢复中心的服务,都需要提供snapshot+streaming两个API。Snapshot用于生成数据的快照,streaming提供自上一次快照以来产生的所有的operation。服务提供API后,数据恢复中心就能自动备份数据,实现备份与业务的解耦合。目前数据恢复中心提供了常用数据类型的snapshot与streaming的API,相关业务只要上线,即可纳入数据恢复中心进行备份。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"4 数据备份服务化:微博的321备份机制"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"数据备份两个最基本的要求是数据的一致性与数据完整性。单个文件的数据一致性通过数据摘要进行动态存储验证,使用纠错码有效处理bit反转静默存储错误。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"对于数据的完整性,微博使用大块存储结合Merkle Hash Tree来解决。所有的数据文件,都拆分或者合并成1GB的一个数据块(1GB大小的块是一个最佳实践值,过小网络传输效率低,过大单个块传输耗时长,不利于提升并发效率)。一个完整的数据备份元数据由四元组构成,数据备份服务提供API可以进行全网备份或者指定业务与数据类型的备份。备份API与业务无关,与数据类型无关。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"数据备份利用snapshot进行全量备份,使用streaming支持动态增量备份。我们采用了watch变化的机制,数据容量累计到一定程度或者超时则会把增量数据做一次checkpoint。snapshot通常是天级别,checkpoint一般是小时级。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"在微博使用的资源中,一类是类似Redis,读写量非常高,增量数据产生的operation streaming量非常大,但这类资源的单实例容量一般控制在10GB级别,可以提升snapshot的频率,降低两个checkpoint之间的数据量以提升恢复效率。另一类是MySQL,单实例容量非常大,通常在TB级别,写入量较低,可以降低snapshot的频率,提升checkpoint的频率,以在存储成本与数据恢复效率上达到平稳。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"数据服务服务通过checkpoint机制,将数据备份转换成了数据块的存储与备份。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"微博在数据备份上遵循了321策略,1)所有的数据备份都至少包含2个热备,2个冷备;2)在线热备数据存储在SSD设备,冷备数据存储在独立的OSS集群;3)数据会在异地离线存储,通过专线进行数据同步与传输。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"数据备份服务存储中心选择的是在云原生场景下应用广泛的对象存储OSS。在逻辑上,恢复中心由管理端与存储端组成,且二者逻辑上是独立的。存储端支持多物理存储,具体来说,支持在物理上各个机房内的自己OSS存储集群,同时还支持接入云厂商的OSS服务;由管理端来统一调度。管理端本身亦支持异地多活。自建存储共8个集群,每个小集群7台存储主机,单机容量20T,单集群容量 140T, 8个集群 1120 TB。集群对外网络带宽为100Gbps。自建存储集群示意图如下,能够支撑PB级别的数据备份与快速读取。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/5f\/5f799ce0a8b7b77940a777fadd103547.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"5 数据恢复服务化:1小时异地重建"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"全站异地构建主要是应对极端灾难,所有站内(onsite)的数据几乎都处于不可用状态,所有的数据恢复与构建都围绕1小时展开。服务构建的每一个环节都需要进行自动化处理,整个重建过程包括以下几个部分:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"第1分钟:实现系统"},{"type":"link","attrs":{"href":"https:\/\/en.wikipedia.org\/wiki\/Bootstrap","title":null,"type":null},"content":[{"type":"text","text":"自举"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"第13分钟:基础设施的部署与准备"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"第53分钟:服务启动与数据分发"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"第58分钟:服务自检、自动注册与负载均衡变更"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"第60分钟:完成流量迁移"}]}]}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"恢复服务系统自举过程"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"通过部署在offsite的event-listener收到恢复事件后,首先开启备份数据自检流程。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"主要包括:"}]},{"type":"numberedlist","attrs":{"start":null,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"加载对应版本的备份元数据。元数据包含备份的整体统计信息,包括服务的版本、数量、规格以及服务依赖,用于在服务恢复时构建拓朴信息及加速分发。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"恢复混合云平台服务DCP。DCP作为混合云的IaaS层,后续所有的物理机资源都通过DCP创建。为了减少DCP恢复时的依赖,提升DCP的恢复速度,DCP部署涉及的所有的二进制包、存储等支持单机部署,所有实例都部署在一台高配机器(256C\/2TB内存)上,可以做到1分钟以内完成DCP的恢复。"}]}]}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"自动完成基础设施部署"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"DCP启动后作为IaaS设备混合云提供平台,借助云厂商5分钟扩容2000台神龙裸金属的能力,相当于8万台ECS服务器。DCP启动后,读取备份元数据中IaaS设备的规格、数量,快速扩容指定数量机器,并完成包括Docker、kuberlet等初始化(初始化过程需要花费2min左右时间)。"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"服务启动与数据分发"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"完整的服务启动与数据分发包括服务拓朴解析、服务镜像拉取与启动以及数据分发等几个阶段。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"服务依赖树解析:"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"offsite解析模块开始解析灾备待恢复服务元数据,将服务依赖关系解析到服务依赖树。各个服务树全速并行恢复,服务与资源按照存储在拓朴图中的距离就近甚至同机部署,最大程度上提升带宽吞吐,在机器上挂载磁盘时每业务一块盘,提升整体磁盘顺序写入IO带宽。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"服务恢复流程"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"微博构建了以Kubernetes、Docker为代表的容器调度混合云平台,为业务提供serverless服务。容器调度平台可以快速适配待恢复服务所需运行时环境。为了解决待恢复服务对CPU、内存、磁盘、带宽等五花八门运行时环境的诉求,我们将其抽象提炼到规格,根据规格匹配锁定IaaS层节点设备,在锁定节点上拉取镜像,启动容器服务。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"业务数据分发"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"在服务恢复过程中,当容器启动后,对于同一份数据需要分发多次的场景,例如服务的应用的可执行文件、镜像,使用P2P的分发方式,多点并行分发,大幅提高分发效率,同时降低存储server的负载。"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"服务自检、自动注册与负载均衡变更"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"所有纳入到weibo-mesh管理的服务及资源,都实现了标准的回调API:服务注册、健康检查、预热以及启动心跳保持。服务在启动完成首先会健康检查,健康检查通过后调用预热接口,避免系统冷启动导致的大量超时;冷启动完成后,服务实例会自动注册到分布式配置中心(微博自建的服务:vintage),启动心跳汇报功能完成服务的注册。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"四、七层的启动过程与普通的服务并无差异,以nginx为例:在nginx完成部署部署启动后,"},{"type":"link","attrs":{"href":"https:\/\/github.com\/weibocom\/nginx-upsync-module","title":null,"type":null},"content":[{"type":"text","text":"nginx-upsync-module"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"会自动从分配式配置中心vintage同步其相关的配置并进行动态reload,完成upstream的同步更新与backend的注册。"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"流量迁移"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"数据恢复中心通过检查分配式配置中心vintage并与服务拓朴进行比对,所有服务启动完成后,则变更出口DNS,以实现流量的迁移,完成最终的异地构建。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"后记"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"微博数据恢复中心涉及十余个独立系统的密切交互,包括:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"1)IaaS层的基础设施管理DCP,抹平了公有云、自建IDC等异构基础设施差异,向上提供5分钟2000台机器的闪电交付能力,后续基于神龙裸金属可以提供5分钟相当于8万台机器的交付能力;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"2)把所有的物理机资源看成一个超大的CPU、内存与存储池,由KRS(自研的基于K8S的容器编排调度系统)实现服务的全网调度,有效利用了超大规格机器(目前重点使用256C\/2TB规格)百Gb级别的高带宽吞吐等能力,实现全网服务与数据的极速分发;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"3)把Redis、MySQL、RPC、代码、二进制和四七层等所有都看成资源,提供RaaS(Resource as a Service)的抽象,每一类资源实现标准的backup、prestop、poststart、register等接口,都能够自动接入数据恢复中心,做到整个过程零参与;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"4)微博的热点联动系统在系统完成冷启动后,自动依赖流量的变更,快速扩容,完成业务的重建过程。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"整个异地重建项目过程中,每一个环节都可能会成为瓶颈,包括数据备份的可靠性、基础设施的自动化部署与快速构建、基于P2P的数据快速分发等等。本文限于篇幅未能充分展开描述,受限于成本,目前微博也未在全站维度进行1:1规模进行数据异地构建与恢复的验证,接下来一段时间,微博的基础设施建设会围绕此展开。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"作者介绍"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"本文作者"},{"type":"text","text":" "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":":微博研发中心基础架构部 姚四芳、胡云鹏、臣勇、胡春林"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"姚四芳"},{"type":"text","text":" "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":":微博技术专家,基础架构策略负责人,"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#262525","name":"user"}}],"text":"2012年加入微博。经历并主导微博数次架构变迁,设计并支持亿级别日活用户的基础架构服务,支撑春晚等极端峰值流量。主要的技术方向为分布式存储及跨地域多IDC高可用服务优化。近期专注于大规模分布式集群的治理与优化。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"臣勇:"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"微博资深架构开发工程师,2018年加入微博;主要技术方向为分布式缓存、KV存储服务,近期主要负责数据备份与恢复服务。"}]}]}
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章