字节跳动10万节点HDFS集群多机房架构演进之路

原創

2021-07-20 07:03

组件	多机房方案
ZooKeeper	一个 ZK ensemble 由 5 台 server 组成，这 5 台 server 分布在 3 个机房，分布比例为 A:B:C = 2:2:1
BookKeeper	一个 BK cluster 通常由 14 台 server 组成，分布在 2 个机房，分布比例为 1:1
DanceNN	一个 NameService 包含 5 个 DanceNN，这个 5 个 DanceNN 分布在 2 个机房，分布比例为 3:2，工作模式为 1 active + 4standby"}}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在实现上，这里面的关键就是 DanceNN 的 editlog 机房写策略，因为 DanceNN 在做主备切换的时候，如果 editlog 没法保持同步，那么服务是不可用的，得益于 BookKeeper 的机房感知的数据放置功能，DanceNN 可以通过这些策略来完成双机房容灾方案。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"常态下，editlog 会以 4 个副本存放到 BookKeeper 上，这 4 个副本的机房分布比例为 1:1。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"容灾场景下，DanceNN 可以快速切换成单机房模式，editlog 依然以 4 个副本存放，但是存储策略变为单机房存储，历史的 editlog 也能正常消费。"}]}]}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"旁路系统"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"前面已经介绍完了 HDFS 双机房方案的主体设计，但是事实上一个方案的推进落地除了架构上的迭代演进之外，还需要一系列的旁路系统来配合支持，包括："}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Balancer：需要感知机房放置"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Mover：需要保证数据的实际放置满足多机房策略"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"运维系统"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在 federation 架构下，多个 nameservice 需要保证切主的效率"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"运维操作预案：提前预判相关可能的故障，并且能在运维系统上执行"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"业务的平稳过渡方案，尽可能少地减少对业务干扰"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"限于篇幅，本文不会对这些展开细节描述，感兴趣的同学可以再交流。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"多机房"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"HDFS 多机房架构是对双机房架构的扩展，其研发直接动机是机房的资源供应短缺问题，例如 2020 年 B 机房几乎就没有资源供应，但是在公司新的主机房 C 却有较为充裕的资源。一开始我们是尝试将 C 机房作为一个独立的集群提供服务，但是发现业务的血缘关系太过复杂，迁移成本太高，因此选择了基于双机房机房扩展到多机房的方法，该方案需要满足这些需求："}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"合理使用跨机房带宽"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"兼容已有的双机房方案"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"迁移成本尽可能小"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"符合字节跳动的机房级别容灾标准"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"最终的设计方案为："}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"数据放置策略支持多机房，同时兼容已有的双机房放置策略"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"NameNode 的容灾方案策略不变，因为在多机房架构下，HDFS 依然只保证一个机房范围的故障容灾"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"相应的旁路系统也做相应的调整，尽管 HDFS 底层提供了数据放置在多个机房的策略，但是在离线场景中，用户只能选择 2 个机房存放，例如 A\/B， B\/C，A\/B，这个运营上的策略选择是综合考虑了稳定性、带宽使用的合理性以及资源的合理利用之后确定的，核心目标还是保障业务的平稳发展，从后续实践下来看，这个策略是一个非常正确的选择。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"结语"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"根据我们的不完全调研，字节跳动 HDFS 的多机房架构在业界中是有自己独特的路线，这个中原因主要还是公司业务高速发展和机房建设方向在业界中也是独树一帜的，这些因素驱动 HDFS 进行自己独特迭代演进，从结果来看是达到预期，例如 2020 年 C 机房的充分使用，在 B 机房没有资源供应的情况下依然保障了业务的平稳；2021 春晚活动，为近线业务例如 ByteMQ、流式 CheckPoint 等提供了多机房的容灾策略保障。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"最后，HDFS 的多机房架构依然在持续迭代，中长期来看，不排除有更多新机房的出现，这些都给 HDFS 多机房架构提出更多的挑战，原来多机房方案的基础条件不再具备，因此 HDFS 团队已经开启相关功能的迭代，敬请期待！"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"本文转载自：字节跳动技术团队（ID：toutiaotechblog）"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"原文链接："},{"type":"link","attrs":{"href":"https:\/\/mp.weixin.qq.com\/s\/_AumiAx9wbpOZCGMdf0Arw","title":"xxx","type":null},"content":[{"type":"text","text":"字节跳动10万节点HDFS集群多机房架构演进之路"}]}]}]}