基于DPDK实现VPC和IDC间互联互通的高性能网关

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"随着云计算和网络技术的不断发展,越来越多的业务有着上云的需求。上云后,业务能够使用云上已有的服务提升开发效率,也可以利用云平台的弹性伸缩特性,及时应对业务的负载变化。实际生产环境中,用户的服务一部分部署在云平台上,另一部分部署在自己的IDC机房。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"用户有从VPC访问IDC中服务的需求,且IDC内的服务需要支持负载均衡。为了实现IDC的平滑上云,必须打通VPC网络到IDC机房经典网络间的互联互通,其中最核心的设备是VXLAN网关,用来完成VXLAN网络和VLAN网络间的映射。虽然可以通过交换机完成VXLAN到VLAN的转换,但是业务的负载均衡需求无法满足。因此,360虚拟化团队根据业务需求,决定自研CLOUD-DPVS设备支持负载均衡、VXLAN隧道、BFD探活等功能,来实现VPC网络到IDC网络的互联互通。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"CLOUD-DPVS网关整体架构"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"CLOUD-DPVS工作在VXLAN网络和VLAN网络的中间层,来自VPC网络的用户请求被引流到CLOUD-DPVS网关,进行VXLAN解封装和SNAT\/DNAT处理后,请求被发送至IDC内服务所在的机器上。回包同样会经过CLOUD-DPVS进行SNAT\/DNAT后,进行VXLAN封装发送至VPC,如下图1所示:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/9c\/9c2c632acd2f919a27e238d93d4ffff2.png","alt":"图片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"图1"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"CLOUD-DPVS网关整体架构选型"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"为了满足高性能,多主部署和负载均衡等需求,360虚拟化团队在经过调研后决定在DPVS的基础上进行开发。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"DPVS是一个基于DPDK软件库加速LVS的高性能负载均衡器,通过使用网卡用户态驱动、零拷贝、大页内存和队列绑定等技术解决了LVS的性能瓶颈,同时保留LVS的负载均衡逻辑。基于DPVS,我们只需要在现有逻辑的基础上增加VPC属性,支持VXLAN封装解封装等功能,就可以为每个VPC业务提供虚拟IP来访问IDC内的服务。选型完成后随即启动了cloud-dpvs的项目,其核心架构如下图2所示:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/00\/00b2e897167699879c49bab20f1aed47.png","alt":"图片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"图2"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"CLOUD-DPVS网关方案概述"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1.高可用方案的改进"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"传统的高可用方案大都采用 BGP+ECMP 的模式构建集群,用 ECMP 将数据包散列到集群中各个节点上,通过 BGP 协议保证单台机器故障后将这台机器的路由动态剔除出去,由此做到了动态 failover,组网结构如下图3所示:"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/a4\/a423bcaa4cd87ced5bc52dc6a3bee3b5.png","alt":"图片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"图3"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"服务器通过 BGP 将 VIP 发布到网络中,交换机学习到 VIP,形成 BGP 等价多路径路由(ecmp),然后根据哈希因子计算得到 hash lb key,进行 ECMP 下一跳链路数(Member-count)求余计算,再与 ECMP 基值进行加法运算得出转发下一跳 index,即确定了下一跳转发路由到达服务器。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"这种方式要求服务器必须部署在三层网络中,需要与交换机建立BGP连接。另外服务器集群出现问题后,其恢复时间受限于BGP协议的收敛时间,这个时间值一般是秒级,根据以往在现网环境中的经验,集群的收敛时间一般在6~9秒左右。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"为了提高收敛时间和减少对环境的依赖,360虚拟化团队对上面提到的两点进行了改进,引入BFD协议将收敛时间减少到毫秒级别,并在VPC网络中增加调度器使其不再依赖底层网络就可以把流量hash到各台服务器上,如下图4所示:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/ac\/ac72b3257a343215548e9b9183ef177d.png","alt":"图片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"图4"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"支持BFD探测"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"BFD"},{"type":"text","text":"(Bidirectional Forwarding Detection) 双向转发检测协议,提供了一个通用的标准化的介质无关和协议无关的快速故障检测机制。具有以下优点:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"1.对网络设备间任意类型的双向转发路径进行故障检测,包括直连物理链路、虚电路、隧道、MPLS LSP、多条路由路径以及单向链路等。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"2.可以为不同的上层应用服务,提供一致的快速故障检测时间。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"3. 提供小于1s的检测时间,从而加快网络收敛速度,减少应用中断时间,提高网络的可靠性。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"利用BFD的优点,VPC内的机器会周期性向CLOUD-DPVS各个节点发送BFD探测报文,根据探测结果动态更新hash计算结果,选择可用的CLOUD-DPVS服务器,从而保证服务的高可用。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在CLOUD-DPVS中,我们实现了BFD协议处理模块,并将其挂载在INET_HOOK_PRE_ROUTING。当数据包进入CLOUD-DPVS后,优先判断其是否为BFD报文,并回复相应状态的BFD报文,如STATE_INIT\/STATE_UP。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"2.负载均衡转发层适配"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"FULLNAT模式"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"DPVS支持NAT、Tunnel、DR和FULLNAT等几种负载均衡模式,但是 NAT、DR、Tunnel这三种模式都有一定的限制,需要一定的环境的支持,才能正常的工作。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"1.DR模式的限制:RS跟Director Server必须有一个网卡在同一个物理网络中"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"2.TUNNEL必须在所有的realserver上绑定VIP"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"3.NAT:RS的网关必须指向LVS所在服务器"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"而FULLNAT模式通过对入报文做了DNAT+SNAT,即将报文的目的地址改为RS的地址,源地址改为DPVS设备地址;RS上不需要配置路由策略,回包到达DPVS设备后做SNAT+DNAT,即将报文的源地址改为DPVS设备上的VIP地址,目的地址改为真实的用户地址,从而规避上面三种方式的限制,使其组网更加的灵活,适应性更强,因此CLOUD-DPVS网关使用的是FULLNAT模式。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"引入VPC的概念"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"DPVS社区最初设计的应用场景是IDC的经典网络,并不适用于云化场景。云化场景与IDC经典网络最核心的区别是:经典网络提供的是多用户共享的网络,而VPC提供的是用户专属的网络。VPC内用户可以任意定义云主机的IP地址,而在经典网络模式下,大家挤在一个二层网络里面,IP地址首先要保证不能重合。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/4e\/4e7033ea41fbc461499f3f27aa7f81b1.png","alt":"图片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"图5"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如上图5所示,VIP:PORT可以唯一表示一个服务,服务后面挂载了多个实例,但在云化场景下,其VIP地址是可以重复的,所以仍然使用VIP:PORT来表示一个具体服务就行不通了。因此CLOUD-DPVS转发层面不能继续沿用DPVS原有的处理逻辑,为了解决这个问题研发团队引入了租户VPC的概念,把服务和VPC关联起来,不同的VPC可以有相同的VIP:PORT,这也与实际使用方式相吻合,改造后就变成了VXLAN+VIP:PORT来表示一个服务,具体如下图6所示:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/8d\/8df4c05cb94bcd3e962dc0e73c976c20.png","alt":"图片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"图6"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"cloud-dpvs转发原理"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"为实现其转发功能,在cloud-dpvs上新增了服务器信息表和虚机信息表,其中服务信息表由vxlan和VIP:PORT以及RS:PORT等信息组成,表格式如表1所示:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"embedcomp","attrs":{"type":"table","data":{"content":"

Vxlan

VIP

vPort

RS-IP

Port

96

172.16.25.13

80

10.182.10.13

80

96

172.16.25.13

80

10.182.10.23

80

101

172.16.25.13

8080

10.182.20.2

80"}}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"   表1:服务信息表"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"其中Vxlan+VIP+vPort代表VPC内的一个公共服务,用户VPC私有网络内的客户端可以通过访问VIP+vPort来使用其公共服务,客户端感知到VIP是私有网络内的私网IP地址,由cloud-dpvs实现私网VIP到IDC网络地址的映射,其映射关系和后端Real-Sever对用户无感知。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"虚拟信息表由vxlan、虚机IP、虚机MAC以及宿主机IP地址组成,表格式如表2所示:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"embedcomp","attrs":{"type":"table","data":{"content":"

Vxlan

虚拟IP

虚拟MAC

宿主机IP

96

172.16.25.30

fa:16:3f:5d:6c:08

10.162.10.13

96

172.16.25.23

fa:16:3f:5d:7c:08

10.162.10.23

101

172.16.25.30

fa:16:3f:5d:6c:81

10.192.20.2"}}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"   表2:虚拟信息表"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在虚拟化VPC网络中,由于用户可以按需自定其私有IP地址,所有不同VPC内的虚机IP地址可能存在重复,故无法以IP地址来唯一确定一台服务器或者虚机,在cloud-dpvs的实现中采用了Vxlan+虚机IP的方式来表示虚机或者服务器。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"VPC内的虚机访问VIP:vPort的流量通过OVS内流表规则引流到Vxlan隧道后流量到达cloud-dpvs网关,网关根据查询服务信息表查询服务挂在的后端Real-Server,然后再根据一定的调度算法把流量分发给具体的Real-Server服务器,具体路径如下图7所示:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/4f\/4f31edda8f3c58e0302c2271fb4d2ddc.png","alt":"图片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"图7"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"流量路径说明: "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"embedcomp","attrs":{"type":"table","data":{"content":"

1

VPC内Client发起对VIP:vPort的访问请求

2

流量经过OVS时,会根据流表规则把流量引入Vxlan隧道进行vxlan封装,其外层目的IP地址为Cloud-dpvs网关的IP,外层源IP地址为虚机所在的宿主机IP地址

3

流量到达Cloud-dpvs网关后,对隧道报文进行解封装,提取报文的vxlanId及内层的目的IP和Port,由三者即可确定一个VPC内的公共服务

4

Cloud-dpvs根据公共服务上配置的调度算法选择其上挂载的Real-Server服务器,然后对报文进行映射修改报文的目的IP地址和端口并对映射关系进行记录,最终把流量引流到IDC网络的一台物理服务器

5

后端Real-Server对报文处理完成后,按照源流量路径会把流量返回给Cloud-dpvs网关

6

Cloud-dpvs收到后端Real-Server返回的报文,首先查找映射关系表,获取会话所属的公共服务,并对报文的源目IP头进行修改,修改后的目的IP即为虚机IP地址

7

根据公共服务里的VxlanId和报文的目的IP地址可以确定唯一一台虚机,根据虚机信息表可以获知虚机MAC地址以及虚机所在的宿主机IP

8

根据获取虚机MAC和宿主机IP对报文进行封装Vxlan头操作,虚机MAC地址为内层目的MAC,宿主机IP为外层目的IP,源IP地址使用Cloud-dpvs的IP地址,封装后的vxlan隧道报文经过underlay网络的转发,报文会到达虚机所在宿主机

9

vxlan隧道报文在虚机所在宿主机上按照OVS流表进行解封装,并按照OVS流表的转发规则把报文送入对应的VPC内的Client"}}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"增加VXLAN模块"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"云化场景下,所有来自VPC的请求都进行了VXLAN封装。CLOUD-DPVS在转发层实现了VXLAN模块,用于解封VPC发来的VXLAN数据包,保证转发给IDC中服务器的为解封后的数据包,使得后端服务器对于VPC无感知。当后端服务器回包到达CLOUD-DPVS后,再由CLOUD-DPVS封装好VXLAN头部并转发给VPC。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"CLOUD-DPVS后续工作"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"目前CLOUD-DPVS打通了VPC到IDC的路径,后续将继续实现IDC到VPC的打通。目前VXLAN模块是在转发层软件实现,下一步会计划使用智能网卡实现OFFLOAD相关工作。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"参考文章:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"1. "},{"type":"text","text":"https:\/\/github.com\/iqiyi\/dpvs"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"2."},{"type":"text","text":" https:\/\/yq.aliyun.com\/articles\/497058"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"3. "},{"type":"text","text":"https:\/\/tools.ietf.org\/html\/rfc5880#section-6.8.5"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"本文转载自:360技术(ID:qihoo_tech)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"原文链接:"},{"type":"link","attrs":{"href":"https:\/\/mp.weixin.qq.com\/s\/cAwnpEnaCOBv5NWP3dGfsA","title":"xxx","type":null},"content":[{"type":"text","text":"基于DPDK实现VPC和IDC间互联互通的高性能网关"}]}]}]}

發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章