A High-Performance DPDK-Based Gateway for Interconnecting VPC and IDC Networks

With the continued development of cloud computing and network technology, more and more businesses need to move to the cloud. Once there, a service can use existing cloud offerings to improve development efficiency, and can rely on the platform's elastic scaling to absorb load changes in time. In real production environments, part of a user's workload runs on the cloud platform while the rest stays in the user's own IDC.

Users need to reach services in the IDC from their VPC, and those services need load balancing. To migrate an IDC to the cloud smoothly, the VPC network must be interconnected with the IDC's classic network. The core device in that path is a VXLAN gateway, which maps between the VXLAN network and the VLAN network. A switch can perform VXLAN-to-VLAN translation, but it cannot satisfy the load-balancing requirement. The 360 virtualization team therefore decided to build its own device, CLOUD-DPVS, supporting load balancing, VXLAN tunneling, BFD health checking, and more, to interconnect the VPC and IDC networks.

## Overall CLOUD-DPVS architecture

CLOUD-DPVS sits between the VXLAN network and the VLAN network. User requests from the VPC are steered to the CLOUD-DPVS gateway, where they are VXLAN-decapsulated and SNAT/DNAT-processed before being forwarded to the machines hosting the service inside the IDC. Return packets likewise pass through CLOUD-DPVS for SNAT/DNAT and are then VXLAN-encapsulated and sent back to the VPC, as shown in Figure 1:

![Figure 1](https://static001.geekbang.org/infoq/9c/9c2c632acd2f919a27e238d93d4ffff2.png)

## Architecture selection

To meet the requirements of high performance, multi-active deployment, and load balancing, the 360 virtualization team decided after some research to build on DPVS.

DPVS is a high-performance load balancer that accelerates LVS with the DPDK library. It removes LVS's performance bottlenecks through user-space NIC drivers, zero copy, huge pages, and queue-to-core binding, while keeping LVS's load-balancing logic. On top of DPVS, we only need to add VPC awareness and VXLAN encapsulation/decapsulation to give each VPC tenant a virtual IP for reaching services in the IDC. With the choice made, the CLOUD-DPVS project was started; its core architecture is shown in Figure 2:

![Figure 2](https://static001.geekbang.org/infoq/00/00b2e897167699879c49bab20f1aed47.png)

## CLOUD-DPVS design overview

### 1. Improving high availability

Traditional high-availability designs mostly build clusters with BGP+ECMP: ECMP hashes packets across the cluster nodes, and BGP dynamically withdraws the routes of a failed machine, giving dynamic failover. The topology is shown in Figure 3:

![Figure 3](https://static001.geekbang.org/infoq/a4/a423bcaa4cd87ced5bc52dc6a3bee3b5.png)

Each server announces the VIP into the network over BGP. The switch learns the VIP and forms equal-cost multipath (ECMP) routes. For each packet it computes a hash lb key from the configured hash factors, takes the remainder modulo the number of ECMP next hops (the member count), and adds the ECMP base value to obtain the next-hop index, which determines the next-hop route toward a server.
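As a rough illustration, the switch-side selection above can be simulated in a few lines of Python. This is only a sketch: the hash function, field encoding, and names are assumptions, since real switches hash in hardware with vendor-specific factors.

```python
import hashlib

def ecmp_next_hop(flow, members, base=0):
    """Pick an ECMP member for a flow (illustrative, not a real switch ASIC).

    flow    -- tuple of hash factors, e.g. (src_ip, dst_ip, proto, sport, dport)
    members -- list of equal-cost next hops learned via BGP
    base    -- ECMP base value the computed index is offset by
    """
    key = "|".join(str(f) for f in flow).encode()
    lb_key = int.from_bytes(hashlib.md5(key).digest()[:4], "big")  # hash lb key
    index = base + lb_key % len(members)     # remainder plus base -> next-hop index
    return members[index - base]

members = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
flow = ("172.16.25.30", "172.16.25.13", 6, 40000, 80)
# A given flow is always hashed to the same next hop while membership is stable
assert ecmp_next_hop(flow, members) == ecmp_next_hop(flow, members)
```

When a member's route is withdrawn, the member count changes and many flows re-hash to different nodes, which is one reason convergence behavior matters here.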
This approach requires the servers to sit in a layer-3 network and to maintain BGP sessions with the switch. Moreover, when a cluster node fails, recovery time is bounded by BGP convergence, which is typically on the order of seconds; in our experience with production networks, cluster convergence takes roughly 6 to 9 seconds.

To shorten convergence and reduce the dependency on the environment, the 360 virtualization team improved both of these points: it introduced the BFD protocol, cutting convergence to the millisecond level, and added a scheduler inside the VPC network so that traffic can be hashed across the servers without relying on the underlay, as shown in Figure 4:

![Figure 4](https://static001.geekbang.org/infoq/ac/ac72b3257a343215548e9b9183ef177d.png)

## BFD probing support

**BFD** (Bidirectional Forwarding Detection) provides a generic, standardized, media-independent and protocol-independent mechanism for fast failure detection. Its advantages:

1. It detects failures on any type of bidirectional forwarding path between network devices, including directly connected physical links, virtual circuits, tunnels, MPLS LSPs, multi-hop routed paths, and unidirectional links.
2. It provides a consistent, fast failure-detection time to different upper-layer applications.
3. It detects failures in under one second, speeding up network convergence, shortening application downtime, and improving network reliability.

Building on these properties, machines inside the VPC periodically send BFD probes to each CLOUD-DPVS node, update the hash result according to the probe outcomes, and select only healthy CLOUD-DPVS servers, keeping the service highly available.

Inside CLOUD-DPVS we implemented a BFD protocol module and hooked it at INET_HOOK_PRE_ROUTING. When a packet enters CLOUD-DPVS, the module first checks whether it is a BFD packet and, if so, replies with a BFD packet in the appropriate state, such as STATE_INIT or STATE_UP.
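The reply logic described above roughly follows the session state machine of RFC 5880. A minimal sketch of the passive side, with class and variable names that are illustrative rather than CLOUD-DPVS's actual implementation:

```python
# BFD session states from RFC 5880 (AdminDown omitted for brevity)
DOWN, INIT, UP = 1, 2, 3

class BfdResponder:
    """Sketch of a passive BFD endpoint, as a gateway node might implement it:
    given the state reported in the peer's BFD control packet, advance our own
    state per RFC 5880 and return the state carried in our reply.
    Real BFD also carries discriminators, timers, and flags."""

    def __init__(self):
        self.state = DOWN

    def receive(self, remote_state):
        if self.state == DOWN:
            if remote_state == DOWN:
                self.state = INIT          # answer STATE_INIT to start the handshake
            elif remote_state == INIT:
                self.state = UP
        elif self.state == INIT:
            if remote_state in (INIT, UP):
                self.state = UP            # session established: reply STATE_UP
        elif self.state == UP:
            if remote_state == DOWN:
                self.state = DOWN          # peer went down: tear the session down
        return self.state                  # state echoed back in the reply packet

# A probing VM starts in Down; the gateway answers Init, then Up:
gw = BfdResponder()
assert gw.receive(DOWN) == INIT
assert gw.receive(UP) == UP
```

The probing side treats any failure to complete or maintain this handshake within the detection interval as node failure, and removes that node from its hash ring.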
### 2. Adapting the load-balancing forwarding layer

**FULLNAT mode**

DPVS supports several load-balancing modes, including NAT, Tunnel, DR, and FULLNAT. NAT, DR, and Tunnel each impose constraints and only work in suitable environments:

1. DR: the real servers and the director must have a NIC in the same physical layer-2 network.
2. Tunnel: the VIP must be bound on every real server.
3. NAT: the real servers' default gateway must point to the LVS machine.

FULLNAT instead applies DNAT+SNAT to incoming packets: the destination address is rewritten to the real server's address and the source address to an address on the DPVS device, so the real servers need no special routing configuration. When the reply reaches the DPVS device it undergoes SNAT+DNAT: the source becomes the VIP on the DPVS device and the destination the real client address. This sidesteps the three limitations above, makes the topology more flexible, and adapts to more environments, so the CLOUD-DPVS gateway uses FULLNAT mode.

## Introducing the VPC concept

The DPVS community originally designed for classic IDC networks, which does not fit cloud scenarios. The core difference between the two: a classic network is shared by all users, while a VPC is a private per-tenant network. Inside a VPC, users may assign cloud hosts arbitrary IP addresses, whereas in a classic network everyone shares one layer-2 domain and addresses must not collide.

![Figure 5](https://static001.geekbang.org/infoq/4e/4e7033ea41fbc461499f3f27aa7f81b1.png)

As Figure 5 shows, VIP:PORT used to identify a service uniquely, with several instances attached behind it. In a cloud scenario, however, VIP addresses can repeat across tenants, so VIP:PORT alone no longer identifies a specific service, and the CLOUD-DPVS forwarding plane cannot reuse DPVS's original logic unchanged. To solve this, the team introduced the notion of a tenant VPC and associated each service with a VPC: different VPCs may use the same VIP:PORT, which matches how services are actually used. After the change, a service is identified by VXLAN+VIP:PORT, as shown in Figure 6.
![Figure 6](https://static001.geekbang.org/infoq/8d/8df4c05cb94bcd3e962dc0e73c976c20.png)

## CLOUD-DPVS forwarding principle

To implement forwarding, CLOUD-DPVS adds two new tables: a service information table and a virtual-machine information table. The service information table consists of the VXLAN ID, VIP:PORT, and RS:PORT, in the format of Table 1:

| Vxlan | VIP | vPort | RS-IP | Port |
| --- | --- | --- | --- | --- |
| 96 | 172.16.25.13 | 80 | 10.182.10.13 | 80 |
| 96 | 172.16.25.13 | 80 | 10.182.10.23 | 80 |
| 101 | 172.16.25.13 | 8080 | 10.182.20.2 | 80 |

Table 1: service information table

Vxlan+VIP+vPort identifies a public service within a VPC. Clients inside the tenant's private network access the service through VIP+vPort; to the client, the VIP is simply a private address within its network. CLOUD-DPVS performs the mapping from the private VIP to addresses in the IDC network, and neither the mapping nor the backend real servers are visible to the user.
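A sketch of the lookup against the data in Table 1, assuming round-robin scheduling (DPVS actually offers several scheduling algorithms, and the function name here is ours):

```python
from itertools import cycle

# Service information table from Table 1: (Vxlan, VIP, vPort) -> real servers.
# With the VXLAN ID in the key, the same VIP can appear in different tenants.
SERVICES = {
    (96,  "172.16.25.13", 80):   [("10.182.10.13", 80), ("10.182.10.23", 80)],
    (101, "172.16.25.13", 8080): [("10.182.20.2", 80)],
}

# Round-robin scheduling for this sketch only.
_schedulers = {key: cycle(backends) for key, backends in SERVICES.items()}

def select_backend(vni, vip, vport):
    """Look up a service by VXLAN ID + VIP:vPort and pick a real server."""
    key = (vni, vip, vport)
    if key not in _schedulers:
        return None              # no such service in this VPC
    return next(_schedulers[key])

# Successive requests to the same service alternate between its two backends:
assert select_backend(96, "172.16.25.13", 80) == ("10.182.10.13", 80)
assert select_backend(96, "172.16.25.13", 80) == ("10.182.10.23", 80)
```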

The virtual-machine information table consists of the VXLAN ID, the VM's IP address, the VM's MAC address, and the host machine's IP address, in the format of Table 2:

| Vxlan | VM IP | VM MAC | Host IP |
| --- | --- | --- | --- |
| 96 | 172.16.25.30 | fa:16:3f:5d:6c:08 | 10.162.10.13 |
| 96 | 172.16.25.23 | fa:16:3f:5d:7c:08 | 10.162.10.23 |
| 101 | 172.16.25.30 | fa:16:3f:5d:6c:81 | 10.192.20.2 |

Table 2: virtual-machine information table

In a virtualized VPC network, users assign private IP addresses as they see fit, so VM addresses in different VPCs may overlap and an IP address alone cannot uniquely identify a server or VM. CLOUD-DPVS therefore identifies a VM or server by Vxlan+VM IP.

When a VM in a VPC accesses VIP:vPort, OVS flow rules steer the traffic into a VXLAN tunnel toward the CLOUD-DPVS gateway. The gateway looks up the service information table to find the real servers attached to the service, then distributes the traffic to a specific real server according to the configured scheduling algorithm. The path is shown in Figure 7:

![Figure 7](https://static001.geekbang.org/infoq/4f/4f31edda8f3c58e0302c2271fb4d2ddc.png)

**Traffic path:**

1. A client in the VPC sends a request to VIP:vPort.
2. As the traffic passes through OVS, flow rules steer it into a VXLAN tunnel. The outer destination IP is the CLOUD-DPVS gateway's IP; the outer source IP is the IP of the host running the VM.
3. When the tunneled packet reaches CLOUD-DPVS, the gateway decapsulates it and extracts the VXLAN ID plus the inner destination IP and port; together these three identify one public service within a VPC.
4. CLOUD-DPVS selects one of the real servers attached to the service according to the configured scheduling algorithm, rewrites the packet's destination IP and port, records the mapping, and forwards the traffic to a physical server in the IDC network.
5. After processing, the real server sends the reply back along the reverse path toward the CLOUD-DPVS gateway.
6. On receiving the reply, CLOUD-DPVS first looks up the mapping table to find the session's public service, then rewrites the source and destination IP headers; the new destination IP is the VM's address.
7. The service's VXLAN ID plus the packet's destination IP identify exactly one VM; the VM information table then yields the VM's MAC address and its host's IP.
8. Using the VM MAC and host IP, the gateway encapsulates the packet in a VXLAN header: the VM MAC becomes the inner destination MAC, the host IP the outer destination IP, and a CLOUD-DPVS address the outer source IP. The underlay network forwards the tunneled packet to the VM's host.
9. On the host, OVS decapsulates the VXLAN packet and delivers it to the client in the corresponding VPC according to its flow rules.

**Adding a VXLAN module**

In the cloud scenario, every request coming from a VPC is VXLAN-encapsulated. CLOUD-DPVS implements a VXLAN module in the forwarding layer that decapsulates packets from the VPC, so the servers in the IDC receive plain packets and remain unaware of the VPC. When a backend server's reply reaches CLOUD-DPVS, the module adds the VXLAN header back and forwards the packet to the VPC.
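The header handling this module performs can be illustrated with the 8-byte VXLAN header layout of RFC 7348. This sketch covers only the VXLAN header itself; the real module also builds and parses the outer UDP/IP headers.

```python
import struct

def vxlan_encap(vni, inner_frame):
    """Prepend the 8-byte VXLAN header of RFC 7348:
    flags byte 0x08 (the I flag, 'VNI present'), 3 reserved bytes,
    a 24-bit VNI, and 1 more reserved byte."""
    return struct.pack("!II", 0x08000000, vni << 8) + inner_frame

def vxlan_decap(packet):
    """Split a VXLAN packet into (vni, inner_frame)."""
    flags, vni_field = struct.unpack("!II", packet[:8])
    if not flags & 0x08000000:
        raise ValueError("VXLAN I flag not set")
    return vni_field >> 8, packet[8:]

# Round trip with VNI 96 and a truncated inner Ethernet frame:
frame = b"\xfa\x16\x3f\x5d\x6c\x08" + b"..."
vni, inner = vxlan_decap(vxlan_encap(96, frame))
assert vni == 96 and inner == frame
```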
## Future work for CLOUD-DPVS

CLOUD-DPVS currently covers the path from VPC to IDC; work on opening the reverse path, from IDC to VPC, will follow. The VXLAN module is currently implemented in software in the forwarding layer; the next step is to offload it to smart NICs.

**References:**

1. https://github.com/iqiyi/dpvs
2. https://yq.aliyun.com/articles/497058
3. https://tools.ietf.org/html/rfc5880#section-6.8.5

Reprinted from: 360 Technology (WeChat ID: qihoo_tech)

Original article: [基於DPDK實現VPC和IDC間互聯互通的高性能網關](https://mp.weixin.qq.com/s/cAwnpEnaCOBv5NWP3dGfsA)
