NetEase Qingzhou's Exploration and Practice of Cilium Container Networking

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2019 年網易輕舟使用 sockmap+sk redirect 來優化輕舟 Service Mesh 的延遲,2020 年開始,我們逐步對 eBPF Service、Cilium NetworkPolicy,Cilium 容器網絡進行實踐,到 2021 年中旬,網易輕舟對外部客戶提供了 Cilium 整體解決方案。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"本文會深入介紹 Cilium,並澄清一些認知誤區,然後給出網易輕舟是如何使用 Cilium 的。目前國內這方面深入解析材料較少,如果您也正在探究,希望這篇文章能給您帶來幫助。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"eBPF 容器網絡探索背景"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"目前容器網絡方案仍然呈現着百花齊放的態勢,這主要是由於不同的 CNI 各有優勢。開源的原生 CNI 具備較廣的使用場景,即插即用。其中支持 overlay 方式的幾個方案(openshift-sdn、flannel-vxlan、calico-ipip),能更好地適配 L2\/L3 的網絡背景,而 underlay 方式的幾個方案(calico-bgp、kube-router、flannel-hostgw),則在性能上更趨近於物理網絡,在小包場景中性能明顯好過 overlay 類型方案。此外,許多主流雲廠商基於還會自建 VPC 能力,實現 VPC-based CNI ,這類 CNI 與集羣交付場景深度耦合,但也賦予了容器網絡 VPC 的屬性,具備更強的安全特性。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"此外,業界還有大量針對 CNI 或四層負載均衡的優化措施,例如:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"優化封裝協議,基於 vxlan 有更好的兼容性,基於 ipip 協議則能提升一定的帶寬能力(部分雲廠商如 AWS 並不支持 ipip 協議的包)"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"使用 IPVS 替換 kube-proxy 解決大規模環境裏,Service 帶寬降低和數據路徑下發時間變長"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"使用 eBPF 技術來加速 IPVS,進一步降低延遲,提升 Service 數據路徑性能"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"利用 multus 組合 CNI 多網絡平面,並利用 SR-IOV 技術做定向硬件加速"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"虛擬網卡時使用 ipvlan 替代 vethpair,提升性能"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"總的來說,通過方案整合和調優的方式進一步滿足了業務發展的需求,但是當前流行的 CNI 支持場景要麼是不夠通用,要麼在高 IO 業務下存在性能問題,要麼不具備安全能力,要麼因 Kubernetes service 性能問題,集羣規模上不去。因此我們想使用 eBPF 技術實現一套無硬件依賴的高性能、高兼容性的容器網絡解決方案,這個方案解決的問題如下:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"解決 kube-proxy 
## Cilium features

We will look at Cilium's features from three angles: the feature list, the highlight features, and the limitations.

(1) Feature list

![](https://static001.geekbang.org/wechat/images/02/024d9cbd6dc6304ccf33a71a03c30a57.png)

(2) Highlight features

Cilium is more like a composite of container-network capabilities: it has an underlay routing mode and an overlay mode, plus link encryption. Its distinctive capabilities include:

1. Bandwidth management
2. Cluster federation
3. IPVLAN mode
4. L7 security policies
5. DSR mode, which preserves the source address, saves bandwidth, and avoids SNAT port exhaustion
6. Kubernetes Service performance tuning, including the Maglev algorithm
7. XDP performance tuning for high-performance north-south Services
8. Local Redirect Policy
9. Traffic visibility and fine-grained traffic control for auditing
10. High performance that suits high-bandwidth NICs (e.g. 50G, 100G)

These highlights span security, performance, and functionality. Some of them are the Cilium community's take on container-networking best practices; others are innovations enabled by eBPF itself, which is what makes Cilium a composite. The community's approach can be summed up in one sentence: "In container networking, whatever others have, we have too, with better performance; and on top of that, eBPF lets us reinvent much of it inside the kernel."

(3) Limitations

Cilium's technical dividends have drawn practitioners at home and abroad; in China, besides NetEase Qingzhou, we have seen Tencent, Alibaba, iQiyi, and Ctrip run Cilium in production. But while eBPF opens the door to innovation in container networking, it also brings two problems that keep most users on the sidelines.

Problem 1: eBPF is harder to debug than iptables and demands more expertise

iptables has been in the kernel for about two decades; its developer base is large and it is widely accepted. The eBPF concept, by contrast, was only proposed in 2014, and Cilium, the container network built on it, was born in 2018.

Developing with eBPF is harder: error and debug output is sparse, and engineers who grasp the whole stack are rare.

In terms of flexibility, iptables plus policy routing can rewire a data path with a few commands; eBPF requires writing code, so supporting a new data-path requirement is more complex.

Facing these issues, our view is:

The iptables stack is indeed more general, but there are business scenarios it cannot handle, such as:

- High-security Kubernetes clusters that need strong traffic-log auditing
- Workloads with high IO and low-latency requirements, or clusters with many Services, many short-lived connections, and large scale, where north-south bandwidth must be conserved

We believe the eBPF stack is still maturing and its tooling is catching up, so its complexity does carry an extra labor cost. But if the business need outweighs that cost, and a technology vendor is there to back you, choosing eBPF for performance- and security-sensitive scenarios is a reasonable call.

Problem 2: high kernel version requirements

According to the Cilium community, an eBPF-based Kubernetes container network requires at least kernel 4.9; Calico's eBPF data plane requires at least kernel 5.3.

Our experience: two years ago the kernel version requirement was a major obstacle, but on the container cloud platform that NetEase Qingzhou supports, kernel 5.4 is now being rolled out, because we want stronger eBPF-based monitoring and the cleaner, more complete cgroup v2 features: finer-grained resource priorities per workload and stronger resource isolation (IO isolation), to keep workloads from interfering with one another and to improve platform availability. As a base kernel, 5.4 is sufficient to run Cilium's eBPF features.
## The Cilium data plane

Having covered the features, let us look at how their data plane is implemented. The rest of this section dissects the Cilium data plane, explains why it is fast, and finally clears up some statements that are often misunderstood.

(1) Data-plane paths

First, two facts:

- The Cilium data plane depends on the kernel version: different kernels support different data paths, and the upper-layer implementation differs accordingly.
- The Cilium data plane depends on feature switches: turning features on or off changes the forwarding behavior.

We therefore pin the discussion to kernels 5.4 and 5.10, with kube-proxy replacement enabled, only IPv4 enabled, everything else at Cilium defaults, on Cilium 1.10. The discussion below uses diagrams plus commentary; for reasons of space it covers the routing mode with veth interfaces.

![](https://static001.geekbang.org/wechat/images/36/36cfae77db3e2d872b3d723ede048374.png)

1. Cross-node Pod -> Pod

Traffic leaving a Pod is processed by the eBPF program at the tc ingress hook of the host-side veth interface (lxc), handed to the kernel stack for route lookup, and then sent out through the tc egress hook of the physical interface eth0 to reach Node2's eth0.

On Node2, the incoming traffic first passes the eBPF program at eth0's tc ingress hook, goes to the kernel stack for route lookup, and is forwarded to the cilium_host interface. After the tc egress hook on cilium_host, the packet is redirected to the destination Pod's host-side lxc interface and finally delivered to the Pod.

2. Same-node Pod -> Pod

Traffic leaving the Pod is processed by the eBPF program at the tc ingress hook of the Pod's host-side lxc interface. The program looks up the destination Pod, determines it is on the same node, and uses redirect to bypass the kernel stack entirely, sending the packet straight to the destination Pod's lxc interface and from there into the Pod.
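To make the same-node fast path concrete, here is a minimal sketch of such a tc ingress program. It is illustrative, not Cilium's actual datapath code: the constants `DST_POD_IP` and `DST_POD_IFINDEX` are hypothetical placeholders standing in for the lookups Cilium performs in its endpoint maps.

```c
// Minimal sketch of the same-node fast path: a tc ingress program on the
// host-side lxc interface forwards straight to the peer Pod's interface
// with bpf_redirect(), bypassing the kernel stack.
#include <linux/bpf.h>
#include <linux/pkt_cls.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

#define DST_POD_IP      0x0a00020f /* 10.0.2.15, example only */
#define DST_POD_IFINDEX 42         /* lxc ifindex of that Pod, example only */

SEC("tc")
int lxc_ingress(struct __sk_buff *skb)
{
    void *data = (void *)(long)skb->data;
    void *data_end = (void *)(long)skb->data_end;
    struct ethhdr *eth = data;
    struct iphdr *ip;

    /* Bounds check required by the verifier before touching headers. */
    if (data + sizeof(*eth) + sizeof(*ip) > data_end)
        return TC_ACT_OK;
    if (eth->h_proto != bpf_htons(ETH_P_IP))
        return TC_ACT_OK;

    ip = data + sizeof(*eth);
    /* Local endpoint hit: skip routing/netfilter and jump straight to the
     * peer's host-side veth (egress direction, flags = 0). */
    if (ip->daddr == bpf_htonl(DST_POD_IP))
        return bpf_redirect(DST_POD_IFINDEX, 0);

    /* Anything else goes up the kernel stack as usual. */
    return TC_ACT_OK;
}

char _license[] SEC("license") = "GPL";
```

A program like this would be attached with something like `tc filter add dev lxcXXX ingress bpf direct-action obj prog.o sec tc`.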
3. Accessing a NodePort with a local endpoint

Client -> NodePort: traffic enters through Node2's eth0 and is DNATed by the eBPF program at the tc ingress hook, then continues into the kernel, which routes it to the cilium_host interface. When the tc ingress hook on cilium_host receives the traffic, it redirects it straight to the destination Pod's host-side lxc interface and into the Pod.

Return traffic leaves the Pod and is reverse-DNATed by the eBPF program at the tc ingress hook of the host-side veth interface lxc; that program redirects the traffic directly to the physical interface without passing through the kernel, and it finally leaves for the client after the physical interface's tc egress hook.

4. Accessing a NodePort with a remote endpoint

The client accesses the NodePort and traffic arrives at Node1's eth0. The eBPF program at the tc ingress hook first DNATs the destination to the target Pod's IP and port, then SNATs the source to Node1's physical interface IP and port, and finally redirects the packet to Node1's eth0, where it passes the tc egress hook and leaves.

5. Pod accessing the external network

Traffic leaving the Pod is processed by the eBPF program at the tc ingress hook of the host-side veth lxc interface, then handed to the kernel stack for route lookup, which decides it must leave via eth0. The eBPF program at eth0's tc egress hook then performs SNAT, rewriting the source to the node's physical interface IP and port before the packet goes out.

Return traffic from outside passes the eBPF program at the node's eth0 tc ingress hook, which reverses the SNAT, then goes to the kernel for routing. When the traffic reaches the cilium_host interface, the tc egress eBPF program recognizes it as Pod-bound and redirects it straight to the destination Pod's host-side lxc interface, completing the return path.

6. Host accessing a Pod

Host-to-Pod traffic leaves through the cilium_host interface, so the eBPF program at its tc egress hook redirects it straight to the destination Pod's host-side lxc interface and into the Pod.

The reverse traffic goes from the Pod to the host-side lxc interface, where the tc ingress eBPF program recognizes it and hands it to the kernel stack, returning it to the host.

XDP is not marked in the flows above; XDP processing mirrors tc ingress, but the XDP hook sits earlier in the receive path than tc ingress, before the skb has even been allocated, so it performs better. XDP is particularly suited to accelerating north-south Service traffic such as NodePort, HostPort, and External IP; community benchmarks show roughly a 3-4x PPS improvement.
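The following toy XDP program illustrates why the hook is so cheap: it runs on the raw packet buffer before any skb exists. It only counts IPv4 packets and passes everything on; Cilium's real NodePort acceleration additionally performs the DNAT/SNAT and retransmits or redirects the frame.

```c
// Toy XDP sketch: count IPv4 packets at the earliest receive hook.
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

struct {
    __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
    __uint(max_entries, 1);
    __type(key, __u32);
    __type(value, __u64);
} rx_ipv4 SEC(".maps");

SEC("xdp")
int xdp_count(struct xdp_md *ctx)
{
    void *data = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;
    struct ethhdr *eth = data;
    __u32 key = 0;
    __u64 *cnt;

    if ((void *)(eth + 1) > data_end)
        return XDP_PASS;

    if (eth->h_proto == bpf_htons(ETH_P_IP)) {
        cnt = bpf_map_lookup_elem(&rx_ipv4, &key);
        if (cnt)
            (*cnt)++;   /* per-CPU map, so a plain increment is fine */
    }
    return XDP_PASS;    /* hand the packet to the normal stack */
}

char _license[] SEC("license") = "GPL";
```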
The description above is based on kernel 5.4. From kernel 5.10 onward, Cilium added the eBPF Host-Routing feature, which speeds up the eBPF data plane further through two new redirect methods, bpf_redirect_peer and bpf_redirect_neigh. bpf_redirect_peer can be thought of as an upgraded bpf_redirect: it delivers the packet directly to the eth0 interface inside the Pod's end of the veth pair, without passing through the host-side lxc interface. The benefit is one less trip through the per-CPU backlog queue. With this feature, Pod -> Pod performance in routing mode approaches Node -> Node performance, and the Cilium data-plane paths change considerably, as shown in the diagram below. Compare the two diagrams for the details; for reasons of space, we highlight two changes:

![](https://static001.geekbang.org/wechat/images/e2/e2ce1710e281e311cdda4c93d4638531.png)

1. Except for host -> Pod, every path now hops from interface to interface; the biggest change is that packets no longer go through kernel forwarding, which means the round-trip path no longer traverses the kernel's Netfilter framework, kernel tc, and related modules, greatly improving forwarding performance.

2. The redirect_peer feature is specific to veth-pair interfaces. Because traffic is redirected directly to the interface inside the Pod, tcpdump on the host-side lxc interface cannot capture traffic entering the Pod; since tcpdump's capture point sits before the tc ingress hook, it can still capture Pod-egress traffic before it is processed by eBPF.
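A hedged sketch of this 5.10-era fast path follows; it is not Cilium's actual datapath. `POD_LXC_IFINDEX` is a hypothetical constant standing in for the endpoint lookup, and the unconditional redirect is a deliberate simplification.

```c
// Sketch: on the physical interface's tc ingress hook, jump straight into
// the destination Pod's network namespace with bpf_redirect_peer(),
// skipping the host stack and the per-CPU backlog queue.
#include <linux/bpf.h>
#include <linux/pkt_cls.h>
#include <bpf/bpf_helpers.h>

#define POD_LXC_IFINDEX 42 /* host-side ifindex of the target veth, example */

SEC("tc")
int phys_ingress(struct __sk_buff *skb)
{
    /* bpf_redirect_peer() (kernel >= 5.10) transfers the packet to the
     * *peer* device of the given veth, i.e. eth0 inside the Pod, within
     * the same NAPI loop - no backlog queue is involved. The real
     * datapath first parses the packet and looks up the endpoint. */
    return bpf_redirect_peer(POD_LXC_IFINDEX, 0);
}

char _license[] SEC("license") = "GPL";
```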
(2) Why it is fast

The high performance has two aspects.

Low data-path latency and high bandwidth

The data path achieves low latency and high bandwidth by short-circuiting the traditional kernel framework, which reduces the number of instructions needed to forward a packet. The techniques involved are eBPF-driven redirect, redirect_peer, redirect_neigh, XDP, sockmap, and sk redirect.

High Kubernetes Service data-plane performance, with control-plane programming time that stays nearly flat as the number of Services grows.

The data-plane performance comes first from socket LB for in-cluster access: no skb rewriting is needed at all, because traffic from inside the platform to a Service is turned directly into traffic to a Pod, and the path to the Pod is itself accelerated. North-south Services (NodePort, external IP, HostPort) are fast because of native XDP acceleration plus the path-hopping optimizations; for Services whose endpoint is a remote Pod, DSR mode logically removes one hop.
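The socket-LB idea above can be pictured as a cgroup hook that rewrites the destination of a `connect()` call before any packet exists, so per-packet NAT never happens. This is a simplified sketch, not Cilium's code; `SVC_IP`, `SVC_PORT`, `BACKEND_IP`, and `BACKEND_PORT` are hypothetical constants, whereas Cilium picks the backend from its eBPF service maps.

```c
// Sketch of socket LB: translate a Service VIP to a backend Pod address at
// connect() time via a cgroup/connect4 program.
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

#define SVC_IP       0x0a60000a /* 10.96.0.10, example ClusterIP */
#define SVC_PORT     53
#define BACKEND_IP   0x0a000217 /* 10.0.2.23, example Pod IP */
#define BACKEND_PORT 5353

SEC("cgroup/connect4")
int sock4_connect(struct bpf_sock_addr *ctx)
{
    /* user_ip4/user_port hold the address passed to connect(),
     * in network byte order. */
    if (ctx->user_ip4 == bpf_htonl(SVC_IP) &&
        ctx->user_port == bpf_htons(SVC_PORT)) {
        ctx->user_ip4 = bpf_htonl(BACKEND_IP);
        ctx->user_port = bpf_htons(BACKEND_PORT);
    }
    return 1; /* allow the connect() to proceed */
}

char _license[] SEC("license") = "GPL";
```

Because the rewrite happens once per connection, every subsequent packet of the flow already carries the Pod address and takes the ordinary (accelerated) Pod path.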
On the control-plane side, Service programming relies on eBPF maps, which are hash-based. This removes the old problem where any Service change meant re-issuing the full iptables rule set; as the number of Services grows, the time to push a rule update barely changes.
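The sketch below shows why updates stay cheap: Service state lives in a hash map keyed by (VIP, port), so changing one Service touches one entry instead of rewriting a whole rule chain. The layout is illustrative only; Cilium's real service maps are more elaborate (backend slots, Maglev tables, and so on).

```c
// Illustrative service map plus an O(1) lookup, in the spirit of the
// eBPF-map-based Service data plane described above.
#include <linux/bpf.h>
#include <linux/pkt_cls.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

struct svc_key {
    __u32 vip;     /* ClusterIP, network byte order */
    __u16 port;    /* service port, network byte order */
    __u16 pad;
};

struct svc_backend {
    __u32 ip;      /* backend Pod IP */
    __u16 port;    /* backend port */
    __u16 count;   /* number of backends, for slot selection */
};

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 65536);
    __type(key, struct svc_key);
    __type(value, struct svc_backend);
} svc_map SEC(".maps");

SEC("tc")
int svc_lookup_demo(struct __sk_buff *skb)
{
    struct svc_key key = {};
    struct svc_backend *be;

    /* In the real datapath the key is parsed from the packet headers;
     * here we only demonstrate the constant-time lookup. */
    key.vip  = bpf_htonl(0x0a60000a); /* 10.96.0.10, example */
    key.port = bpf_htons(53);

    be = bpf_map_lookup_elem(&svc_map, &key);
    if (be) {
        /* DNAT to be->ip / be->port would happen here. */
    }
    return TC_ACT_OK;
}

char _license[] SEC("license") = "GPL";
```

The control plane updates such a map entry-by-entry with `bpf_map_update_elem()`, which is why programming latency is largely independent of the total number of Services.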
(3) Clearing up some concepts

Whether iptables-based Services perform well depends on the scenario

For iptables-based Services, the KubeCon talk "Scale Kubernetes to Support 50,000 Services" gives one specific measurement:

> This is the bottleneck of Kubernetes Service forwarding: with 5,000 Services (40k rules) deployed, throughput drops by about 30%, and with 10,000 Services it drops by 80% (a 6x performance gap). Likewise, a rule update with 5,000 Services (40k rules) took 11 minutes.

iptables does match rules linearly and jump between chains, but that slow path only runs during connection setup; once the session is established, subsequent traffic just hits conntrack. So once Services reach a certain scale, the latency impact falls mainly on short-connection workloads, and from a tolerance standpoint, some businesses can live with this slowdown.

We will not debate here whether 5,000 Services really takes 11 minutes to update in production (that depends on the environment and the number of endpoints); the point is tolerance. Businesses tolerate minute-level propagation of Service rule updates poorly, because slow updates keep steering Service traffic to Pods that are already unhealthy, causing business failures.

Nor does the Kubernetes Service performance problem necessarily become the cluster's bottleneck: if you choose not to use the Kubernetes Service data path, or even not to use a CNI, the problem simply does not arise.

Cilium XDP has limitations

Generic XDP brings only a limited speedup, while native XDP performs as expected; but XDP has restrictions, and it currently cannot run on bond interfaces. NetEase Qingzhou's practice here is to skip bonding on the physical machines and instead run HA over the two interfaces with ECMP equal-cost routes, use XDP on the physical interfaces to accelerate north-south Services, and have a selected set of nodes announce the Service IPs directly via BGP, which removes the centralized north-south L4 LB entirely.

## The Cilium control plane

We will cover the Cilium control plane through its framework and the data-plane programming process: the framework gives a global picture of the control plane, and the programming process shows how the data plane gets built.

![](https://static001.geekbang.org/wechat/images/83/83d22f76d040bab7e92c6ccb10fb6a5a.png)

First, be clear that the cilium-agent is the core control-plane component: it loads all data-path eBPF programs and manages the network connectivity configuration; it is what builds the data plane. Its work splits into two phases by load time: one part loads at startup, the other loads when a Pod is created, or more precisely when the endpoint object corresponding to the Pod is read.

What loads at startup is generally the data-plane skeleton, tied to the data-plane framework:

- Creating the cilium_host / cilium_net interfaces and loading their eBPF programs
- Compiling and installing the eBPF rules for global socket LB, the vxlan interface, and the physical interfaces, plus a small number of iptables rules
- Setting the MTU, identifying physical interfaces, enabling forwarding on interfaces, probing the kernel's eBPF capabilities, and enabling eBPF features according to the probe results and the preconfigured options

The Pod-creation phase compiles and loads eBPF programs for the new Pod, typically at tc ingress on the host-side veth interface.

To manage eBPF compilation and loading, Cilium uses its own Cilium endpoint objects. During Pod creation, kubelet invokes the Cilium CNI to create the Pod's veth interface, and the Cilium CNI in turn calls the cilium-agent over a unix socket to create the endpoint object, which carries rich information (e.g. MAC address, interface name, the names of policies to load, policy rules). These objects are written through the API server into etcd, and the cilium-agent compiles and loads the eBPF programs according to them.

To adapt to different kernels, the cilium-agent probes for each feature and automatically falls back to what the kernel supports when the default is unavailable. For greater flexibility, the Cilium eBPF data plane exposes many configuration switches, quite fragmented, which are managed through the cilium-config ConfigMap. The agent mounts this map as a path and turns the switches into compile-time macros for the eBPF programs, so compiling with or without a macro enables or disables the corresponding data-plane feature.
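The compile-macro mechanism can be sketched as follows. The agent renders config switches into `-D` defines when it compiles the datapath, so a disabled feature costs nothing at runtime; the macro and function names below are illustrative, not necessarily Cilium's actual identifiers.

```c
// Sketch: feature switches become compile-time macros for the datapath.
#include <linux/bpf.h>
#include <linux/pkt_cls.h>
#include <bpf/bpf_helpers.h>

SEC("tc")
int handle_packet(struct __sk_buff *skb)
{
#ifdef ENABLE_NODEPORT
    /* NodePort DNAT/SNAT handling would be compiled in here. */
#endif
#ifdef ENABLE_MASQUERADE
    /* SNAT for Pod-to-external traffic would be compiled in here. */
#endif
    return TC_ACT_OK;
}

char _license[] SEC("license") = "GPL";

/* Built roughly like:
 *   clang -O2 -target bpf -DENABLE_NODEPORT -c datapath.c -o datapath.o
 * Dropping -DENABLE_NODEPORT compiles the feature out entirely. */
```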
## NetEase Qingzhou's Cilium practice

With the overall picture above in mind, NetEase Qingzhou's approach to Cilium, and the problems we hit along the way and how we solved them, should be easy to follow.

(1) Our approach to adopting Cilium

![](https://static001.geekbang.org/wechat/images/59/5959d813cc2493528a007c832e9bab03.png)

In short, our approach is "3+1": the "3" means three plugins, and the "1" means one Cilium eBPF container-network CNI. Our reasoning:

- Sockops mainly accelerates the sidecar scenarios of the Qingzhou Service Mesh and Qingzhou Serverless products: it cuts service-mesh latency by 9% and raises QPS by 20%, and it raises QPS between the Knative queue-proxy and the business container by 20% for Qingzhou Serverless.
- The NetworkPolicy plugin can be inserted into mainstream CNIs and the NetEase container network. It implements security policies in eBPF, which performs better than iptables, and works with the Qingzhou Hubble component for cluster traffic auditing, mainly for financial customers.
- The eBPF Service plugin replaces kube-proxy for workloads that need kube-proxy performance. So far we have prioritized ClusterIP support; north-south Services are strongly tied to the container network type, and we only support them in the routing scenario on kernel 5.10.

Initially we wanted to push existing clusters straight onto the Cilium container network, but that proved hard: not every workload needs a high-performance container network, and a wholesale 0-to-1 replacement was inappropriate. Our business teams did, however, need individual features, so we carved those features out as plugins into the current container network to meet their needs.

For new Kubernetes clusters we push the Cilium container network. In the first half of 2021 we landed the Cilium CNI in new high-IO business clusters making heavy use of Kafka and ES, on kernel 5.4, with the bpf_redirect_peer and bpf_redirect_neigh eBPF helpers required by eBPF Host-Routing backported to 5.4.

(2) Examples of problems we hit and how we solved them

Cilium chaining adaptation

With kube-proxy replacement in Cilium chaining mode on kernels below 5.10, NodePort traffic for north-south Services does not flow. Node -> Pod goes through kernel forwarding while Pod -> Node is redirected by eBPF straight to the physical interface, so the two directions take different paths, and the ACK of the three-way handshake gets dropped by the kernel's iptables state check (--ctstate INVALID -j DROP). Kernel 5.10 does not have this problem, because there the packets hop between interfaces and never take the kernel forwarding path. Our fix was to move to the newer kernel.

The data path is not flexible enough

In private clouds, data-path requirements vary widely, and some Pod traffic needs a customized path; supporting that in eBPF usually means extra development, which costs the business time. So we abstracted the scenario: target traffic can simply be marked and handed to the kernel, where the kernel stack combines iptables with policy routing to implement the custom path quickly, meeting fast iteration needs. Later, depending on how generally useful the path turns out to be, we decide whether to implement it fully in eBPF. A minimal sketch of the marking side follows.
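This sketch reflects the assumed design just described, not Cilium code: the tc program only sets a fwmark on the target flow and returns to the kernel, where policy routing (for example, `ip rule add fwmark 0x200 lookup 200`) and iptables take over. `CUSTOM_PATH_MARK` and the target subnet are hypothetical.

```c
// Escape hatch: mark the target flow and let the kernel's policy routing
// and iptables implement the custom path.
#include <linux/bpf.h>
#include <linux/pkt_cls.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

#define CUSTOM_PATH_MARK 0x200      /* hypothetical fwmark value */
#define TARGET_NET       0x0a640000 /* 10.100.0.0/16, example subnet */
#define TARGET_MASK      0xffff0000

SEC("tc")
int mark_custom_path(struct __sk_buff *skb)
{
    void *data = (void *)(long)skb->data;
    void *data_end = (void *)(long)skb->data_end;
    struct ethhdr *eth = data;
    struct iphdr *ip;

    if (data + sizeof(*eth) + sizeof(*ip) > data_end)
        return TC_ACT_OK;
    if (eth->h_proto != bpf_htons(ETH_P_IP))
        return TC_ACT_OK;

    ip = data + sizeof(*eth);
    /* Flows to the target subnet are marked and deliberately NOT
     * redirected, so iptables + policy routing can steer them. */
    if ((bpf_ntohl(ip->daddr) & TARGET_MASK) == TARGET_NET)
        skb->mark = CUSTOM_PATH_MARK;

    return TC_ACT_OK;
}

char _license[] SEC("license") = "GPL";
```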
Network problems are hard to localize

In the Cilium scenario, especially with eBPF in play, packets hop back and forth: in some special cases they go through the kernel and in others they do not, which makes connectivity failures easy to hit and hard to diagnose. So we built a TCP packet-capture tool with eBPF: it attaches kprobes to the kernel's key receive/transmit functions and to iptables to capture traffic, and we enriched the eBPF packet trace points, so that together with Hubble we can localize network problems quickly. A sketch of the kprobe side follows.
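This is only a sketch of the tracing idea, not our internal tool: a kprobe on `ip_rcv()` reads the skb and logs one line per IPv4 packet to the trace pipe. It assumes a libbpf CO-RE build with a bpftool-generated `vmlinux.h`; a real tool hooks many more functions (e.g. `dev_queue_xmit`, and `nf_hook_slow` for iptables verdicts) and filters by 5-tuple.

```c
// Skeleton of a kprobe-based packet tracer.
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_core_read.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_endian.h>

SEC("kprobe/ip_rcv")
int BPF_KPROBE(trace_ip_rcv, struct sk_buff *skb)
{
    unsigned char *head;
    __u16 nh_off;
    struct iphdr iph;

    /* skb->head + skb->network_header points at the IP header. */
    head   = BPF_CORE_READ(skb, head);
    nh_off = BPF_CORE_READ(skb, network_header);
    if (bpf_probe_read_kernel(&iph, sizeof(iph), head + nh_off))
        return 0;

    /* Addresses printed as hex; readable via
     * /sys/kernel/debug/tracing/trace_pipe. */
    bpf_printk("ip_rcv: %x -> %x proto %d",
               bpf_ntohl(iph.saddr), bpf_ntohl(iph.daddr), iph.protocol);
    return 0;
}

char _license[] SEC("license") = "GPL";
```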
Low kernel versions in production

**Sockops support**

We started using socket short-circuiting to reduce service-mesh latency in 2019, when kernel 4.19.87 did not yet fully support sockmap + sk redirect, so we backported a set of kernel patches to meet our needs. One extra note on Sockops, particularly in the Service Mesh scenario: there, the business container and the sidecar talk to each other over 127.0.0.1, and Cilium's implementation does no namespace isolation by default, which leads to data-plane conflicts. We therefore added namespace information to the sock_key, so that even when the 5-tuples of redirected business connections are identical across namespaces, they no longer collide. A hedged sketch of such a namespace-aware key follows.
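This sketch conveys the idea; our actual patch differs in details. The 5-tuple key is extended with a netns field so that identical 127.0.0.1 flows in different Pods do not collide in the shared sockhash. How the netns value is obtained depends on the kernel (e.g. `bpf_get_netns_cookie()` where available); it is left as a placeholder here, and the byte-order quirks (`local_port` in host order, `remote_port` in network order) follow the upstream examples.

```c
// Namespace-aware sockhash key plus the sockops program that registers
// established sockets under it.
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

struct sock_key {
    __u32 sip4;
    __u32 dip4;
    __u32 sport;
    __u32 dport;
    __u64 netns;   /* the extra field: identifies the Pod's netns */
};

struct {
    __uint(type, BPF_MAP_TYPE_SOCKHASH);
    __uint(max_entries, 65536);
    __type(key, struct sock_key);
    __type(value, __u32);
} sock_ops_map SEC(".maps");

/* On TCP established, register the socket under its namespace-qualified
 * key. A matching sk_msg program then looks up the reversed key and calls
 * bpf_msg_redirect_hash() to splice data directly between the two sockets,
 * bypassing the loopback TCP/IP path between sidecar and application. */
SEC("sockops")
int bpf_sockops(struct bpf_sock_ops *ops)
{
    struct sock_key key = {};

    if (ops->op != BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB &&
        ops->op != BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB)
        return 0;

    key.sip4  = ops->local_ip4;
    key.dip4  = ops->remote_ip4;
    key.sport = ops->local_port;             /* host byte order */
    key.dport = bpf_ntohl(ops->remote_port); /* network byte order */
    key.netns = 0; /* placeholder: set from a netns identifier */

    bpf_sock_hash_update(ops, &sock_ops_map, &key, BPF_ANY);
    return 0;
}

char _license[] SEC("license") = "GPL";
```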
**Backporting features from newer kernels**

To use the high-performance data path, we ported the bpf_redirect_peer and bpf_redirect_neigh patches to kernel 5.4 and enabled the eBPF Host-Routing feature.

**About the authors**

Liu Qinlong (劉勤龍) is a senior development engineer on the NetEase Shufan Qingzhou product line and the initiator of the Knative SIG in the Cloud Native Community, with 8 years of experience in server-side development and optimization. He is responsible for the data-plane design of Qingzhou's L4 load balancer and participated in Qingzhou service-mesh performance optimization. He currently focuses on developing and optimizing the Qingzhou cloud-native Serverless platform, supporting Serverless adoption at NetEase Cloud Music, NetEase Yanxuan, and NetEase Media. His main interests are Kubernetes, Istio, Knative, and Cilium.

Huang Yang (黃揚) is a senior system development engineer on the NetEase Shufan Qingzhou product line, with years of experience in container-network design and development and in Kubernetes development and operations. He currently maintains NetEase Shufan's container-network solution and its automated diagnostics platform.