京東數科統一接入網關JDDLB性能優化之QAT加速卡

{"type":"doc","content":[{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"京東數科JDDLB作爲京東數科最重要的公網流量入口,承接了很多重要業務的公網流量。目前,已成功接替商業設備F5所承載的流量,並在數次618、11.11大促中體現出優越的功能、性能優勢。"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"一、京東數科JDDLB 整體架構"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/a9\/95\/a9844eca67d4f3b9d3c09aa5fcac9a95.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"圖1 京東數科JDDLB 整體架構"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"JDDLB 整體架構的核心包括:基於DPDK自主研發的四層負載SLB,定製開發擴展功能的NGINX,以及統一管控運維平臺。其主要特點爲:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"高性能"},{"type":"text","text":":具備千萬級併發和百萬級新建能力。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"高可用"},{"type":"text","text":":通過 ECMP、會話同步、健康檢查等,提供由負載本身至業務服務器多層次的高可用。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"可拓展"},{"type":"text","text":":支持四層\/七層負載集羣、業務服務器的橫向彈性伸縮、灰度發佈。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"四層負載能力"},{"type":"text","text":":通過ospf 向交換機宣告vip;支持ECMP、session 同步;支持均衡算法如輪詢、加權輪詢、加權最小連接數、優先級、一致性哈希;FullNAT轉發模式方便部署等。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"七層負載能力"},{"type":"text","text":":支持基於域名和URL的轉發規則配置;支持均衡算法如輪詢、基於源 IP 哈希、基於cookie等。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"SSL\/TLS能力"},{"type":"text","text":":證書、私鑰、握手策略的管理配置;支持 SNI 配置;支持基於多種加速卡的SSL卸載硬件加速等。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"流量防控"},{"type":"text","text":":提供一定的 Syn-Flood 防護能力;與應用防火牆結合後提供 WAF 防護能力;提供網絡流量控制手段如 Qos 流控、ACL 訪問控制等。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"管控平臺"},{"type":"text","text":":支持多種維度的網絡和業務指標監控和告警。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"此外,藉助於JDDLB現有能力特性,方便擴展其他新功能。例如,藉助於NGINX的SSL\/TLS硬件優化性能以及連接高併發處理能力,可以實現基於MQTT協議的、支持SSL\/TLS協議、安全的推送長連接網關等。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"本文針對JDDLB中七層負載的SSL\/TLS性能優化方法之一——將耗CPU計算資源的加解密算法計算卸載到QAT加速卡——進行概述性介紹。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"二、優化方案性能提升對比"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1、測試方法"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"執行機部署適配QAT引擎後的nginx,發包測試機進行壓測灌包,在CPU負載達到100%後比較得出nginx在進行QAT優化後的新建connection速率對比。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"2、測試場景"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/79\/02\/79be898c30f16ffa68644abe5260eb02.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"3、本地測試數據對比"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"▷ 使用單張加速卡性能對比"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/7a\/61\/7a69717b4ce2c501af39347912b77b61.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"▷ 使用雙加速卡性能對比"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/50\/e0\/50db7cb7e34b32d416ec7951eb36dee0.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"此優化方案,通過NGINX進行HTTPS新建速率實測,與軟件加解密場景做對比:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"使用單加速卡,rsa平均新建速率提升3倍,ecdh新建速率提升2.5倍"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"使用雙加速卡:rsa 平均新建速率提升6 倍,ecdh新建速率提升5.5倍"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"此優化方案所帶來的性能提升主要依賴於:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對比fsl 加速卡、qat 採用用戶態驅動的方式,實現了內核態到用戶態到內存零拷貝。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"NGINX採用異步模式調用OpenSSL API,代替傳統的同步模式調用。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"驅動支持多加速卡同時進行卸載加速。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"三、硬件加解密異步處理"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1、異步框架概述"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"JDD-LB基於nginx 原生的異步處理框架上拓展出針對異步硬件引擎的異步事件處理機制,整體框架如下圖所示:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/2c\/6a\/2cfbde71a0507ca51905f7yyce09af6a.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"硬件加解密的異步框架整體依賴nginx的epoll異步框架處理機制和 openssl 協程機制。原生的nginx epoll 框架負責網絡I\/O 的異步事件處理,在此基礎上jdd-lb重新增加了async fd 異步監聽任務,在原有的連接結構體中增加新的異步來源用來接收異步引擎的通知,當執行OpenSSL相關操作時,把返回的事件fd加載到jdd-lb的異步事件框架中,openssl 當檢測到硬件執行完相關操作後就會喚醒相關事件進行後續操作的執行。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"關於協程的詳細介紹可以參考"},{"type":"link","attrs":{"href":"http:\/\/mp.weixin.qq.com\/s?__biz=MzI0MDc5NzQ2MQ==&mid=2247496741&idx=1&sn=3753e29db756d667f7e41b1dc6add814&chksm=e917e65fde606f496bee23186e8b96b566d6ad0787d0da8d7511d162bd11167834bdaacece94&scene=21#wechat_redirect","title":"","type":null},"content":[{"type":"text","text":"《UAG性能優化之freescale加速卡》"}]},{"type":"text","text":",裏面有關於協程實現的基本語義、切換機制等詳細介紹,這裏不在贅述,總而言之,openssl協程機制實現在線程內的多個任務切換管理。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這裏涉及到兩個問題:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"異步任務的上下文切換以及通知機制;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Nginx如何獲取async fd,並加入epoll監聽隊列中,並與openssl 以及qat engine協同完成一次ssl 握手的;"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2、交互流程"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"以ssl握手爲例,nginx、openssl、qat引擎各個模塊之間的交互流程如下圖所示:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/84\/fd\/8456311220d1543a28edc2fe0b2e98fd.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"ASYNC_start_job:nginx 調用ssl lib庫接口SSL_do_handshake, 開啓一個異步任務。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Rsa\/ecdh 加解密操作。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Qat 引擎將加密消息發送給驅動,創建異步事件監聽fd,將fd綁定到異步任務的上下文中。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"qat_pause_job: 調用該接口保存異步任務執行的堆棧信息,任務暫時被掛起,等待硬件加解密操作完成。同時進程堆棧切換到nginx io調用主流程,ssl返回WANT_ASYNC,nginx開始處理其他等待時間。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"nginx io 處理框架獲取保存在異步任務上下文中的asyncfd,並添加到epoll隊列中啓動監聽。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"加速卡處理任務完成,qat 引擎調用qat_wake_job接口喚醒任務(也就是將async fd 標記爲可讀),這裏有一個問題,這個qat_wake_job是什麼觸發執行的?qat 爲nginx 提供了多種輪訓方式去輪訓加速卡響應隊列,目前jdd-lb 採用的是啓發式輪訓的方式,具體參數可以在配置文件中定義。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Nginx處理異步事件重新調用異步任務框架的ASYNC_start_job接口,這時候程序切換上下文,堆棧執行後跳回之前pase job的地方。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"四、用戶態驅動實現內存零拷貝"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"區別於上期我們介紹的"},{"type":"link","attrs":{"href":"http:\/\/mp.weixin.qq.com\/s?__biz=MzI0MDc5NzQ2MQ==&mid=2247496741&idx=1&sn=3753e29db756d667f7e41b1dc6add814&chksm=e917e65fde606f496bee23186e8b96b566d6ad0787d0da8d7511d162bd11167834bdaacece94&scene=21#wechat_redirect","title":"","type":null},"content":[{"type":"text","text":"freescale加速卡方案"}]},{"type":"text","text":"、qat採用用戶態驅動的實現方式,利用linux uio+mmap技術實現了內存零拷貝。接下來介紹qat 實現內存零拷貝的基本原理。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在弄清楚這個問題前,我們先介紹一下uio技術的基本原理,簡而言之:UIO(Userspace I\/O)是運行在用戶空間的I\/O技術,Linux系統中的驅動設備一般都是運行在內核空間,UIO則是將驅動的很少一部分運行在內核空間,而在用戶空間實現驅動的絕大多數功能。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/a5\/36\/a502c386552ca81fde9d7350b243c836.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"工作原理圖"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們先明確一個用戶態程序如何操作一個pci設備?首先需要找到它的寄存器並對其進行配置,而找到寄存器的前提是拿到外設的基地址,即:通過“基地址+寄存器偏移” 就能找到寄存器所在的地址,然後就可以配置了。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"綜上,qat的內核模塊可以分爲兩塊基本內容:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(1)PCI設備驅動,獲取pci設備的配置空間信息,申請內核資源;也就是拿到加速卡的基址寄存器地址,映射設備的地址空間。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(2)註冊UIO設備,實現用戶態驅動的主要功能:映射寄存器地址到用戶空間,在內核態屏蔽設備中斷;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"利用UIO框架實現將設備的寄存器地址映射至用戶空間後,QAT還需要完成用戶態進程與硬件之間消息傳遞。QAT中消息隊列的創建是在進程運行時創建,同樣利用mmap 原理將一段物理地址映射至用戶態進程,爲了避免內核在運行過程中出現內存碎片導致性能下降,QAT 新增了usdm內核模塊用於管理和維護頁表。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如下圖所示,顯示了QAT如何將設備地址及消息隊列映射到用戶空間:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/fc\/ba\/fc5abf78c8e893dde43a204dfdfb6fba.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"綜上所示,QAT通過UIO+ mmap的方式實現了內核態到用戶態到內存零拷貝,相比於freescale 和cavium等其他加速卡廠商,QAT在這一方面存在着一定優勢。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"關於內存映射的問題,jdd-lb nginx多進程之間必須要保證物理地址的對齊偏移量一致,jdd-lb在實際部署和上線過程中遇到過類似問題:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"現象描述"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"裝載qat加速卡機器出現宕機,我們抓取宕機的crash日誌,如下圖:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/df\/3f\/df30b776176007bfbaef8f7c0ae4c33f.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"系統掛之前操作消息隊列(send_null_msg),內核被踩了,另外,在掛之前截圖內核日誌,發現大量內存申請失敗的信息:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"原因分析"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"基於以上現象分析,機器掛之前的主要行爲:1、nginx正在頻繁reload;2、預分配的大頁內存池不夠了,新起的worker進程採用ioctl的方式從內核申請2m頁,也就是我們在修復問題二之前採取的內存分配方式,這點可以從內核日誌中分配內存失敗的打印推斷出來,因爲只有通過ioctl 方式申請纔會出現userMemAlloc failed的打印。3、worker 退出前訪問了消息隊列,這個是導致宕機的直接原因。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"總結一下三條行爲的因果關係,頻繁reload > 老的worker還沒徹底shutdown,新的worker已經起來了, 同一時間出現大量nginx > 預分配的大頁內存池被佔滿 > 新的worker採用ioctl的方式去申請> 有個worker釋放時訪問消息隊列 > 機器掛了。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"根據以上分析,在機器掛之前nginx 的worker 進程是同時存在兩種內存分配方式,但是qat是同時支持兩種內存分配方式的,如果ioctl方式申請內存失敗最多出現nginx qat加速卡卡降級,在問題二中已經說明了,並不會出現宕機的行爲,難道是兩種方式不能共存?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"問題驗證"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"帶着上述疑問,採取人爲手段在測試環境復現該場景,先是繼續在nginx拉起前人爲的消耗完2m頁,在nginxworker代碼中加入調試控制,shutting down之前sleep100秒(測試環境下nginx shutdown的速度非常快,並不能完全模擬線上環境的場景),然後運行一個nginx 不斷reload的腳本,最後,線上環境的問題在測試環境復現,crash的日誌與線上環境一致。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"問題到這一步,只能說明兩種內存分配方式共存的場景下,nginx釋放消息隊列時會引起宕機,同樣的測試場景,如果只採用一種內存分配的方式申請,都不會出現宕機。總之,知其然不知其所以然。然後經過閱讀源代碼加點日誌分析判斷,最終找到點眉目,下面放結論。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"usdm內核模塊會維護物理地址和虛擬地址的映射關係,進程在釋放隊列後會去操作加速卡的CSRs,而這個物理地址是通過創建的消息隊列的物理地址偏移後計算出來的,雖然兩種內存申請方式按2m的大小申請的,但是通過大頁內存的方式申請的物理頁是按2m的大小保存的,而通過ioctl方式申請的物理頁是按照4k大小保存的,因此,如果通過同時存在兩種內存分配方式的話,計算得到的CSR的物理地址就不一致了。畫個圖說明一下:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/68\/a1\/6812c9a04de7cayy4e477e9ce22999a1.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"五、進程級別的加速引擎調度"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"QAT 的引擎啓動主要完成3個步驟:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(1)完成應用程序註冊:打開字符串設備\/dev\/qat_dev_processs,將app配置信息寫入驅動,App的信息必須與驅動配置中定義的配置塊一致,也就是\/etc\/dh895xcc_dev0.conf這個配置文件中的配置信息。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(2)完成服務的註冊(qat 支持cy和dc兩種服務),並完成加速卡硬件相關資源的初始化操作。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(3)獲取服務實例,這裏的instances可以理解爲應用程序與加速卡相互通信的通道,實際上就是與底層的加速卡和消息隊列建立綁定關係。所以8950 加速卡最多支持128 個instance(考慮硬件隊列最多就256個)。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"需要說明的是:qat 的instance資源是進程級別,那麼每個worker在啓動之前都會調用qat_engine_init函數,qat用了一個鉤子使得master進程在fork 之後子進程調用qat_engine_init函數,同時在fork之前釋放引擎(master本身不需要引擎資源),流程如下圖所示:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/c3\/eb\/c36e4645be03d71de4a18ae7fa0330eb.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"六、QAT 組件框架概覽"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/cf\/54\/cf7b1286cf2fbd37c3c5f8ca70448f54.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Application"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"應用層主要包含兩塊內容:(1)qat 異步框架的patch,該patch提供對異步模式的支持;(2)qat 引擎,engine是openssl本身支持的一種機制,用以抽象各種加密算法的實現方式,intel 提供了qat 引擎的開源代碼用以專門支持qat 加速。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"SAL(service access layer)"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"服務接入層,給上層Application提供加速卡接入服務,目前qat主要提供crypto 和compression兩種服務,每一種服務都相互獨立,接入層封裝了一系列實用的接口,包括創建實例,初始化消息隊列、發送\\接受請求等。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"ADF(acceleration driver framework)"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"加速卡驅動框架,提供SAL需要的驅動支持,如上圖,包括intel_qat.ko、8950pci驅動、usdm內存管理驅動等。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"七、總結與思考"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"截止目前,JDD-LB在硬件加速領域,已經同時支持飛思卡爾與intel QAT硬件方案,爲有效替代f5提供了性能保證,成功實現核心網絡組件自主可控,爲構建金融級的網關架構賦能行業打下堅實的基礎。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"未來JDD-LB將持續構建接入層網關能力體系。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"安全與合規"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"作爲京東數科統一流量接入入口,JDD-LB將持續構建金融級的通信安全基礎設施,打造全方位的安全防護體系。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"多協議支持"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"JDD-LB在高效接入能力建設方面將持續投入,通過引入QUIC 協議,將提升用戶在弱網場景下的用戶支付體驗。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"通過MQTT協議可以通過非常小的接入成本實現新設備和協議接入,積極擁抱萬物互聯。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"文章轉載自: 京東數科技術說(ID:JDDTechTalk)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"原文鏈接:"},{"type":"link","attrs":{"href":"https:\/\/mp.weixin.qq.com\/s\/NmgT2p_Yq-j9HpswtISvpA","title":"xxx","type":null},"content":[{"type":"text","text":"京東數科統一接入網關JDDLB性能優化之QAT加速卡"}]}]}]}
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章