Who does on-device model acceleration best? A deep dive into the technical internals of Baidu's EasyEdge platform

In recent years, deep learning has achieved remarkable results across many fields and drawn broad interest from both academia and industry. As the field has advanced, network architectures have grown increasingly complex. Complex models do perform better, but their heavy storage and compute requirements make them hard to run efficiently on many hardware platforms. Deploying and accelerating deep learning models on end devices has therefore become a major research focus in academia and industry alike.

On the one hand, developers and researchers can choose among many deep learning frameworks, each with its own network-definition conventions and model-serialization format. AI engineers and researchers want to move models between frameworks, but these differences hinder interoperability. On the other hand, because deep learning models carry enormous numbers of parameters, deploying them directly at the edge incurs high latency.

Baidu's EasyEdge device and edge AI service platform addresses both problems. EasyEdge accepts models from all the mainstream deep learning frameworks, offers convenient deployment features, and has been adapted to the mainstream chips and operating systems in the industry, so a model can be pushed to an end device without writing tedious glue code. EasyEdge combines multiple acceleration techniques and exposes several acceleration levels that trade inference time against accuracy: it minimizes on-device latency while still covering a broad range of usage scenarios.

EasyEdge can deploy many kinds of deep learning models, including image classification, detection, and segmentation, as well as some face detection and pose estimation models. It currently supports more than 60 classic network architectures plus a variety of custom network types.

EasyEdge also ingests models from multiple frameworks, including PaddlePaddle, PyTorch, TensorFlow, and MXNet. To simplify deployment further, **EasyEdge can convert between some of these model formats**, as shown in Figure 1. For example, to deploy a PyTorch model on an Intel CPU with OpenVINO, EasyEdge chains several model conversions to turn the Torch model into the OpenVINO IR format, then completes the deployment on the OpenVINO runtime.

![Figure 1](https://static001.infoq.cn/resource/image/7a/9f/7a33d79684f72cfccba4cefyy071ba9f.jpg)

Figure 1: EasyEdge supports conversion between multiple model frameworks

Device support is equally broad: EasyEdge runs on general-purpose CPUs, GPUs, and ARM devices as well as the mainstream dedicated AI chips, such as the Intel Movidius series and HiSilicon NNIE, as shown in Figure 2. EasyEdge is now among the most widely adapted device and edge service platforms in the industry.

![Figure 2](https://static001.infoq.cn/resource/image/0e/8f/0e3163fbe4ce9fb9192d61d88d98398f.png)

Figure 2: EasyEdge deploys to a wide range of hardware

## Model compression techniques in EasyEdge

To deploy many kinds of networks efficiently on different chips, **the EasyEdge backend applies a range of optimizations: model format conversion, graph optimization, chip-specific optimization, low-precision inference, pruning, and distillation**. Model compression is a crucial link in this pipeline. The compression techniques EasyEdge uses include the common **low-precision inference**, **structured pruning**, and **model distillation**. As shown in Figure 3, EasyEdge integrates several model compression libraries and invokes them flexibly according to the actual deployment target.

![Figure 3](https://static001.infoq.cn/resource/image/ac/10/acf94d1c69571f4fa89bd59cce8f8910.png)

Figure 3: Model compression techniques in EasyEdge

Low-precision inference represents values that were originally 32-bit floats with just a few bits. This shrinks the model, so for large models the device can load them into memory faster, cutting I/O latency; in addition, processors are usually faster at fixed-point arithmetic than at floating-point, so quantized models generally infer faster. As shown in Figure 4, irregularly distributed floating-point values are quantized onto a small set of fixed points. **EasyEdge supports the common low-precision types FP16 and INT8**; INT8 quantization provides the greatest compression with near-lossless accuracy.

![Figure 4](https://static001.infoq.cn/resource/image/24/29/2485669891b5a937b60211b635df8229.png)

Figure 4: Model quantization [1]

INT8 quantization is implemented in two broad ways: post-training quantization and quantization-aware training. As the names suggest, post-training quantization inserts quantization nodes into an already-trained FP32 model and uses statistical methods to approximate the original floating-point values with a small set of fixed-point values, while quantization-aware training inserts simulated-quantization nodes before training, so that every node's output is computed on simulated quantized data during training and the model converges to weights that are optimal after quantization, as shown in Figure 5. Quantization-aware training achieves better accuracy but takes considerably longer.

![Figure 5](https://static001.infoq.cn/resource/image/37/78/37ea582882cc45165b49ea87091c7178.png)

Figure 5: How quantization-aware training works [2]

**EasyEdge supports both quantization-aware training and offline (post-training) quantization, and picks the method that suits the situation.** A classification model usually takes the top-K entries of its final layer as the output, so what matters is the ranking of the outputs rather than their exact values; classification models are therefore more robust to quantization than detection models, which regress numeric values. Based on this, **EasyEdge adapts its quantization strategy to the model type**: classification tasks lean toward offline quantization, which shortens publishing time, while the family of anchor-based detection models leans toward retraining to protect accuracy. The strategy also varies with the target device and inference framework. For example, when deploying to CPU with the Paddle Fluid framework, quantizing the more sensitive ops would severely hurt final accuracy, so EasyEdge keeps those ops' inputs and outputs in FP32 while computing the remaining ops in INT8. **This layer-level mixed-precision quantization balances inference speed and accuracy well.**

In offline quantization, some outlier data points lie far from the center of the distribution, which makes the traditional strategy overestimate the quantization range and lowers the final quantized accuracy, as shown in Figure 13. To handle this, **EasyEdge integrates a post-training calibration technique** that iterates to find a better clipping threshold, minimizing the KL divergence between the quantized INT8 distribution and the original FP32 distribution, and thereby reducing quantization error.

Model pruning usually means structured pruning: channel-level pruning that removes redundant computation channels, as shown in Figure 6.

![Figure 6](https://static001.infoq.cn/resource/image/76/cd/76ec737f4abd755fb1yy712e920400cd.png)

Figure 6: Structured model pruning [3]

Consider pruning a single convolution kernel, as in Figure 7: when the middle kernel drops one input channel and one output channel at the same time, the corresponding channels of its input and output tensors shrink. This helps in two ways: the smaller kernels reduce model size and hence I/O time during inference, and the smaller tensors require less memory than before.

**This channel pruning is exactly what EasyEdge uses.** For choosing which channels to prune, it wraps several criteria, including **L1-norm, L2-norm, and FPGM [8]**, and selects among them according to the actual case. On the other hand, because pruning changes the shapes of some layers, it can invalidate the network topology; **the EasyEdge platform therefore integrates a channel-adjustment method** that uses breadth-first search to correct channel counts one by one, and skips certain blocks that are hard to adjust, keeping the pruning result valid.

![Figure 7](https://static001.infoq.cn/resource/image/0a/fc/0a21886f7e7b3082bd05986b364cyyfc.png)

Figure 7: Structured pruning of a single convolution kernel [4]

**For some models, EasyEdge applies channel sensitivity analysis**: each layer is pruned and re-evaluated several times, and the resulting accuracy loss measures how sensitive that layer is to channel pruning. In addition, **EasyEdge integrates layer-level pruning configuration**: by threshold filtering, it retains as many of the more sensitive layers as possible at a given overall compression ratio, minimizing the accuracy impact. For example, as shown in Figure 8, sensitivity analysis of a ResNet50 finds that the first and last layers are more sensitive to pruning, so they receive lower pruning ratios, while the middle layers carry more redundancy and receive higher ones.

Beyond that, **EasyEdge adds simple hyperparameter search on top: it preserves the parameters of sensitive layers as much as possible while finding the model that best matches the configured compression ratio**. For example, a 120 MB model with a pruning ratio set to 50% can be pruned to roughly 60 MB. This lets the EasyEdge platform offer more finely differentiated compression services.

![Figure 8](https://static001.infoq.cn/resource/image/39/e2/39502cedcc639498d0dd5ab248d7dee2.png)

Figure 8: Sensitivity-based pruning with precise control of the pruning ratio [5]

To accelerate some models, **EasyEdge uses distillation in the style of Hinton [9]**. Model distillation uses the knowledge learned by a large model to coach a smaller one, so that the small model's accuracy approaches the large model's. As shown in Figure 9, the usual approach runs both models in the same session and ties certain layer outputs of the large model to corresponding outputs of the small one; during training, the knowledge in the large model then flows into the small model's backpropagated gradients and helps it converge.

![Figure 9](https://static001.infoq.cn/resource/image/86/71/868109710c5250bb12fec37c0eac1f71.png)

Figure 9: Knowledge distillation [6]

The newly released feature is built mainly on the model compression framework **PaddleSlim**; the EasyEdge platform wraps and further optimizes its compression capabilities. To learn more, search for PaddleSlim on GitHub.

We published an ultra-high-accuracy detection model on the three most common classes of end device, namely CPU, GPU, and ARM. The specific hardware:

- CPU: [Intel® Xeon® Processor E5-2630 v4](http://ark.intel.com/products/92981/Intel-Xeon-Processor-E5-2630-v4-25M-Cache-2_20-GHz)
- GPU: NVIDIA Tesla V100
- ARM: [Firefly-RK3399](https://www.t-firefly.com/product/rk3399.html)

As shown in Figure 10, the bars acc1 through acc3 denote increasing acceleration levels (the higher the level, the more channels are pruned), and the y-axis is the network's inference latency for a single image. EasyEdge's compression yields a clear speedup on all three devices. The general-purpose CPU benefits most, with better than a 2x speedup, which reflects the acceleration methods EasyEdge stacks on each device: combining several techniques yields the largest gains. The GPU has far more raw compute, so reducing FLOPs accelerates it somewhat less than the CPU and general-purpose ARM.

![Figure 10](https://static001.infoq.cn/resource/image/7d/96/7d936a51f1c807aa41b20a7307789596.png)

Figure 10: Speedups on different end devices

Next, compare different model types on the same hardware. We measured several detection models of different accuracy tiers on a Jetson Xavier (JetPack 4.4): MobileNetv1-SSD, MobileNetv1-YOLOv3, and YOLOv3. As shown in Figure 11 (acc1 through acc3 as above), the new compression feature delivers up to roughly a **40%** speedup at a small cost in accuracy. High-performance models gain somewhat less, because they are already built for speed, with smaller size and fewer FLOPs, and so have less headroom.

![Figure 11](https://static001.infoq.cn/resource/image/c8/42/c8d6066d13639dff234d6470ec99c342.png)

Figure 11: Inference latency of different detection models on Jetson

The actual speedup in practice varies with the end device and model type, and the EasyEdge platform's compression capabilities will keep being optimized and updated in future iterations.

You can try the new feature now: when publishing a model, pick the acceleration option that fits your needs, as shown in Figure 12.

![Figure 12](https://static001.infoq.cn/resource/image/c5/30/c5c2efb51ff29287999db73facb51130.png)

Figure 12: EasyEdge offers multiple acceleration options

After publishing, the evaluation page shows how the SDK performs on the device, as in Figure 13: the fastest acceleration option speeds the model up by 30% with only a small accuracy loss.

![Figure 13](https://static001.infoq.cn/resource/image/9a/58/9a66a4df4yy9cb7ed189e33f09d6eb58.png)

Figure 13: EasyEdge's model evaluation feature

EasyEdge's capabilities are also fully integrated into EasyDL and BML, the PaddlePaddle enterprise-edition platforms. With these two platforms, data processing, model training, and service deployment can be completed in one place, for efficient development and deployment of AI models.

Recently, the PaddlePaddle enterprise edition launched the 2021 Universal Gravitation program (https://ai.baidu.com/easydl/universal-gravitation), which provides companies with AI funds that can be spent on EasyDL and BML public-cloud services, redeemable for up to 6000+ hours of custom model training, 590+ hours of script-based tuning, 400+ hours of public-cloud deployment quota, or 50 device-side SDKs.

References:

[1] Fang J, Shafiee A, Abdel-Aziz H, et al. Near-lossless post-training quantization of deep neural networks via a piecewise linear approximation. arXiv preprint arXiv:2002.00104, 2020.

[2] Jacob B, Kligys S, Chen B, et al. Quantization and training of neural networks for efficient integer-arithmetic-only inference. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018: 2704-2713.

[3] Han S, Pool J, Tran J, et al. Learning both weights and connections for efficient neural networks. arXiv preprint arXiv:1506.02626, 2015.

[4] Li H, Kadav A, Durdanovic I, et al. Pruning filters for efficient convnets. arXiv preprint arXiv:1608.08710, 2016.

[5] He K, Zhang X, Ren S, et al. Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 770-778.

[6] Gou J, Yu B, Maybank S J, et al. Knowledge distillation: A survey. International Journal of Computer Vision, 2021, 129(6): 1789-1819.

[7] Wu H, Judd P, Zhang X, et al. Integer quantization for deep learning inference: Principles and empirical evaluation. arXiv preprint arXiv:2004.09602, 2020.

[8] He Y, Liu P, Wang Z, et al. Filter pruning via geometric median for deep convolutional neural networks acceleration. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019: 4340-4349.

[9] Hinton G, Vinyals O, Dean J. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015.
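The symmetric post-training INT8 quantization discussed above can be illustrated in a few lines. This is a generic sketch, not EasyEdge's actual implementation; the max-abs calibration and all names here are illustrative:

```python
import numpy as np

def quantize_int8(x, scale):
    """Symmetric linear quantization: round to the nearest multiple of
    `scale` and clamp to the signed 8-bit range."""
    q = np.clip(np.round(x / scale), -127, 127)
    return q.astype(np.int8)

def dequantize(q, scale):
    return q.astype(np.float32) * scale

# Post-training calibration in its simplest form: derive the scale from
# the observed activation range (max-abs).
rng = np.random.default_rng(0)
acts = rng.normal(0.0, 1.0, size=10_000).astype(np.float32)
scale = float(np.abs(acts).max()) / 127.0
q = quantize_int8(acts, scale)
recon = dequantize(q, scale)
err = float(np.abs(acts - recon).max())  # bounded by scale / 2
```

The model stores `q` plus one `scale` per tensor; the roundtrip error per value never exceeds half a quantization step, which is why fewer, well-placed steps matter so much.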
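The KL-divergence calibration described for offline quantization can be illustrated as a clipping-threshold search. EasyEdge's internal algorithm is not public; this sketch simply compares the distribution after simulated INT8 quantization against the FP32 original for a few candidate thresholds, capturing the idea of trading clipped outliers against finer quantization steps:

```python
import numpy as np

def kl_divergence(p, q):
    """KL(p || q) between two (unnormalized) histograms."""
    p = p / p.sum()
    q = q / q.sum()
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def quant_dequant(x, threshold):
    """Simulate symmetric INT8 quantization with a given clipping threshold."""
    scale = threshold / 127.0
    return np.clip(np.round(x / scale), -127, 127) * scale

def choose_threshold(acts, candidates, n_bins=512):
    """Pick the clipping threshold whose quantize-dequantize output keeps
    the distribution closest (in KL divergence) to the FP32 original."""
    ref_hist, edges = np.histogram(acts, bins=n_bins)
    best_t, best_kl = None, float("inf")
    for t in candidates:
        cand_hist, _ = np.histogram(quant_dequant(acts, t), bins=edges)
        kl = kl_divergence(ref_hist + 1, cand_hist + 1)  # +1: Laplace smoothing
        if kl < best_kl:
            best_t, best_kl = t, kl
    return best_t

# Mostly-normal activations plus a few far-out outliers: a naive max-abs
# range (threshold = 20) would waste most of the INT8 range on empty space.
rng = np.random.default_rng(0)
acts = np.concatenate([rng.normal(0.0, 1.0, 10_000), [20.0, -20.0]])
threshold = choose_threshold(acts, candidates=[1.0, 2.0, 4.0, 8.0, 20.0])
```

A too-small threshold clips real mass and piles it at the boundary; a too-large one spreads 255 levels over empty space. Both distort the histogram, so the KL minimum lands in between.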
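The L1-norm channel-selection criterion mentioned for structured pruning can be sketched as follows. This is a generic illustration rather than EasyEdge's code: filters are ranked by the L1 norm of their weights and the weakest ones are dropped:

```python
import numpy as np

def prune_filters_l1(weight, ratio):
    """Structured (channel-level) pruning: drop the output filters of a
    conv weight [out_c, in_c, kh, kw] with the smallest L1 norms."""
    out_c = weight.shape[0]
    n_keep = max(1, int(round(out_c * (1.0 - ratio))))
    l1 = np.abs(weight).reshape(out_c, -1).sum(axis=1)  # one score per filter
    keep = np.sort(np.argsort(l1)[-n_keep:])            # strongest filters, original order
    return weight[keep], keep

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 32, 3, 3))  # a hypothetical conv layer
pruned, kept = prune_filters_l1(w, ratio=0.5)
# The next layer must drop the matching *input* channels (those not in
# `kept`), which is why a topology-wide channel-adjustment pass is needed.
```

The L2-norm variant just changes the scoring line; FPGM instead scores each filter by its distance to the geometric median of all filters.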
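The channel sensitivity analysis plus threshold filtering described above can be sketched as a simple per-layer scan. Everything below is a toy stand-in: the layer names, the fake evaluator, and the 0.06 tolerance are invented for illustration only:

```python
def sensitivity_scan(layers, evaluate, ratios=(0.1, 0.3, 0.5)):
    """Prune one layer at a time at several ratios and record the accuracy
    drop; layers whose accuracy falls fastest are the sensitive ones."""
    baseline = evaluate({})
    return {name: {r: baseline - evaluate({name: r}) for r in ratios}
            for name in layers}

# Toy stand-in for "prune the model, then evaluate it": here the first and
# last layers are made sensitive and the middle layers redundant, echoing
# the ResNet50 example in the text.
SENSITIVITY = {"conv1": 0.8, "conv2": 0.1, "conv3": 0.1, "fc": 0.9}

def fake_eval(pruned):
    return 0.95 - sum(SENSITIVITY[name] * r for name, r in pruned.items())

report = sensitivity_scan(list(SENSITIVITY), fake_eval)
# Per-layer plan: prune each layer as deeply as possible while keeping its
# measured accuracy drop under a tolerance (threshold filtering).
plan = {name: max((r for r, drop in drops.items() if drop <= 0.06), default=0.0)
        for name, drops in report.items()}
```

The plan assigns high pruning ratios only to the redundant middle layers and leaves the sensitive first and last layers untouched, which is the behavior the article attributes to the ResNet50 case.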
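The Hinton-style distillation referenced above is commonly written as a temperature-softened KL term between teacher and student, mixed with the ordinary hard-label loss. A minimal NumPy sketch (illustrative only; EasyEdge's exact formulation, temperature, and mixing weight are not public):

```python
import numpy as np

def softmax(z, t=1.0):
    z = z / t
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """KL divergence between temperature-softened teacher and student
    distributions, mixed with cross-entropy on the hard labels. The T*T
    factor keeps the soft-target gradients on the same scale as T grows."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    soft = float(np.mean(np.sum(p_t * (np.log(p_t) - np.log(p_s)), axis=-1))) * T * T
    p_hard = softmax(student_logits)[np.arange(len(labels)), labels]
    hard = float(np.mean(-np.log(p_hard)))
    return alpha * soft + (1.0 - alpha) * hard

rng = np.random.default_rng(0)
teacher_logits = rng.normal(size=(8, 10))
student_logits = rng.normal(size=(8, 10))
labels = rng.integers(0, 10, size=8)
loss = distillation_loss(student_logits, teacher_logits, labels)
```

During training only the student's weights receive gradients from this loss; the teacher's logits act as fixed soft targets that carry the inter-class similarity the hard labels discard.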