Who does on-device model acceleration best? A deep dive into the technical internals of Baidu's EasyEdge platform

In recent years, deep learning has achieved remarkable results across many fields and drawn broad interest from both academia and industry. As the field has advanced, network architectures have grown increasingly complex. Complex models do perform better, but their heavy storage and compute requirements make them hard to run efficiently on many hardware platforms. Deploying and accelerating deep learning models on end devices has therefore become a major research focus in academia and industry alike.

On the one hand, developers and researchers can choose among many deep learning frameworks, each with its own network-definition conventions and model-serialization format. AI engineers and researchers want to move models between frameworks, but these differences hinder interoperability. On the other hand, because deep learning models carry enormous numbers of parameters, deploying them directly at the edge incurs high latency.

Baidu's EasyEdge device and edge AI service platform addresses both problems. EasyEdge accepts models from all the mainstream deep learning frameworks, offers convenient deployment features, and has been adapted to the mainstream chips and operating systems in the industry, so a model can be pushed to an end device without writing tedious glue code. EasyEdge combines multiple acceleration techniques and exposes several acceleration levels that trade inference time against accuracy: it minimizes on-device latency while still covering a broad range of usage scenarios.

EasyEdge can deploy many kinds of deep learning models, including image classification, detection, and segmentation, as well as some face detection and pose estimation models. It currently supports more than 60 classic network architectures plus a variety of custom network types.

EasyEdge also ingests models from multiple frameworks, including PaddlePaddle, PyTorch, TensorFlow, and MXNet. To simplify deployment further, **EasyEdge can convert between some of these model formats**, as shown in Figure 1. For example, to deploy a PyTorch model on an Intel CPU with OpenVINO, EasyEdge chains several model conversions to turn the Torch model into the OpenVINO IR format, then completes the deployment on the OpenVINO runtime.

![Figure 1](https://static001.infoq.cn/resource/image/7a/9f/7a33d79684f72cfccba4cefyy071ba9f.jpg)

Figure 1: EasyEdge supports conversion between multiple model frameworks

Device support is equally broad: EasyEdge runs on general-purpose CPUs, GPUs, and ARM devices as well as the mainstream dedicated AI chips, such as the Intel Movidius series and HiSilicon NNIE, as shown in Figure 2. EasyEdge is now among the most widely adapted device and edge service platforms in the industry.

![Figure 2](https://static001.infoq.cn/resource/image/0e/8f/0e3163fbe4ce9fb9192d61d88d98398f.png)

Figure 2: EasyEdge deploys to a wide range of hardware

## Model compression techniques in EasyEdge

To deploy many kinds of networks efficiently on different chips, **the EasyEdge backend applies a range of optimizations: model format conversion, graph optimization, chip-specific optimization, low-precision inference, pruning, and distillation**. Model compression is a crucial link in this pipeline. The compression techniques EasyEdge uses include the common **low-precision inference**, **structured pruning**, and **model distillation**. As shown in Figure 3, EasyEdge integrates several model compression libraries and invokes them flexibly according to the actual deployment target.

![Figure 3](https://static001.infoq.cn/resource/image/ac/10/acf94d1c69571f4fa89bd59cce8f8910.png)

Figure 3: Model compression techniques in EasyEdge

Low-precision inference represents values that were originally 32-bit floats with just a few bits. This shrinks the model, so for large models the device can load them into memory faster, cutting I/O latency; in addition, processors are usually faster at fixed-point arithmetic than at floating-point, so quantized models generally infer faster. As shown in Figure 4, irregularly distributed floating-point values are quantized onto a small set of fixed points. **EasyEdge supports the common low-precision types FP16 and INT8**; INT8 quantization provides the greatest compression with near-lossless accuracy.

![Figure 4](https://static001.infoq.cn/resource/image/24/29/2485669891b5a937b60211b635df8229.png)

Figure 4: Model quantization [1]

INT8 quantization is implemented in two broad ways: post-training quantization and quantization-aware training. As the names suggest, post-training quantization inserts quantization nodes into an already-trained FP32 model and uses statistical methods to approximate the original floating-point values with a small set of fixed-point values, while quantization-aware training inserts simulated-quantization nodes before training, so that every node's output is computed on simulated quantized data during training and the model converges to weights that are optimal after quantization, as shown in Figure 5. Quantization-aware training achieves better accuracy but takes considerably longer.

![Figure 5](https://static001.infoq.cn/resource/image/37/78/37ea582882cc45165b49ea87091c7178.png)

Figure 5: How quantization-aware training works [2]

**EasyEdge supports both quantization-aware training and offline (post-training) quantization, and picks the method that suits the situation.** A classification model usually takes the top-K entries of its final layer as the output, so what matters is the ranking of the outputs rather than their exact values; classification models are therefore more robust to quantization than detection models, which regress numeric values. Based on this, **EasyEdge adapts its quantization strategy to the model type**: classification tasks lean toward offline quantization, which shortens publishing time, while the family of anchor-based detection models leans toward retraining to protect accuracy. The strategy also varies with the target device and inference framework. For example, when deploying to CPU with the Paddle Fluid framework, quantizing the more sensitive ops would severely hurt final accuracy, so EasyEdge keeps those ops' inputs and outputs in FP32 while computing the remaining ops in INT8. **This layer-level mixed-precision quantization balances inference speed and accuracy well.**

In offline quantization, some outlier data points lie far from the center of the distribution, which makes the traditional strategy overestimate the quantization range and lowers the final quantized accuracy, as shown in Figure 13. To handle this, **EasyEdge integrates a post-training calibration technique** that iterates to find a better clipping threshold, minimizing the KL divergence between the quantized INT8 distribution and the original FP32 distribution, and thereby reducing quantization error.

Model pruning usually means structured pruning: channel-level pruning that removes redundant computation channels, as shown in Figure 6.

![Figure 6](https://static001.infoq.cn/resource/image/76/cd/76ec737f4abd755fb1yy712e920400cd.png)

Figure 6: Structured model pruning [3]

Consider pruning a single convolution kernel, as in Figure 7: when the middle kernel drops one input channel and one output channel at the same time, the corresponding channels of its input and output tensors shrink. This helps in two ways: the smaller kernels reduce model size and hence I/O time during inference, and the smaller tensors require less memory than before.

**This channel pruning is exactly what EasyEdge uses.** For choosing which channels to prune, it wraps several criteria, including **L1-norm, L2-norm, and FPGM [8]**, and selects among them according to the actual case. On the other hand, because pruning changes the shapes of some layers, it can invalidate the network topology; **the EasyEdge platform therefore integrates a channel-adjustment method** that uses breadth-first search to correct channel counts one by one, and skips certain blocks that are hard to adjust, keeping the pruning result valid.

![Figure 7](https://static001.infoq.cn/resource/image/0a/fc/0a21886f7e7b3082bd05986b364cyyfc.png)

Figure 7: Structured pruning of a single convolution kernel [4]

**For some models, EasyEdge applies channel sensitivity analysis**: each layer is pruned and re-evaluated several times, and the resulting accuracy loss measures how sensitive that layer is to channel pruning. In addition, **EasyEdge integrates layer-level pruning configuration**: by threshold filtering, it retains as many of the more sensitive layers as possible at a given overall compression ratio, minimizing the accuracy impact. For example, as shown in Figure 8, sensitivity analysis of a ResNet50 finds that the first and last layers are more sensitive to pruning, so they receive lower pruning ratios, while the middle layers carry more redundancy and receive higher ones.

Beyond that, **EasyEdge adds simple hyperparameter search on top: it preserves the parameters of sensitive layers as much as possible while finding the model that best matches the configured compression ratio**. For example, a 120 MB model with a pruning ratio set to 50% can be pruned to roughly 60 MB. This lets the EasyEdge platform offer more finely differentiated compression services.

![Figure 8](https://static001.infoq.cn/resource/image/39/e2/39502cedcc639498d0dd5ab248d7dee2.png)

Figure 8: Sensitivity-based pruning with precise control of the pruning ratio [5]

To accelerate some models, **EasyEdge uses distillation in the style of Hinton [9]**. Model distillation uses the knowledge learned by a large model to coach a smaller one, so that the small model's accuracy approaches the large model's. As shown in Figure 9, the usual approach runs both models in the same session and ties certain layer outputs of the large model to corresponding outputs of the small one; during training, the knowledge in the large model then flows into the small model's backpropagated gradients and helps it converge.

![Figure 9](https://static001.infoq.cn/resource/image/86/71/868109710c5250bb12fec37c0eac1f71.png)

Figure 9: Knowledge distillation [6]

The newly released feature is built mainly on the model compression framework **PaddleSlim**; the EasyEdge platform wraps and further optimizes its compression capabilities. To learn more, search for PaddleSlim on GitHub.

We published an ultra-high-accuracy detection model on the three most common classes of end device, namely CPU, GPU, and ARM. The specific hardware:

- CPU: [Intel® Xeon® Processor E5-2630 v4](http://ark.intel.com/products/92981/Intel-Xeon-Processor-E5-2630-v4-25M-Cache-2_20-GHz)
- GPU: NVIDIA Tesla V100
- ARM: [Firefly-RK3399](https://www.t-firefly.com/product/rk3399.html)

As shown in Figure 10, the bars acc1 through acc3 denote increasing acceleration levels (the higher the level, the more channels are pruned), and the y-axis is the network's inference latency for a single image. EasyEdge's compression yields a clear speedup on all three devices. The general-purpose CPU benefits most, with better than a 2x speedup, which reflects the acceleration methods EasyEdge stacks on each device: combining several techniques yields the largest gains. The GPU has far more raw compute, so reducing FLOPs accelerates it somewhat less than the CPU and general-purpose ARM.

![Figure 10](https://static001.infoq.cn/resource/image/7d/96/7d936a51f1c807aa41b20a7307789596.png)

Figure 10: Speedups on different end devices

Next, compare different model types on the same hardware. We measured several detection models of different accuracy tiers on a Jetson Xavier (JetPack 4.4): MobileNetv1-SSD, MobileNetv1-YOLOv3, and YOLOv3. As shown in Figure 11 (acc1 through acc3 as above), the new compression feature delivers up to roughly a **40%** speedup at a small cost in accuracy. High-performance models gain somewhat less, because they are already built for speed, with smaller size and fewer FLOPs, and so have less headroom.

![Figure 11](https://static001.infoq.cn/resource/image/c8/42/c8d6066d13639dff234d6470ec99c342.png)

Figure 11: Inference latency of different detection models on Jetson

The actual speedup in practice varies with the end device and model type, and the EasyEdge platform's compression capabilities will keep being optimized and updated in future iterations.

You can try the new feature now: when publishing a model, pick the acceleration option that fits your needs, as shown in Figure 12.

![Figure 12](https://static001.infoq.cn/resource/image/c5/30/c5c2efb51ff29287999db73facb51130.png)

Figure 12: EasyEdge offers multiple acceleration options

After publishing, the evaluation page shows how the SDK performs on the device, as in Figure 13: the fastest acceleration option speeds the model up by 30% with only a small accuracy loss.

![Figure 13](https://static001.infoq.cn/resource/image/9a/58/9a66a4df4yy9cb7ed189e33f09d6eb58.png)

Figure 13: EasyEdge's model evaluation feature

EasyEdge's capabilities are also fully integrated into EasyDL and BML, the PaddlePaddle enterprise-edition platforms. With these two platforms, data processing, model training, and service deployment can be completed in one place, for efficient development and deployment of AI models.

Recently, the PaddlePaddle enterprise edition launched the 2021 Universal Gravitation program (https://ai.baidu.com/easydl/universal-gravitation), which provides companies with AI funds that can be spent on EasyDL and BML public-cloud services, redeemable for up to 6000+ hours of custom model training, 590+ hours of script-based tuning, 400+ hours of public-cloud deployment quota, or 50 device-side SDKs.

References:

[1] Fang J, Shafiee A, Abdel-Aziz H, et al. Near-lossless post-training quantization of deep neural networks via a piecewise linear approximation. arXiv preprint arXiv:2002.00104, 2020.

[2] Jacob B, Kligys S, Chen B, et al. Quantization and training of neural networks for efficient integer-arithmetic-only inference. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018: 2704-2713.

[3] Han S, Pool J, Tran J, et al. Learning both weights and connections for efficient neural networks. arXiv preprint arXiv:1506.02626, 2015.

[4] Li H, Kadav A, Durdanovic I, et al. Pruning filters for efficient convnets. arXiv preprint arXiv:1608.08710, 2016.

[5] He K, Zhang X, Ren S, et al. Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 770-778.

[6] Gou J, Yu B, Maybank S J, et al. Knowledge distillation: A survey. International Journal of Computer Vision, 2021, 129(6): 1789-1819.

[7] Wu H, Judd P, Zhang X, et al. Integer quantization for deep learning inference: Principles and empirical evaluation. arXiv preprint arXiv:2004.09602, 2020.

[8] He Y, Liu P, Wang Z, et al. Filter pruning via geometric median for deep convolutional neural networks acceleration. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019: 4340-4349.

[9] Hinton G, Vinyals O, Dean J. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015.
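The symmetric post-training INT8 quantization discussed above can be illustrated in a few lines. This is a generic sketch, not EasyEdge's actual implementation; the max-abs calibration and all names here are illustrative:

```python
import numpy as np

def quantize_int8(x, scale):
    """Symmetric linear quantization: round to the nearest multiple of
    `scale` and clamp to the signed 8-bit range."""
    q = np.clip(np.round(x / scale), -127, 127)
    return q.astype(np.int8)

def dequantize(q, scale):
    return q.astype(np.float32) * scale

# Post-training calibration in its simplest form: derive the scale from
# the observed activation range (max-abs).
rng = np.random.default_rng(0)
acts = rng.normal(0.0, 1.0, size=10_000).astype(np.float32)
scale = float(np.abs(acts).max()) / 127.0
q = quantize_int8(acts, scale)
recon = dequantize(q, scale)
err = float(np.abs(acts - recon).max())  # bounded by scale / 2
```

The model stores `q` plus one `scale` per tensor; the roundtrip error per value never exceeds half a quantization step, which is why fewer, well-placed steps matter so much.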
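The KL-divergence calibration described for offline quantization can be illustrated as a clipping-threshold search. EasyEdge's internal algorithm is not public; this sketch simply compares the distribution after simulated INT8 quantization against the FP32 original for a few candidate thresholds, capturing the idea of trading clipped outliers against finer quantization steps:

```python
import numpy as np

def kl_divergence(p, q):
    """KL(p || q) between two (unnormalized) histograms."""
    p = p / p.sum()
    q = q / q.sum()
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def quant_dequant(x, threshold):
    """Simulate symmetric INT8 quantization with a given clipping threshold."""
    scale = threshold / 127.0
    return np.clip(np.round(x / scale), -127, 127) * scale

def choose_threshold(acts, candidates, n_bins=512):
    """Pick the clipping threshold whose quantize-dequantize output keeps
    the distribution closest (in KL divergence) to the FP32 original."""
    ref_hist, edges = np.histogram(acts, bins=n_bins)
    best_t, best_kl = None, float("inf")
    for t in candidates:
        cand_hist, _ = np.histogram(quant_dequant(acts, t), bins=edges)
        kl = kl_divergence(ref_hist + 1, cand_hist + 1)  # +1: Laplace smoothing
        if kl < best_kl:
            best_t, best_kl = t, kl
    return best_t

# Mostly-normal activations plus a few far-out outliers: a naive max-abs
# range (threshold = 20) would waste most of the INT8 range on empty space.
rng = np.random.default_rng(0)
acts = np.concatenate([rng.normal(0.0, 1.0, 10_000), [20.0, -20.0]])
threshold = choose_threshold(acts, candidates=[1.0, 2.0, 4.0, 8.0, 20.0])
```

A too-small threshold clips real mass and piles it at the boundary; a too-large one spreads 255 levels over empty space. Both distort the histogram, so the KL minimum lands in between.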
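The L1-norm channel-selection criterion mentioned for structured pruning can be sketched as follows. This is a generic illustration rather than EasyEdge's code: filters are ranked by the L1 norm of their weights and the weakest ones are dropped:

```python
import numpy as np

def prune_filters_l1(weight, ratio):
    """Structured (channel-level) pruning: drop the output filters of a
    conv weight [out_c, in_c, kh, kw] with the smallest L1 norms."""
    out_c = weight.shape[0]
    n_keep = max(1, int(round(out_c * (1.0 - ratio))))
    l1 = np.abs(weight).reshape(out_c, -1).sum(axis=1)  # one score per filter
    keep = np.sort(np.argsort(l1)[-n_keep:])            # strongest filters, original order
    return weight[keep], keep

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 32, 3, 3))  # a hypothetical conv layer
pruned, kept = prune_filters_l1(w, ratio=0.5)
# The next layer must drop the matching *input* channels (those not in
# `kept`), which is why a topology-wide channel-adjustment pass is needed.
```

The L2-norm variant just changes the scoring line; FPGM instead scores each filter by its distance to the geometric median of all filters.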
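The channel sensitivity analysis plus threshold filtering described above can be sketched as a simple per-layer scan. Everything below is a toy stand-in: the layer names, the fake evaluator, and the 0.06 tolerance are invented for illustration only:

```python
def sensitivity_scan(layers, evaluate, ratios=(0.1, 0.3, 0.5)):
    """Prune one layer at a time at several ratios and record the accuracy
    drop; layers whose accuracy falls fastest are the sensitive ones."""
    baseline = evaluate({})
    return {name: {r: baseline - evaluate({name: r}) for r in ratios}
            for name in layers}

# Toy stand-in for "prune the model, then evaluate it": here the first and
# last layers are made sensitive and the middle layers redundant, echoing
# the ResNet50 example in the text.
SENSITIVITY = {"conv1": 0.8, "conv2": 0.1, "conv3": 0.1, "fc": 0.9}

def fake_eval(pruned):
    return 0.95 - sum(SENSITIVITY[name] * r for name, r in pruned.items())

report = sensitivity_scan(list(SENSITIVITY), fake_eval)
# Per-layer plan: prune each layer as deeply as possible while keeping its
# measured accuracy drop under a tolerance (threshold filtering).
plan = {name: max((r for r, drop in drops.items() if drop <= 0.06), default=0.0)
        for name, drops in report.items()}
```

The plan assigns high pruning ratios only to the redundant middle layers and leaves the sensitive first and last layers untouched, which is the behavior the article attributes to the ResNet50 case.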
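The Hinton-style distillation referenced above is commonly written as a temperature-softened KL term between teacher and student, mixed with the ordinary hard-label loss. A minimal NumPy sketch (illustrative only; EasyEdge's exact formulation, temperature, and mixing weight are not public):

```python
import numpy as np

def softmax(z, t=1.0):
    z = z / t
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """KL divergence between temperature-softened teacher and student
    distributions, mixed with cross-entropy on the hard labels. The T*T
    factor keeps the soft-target gradients on the same scale as T grows."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    soft = float(np.mean(np.sum(p_t * (np.log(p_t) - np.log(p_s)), axis=-1))) * T * T
    p_hard = softmax(student_logits)[np.arange(len(labels)), labels]
    hard = float(np.mean(-np.log(p_hard)))
    return alpha * soft + (1.0 - alpha) * hard

rng = np.random.default_rng(0)
teacher_logits = rng.normal(size=(8, 10))
student_logits = rng.normal(size=(8, 10))
labels = rng.integers(0, 10, size=8)
loss = distillation_loss(student_logits, teacher_logits, labels)
```

During training only the student's weights receive gradients from this loss; the teacher's logits act as fixed soft targets that carry the inter-class similarity the hard labels discard.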