So This Is What On-Device Intelligence Really Is?

Original

ByteDance Technical Team

2021-08-09 14:39

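{"type":"doc","content":[{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Long ago I was just a janitor, until one day God said that a janitor who does not understand on-device intelligence is not a good janitor. So I secretly learned on-device intelligence from the guy next door and wrote this article. If you find any mistakes, please take them up with the guy next door~"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This article discusses on-device intelligence and how it has developed at Xigua Video. You may already have heard of on-device intelligence; its half-hidden, coy appearance is tantalizing, so today let us pull off the veil and take a look."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"From Cloud Intelligence to On-Device Intelligence"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Many readers may wonder: weren't we going to talk about on-device intelligence? Why bring in edge computing, just to pad the word count? In fact, the extension from cloud intelligence to on-device intelligence mirrors the expansion from cloud computing to edge computing. Cloud computing has many strengths, such as enormous computing power and massive data storage. Many apps that serve the public today essentially rely on cloud computing of one kind or another, for example live-streaming platforms and e-commerce. But newer user experiences demand more timely and more stable services, which pushes us to deploy services as close as possible to where the physical devices are, and this ultimately drove the development of edge computing."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The move from cloud intelligence to on-device intelligence is essentially the same. Take autonomous driving: a car cannot stop evaluating road conditions mid-drive because the network fluctuates, nor can it still be undecided about braking or accelerating when a decision is needed within a ten-thousandth of a second."}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/08\/08593602ea653107aafde9ddc245218d.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"So what exactly is edge computing, and how does it relate to on-device intelligence? Read on~"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Edge Computing"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Edge computing is a networking concept that aims to bring computation as close to the data source as possible in order to reduce latency and bandwidth usage. Generally speaking, edge computing means running fewer processes in the cloud and moving more of them to local devices, such as users' phones, IoT 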
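devices, or edge servers. The benefit is that placing computation at the network edge minimizes the traffic between client and server and helps keep the service stable."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Edge computing is essentially a service, much like today's cloud computing services, except that it sits very close to the user and can respond faster. Put simply, everything gets quicker."}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/96\/96002122db13d10061359d3972c0bbcb.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Note that edge computing is not meant to replace cloud computing. If cloud computing focuses on the global picture, edge computing focuses on the local one. It is essentially a complement to and an optimization of cloud computing that lets us solve problems at the data source in near real time. It is useful in any business scenario that needs lower latency or real-time behavior, such as compute-intensive work and artificial intelligence. The AI scenario is where on-device intelligence, discussed next, comes in."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"The Essence of On-Device Intelligence"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Artificial intelligence no longer surprises anyone. Apps such as Toutiao\/Douyin\/Kuaishou are products that push machine learning to the limit, and hardware such as robot vacuums and self-driving cars are prime examples of AI in practice. So what role does on-device intelligence play?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Just like the path from cloud computing to edge computing, AI has also moved from the cloud to the device. What we call on-device intelligence simply means running machine learning on the device side. 'Device side' here is relative to the cloud: besides the familiar smartphone, device-side hardware also includes all kinds of IoT 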
devices and embedded devices, such as language translators and surveillance cameras, and self-driving cars belong here too."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/3c\/3c1d1a213377ceeb7cdd1cf0d209870b.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Starting in 2006, AI entered its third wave of development, and AlphaGo's successive victories over Lee Sedol and Ke Jie announced the arrival of a new era. This was made possible by:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"the growth of big data and gains in hardware compute: CPUs, GPUs, and dedicated compute units;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"the continued evolution of deep learning algorithms and frameworks, from Torch and Caffe to TensorFlow, PyTorch, and others."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Meanwhile, device-side hardware has advanced rapidly in compute, algorithms, and frameworks as well:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Compute"},{"type":"text","text":": CPU and GPU performance keeps improving, phone compute keeps growing, and neural processing chips dedicated to AI are becoming standard, so devices can already run models to varying degrees;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Frameworks"},{"type":"text","text":": the arrival of mobile-oriented machine learning frameworks makes it much easier to apply machine learning on the device. On phones, Apple's Core ML and Google's NNAPI 
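provide system-level support, and the industry offers many other good on-device frameworks: TensorFlow Mobile\/Lite, Caffe2, NCNN, TNN, MACE, Paddle&Paddle Lite, MNN, Tengine, and more;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Algorithms"},{"type":"text","text":": model compression keeps advancing. Quantization in particular is very mature and can generally shrink a model to 1\/4~1\/3 of its original size with essentially no loss of accuracy. In addition, after continued optimization, model architectures designed for the device side have matured and become increasingly friendly to device hardware."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Of course, what ultimately drives both cloud and on-device intelligence is real product demand. From the on-device perspective, demand-driven AI features have become major selling points in both software and hardware. Phone cameras, for example, have improved not only because of better camera hardware but also thanks to advances in image processing algorithms. Likewise, the model-based effects in apps such as Douyin and Kuaishou simplify content creation and greatly improve the product experience."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Why Do On-Device Intelligence?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To understand why on-device intelligence matters, we first need to understand the problems cloud intelligence faces today. As noted earlier, cloud intelligence grew out of modern demands, and its main strengths are:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"it works with massive data, so we can keep accumulating data in search of the optimal solution;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"hardware resources are plentiful and compute is powerful;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"algorithms can be very large, with no space constraints;"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"But an architecture that depends on the cloud also has corresponding drawbacks:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attr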
s":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Response time: it depends on network transmission, so stability and response time cannot be guaranteed. For scenarios with strict real-time requirements this is nearly unsolvable;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Not real time: it is detached from the immediate context and is less sensitive to what the user is doing right now;"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The strengths of on-device intelligence happen to offset these weaknesses:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Fast response: data is collected and processed directly on the device with no dependence on the network, so stability and responsiveness are good;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Highly real time: the device reaches the user directly with no middleman in between, and can sense the user's state in real time;"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In addition, because data is both produced and consumed on the device, sensitive data stays private, which helps avoid legal risk. We can also apply finer-grained strategies and potentially achieve true per-user personalization."}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"Limitations of On-Device Intelligence"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We have seen how on-device intelligence offsets several weaknesses of cloud intelligence, but it has problems of its own:"}]},{"type":"numberedlist","attrs":{"start":null,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"Compared with cloud hardware, device resources and compute are limited, so large-scale sustained computation is not possible and models should not be too complex;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"inde
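nt":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"The device usually holds only a single user's data, so the data set is small and a global optimum is hard to find. Moreover, because the app lifecycle on the device is not under our control, the data usually spans only a short period."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Under these two constraints, on-device intelligence usually means inference only: a model is trained in the cloud, delivered to the device, and then used for inference locally."}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/95\/95a441ade58d7137546c5f438a9abccd.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Of course, as the technology evolves we have also started exploring "},{"type":"text","marks":[{"type":"strong"}],"text":"federated learning"},{"type":"text","text":", which is essentially a distributed machine learning technique. Its goal is to address data silos while preserving data privacy, security, and legal compliance, letting participants build a model jointly without sharing data and thereby improve model quality. In that setting the device can do more than inference: it can take part in learning too."}]}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"5G and On-Device Intelligence"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Someone once argued that since 5G is fast and reliable, there is no need for on-device intelligence at all. Here is my own brief take. (If anything is wrong, please take it up with the guy next door.)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The high-speed connectivity and ultra-low latency of 5G are meant to help deliver distributed intelligence at scale, that is, unified "},{"type":"codeinline","content":[{"type":"text","text":"cloud-device"}]},{"type":"text","text":" intelligence. To explain in more detail: 5G 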
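gives the connection between device and cloud more stable and more seamless support, so the network is no longer the bottleneck for device-cloud communication. But as the Internet of Everything arrives, data volume will surge further into an era of ultra-large-scale data. At that point server-side compute becomes the bottleneck, so preprocessing and understanding data on the device becomes even more important, and that is where on-device intelligence comes in."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Take an example from home security. Many home cameras now support cloud storage. If we want to save video to the cloud only when someone is moving in the frame, how should we do it? The best approach is to run AI image recognition on the device first and upload only the clips that contain people, rather than uploading everything and processing it in the cloud afterwards (trimming out the useless footage). The former saves both bandwidth and cloud compute. The on-device video processing here is exactly the data preprocessing and understanding mentioned above."}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/c3\/c32e7af21fc79f4d5b94fdf611580b99.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For AR, on-device intelligence provides the interaction capability while 5G provides the network throughput AR needs. With both in place, interactive, high-quality AR will bring the real world and virtual environments closer together."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/d8\/d87d60e0d248ad2eaf7c92f5b98566e9.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Why Is Xigua Doing On-Device Intelligence?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Everything above is just a quick history of on-device intelligence, light reading that deserves about a minute. Talk is cheap, so next we look at how Xigua has explored and shipped on-device intelligence. Many colleagues know we are working on it but not what we actually built, so today we lift the veil. First, why is Xigua doing on-device intelligence?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Before going further we must answer one question: why does Xigua need on-device intelligence? Or, put differently, what role does Xigua expect it to play?"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"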
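type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Xigua Video serves a large audience of mid- and long-form video viewers, and meeting the needs of different user segments requires finer-grained product and operations strategies. Until now, however, our strategies were mostly limited to server-side strategies or strategy pools derived from experiments. Throughout, the client was mainly a recipient of strategy, responding passively and not in real time;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As Xigua Video's quality initiative progressed, it delivered significant gains in many scenarios. The flip side is that simple, easy strategies no longer satisfy us, and further gains require finer and more flexible strategies."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Xigua Video has always aimed to offer users more valuable content. Playback is Xigua's core feature, and we aim to build a smarter player and a better viewing experience, which clearly raises the bar for our client-side playback strategy."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For a video product, bandwidth cost is not trivial. We want to cut wasted bandwidth as much as possible without hurting the playback experience."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Given these factors, while still solving problems with traditional client techniques we became interested in on-device AI and began investigating it in August 2020. We eventually settled on two goals: finer-grained strategies, and growing gains while cutting cost."}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/45\/45f28f62a6fbbbecd17342eae7e3e073.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Guided by these two goals and by Xigua's needs, we focus on the following two areas:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Business: identify decision points that can be optimized on the consumption side (the player) and the creation side, reduce business cost, and build a smarter decision system; in selected scenarios, use on-device AI 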
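to build better product features;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Technology: try combining on-device AI with stability\/compilation optimization to analyze and trace related exception chains automatically; try building an on-device feature store to drive scheduling decisions for hot features\/plugins."}]}]}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"On-Device Intelligence at Xigua"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"That roughly explains why Xigua took up on-device intelligence. But how do we actually ship it? On the surface it is just moving the existing server-side system to the client, with only front end and back end swapped. In reality, migrating from cloud to device changes the device-cloud technical pipeline and also places more emphasis on collaboration between engineers from different fields. Client engineers care about engineering architecture and interaction, while algorithm engineers care more about data and models. This difference in focus makes shipping on-device intelligence considerably harder."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Basic Workflow"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"First, a quick look at the stages from cloud to device: algorithm design, model training, model optimization, model deployment, and on-device inference. In terms of participants, the work involves algorithm engineers and client engineers; in terms of environment, it splits into cloud-side engineering and client-side engineering, as shown below:"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/a7\/a72fece111a1789150a7207d302f3051.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In short: algorithm engineers design an algorithm for a specific scenario and train a model, then optimize it to shrink the model while improving the execution efficiency of the algorithm and model. In the deployment stage, the optimized model is converted into a format supported by the on-device inference engine and deployed to mobile devices. Client engineers port the algorithm and adapt the business logic for the scenario, and at the appropriate time load the model through the inference engine and run inference."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Three Big Obstacles"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We just summarized the workflow in one sentence, but in practice it is far more complex than the diagram above suggests:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/9b\/9bc7641c35abae403443df8cc5059f2a.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As the figure shows, the pipeline for shipping on-device intelligence is long, and a problem at any node means a long troubleshooting path. It also requires close collaboration between algorithm and client engineers, but their technology stacks and areas of expertise differ considerably, so collaboration is costly:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Client engineers know relatively little about machine learning algorithms and principles. As a result the client side often treats this part as a black box, cannot offer useful input based on device conditions and business logic, cannot effectively detect abnormal inference results, and cannot predict overall behavior. When algorithm porting is involved there are even more problems, and the two sides sometimes talk past each other;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Algorithm engineers lack knowledge of device hardware. Device environments are far less uniform than server environments and are highly complex, so designing efficient models for the device and keeping on-device inference from hurting device performance metrics is a top priority. Whether an algorithm behaves as expected after being ported to the device also has to be verified."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Beyond that, how the on-device inference engine copes with diverse devices, ensures high availability, and supports unified monitoring and model deployment are also key problems to solve when shipping on-device intelligence."}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/14\/14f47c9946403d87c2028bbf921f6165.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Unified Device-Cloud Deployment"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We have identified three problems in shipping on-device intelligence: a long pipeline, high collaboration cost, and complex inference deployment. Let us look at each problem and its solution:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"From algorithm design to model training, the company's internal MLX 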
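platform covers model training through server-side deployment, but it does not support the path from cloud training to on-device deployment. We need to connect the device side to the MLX platform so that models can be deployed and converted to the right format;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Traditionally, on-device intelligence depends on two artifacts on the device: the algorithm and the model. Previously the algorithm had to be designed on the cloud side and then ported to the device, which carries a porting cost and offers no way to deploy or update quickly. The algorithm and model are also managed separately, so after several iterations of each, version management becomes complicated and compatibility is hard to guarantee. If the algorithm and model could be bundled and delivered dynamically, collaboration cost would drop and the pipeline would be simpler;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Device environments are complex (especially Android devices, with varied configurations and more diverse hardware), so the inference engine is critical. Before starting, we evaluated TensorFlow Lite, TNN, MNN, Paddle-Lite, and the company's internal ByteNN, and ultimately chose ByteNN."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Why ByteNN?"}]},{"type":"numberedlist","attrs":{"start":null,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"ByteNN is the unified AI base engine library for Douyin and TikTok and is already integrated in many products, so it adds no extra package size;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"It is specifically tuned for ARM processors, Adreno\/Mali GPUs, and Apple GPUs, and supports multi-core parallel acceleration, so performance is good;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"It has broad device compatibility, covering all Android devices as well as iOS, so availability is high;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":4,"align":null,"origin":null},"content":[{"type":"text","text":"It supports CPU, GPU, NPU, DSP 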
and other processors, with vendor backing, so generality is not a concern."}]}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"With the three obstacles now clear, so is the work ahead. On on-device intelligence, our thinking matched that of the platform's Client Ai team, and together we began driving its exploration and rollout in ByteDance business strategies."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"With the Pitaya solution we can spend more of our effort on algorithm design and on finding business scenarios. What is Pitaya? In short, it is an end-to-end solution that connects the cloud MLX environment with the device environment, deploys algorithm and model together to the device as an algorithm package, and supports on-device feature engineering. It also embeds a runtime container on the device that drives the inference engines (ByteDT, ByteNN). Its basic architecture looks roughly like this:"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/d2\/d278bdbd3159becbc5e86ff51025b822.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The overall pipeline is now much simpler, and we can focus on algorithm design for business scenarios. Thanks to the efficient runtime container, algorithm packages can be deployed quickly, iteration is faster overall, and device and cloud work together more tightly:"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/f6\/f6b77f599617c3dcd9480e42bdf1e274.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Smart Preloading in Xigua Video"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In terms of interaction, Xigua Video's landscape inner feed can be thought of as a landscape version of Douyin. It is Xigua's core consumption scenario, as shown below:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/4d\/4d31684dc494c125d10bdb2d4a5b0baa.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bo
rdertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As Xigua Video's core scenario, its playback experience is critical. To get good start-up behavior, this scenario uses a video preloading strategy: once the current video starts playing, preload the next 3 videos, 800K each."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"800K is an experimentally chosen value, intended to cut cost as much as possible without increasing stalls. Preloading significantly improves the playback experience but also raises bandwidth cost. To give a feel for 800K: for a 720P video with an assumed average bitrate of 1.725Mbps, 800K of video is about 4 seconds (reading 800K as 800 KB, 6400 kilobits at 1725 kbps is about 3.7 seconds)."}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For convenience, the figure below illustrates the 3*800 flow:"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/df\/df489d4ba704d231ac80bd9155a79ce3.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Note that only for the first video are three videos preloaded at once. After that, each swipe preloads just 1 additional video, so the device always keeps 3 videos preloaded."}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"Why an On-Device Intelligence Solution?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The scheme above is simple and blunt, but it has real limitations, namely a poor balance between bandwidth and playback experience:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In some cases the user does not watch through the 800K 
of cached video and swipes to the next one after glancing at the title or the first few seconds, which wastes bandwidth;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Or, when the user wants to watch a video properly, there may not be enough cache, which can cause start-up failures or stalls and hurt the user experience."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In addition, when the preloading scheme launched, our data analysts noted: "},{"type":"text","marks":[{"type":"strong"}],"text":"reducing the preload size lowers cost but severely worsens stalls; use the fixed 3*800K scheme in the short term and push for a dynamically adjusted preload size in the long term"},{"type":"text","text":"."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Beyond conventional optimizations to preload task management, we began to rethink how to measure the effective rate of preload tasks."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Ideally the preloaded size equals the played size, giving a preload efficiency of 100%. If we preload 1000K but the user watches only 500K, efficiency is 50%, and the wasted 500K is real money 💰. In the worst case we preload 1000K and the user watches none of it, so efficiency is zero and the money is simply thrown away. Let us now define the formula used to evaluate preload efficiency:"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/d7\/d7ab342f322e0326d354cd77f62b6321.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Clearly, the ideal preloading strategy matches the preloaded size to the played size as closely as possible: we preload exactly as much as the user will watch during start-up. This both improves playback and reduces wasted bandwidth."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"So how do we do that? We want to use on-device AI 
to analyze user behavior in real time and adjust the preloading strategy accordingly, improving the playback experience while avoiding wasted bandwidth."}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"Optimization Approach"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As the formula above shows, the key to improving preload efficiency is predicting how long the user will watch, so that the preloaded size is as close as possible to the played size. But estimating how long a user will watch each video is hard, because many factors are involved, such as the user's interest in the video and their state of mind."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Setting aside these complex factors and variables, let us reconsider the correlation between preloading and the user's browsing behavior. Assume there are two typical user behaviors:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Fast swiping, brief viewing"},{"type":"text","text":": the user is skimming, will only glance at each upcoming video, and switches videos frequently;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Slow swiping, careful viewing"},{"type":"text","text":": 
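the user tends to watch each video and switches videos infrequently."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"If a model on the device can use the user's browsing behavior to judge which of these patterns the user will follow next, that can help us optimize the preloading strategy further. The preloading flow then becomes:"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/06\/06bca1d791c47b33d5b69ebebec73784.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Besides using on-device intelligence to improve preload efficiency and schedule preloads more precisely, we also made supporting business changes and optimizations on the device, such as aggregating inference parameters, adjusting when the model is triggered, and improving preload task management. The two complement each other. The part of the flow that involves on-device intelligence is shown below:"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/41\/419361991aace5e2b82f6903d3b0f7a5.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Feature Engineering"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"From production user event-tracking data, we extracted the events related to the landscape inner feed, labeled them, and ran correlation analysis. After several iterations we selected the features correlated with user browsing behavior. In addition, Pitaya's on-device feature capability makes it easy to fetch real-time features from different data sources and convert them into model input."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Algorithm Selection"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"There are many machine learning algorithms, from traditional decision trees and random forests to deep neural networks, and all of them can power on-device AI. But given device constraints, once the scenario's quality requirements are met, model size and performance should be the main considerations: the smaller the model and the higher the performance, the better. In this scenario, for example, both DT and NN 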
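can classify the user's pattern, but model size, performance, and accuracy must be weighed together before deciding which algorithm to use."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Inference"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"First, the timing of the decision. Each time the user swipes to a video, and after that video (other than the first video in the session) starts playing, the algorithm package is triggered; we wait for the decision and then schedule the subsequent preload tasks. The flow in brief:"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/c8\/c85dd55e6ebae8652d9d0bbc29c5925d.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Inference relies on two kinds of data: data about videos already watched, and detail data about upcoming videos not yet watched. Pitaya collects the former from event-tracking logs on its own, while the latter must be aggregated and passed in by the business side. The device also needs a fault-tolerance mechanism so that the strategy can be rolled back promptly when the algorithm package misbehaves or inference fails."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Preload Task Scheduling"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Once a decision is returned, we schedule based on the two most recent inference results, mainly adjusting the number of preload tasks (keeping the valid ones) and the size of each preloaded video (incremental adjustment). This involves many business details, so we will not go further here."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"On-Device Performance Monitoring"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Besides the usual device-side monitoring (performance and business metrics), we also monitor model metrics such as execution success rate, inference latency, and PV\/UV, as well as accuracy 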
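and other model quality metrics."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/81\/817459afb64e1013a02a0127662cfea3.jpeg","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Problems & Challenges"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Xigua began work on this scenario last December. Getting to a confirmed gain and a full rollout took about 1.5 two-month cycles, or 2 if the Spring Festival holiday is included. Why so long? What problems did we meet on the way from large negative results in early experiments to a positive gain?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Inference Latency"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In several early experiments (v1.0~v3.0) we found that the smart preloading group regressed against the production strategy on metrics such as time to first frame, for example a 2ms increase. After applying various techniques and data analysis, we traced this to on-device factors such as aggregating inference parameters and the inference latency itself, which indirectly affect top-line metrics. We therefore introduced several on-device optimizations, including stride-based inference, asynchronous scheduling, warm-up before the first inference, and model optimization. These brought the affected metrics back to parity and confirmed a significant gain."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Bandwidth Cost"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Bandwidth cost accounting is affected by many factors and is prone to bias. In a business as complex as Xigua Video in particular, accurately measuring the bandwidth savings from this scheme is very hard. With help from several colleagues on the player and video architecture teams, we finally established that the experiment group's (95th percentile) bandwidth cost was on average lower than the production group's by 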
1.11%。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/f5\/f5c6e5fcde42f928e7f7a0cd51ec6c29.jpeg","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"峯值模型"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"很多情況下，用戶的習慣是呈現階段性，爲了我們也定向提出峯值模型，用於在用戶使用的高\/低峯時間段內做出更細緻化的決策。（限於業務相關性太大，具體就不做解釋了。有想了解的同學可以加入字節一起做同事！）"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"實驗收益"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"通過一系列的週期實驗和策略驗證，最終明確了智能預加載的收益:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"總帶寬 -1.11%，預加載帶寬 -10%；"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"中視頻播放失敗率 -3.372%，未起播失敗率 -3.892%，卡頓率 -2.031%，百秒卡頓次數 -1.536%，卡頓滲透率 -0.541%。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"除此之外，實驗組其它指標也有正向收益，但限於不夠顯著，我們在此就不一一說明了。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"現狀&總結"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"西瓜端智能現狀"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當前智能預加載在 
Android 端已經全量上線，同時 iOS 端也在接入過程中。對於西瓜視頻而言，智能預加載只是個起點，我們正在探索更多場景的可能性，爲用戶提供更智能化的播放體驗。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"無論是雲智能還是端智能，最終帶給我們的是解決問題的新思路，坦率的說我也不知道端智能對未來意味着什麼，但當下確定的是我們可以更好的將它和業務場景結合在一起，尋求更好業務成果。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"端智能也不是銀彈：不是說我們用了它就完事大吉，就一定可以拿到很好的結果。要想真正的發揮業務價值，更好的服務用戶，需要我們更多的思考，將端智能放在該放的地方。"}]}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"新思路，新嘗試"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對於很多客戶端工程師而言，可能會存在疑問，我們應該如何面對端智能，或者說它和我們之前解決問題的方式有什麼不同？端智能不是一個從 0 到 1 的新生物，它只是雲智能發展到現在的自然延伸而已，其背後的體系還是機器學習那一套，對於客戶端而言，它提供了一種全新的思考方式：從規則驅動到數據驅動，再到模型。我們可以藉助它在適當的業務場景尋求結合點，最終獲得更好的成果。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"那對於客戶端工程師而言，該如何切到這個領域呢，或者說，如果我們想了解端智能可以做點什麼?"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"瞭解基礎"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"正如我們上面很多次提到的一樣，端智能只是雲智能發展的自然延伸，其背後仍然基本的機器學習，對於大多數的客戶端工程師而言，我們的目標不是成爲機器學習領域的專家，可以不去深入研究各種算法模型，但系統的學習下理論只是還是很有必要的："}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"瞭解人工智能發展的過程，知道爲什麼當前深度學習成爲主流方向；"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"學習曾經流行的一些經典算法及能夠解決的問題領域，你會發現
之前一些問題的思考方式在當前深度學習中仍然是貫通的；"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"瞭解常見的神經網絡架構，知道不同網絡層(比如 CNN 中的卷積層\/池化層等)的作用，整體又是怎麼組織起來的。"}]}]}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"模型移植與實踐"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"系統性的知識基本上可以通過閱讀該領域中的經典書籍而獲得。除此之外還要多動動手，我們可以先找幾個示例，一步一步把訓練好的模型通過量化處理後移植到客戶端上去跑跑，也可以嘗試在開源項目的基礎上去修改算法&模型，比如嘗試手寫數字識別，進行人臉檢測等，當然也可以嘗試下量化交易，用自己的做的模型預測股票價格(當然，賠了別找我 😁)。"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"性能優化"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"端側設備，尤其以手機爲例更注重交互，且受硬件限制較多，比如電量，存儲空間，算力等，這就意味着我們的算法模型必須要具有很高的執行效率，同時模型體積不益過大，對應而來的就是性能優化在端智能領域同樣非常重要。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對於大部分客戶端工程師而言，除了傳統的基於業務場景的優化方案之外，還可能涉及到推理優化和部署。對於推理優化，除涉及到模型、框架外，也有需要根據硬件進行優化的情況，比如 Neon\/SSE\/AVX 指令集優化、高通(Qualcomm GSP\/GPU)、華爲(Huawei NPU)等硬件加速技術。"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"擁抱 AIOT"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"無論是邊緣計算還是端智能，其載體不僅僅是手機，還包含各種各樣的嵌入式設備。比如像安防領域的指紋鎖、監控攝像頭、無人機等等。對於我們而言，可以拿來練手的設備也很多，可以是樹莓派，也可以是搭載 TPU 芯片 Coral 開發版以及英特爾 ®Movidius™ 
神經計算棒等等，利用他們可以做出很多好玩的東西。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"本文轉載自：字節跳動技術團隊（ID：toutiaotechblog）"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"原文鏈接："},{"type":"link","attrs":{"href":"https:\/\/mp.weixin.qq.com\/s\/5i2J-b70YRynsobEQ-mydQ","title":"xxx","type":null},"content":[{"type":"text","text":"你竟然是這樣的端智能?"}]}]}]}
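The article describes two client-side mechanisms without showing code: falling back to the existing strategy when the algorithm package or inference fails, and rescheduling preload tasks from the two most recent inference results while keeping tasks that are still valid. The sketch below is only one possible reading of that description. Every name in it (`Decision`, `decide`, `diff_tasks`, `DEFAULT_PRELOAD_COUNT`) is hypothetical and is not part of Pitaya or of Xigua's actual implementation, and the default count of 2 is an illustrative value, not a figure from the article.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical fixed strategy used when the model cannot be trusted.
DEFAULT_PRELOAD_COUNT = 2


@dataclass(frozen=True)
class Decision:
    preload_count: int  # how many upcoming videos to preload
    from_model: bool    # False when the fallback strategy produced it


def decide(run_inference: Callable[[Sequence[dict], Sequence[dict]], int],
           watched: Sequence[dict],
           upcoming: Sequence[dict]) -> Decision:
    """Ask the model for a preload count; roll back to the fixed
    strategy if the model raises or returns an unusable value."""
    try:
        count = run_inference(watched, upcoming)
        if not isinstance(count, int) or count < 0:
            raise ValueError("unusable model output")
        return Decision(min(count, len(upcoming)), True)
    except Exception:
        # Fault tolerance: any failure falls back to the default strategy.
        return Decision(min(DEFAULT_PRELOAD_COUNT, len(upcoming)), False)


def diff_tasks(previous_ids: List[str],
               current_ids: List[str]) -> Tuple[List[str], List[str], List[str]]:
    """Compare the task sets implied by the two most recent decisions.

    Returns (keep, start, cancel): tasks still valid, tasks to add,
    and tasks that are no longer wanted.
    """
    keep = [v for v in previous_ids if v in current_ids]
    start = [v for v in current_ids if v not in previous_ids]
    cancel = [v for v in previous_ids if v not in current_ids]
    return keep, start, cancel
```

Keeping the fallback inside `decide` means the caller always receives a usable `Decision`, and the `from_model` flag gives monitoring a simple way to count how often the rollback path was taken.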
