QCon Real-Time Audio and Video Track: Best Practices and Future Outlook for Real-Time Interaction

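{"type":"doc","content":[
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/ff\/3c\/ff7aae94c57b6f73f9aee5a2c57fc33c.jpg","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Interactive live streaming, online meetings, telemedicine, and online education are major application scenarios for real-time audio and video technology, and they place demanding requirements on high availability, high reliability, and low latency. Many teams run into all kinds of problems while building audio and video products. For example: smoothness, because frequent stalls during a video session make good interaction nearly impossible; echo cancellation, because played-back sound that is reflected by the environment, picked up again by the microphone, and transmitted also degrades the interaction; connectivity between domestic and overseas users, because more and more products are expanding overseas and cross-border interconnection becomes a technical problem to solve; and massive concurrency, which is a major test of a product's ability to withstand load."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"On May 29, at QCon Beijing (the Global Software Development Conference), the Real-Time Audio and Video track, initiated by Agora (聲網) VP of Technology Feng Yue (馮越) as track producer, brought together technical experts from New Oriental (新東方), Palfish (伴魚英語), and Agora. They covered next-generation video engine architecture, the difficulties and challenges of large-scale real-time audio and video systems, speech assessment and on-device localization in practice, and research and practice on front-end audio and video players."}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"1 Exploring Agora's Next-Generation Video Engine Architecture"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As audio and video technology advances rapidly, real-time audio and video interaction has been widely adopted in many fields, such as social entertainment, live streaming, and healthcare. At the same time, with the rapid progress of AI in image processing, advanced video pre-processing features that incorporate AI algorithms are seeing growing use. The variety and volatility of these scenarios place high demands on the flexibility and extensibility of a next-generation video engine."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Li Yaqi (李雅琪), the Agora architect responsible for the design of the next-generation video engine architecture, opened the track with a talk titled “Exploration and Practice of Agora's Next-Generation Video Engine Architecture”."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/ca\/8b\/caf8cbe1e0c7d43d79a09d2e86df038b.jpg","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To better meet the needs arising from the variety of video scenarios, the differences between users, and expectations for the live-streaming experience, Agora summarizes the design principles and goals of its next-generation video processing engine in four points:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1. "},{"type":"text","marks":[{"type":"strong"}],"text":"Meet different users' differing integration needs;"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2. "},{"type":"text","marks":[{"type":"strong"}],"text":"Be flexible and extensible"},{"type":"text","text":", so that new business and new technology scenarios can be supported and shipped quickly;"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3. "},{"type":"text","marks":[{"type":"strong"}],"text":"Be fast and reliable"},{"type":"text","text":": the core system of the video processing engine should provide rich, powerful capabilities and greatly reduce developers' cognitive load."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"4. "},{"type":"text","marks":[{"type":"strong"}],"text":"Deliver strong, monitorable performance"},{"type":"text","text":": keep optimizing the performance of the live video processing engine while improving monitoring so that quality data is transparent."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Which software design approaches did Agora adopt to reach these four design goals?"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The engine's users fall naturally into tiers. Some want to go live quickly with little code and need the engine to offer APIs that map as closely as possible to their business features. Others want the engine to expose more of its core video processing capabilities so that they can build customized video processing on top of them. Based on this user profile, Agora adopted "},{"type":"text","marks":[{"type":"strong"}],"text":"a layered design that combines business-level composition with core functionality"},{"type":"text","text":": the High Level API is business-oriented and favors ease of use, while the Low Level API provides core functionality and flexibility. To open up flexible orchestration as an engine capability, so that developers can freely combine APIs and arrange processing according to their business needs, the core of the video processing engine follows the Microkernel Architecture pattern, which separates the parts of the engine that change from the parts that do not. The microkernel pattern delivers the goal of flexibility and extensibility: each module's functionality can be extended quickly, and video processing pipelines can be assembled like building blocks to orchestrate business logic flexibly."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/61\/68\/6179fa832c70786725cf62cf1efa7268.jpg","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Without a stable, reliable core system, a developer building a beauty-filter plugin on the video processing pipeline from scratch has to deal with many issues beyond the plugin's own business logic: where the module sits in the pipeline, data format conversion, the threading model, memory management, property configuration, and so on. Agora built solutions to these engineering and integration issues into the underlying core system, which gives users a rich and powerful set of basic functions. The core system of the video engine includes basic video processing units, pipeline construction and control, support for basic video formats and algorithms, and system infrastructure. With this core system, integration becomes very simple: a plugin only needs to implement the relevant wrapper interfaces according to the core system's interface contract. "},{"type":"text","marks":[{"type":"strong"}],"text":"The rich, powerful core system greatly reduces the cognitive load on module developers and so improves overall development efficiency"},{"type":"text","text":"."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"On performance and monitoring, Agora optimized the data processing path on mobile and separated the control plane from the data plane, which improved the overall efficiency of video data transfer. It also built memory pools tailored to video processing to reduce system resource consumption. Finally, it implemented end-to-end video quality monitoring so that video performance optimization works as a closed feedback loop."}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"2 Difficulties and Challenges of Building a Large-Scale Real-Time Audio and Video System In-House"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Dong Haibing (董海冰), an industry architect at Agora and a long-time practitioner in the RTC field, introduced the basic concepts of RTC and analyzed in detail the characteristics of RTC scenarios, as well as the architecture design and difficulties involved in building such a system in-house. He closed with his own views on where RTC is heading."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The traditional internet already has fairly mature answers to large scale and high concurrency: caching, asynchronous processing, and distributed systems. By comparison, "},{"type":"text","marks":[{"type":"strong"}],"text":"the challenges in real-time audio and video are more complex"},{"type":"text","text":". Latency has to stay within 1 second to count as “real time”. Caches, for example, typically work on time scales of seconds or minutes and rarely of milliseconds. "},{"type":"text","marks":[{"type":"strong"}],"text":"When real-time audio and video (RTC) faces large-scale, high-concurrency scenarios, it has to account for audio and video quality, smoothness, low latency, scalability, and availability, which is where RTC differs sharply from the traditional internet"},{"type":"text","text":", and it also means the solutions are more complex."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/77\/9f\/7765e8d09a7fbbe5828dae9a9faba59f.jpg","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"During development, teams commonly struggle with development cost, network build-out, quality monitoring, audio processing, and final testing. Dong Haibing used in-house audio development as an example. First, "},{"type":"text","marks":[{"type":"strong"}],"text":"audio transmission has three key problems to solve: no sound or low volume, echo, and noise"},{"type":"text","text":". Second, "},{"type":"text","marks":[{"type":"strong"}],"text":"resilience to weak networks is also very important"},{"type":"text","text":": when network conditions change, the system needs to adjust bitrate and frame rate to absorb the change, and the intelligent routing algorithm has to select and transmit over the best path. "},{"type":"text","marks":[{"type":"strong"}],"text":"Another challenge is multi-dimensional quality evaluation"},{"type":"text","text":", which should run in real time and form a closed loop with dynamic adjustment; that combination works best against weak networks. On the difficulties of using open-source servers, Dong Haibing also discussed several common options (Jitsi\/Jitsi VideoBridge, Kurento, Licode\/Erizo, Pion, Janus)."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Beyond server-side development, "},{"type":"text","marks":[{"type":"strong"}],"text":"operations and quality monitoring for real-time audio and video also differ somewhat from traditional internet practice"},{"type":"text","text":". In operations, for example, on top of the usual disaster recovery planning, containerized deployment, automated operations, performance analysis, and logging systems, an RTC operations team also has to handle global networks (across regions and carriers) and last-mile strategies."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Teams that choose to build in-house may also face large-scale co-hosting (連麥, many participants on mic at once), RTC recording and playback, and control of operating costs. Even with this many difficulties to face and solve, real-time audio and video technology is being applied in more and more scenarios and keeps opening up new possibilities."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The metaverse (MetaVerse, 元宇宙) has been a popular concept recently. It can be understood as a change of role: relative to real life, the virtual world offers an entirely new experience in which a person can switch between multiple virtual identities. VRCHAT is similar: it uses VR for socializing and entertainment and helps people interact better online, and this may well be a direction the internet develops and explores in the future. Dong Haibing said that "},{"type":"text","marks":[{"type":"strong"}],"text":"an in-house team must not work in isolation; it should keep up with the times and with industry trends, and put as much of its effort as possible into its core business and the things it does well, so that everyone together can make the real-time audio and video field better"},{"type":"text","text":"."}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"3 New Oriental Cloud Classroom: Web Audio and Video Player in Practice"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Online education is probably one of the most familiar real-time audio and video scenarios of the past two years. For this track, we invited "},{"type":"text","marks":[{"type":"strong"}],"text":"Li Bianru (李便茹), front-end interaction architect for New Oriental Cloud Classroom, to share best practices from New Oriental's rapid move from offline to online teaching"},{"type":"text","text":"."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/d7\/f5\/d79985f8b783953ecc34ed248b0ca0f5.jpg","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"New Oriental began building its own cloud classroom at the end of 2018. Within one week over the 2020 Chinese New Year holiday, it went from supporting concurrency on the order of ten thousand users to supporting 300,000 concurrent users."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"New Oriental Cloud Classroom is a complete online teaching solution offered as a SaaS service, and one of its defining traits is a very fast release cadence. Native development on each client, such as Windows on PC and Android and iOS on mobile, could not keep up with that cadence. The team therefore chose to embed H5 pages in the clients: apart from real-time audio and video, interactive features are implemented almost entirely in H5. Adapting one Web implementation to every client is the fastest development model."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Real-time audio and video (RTC) latency is on the order of hundreds of milliseconds and at most 500 milliseconds, which the human ear can barely perceive. Online education has two distinct scenarios: small classes and large classes. Small classes demand low-latency real-time interaction, but for large-class scenarios such as university courses and lectures or public talks by well-known teachers, RTC is comparatively expensive. "},{"type":"text","marks":[{"type":"strong"}],"text":"For large classes, New Oriental Cloud Classroom uses an H5 “super-large class” approach"},{"type":"text","text":", which supports a million students in class at the same time: the teacher side pushes the stream over RTMP, while the student side still pulls the stream over HTTP."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/7a\/29\/7a2a00536501228d298759b7c0c18229.jpg","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Architecture of the Web live-streaming player"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Looking ahead to extensibility: if the cloud classroom adopts the H.265 standard for video encoding, the compressed stream would be about half the size it is with H.264, which greatly reduces network load. H5 is widely applicable and cross-platform, so one solution can be adapted to different clients, and a product can be developed once and launched quickly. An in-house general-purpose player allows the input stream source to be changed and supports customization or rapid development."}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"4 Speech Assessment and On-Device Localization"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To deliver better educational services, online education platforms have used deep learning to build many new features over the past two years. Speech assessment is one of them, and in English education in particular the volume of spoken-English assessments for children is enormous. "},{"type":"text","marks":[{"type":"strong"}],"text":"How can assessment latency be reduced and the assessment experience improved while also lowering server load and cost? Huang Zhichao (黃智超), head of AI algorithms for Palfish's technology middle platform, gave a talk titled “Speech Assessment and On-Device Localization”"},{"type":"text","text":"."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/73\/2d\/731e1bda41b1097d0d4151af2b3f662d.jpg","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Speech assessment replaces human raters with a machine that automatically scores children's spoken pronunciation. Palfish's work on speech assessment mainly covers algorithm and framework selection, acoustic model training, and optimization of accuracy and speed. For the algorithm, Palfish chose deep neural networks combined with hidden Markov models, mainly because deep learning frameworks are now very mature. For the framework, it chose Kaldi, which has the largest user base in the speech community and thorough documentation."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/12\/88\/12d0cf91b4ed2e7f4d7752d4d6504588.jpg","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The figure above shows the assessment process with the deep neural network and hidden Markov model approach (DNN + HMM). First, a DNN acoustic model is trained along with the HMM topology parameters. After training, a decoding graph is built from the input text, features are extracted from the audio, and the features are passed through the acoustic model. A scoring model then produces the sentence score."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In this process, data selection, acoustic model training, and optimization of assessment accuracy are all critical. Later in the talk, Huang Zhichao also went into detail on Palfish's experience in moving speech assessment on-device, including model size optimization, robustness of the assessment service, and how to deal with the difficulty of analyzing abnormal cases."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Original article"},{"type":"text","text":":"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/mp.weixin.qq.com\/s\/U0wOpODuVcR9ub3zFSdbwQ","title":"","type":null},"content":[{"type":"text","text":"QCon Real-Time Audio and Video Track: Best Practices and Future Outlook for Real-Time Interaction"}]}]}
]}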
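Section 1 of the article describes a microkernel-style video engine: a stable core owns pipeline construction, and each plugin only implements an agreed interface so that pipelines can be assembled like building blocks. The talk summary does not show any code, so the following is only a minimal Python sketch of that idea under stated assumptions. The names `Frame`, `VideoFilter`, `Pipeline`, `Beautify`, and `Downscale` are hypothetical illustrations and are not Agora's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Protocol


@dataclass
class Frame:
    """Hypothetical video frame: metadata only, no pixel data."""
    width: int
    height: int
    tags: List[str] = field(default_factory=list)


class VideoFilter(Protocol):
    """Assumed interface contract that every plugin implements."""

    def process(self, frame: Frame) -> Frame: ...


class Pipeline:
    """Core ("kernel") part: owns plugin registration and stage ordering."""

    def __init__(self) -> None:
        self._registry: Dict[str, Callable[[], VideoFilter]] = {}
        self._stages: List[VideoFilter] = []

    def register(self, name: str, factory: Callable[[], VideoFilter]) -> None:
        # Plugins are added without changing the core.
        self._registry[name] = factory

    def build(self, names: List[str]) -> "Pipeline":
        # Compose stages like building blocks, in the requested order.
        self._stages = [self._registry[name]() for name in names]
        return self

    def run(self, frame: Frame) -> Frame:
        for stage in self._stages:
            frame = stage.process(frame)
        return frame


class Beautify:
    """Example plugin: only marks the frame as beautified."""

    def process(self, frame: Frame) -> Frame:
        return Frame(frame.width, frame.height, frame.tags + ["beautify"])


class Downscale:
    """Example plugin: halves the resolution."""

    def process(self, frame: Frame) -> Frame:
        return Frame(frame.width // 2, frame.height // 2, frame.tags + ["downscale"])


pipeline = Pipeline()
pipeline.register("beautify", Beautify)
pipeline.register("downscale", Downscale)
result = pipeline.build(["beautify", "downscale"]).run(Frame(1280, 720))
print(result)
```

In this sketch the plugin author writes only `process`; registration, ordering, and execution stay in `Pipeline`, which mirrors the article's point that the core system absorbs integration concerns. Threading, memory pooling, and format conversion, which the article lists as core-system responsibilities, are left out here.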