Agora (聲網) Audio Interaction MoS Method: Real-Time Scoring of the Audio Interaction Experience

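{"type":"doc","content":[
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In the industry, QoE (Quality of Experience) methods for real-time audio and video have long been an important topic, and every year's RTE real-time internet conference includes sessions on it. The topic matters so much because the RTE industry still lacks a good, usable QoE evaluation method for real-time interactive scenarios.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Drawing on objective real-time data from large-scale global commercial use and on lessons from practice, Agora has officially released its self-developed no-reference objective method for evaluating the real-time audio user experience: the Agora real-time audio MoS method. The method is integrated into Agora audio/video SDK 3.3.1 and later. At present it provides scores only for the downlink (codec, transmission, playback); an interface for uplink quality scoring will be opened later. After calling the method, developers can objectively judge the current user's audio interaction experience in real time, which provides important reference data for optimizing their business and operations. Click ","attrs":{}},{"type":"link","attrs":{"href":"https://docs.agora.io/cn?utm_source=AgoraWeChat&utm_medium=referral&utm_campaign=qoe","title":"","type":null},"content":[{"type":"text","text":"“Read the original”","attrs":{}}]},{"type":"text","text":" and search for “mosValue” to browse the detailed documentation for this method.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Some readers may ask: what are the MoS score and QoE? What is the principle behind Agora's MoS method? How does it differ from existing open-source methods?","attrs":{}}]},
{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"From “Hello? Hello? Hello?” to QoS and QoE","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"When voice calls first appeared, there was no QoS (Quality of Service). People could only judge call quality by how many times someone had to say “hello?”.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Later, network-based voice interaction faced the same problem, and QoS was born against this background. Its goal is to provide end-to-end service quality guarantees tailored to the requirements of each kind of service. QoS mechanisms were built mainly for carriers and networks; they focus on network performance and traffic management rather than on the end-user experience.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"People gradually found that traditional evaluation systems built around QoS never matched the user's actual experience well. So QoE (Quality of Experience), which pays more attention to user experience, was proposed. For a long time afterwards, QoE-based evaluation systems developed steadily. In the communications field, several evaluation methods strongly correlated with QoE emerged, and they can be divided into subjective and objective methods. All of them express the level of the user's experience as a MoS score.","attrs":{}}]},
{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Shortcomings of existing QoE methods","attrs":{}}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Subjective evaluation methods","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Subjective evaluation maps human perception to a quality score, so it is limited by listeners' expertise and individual differences. The industry has no single standard that subjective audio tests must follow. ITU offers some recommendations and guidance for subjective audio testing, but each test has its own focus, so design and execution differ. A common practice is to recruit enough listeners to collect statistically meaningful samples, give the test listeners some listening training, and then have them score the audio call on aspects such as signal distortion, background intrusiveness, and overall quality.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As a result, ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"obtaining a reasonably accurate subjective speech quality score usually takes a great deal of manpower and time, so the industry rarely uses subjective tests to evaluate communication quality.","attrs":{}}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Objective evaluation methods","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Objective evaluation methods are divided into full-reference and no-reference methods.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Among them, ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"full-reference methods","attrs":{}},{"type":"text","text":" can, given a reference signal (the undamaged signal), quantify how much the degraded signal has been impaired and produce an objective speech quality score close to the subjective one. In 2001, the P.862 standard (an ITU, International Telecommunication Union, standard) defined the full-reference objective algorithm PESQ, which is mainly used to evaluate codec impairment in narrowband and wideband speech. Over the past two decades it has been widely used to assess communication quality.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As technology developed, the range of applications where PESQ applies grew narrower, so in 2011 the P.863 standard defined POLQA, a more comprehensive and more accurate full-reference objective algorithm. Compared with PESQ, POLQA can evaluate a wider bandwidth, is more robust to noisy signals and delay, and gives speech quality scores closer to subjective scores.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"No-reference objective methods","attrs":{}},{"type":"text","text":" need no reference signal; they produce a quality score solely by analyzing the input signal itself or its parameters. Well-known no-reference objective methods include P.563, ANIQUE+, E-model, and P.1201.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"P.563 was proposed in 2004 and mainly targets narrowband speech. ANIQUE+ was proposed in 2006 and also targets narrowband speech; according to its authors, its scoring accuracy exceeds that of the full-reference method PESQ. However, PESQ measurements do not reflect network delay, packet loss, and similar effects, so PESQ itself is not a perfect fit for today's real-time interaction over the internet. E-model was proposed in 2003; unlike the two methods above, it is a standard that quantifies impairment from VoIP link parameters and does not analyze the signal directly. The P.1201 series was proposed in 2012; for the audio part, this standard also does not analyze the audio signal directly but scores communication quality from network state and signal state.","attrs":{}}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Limited gains from AI algorithms and difficult deployment in real-time scenarios","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In recent years there have also been papers that use deep learning to score speech signals; the target they fit is usually the output of PESQ or another full-reference objective method for the speech under test. This approach has two obvious drawbacks:","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"First, its accuracy depends on the model's compute budget. When a product ships, features that do not directly improve the user experience face very strict limits on complexity and package size.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Second, the robustness of this approach is severely tested by the many scenarios found in RTE. A voice chat room with background music or sound effects, for example, poses a major challenge to deep-learning-based methods.","attrs":{}}]}]}],"attrs":{}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Because full-reference objective methods need undamaged reference material, their value lies mostly in validating the quality of an algorithm, app, or scenario before launch. Once your app or scenario is live, they cannot evaluate its voice interaction experience. For evaluating the experience after release, the industry hopes that no-reference objective methods can help. Unfortunately, limited by the diversity of scenarios or by algorithm complexity, the no-reference objective methods above are hard to apply across the RTE field.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Take the no-reference objective method P.563 as an example. The effective bandwidth it can evaluate is only 4 kHz, it can evaluate only speech signals, and its robustness across different material is very poor. Early on we made the core P.563 algorithm run in real time and ported it into the SDK, but in testing the variance of its scoring error across different types of material was too large, so we never turned it into a product. Deep-learning-based methods could in theory be trained into an end-to-end evaluation algorithm that is more robust and more accurate than P.563, but their algorithmic complexity and low return on investment remain two stumbling blocks.","attrs":{}}]},
{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"A new QoE evaluation method for real-time audio interaction","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To sum up, if we need an evaluation method that is deployed on the device and reports call quality in real time, none of the methods above is suitable. We need a different path: a new evaluation system with the following characteristics:","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"It must be robust to the material found in many real-time interactive scenarios (music, speech, or a mix), with no obvious evaluation errors.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"It must be able to evaluate multiple sampling rates (narrowband, wideband, super-wideband, and fullband).","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Its complexity must be low enough to evaluate the speech quality of every stream in a multi-party call on any device, without a noticeable increase in performance overhead.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Online quality scores must align with offline test results: for the same call, the score this method gives while the call is in progress should be almost identical to the score obtained by analyzing the call afterwards with a full-reference method.","attrs":{}}]}]}],"attrs":{}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Once a QoE evaluation system has these characteristics, you can in effect carry out after launch the same quality evaluation you used to do only before launch, and you can see your users' current call experience score at any time. This is more than an improvement in evaluation capability; it helps you improve the user experience substantially and in a targeted way.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Building on the characteristics of real-time interaction, Agora designed a hidden-state-based real-time speech quality evaluation method: the Agora audio interaction MoS method. It combines signal processing, psychology, and deep learning, and scores the speech quality of a call in real time at very low algorithmic complexity.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/c2/c2aa39acad04e675e79deab3590d23bf.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Figure: the Agora audio interaction MoS method compared with existing industry evaluation methods","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The method has two main parts. The first is uplink quality evaluation on the sending side, which mainly scores the quality of capture and signal processing. The second is downlink quality evaluation on the receiving side, which mainly scores the audio after codec impairment and network impairment. The overall architecture is shown in this figure:","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/a4/a406857d722dc73b39de0e326df9c9ec.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This article focuses on downlink quality evaluation, the part that affects the real-time interactive experience most. Here we also take the encoder on the sending side into account, so this part covers the chain of encoding, sending, transmission, decoding, post-processing, and playback. Unlike earlier methods that fit a score to network state, we focus on monitoring the state of each module in the SDK. The core idea is simple: on a completely unimpaired network, the downlink contains only codec impairment before playback, and none of the weak-network countermeasure modules is triggered. Once the network fluctuates, those modules start working, and each activation affects the final playback quality to a greater or lesser degree. Building a downlink quality evaluation algorithm therefore comes down to obtaining the mapping between the state of each SDK module and audio quality. In practice, the design also involves several other factors, such as the encoder architecture, coding efficiency for different material, effective bitrate, and the network impairment model, all of which noticeably affect the final perceived quality and score. Generally speaking, state-of-the-art evaluation methods have a scoring error (RMSE) of about 0.3 across multiple weak-network environments; the method we designed keeps that error within 0.2 across multiple weak-network environments.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Based on this downlink quality evaluation algorithm, we built a global audio network quality map on which users can monitor, in real time, the quality of calls taking place all over the world. The figure below shows one corner of the map; the horizontal and vertical axes represent users in different regions, and the MoS scores in the table reflect the QoE of their current calls:","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/27/275a23319c6eafedd3a1ea13769b6c25.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The Agora audio interaction MoS method, applied worldwide, is exposed through an interface in Agora audio/video SDK 3.3.1 and later. You can obtain the quality score of each call in real time through mosValue in AgoraRtcRemoteAudioStats. At present it provides scores only for the downlink (codec, transmission, playback); an interface for uplink quality scoring will be opened later. For a detailed description of the interface parameters, click ","attrs":{}},{"type":"link","attrs":{"href":"https://docs.agora.io/cn?utm_source=AgoraWeChat&utm_medium=referral&utm_campaign=qoe","title":"","type":null},"content":[{"type":"text","text":"“Read the original”","attrs":{}}]},{"type":"text","text":" to enter the Agora documentation center and search for “mosValue” for the detailed documentation.","attrs":{}}]}
]}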
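The article quotes scoring error as RMSE (about 0.3 for state-of-the-art methods, within 0.2 for the Agora method). As a rough illustration of what that figure measures, the sketch below computes the RMSE between scores from an online no-reference estimator and offline full-reference scores for the same calls. The function name `rmse` and both score lists are hypothetical and invented for illustration; they are not Agora data and not part of the Agora SDK.

```python
import math


def rmse(predicted, reference):
    """Root-mean-square error between two equal-length lists of scores."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical MOS values for five calls (illustrative only, not measured data).
reference_mos = [4.2, 3.8, 3.1, 2.5, 4.0]  # e.g. offline full-reference scores
estimated_mos = [4.0, 3.9, 3.3, 2.4, 4.1]  # e.g. online no-reference scores

error = rmse(estimated_mos, reference_mos)
print(f"RMSE = {error:.3f}")
```

A lower RMSE means the online score tracks the offline reference more closely, which is the alignment property the article lists as a requirement for the new method.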