声网 Agora 音频互动 MoS 分方法:为音频互动体验进行实时打分

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在业界,实时音视频的 QoE(Quality of Experience) 方法一直都是个重要的话题,每年 RTE 实时互联网大会都会有议题涉及。之所以这么重要,其实是因为目前 RTE 行业中还没有一个很好的可用于评价实时互动场景的 QoE 评价方法。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"声网基于在全球大规模商用的客观实时数据和实践总结,正式推出自研的用于评价实时音频用户体验的无参考客观评价方法——声网Agora 实时音频 MoS 方法。这套方法,已集成于声网 Agora 音频/视频 SDK 的 3.3.1 及更新的版本中,目前仅提供了下行(编解码-传输-播放)链路的分数,后续还会开放提供上行质量打分接口。开发者在调用该方法后,可实时地客观判断当前用户的音频互动体验,给自身业务、运营的优化提供重要的参考数据。点","attrs":{}},{"type":"link","attrs":{"href":"https://docs.agora.io/cn?utm_source=AgoraWeChat&utm_medium=referral&utm_campaign=qoe","title":"","type":null},"content":[{"type":"text","text":"「阅读原文」","attrs":{}}]},{"type":"text","text":"搜索“mosValue”,可浏览该方法的详细文档。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"那么有人可能会问,MoS 分、QoE 是什么?声网的这套 MoS 方法原理是什么?相比已有的开源方法有什么不同?","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"从“喂喂喂”到 QoS、QoE","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"当语音通话出现时,还没有 QoS (Quality of Service)。人们只能靠“喂喂喂”的个数来判断通话质量的好坏。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"后来基于网络的语音互动面对着同样的问题。QoS 在这样的背景下诞生。其目的是针对各种业务的需求特征,提供端到端的服务质量保证。QoS 的机制主要是面向运营商、网络建立的,关注的是网络性能、流量的管理等,而不是终端用户体验。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"人们逐渐发现,以 QoS 为核心构建的传统评价体系,始终难以和用户的体验相匹配。于是,更加关注用户体验的 QoE(Quality of Experience)被提了出来。在此后很长一段时间里,基于 QoE 的评价体系开始逐渐发展。在通信领域,逐渐出现了若干种与 QoE 强相关的评价方法,这些评价方法可以分为主观评价方法、客观评价方法。这些方法都会通过 MoS 分来表达目前用户体验的高低的。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"现有 QoE 方法的缺陷","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"主观评价方法","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"主观评价方法,是将人的主观感受映射到质量评分,受限于听者的专业性与个体差异性。在业界,音频主观测试并没有可以统一遵循的标准。虽然ITU对音频主观测试有一些建议和指引,但是每个测试都有自身的侧重点设计和执行也不尽相同。一般比较常用的做法是请足够多的人来采集有统计意义的样本,然后对测试人员做一定的听音培训。最后根据信号失真度,背景侵入度,和总体质量等方面来对音频通话打分。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"所以,","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"想得到相对准确的主观语音质量评分,往往需要大量的人力和时间,所以业内一般很少使用主观测试对通信质量进行评估。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"客观评价方法","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"客观评价方法分为有参考评价方法和无参考评价方法。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"其中,","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"有参考评价方法","attrs":{}},{"type":"text","text":"能够在有参考信号(无损信号)的前提下,量化受损信号的损伤程度,并给出与主观语音质量评分接近的客观语音质量评分。在2001年,P.862标准(P.862 是 ITU 国际电信联盟标准)定义了有参考客观评价算法 PESQ,该算法主要用来评估窄带及宽带下的编解码损伤。该算法在过去的二十年中,被广泛的应用于通信质量的评定。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"随着技术的发展,PESQ 的应用范围变得越来越窄,于是在2011年,P.863 标准定义了一套更全面、更准确的有参考客观评价算法 POLQA。相比PESQ,POLQA 可评估的带宽更广,对噪声信号和延时的鲁棒性更好,其语音质量评分也更接近主观的评分。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"无参考的客观评价方法","attrs":{}},{"type":"text","text":"不需要参考信号,仅通过对输入信号本身或参数的分析即可得到一个质量评分。比较著名的无参考客观评价方法有 P.563、ANIQUE+、E-model、P.1201等。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"其中,P.563 于 2004 年提出,主要面向窄带语音的质量评估;ANIQUE+于 2006 年提出,也是面向窄带语音,其评分准确度据作者称超过了有参考的评价方法 PESQ,不过 PESQ 的测量不能反应网络的延时、丢包等,并不完美适用于如今基于互联网传输的实时互动场景;E-model 于 2003 年提出,不同于上述两种方法,这是一个基于 VoIP 链路参数的损伤定量标准,不会直接基于信号域进行分析;P.1201 系列于 2012 年提出,对于音频部分,该标准也不对音频信号直接进行分析,而是基于网络状态和信号状态对通信质量进行评分。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"AI 算法改善有限&实时场景难落地","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"近些年,也有相关使用深度学习对语音信号进行评分的论文,其拟合的输出往往是待测语音对应 PESQ或其他有参考客观评价方法的输出。但这种方法有两个明显的缺点:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"一是其准确性依赖于模型算力,而在产品落地时,因为无法直接改善用户体验,非质量改进的功能的复杂度和包体积要求往往是非常高的;","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"二是这种方法的鲁棒性在RTE的多场景特性下会受到严格的考验,比如说带有背景音乐或特效的语聊房场景,就会给这种基于深度学习的方法带来很大的挑战。","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"有参考客观评价方法因为需要无损的参考语料,更多的价值是在算法、App 或场景上线前对其做质量验证,如果你的 App 或场景已经上线了,则无法对其语音互动体验进行评价。而对于产品发布后的体验评价,业内则期望无参考客观评价方法能够提供一些帮助。但是很难遗憾,受限于场景的多样性或算法的复杂度,上述无参考客观评价方法难以全面应用到 RTE 领域。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"以无参考客观评价方法 P.563 为例,它能测评的有效频谱只有 4kHz,而且仅能测评语音信号,对不同语料的鲁棒性是非常差的:我们早期曾将 P.563 的核心算法实时化并移植到 SDK 中,但测试下来其对不同类型语料的评分误差的方差过大,最终没有产品化。而基于深度学习的方法,理论上可以训练出比 P.563 鲁棒性更好、误差更小的端到端评价算法,但它的算法复杂度,以及较低的投入回报率,仍是两块绊脚石。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"面向实时音频互动场景的新 QoE 评价方法","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"综上分析,如果我们需要一个部署在端上实时反馈通话的质量的评价方法,上述任何一种方法都是不合适的。我们需要另辟蹊径,设计一个新的评价系统,这个系统需要具备以下几个特点:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"需要对多种实时互动场景下的语料(音乐/语音/混合)具有鲁棒性,不会出现明显的评估误差。","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"需要具备多采样率(窄带/宽带/超宽带/全带)的评估能力。","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"复杂度要足够低,能够在任意设备上对多人通话中对每一路的语音质量进行评估,且不引入明显性能增长。","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"线上的质量评分能够和线下的测试结果对齐,即同一段通话,该评估方法对当前线上发生的通话的评分,与事后用有参评价方法分析这段通话的得分,两者应该几乎一致。","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"当这套 QoE 评价体系满足以上特点后,便等同于让你在产品上线后都可以进行以往所做的“上线前的质量评价”,你可以随时看到当前你的用户的通话体验评分。这不仅是评价体系能力的提升,更能帮助你有的放矢地大幅提升用户体验。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"声网基于实时互动的特性,设计了一种基于隐状态的实时语音质量评估方法——声网 Agora 音频互动 MoS 方法。该方法结合了信号处理、心理学和深度学习,能在极低算法复杂度下,对通话的语音质量进行实时评分。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/c2/c2aa39acad04e675e79deab3590d23bf.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"图:Agora 音频互动 MoS 方法与行业现有评价方法对比","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"该方法主要分为两部分:第一部分是在发送端做的上行质量评估,主要用来评估采集、信号处理的质量得分;第二部分是在接收端做的下行质量评估,主要用来评估经过编解码损伤和网络损伤后的得分。整体的架构图可以参考这张图:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/a4/a406857d722dc73b39de0e326df9c9ec.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"这篇文章主要讲一下下行质量评估,也就是影响实时互动体验最重要的部分。这个部分我们把在发送端的编码模块也考虑了进来。因此,这部分就包含了编码-发送-传输-解码-后处理-播放这条链路。不同于以往的基于网络状态进行分数拟合的方法,我们把重心放在了监测SDK中各模块的状态。这种设计的核心思想很简单,如果在完全无损伤的网络中,这条下行链路在播放前仅包含编码损伤,各个弱网对抗算法模块也不会被触发。一旦网络出现了波动,各弱网对抗模块就会开始运作,其每次启动或多或少的都会对最终播放的音质产生影响。因此,构建一个下行链路的质量评估算法的核心就变成了得到SDK各模块和音质的映射关系。当然,实际下行质量评估算法设计中还有若干其他影响因子,比如编码器架构、编码不同语料的效率、有效码率、网络损伤模型等,这些都会明显的影响最终的听感和质量评分。一般来说,the state of art的评估方法,在多弱网环境下的打分平均误差(RMSE)在0.3左右,我们设计的评估方法在多弱网环境下能将平均误差控制在 0.2 以内。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"基于这套下行质量评估算法,我们构建了一个全球音频网络质量地图,用户可以实时监控发生在世界各个角落的通话质量,下图是地图一角,该图中横轴、纵轴分别表示在不同地区的用户,表格中的 MoS 分则体现了他们当前通话的 QoE:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/27/275a23319c6eafedd3a1ea13769b6c25.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"这套应用于全球范围的声网音频互动 MoS 方法已经在 Agora 音频/视频 SDK 3.3.1 及更新的版本中对外开放接口,大家可通过 AgoraRtcRemoteAudioStats 中的 mosValue 实时获取每条通话的质量评分,目前仅提供了下行(编解码-传输-播放)链路的分数,后续会开放提供上行质量打分接口。详细的接口参数说明,请点击","attrs":{}},{"type":"link","attrs":{"href":"https://docs.agora.io/cn?utm_source=AgoraWeChat&utm_medium=referral&utm_campaign=qoe","title":"","type":null},"content":[{"type":"text","text":"「阅读原文」","attrs":{}}]},{"type":"text","text":"进入声网文档中心,搜索“mosValue”参考详细文档。","attrs":{}}]}]}
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章