The Ins and Outs of Sound in Real-Time Audio and Video Communication

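{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Introduction","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Because of the COVID-19 pandemic, video conferencing and online education have grown rapidly. The foundation that makes this possible is real-time audio and video communication technology. Real-time communication runs into all kinds of problems, however. Some are caused by the network and some by the product itself, and together they go a long way toward determining the user's quality of experience (QoE). Quality of service (QoS) is an important benchmark for any product or service, but what users care about more is QoE.","attrs":{}}]},
{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Main Text","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As everyone knows, much of a product's or service's value shows up in its reputation among users. If users say a product or service is good, it is sure to win the market. This brings us to a metric closely tied to user reputation: quality of experience (QoE). In real-time audio and video communication, the user's audio experience plays a particularly important role.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"There are many ways to evaluate QoE. The common methods fall into three groups: full-reference objective methods, no-reference objective methods, and subjective methods. Full-reference objective methods include ITU-T P.861, P.862, and P.863. No-reference objective methods include P.563, ANIQUE+, P.1201, and xxNet. Together they provide a theoretical basis for quantifying and comparing audio QoE.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Today we will focus on audio QoE problems that we ran into in real projects.","attrs":{}}]},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Noise","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Every real-time audio and video product has to deal with noise, and noise suppression (NS) is one of the basic features a product must have. Noise has many causes, including device noise, environmental noise, audio signal overflow (clipping), and algorithm problems. Common device noise includes fan noise, keyboard clicks, and abnormal electrical hum. Common environmental noise includes car horns, people talking nearby, footsteps, TV sound, and alarm clocks. Signal overflow is mostly related to the audio source. Algorithm problems may stem from the design of the algorithm itself, such as residual echo, or from using an algorithm outside the range it was designed for.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/b0/b08880451cad4016a1d2953a0cffe351.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Next, let us walk through a typical case of a noise problem in a real project.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We hit this noise problem while integrating iFlytek's speech recognition service. A project required our mobile (Android and iOS) SDK to integrate iFlytek's speech recognition and expose it as an optional feature.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The key step in integrating iFlytek's speech recognition service is to send the PCM audio captured on the mobile device to the cloud API once every 40 ms. The Android and iOS SDKs share a single C++ core, with separate Java and Objective-C (OC) wrapper layers for the public interfaces. For that reason I implemented the capture, storage, and processing of the PCM data in the C++ layer. At first I stored the audio as 16-bit short integers. On Android, the SDK converted them in the JNI layer into raw audio data as 8-bit bytes, and the Java layer then called iFlytek's speech recognition API. This worked: the speech came back as text with accuracy above 95%. On iOS, however, things went wrong. There the SDK converted the data into raw 8-bit bytes in the OC layer and called iFlytek's API from the OC layer. The returned text consistently failed to match what was said, and accuracy was below 50%.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"So we started troubleshooting. First, we saved the raw PCM data delivered by the C++ layer's callback and played it back. It sounded fine, so the capture module was working. Next, we saved the 16-bit short data in the OC layer before conversion and played it back (note: 'short integer' is used loosely here for ease of explanation). This also sounded fine, which showed that the data handoff from the C++ layer to the OC layer was correct.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"That left only one possibility, in my view: the problem lay in converting the 16-bit shorts into 8-bit bytes. To test this, I saved the converted 8-bit audio data and played it back, and sure enough there was ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"severe noise","attrs":{}},{"type":"text","text":"! The waveform showed periodic, evenly spaced noise throughout the audio.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"With the problem located, it was time to fix it. One possible cause is that the iOS platform applied some automatic truncation when handling the 16-bit short data, which lost part of the data. To avoid conversion problems in both the OC and JNI layers, I converted the raw PCM data into 8-bit byte data directly in the C++ layer before passing it up through the callback. After verification, speech recognition worked correctly on both Android and iOS. ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"With that, the noise problem was solved","attrs":{}},{"type":"text","text":".","attrs":{}}]},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Low Volume","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Low volume also has many causes, which fall roughly into four categories: weak device capture, weak device playback, low analog gain, and low digital gain. Weak device capture is a common cause, and a user who speaks softly also contributes. Weak device playback is diagnosed from the receiving end: the user's playback device, such as headphones or speakers, may have a hardware problem that keeps the output quiet. Analog gain and digital gain are viewed from the algorithm side, and they amplify the sound by different amounts.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/fa/fa4fbef61910cba5a55527023f7592e0.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Next, let us walk through a typical case of a low-volume problem in a real project.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A third-party customer integrated our real-time audio and video SDK and then reported a problem. On Smartisan phones the sound was extremely quiet after entering a live-streaming room, while other Android phones were fine. They asked us to investigate, and once again the job landed on me.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":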
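"With one of the affected Smartisan phones in hand, I began troubleshooting. The low-volume problem was easy to reproduce: it happened essentially 100% of the time on entering a live room. So I concluded that it was probably not a random glitch, which contradicted my initial assumption. Deeper analysis showed that this Smartisan phone is very quiet in voice-call mode to begin with. WebRTC uses voice-call mode by default when publishing and pulling live streams, so playback in the live room was very quiet. [Lao Luo (Luo Yonghao, Smartisan's founder) really had not been making phones for long: we later found that almost every Smartisan model had this problem. I honestly felt anxious on his behalf.]","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"So is there a fix for this low-volume problem? Yes, but it is a compromise. I later found that Smartisan phones are very loud in media mode. I therefore added a device blacklist to the SDK's lower layer: any phone model on the blacklist uses media mode by default instead of voice-call mode. ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"With that, the low-volume problem was solved","attrs":{}},{"type":"text","text":".","attrs":{}}]},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Echo","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Echo is another common problem in real-time audio and video communication. It also has many causes, which fall into four broad categories: delay jitter, highly reverberant environments, capture signal overflow (clipping), and double-talk. Delay jitter may come from busy threads or from a dual-device setup (separate capture and playback devices). In a highly reverberant environment, the reverberation tail is usually longer than the echo canceller's adaptive filter. Capture clipping introduces nonlinear distortion that can keep the adaptive filter from converging. Double-talk relies heavily on nonlinear processing (NLP), which tends to favor one side at the expense of the other: suppressing residual echo can also suppress the near-end talker. In fact, WebRTC itself has known weaknesses in handling double-talk, so its double-talk support is poor.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/97/97d52de905221e42c5077adf47f09593.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Next, let us walk through a typical case of an echo problem in a real project.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For our video conferencing product, the company bought a batch of Android set-top boxes to use as conferencing endpoints. After we installed the mobile version of our client on them, we hit a problem: the speaker's voice kept swinging between loud and quiet and sometimes disappeared entirely. Troubleshooting revealed that the Android boxes already supported hardware echo cancellation. The Android app's software echo cancellation was stacking on top of the box's hardware echo cancellation, so the presenter's voice was cancelled twice. After we turned off the device's hardware echo cancellation, the presenter's voice was normal. As a cross-check, we turned off the software echo cancellation and turned the box's hardware echo cancellation back on, and the presenter's voice was also normal. ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"With that, the echo problem was solved","attrs":{}},{"type":"text","text":".","attrs":{}}]},
{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Conclusion","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The audio experience during a live stream is a live-streaming service's last line of defense. Users will tolerate some stuttering in the video, but they have zero tolerance for stuttering audio. Holding this last line is crucial, so we must take audio QoE seriously; only once the audio is good can we go on to pursue the best possible video. That wraps up today's look at audio QoE in real projects. Likes and comments are welcome, and follow me for more on audio and video live streaming.","attrs":{}}]}]}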
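The noise case above hinges on how 16-bit PCM samples become the byte array handed to the speech-recognition callback every 40 ms. The article does not show the original OC or JNI conversion code. What follows is only a minimal C++ sketch of the kind of conversion the author moved into the shared C++ layer, and it rests on several assumptions. The audio is assumed to be 16 kHz mono 16-bit PCM, since the article does not give the sample rate. The byte order is assumed to be little-endian. The function names (`Int16ToLittleEndianBytes`, `TruncatingCast`, `MakeFrames`) are hypothetical and are not part of the SDK or of iFlytek's API. The sketch also includes one common way such a conversion goes wrong, casting each sample to a single byte. That part is illustration only and is not claimed to be the actual iOS bug.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed capture format (not stated in the article): 16 kHz, mono, 16-bit PCM.
constexpr int kSampleRateHz = 16000;
constexpr int kFrameMs = 40;  // the recognizer is fed once every 40 ms
constexpr std::size_t kSamplesPerFrame = kSampleRateHz * kFrameMs / 1000;        // 640
constexpr std::size_t kBytesPerFrame = kSamplesPerFrame * sizeof(std::int16_t);  // 1280

// Serializes each 16-bit sample as two bytes, low byte first (little-endian),
// so nothing is lost: the byte array is the same PCM stream viewed as bytes.
std::vector<std::uint8_t> Int16ToLittleEndianBytes(const std::int16_t* samples,
                                                   std::size_t count) {
  std::vector<std::uint8_t> out;
  out.reserve(count * 2);
  for (std::size_t i = 0; i < count; ++i) {
    const auto u = static_cast<std::uint16_t>(samples[i]);  // two's-complement bit pattern
    out.push_back(static_cast<std::uint8_t>(u & 0xFF));     // low byte
    out.push_back(static_cast<std::uint8_t>(u >> 8));       // high byte
  }
  return out;
}

// Illustration of a broken conversion: casting each sample to one byte keeps only
// the low 8 bits, halves the data size, and garbles the audio.
std::vector<std::uint8_t> TruncatingCast(const std::int16_t* samples, std::size_t count) {
  std::vector<std::uint8_t> out(count);
  for (std::size_t i = 0; i < count; ++i) {
    out[i] = static_cast<std::uint8_t>(samples[i]);
  }
  return out;
}

// Splits captured PCM into complete 40 ms frames of little-endian bytes, one per
// recognizer callback. A trailing partial frame is not emitted; a real pipeline
// would buffer it until the next capture callback.
std::vector<std::vector<std::uint8_t>> MakeFrames(const std::vector<std::int16_t>& pcm) {
  std::vector<std::vector<std::uint8_t>> frames;
  for (std::size_t off = 0; off + kSamplesPerFrame <= pcm.size(); off += kSamplesPerFrame) {
    frames.push_back(Int16ToLittleEndianBytes(pcm.data() + off, kSamplesPerFrame));
  }
  return frames;
}
```

Under these assumptions a 40 ms frame is 16000 × 0.04 = 640 samples, or 1280 bytes. The article's fix performs the conversion once in the shared C++ core, so the Java and OC wrappers only pass the bytes through. Their handling of the data therefore cannot diverge between platforms.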