深入淺出 WebRTC AEC(聲學回聲消除)

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"前言:近年來,音視頻會議產品提升着工作協同的效率,在線教育產品突破着傳統教育形式的種種限制,娛樂互動直播產品豐富着生活社交的多樣性,背後都離不開音視頻通信技術的優化與創新,其中音頻信息內容傳遞的流暢性、完整性、可懂度直接決定着用戶之間的溝通質量。自 2011 年 WebRTC 開源以來,無論是其技術架構,還是其中豐富的算法模塊都是值得我們細細品味,音頻方面熟知的 3A 算法(AGC: Automatic gain control; ANS: Adaptive noise suppression; AEC: Acoustic echo cancellation)就是其中閃閃發光的明珠。本文章將結合實例全面解析 WebRTC AEC 的基本框架和基本原理,一起探索回聲消除的基本原理,技術難點以及優化方向。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"作者:珞神,阿里雲高級開發工程師,負責阿里雲 RTC 音頻研發","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"回聲的形成","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"WebRTC 架構中上下行音頻信號處理流程如圖 1,音頻 3A 主要集中在上行的發送端對發送信號依次進行回聲消除、降噪以及音量均衡(這裏只討論 AEC 的處理流程,如果是 AECM 的處理流程 ANS 會前置),AGC 會作爲壓限器作用在接收端對即將播放的音頻信號進行限幅。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/7d/7d3d5077df36677730459d8bbf1e359f.png","alt":"圖 1 WebRTC 中音頻信號上下行處理流程框圖","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"那麼回聲是怎麼形成的呢?","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如圖 2 所示,A、B 兩人在通信的過程中,我們有如下定義:","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"x(n): 遠端參考信號,即 A 端訂閱的 B 端音頻流,通常作爲參考信號;","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"y(n): 回聲信號,即揚聲器播放信號 x(n) 後,被麥克風採集到的信號,此時經過房間混響以及麥克風採集的信號 y(n) 已經不能等同於信號 x(n) 了, 我們記線性疊加的部分爲 y'(n), 非線性疊加的部分爲 y''(n), y(n) = y'(n) + y''(n);","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"s(n): 麥克風採集的近端說話人的語音信號,即我們真正想提取併發送到遠端的信號;","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"v(n):環境噪音,這部分信號會在 ANS 中被削弱;","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"d(n): 近端信號,即麥克風採集之後,3A 之前的原始信號,可以表示爲:d(n) = s(n) + y(n) + v(n);","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"s'(n): 3A 之後的音頻信號,即準備經過編碼發送到對端的信號。","attrs":{}}]}],"attrs":{}}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"WebRTC 音頻引擎能夠拿到的已知信號只有近端信號 d(n) 和遠端參考信號 x(n)。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/e6/e64edb09258d9bf29f84eb17145f0db6.png","alt":"圖 2 回聲信號生成模型","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果信號經過 A 端音頻引擎得到 s'(n) 信號中依然殘留信號 y(n),那麼 B 端就能聽到自己回聲或殘留的尾音(回聲抑制不徹底留下的殘留)。AEC 效果評估在實際情況中可以粗略分爲如下幾種情況(專業人員可根據應用場景、設備以及單雙講進一步細分):","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/78/783c56714eda4c79e8f69449acac5fa2.jpeg","alt":"file","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"回聲消除的本質","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在解析 WebRTC AEC 架構之前,我們需要了解回聲消除的本質是什麼。音視頻通話過程中,聲音是傳達信息的主要途徑,因此從複雜的錄音信號中,通過信號處理的手段使得我們要傳遞的信息:高保真、低延時、清晰可懂是一直以來追求的目標。在我看來,回聲消除,噪聲抑制和聲源分離同屬於語音增強的範疇,如果把噪聲理解爲廣義的噪聲三者之間的關係如下圖:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ec/ec57fd925b373c8c67977f8811313027.png","alt":"圖 3 語音增強與回聲消除的關係","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"噪聲抑制需要準確估計出噪聲信號,其中平穩噪聲可以通過語音檢測判別有話端與無話端的狀態來動態更新噪聲信號,進而參與降噪,常用的手段是基於譜減法(即在原始信號的基礎上減去估計出來的噪聲所佔的成分)的一系列改進方法,其效果依賴於對噪聲信號估計的準確性。對於非平穩噪聲,目前用的較多的就是基於遞歸神經網絡的深度學習方法,很多 Windows 設備上都內置了基於多麥克風陣列的降噪的算法。效果上,爲了保證音質,噪聲抑制允許噪聲殘留,只要比原始信號信噪比高,噪且聽覺上失真無感知即可。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"單聲道的聲源分離技術起源於傳說中的雞尾酒會效應,是指人的一種聽力選擇能力,在這種情況下,注意力集中在某一個人的談話之中而忽略背景中其他的對話或噪音。該效應揭示了人類聽覺系統中令人驚奇的能力,即我們可以在噪聲中談話。科學家們一直在致力於用技術手段從單聲道錄音中分離出各種成分,一直以來的難點,隨着機器學習技術的應用,使得該技術慢慢變成了可能,但是較高的計算複雜度等原因,距離 RTC 這種低延時系統中的商用還是有一些距離。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"噪聲抑制與聲源分離都是單源輸入,只需要近端採集信號即可,傲嬌的回聲消除需要同時輸入近端信號與遠端參考信號。有同學會問已知了遠端參考信號,爲什麼不能用噪聲抑制方法處理呢,直接從頻域減掉遠端信號的頻譜不就可以了嗎?","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/79/797bf11b1ff5a3ae511d84d122b37118.png","alt":"file","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"上圖中第一行爲近端信號 s(n),已經混合了近端人聲和揚聲器播放出來的遠端信號,黃色框中已經標出對齊之後的遠端信號,其語音表達的內容一致,但是頻譜和幅度(明顯經過揚聲器放大之後聲音能量很高)均不一致,意思就是:參考的遠端信號與揚聲器播放出來的遠端信號已經是“貌合神離”了,與降噪的方法相結合也是不錯的思路,但是直接套用降噪的方法顯然會造成回聲殘留與雙講部分嚴重的抑制。接下來,我們來看看 WebRTC 科學家是怎麼做的吧。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"信號處理流程","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"WebRTC AEC 算法包含了延時調整策略,線性回聲估計,非線性回聲抑制 3 個部分。回聲消除本質上更像是音源分離,我們期望從混合的近端信號中消除不需要的遠端信號,保留近端人聲發送到遠端,但是 WebRTC 工程師們更傾向於將兩個人交流的過程理解爲一問一答的交替說話,存在遠近端同時連續說話的情況並不多(即保單講輕雙講)。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"因此只需要區分遠近端說話區域就可以通過一些手段消除絕大多數遠端回聲,至於雙講恢復能力 WebRTC AEC 算法提供了 {kAecNlpConservative, kAecNlpModerate, kAecNlpAggressive} 3 個模式,由低到高依次代表不同的抑制程度,遠近端信號處理流程如圖 4:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/08/088823cdb9237800a76210d5dca8fb38.png","alt":"圖 4 WebRTC AEC 算法結構框圖","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"NLMS 自適應算法(上圖中橙色部分)的運用旨在儘可能地消除信號 d(n) 中的線性部分回聲,而殘留的非線性回聲信號會在非線性濾波(上圖中紫色部分)部分中被消除,這兩個模塊是 Webrtc AEC 的核心模塊。模塊前後依賴,現實場景中遠端信號 x(n) 由揚聲器播放出來在被麥克風採集的過程中,同時包含了回聲 y(n) 與近端信號 x(n) 的線性疊加和非線性疊加:需要消除線性回聲的目的是爲了增大近端信號 X(ω) 與濾波結果 E(ω) 之間的差異,計算相干性時差異就越大(近端信號接近 1,而遠端信號部分越接近 0),更容易通過門限直接區分近端幀與遠端幀。非線性濾波部分中只需要根據檢測的幀類型,調節抑制係數,濾波消除回聲即可。下面我們結合實例分析這套架構中的線性部分與非線性分。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"線性濾波","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"線性回聲 y'(n) 可以理解爲是遠端參考信號 x(n) 經過房間衝擊響應之後的結果,線性濾波的本質也就是在估計一組濾波器使得 y'(n) 儘可能的等於 x(n),通過統計濾波器組的最大幅值位置 index 找到與之對齊遠端信號幀,該幀數據會參與相干性計算等後續模塊。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"需要注意的是,如果 index 在濾波器階數兩端瘋狂試探,只能說明當前給到線性部分的遠近端延時較小或過大,此時濾波器效果是不穩定的,需要藉助固定延時調整或大延時調整使 index 處於一個比較理想的位置。線性部分算法是可以看作是一個固定步長的 NLMS 算法,具體細節大家可以結合源碼走讀,本節重點講解線型濾波在整個框架中的作用。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"從個人理解來看,線性部分的目的就是最大程度的消除線性回聲,爲遠近端幀判別的時候,最大程度地保證了信號之間的相干值( 0~1 之間,值越大相干性越大)的可靠性。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們記消除線性回聲之後的信號爲估計的回聲信號 e(n),e(n) = s(n) + y''(n) + v(n),其中 y''(n) 爲非線性回聲信號,記 y'(n) 爲線性回聲,y(n) = y'(n) + y''(n)。相干性的計算 (Matlab代碼):","attrs":{}}]},{"type":"codeblock","attrs":{"lang":""},"content":[{"type":"text","text":"% WebRtcAec_UpdateCoherenceSpectra →_→ UpdateCoherenceSpectra\nSd = Sd * ptrGCoh(1) + abs(wined_fft_near) .* abs(wined_fft_near)*ptrGCoh(2);\nSe = Se * ptrGCoh(1) + abs(wined_fft_echo) .* abs(wined_fft_echo)*ptrGCoh(2);\nSx = Sx * ptrGCoh(1) + max(abs(wined_fft_far) .* abs(wined_fft_far),ones(N+1,1)*MinFarendPSD)*ptrGCoh(2);\nSde = Sde * ptrGCoh(1) + (wined_fft_near .* conj(wined_fft_echo)) *ptrGCoh(2);\nSxd = Sxd * ptrGCoh(1) + (wined_fft_near .* conj(wined_fft_far)) *ptrGCoh(2); \n\n% WebRtcAec_ComputeCoherence →_→ ComputeCoherence\ncohde = (abs(Sde).*abs(Sde))./(Sd.*Se + 1.0e-10);\ncohdx = (abs(Sxd).*abs(Sxd))./(Sx.*Sd + 1.0e-10);","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"兩個實驗","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(1)計算近端信號 d(n) 與遠端參考信號 x(n) 的相關性 cohdx,理論上遠端回聲信號的相干性應該更接近 0(爲了方便後續對比,WebRTC 做了反向處理: 1 - cohdx),如圖 5(a),第一行爲計算近端信號 d(n),第二行爲遠端參考信號 x(n),第三行爲二者相干性曲線: 1 - cohdx,會發現回聲部分相干值有明顯起伏,最大值有0.7,近端部分整體接近 1.0,但是有持續波動,如果想通過一條固定的門限去區分遠近端幀,會存在不同程度的誤判,反映到聽感上就是回聲(遠端判斷成近端)或丟字(近端判斷爲遠端)。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/f4/f4d3d96085ed7cdb3c7db6eb98b294ec.png","alt":" (a) 近端信號與遠端參考信號的相干性 ","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/fc/fcb361f1c74cc04b70aa01af81a40692.png","alt":" (b) 近端信號與估計的回聲信號的相干性","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"圖 5 信號的相干性","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(2)計算近端信號 d(n) 與估計的回聲信號 e(n) 的相干性,如圖 5(b),第二行爲估計的回聲信號 e(n),第三行爲二者相干性 cohde,很明顯近端的部分幾乎全部逼近 1.0,WebRTC 用比較嚴格的門限(>=0.98)即可將區分絕大部分近端幀,且誤判的概率比較小,WebRTC 工程師設置如此嚴格的門限想必是寧可犧牲一部分雙講效果,也不願意接受回聲殘留。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"從圖 5 可以體會到,線性濾波之後可以進一步凸顯遠端參考信號 x(n) 與估計的回聲信號 e(n) 的差異,從而提高遠近端幀狀態的判決的可靠性。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"存在的問題與改進","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"理想情況下,遠端信號從揚聲器播放出來沒有非線性失真,那麼 e(n) = s(n) + v(n),但實際情況下 e(n)與d(n) 很像,只是遠端區域有一些幅度上的變化,說明 WebRTC AEC 線性部分在這個 case 中表現不佳,如圖 6(a) 從頻譜看低頻段明顯削弱,但中高頻部分幾乎沒變。而利用變步長的雙濾波器結構的結果會非常明顯,如圖 6(b) 所示無論是時域波形和頻譜與近端信號 x(n) 都有很大差異,目前 aec3 和 speex 中都採用這種結構,可見 WebRTC AEC 中線性部分還有很大的優化空間。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/c1/c1686d83f7ea434549d497882c9c943f.png","alt":"(a) WebRTC AEC 線性部分輸出 ","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/7a/7aac11c6dda63d248b2c7471982fd948.png","alt":" (b) 改進的線性部分輸出","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"圖 6 近端信號與估計的回聲信號的對比","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"如何衡量改進的線性部分效果?","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這裏我們對比了現有的固定步長的 NLMS 和變步長的 NLMS,近端信號 d(n) 爲加混響的遠端參考信號 x(n) + 近端語音信號 s(n)。理論上 NLMS 在處理這種純線性疊加的信號時,可以不用非線性部分出馬,直接幹掉遠端回聲信號。圖 7(a) 第一行爲近端信號 d(n),第二列爲遠端參考信號 x(n),線性部分輸出結果,黃色框中爲遠端信號。WebRTC AEC 中採用固定步長的 NLMS 算法收斂較慢,有些許回聲殘留。但是變步長的 NLMS 收斂較快,回聲抑制相對好一些,如圖 7(b)。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/7e/7e4155d3ff9d987e8fdb78ce0c31d24a.png","alt":"(a)固定步長的 NLMS","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/15/1592661b7c46e39d43ab43714d1ac67c.png","alt":"(b) 變步長的 NLMS","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"圖 7 兩種 NLMS 算法的效果對比","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"線性濾波器參數設置","attrs":{}}]},{"type":"codeblock","attrs":{"lang":""},"content":[{"type":"text","text":"#define FRAME_LEN 80\n#define PART_LEN 64\nenum { kExtendedNumPartitions = 32 };\nstatic const int kNormalNumPartitions = 12;","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"FRAME","attrs":{}},{"type":"text","marks":[{"type":"italic","attrs":{}}],"text":"LEN 爲每次傳給音頻 3A 模塊的數據的長度,默認爲 80 個採樣點,由於 WebRTC AEC 採用了 128 點 FFT,內部拼幀邏輯會取出 PART","attrs":{}},{"type":"text","text":"LEN = 64 個樣本點與前一幀剩餘數據連接成128點做 FFT,剩餘的 16 點遺留到下一次,因此實際每次處理 PART_LEN 個樣本點(4ms 數據)。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"默認濾波器階數僅爲 kNormalNumPartitions = 12 個,能夠覆蓋的數據範圍爲 kNormalNumPartitions ","attrs":{}},{"type":"text","marks":[{"type":"italic","attrs":{}}],"text":" 4ms = 48ms,如果打開擴展濾波器模式(設置 extended_filter_enabled爲true),覆蓋數據範圍爲 kNormalNumPartitions ","attrs":{}},{"type":"text","text":" 4ms = 132ms。隨着芯片處理能力的提升,默認會打開這個擴展濾波器模式,甚至擴展爲更高的階數,以此來應對市面上絕大多數的移動設備。另外,線性濾波器雖然不具備調整延時的能力,但可以通過估計的 index 衡量當前信號的延時狀態,範圍爲 [0, kNormalNumPartitions],如果 index 處於作用域兩端,說明真實延時過小或過大,會影響線性回聲估計的效果,嚴重的會帶來回聲,此時需要結合固定延時與大延時檢測來修正。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"非線性濾波","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"非線性部分一共做了兩件事,就是想盡千方百計幹掉遠端信號。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(1) 根據線性部分提供的估計的回聲信號,計算信號間的相干性,判別遠近端幀狀態。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(2) 調整抑制係數,計算非線性濾波參數。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"非線性濾波抑制係數爲 hNl,大致表徵着估計的回聲信號 e(n) 中,期望的近端成分與殘留的非線性回聲信號 y''(n) 在不同頻帶上的能量比,hNl 是與相干值是一致的,範圍是 [0,1.0],通過圖 5(b) 可以看出需要消除的遠端部分幅度值也普遍在 0.5 左右,如果直接使用 hNl 濾波會導致大量的回聲殘留。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"因此 WebRTC 工程師對 hNl 做了如下尺度變換,over","attrs":{}},{"type":"text","marks":[{"type":"italic","attrs":{}}],"text":"drive 與 nlp","attrs":{}},{"type":"text","text":"mode 相關,代表不同的抑制激進程度,drive","attrs":{}},{"type":"text","marks":[{"type":"italic","attrs":{}}],"text":"curve 是一條單調遞增的凸曲線,範圍 [1.0, 2.0]。由於中高頻的尾音在聽感上比較明顯,所以他們設計了這樣的抑制曲線來抑制高頻尾音。我們記尺度變換的 α = over","attrs":{}},{"type":"text","text":"drive","attrs":{}},{"type":"text","marks":[{"type":"italic","attrs":{}}],"text":"scaling * drive","attrs":{}},{"type":"text","text":"curve,如果設置 nlp_mode = kAecNlpAggressive,α 大約會在 30 左右。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":""},"content":[{"type":"text","text":"% matlab代碼如下:\nover_drive = min_override(nlp_mode+1);\nif (over_drive < over_drive_scaling)\n over_drive_scaling = 0.99*over_drive_scaling + 0.01*over_drive; % default 0.99 0.01\nelse\n over_drive_scaling = 0.9*over_drive_scaling + 0.1*over_drive; % default 0.9 0.1\nend\n\n% WebRtcAec_Overdrive →_→ Overdrive\nhNl(index) = weight_curve(index).*hNlFb + (1-weight_curve(index)).* hNl(index);\nhNl = hNl.^(over_drive_scaling * drive_curve);\n\n% WebRtcAec_Suppress →_→ Suppress\nwined_fft_echo = wined_fft_echo .*hNl;\nwined_fft_echo = conj(wined_fft_echo);","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果當前幀爲近端幀(即 echo_state = false),假設第 k 個頻帶 hNl(k) = 0.99994 ,hNl(k) = hNl(k)^α = 0.99994 ^ 30 = 0.9982,即使濾波後的損失聽感上幾乎無感知。如圖 8(a),hNl 經過 α 調製之後,幅值依然很接近 1.0。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果當前幀爲遠端幀(即 echo_state = true),假設第 k 個頻帶 hNl(k) = 0.6676 ,hNl(k) = hNl(k)^α = 0.6676 ^ 30 = 5.4386e-06,濾波後遠端能量小到基本聽不到了。如圖 8(b),hNl 經過 α 調製之後,基本接近 0。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/0a/0a07fe85a09ae07ffc678fceefb6487b.png","alt":"(a)近端幀對應的抑制係數","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/c9/c9f636bf4d1e37272826fc3662bf0c13.png","alt":"(b)遠端幀對應的抑制係數","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"圖 8 遠近端信號抑制係數在調製前後的變化","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"經過如上對比,爲了保證經過調製之後近端期望信號失真最小,遠端回聲可以被抑制到不可聽,WebRTC AEC 纔在遠近端幀狀態判斷的的模塊中設置瞭如此嚴格的門限。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"另外,調整係數 α 過於嚴格的情況下會帶來雙講的抑制,如圖 9 第 1 行,近端說話人聲音明顯丟失,通過調整 α 後得以恢復,如第 2 行所示。因此如果在 WebRTC AEC 現有策略上優化 α 估計,可以緩解雙講抑制嚴重的問題。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ed/ed19360718924c54812db977393e5ae3.png","alt":"圖 9 雙講效果","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"延時調整策略","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"回聲消除的效果與遠近端數據延時強相關,調整不當會帶來算法不可用的風險。在遠近端數據進入線性部分之前,一定要保證延時在設計的濾波器階數範圍內,不然延時過大超出了線性濾波器估計的範圍或調整過當導致遠近端非因果都會造成無法收斂的回聲。先科普兩個問題:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"(1)爲什麼會存在延時?","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"首先近端信號 d(n) 中的回聲是揚聲器播放遠端參考 x(n),又被麥克風採集到的形成的,也就意味着在近端數據還未採集進來之前,遠端數據緩衝區中已經躺着 N 幀 x(n)了,這個天然的延時可以約等於音頻信號從準備渲染到被麥克風採集到的時間,不同設備這個延時是不等的。蘋果設備延時較小,基本在 120ms 左右,Android 設備普遍在 200ms 左右,低端機型上會有 300ms 左右甚至以上。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"(2)遠近端非因果爲什麼會導致回聲?","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"從(1)中可以認爲,正常情況下當前幀近端信號爲了找到與之對齊的遠端信號,必須在遠端緩衝區沿着寫指針向前查找。如果此時設備採集丟數據,遠端數據會迅速消耗,導致新來的近端幀在向前查找時,已經找不到與之對齊的遠端參考幀了,會導致後續各模塊工作異常。如圖 10(a) 表示正常延時情況,(b) 表示非因果。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/79/79835127af515c2d5cda6a1d6dd7d642.png","alt":"(a)遠近端正常延時","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/02/0226c2c99cd8b58a2059bb0568cdf203.png","alt":"(b)遠近端非因果","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"圖10 正常遠近端延時與非因果","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"WebRTC AEC 中的延時調整策略關鍵而且複雜,涉及到固定延時調整,大延時檢測,以及線性濾波器延時估計。三者的關係如下:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"① 固定延時調整隻會發生在開始 AEC 算法開始處理之前,而且僅調整一次。如會議盒子等固定的硬件設備延時基本是固定的,可以通過直接減去固定的延時的方法縮小延時估計範圍,使之快速來到濾波器覆蓋的延時範圍之內。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"下面結合代碼來看看固定延時的調整過程:","attrs":{}}]},{"type":"codeblock","attrs":{"lang":""},"content":[{"type":"text","text":"int32_t WebRtcAec_Process(void* aecInst,\nconst float* const* nearend,\nsize_t num_bands,\nfloat* const* out,\nsize_t nrOfSamples,\nint16_t reported_delay_ms,\nint32_t skew);","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"WebRtcAec","attrs":{}},{"type":"text","marks":[{"type":"italic","attrs":{}}],"text":"Process 接口如上,參數 reported","attrs":{}},{"type":"text","text":"delay_ms 爲當前設備需要調整延時的目標值。如某 Android 設備固定延時爲 400ms 左右,400ms 已經超出濾波器覆蓋的延時範圍,至少需要調整 300ms 延時,才能滿足回聲消除沒有回聲的要求。固定延時調整在 WebRTC AEC 算法開始之初僅作用一次: ","attrs":{}}]},{"type":"codeblock","attrs":{"lang":""},"content":[{"type":"text","text":"if (self->startup_phase) {\n int startup_size_ms = reported_delay_ms < kFixedDelayMs ? kFixedDelayMs : reported_delay_ms;\n int target_delay = startup_size_ms * self->rate_factor * 8;\n int overhead_elements = (WebRtcAec_system_delay_aliyun(self->aec) - target_delay) / PART_LEN;\n printf(\"[audio] target_delay = %d, startup_size_ms = %d, self->rate_factor = %d, sysdelay = %d, overhead_elements = %d\\n\", target_delay, startup_size_ms, self->rate_factor, WebRtcAec_system_delay(self->aec), overhead_elements);\n WebRtcAec_AdjustFarendBufferSizeAndSystemDelay_aliyun(self->aec, overhead_elements);\nself->startup_phase = 0;\n }","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"爲什麼 target_delay 是這麼計算?","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"int target","attrs":{}},{"type":"text","marks":[{"type":"italic","attrs":{}}],"text":"delay = startup","attrs":{}},{"type":"text","text":"size_ms ","attrs":{}},{"type":"text","marks":[{"type":"italic","attrs":{}}],"text":" self->rate_factor ","attrs":{}},{"type":"text","text":" 8;","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"startup","attrs":{}},{"type":"text","marks":[{"type":"italic","attrs":{}}],"text":"size","attrs":{}},{"type":"text","text":"ms 其實就是設置下去的 reported","attrs":{}},{"type":"text","marks":[{"type":"italic","attrs":{}}],"text":"delay","attrs":{}},{"type":"text","text":"ms,這一步將計算時間毫秒轉化爲樣本點數。16000hz 採樣中,10ms 表示 160 個樣本點,因此 target","attrs":{}},{"type":"text","marks":[{"type":"italic","attrs":{}}],"text":"delay 實際就是需要調整的目標樣本點數(aecpc->rate","attrs":{}},{"type":"text","text":"factor = aecpc->splitSampFreq / 8000 = 2)。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們用 330ms 延時的數據測試:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果設置默認延時爲 240ms,overhead_elements 第一次被調整了 -60 個 block,負值表示向前查找,正好爲 60 ","attrs":{}},{"type":"text","marks":[{"type":"italic","attrs":{}}],"text":" 4 = 240ms,之後線性濾波器固定 index = 24,表示 24 ","attrs":{}},{"type":"text","text":" 4 = 96ms 延時,二者之和約等於 330ms。日誌打印如下:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/eb/eb4cfb33c2a99efd001f0527ebca7276.png","alt":"file","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"② 大延時檢測是基於遠近端數據相似性在遠端大緩存中查找最相似的幀的過程,其算法原理有點類似音頻指紋中特徵匹配的思想。大延時調整的能力是對固定延時調整與線型濾波器能力的補充,使用它的時候需要比較慎重,需要控制調整的頻率,以及控制造成非因果的風險。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"WebRTC AEC 算法中開闢了可存儲 250 個 block 大緩衝區,每個 block 的長度 PART_LEN = 64 個樣本點,能夠保存最新的 1s 的數據,這也是理論上的大延時能夠估計的範圍,絕對夠用了。","attrs":{}}]},{"type":"codeblock","attrs":{"lang":""},"content":[{"type":"text","text":"static const size_t kBufferSizeBlocks = 250;\nbuffer_ = WebRtc_CreateBuffer(kBufferSizeBlocks, sizeof(float) * PART_LEN);\naec->delay_agnostic_enabled = 1;","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們用 610ms 延時的數據測試(啓用大延時調整需要設置 delay","attrs":{}},{"type":"text","marks":[{"type":"italic","attrs":{}}],"text":"agnostic","attrs":{}},{"type":"text","text":"enabled = 1):","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們還是設置默認延時爲 240ms,剛開始還是調整了 -60 個 block,隨後大延時調整接入之後有調整了 -88 個 block,一共調整(60 + 88) * 4 = 592ms,之後線性濾波器固定 index = 4,表示最後剩餘延時剩餘 16ms,符合預期。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/87/870d3dbec32f9fc9622ae27cd0f24383.png","alt":"file","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/88/8834d691499e93c6bf3d2ab3530d7cd0.png","alt":"file","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"③ 線性濾波器延時估計是固定延時調整和大延時調整之後,濾波器對當前遠近端延時的最直接反饋。前兩者調整不當會造成延時過小甚至非因果,或延時過大超出濾波器覆蓋能力,導致無法收斂的回聲。因此前兩者在調整的過程中需要結合濾波器的能力,確保剩餘延時在濾波器能夠覆蓋的範圍之內,即使延時小範圍抖動,線性部分也能自適應調整。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"總結與優化方向","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"WebRTC AEC 存在的問題:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(1)線性部分收斂時間較慢,固定步長的 NLMS 算法對線性部分回聲的估計欠佳;","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(2)線性部分濾波器階數默認爲 32 階,默認覆蓋延時 132ms,對移動端延時較大設備支持不是很好,大延時檢測部分介入較慢,且存在誤調導致非因果回聲的風險;","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(3)基於相干性的幀狀態依賴嚴格的固定門限,存在一定程度的誤判,如果再去指導非線性部分抑制係數的調節,會帶來比較嚴重的雙講抑制。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"優化的方向:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(1)算法上可以通過學習 speex 和 AEC3 的線性部分,改善當前線性濾波效果;","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(2)算法上可以優化延時調整策略,工程上可以新增參數配置下發等工程手段解決一些設備的延時問題;","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(3)另外,有一些新的思路也是值得我們嘗試的,如開頭提到的,既然回聲也可以是視爲噪聲,那麼能否用降噪的思路做回聲消除呢,答案是可以的。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"「視頻雲技術」你最值得關注的音視頻技術公衆號,每週推送來自阿里雲一線的實踐技術文章,在這裏與音視頻領域一流工程師交流切磋。","attrs":{}}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章