Behind WebRTC's High Audio Quality and Low Latency: AGC (Automatic Gain Control)

{"type":"doc","content":[{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"前面我們介紹了 WebRTC 音頻 3A 中的","attrs":{}},{"type":"link","attrs":{"href":"https://mp.weixin.qq.com/s/iq6EWCQHoYTtAwZBzs8tYA","title":"","type":null},"content":[{"type":"text","text":"聲學回聲消除(AEC:Acoustic Echo Cancellation)","attrs":{}}]},{"type":"text","text":"的基本原理與優化方向,這一章我們接着聊另外一個 \"A\" -- 自動增益控制(AGC:Auto Gain Control)。本文將結合實例全面解析 WebRTC AGC 的基本框架,一起探索其基本原理、模式的差異、存在的問題以及優化方向。","attrs":{}}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"作者|珞神 ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"審校|泰一","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"前言","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"自動增益控制(AGC:Auto Gain Control)","attrs":{}},{"type":"text","text":"是我認爲鏈路最長,最影響音質和主觀聽感的音頻算法模塊,一方面是 AGC 必須作用於發送端來應對移動端與 PC 端多樣的採集設備,另一方面 AGC 也常被作爲壓限器作用於接收端,均衡混音信號防止爆音。設備的多樣性最直接的體現就是音頻採集的差異,一般表現爲音量過大導致爆音,採集音量過小對端聽起來很喫力。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在音視頻通話的現實場景中,不同的參會人說話音量各有不同,參會用戶需要頻繁的調整播放音量來滿足聽感的需要,戴耳機的用戶隨時承受着大音量對耳朵的 “暴擊”。因此,對發送端音量的均衡在上述場景中顯得尤爲重要,優秀的自動增益控制算法能夠統一音頻音量大小,極大地緩解了由設備採集差異、說話人音量大小、距離遠近等因素導致的音量的差異。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"AGC 在 WebRTC 
where it sits","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Before discussing the AGC audio processing framework, let us look at where AGC sits in real-time audio and video communication. Figure 1 shows, for a single device, the path of audio data from capture to encoding when it acts as the sender, and from decoding to playback when it acts as the receiver. On the sending side, AGC acts as both an equalizer and a limiter to adjust the published level; on the receiving side, it acts only as a limiter to keep the mixed playback audio from clipping. In theory, once the sender-side AGC is robust enough, using the receiver side only as a limiter is sufficient, although some vendors run AGC again there to further reduce level differences between voices after mixing.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/d3/d361de3dbb49607b4e2605f8e008884a.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Figure 1. Block diagram of uplink and downlink audio signal processing in WebRTC","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Core AGC parameters","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"First, a quick primer on the relationship between the sample amplitude ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Sample","attrs":{}},{"type":"text","text":" and decibels ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"dB","attrs":{}},{"type":"text","text":". For 16-bit quantized audio samples: ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"dB = 20 * log10(Sample / 32768.0)","attrs":{}},{"type":"text","text":", which matches the scale on the right-hand vertical axis in Adobe Audition. In amplitude terms, a 16-bit sample has a minimum magnitude of 0 and a maximum magnitude of 
32768 (","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"amplitude is shown on the right-hand vertical axis of the figure below","attrs":{}},{"type":"text","text":").","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ec/ec0e726ba8c48fec3ac7af63fb467269.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In decibel terms, the maximum is 0 dB (","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"decibel values are shown on the right-hand vertical axis of the figure below","attrs":{}},{"type":"text","text":"). A level of -3 dB is already fairly loud, and 3 is a common setting for the AGC target level.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/f0/f07aa38a74b2299a6a3b156fea4d7c40.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The core parameters are:","attrs":{}}]},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"typedef struct {\n int16_t targetLevelDbfs; // target level\n int16_t compressionGaindB; // gain capability\n uint8_t limiterEnable; // limiter switch\n} AliyunAgcConfig;\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Target level - targetLevelDbfs","attrs":{}},{"type":"text","text":": the target value for the equalized output level; for example, a setting of 1 means the target output level is - 
1dB;","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Gain capability - compressionGaindB","attrs":{}},{"type":"text","text":": the maximum gain that can be applied to the audio; for example, a setting of 12 dB allows a boost of up to 12 dB;","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Limiter switch - limiterEnable","attrs":{}},{"type":"text","text":": usually used together with targetLevelDbfs. compressionGaindB sets the gain range for quiet input, while the limiter restricts whatever exceeds targetLevelDbfs to avoid clipping.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Core AGC modes","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Besides the three core parameters above, WebRTC AGC provides the following three modes for different capture devices:","attrs":{}}]},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"enum {\n kAgcModeUnchanged,\n kAgcModeAdaptiveAnalog, // adaptive analog mode\n kAgcModeAdaptiveDigital, // adaptive digital gain mode\n kAgcModeFixedDigital // fixed digital gain mode\n};\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Below we use examples to describe these three modes in terms of basic function, applicable scenarios, signal flow, and known problems.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Fixed digital gain - FixedDigital","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Fixed digital gain is the most basic gain mode and the core of AGC; the other two modes are extensions of it. It amplifies the signal with a fixed gain, and the maximum gain never exceeds the configured gain capability ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"compressionGaindB","attrs":{}},{"type":"text","text":"; when combined with the 
","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"limiter","attrs":{}},{"type":"text","text":", the output is capped at the configured target level ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"targetLevelDbfs","attrs":{}},{"type":"text","text":".","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In fixed digital gain mode, only the core function ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"WebRtcAgc_ProcessDigital","attrs":{}},{"type":"text","text":" equalizes the input level. Because there is no feedback mechanism, the signal processing flow is very simple; once the parameters are set, the signal goes through the following flow:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/a9/a9ca1bd7804bade0dd9fdadebc3887f1.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Fixed digital gain is the core mode, and two aspects of it deserve a closer look:","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"The basic idea behind the voice detection module WebRtcAgc_ProcessVad","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In real-time communication, the near-end signal captured by the microphone contains components of the far-end signal. The flow therefore first analyzes the far-end signal with WebRtcAgc_ProcessVad, so that far-end interference can be excluded when tracking the actual near-end envelope and residual echo does not skew statistics such as the near-end envelope. A traditional VAD separates speech from silence using metrics such as energy, zero-crossing rate, and a noise threshold, whereas WebRTC AGC 
takes a different approach to roughly identifying speech segments:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"Compute the short-term mean and variance, which describe the instantaneous changes of the speech envelope and track it accurately, as shown by ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"the red curve in Figure 2 (left)","attrs":{}},{"type":"text","text":";","attrs":{}}]}]}]},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"// update short-term estimate of mean energy level (Q10)\ntmp32 = state->meanShortTerm * 15 + dB;\nstate->meanShortTerm = (int16_t)(tmp32 >> 4);\n \n// update short-term estimate of variance in energy level (Q8)\ntmp32 = (dB * dB) >> 12;\ntmp32 += state->varianceShortTerm * 15;\nstate->varianceShortTerm = tmp32 / 16;\n \n// update short-term estimate of standard deviation in energy level (Q10)\ntmp32 = state->meanShortTerm * state->meanShortTerm;\ntmp32 = (state->varianceShortTerm << 12) - tmp32;\nstate->stdShortTerm = (int16_t)WebRtcSpl_Sqrt(tmp32);\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":2,"normalizeStart":2},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"Compute the long-term mean and variance, which describe the slow overall trend of the signal and trace its “center-of-gravity line”; this curve is smooth, which makes it suitable for threshold-based detection, as shown by ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"the blue curve in Figure 2 (left)","attrs":{}},{"type":"text","text":";","attrs":{}}]}]}]},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"// update long-term estimate of mean energy level (Q10)\ntmp32 = state->meanLongTerm * state->counter + dB;\nstate->meanLongTerm = WebRtcSpl_DivW32W16ResW16(tmp32, WebRtcSpl_AddSatW16(state->counter, 1));\n// 
update long-term estimate of variance in energy level (Q8)\ntmp32 = (dB * dB) >> 12;\ntmp32 += state->varianceLongTerm * state->counter;\nstate->varianceLongTerm = WebRtcSpl_DivW32W16(tmp32, WebRtcSpl_AddSatW16(state->counter, 1));\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":3,"normalizeStart":3},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"Compute the ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"standard score","attrs":{}},{"type":"text","text":", which describes how far the short-term mean deviates from the “center-of-gravity line”; the portions above the center are very likely to contain speech activity;","attrs":{}}]}]}]},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"tmp32 = tmp16 * (int16_t)(dB - state->meanLongTerm);\ntmp32 = WebRtcSpl_DivW32W16(tmp32, state->stdLongTerm);\nstate->logRatio = (int16_t)(tmp32 >> 6);\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/91/913d1a2d87d8f646b78c878481c86257.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Figure 2. Left: long-term and short-term mean and variance. Right: input and VAD detection threshold","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"How WebRtcAgc_ProcessDigital applies gain to audio data","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"All three core parameters revolve around fixed digital gain mode, so what we need to understand is how the core WebRTC AGC function ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"WebRtcAgc_ProcessDigital","attrs":{}},{"type":"text","text":" 
applies gain to the audio data.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"Compute the gain table gainTable from the specified targetLevelDbfs and compressionGaindB;","attrs":{}}]}]}]},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"/* Compute the gain table gainTable from the configured target level and gain capability */\nif (WebRtcAgc_CalculateGainTable(&(stt->digitalAgc.gainTable[0]), stt->compressionGaindB, stt->targetLevelDbfs, stt->limiterEnable, stt->analogTarget) == -1) {\n return -1;\n }\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In this step, the gain table gainTable can be understood as a quantization over the signal energy (the square of the amplitude). With targetLevelDbfs fixed and compressionGaindB set to values from 3 dB to 15 dB, the corresponding gain table curves are shown in the figure below: the larger the gain capability, the higher the curve.","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ec/eca1e69e826b067fab670fd629c04ad5.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"You may wonder why the gain table ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"gainTable","attrs":{}},{"type":"text","text":" has only 32 entries. The 32 corresponds to the 32 bits of an int (the energy of a short sample lies in [0, 32768^2], which fits in an unsigned int). Reading from the high bit down, the highest set bit carries the largest order of magnitude and is called the integer part, intpart; the bits after it form the fractional part, fracpart. Every value in [0, 32768] therefore maps to a gain value in the digital gain table. Next we look at how the table is looked up and the gain applied to equalize the level.","attrs":{}}]},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"/** Key source excerpts */\n/** 
Extract the integer and fractional parts */\nintPart = (uint16_t)(absInLevel >> 14); // extract the integral part\nfracPart = (uint16_t)(absInLevel & 0x00003FFF); // extract the fractional part\n......\n/** Build the digital gain table from the integer and fractional parts */\ngainTable[i] = (1 << intPart) + WEBRTC_SPL_SHIFT_W32(fracPart, intPart - 14);\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":2,"normalizeStart":2},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"Look up the gain value in gainTable based on the input signal envelope, and apply the gain to the input signal;","attrs":{}}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Based on the hearing curve of the human ear, AGC applies gain piecewise: a frame of 160 samples is divided into 10 segments of 16 samples each, which is why a per-segment gain array gains is introduced. The code below shows the relationship between the digital gain table and the gain array and directly reflects the lookup process. The idea is similar to computing the gain table: first compute the integer and fractional parts, then combine gain table entries into a new gain value, which includes the fractional compensation.","attrs":{}}]},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"// Translate signal level into gain, using a piecewise linear approximation\n // find number of leading zeros\n zeros = WebRtcSpl_NormU32((uint32_t)cur_level);\n if (cur_level == 0) {\n zeros = 31;\n }\n tmp32 = (cur_level << zeros) & 0x7FFFFFFF;\n frac = (int16_t)(tmp32 >> 19); // Q12.\n tmp32 = (stt->gainTable[zeros - 1] - stt->gainTable[zeros]) * frac;\n gains[k + 1] = stt->gainTable[zeros] + (tmp32 >> 12);\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The code below takes the per-segment gain array gains and right-shifts by 16 bits to obtain the actual gain value (","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"the gain table and gain array were both computed from sample energy; the 16-bit right shift can be understood as finding an integer α such that the sample amplitude multiplied by α comes closest to 
32768","attrs":{}},{"type":"text","text":"), and multiplies it directly into the output signal (the input signal was already copied into the output buffer at the start of the function).","attrs":{}}]},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"/** Apply the gain array gains to the output signal to complete level equalization */\n for (k = 1; k < 10; k++) {\n delta = (gains[k + 1] - gains[k]) * (1 << (4 - L2));\n gain32 = gains[k] * (1 << 4);\n // iterate over samples\n for (n = 0; n < L; n++) {\n for (i = 0; i < num_bands; ++i) {\n tmp32 = out[i][k * L + n] * (gain32 >> 4);\n out[i][k * L + n] = (int16_t)(tmp32 >> 16);\n }\n gain32 += delta;\n }\n }\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Take the curve for compressionGaindB = 12 dB as an example. The upper plot shows the actual values of the computed digital gain table gainTable, and the lower plot shows the actual gain factor after the 16-bit right shift. With compressionGaindB = 12 dB, the maximum integer-part gain is 3. A 12 dB gain theoretically amplifies by about 4 times; here the integer part can contribute at most 3 times, and the fractional part supplies the remaining 0~1.0, which helps prevent clipping. Two simple examples:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/10/102f9374a5b6f18d4402fd9549c838be.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A. For a sample with amplitude 8000, the envelope is cur_level = 8000^2 = 0x3D09000. WebRtcSpl_NormU32((uint32_t)cur_level) yields 6 leading zeros, and the table lookup gives an integer-part gain of stt->gainTable[6] = 3, so 8000 can safely be multiplied by 3; the remaining gain below 1.0 is determined by fracpart;","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"B. 
For a sample with amplitude 16000, the envelope is cur_level = 16000^2 = 0xF424000. WebRtcSpl_NormU32((uint32_t)cur_level) yields 4 leading zeros, and the table lookup gives an integer-part gain of stt->gainTable[4] = 2. Note that 16000 * 2 = 32000; from there, the limiter governs how the level is brought to the target, which we do not detail here.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"In short, for any value in [0, 32768] that should be boosted by the specified number of decibels without the result exceeding 32768, there is a definite entry in the digital gain table gainTable that satisfies the requirement.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"How the target level ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"targetLevelDbfs","attrs":{}},{"type":"text","text":" and the ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Limiter","attrs":{}},{"type":"text","text":" are applied can be seen in WebRtcAgc_ProcessDigital and related functions; we do not elaborate here, and reading the source is the best way to dig deeper.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Below we use a few cases to look at the results and problems of fixed digital gain mode, with the settings fixed at ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"targetLevelDbfs = 1","attrs":{}},{"type":"text","text":", compressionGaindB = 12.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"1. 
Capture level is low, and equalization brings limited improvement;","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The device captures at -24 dB, and after equalization the level is only -12 dB, so the overall loudness still sounds low;","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/76/76a86531e6b826b48ea1942e6c8fd6fb.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"2. Capture level is high, and the noise floor is noticeably amplified;","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The device captures at -9 dB, and after equalization the level reaches -1 dB. Overall loudness sounds normal, but the level variation between speech frames shrinks, mainly because the noise in silent segments is boosted considerably. The main problem here is that when the capture level is already high, the environment is noisy, and noise suppression is weak, a large ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"compressionGaindB","attrs":{}},{"type":"text","text":" setting means the speech portion is limited to 
","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"targetLevelDbfs","attrs":{}},{"type":"text","text":", while the noise floor in silent segments receives the full boost, so remote participants hear obvious noise.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/a0/a0aea9a7412af593d0be9de1be54a136.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"3. Capture level fluctuates widely (illustrated with audio manually spliced from loud to quiet), and equalization still cannot fix it;","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/fb/fb0c16e4150813ed92edc5faf292ee3d.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Adaptive analog gain - AdaptiveAnalog","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Before discussing adaptive analog gain, we need to be clear about which PC-side features affect the capture level:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"The PC side supports adjusting the capture volume in the range 0~1.0, which the WebRTC client code maps internally to 
0~255;","attrs":{}}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"/** Using macOS as an example: the microphone sensitivity is converted to 0~255 */\nint32_t AudioMixerManagerMac::MicrophoneVolume(uint32_t& volume) const {\n ......\n // vol 0.0 to 1.0 -> convert to 0 - 255\n volume = static_cast<uint32_t>(volFloat32 * 255 + 0.5);\n ......\n return 0;\n}\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":2,"normalizeStart":2},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"Most Windows laptops have a built-in microphone array with array enhancement algorithms that, in addition to noise reduction, provide an extra 0~10 dB of gain (the range varies by model; on Lenovo devices the gain goes up to 36 dB), as shown in Figure 3;","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":"none"},"content":[{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/06/068454d87698895248b5b1165147b0dd.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}}]},{"type":"listitem","attrs":{"listStyle":"none"},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"Figure 3. Left: analog gain adjustment on macOS. Right: built-in gain of the microphone array on Windows","attrs":{}}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"With so many modules controlling volume, the PC-side AGC 
algorithm becomes more sensitive. Many customers in production use unreasonable default settings, which directly affects the audio and video call experience:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"An excessive capture level noticeably boosts noise and clips the voice;","attrs":{}}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/23/239b203f0e660c244a74fe817034013e.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":2,"normalizeStart":2},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"An excessive capture level causes significant nonlinear distortion when the playback signal is picked up again by the microphone, which is a real challenge for echo cancellation;","attrs":{}}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/99/99ea25bf0ae13b98739fe40838412361.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":3,"normalizeStart":3},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"When the capture level is too low, the limited digital gain capability leaves the remote side unable to hear clearly;","attrs":{}}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/
7d/7d1fc13727908ae21dc3b9db40247c27.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Most users who notice abnormal sound do not know that a PC also lets them adjust the capture gain manually, and relying on end users (in education scenarios, many are primary school students) to tune the analog gain themselves is nearly impossible. It is more practical to build dynamic analog gain adjustment into the AGC algorithm and have it work with the digital gain stage to bring the near-end signal to the desired level. The WebRTC developers therefore designed the adaptive analog gain mode, which adjusts the raw capture level through a feedback mechanism. The goal is to cooperate with the digital gain module, find the most suitable microphone gain, and feed it back to the device layer so that the near-end data reaches the target level after digital gain. The audio data flow is shown below:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/54/54ff131a5e930b4dd991cccf0d4ab64f.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"There are two main additions on top of fixed digital gain:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"After digital gain, an analog gain update module is added: ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"WebRtcAgc_ProcessAnalog","attrs":{}},{"type":"text","text":", which uses the current analog gain value ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"inMicLevel","attrs":{}},{"type":"text","text":" (mapped to a 0~255 scale in WebRTC) and other intermediate parameters to compute the next analog gain value 
","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"outMicLevel","attrs":{}},{"type":"text","text":", and feeds it back to the device layer.","attrs":{}}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"// Scale from VoE to ADM level range.\nuint32_t new_voe_mic_level = shared_->transmit_mixer()->CaptureLevel();\nif (new_voe_mic_level != voe_mic_level) {\n // Return the new volume if AGC has changed the volume.\n new_mic_volume = static_cast<int>((new_voe_mic_level * max_volume + static_cast<int>(kMaxVolumeLevel / 2)) / kMaxVolumeLevel);\n return new_mic_volume;\n}\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":2,"normalizeStart":2},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"Some vendors ship microphone arrays with low default settings, so the capture level stays low even with the analog gain at maximum. This is where the digital gain compensation stage helps: ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"WebRtcAgc_AddMic","attrs":{}},{"type":"text","text":", which can amplify the raw capture by a further factor of ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"1.0~3.16","attrs":{}},{"type":"text","text":", as shown in Figure 4. So how is insufficient gain detected? The final output of the analog gain update module in the previous step is actually the smaller of ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"micVol","attrs":{}},{"type":"text","text":" and the maximum ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"maxAnalog (255)","attrs":{}},{"type":"text","text":":","attrs":{}}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"*outMicLevel = WEBRTC_SPL_MIN(stt->micVol, stt->maxAnalog) >> 
stt->scale;\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In other words, the value micVol computed by these rules can exceed the specified maximum maxAnalog, which means that even the maximum analog gain cannot reach the target level. WebRtcAgc_AddMic watches for this condition and provides extra compensation via a table lookup.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The gain table kGainTableAnalog:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"static const uint16_t kGainTableAnalog[GAIN_TBL_LEN] = {\n 4096, 4251, 4412, 4579, 4752, 4932, 5118, 5312, 5513, 5722, 5938,\n 6163, 6396, 6638, 6889, 7150, 7420, 7701, 7992, 8295, 8609, 8934,\n 9273, 9623, 9987, 10365, 10758, 11165, 11587, 12025, 12480, 12953};\n// apply gain\nsample = (in_mic[j][i] * gain) >> 12; // after the right shift, the table entries correspond to gain factors of 1.0~3.16 (4096 >> 12 = 1.0, 12953 / 4096 = 3.16).\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/fc/fce0b9e80d1431c8cc88e243691d077f.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Figure 4. Gain curve of the gain table","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The compensation moves through the table in fixed steps of 1 index each time; gainTableIdx = 0 corresponds to a gain of exactly 1 
倍,即什麼也不做。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"/* Increment through the table towards the target gain.\n * If micVol drops below maxAnalog, we allow the gain\n * to be dropped immediately. */\nif (stt->gainTableIdx < targetGainIdx) {\n stt->gainTableIdx++;\n} else if (stt->gainTableIdx > targetGainIdx) {\n stt->gainTableIdx--;\n}\ngain = kGainTableAnalog[stt->gainTableIdx];\n// apply gain\nsample = (in_mic[j][i] * gain) >> 12;\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"存在的問題:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"無語音狀態下的模擬值上調行爲;","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":"none"},"content":[{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/50/50ed022ae64f3804bc6a7e2cc92c0d6c.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"調整幅度過大,造成明顯的聲音起伏;","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":"none"},"content":[{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/65/6572e12827ab6db25d51bf52a97541b7.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"or
igin":null},"content":[{"type":"text","text":"頻繁調用操作系統 API,帶來不必要的性能消耗,嚴重的會導致線程阻塞;","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":4,"align":null,"origin":null},"content":[{"type":"text","text":"數字部分增益能力有限,無法與模擬增益形成互補;","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":5,"align":null,"origin":null},"content":[{"type":"text","text":"爆音檢測不是很敏感,不能及時下調模擬增益;","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":6,"align":null,"origin":null},"content":[{"type":"text","text":"AddMic 模塊精度不夠,補償過程中存在爆音的風險。","attrs":{}}]}]}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"自適應數字增益 - AdaptiveDigital","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"基於音頻視頻通信的娛樂、社交、在線教育等領域離不開多種多樣的智能手機和平板設備,然而這些移動端並沒有類似 PC 端調節模擬增益的接口。聲源與設備的距離,聲源音量以及硬件採集能力等因素都會影響採集音量,單純依賴固定數字增益效果十分有限,尤其是多人會議的時候會明顯感受到不同說話人的音量並不一致,聽感上音量起伏較大。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"爲了解決這個問題,WebRTC 科學家仿照了 PC 端模擬增益調節的能力,基於模擬增益框架新增了虛擬麥克風調節模塊:WebRtcAgc_VirtualMic,利用兩個長度爲 128 的數組:增益曲線 - kGainTableVirtualMic 和抑制曲線 - kSuppressionTableVirtualMic 來模擬 PC 端模擬增益(增益部分爲單調遞增的直線,抑制部分爲單調遞減的凹曲線),前者提供 1.0~3.0 倍的增益能力,後者提供 1.0~0.1 
的下壓能力。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/1e/1e4190fdd88dbeaaaeba4665972dba16.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"圖 5 增益曲線與抑制曲線","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"核心邏輯與自適應模擬增益一致。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"與自適應模擬增益模式一樣,依然利用 WebRtcAgc_ProcessAnalog 更新 micVol;","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"根據 micVol 在 WebRtcAgc_VirtualMic 模塊中更新增益下標 gainIdx,並查表得到新的增益 gain;","attrs":{}}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"/* 設置期望的音量水平 */\n gainIdx = stt->micVol;\n if (gainIdx > 127) {\n gain = kGainTableVirtualMic[gainIdx - 128];\n } else {\n gain = kSuppressionTableVirtualMic[127 - gainIdx];\n 
}\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":3,"normalizeStart":3},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"應用增益 gain,期間一旦檢測到飽和,會逐步遞減 gainIdx;","attrs":{}}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"/* 飽和檢測更新增益 */\nif (tmpFlt > 32767) {\n tmpFlt = 32767;\n gainIdx--;\n if (gainIdx >= 127) {\n gain = kGainTableVirtualMic[gainIdx - 127];\n } else {\n gain = kSuppressionTableVirtualMic[127 - gainIdx];\n }\n}\nif (tmpFlt < -32768) {\n tmpFlt = -32768;\n gainIdx--;\n if (gainIdx >= 127) {\n gain = kGainTableVirtualMic[gainIdx - 127];\n } else {\n gain = kSuppressionTableVirtualMic[127 - gainIdx];\n }\n}\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":4,"normalizeStart":4},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":4,"align":null,"origin":null},"content":[{"type":"text","text":"增益後的數據傳入 ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"WebRtcAgc_AddMic","attrs":{}},{"type":"text","text":",檢查 micVol 是否大於最大值 ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"maxAnalog","attrs":{}},{"type":"text","text":" 
決定是否需要激活額外的補償。","attrs":{}}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"音頻數據流框圖如下:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/1e/1ecb7c51d1a509e1666525c338b3b6ed.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"存在的問題與自適應模擬增益相似,這裏需要特別指出的問題是數字增益自適應調節靈敏度不高,當輸入音量起伏時容易出現塊狀拉昇或壓縮。用一個比較明顯的例子說明:遇到大音量時需要調用壓縮曲線,如果後面緊跟較小音量,會導致小音量被進一步壓縮,接着算法會調大增益;此時如果小音量後面又緊跟大音量,會導致大音量爆音,需要 limiter 參與壓限,從而造成音質失真。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/e4/e413db727659d8eed8780c6671a90756.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/44/4423991397c014fca7e9c6aed1ab4736.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"總結與優化方向","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"爲了更好的聽感體驗,AGC 
算法的目標就是忽略設備採集差異,依然能夠將推流端音頻音量均衡到理想位置,杜絕音量小、杜絕爆音、解決多人混音後不同人聲音量起伏等核心問題。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"針對上述章節提到的各個模式存在的問題,有如下幾點啓示:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"模擬增益調節,必須修復調節頻繁,步長過大等問題;","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"AddMic 部分精度不夠,可以提前預判,不要等到檢測到爆音再回調;","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"PC 端數字增益和模擬增益模塊上是相互獨立的,但是效果上應該是相互補償的;","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":4,"align":null,"origin":null},"content":[{"type":"text","text":"AGC 對音量的均衡不應該影響 MOS,不能因爲追求靈敏度放棄了 
MOS。","attrs":{}}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"另外,代碼中很多位運算初讀起來比較容易勸退,希望大家抓核心代碼,形成整體框架後多實踐,再吸收消化。","attrs":{}}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"最後,讓我們看看優化後的效果:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"模擬增益調節之後,採集的音頻信號音量存在起伏,經過數字部分均衡後音頻包絡保持較好,音量整體一致;","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":"none"},"content":[{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/5c/5c219b78d8139d53fcfa92aa7861594d.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"語音和環境中的雜音,經過 AGC 之後語音部分音量起伏減小,雜音部分未見明顯提升;","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":"none"},"content":[{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/2c/2cc15504eb9253b959b9614411335d30.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"一個比較極端的 case,小語音部分最大提升了 35dB,收斂時間保持在 10s 
以內。","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":"none"},"content":[{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/77/774733a17ac6e93b7d46e6842a9767b1.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":4,"align":null,"origin":null}}]}