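Starting with version 1.2, Speex supports voice activity detection (VAD) along with a set of other speech preprocessing features (noise suppression, echo cancellation, automatic gain control (AGC), a jitter buffer, resampling, and more). These are implemented in the libspeexdsp library.
Once I actually started using it, I ran into one pitfall after another!
First, I enabled noise suppression, AGC, and VAD, and the preprocessed audio played back with a crackling, electrical buzzing sound (hard to describe; see the figure).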
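Because I passed a 20 ms frame length as the frame size when initializing Speex, you can see in the figure above that the waveform jumps abruptly every 20 ms: each glitch starts exactly on a 20 ms boundary and lasts about 1.5 ms. Note, however, that the frame_size argument of speex_preprocess_state_init is a number of samples, not milliseconds (the speexdsp documentation says it should correspond to 10 to 20 ms). Passing 20 at 16 kHz means each call to speex_preprocess_run processes only the first 20 samples (1.25 ms) of every 320-sample buffer, which matches the roughly 1.5 ms disturbance at each 20 ms boundary.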
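With noise suppression and AGC turned off, nothing changed; the output still looked the same as in the figure above.
Looking at preprocess.c in the speexdsp source, I found that speex_preprocess_state_init enables noise suppression by default. However, even after I explicitly turned it off with speex_preprocess_ctl, the result was still as shown above. And one comment inside speex_preprocess_run gave me a scare: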
/* If noise suppression is off, don't apply the gain (but then why call this in the first place!) */
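speexdsp's noise suppression is also just for show: with it enabled, the background noise is not reduced at all (and it adds its own crackling buzz on top).
speexdsp has another problem: it may classify pure background noise as speech. It feels as if it decides purely in the frequency domain, i.e. anything with high-frequency content is treated as a human voice.
The two issues above make the VAD feature completely unusable.
Finally, here is the code; if you are curious, try it yourself.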
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <speex/speex_preprocess.h>

#define SAMPLE_RATE (16000)
#define FRAME_SIZE (20) // frame length in ms
#define SAMPLES_PER_FRAME (SAMPLE_RATE/1000 * FRAME_SIZE) // 16 samples per ms -> 320 samples per frame
#define FRAME_BYTES (SAMPLES_PER_FRAME * 2) // 2 bytes per sample (16-bit mono)

int main(void)
{
    size_t n = 0;
    FILE *inFile = fopen("/run/shm/rec_whp.raw", "rb");
    FILE *outFile = fopen("/run/shm/rec_spx2.raw", "wb");
    if (inFile == NULL || outFile == NULL) {
        perror("fopen");
        return 1;
    }
    spx_int16_t *buf = malloc(FRAME_BYTES);
    if (buf == NULL) {
        perror("malloc");
        return 1;
    }
    // frame_size is a number of samples, not milliseconds
    SpeexPreprocessState *state = speex_preprocess_state_init(SAMPLES_PER_FRAME, SAMPLE_RATE);
    spx_int32_t denoise = 0;
    speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_DENOISE, &denoise); // disable noise suppression (on by default)
    //speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_NOISE_SUPPRESS, &noiseSuppress); // maximum noise attenuation in dB (negative spx_int32_t)
    //speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_AGC, &agc); // enable/disable AGC (spx_int32_t)
    //speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_AGC_LEVEL, &agcLevel); // AGC target level (float)
    //spx_int32_t vad = 1, vadProbStart = 80, vadProbContinue = 65;
    spx_int32_t vad = 1, vadProbStart = 99, vadProbContinue = 99;
    speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_VAD, &vad); // enable voice activity detection
    speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_PROB_START, &vadProbStart); // probability (%) required to go from silence to voice
    speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_PROB_CONTINUE, &vadProbContinue); // probability (%) required to stay in the voice state
    while (1)
    {
        n = fread(buf, 2, SAMPLES_PER_FRAME, inFile);
        if (n == 0)
            break;
        if (n < SAMPLES_PER_FRAME) // zero-pad a short final frame
            memset(buf + n, 0, (SAMPLES_PER_FRAME - n) * 2);
        // with VAD enabled, returns 1 for speech and 0 for noise/silence
        int isSpeech = speex_preprocess_run(state, buf);
        fputc(isSpeech ? '1' : '0', stderr); // one VAD decision per 20 ms frame
        fwrite(buf, 2, n, outFile); // write back only the samples that were read
    }
    fputc('\n', stderr);
    free(buf);
    fclose(inFile);
    fclose(outFile);
    speex_preprocess_state_destroy(state);
    return 0;
}
Compile and run:
gcc squelch.c -lspeexdsp
./a.out
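Fortunately, I eventually implemented silence detection with a method of my own. Its range of application is narrower, but it fits our use case.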