Don't use speex for voice activity detection (VAD)


六个九十度

2020-02-22 00:09

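Starting with version 1.2, speex supports speech preprocessing features such as voice activity detection (VAD), along with noise suppression, echo cancellation, automatic gain control (AGC), a jitter buffer, resampling and a pile of other functions. These are implemented in the libspeexdsp library.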

Once I actually started using it, I hit one pitfall after another!

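First I enabled noise suppression, AGC and VAD, and the preprocessed audio played back with a crackling, buzzing noise (hard to describe; see the figure).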

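Because I set the frame size to 20 ms when initializing speex, you can see in the figure above that the waveform jumps abruptly every 20 ms: each jump starts on a 20 ms boundary and lasts about 1.5 ms.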

With noise suppression and AGC turned off, the symptom does not change; the waveform looks exactly like the figure above.

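Looking at preprocess.c in the speexdsp source, I found that speex_preprocess_state_init enables noise suppression by default. But even after I explicitly disabled it with speex_preprocess_ctl, the result was still the same as in the figure. And a comment inside speex_preprocess_run gave me a fright: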

   /* If noise suppression is off, don't apply the gain (but then why call this in the first place!) */

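speexdsp's noise suppression is just for show as well: with it enabled, the background noise is not reduced at all (and it adds its own crackling noise on top).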

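speexdsp has another problem: even pure background noise may be detected as speech. My impression is that it decides purely in the frequency domain, i.e. anything with high-frequency content is treated as a human voice.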

Together, these two problems make the VAD feature completely unusable.

Finally, here is the code, for anyone curious enough to try it:

#include <stdio.h>
#include <string.h>
#include <speex/speex_preprocess.h>
#define SAMPLE_RATE (16000)
#define FRAME_SIZE (20) //ms
#define SAMPLES_PER_FRAME (SAMPLE_RATE/1000 * FRAME_SIZE) //16 samples per ms, 320 per frame (16-bit mono)
int main(void)
{
    size_t n = 0;
    long frame = 0;
    spx_int16_t buf[SAMPLES_PER_FRAME];
    FILE *inFile = fopen("/run/shm/rec_whp.raw", "rb"); //16 kHz, 16-bit, mono raw PCM
    FILE *outFile = fopen("/run/shm/rec_spx2.raw", "wb");
    if (inFile == NULL || outFile == NULL)
    {
        perror("fopen");
        return 1;
    }

    //the first argument is the frame size in samples, not in ms
    SpeexPreprocessState *state = speex_preprocess_state_init(SAMPLES_PER_FRAME, SAMPLE_RATE);
    int denoise = 0;
    speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_DENOISE, &denoise); //disable noise suppression
    //int noiseSuppress = -25; speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_NOISE_SUPPRESS, &noiseSuppress); //max noise attenuation in dB (negative)
    //int agc = 1; speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_AGC, &agc); //automatic gain control
    //float agcLevel = 8000; speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_AGC_LEVEL, &agcLevel); //AGC target level

    //int vad = 1, vadProbStart = 80, vadProbContinue = 65;
    int vad = 1, vadProbStart = 99, vadProbContinue = 99;
    speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_VAD, &vad); //enable VAD
    speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_PROB_START, &vadProbStart); //probability (integer percent) required to go from silence to voice
    speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_PROB_CONTINUE, &vadProbContinue); //probability (integer percent) required to stay in the voice state
    while ((n = fread(buf, sizeof(spx_int16_t), SAMPLES_PER_FRAME, inFile)) > 0)
    {
        if (n < SAMPLES_PER_FRAME) //zero-pad a short final frame
            memset(buf + n, 0, (SAMPLES_PER_FRAME - n) * sizeof(spx_int16_t));
        //with VAD enabled, the return value is 1 for speech and 0 for silence/noise
        int isSpeech = speex_preprocess_run(state, buf);
        printf("frame %ld (%ld ms): %s\n", frame, frame * FRAME_SIZE, isSpeech ? "speech" : "silence");
        frame++;
        fwrite(buf, sizeof(spx_int16_t), n, outFile);
    }

    speex_preprocess_state_destroy(state);
    fclose(inFile);
    fclose(outFile);
    return 0;
}

Compile and run: