Denoising speech signals with spectral subtraction (librosa), and with Speex from C#

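The basic idea is to convert the time-domain signal to the frequency domain, process it there, and then convert the result back to a time-domain signal. For the details of the algorithm, see: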

https://blog.csdn.net/godloveyuxu/article/details/69225790
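As an illustration of that idea, here is a single-frame sketch using only NumPy. It is not the referenced article's code. It takes the FFT of a frame, subtracts an estimated noise magnitude spectrum, floors negative magnitudes at zero, reattaches the original phase and takes the inverse FFT. Real implementations, including the librosa code in this article, apply the same steps to overlapping windowed frames through the STFT. The function names and the synthetic test signal are hypothetical.

```python
import numpy as np


def spectral_subtract_frame(frame: np.ndarray, noise_mag: np.ndarray) -> np.ndarray:
    """Denoise one frame: subtract a noise magnitude spectrum and keep the phase."""
    spec = np.fft.rfft(frame)                      # time domain -> frequency domain
    mag, phase = np.abs(spec), np.angle(spec)      # split into magnitude and phase
    clean_mag = np.maximum(mag - noise_mag, 0.0)   # subtract noise, floor at zero
    clean_spec = clean_mag * np.exp(1j * phase)    # reattach the noisy phase
    return np.fft.irfft(clean_spec, n=len(frame))  # frequency domain -> time domain


def estimate_noise_mag(noise_frames: np.ndarray) -> np.ndarray:
    """Average magnitude spectrum over frames assumed to contain only noise."""
    return np.mean(np.abs(np.fft.rfft(noise_frames, axis=1)), axis=0)


# Usage on hypothetical data: a 440 Hz tone plus white noise, sampled at 8 kHz
rng = np.random.default_rng(0)
fs, n = 8000, 512
t = np.arange(n) / fs
noise_only = 0.1 * rng.standard_normal((10, n))   # 10 frames of pure noise
noisy = np.sin(2 * np.pi * 440 * t) + 0.1 * rng.standard_normal(n)
clean = spectral_subtract_frame(noisy, estimate_noise_mag(noise_only))
```

Flooring at zero matters: without it, bins where the noise estimate exceeds the signal would get a negative "magnitude", which flips their phase instead of silencing them.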

 

Update, May 10, 2020: added code for denoising with Speex from C#, at the end of the article.

 

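Denoising speech signals in C# is fairly difficult. From my research, WebRTC or Speex can be used for denoising, but in both cases the core idea is to compile the C++ code into a DLL that C# can call. Since I'm not very familiar with C++, I struggled with this for a long time without getting it to work. If you want to learn more, the following articles are worth a look: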

WebRTC:  https://www.cnblogs.com/mod109/p/5767867.html

              https://www.cnblogs.com/Hard/p/csharp-use-webrtc-noisesuppression.html

Speex: https://www.cnblogs.com/mod109/p/5744468.html

              https://blog.csdn.net/u012931018/article/details/16927583

              https://www.cnblogs.com/zhuweisky/archive/2010/09/16/1827896.html (this one is a sockpuppet account of Oraycn, 傲瑞科技)

Speex source code: http://www.speex.org

                     http://zxy15914507674.gitee.io/shared_resource_name/speex-1.2beta3-win32.zip (I modified this source code; it has some problems)

The link above has been disabled by Gitee (碼雲); download speex-1.2beta3-win32.zip directly from my repository instead: https://gitee.com/zxy15914507674/shared_resource_name

 

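In China, one company whose multi-party voice chat product supports secondary development in C# is Oraycn (傲瑞科技), http://www.oraycn.com/Download_Free.aspx. However, it charges money, and the promised source code turned out to be nothing of the sort: all the core functionality is packaged into DLLs. The articles about it online exist mainly to promote the company's products.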

 

 

 

In the end I considered implementing it with Python's librosa module and calling it via WCF and XML-RPC (not implemented in this article).
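The Python side of such a setup could look roughly like the sketch below, which uses only the standard library's xmlrpc.server. It is an illustration under several assumptions: the method name denoise_pcm, the make_server helper and the default port 8000 are hypothetical, the client is assumed to send raw 16-bit PCM bytes as an XML-RPC base64 value, and the processing step is a placeholder that returns the input unchanged where a real service would run spectral subtraction.

```python
import threading  # used when running the server in a background thread
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer


def denoise_pcm(pcm: xmlrpc.client.Binary) -> xmlrpc.client.Binary:
    """Hypothetical remote method: receive raw PCM bytes, return processed bytes."""
    data = pcm.data          # Binary wraps the raw bytes sent by the client
    processed = data         # placeholder: spectral subtraction would go here
    return xmlrpc.client.Binary(processed)


def make_server(host: str = "127.0.0.1", port: int = 8000) -> SimpleXMLRPCServer:
    """Create an XML-RPC server exposing denoise_pcm (port 8000 is an arbitrary choice)."""
    server = SimpleXMLRPCServer((host, port), logRequests=False, allow_none=True)
    server.register_function(denoise_pcm, "denoise_pcm")
    return server
```

Start it with `make_server().serve_forever()`. Sending the audio as a base64 byte payload avoids writing temporary files, at the cost of roughly one third extra size on the wire from base64 encoding.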

 

Most of this article is adapted from: https://blog.csdn.net/Boogyman/article/details/103264392

Test environment:

Windows Server 2012

Anaconda

 

Steps:

The test files used in the code below can be downloaded from: http://zxy15914507674.gitee.io/shared_resource_name/librosa資源文件.rar

The link above has been disabled by Gitee (碼雲); download librosa資源文件.rar directly from my repository instead: https://gitee.com/zxy15914507674/shared_resource_name

 

1. Install the librosa module; see: https://blog.csdn.net/zzc15806/article/details/79603994

Since I use Anaconda, I installed it with the command

conda install -c conda-forge librosa

2. If you get an error such as NoBackendError, you also need to install ffmpeg with the following command:

conda install ffmpeg -c conda-forge

3. Enter the following code:

import numpy as np
import librosa
from scipy.io import wavfile


class SpecSub(object):

    def __init__(self, input_wav):
        # sr=None keeps the file's native sample rate; mono=True downmixes to one channel
        self.data, self.fs = librosa.load(input_wav, sr=None, mono=True)
        self.noise_frame = 3  # use the first 3 frames as the noise estimate
        self.frame_duration = 200 / 1000  # 200 ms frame length
        self.frame_length = int(self.fs * self.frame_duration)  # np.int was removed in NumPy 1.24
        self.fft = 2048  # 2048-point FFT

    def main(self):
        noise_data = self.get_noise_data()

        oris = librosa.stft(self.data, n_fft=self.fft)  # short-time Fourier transform
        mag = np.abs(oris)  # magnitude
        angle = np.angle(oris)  # phase

        ns = librosa.stft(noise_data, n_fft=self.fft)
        mag_noise = np.abs(ns)
        mns = np.mean(mag_noise, axis=1)  # mean noise magnitude per frequency bin

        sa = mag - mns.reshape((mns.shape[0], 1))  # reshape for broadcasting over all frames
        sa = np.maximum(sa, 0.0)  # a magnitude cannot be negative: floor at zero
        sa0 = sa * np.exp(1.0j * angle)  # reapply the original phase
        y = librosa.istft(sa0, length=len(self.data))  # back to the time domain, same length as the input

        # Save as signed 16-bit PCM WAV; clip first so values at +/-1.0 do not overflow int16
        y = np.clip(y, -1.0, 1.0)
        wavfile.write('./output.wav', self.fs, (y * 32767).astype(np.int16))

    def get_noise_data(self):
        # Assume the first noise_frame frames (3 x 200 ms) contain only noise.
        # They are concatenated rather than averaged: averaging uncorrelated noise
        # segments in the time domain shrinks the noise and underestimates it.
        return self.data[0:self.noise_frame * self.frame_length]


ss = SpecSub('./test.wav')
ss.main()
print('done')
    

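The output quality is reasonably good, but an audio file of less than 1 MB grew to more than 3 MB after denoising, which clearly does not meet the requirements of real-time voice chat. In addition, as used here, the module reads the audio to be processed from a file rather than from a byte stream. That means the audio data sent from C# (as a byte array) would first have to be written back to an audio file before Python could process it, which clearly won't work either. If you know a good solution, I'd appreciate your advice.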
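Two facts may help with these problems. First, a 16-bit mono PCM WAV always takes 44 + 2 × (number of samples) bytes, and since the code above keeps the native sample rate, the output size depends only on the duration; if the output is much larger than the input, it is worth checking whether the input used a more compact encoding (for example 8-bit or compressed samples). Second, librosa.load also accepts a file-like object, so a complete WAV received as bytes can be wrapped in io.BytesIO instead of being written to disk, and raw PCM without a header can be converted with NumPy directly. The helper names below are hypothetical, and little-endian signed 16-bit PCM is assumed.

```python
import numpy as np


def pcm16_to_float(pcm: bytes) -> np.ndarray:
    """Little-endian signed 16-bit PCM bytes -> float32 samples in [-1, 1)."""
    return np.frombuffer(pcm, dtype="<i2").astype(np.float32) / 32768.0


def float_to_pcm16(samples: np.ndarray) -> bytes:
    """Float samples -> little-endian signed 16-bit PCM bytes, clipped to avoid overflow."""
    clipped = np.clip(samples, -1.0, 32767 / 32768)
    return np.round(clipped * 32768.0).astype("<i2").tobytes()


def pcm_wav_size(num_samples: int, channels: int = 1, bytes_per_sample: int = 2) -> int:
    """Size in bytes of a canonical PCM WAV file: 44-byte header plus the raw samples."""
    return 44 + num_samples * channels * bytes_per_sample
```

For example, one second of 48 kHz 16-bit mono audio gives `pcm_wav_size(48000)`, i.e. 96,044 bytes, whatever processing was applied in between.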

 

 

Update, May 10, 2020:

Wrapping the C++ implementation of Speex for use from C#

Demo source code download: http://zxy15914507674.gitee.io/shared_resource_name/speexdsp-1.2rc3.rar

The link above has been disabled by Gitee (碼雲); download speexdsp-1.2rc3.rar directly from my repository instead: https://gitee.com/zxy15914507674/shared_resource_name

Demo source directory structure:

For details of how the wrapping is done, see my blog post: https://blog.csdn.net/zxy13826134783/article/details/105958311

 

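Note: the code below only works with standard WAV audio. Files that are really MP3s renamed with a .wav extension will not work, even though they still play, because their file header is different.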
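A quick way to tell whether a file matches what the C++ code expects is to inspect its header before handing it over. This Python sketch (function names hypothetical) checks for the canonical 44-byte layout: RIFF/WAVE, a 16-byte fmt chunk with format code 1 (PCM), and the data chunk starting at byte 36. It also reads the channel count, sample rate and bit depth so they can be compared with the C++ code's assumptions (16-bit samples at SAMPLE_RATE = 48000; mono is a further assumption). Valid WAV files may carry extra chunks such as LIST before the data chunk; this strict check rejects them, which fits code that copies exactly 44 bytes as the header.

```python
import struct


def is_canonical_pcm_wav(header: bytes) -> bool:
    """True if the first 44 bytes follow the plain RIFF/WAVE/fmt/data PCM layout."""
    if len(header) < 44:
        return False
    riff, _, wave_id = struct.unpack("<4sI4s", header[0:12])
    fmt_id, fmt_size, audio_format = struct.unpack("<4sIH", header[12:22])
    data_id = header[36:40]
    return (riff == b"RIFF" and wave_id == b"WAVE" and fmt_id == b"fmt "
            and fmt_size == 16 and audio_format == 1 and data_id == b"data")


def wav_params(header: bytes) -> tuple:
    """Return (channels, sample_rate, bits_per_sample) from a canonical 44-byte header."""
    channels, sample_rate = struct.unpack("<HI", header[22:28])
    bits_per_sample = struct.unpack("<H", header[34:36])[0]
    return channels, sample_rate, bits_per_sample
```

An MP3 with an ID3 tag starts with the bytes `ID3` instead of `RIFF`, so a renamed MP3 fails the first comparison.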

Core C++ code:

// SpeexWinProj.cpp : Defines the entry point for the console application.
//

#include "stdafx.h"
#ifdef HAVE_CONFIG_H
#include "config.h"
#endif

#include "speex/speex_jitter.h"
#include "speex/speex_echo.h"
#include "speex/speex_preprocess.h"
#include "speex/speex_resampler.h"
#include <stdio.h>
#include <math.h>
#include <stdlib.h>

#include <string.h>
#include <assert.h> 


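#define HEADLEN 44                                             //size of a standard WAV header, in bytes
#define SAMPLE_RATE   (48000)                                  //sample rate the input WAV is assumed to have
#define SAMPLES_PER_FRAME  (1024)                              //16-bit samples per Speex frame
#define FRAME_SIZE   (SAMPLES_PER_FRAME * 1000/ SAMPLE_RATE)   //frame duration in ms (about 21 ms)
#define FRAME_BYTES  (SAMPLES_PER_FRAME * 2)                   //bytes per frame: 2 bytes per 16-bit sample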

union jbpdata {
	unsigned int idx;
	unsigned char data[4];
};

void synthIn(JitterBufferPacket *in, int idx, int span) {
	union jbpdata d;
	d.idx = idx;

	in->data = (char*)d.data;
	in->len = sizeof(d);
	in->timestamp = idx * 10;
	in->span = span * 10;
	in->sequence = idx;
	in->user_data = 0;
}

void jitterFill(JitterBuffer *jb) {
	char buffer[65536];
	JitterBufferPacket in, out;
	int i;

	out.data = buffer;

	jitter_buffer_reset(jb);

	for(i=0;i<100;++i) {
		synthIn(&in, i, 1);
		jitter_buffer_put(jb, &in);

		out.len = 65536;
		if (jitter_buffer_get(jb, &out, 10, NULL) != JITTER_BUFFER_OK) {
			printf("Fill test failed iteration %d\n", i);
		}
		if (out.timestamp != i * 10) {
			printf("Fill test expected %d got %d\n", i*10, out.timestamp);
		}
		jitter_buffer_tick(jb);
	}
}

void TestJitter()
{
	char buffer[65536];
	JitterBufferPacket in, out;
	int i;

	JitterBuffer *jb = jitter_buffer_init(10);

	out.data = buffer;

	/* Frozen sender case */
	jitterFill(jb);
	for(i=0;i<100;++i) {
		out.len = 65536;
		jitter_buffer_get(jb, &out, 10, NULL);
		jitter_buffer_tick(jb);
	}
	synthIn(&in, 100, 1);
	jitter_buffer_put(jb, &in);
	out.len = 65536;
	if (jitter_buffer_get(jb, &out, 10, NULL) != JITTER_BUFFER_OK) {
		printf("Failed frozen sender resynchronize\n");
	} else {
		printf("Frozen sender: Jitter %d\n", out.timestamp - 100*10);
	}
	return ;
}

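///Denoising function: the first parameter is the file to denoise, the second is the output file to write.
///Assumes a standard WAV: a 44-byte header followed by 16-bit mono PCM sampled at SAMPLE_RATE.
///To be found by GetProcAddress(dll, "TestNoise") from C#, it must be exported with an unmangled
///name, e.g. declared extern "C" __declspec(dllexport) or listed in a .def file.
void TestNoise(char *pSrcFile,char *pDenoiseFile)
{
    FILE *inFile = NULL, *outFile = NULL;
    if (fopen_s(&inFile, pSrcFile, "rb") != 0)
        return;
    if (fopen_s(&outFile, pDenoiseFile, "wb") != 0)
    {
        fclose(inFile);
        return;
    }

    char *headBuf = (char*)malloc(HEADLEN);
    //One frame is SAMPLES_PER_FRAME 16-bit samples, i.e. SAMPLES_PER_FRAME * 2 bytes
    spx_int16_t *dataBuf = (spx_int16_t*)malloc(SAMPLES_PER_FRAME * sizeof(spx_int16_t));
    assert(headBuf != NULL);
    assert(dataBuf != NULL);

    //The frame size given here must equal the number of samples passed to speex_preprocess_run
    SpeexPreprocessState *state = speex_preprocess_state_init(SAMPLES_PER_FRAME, SAMPLE_RATE);
    int denoise = 1;
    int noiseSuppress = -25;  //maximum noise attenuation, in dB
    speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_DENOISE, &denoise);
    speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_NOISE_SUPPRESS, &noiseSuppress);

    int i = 0;
    speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_AGC, &i);        //AGC off
    float agcLevel = 80000;  //SET_AGC_LEVEL expects a float; it only matters when AGC is on
    speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_AGC_LEVEL, &agcLevel);
    i = 0;
    speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_DEREVERB, &i);   //dereverb off
    float f = 0;
    speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_DEREVERB_DECAY, &f);
    f = 0;
    speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_DEREVERB_LEVEL, &f);

    //Copy the first 44 bytes (the WAV header) unchanged; denoising them would make the file unreadable
    size_t n = fread(headBuf, 1, HEADLEN, inFile);
    if (n == HEADLEN)
    {
        fwrite(headBuf, 1, HEADLEN, outFile);
        //Read one frame (up to SAMPLES_PER_FRAME samples) at a time
        while ((n = fread(dataBuf, sizeof(spx_int16_t), SAMPLES_PER_FRAME, inFile)) > 0)
        {
            //Zero-pad a short final frame so Speex does not process leftover samples
            if (n < SAMPLES_PER_FRAME)
                memset(dataBuf + n, 0, (SAMPLES_PER_FRAME - n) * sizeof(spx_int16_t));
            //Denoise the frame in place
            speex_preprocess_run(state, dataBuf);
            //Write back only as many samples as were read, so the data size still matches the header
            fwrite(dataBuf, sizeof(spx_int16_t), n, outFile);
        }
    }

    free(headBuf);
    free(dataBuf);
    fclose(inFile);
    fclose(outFile);
    speex_preprocess_state_destroy(state);
}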

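//I originally wanted both the input and the output to be byte arrays so the audio could be transmitted remotely, but since I'm not familiar with C++, it didn't work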
void TestNoise_Buffer(char *input,char *output,int fileSize)
{
//	SpeexPreprocessState *state = speex_preprocess_state_init(1024, SAMPLE_RATE);
//    int denoise = 1;
//    int noiseSuppress = -25;
//    speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_DENOISE, &denoise);
//    speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_NOISE_SUPPRESS, &noiseSuppress);
//    
//    int i;
//    i = 0;
//    speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_AGC, &i);
//    i = 80000;
//    speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_AGC_LEVEL, &i);
//    i = 0;
//    speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_DEREVERB, &i);
//    float f = 0;
//    speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_DEREVERB_DECAY, &f);
//    f = 0;
//    speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_DEREVERB_LEVEL, &f);
//
//	char *dataBuf = (char*)malloc(FRAME_BYTES * 2 );
//	memset(dataBuf, 0, FRAME_BYTES);
//
//	int numCount=0;
//
//
//	while (numCount<fileSize-1024)
//	{
//		for(int i=0;i<1024;i++)
//		{
//			*(dataBuf++)=*(input++);
//		}
//		speex_preprocess_run(state,(spx_int16_t*)dataBuf);
//		for(int i=0;i<1024;i++)
//		{
//			*(output++)=*(dataBuf++);
//		}
//		numCount=numCount+1024;
//		
//	}
//	
//	free(dataBuf);
//
//	speex_preprocess_state_destroy(state);
//
//	
}


void _TestEcho(char *pSrcFile,char *pEchoFile,char *pAudioFile)
{
#define NN_ECHO 128
#define TAIL 1024

	FILE *echo_fd, *ref_fd, *e_fd;
	short echo_buf[NN_ECHO], ref_buf[NN_ECHO], e_buf[NN_ECHO];
	SpeexEchoState *st;
	SpeexPreprocessState *den;
	int sampleRate = 8000;

	echo_fd = fopen(pSrcFile, "rb");
	ref_fd  = fopen(pEchoFile,  "rb");
	e_fd    = fopen(pAudioFile, "wb");

	st = speex_echo_state_init(NN_ECHO, TAIL);
	den = speex_preprocess_state_init(NN_ECHO, sampleRate);
	speex_echo_ctl(st, SPEEX_ECHO_SET_SAMPLING_RATE, &sampleRate);
	speex_preprocess_ctl(den, SPEEX_PREPROCESS_SET_ECHO_STATE, st);

	while (!feof(ref_fd) && !feof(echo_fd))
	{
		fread(ref_buf, sizeof(short), NN_ECHO, ref_fd);
		fread(echo_buf, sizeof(short), NN_ECHO, echo_fd);
		speex_echo_cancellation(st, ref_buf, echo_buf, e_buf);
		speex_preprocess_run(den, e_buf);
		fwrite(e_buf, sizeof(short), NN_ECHO, e_fd);
	}
	speex_echo_state_destroy(st);
	speex_preprocess_state_destroy(den);
	fclose(e_fd);
	fclose(echo_fd);
	fclose(ref_fd);
}

int TestResampler()
{

#define NNTR 256
   spx_uint32_t i;
   short *in;
   short *out;
   float *fin, *fout;
   int count = 0;
   SpeexResamplerState *st = speex_resampler_init(1, 8000, 12000, 10, NULL);
   speex_resampler_set_rate(st, 96000, 44100);
   speex_resampler_skip_zeros(st);
   
   in = (short*)malloc(NNTR*sizeof(short));
   out = (short*)malloc(2*NNTR*sizeof(short));
   fin = (float*)malloc(NNTR*sizeof(float));
   fout = (float*)malloc(2*NNTR*sizeof(float));
   while (1)
   {
      spx_uint32_t in_len;
      spx_uint32_t out_len;
      fread(in, sizeof(short), NNTR, stdin);
      if (feof(stdin))
         break;
      for (i=0;i<NNTR;i++)
         fin[i]=in[i];
      in_len = NNTR;
      out_len = 2*NNTR;
      /*if (count==2)
         speex_resampler_set_quality(st, 10);*/
      speex_resampler_process_float(st, 0, fin, &in_len, fout, &out_len);
      for (i=0;i<out_len;i++)
         out[i]=floor(.5+fout[i]);
      /*speex_warning_int("writing", out_len);*/
      fwrite(out, sizeof(short), out_len, stdout);
      count++;
   }
   speex_resampler_destroy(st);
   free(in);
   free(out);
   free(fin);
   free(fout);
   return 0;
}


The method void TestNoise(char *pSrcFile,char *pDenoiseFile) performs the denoising and is the one that needs to be wrapped; I haven't studied the other functions.
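TestNoise_Buffer was meant to do the same job on byte arrays. The framing logic is language-independent, so here is a Python sketch of it. The detail that is easy to get wrong is that one Speex frame of 1024 16-bit samples occupies 2048 bytes, so the byte buffer must be twice SAMPLES_PER_FRAME. In this sketch, process_frame is a hypothetical stand-in for a call to speex_preprocess_run, and the constants mirror the C++ macros (48000 Hz is the same assumption the C++ code makes).

```python
from typing import Callable, Iterator

SAMPLE_RATE = 48000                                # same assumption as the C++ SAMPLE_RATE macro
SAMPLES_PER_FRAME = 1024                           # same as the C++ SAMPLES_PER_FRAME macro
BYTES_PER_FRAME = SAMPLES_PER_FRAME * 2            # 16-bit samples take 2 bytes each
FRAME_MS = SAMPLES_PER_FRAME * 1000 / SAMPLE_RATE  # about 21.3 ms per frame
HEADLEN = 44                                       # canonical WAV header size


def frames_from_pcm(pcm: bytes) -> Iterator[bytes]:
    """Split 16-bit PCM bytes into BYTES_PER_FRAME chunks, zero-padding the last one."""
    for start in range(0, len(pcm), BYTES_PER_FRAME):
        chunk = pcm[start:start + BYTES_PER_FRAME]
        yield chunk + b"\x00" * (BYTES_PER_FRAME - len(chunk))


def denoise_wav_bytes(wav: bytes, process_frame: Callable[[bytes], bytes]) -> bytes:
    """Keep the header as-is, process each frame, then drop the padding so the size is unchanged."""
    out = bytearray(wav[:HEADLEN])            # header bytes must not be processed
    for frame in frames_from_pcm(wav[HEADLEN:]):
        out += process_frame(frame)           # hypothetical stand-in for speex_preprocess_run
    return bytes(out[:len(wav)])
```

Padding the last frame and trimming afterwards keeps every call at exactly one full frame, which is what a preprocessor initialised with a fixed frame size expects, while the output stays the same length as the input.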

 

C# code:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Runtime.InteropServices;
using System.Text;
using System.Threading.Tasks;

namespace CSharpCall
{
    class Program
    {
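        //Load a DLL by name; returns its module handle (IntPtr.Zero on failure)
        [DllImport("kernel32")]
        public static extern IntPtr LoadLibrary(string lpFileName);
        //Release a DLL through its handle
        [DllImport("Kernel32")]
        public static extern bool FreeLibrary(IntPtr handle);
        //Look up an exported function by name; returns the function pointer (IntPtr.Zero if not found)
        [DllImport("Kernel32")]
        public static extern IntPtr GetProcAddress(IntPtr handle, String funcname);

        //Matches the native void TestNoise(char*, char*). The pointers carry ANSI (one byte per character)
        //strings, so the C# char* type is only used here as a raw address.
        [UnmanagedFunctionPointerAttribute(CallingConvention.Cdecl)]
        unsafe public delegate void TestNoise_delegate(char* pSrcFile, char* pDenoiseFile);

        //[UnmanagedFunctionPointerAttribute(CallingConvention.Cdecl)]
       // unsafe public delegate void TestNoise_Buffer_delegate(char *input,char *ouput,int fileSize);
        
        //unsafe code requires building the project with "Allow unsafe code" (/unsafe)
        unsafe static void Main(string[] args)
        {
            //Load the C++ DLL
            IntPtr dll = LoadLibrary("SpeexWinProj.dll");
            if (dll == IntPtr.Zero)
            {
                Console.WriteLine("Failed to load SpeexWinProj.dll");
                return;
            }

            IntPtr TestNoise_func = GetProcAddress(dll, "TestNoise");
            if (TestNoise_func == IntPtr.Zero)
            {
                //Usually means TestNoise was not exported with an unmangled name
                Console.WriteLine("TestNoise was not found in SpeexWinProj.dll");
                FreeLibrary(dll);
                return;
            }
            //Get a delegate instance for the native function TestNoise_func
            TestNoise_delegate TestNoise = (TestNoise_delegate)Marshal.GetDelegateForFunctionPointer(TestNoise_func, typeof(TestNoise_delegate));

            string fileNameInput = "test1.wav";
            IntPtr inputPtr = System.Runtime.InteropServices.Marshal.StringToHGlobalAnsi(fileNameInput);
            char* fileName_Iniput = (char*)inputPtr.ToPointer();

            string fileNameOutPut = "out.wav";
            IntPtr outputPtr = System.Runtime.InteropServices.Marshal.StringToHGlobalAnsi(fileNameOutPut);
            char* fileName_Output = (char*)outputPtr.ToPointer();

            try
            {
                TestNoise(fileName_Iniput, fileName_Output);
            }
            finally
            {
                //StringToHGlobalAnsi allocates unmanaged memory, which must be freed explicitly
                Marshal.FreeHGlobal(inputPtr);
                Marshal.FreeHGlobal(outputPtr);
            }
            Console.WriteLine("Conversion complete");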




            //IntPtr TestNoise_Buffer_func = GetProcAddress(dll, "TestNoise_Buffer");
            ////Get a delegate instance for the native function TestNoise_Buffer_func
            //TestNoise_Buffer_delegate TestNoise_Buffer = (TestNoise_Buffer_delegate)Marshal.GetDelegateForFunctionPointer(TestNoise_Buffer_func, typeof(TestNoise_Buffer_delegate));

            //FileStream fs = new FileStream("test1.wav", FileMode.Open);
            //byte []fileBuffer=new byte[fs.Length];
            
            //fs.Read(fileBuffer, 0, fileBuffer.Length);
            
            //fs.Close();

            
            //int sampleRate = 1024;
            //byte[] outbuffer = new byte[fileBuffer.Length];
           
            //for (int i = 0; i < 44; i++)
            //{
            //    outbuffer[i]=fileBuffer[i];
            //}

            //byte[] input = new byte[fileBuffer.Length - 44];
            //byte[] output = new byte[fileBuffer.Length];
            //for (int i = 0; i <input.Length; i++)
            //{
            //    input[i] = fileBuffer[i+44];
            //}
            //char* inPut = (char*)System.Runtime.InteropServices.Marshal.UnsafeAddrOfPinnedArrayElement(input,0).ToPointer();
            //char* outPut = (char*)System.Runtime.InteropServices.Marshal.UnsafeAddrOfPinnedArrayElement(output, 0).ToPointer();
            //TestNoise_Buffer(inPut, outPut, input.Length);
            
            ////[MarshalAs(UnmanagedType.LPArray)] byte[]
           
            //for (int i = 0; i < output.Length; i++)
            //{

            //    fileBuffer[i + 44] = Convert.ToByte(*(outPut++)) < 255 ? Convert.ToByte(*(outPut++)) : Convert.ToByte(254);
            //}      
            
           
            

            //FileStream fw = new FileStream("out1.wav", FileMode.Create);
            //fw.Write(outbuffer, 0, outbuffer.Length);
            //fw.Close();
            //Console.WriteLine("Conversion complete");
           
            //Release the DLL once it is no longer needed
            FreeLibrary(dll);
            Console.ReadKey();
        }
    }
}

 

For processing audio byte arrays with Speex, see: https://blog.csdn.net/zxy13826134783/article/details/106297974

The core type-conversion techniques used here:

1. Convert a string to a pointer

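 string fileNameInput = "test1.wav";

 IntPtr hGlobal = System.Runtime.InteropServices.Marshal.StringToHGlobalAnsi(fileNameInput);

 char* fileName_Iniput = (char*)hGlobal.ToPointer();

 //After the native call, free the unmanaged copy of the string
 System.Runtime.InteropServices.Marshal.FreeHGlobal(hGlobal);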

2. Convert a byte array to a pointer

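byte[] input = new byte[1024];

//UnsafeAddrOfPinnedArrayElement does not pin the array itself; pin it first so the GC cannot move it while native code uses the pointer
System.Runtime.InteropServices.GCHandle handle = System.Runtime.InteropServices.GCHandle.Alloc(input, System.Runtime.InteropServices.GCHandleType.Pinned);

char* inPut = (char*)System.Runtime.InteropServices.Marshal.UnsafeAddrOfPinnedArrayElement(input,0).ToPointer();

//After the native call, release the pin
handle.Free();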
