Noise reduction for speech signals using spectral subtraction (librosa), and using Speex from C#

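The basic idea is to transform the time-domain signal into the frequency domain, process it there, and then transform it back into a time-domain signal. For the specific algorithm, see: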

https://blog.csdn.net/godloveyuxu/article/details/69225790
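
As a rough illustration of that idea, here is a minimal sketch of spectral subtraction on a single frame, written in plain Python with a naive DFT so that it has no dependencies. The function names are my own, and the noise magnitude per frequency bin is assumed to be known already (in practice it would be estimated from a stretch of the recording that contains only noise). The magnitude is reduced by the noise estimate and floored at zero, while the phase of the noisy signal is reused unchanged.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform of a list of real samples."""
    n = len(frame)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
            for k in range(n)]


def idft(spectrum):
    """Inverse DFT, returning only the real part of each sample."""
    n = len(spectrum)
    return [(sum(c * cmath.exp(2j * math.pi * k * t / n)
                 for k, c in enumerate(spectrum)) / n).real
            for t in range(n)]


def spectral_subtract(frame, noise_mag):
    """Subtract a per-bin noise magnitude from one frame and return the time signal."""
    cleaned = []
    for c, nm in zip(dft(frame), noise_mag):
        mag = max(abs(c) - nm, 0.0)                      # subtract, floor at zero
        cleaned.append(cmath.rect(mag, cmath.phase(c)))  # keep the noisy phase
    return idft(cleaned)
```

With a noise estimate of zero in every bin the frame comes back unchanged, and with a very large estimate every bin is floored to zero and the output is silence.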

 

Update, May 10, 2020: added code for noise reduction with Speex from C#, at the end of this article.

 

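Denoising speech signals in C# is fairly difficult. From what I read, either WebRTC or Speex can be used, but in both cases the core idea is to build the C++ code into a DLL for C# to call. Since I am not very familiar with C++, I struggled for a long time without getting it to work. If you want to look into it, the following articles may help: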

WebRTC: https://www.cnblogs.com/mod109/p/5767867.html

        https://www.cnblogs.com/Hard/p/csharp-use-webrtc-noisesuppression.html

Speex:  https://www.cnblogs.com/mod109/p/5744468.html

        https://blog.csdn.net/u012931018/article/details/16927583

        https://www.cnblogs.com/zhuweisky/archive/2010/09/16/1827896.html (this one is an alias account of 傲瑞科技)

Speex source: http://www.speex.org

              http://zxy15914507674.gitee.io/shared_resource_name/speex-1.2beta3-win32.zip (I modified this source, and it has some problems)

The link above was disabled by Gitee, so download from my repository instead: https://gitee.com/zxy15914507674/shared_resource_name. Find the file

speex-1.2beta3-win32.zip and download it.

 

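In China, one company whose multi-party voice chat product supports secondary development in C# is 傲瑞科技 (http://www.oraycn.com/Download_Free.aspx). However, it is paid, and the promised source code turned out to be worthless: all the core parts are packaged as DLLs. The articles about it online are merely advertising for the company's products.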

 

 

 

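In the end I considered implementing it with Python's librosa module and calling it from C# through WCF and XML-RPC (this article does not implement that part).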
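
Since the XML-RPC part is not implemented here, the following is only a sketch of what the Python side might look like, using the standard library's xmlrpc modules. The method name denoise_pcm, the host and the port are hypothetical, and the body is a pass-through placeholder rather than real noise reduction. XML-RPC carries binary data as a base64 value, which Python exposes as xmlrpc.client.Binary.

```python
from xmlrpc.client import Binary
from xmlrpc.server import SimpleXMLRPCServer


def denoise_pcm(payload):
    """Hypothetical RPC method: takes raw audio bytes, returns processed bytes."""
    raw = payload.data   # Binary wraps the byte string sent by the client
    processed = raw      # placeholder: no noise reduction is done in this sketch
    return Binary(processed)


def make_server(host="127.0.0.1", port=8000):
    """Create (but do not start) an XML-RPC server that exposes denoise_pcm."""
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_function(denoise_pcm, "denoise_pcm")
    return server

# To run the service: make_server().serve_forever()
```

A client would then call denoise_pcm with a base64-encoded byte array and get the processed bytes back in the same form.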

 

Most of this article is adapted from: https://blog.csdn.net/Boogyman/article/details/103264392

Test environment:

Windows Server 2012

Anaconda

 

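Steps:

The test files used in the code below could originally be downloaded here: http://zxy15914507674.gitee.io/shared_resource_name/librosa资源文件.rar

The link above was disabled by Gitee, so download from my repository instead: https://gitee.com/zxy15914507674/shared_resource_name. Find the file

librosa资源文件.rar and download it.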

 

1 Install the librosa module; see https://blog.csdn.net/zzc15806/article/details/79603994

Since I use Anaconda, I installed it with the command

conda install -c conda-forge librosa

2  If a NoBackendError is raised, ffmpeg also needs to be installed, with the following command

conda install ffmpeg -c conda-forge

3   The code is as follows:

import numpy as np
import librosa
from scipy.io import wavfile


class SpecSub(object):

    def __init__(self, input_wav):

        self.data, self.fs = librosa.load(input_wav, sr=None, mono=True)
        self.noise_frame = 3  # use the first three frames as the noise estimate
        self.frame_duration = 200/1000  # 200 ms frame length
        self.frame_length = int(self.fs * self.frame_duration)  # np.int was removed from NumPy
        self.fft = 2048  # 2048-point FFT

    def main(self):
        noise_data = self.get_noise_data()

        oris = librosa.stft(self.data, n_fft=self.fft)  # short-time Fourier transform
        mag = np.abs(oris)  # get magnitude
        angle = np.angle(oris)  # get phase

        ns = librosa.stft(noise_data, n_fft=self.fft)
        mag_noise = np.abs(ns)
        mns = np.mean(mag_noise, axis=1)  # mean noise magnitude per frequency bin

        sa = mag - mns.reshape((mns.shape[0], 1))  # reshape for broadcast to subtract
        sa = np.maximum(sa, 0.0)  # a magnitude cannot be negative, so floor it at zero
        sa0 = sa * np.exp(1.0j * angle)  # apply phase information
        y = librosa.istft(sa0, length=len(self.data))  # back to time domain signal

        # clip before converting so full-scale samples do not wrap around in int16
        pcm = np.clip(y * 32768, -32768, 32767).astype(np.int16)
        wavfile.write('./output.wav', self.fs, pcm)  # save signed 16-bit WAV format

    def get_noise_data(self):
        # Take the first noise_frame frames as a noise-only segment. They are kept
        # contiguous rather than averaged sample by sample, because averaging in the
        # time domain shrinks the noise and underestimates its magnitude.
        return self.data[0:self.noise_frame * self.frame_length]


ss = SpecSub('./test.wav')
ss.main()
print('done')
    

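The output sounds reasonably good, but an audio file of under 1 MB became more than 3 MB after denoising, which clearly does not meet the needs of real-time voice chat. In addition, the module reads the audio to be processed from a file rather than from a byte stream. That means the audio data sent from C# (as a byte array) would have to be written back out as an audio file before Python could process it, which is clearly not workable. If you know a better approach, I would be glad to hear it.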
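
One possible way around the file requirement is to skip the file loader and decode the bytes directly. The sketch below assumes the C# side sends raw 16-bit little-endian mono PCM with no WAV header (an assumption; adjust it if the real format differs), and the function names are my own. It uses only the standard library; the resulting list of floats could be converted with np.asarray(..., dtype=np.float32) and passed to librosa.stft in place of the array that librosa.load returns.

```python
import array
import sys


def pcm16_bytes_to_floats(raw):
    """Decode raw 16-bit little-endian PCM bytes into floats in [-1.0, 1.0)."""
    samples = array.array('h')
    samples.frombytes(raw[:len(raw) - len(raw) % 2])  # drop a dangling odd byte
    if sys.byteorder == 'big':
        samples.byteswap()  # the data is little-endian, the host is not
    return [s / 32768.0 for s in samples]


def floats_to_pcm16_bytes(values):
    """Encode floats back to 16-bit little-endian PCM, clipping out-of-range values."""
    clipped = [max(-32768, min(32767, int(round(v * 32768.0)))) for v in values]
    samples = array.array('h', clipped)
    if sys.byteorder == 'big':
        samples.byteswap()
    return samples.tobytes()
```

The processed samples can be encoded the same way and returned to C# as a byte array, so no temporary file is needed on either side.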

 

 

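Update, May 10, 2020:

Wrapping the C++ implementation of Speex for use from C#

Demo source download: http://zxy15914507674.gitee.io/shared_resource_name/speexdsp-1.2rc3.rar

The link above was disabled by Gitee, so download from my repository instead: https://gitee.com/zxy15914507674/shared_resource_name. Find the file

speexdsp-1.2rc3.rar and download it.

Demo source directory structure:

For the details of the wrapping, see my blog post: https://blog.csdn.net/zxy13826134783/article/details/105958311

 

Note: the code below only works with audio in the standard WAV format. An MP3 file whose extension has merely been changed to .wav may still play in a player, but it will not work here, because its file header is different.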
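
A quick way to check a file before passing it to the C++ code is to let Python's standard wave module parse the header. This is only a sketch with helper names of my own: describe_wav raises wave.Error for data that is not a RIFF/WAVE PCM file (a renamed MP3, for example), and make_test_wav builds a small silent WAV in memory for trying it out. Keep in mind that the fixed 44-byte header assumed by the C++ code only holds for a canonical PCM header; WAV files that carry extra chunks have longer headers.

```python
import io
import wave


def describe_wav(source):
    """Return (channels, sample_width_bytes, sample_rate, n_frames) of a PCM WAV.

    source may be a file name or a binary file object.
    """
    with wave.open(source, 'rb') as wf:
        return (wf.getnchannels(), wf.getsampwidth(),
                wf.getframerate(), wf.getnframes())


def make_test_wav(n_samples=480, rate=48000):
    """Build a mono 16-bit WAV containing silence and return it as bytes."""
    buf = io.BytesIO()
    with wave.open(buf, 'wb') as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(b'\x00\x00' * n_samples)
    return buf.getvalue()
```

The returned sample rate and sample width can also be compared against what the C++ code expects (it hard-codes 48000 Hz and treats the data as 16-bit samples).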

The core C++ code is as follows:

// SpeexWinProj.cpp : Defines the entry point for the console application.
//

#include "stdafx.h"
#ifdef HAVE_CONFIG_H
#include "config.h"
#endif

#include "speex/speex_jitter.h"
#include "speex/speex_echo.h"
#include "speex/speex_preprocess.h"
#include "speex/speex_resampler.h"
#include <stdio.h>
#include <math.h>
#include <stdlib.h>

#include <string.h>
#include <assert.h> 


#define HEADLEN 44
#define SAMPLE_RATE   (48000)  
#define SAMPLES_PER_FRAME  (1024)
#define FRAME_SIZE   (SAMPLES_PER_FRAME * 1000/ SAMPLE_RATE)
#define FRAME_BYTES  (SAMPLES_PER_FRAME)

union jbpdata {
	unsigned int idx;
	unsigned char data[4];
};

void synthIn(JitterBufferPacket *in, int idx, int span) {
	union jbpdata d;
	d.idx = idx;

	in->data = (char*)d.data;
	in->len = sizeof(d);
	in->timestamp = idx * 10;
	in->span = span * 10;
	in->sequence = idx;
	in->user_data = 0;
}

void jitterFill(JitterBuffer *jb) {
	char buffer[65536];
	JitterBufferPacket in, out;
	int i;

	out.data = buffer;

	jitter_buffer_reset(jb);

	for(i=0;i<100;++i) {
		synthIn(&in, i, 1);
		jitter_buffer_put(jb, &in);

		out.len = 65536;
		if (jitter_buffer_get(jb, &out, 10, NULL) != JITTER_BUFFER_OK) {
			printf("Fill test failed iteration %d\n", i);
		}
		if (out.timestamp != i * 10) {
			printf("Fill test expected %d got %d\n", i*10, out.timestamp);
		}
		jitter_buffer_tick(jb);
	}
}

void TestJitter()
{
	char buffer[65536];
	JitterBufferPacket in, out;
	int i;

	JitterBuffer *jb = jitter_buffer_init(10);

	out.data = buffer;

	/* Frozen sender case */
	jitterFill(jb);
	for(i=0;i<100;++i) {
		out.len = 65536;
		jitter_buffer_get(jb, &out, 10, NULL);
		jitter_buffer_tick(jb);
	}
	synthIn(&in, 100, 1);
	jitter_buffer_put(jb, &in);
	out.len = 65536;
	if (jitter_buffer_get(jb, &out, 10, NULL) != JITTER_BUFFER_OK) {
		printf("Failed frozen sender resynchronize\n");
	} else {
		printf("Frozen sender: Jitter %d\n", out.timestamp - 100*10);
	}
	return ;
}

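/// Noise-reduction function: the first argument is the name of the file to denoise, the second is the name of the output file written after denoising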
void TestNoise(char *pSrcFile,char *pDenoiseFile)
{
	size_t n = 0;
    FILE *inFile, *outFile;
    fopen_s(&inFile, pSrcFile, "rb");
    fopen_s(&outFile,pDenoiseFile, "wb");

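    char *headBuf = (char*)malloc(HEADLEN);
    char *dataBuf = (char*)malloc(FRAME_BYTES * 2 );
    // Check the allocations before touching the buffers
    assert(headBuf != NULL);
    assert(dataBuf != NULL);
    memset(headBuf, 0, HEADLEN);
    memset(dataBuf, 0, FRAME_BYTES * 2);  // clear the whole buffer, not just the first half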

    SpeexPreprocessState *state = speex_preprocess_state_init(1024, SAMPLE_RATE);
    int denoise = 1;
    int noiseSuppress = -25;
    speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_DENOISE, &denoise);
    speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_NOISE_SUPPRESS, &noiseSuppress);
    
    int i;
    i = 0;
    speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_AGC, &i);
    i = 80000;
    speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_AGC_LEVEL, &i);
    i = 0;
    speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_DEREVERB, &i);
    float f = 0;
    speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_DEREVERB_DECAY, &f);
    f = 0;
    speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_DEREVERB_LEVEL, &f);

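    // Copy the first 44 bytes (the WAV header) to the output unchanged;
    // running them through the denoiser would make the file unreadable.
    n = fread(headBuf, 1, HEADLEN, inFile);
    if (n == HEADLEN)
    {
        fwrite(headBuf, 1, HEADLEN, outFile);

        // The preprocessor was created for 1024 samples per frame, and each
        // 16-bit sample is 2 bytes, so one frame is 2048 bytes, not 1024.
        const size_t frameBytes = SAMPLES_PER_FRAME * sizeof(spx_int16_t);
        while ((n = fread(dataBuf, 1, frameBytes, inFile)) > 0)
        {
            // Zero-pad a short final frame so Speex always sees a full frame
            if (n < frameBytes)
                memset(dataBuf + n, 0, frameBytes - n);
            // Denoise the frame in place
            speex_preprocess_run(state, (spx_int16_t*)(dataBuf));
            // Write back only as many bytes as were read
            fwrite(dataBuf, 1, n, outFile);
        }
    }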

    free(headBuf);
    free(dataBuf);
    fclose(inFile);
    fclose(outFile);
    speex_preprocess_state_destroy(state);
   
}

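// The intent was to take and return byte arrays so that the audio could be sent over the network, but I am not familiar enough with C++ and this attempt failed
// (in the commented-out attempt below, dataBuf is advanced inside the copy loops and never reset before speex_preprocess_run and free)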
void TestNoise_Buffer(char *input,char *output,int fileSize)
{
//	SpeexPreprocessState *state = speex_preprocess_state_init(1024, SAMPLE_RATE);
//    int denoise = 1;
//    int noiseSuppress = -25;
//    speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_DENOISE, &denoise);
//    speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_NOISE_SUPPRESS, &noiseSuppress);
//    
//    int i;
//    i = 0;
//    speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_AGC, &i);
//    i = 80000;
//    speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_AGC_LEVEL, &i);
//    i = 0;
//    speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_DEREVERB, &i);
//    float f = 0;
//    speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_DEREVERB_DECAY, &f);
//    f = 0;
//    speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_DEREVERB_LEVEL, &f);
//
//	char *dataBuf = (char*)malloc(FRAME_BYTES * 2 );
//	memset(dataBuf, 0, FRAME_BYTES);
//
//	int numCount=0;
//
//
//	while (numCount<fileSize-1024)
//	{
//		for(int i=0;i<1024;i++)
//		{
//			*(dataBuf++)=*(input++);
//		}
//		speex_preprocess_run(state,(spx_int16_t*)dataBuf);
//		for(int i=0;i<1024;i++)
//		{
//			*(output++)=*(dataBuf++);
//		}
//		numCount=numCount+1024;
//		
//	}
//	
//	free(dataBuf);
//
//	speex_preprocess_state_destroy(state);
//
//	
}


void _TestEcho(char *pSrcFile,char *pEchoFile,char *pAudioFile)
{
#define NN_ECHO 128
#define TAIL 1024

	FILE *echo_fd, *ref_fd, *e_fd;
	short echo_buf[NN_ECHO], ref_buf[NN_ECHO], e_buf[NN_ECHO];
	SpeexEchoState *st;
	SpeexPreprocessState *den;
	int sampleRate = 8000;

	echo_fd = fopen(pSrcFile, "rb");
	ref_fd  = fopen(pEchoFile,  "rb");
	e_fd    = fopen(pAudioFile, "wb");

	st = speex_echo_state_init(NN_ECHO, TAIL);
	den = speex_preprocess_state_init(NN_ECHO, sampleRate);
	speex_echo_ctl(st, SPEEX_ECHO_SET_SAMPLING_RATE, &sampleRate);
	speex_preprocess_ctl(den, SPEEX_PREPROCESS_SET_ECHO_STATE, st);

	while (!feof(ref_fd) && !feof(echo_fd))
	{
		fread(ref_buf, sizeof(short), NN_ECHO, ref_fd);
		fread(echo_buf, sizeof(short), NN_ECHO, echo_fd);
		speex_echo_cancellation(st, ref_buf, echo_buf, e_buf);
		speex_preprocess_run(den, e_buf);
		fwrite(e_buf, sizeof(short), NN_ECHO, e_fd);
	}
	speex_echo_state_destroy(st);
	speex_preprocess_state_destroy(den);
	fclose(e_fd);
	fclose(echo_fd);
	fclose(ref_fd);
}

int TestResampler()
{

#define NNTR 256
   spx_uint32_t i;
   short *in;
   short *out;
   float *fin, *fout;
   int count = 0;
   SpeexResamplerState *st = speex_resampler_init(1, 8000, 12000, 10, NULL);
   speex_resampler_set_rate(st, 96000, 44100);
   speex_resampler_skip_zeros(st);
   
   in = (short*)malloc(NNTR*sizeof(short));
   out = (short*)malloc(2*NNTR*sizeof(short));
   fin = (float*)malloc(NNTR*sizeof(float));
   fout = (float*)malloc(2*NNTR*sizeof(float));
   while (1)
   {
      spx_uint32_t in_len;
      spx_uint32_t out_len;
      fread(in, sizeof(short), NNTR, stdin);
      if (feof(stdin))
         break;
      for (i=0;i<NNTR;i++)
         fin[i]=in[i];
      in_len = NNTR;
      out_len = 2*NNTR;
      /*if (count==2)
         speex_resampler_set_quality(st, 10);*/
      speex_resampler_process_float(st, 0, fin, &in_len, fout, &out_len);
      for (i=0;i<out_len;i++)
         out[i]=floor(.5+fout[i]);
      /*speex_warning_int("writing", out_len);*/
      fwrite(out, sizeof(short), out_len, stdout);
      count++;
   }
   speex_resampler_destroy(st);
   free(in);
   free(out);
   free(fin);
   free(fout);
   return 0;
}


Of these, void TestNoise(char *pSrcFile,char *pDenoiseFile) is the noise-reduction function and the one that needs to be wrapped; I have not studied the other functions.
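
To make the frame arithmetic in TestNoise concrete: speex_preprocess_state_init is given 1024 as its frame size, and that argument counts samples, not bytes. With 16-bit samples, one frame therefore occupies 2048 bytes, and at 48000 Hz it lasts about 21.3 ms. The sketch below (helper names are my own, not part of Speex) computes these numbers and shows one way to cut a byte buffer into full frames, zero-padding the last one.

```python
def frame_layout(samples_per_frame, sample_rate, bytes_per_sample=2):
    """Return the size in bytes and the duration in ms of one audio frame."""
    return {
        'frame_bytes': samples_per_frame * bytes_per_sample,
        'frame_ms': 1000.0 * samples_per_frame / sample_rate,
    }


def split_frames(pcm, frame_bytes):
    """Yield fixed-size frames from a byte string, zero-padding the last one."""
    for start in range(0, len(pcm), frame_bytes):
        chunk = pcm[start:start + frame_bytes]
        yield chunk + b'\x00' * (frame_bytes - len(chunk))
```

Reading the file in chunks of frame_bytes keeps every call to the preprocessor aligned with the frame size it was initialized with.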

 

C# code:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Runtime.InteropServices;
using System.Text;
using System.Threading.Tasks;

namespace CSharpCall
{
    class Program
    {
        // Load a DLL by file name and return its module handle
        [DllImport("kernel32")]
        public static extern IntPtr LoadLibrary(string lpFileName);
        // Release a DLL by its handle
        [DllImport("Kernel32")]
        public static extern bool FreeLibrary(IntPtr handle);
        // Look up an exported function by name and return a pointer to it
        [DllImport("Kernel32")]
        public static extern IntPtr GetProcAddress(IntPtr handle, String funcname);

        [UnmanagedFunctionPointerAttribute(CallingConvention.Cdecl)]
        unsafe public delegate void TestNoise_delegate(char* pSrcFile, char* pDenoiseFile);

        //[UnmanagedFunctionPointerAttribute(CallingConvention.Cdecl)]
       // unsafe public delegate void TestNoise_Buffer_delegate(char *input,char *ouput,int fileSize);
        
        unsafe static void Main(string[] args)
        {
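            // Load the DLL built from the C++ project
            IntPtr dll = LoadLibrary("SpeexWinProj.dll");
            if (dll == IntPtr.Zero)
            {
                Console.WriteLine("Failed to load SpeexWinProj.dll");
                return;
            }

            // The function must be exported under this undecorated name (e.g. via extern "C")
            IntPtr TestNoise_func = GetProcAddress(dll, "TestNoise");
            if (TestNoise_func == IntPtr.Zero)
            {
                Console.WriteLine("TestNoise was not found in the DLL");
                FreeLibrary(dll);
                return;
            }
            // Wrap the exported function pointer in a delegate instance
            TestNoise_delegate TestNoise = (TestNoise_delegate)Marshal.GetDelegateForFunctionPointer(TestNoise_func, typeof(TestNoise_delegate));

            string fileNameInput = "test1.wav";
            string fileNameOutPut = "out.wav";

            // Copy both file names to unmanaged memory as null-terminated ANSI strings
            IntPtr inputPtr = Marshal.StringToHGlobalAnsi(fileNameInput);
            IntPtr outputPtr = Marshal.StringToHGlobalAnsi(fileNameOutPut);

            TestNoise((char*)inputPtr.ToPointer(), (char*)outputPtr.ToPointer());
            Console.WriteLine("Conversion finished");

            // Release the unmanaged strings and unload the DLL
            Marshal.FreeHGlobal(inputPtr);
            Marshal.FreeHGlobal(outputPtr);
            FreeLibrary(dll);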




            //IntPtr TestNoise_Buffer_func = GetProcAddress(dll, "TestNoise_Buffer");
            ////Wrap the exported function pointer TestNoise_Buffer_func in a delegate instance
            //TestNoise_Buffer_delegate TestNoise_Buffer = (TestNoise_Buffer_delegate)Marshal.GetDelegateForFunctionPointer(TestNoise_Buffer_func, typeof(TestNoise_Buffer_delegate));

            //FileStream fs = new FileStream("test1.wav", FileMode.Open);
            //byte []fileBuffer=new byte[fs.Length];
            
            //fs.Read(fileBuffer, 0, fileBuffer.Length);
            
            //fs.Close();

            
            //int sampleRate = 1024;
            //byte[] outbuffer = new byte[fileBuffer.Length];
           
            //for (int i = 0; i < 44; i++)
            //{
            //    outbuffer[i]=fileBuffer[i];
            //}

            //byte[] input = new byte[fileBuffer.Length - 44];
            //byte[] output = new byte[fileBuffer.Length];
            //for (int i = 0; i <input.Length; i++)
            //{
            //    input[i] = fileBuffer[i+44];
            //}
            //char* inPut = (char*)System.Runtime.InteropServices.Marshal.UnsafeAddrOfPinnedArrayElement(input,0).ToPointer();
            //char* outPut = (char*)System.Runtime.InteropServices.Marshal.UnsafeAddrOfPinnedArrayElement(output, 0).ToPointer();
            //TestNoise_Buffer(inPut, outPut, input.Length);
            
            ////[MarshalAs(UnmanagedType.LPArray)] byte[]
           
            //for (int i = 0; i < output.Length; i++)
            //{

            //    fileBuffer[i + 44] = Convert.ToByte(*(outPut++)) < 255 ? Convert.ToByte(*(outPut++)) : Convert.ToByte(254);
            //}      
            
           
            

            //FileStream fw = new FileStream("out1.wav", FileMode.Create);
            //fw.Write(outbuffer, 0, outbuffer.Length);
            //fw.Close();
            //Console.WriteLine("Conversion finished");
           
            Console.ReadKey();
        }
    }
}

 

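For processing an audio byte array with Speex, see: https://blog.csdn.net/zxy13826134783/article/details/106297974

The core techniques for converting between data types are:

1  Converting a string to a pointer

 string fileNameInput = "test1.wav";

 char* fileName_Iniput = (char*)System.Runtime.InteropServices.Marshal.StringToHGlobalAnsi(fileNameInput).ToPointer();

The memory returned by StringToHGlobalAnsi is unmanaged and should be released with Marshal.FreeHGlobal once it is no longer needed.

2  Converting a byte array to a pointer

byte[] input = new byte[1024];

char* inPut = (char*)System.Runtime.InteropServices.Marshal.UnsafeAddrOfPinnedArrayElement(input,0).ToPointer();

UnsafeAddrOfPinnedArrayElement does not pin the array itself. The array should be pinned first (for example with GCHandle.Alloc(input, GCHandleType.Pinned) or a fixed statement); otherwise the garbage collector may move it while the native code is still using the pointer.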
