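The basic idea is to transform the time-domain signal into the frequency domain, process it there, and then transform it back into a time-domain signal. For the details of the algorithm, see: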
https://blog.csdn.net/godloveyuxu/article/details/69225790
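As a rough illustration of that round trip (a minimal sketch, not the algorithm from the linked article), the snippet below moves a signal into the frequency domain with NumPy's FFT, zeroes every bin above a cutoff, and transforms the result back. The function name `lowpass_via_fft` and the idea of using a simple cutoff as the "processing" step are assumptions made for this example.

```python
import numpy as np


def lowpass_via_fft(signal, sample_rate, cutoff_hz):
    """Time domain -> frequency domain -> edit the spectrum -> time domain."""
    spectrum = np.fft.rfft(signal)                               # to the frequency domain
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)    # frequency of each bin in Hz
    spectrum[freqs > cutoff_hz] = 0.0                            # the "processing" step
    return np.fft.irfft(spectrum, n=len(signal))                 # back to the time domain
```

Spectral subtraction follows the same pattern, except that the processing step subtracts an estimated noise magnitude from each bin instead of zeroing bins.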
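Update (10 May 2020): C# code that uses Speex for noise reduction has been added at the end of this article.
Denoising speech signals in C# is fairly difficult. From what I found, you can use WebRTC or Speex, but in both cases the core idea is to build the C++ code into a DLL that C# then calls. I am not very familiar with C++, so I struggled for a long time without getting it to work. If you want to look into it, the following articles may help: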
WebRTC: https://www.cnblogs.com/mod109/p/5767867.html
https://www.cnblogs.com/Hard/p/csharp-use-webrtc-noisesuppression.html
Speex: https://www.cnblogs.com/mod109/p/5744468.html
https://blog.csdn.net/u012931018/article/details/16927583
https://www.cnblogs.com/zhuweisky/archive/2010/09/16/1827896.html (this one is a sock-puppet account of 傲瑞科技)
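Speex source code: http://www.speex.org
http://zxy15914507674.gitee.io/shared_resource_name/speex-1.2beta3-win32.zip (I modified this copy of the source, and it has some problems)
The link above has been disabled by Gitee (碼雲). Download it from my repository instead: https://gitee.com/zxy15914507674/shared_resource_name, find speex-1.2beta3-win32.zip there and download it.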
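In China, one company whose multi-party voice chat product supports secondary development in C# is 傲瑞科技 (http://www.oraycn.com/Download_Free.aspx). However, it is a paid product, and despite the promise of source code, nothing of the sort is provided: the core is entirely wrapped in DLLs. The articles about it online are just advertising for the company's products.
In the end I considered implementing the denoising with Python's librosa module and calling it from C# via WCF and XML-RPC (the calling part is not implemented in this article).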
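Since the XML-RPC part is not implemented in the article, here is only a hedged sketch of what the Python side might look like, using the standard library's `xmlrpc` package. The handler name `denoise_rpc`, the exposed method name `denoise`, the port, and the pass-through body are all assumptions for illustration; a real handler would run the denoiser where the placeholder is.

```python
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer


def denoise_rpc(payload):
    """Hypothetical handler: take raw audio bytes, return processed bytes."""
    raw = payload.data            # xmlrpc.client.Binary carries bytes as base64 on the wire
    processed = raw               # placeholder: a real handler would denoise here
    return xmlrpc.client.Binary(processed)


def build_server(host="127.0.0.1", port=8000):
    """Create (but do not start) a server exposing the handler as 'denoise'."""
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_function(denoise_rpc, "denoise")
    return server

# To run it:      build_server().serve_forever()
# A Python client: xmlrpc.client.ServerProxy("http://127.0.0.1:8000").denoise(
#                      xmlrpc.client.Binary(pcm_bytes))
```

Because XML-RPC encodes binary data as base64, every request grows by roughly a third, which may matter for real-time audio.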
Most of this article is adapted from: https://blog.csdn.net/Boogyman/article/details/103264392
Test environment:
Windows Server 2012
Anaconda
Steps:
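The test files used in the code below can be downloaded here: http://zxy15914507674.gitee.io/shared_resource_name/librosa資源文件.rar
The link above has been disabled by Gitee. Download it from my repository instead: https://gitee.com/zxy15914507674/shared_resource_name, find librosa資源文件.rar there and download it.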
1 Install the librosa module; see: https://blog.csdn.net/zzc15806/article/details/79603994
Since I am using Anaconda, I installed it with the command
conda install -c conda-forge librosa
2 If a NoBackendError is raised, you also need to install ffmpeg with the following command
conda install ffmpeg -c conda-forge
3 Enter the following code:
import numpy as np
import librosa
from scipy.io import wavfile


class SpecSub(object):
    def __init__(self, input_wav):
        # sr=None keeps the file's native sample rate; mono=True downmixes to one channel
        self.data, self.fs = librosa.load(input_wav, sr=None, mono=True)
        self.noise_frame = 3  # use the first three frames as the noise estimate
        self.frame_duration = 200/1000  # 200 ms frame length
        self.frame_length = int(self.fs * self.frame_duration)  # np.int was removed in NumPy 1.24
        self.fft = 2048  # 2048-point FFT

    def main(self):
        noise_data = self.get_noise_data()
        oris = librosa.stft(self.data, n_fft=self.fft)  # short-time Fourier transform
        mag = np.abs(oris)  # get magnitude
        angle = np.angle(oris)  # get phase
        ns = librosa.stft(noise_data, n_fft=self.fft)
        mag_noise = np.abs(ns)
        mns = np.mean(mag_noise, axis=1)  # mean noise magnitude per frequency bin
        sa = mag - mns.reshape((mns.shape[0], 1))  # reshape for broadcast to subtract
        sa = np.maximum(sa, 0)  # a magnitude cannot be negative, so floor the result at zero
        sa0 = sa * np.exp(1.0j * angle)  # apply phase information
        y = librosa.istft(sa0)  # back to time domain signal
        # save as signed 16-bit WAV; clip first so full-scale samples do not wrap around
        wavfile.write('./output.wav', self.fs, (np.clip(y, -1.0, 1.0) * 32767).astype(np.int16))

    def get_noise_data(self):
        # average the first noise_frame frames sample by sample
        noise_data = self.data[0:self.frame_length]
        for i in range(1, self.noise_frame):
            noise_data = noise_data + self.data[i*self.frame_length:(i+1)*self.frame_length]
        noise_data = noise_data / self.noise_frame
        return noise_data


ss = SpecSub('./test.wav')
ss.main()
print('done')
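The result sounds reasonably good, but I noticed that an audio file of less than 1 MB became more than 3 MB after denoising, which is clearly unacceptable for real-time voice chat. In addition, the module reads the audio to be processed from a file rather than from a byte stream. This means the audio data sent from C# (as a byte array) would first have to be written back to an audio file before Python could process it, which is clearly not workable. If you know a better approach, I would appreciate your advice.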
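One possible way around the file requirement, sketched here under stated assumptions rather than as a tested solution, is to skip librosa's file loader and work on the raw bytes directly with NumPy. The sketch assumes the C# side sends headerless little-endian 16-bit mono PCM with an even byte count and that the first 0.2 s contains only noise; all function names are made up for this example. Because the output is also 16-bit mono PCM, it has exactly as many bytes as the input.

```python
import numpy as np


def pcm16_to_float(pcm_bytes):
    """Interpret little-endian 16-bit mono PCM bytes as floats in [-1, 1)."""
    return np.frombuffer(pcm_bytes, dtype="<i2").astype(np.float32) / 32768.0


def float_to_pcm16(samples):
    """Convert floats back to 16-bit PCM bytes, clipping to avoid wrap-around."""
    return np.clip(samples * 32768.0, -32768, 32767).astype("<i2").tobytes()


def _stft(x, n_fft, hop, window):
    """Windowed frames -> spectra; the tail is zero-padded to a whole frame."""
    n_frames = 1 + max(0, int(np.ceil((len(x) - n_fft) / hop)))
    padded = np.zeros((n_frames - 1) * hop + n_fft)
    padded[:len(x)] = x
    frames = np.stack([padded[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def spectral_subtract(samples, noise, n_fft=512):
    """Subtract the mean noise magnitude per bin, floor at zero, keep the phase."""
    hop = n_fft // 2
    window = np.hanning(n_fft + 1)[:-1]   # periodic Hann: 50%-overlapped copies sum to 1
    noise_mag = np.abs(_stft(noise, n_fft, hop, window)).mean(axis=0)
    # Pad half a frame on each side so every real sample is covered by two frames
    x = np.concatenate([np.zeros(hop), samples, np.zeros(hop)])
    spec = _stft(x, n_fft, hop, window)
    mag = np.maximum(np.abs(spec) - noise_mag, 0.0)
    frames = np.fft.irfft(mag * np.exp(1j * np.angle(spec)), n=n_fft, axis=1)
    out = np.zeros((len(frames) - 1) * hop + n_fft)
    for i, frame in enumerate(frames):    # overlap-add back to the time domain
        out[i * hop:i * hop + n_fft] += frame
    return out[hop:hop + len(samples)]


def denoise_pcm16(pcm_bytes, sample_rate, noise_seconds=0.2):
    """Bytes in, bytes out: no audio file is written or read."""
    samples = pcm16_to_float(pcm_bytes)
    n_noise = max(1, int(sample_rate * noise_seconds))
    return float_to_pcm16(spectral_subtract(samples, samples[:n_noise]))
```

This processes a whole buffer at once; for live chat the same steps would have to run on short chunks while carrying the overlap from one chunk to the next, which the sketch does not attempt.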
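Update (10 May 2020):
Wrapping the C++ implementation of Speex for use from C#
Demo source download: http://zxy15914507674.gitee.io/shared_resource_name/speexdsp-1.2rc3.rar
The link above has been disabled by Gitee. Download it from my repository instead: https://gitee.com/zxy15914507674/shared_resource_name, find speexdsp-1.2rc3.rar there and download it.
Demo source directory structure: (screenshot not included here)
For the details of the wrapping, see my blog post: https://blog.csdn.net/zxy13826134783/article/details/105958311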
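Note: the code below only works with audio in standard WAV format. Files that are really MP3s renamed with a .wav extension will not work, even though they play fine, because their file header is different.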
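A quick way to check this before handing a file to the denoiser is to inspect the RIFF header. The sketch below is an illustration only (the function name `find_pcm_data` is made up): it verifies the `RIFF`/`WAVE` markers and walks the chunks to locate the `data` chunk, since WAV headers are not always exactly 44 bytes (extra chunks such as `LIST` can appear before the audio data).

```python
import struct


def find_pcm_data(wav_bytes):
    """Return (fmt_info, data_offset, data_size) for a RIFF/WAVE byte string."""
    if len(wav_bytes) < 12 or wav_bytes[0:4] != b"RIFF" or wav_bytes[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file (for example an MP3 renamed to .wav)")
    pos, fmt = 12, None
    while pos + 8 <= len(wav_bytes):
        chunk_id, size = struct.unpack_from("<4sI", wav_bytes, pos)
        body = pos + 8
        if chunk_id == b"fmt ":
            # format tag, channels, sample rate, byte rate, block align, bits per sample
            tag, channels, rate, _, _, bits = struct.unpack_from("<HHIIHH", wav_bytes, body)
            fmt = {"format_tag": tag, "channels": channels,
                   "sample_rate": rate, "bits": bits}
        elif chunk_id == b"data":
            return fmt, body, min(size, len(wav_bytes) - body)
        pos = body + size + (size & 1)    # chunks are padded to an even length
    raise ValueError("no data chunk found")
```

For the C++ code in this article, the file should report format tag 1 (PCM), 16 bits per sample, one channel, a sample rate matching the one hard-coded in the listing, and a data offset of 44.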
The core C++ code is as follows:
// SpeexWinProj.cpp : Defines the entry point for the console application.
//
#include "stdafx.h"
#ifdef HAVE_CONFIG_H
#include "config.h"
#endif
#include "speex/speex_jitter.h"
#include "speex/speex_echo.h"
#include "speex/speex_preprocess.h"
#include "speex/speex_resampler.h"
#include <stdio.h>
#include <math.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>
#define HEADLEN 44                       // size of a canonical PCM WAV header, in bytes
#define SAMPLE_RATE (48000)              // sample rate the input is assumed to have, in Hz
#define SAMPLES_PER_FRAME (1024)         // samples per processing frame
#define FRAME_SIZE (SAMPLES_PER_FRAME * 1000/ SAMPLE_RATE)  // frame duration in ms (integer division gives 21)
#define FRAME_BYTES (SAMPLES_PER_FRAME)  // note: 1024 16-bit samples occupy 2048 bytes
union jbpdata {
    unsigned int idx;
    unsigned char data[4];
};
void synthIn(JitterBufferPacket *in, int idx, int span) {
    union jbpdata d;
    d.idx = idx;
    in->data = (char*)d.data;
    in->len = sizeof(d);
    in->timestamp = idx * 10;
    in->span = span * 10;
    in->sequence = idx;
    in->user_data = 0;
}
void jitterFill(JitterBuffer *jb) {
    char buffer[65536];
    JitterBufferPacket in, out;
    int i;
    out.data = buffer;
    jitter_buffer_reset(jb);
    for(i=0;i<100;++i) {
        synthIn(&in, i, 1);
        jitter_buffer_put(jb, &in);
        out.len = 65536;
        if (jitter_buffer_get(jb, &out, 10, NULL) != JITTER_BUFFER_OK) {
            printf("Fill test failed iteration %d\n", i);
        }
        if (out.timestamp != i * 10) {
            printf("Fill test expected %d got %d\n", i*10, out.timestamp);
        }
        jitter_buffer_tick(jb);
    }
}
void TestJitter()
{
    char buffer[65536];
    JitterBufferPacket in, out;
    int i;
    JitterBuffer *jb = jitter_buffer_init(10);
    out.data = buffer;
    /* Frozen sender case */
    jitterFill(jb);
    for(i=0;i<100;++i) {
        out.len = 65536;
        jitter_buffer_get(jb, &out, 10, NULL);
        jitter_buffer_tick(jb);
    }
    synthIn(&in, 100, 1);
    jitter_buffer_put(jb, &in);
    out.len = 65536;
    if (jitter_buffer_get(jb, &out, 10, NULL) != JITTER_BUFFER_OK) {
        printf("Failed frozen sender resynchronize\n");
    } else {
        printf("Frozen sender: Jitter %d\n", out.timestamp - 100*10);
    }
    return ;
}
/// Noise-reduction function: the first parameter is the name of the file to denoise, the second is the name of the output file.
/// Assumes 16-bit mono PCM sampled at SAMPLE_RATE with a 44-byte header.
void TestNoise(char *pSrcFile,char *pDenoiseFile)
{
    size_t n = 0;
    FILE *inFile = NULL, *outFile = NULL;
    // speex_preprocess_run() consumes one full frame of 16-bit samples per call,
    // so a frame of SAMPLES_PER_FRAME samples is twice that many bytes
    const size_t frameBytes = SAMPLES_PER_FRAME * sizeof(spx_int16_t);
    if (fopen_s(&inFile, pSrcFile, "rb") != 0)
        return;
    if (fopen_s(&outFile, pDenoiseFile, "wb") != 0)
    {
        fclose(inFile);
        return;
    }
    char *headBuf = (char*)malloc(HEADLEN);
    char *dataBuf = (char*)malloc(frameBytes);
    // Check the allocations before touching the buffers
    assert(headBuf != NULL);
    assert(dataBuf != NULL);
    memset(headBuf, 0, HEADLEN);
    memset(dataBuf, 0, frameBytes);
    SpeexPreprocessState *state = speex_preprocess_state_init(SAMPLES_PER_FRAME, SAMPLE_RATE);
    int denoise = 1;
    int noiseSuppress = -25;  // maximum noise attenuation in dB (negative value)
    speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_DENOISE, &denoise);
    speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_NOISE_SUPPRESS, &noiseSuppress);
    int i;
    i = 0;
    speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_AGC, &i);       // automatic gain control off
    i = 80000;
    speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_AGC_LEVEL, &i);
    i = 0;
    speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_DEREVERB, &i);  // dereverberation off
    float f = 0;
    speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_DEREVERB_DECAY, &f);
    f = 0;
    speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_DEREVERB_LEVEL, &f);
    // Copy the first 44 bytes straight to the output: they are the WAV header and must not be denoised, otherwise the file will not open
    n = fread(headBuf, 1, HEADLEN, inFile);
    if (n == HEADLEN)
    {
        fwrite(headBuf, 1, HEADLEN, outFile);
        // Read one frame (1024 samples = 2048 bytes) at a time
        while ((n = fread(dataBuf, 1, frameBytes, inFile)) > 0)
        {
            // Zero-pad a short final frame so no stale samples are processed
            if (n < frameBytes)
                memset(dataBuf + n, 0, frameBytes - n);
            // Denoise the frame in place
            speex_preprocess_run(state, (spx_int16_t*)(dataBuf));
            // Write back only as many bytes as were read
            fwrite(dataBuf, 1, n, outFile);
        }
    }
    free(headBuf);
    free(dataBuf);
    fclose(inFile);
    fclose(outFile);
    speex_preprocess_state_destroy(state);
}
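// The plan was to take a byte array in and return a byte array out so the audio could be sent over the network, but I am not familiar enough with C++ and this attempt failed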
void TestNoise_Buffer(char *input,char *output,int fileSize)
{
// SpeexPreprocessState *state = speex_preprocess_state_init(1024, SAMPLE_RATE);
// int denoise = 1;
// int noiseSuppress = -25;
// speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_DENOISE, &denoise);
// speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_NOISE_SUPPRESS, &noiseSuppress);
//
// int i;
// i = 0;
// speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_AGC, &i);
// i = 80000;
// speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_AGC_LEVEL, &i);
// i = 0;
// speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_DEREVERB, &i);
// float f = 0;
// speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_DEREVERB_DECAY, &f);
// f = 0;
// speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_DEREVERB_LEVEL, &f);
//
// char *dataBuf = (char*)malloc(FRAME_BYTES * 2 );
// memset(dataBuf, 0, FRAME_BYTES);
//
// int numCount=0;
//
//
// while (numCount<fileSize-1024)
// {
// for(int i=0;i<1024;i++)
// {
// *(dataBuf++)=*(input++);
// }
// speex_preprocess_run(state,(spx_int16_t*)dataBuf);
// for(int i=0;i<1024;i++)
// {
// *(output++)=*(dataBuf++);
// }
// numCount=numCount+1024;
//
// }
//
// free(dataBuf);
//
// speex_preprocess_state_destroy(state);
//
//
}
void _TestEcho(char *pSrcFile,char *pEchoFile,char *pAudioFile)
{
#define NN_ECHO 128
#define TAIL 1024
    FILE *echo_fd, *ref_fd, *e_fd;
    short echo_buf[NN_ECHO], ref_buf[NN_ECHO], e_buf[NN_ECHO];
    SpeexEchoState *st;
    SpeexPreprocessState *den;
    int sampleRate = 8000;
    echo_fd = fopen(pSrcFile, "rb");
    ref_fd = fopen(pEchoFile, "rb");
    e_fd = fopen(pAudioFile, "wb");
    st = speex_echo_state_init(NN_ECHO, TAIL);
    den = speex_preprocess_state_init(NN_ECHO, sampleRate);
    speex_echo_ctl(st, SPEEX_ECHO_SET_SAMPLING_RATE, &sampleRate);
    speex_preprocess_ctl(den, SPEEX_PREPROCESS_SET_ECHO_STATE, st);
    while (!feof(ref_fd) && !feof(echo_fd))
    {
        fread(ref_buf, sizeof(short), NN_ECHO, ref_fd);
        fread(echo_buf, sizeof(short), NN_ECHO, echo_fd);
        speex_echo_cancellation(st, ref_buf, echo_buf, e_buf);
        speex_preprocess_run(den, e_buf);
        fwrite(e_buf, sizeof(short), NN_ECHO, e_fd);
    }
    speex_echo_state_destroy(st);
    speex_preprocess_state_destroy(den);
    fclose(e_fd);
    fclose(echo_fd);
    fclose(ref_fd);
}
int TestResampler()
{
#define NNTR 256
    spx_uint32_t i;
    short *in;
    short *out;
    float *fin, *fout;
    int count = 0;
    SpeexResamplerState *st = speex_resampler_init(1, 8000, 12000, 10, NULL);
    speex_resampler_set_rate(st, 96000, 44100);
    speex_resampler_skip_zeros(st);
    in = (short*)malloc(NNTR*sizeof(short));
    out = (short*)malloc(2*NNTR*sizeof(short));
    fin = (float*)malloc(NNTR*sizeof(float));
    fout = (float*)malloc(2*NNTR*sizeof(float));
    while (1)
    {
        spx_uint32_t in_len;
        spx_uint32_t out_len;
        fread(in, sizeof(short), NNTR, stdin);
        if (feof(stdin))
            break;
        for (i=0;i<NNTR;i++)
            fin[i]=in[i];
        in_len = NNTR;
        out_len = 2*NNTR;
        /*if (count==2)
            speex_resampler_set_quality(st, 10);*/
        speex_resampler_process_float(st, 0, fin, &in_len, fout, &out_len);
        for (i=0;i<out_len;i++)
            out[i]=floor(.5+fout[i]);
        /*speex_warning_int("writing", out_len);*/
        fwrite(out, sizeof(short), out_len, stdout);
        count++;
    }
    speex_resampler_destroy(st);
    free(in);
    free(out);
    free(fin);
    free(fout);
    return 0;
}
Of these, void TestNoise(char *pSrcFile,char *pDenoiseFile) is the noise-reduction function and the one that needs to be wrapped. I have not looked into the other functions.
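One detail worth spelling out: the frame size passed to speex_preprocess_state_init is counted in samples, and each call to speex_preprocess_run consumes that many 16-bit samples, which is twice as many bytes. The sketch below (names are illustrative, not part of Speex) works through the arithmetic for the constants used in the listing and shows one way to cut a PCM byte buffer into whole frames, zero-padding the last one.

```python
SAMPLE_RATE = 48000        # Hz, as in the C++ listing
SAMPLES_PER_FRAME = 1024   # samples per call to the preprocessor
BYTES_PER_SAMPLE = 2       # 16-bit PCM (spx_int16_t)


def frame_duration_ms(samples=SAMPLES_PER_FRAME, rate=SAMPLE_RATE):
    """Duration of one frame in milliseconds."""
    return samples * 1000.0 / rate


def split_into_frames(pcm_bytes, samples_per_frame=SAMPLES_PER_FRAME):
    """Cut raw PCM bytes into fixed-size frames, zero-padding the final one."""
    frame_bytes = samples_per_frame * BYTES_PER_SAMPLE   # 1024 samples -> 2048 bytes
    frames = []
    for start in range(0, len(pcm_bytes), frame_bytes):
        chunk = pcm_bytes[start:start + frame_bytes]
        frames.append(chunk.ljust(frame_bytes, b"\x00"))
    return frames
```

With these constants one frame lasts about 21.3 ms, which is also the minimum buffering delay a frame-based denoiser adds in a live call.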
C# code:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Runtime.InteropServices;
using System.Text;
using System.Threading.Tasks;
namespace CSharpCall
{
    class Program
    {
        // Load a DLL by file name and return its module handle
        [DllImport("kernel32")]
        public static extern IntPtr LoadLibrary(string lpFileName);
        // Release a DLL through its handle
        [DllImport("Kernel32")]
        public static extern bool FreeLibrary(IntPtr handle);
        // Look up an exported function by name and return a pointer to it
        [DllImport("Kernel32")]
        public static extern IntPtr GetProcAddress(IntPtr handle, String funcname);
        [UnmanagedFunctionPointerAttribute(CallingConvention.Cdecl)]
        unsafe public delegate void TestNoise_delegate(char* pSrcFile, char* pDenoiseFile);
        //[UnmanagedFunctionPointerAttribute(CallingConvention.Cdecl)]
        // unsafe public delegate void TestNoise_Buffer_delegate(char *input,char *ouput,int fileSize);
        unsafe static void Main(string[] args)
        {
            // Load the DLL built from the C++ code
            IntPtr dll = LoadLibrary("SpeexWinProj.dll");
            IntPtr TestNoise_func = GetProcAddress(dll, "TestNoise");
            // Wrap the exported function pointer TestNoise_func in a delegate
            TestNoise_delegate TestNoise = (TestNoise_delegate)Marshal.GetDelegateForFunctionPointer(TestNoise_func, typeof(TestNoise_delegate));
            string fileNameInput = "test1.wav";
            // StringToHGlobalAnsi copies the string into unmanaged memory, which must be freed afterwards
            IntPtr inputPtr = Marshal.StringToHGlobalAnsi(fileNameInput);
            char* fileName_Iniput = (char*)inputPtr.ToPointer();
            string fileNameOutPut = "out.wav";
            IntPtr outputPtr = Marshal.StringToHGlobalAnsi(fileNameOutPut);
            char* fileName_Output = (char*)outputPtr.ToPointer();
            TestNoise(fileName_Iniput, fileName_Output);
            // Release the unmanaged copies of the file names
            Marshal.FreeHGlobal(inputPtr);
            Marshal.FreeHGlobal(outputPtr);
            Console.WriteLine("Conversion complete");
            //IntPtr TestNoise_Buffer_func = GetProcAddress(dll, "TestNoise_Buffer");
            ////Wrap the exported function pointer in a delegate
            //TestNoise_Buffer_delegate TestNoise_Buffer = (TestNoise_Buffer_delegate)Marshal.GetDelegateForFunctionPointer(TestNoise_Buffer_func, typeof(TestNoise_Buffer_delegate));
            //FileStream fs = new FileStream("test1.wav", FileMode.Open);
            //byte []fileBuffer=new byte[fs.Length];
            //fs.Read(fileBuffer, 0, fileBuffer.Length);
            //fs.Close();
            //int sampleRate = 1024;
            //byte[] outbuffer = new byte[fileBuffer.Length];
            //for (int i = 0; i < 44; i++)
            //{
            //    outbuffer[i]=fileBuffer[i];
            //}
            //byte[] input = new byte[fileBuffer.Length - 44];
            //byte[] output = new byte[fileBuffer.Length];
            //for (int i = 0; i <input.Length; i++)
            //{
            //    input[i] = fileBuffer[i+44];
            //}
            //char* inPut = (char*)System.Runtime.InteropServices.Marshal.UnsafeAddrOfPinnedArrayElement(input,0).ToPointer();
            //char* outPut = (char*)System.Runtime.InteropServices.Marshal.UnsafeAddrOfPinnedArrayElement(output, 0).ToPointer();
            //TestNoise_Buffer(inPut, outPut, input.Length);
            ////[MarshalAs(UnmanagedType.LPArray)] byte[]
            //for (int i = 0; i < output.Length; i++)
            //{
            //    fileBuffer[i + 44] = Convert.ToByte(*(outPut++)) < 255 ? Convert.ToByte(*(outPut++)) : Convert.ToByte(254);
            //}
            //FileStream fw = new FileStream("out1.wav", FileMode.Create);
            //fw.Write(outbuffer, 0, outbuffer.Length);
            //fw.Close();
            //Console.WriteLine("Conversion complete");
            // Unload the DLL once it is no longer needed
            FreeLibrary(dll);
            Console.ReadKey();
        }
    }
}
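For processing a byte array of audio with Speex, see: https://blog.csdn.net/zxy13826134783/article/details/106297974
The key techniques for converting data types are:
1 Converting a string to a pointer (the unmanaged copy made by StringToHGlobalAnsi should be released with Marshal.FreeHGlobal once the native call has returned)
string fileNameInput = "test1.wav";
char* fileName_Iniput = (char*)System.Runtime.InteropServices.Marshal.StringToHGlobalAnsi(fileNameInput).ToPointer();
2 Converting a byte array to a pointer (the array must be pinned first, for example with GCHandle.Alloc(input, GCHandleType.Pinned), for the address to remain valid)
byte[] input = new byte[1024];
char* inPut = (char*)System.Runtime.InteropServices.Marshal.UnsafeAddrOfPinnedArrayElement(input,0).ToPointer();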
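For comparison, the same two conversions can be written with Python's ctypes module, which may be useful if the Python route mentioned earlier ends up calling a native DLL. This is only a sketch: the helper names are made up, and the ASCII default encoding is an assumption that suits simple file names.

```python
import ctypes


def to_c_string(text, encoding="ascii"):
    """Copy a str into a NUL-terminated buffer that can be passed as char*."""
    return ctypes.create_string_buffer(text.encode(encoding))


def to_c_buffer(data):
    """Expose a bytearray as a C char array without copying it.

    The array shares memory with the bytearray, so native code that writes
    through the pointer changes the Python object directly.
    """
    return (ctypes.c_char * len(data)).from_buffer(data)
```

Unlike the C# version, no explicit free or pinning is needed here, because ctypes keeps the underlying buffer alive for as long as the returned object is referenced.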