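Related functions
melbankm, mfcc_m, melcepst, cepstralFeatureExtractor, mfcc, HelperComputePitchAndMFCC, melSpectrogram
Comparison and notes on each function
- melbankm
Provided by Voicebox. It designs a bank of filters spaced uniformly on the mel frequency scale. The function never touches the audio signal; it only designs the filterbank that is applied before the MFCC computation.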
function [x,mc,mn,mx]=melbankm(p,n,fs,fl,fh,w)
%MELBANKM determine matrix for a mel/erb/bark-spaced filterbank [X,MN,MX]=(P,N,FS,FL,FH,W)
%
% Inputs:
% p number of filters in filterbank or the filter spacing in k-mel/bark/erb [ceil(4.6*log10(fs))]
% n length of fft
% fs sample rate in Hz
% fl low end of the lowest filter as a fraction of fs [default = 0]
% fh high end of highest filter as a fraction of fs [default = 0.5]
% w any sensible combination of the following:
% Alternatives to the mel frequency scale:
% 'b' = bark scale instead of mel
% 'e' = erb-rate scale
% 'l' = log10 Hz frequency scale
% 'f' = linear frequency scale
%
% 'c' = fl/fh specify centre of low and high filters
% 'h' = fl/fh are in Hz instead of fractions of fs
% 'H' = fl/fh are in mel/erb/bark/log10
%
% Filter shape:
% 't' = triangular shaped filters in mel/erb/bark domain (default)
% 'n' = hanning shaped filters in mel/erb/bark domain
% 'm' = hamming shaped filters in mel/erb/bark domain
%
% 'z' = highest and lowest filters taper down to zero [default]
% 'y' = lowest filter remains at 1 down to 0 frequency and
% highest filter remains at 1 up to nyquist frequency
%
% 'u' = scale filters to sum to unity
%
% 's' = single-sided: do not double filters to account for negative frequencies
%
% Plot the filterbank response:
% 'g' = plot idealized filters [default if no output arguments present]
%
% Note that the filter shape (triangular, hamming etc) is defined in the mel (or erb etc) domain.
% Some people instead define an asymmetric triangular filter in the frequency domain.
%
% If 'ty' or 'ny' is specified, the total power in the fft is preserved.
%
% Outputs: x a sparse matrix containing the filterbank amplitudes
% If the mn and mx outputs are given then size(x)=[p,mx-mn+1]
% otherwise size(x)=[p,1+floor(n/2)]
% Note that the peak filter values equal 2 to account for the power
% in the negative FFT frequencies.
% mc the filterbank centre frequencies in mel/erb/bark
% mn the lowest fft bin with a non-zero coefficient
% mx the highest fft bin with a non-zero coefficient
% Note: you must specify both or neither of mn and mx.
%
% ===================== Usage examples (MFCC pipeline) =====================
%
% (a) Calculate the Mel-frequency Cepstral Coefficients
%
% f=rfft(s); % rfft() returns only the 1+floor(n/2) non-negative-frequency coefficients (still complex)
% x=melbankm(p,n,fs); % n is the fft length, p is the number of filters wanted
% z=log(x*abs(f).^2); % multiply x by the power spectrum
% c=dct(z); % take the DCT
%
% (b) Calculate the Mel-frequency Cepstral Coefficients efficiently
%
% f=fft(s); % n is the fft length, p is the number of filters wanted
% [x,mc,na,nb]=melbankm(p,n,fs); % na:nb gives the fft bins that are needed
% z=log(x*(f(na:nb).*conj(f(na:nb)))); % multiply x by the power spectrum of the needed bins
%
% (c) Plot the calculated filterbanks
%
% plot((0:floor(n/2))*fs/n,melbankm(p,n,fs)') % fs=sample frequency
%
% (d) Plot the idealized filterbanks (without output sampling)
%
% melbankm(p,n,fs);
This function only designs the filterbank; it is one step of the MFCC pipeline.
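To make the design step concrete, here is a minimal base-MATLAB sketch of triangular filters spaced equally on the mel scale. It is not the melbankm implementation: the values of `fs`, `n` and `p`, the variable names and the particular Hz-to-mel formula are assumptions chosen for illustration.

```matlab
% Sketch: triangular filters equally spaced on the mel scale (base MATLAB only).
% fs, n, p and the mel formula below are illustrative assumptions.
fs = 16000;                 % sample rate in Hz
n  = 256;                   % FFT length
p  = 20;                    % number of filters

hz2mel = @(f) 2595*log10(1 + f/700);                % a common Hz-to-mel mapping

edgesMel = linspace(hz2mel(0), hz2mel(fs/2), p+2);  % p+2 equally spaced mel points
step     = edgesMel(2) - edgesMel(1);               % spacing between filter centres
binMel   = hz2mel((0:floor(n/2))*fs/n);             % mel value of each FFT bin

bank = zeros(p, numel(binMel));                     % one row per filter
for k = 1:p
    centre    = edgesMel(k+1);                      % filter k peaks at this mel value
    bank(k,:) = max(0, 1 - abs(binMel - centre)/step);  % triangle in the mel domain
end
```

Each row rises from the centre of the previous filter and falls to the centre of the next one. Unlike melbankm, this sketch returns a full matrix with peak values of 1 rather than a sparse matrix with peaks of 2.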
- mfcc_m
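Provided in the book by Song Zhiyong (宋知用). It normalizes the mel filterbank coefficients and applies a normalized cepstral liftering window.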
bank=melbankm(p,frameSize,fs,0,0.5,'m');
% Normalize the mel filterbank coefficients
bank=full(bank);
bank=bank/max(bank(:));
% Normalized cepstral liftering window: emphasizes some of the MFCC coefficients
w = 1 + 6 * sin(pi * (1:p2) ./ p2);
w = w/max(w);
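Points that need fixing:
only first-order difference coefficients are computed;
once the filters have been chosen, it is not possible to keep only the desired part;
the normalization of the mel filterbank coefficients and of the liftering window still needs to be verified.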
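The liftering window from the snippet above can be examined on its own. The sketch below assumes `p2 = 12` coefficients (the snippet does not show where `p2` is set) and applies the window to a hypothetical matrix of cepstral coefficients with one frame per row.

```matlab
% Sketch: normalized cepstral liftering window (p2 = 12 is an assumption).
p2 = 12;                            % assumed number of cepstral coefficients
w  = 1 + 6*sin(pi*(1:p2)/p2);       % raised-sine lifter, peak at the middle index
w  = w/max(w);                      % normalize so the largest weight is 1

c     = ones(3, p2);                % hypothetical coefficients: 3 frames, p2 columns
cLift = c .* w;                     % weight every frame (implicit expansion, R2016b+)
```

With these values the middle coefficients keep their full weight and the last one is scaled to 1/7, so the window de-emphasizes both ends of the cepstrum relative to the middle.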
- melcepst
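Part of the Voicebox toolbox, although it is reportedly no longer provided officially under this name (recent releases appear to prefix function names with v_, as in v_melcepst). The code calls melbankm.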
function [c,tc]=melcepst(s,fs,w,nc,p,n,inc,fl,fh)
%MELCEPST Calculate the mel cepstrum of a signal C=(S,FS,W,NC,P,N,INC,FL,FH)
%
%
% Simple use: (1) c=melcepst(s,fs) % calculate mel cepstrum with 12 coefs, 256 sample frames
% (2) c=melcepst(s,fs,'E0dD') % include log energy, 0th cepstral coef, delta and delta-delta coefs
%
% Inputs:
% s speech signal
% fs sample rate in Hz (default 11025)
% w mode string (see below)
% nc number of cepstral coefficients excluding 0'th coefficient [default 12] (sets the MFCC dimension)
% p number of filters in filterbank [default: floor(3*log(fs)) = approx 2.1 per octave]
% n length of frame in samples [default power of 2 < (0.03*fs)]
% inc frame increment [default n/2] (frame shift)
% fl low end of the lowest filter as a fraction of fs [default = 0]
% fh high end of highest filter as a fraction of fs [default = 0.5] (normalized by fs)
%
% w any sensible combination of the following:
% Time-domain window:
% 'R' rectangular window in time domain
% 'N' Hanning window in time domain
% 'M' Hamming window in time domain (default)
%
% Frequency-domain filter shape:
% 't' triangular shaped filters in mel domain (default)
% 'n' hanning shaped filters in mel domain
% 'm' hamming shaped filters in mel domain
%
%
% 'p' filters act in the power domain
% 'a' filters act in the absolute magnitude domain (default)
%
% Optional outputs beyond the 12 basic MFCC coefficients:
% '0' include 0'th order cepstral coefficient
% 'E' include log energy
% 'd' include delta coefficients (dc/dt)
% 'D' include delta-delta coefficients (d^2c/dt^2)
%
% Behaviour of the outermost filters:
% 'z' highest and lowest filters taper down to zero (default)
% 'y' lowest filter remains at 1 down to 0 frequency and
% highest filter remains at 1 up to nyquist frequency
%
% If 'ty' or 'ny' is specified, the total power in the fft is preserved.
%
% Outputs: c mel cepstrum output: one frame per row. Log energy, if requested, is the
% first element of each row followed by the delta and then the delta-delta
% coefficients.
% tc fractional time in samples at the centre of each frame
% with the first sample being 1.
% ========================== Set default parameters ==========================
if nargin<2 fs=11025; end % default sample rate in Hz
if nargin<3 w='M'; end % Hamming window
if nargin<4 nc=12; end % number of cepstral coefficients
if nargin<5 p=floor(3*log(fs)); end % p filters
if nargin<6 n=pow2(floor(log2(0.03*fs))); end % frame length in samples (also the FFT length)
if nargin<9
fh=0.5; % upper edge of the highest filter, as a fraction of fs
if nargin<8
fl=0; % lower edge of the lowest filter
if nargin<7
inc=floor(n/2);
end
end
end
if isempty(w)
w='M';
end
if any(w=='R')
[z,tc]=enframe(s,n,inc);
elseif any (w=='N')
[z,tc]=enframe(s,hanning(n),inc);
else
[z,tc]=enframe(s,hamming(n),inc);
end
% ============================ Core computation ============================
f=rfft(z.');
[m,a,b]=melbankm(p,n,fs,fl,fh,w); % m is the filterbank matrix; a:b are the FFT bins it covers
pw=f(a:b,:).*conj(f(a:b,:)); % power spectrum of each frame
pth=max(pw(:))*1E-20;
if any(w=='p')
y=log(max(m*pw,pth));
else
ath=sqrt(pth);
y=log(max(m*abs(f(a:b,:)),ath));
end
c=rdct(y).'; % DCT of the log filterbank outputs: one row per frame, p columns
nf=size(c,1);
nc=nc+1;
if p>nc
c(:,nc+1:end)=[]; % more filters than requested coefficients: drop the higher-order columns
elseif p<nc
c=[c zeros(nf,nc-p)]; % fewer filters than requested coefficients: pad with zeros
end
if ~any(w=='0')
c(:,1)=[];
nc=nc-1;
end
if any(w=='E')
c=[log(max(sum(pw),pth)).' c];
nc=nc+1;
end
% ================ Compute first- and second-order differences ================
if any(w=='D')
vf=(4:-1:-4)/60;
af=(1:-1:-1)/2;
ww=ones(5,1);
cx=[c(ww,:); c; c(nf*ww,:)];
vx=reshape(filter(vf,1,cx(:)),nf+10,nc);
vx(1:8,:)=[];
ax=reshape(filter(af,1,vx(:)),nf+2,nc);
ax(1:2,:)=[];
vx([1 nf+2],:)=[];
if any(w=='d')
c=[c vx ax];
else
c=[c ax];
end
elseif any(w=='d')
vf=(4:-1:-4)/60;
ww=ones(4,1);
cx=[c(ww,:); c; c(nf*ww,:)];
vx=reshape(filter(vf,1,cx(:)),nf+8,nc);
vx(1:8,:)=[];
c=[c vx];
end
% ======= With no output arguments, plot the cepstral coefficients as an image =======
if nargout<1
[nf,nc]=size(c);
% t=((0:nf-1)*inc+(n-1)/2)/fs;
ci=(1:nc)-any(w=='0')-any(w=='E');
imh = imagesc(tc/fs,ci,c.');
axis('xy');
xlabel('Time (s)');
ylabel('Mel-cepstrum coefficient');
map = (0:63)'/63;
colormap([map map map]);
colorbar;
end
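- By default melcepst returns 12 MFCC coefficients, using a Hamming window in the time domain and triangular filters in the frequency domain. The lowest frequency is 0, the highest is half the sample rate (the Nyquist limit), the frame shift is half the frame length, and the frame length is the largest power of 2 not exceeding 0.03*fs. The mode string adds further outputs:
E: include the log energy
0: include the 0th cepstral coefficient
d: include the first-order differences (delta)
D: include the second-order differences (delta-delta)
- How melcepst handles the '0' option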
if ~any(w=='0')
c(:,1)=[];
nc=nc-1;
end
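If the 0th coefficient is not requested, the first column is deleted, leaving 13-1=12 values per frame. In other words, the DCT output is first truncated to nc+1=13 columns, and by default the first of them, the 0th cepstral coefficient, is dropped. That first value is much larger than the other 12 because it is the DC term of the DCT, which is proportional to the sum of the log filterbank outputs.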
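The size of the 0th coefficient can be checked with a small base-MATLAB sketch. It builds an orthonormal DCT-II matrix by hand, which may differ from Voicebox's rdct by a scale factor, and applies it to hypothetical log filterbank outputs; `p = 29` and the test vector `y` are assumptions.

```matlab
% Sketch: the 0th DCT coefficient is a scaled sum of the log filterbank outputs.
% Orthonormal DCT-II built by hand; rdct may use a different scaling.
p = 29;                                   % assumed number of filters
y = log(linspace(1, 100, p))';            % hypothetical log filterbank outputs

k = (0:p-1)';                             % coefficient index (rows)
m = 0:p-1;                                % filter index (columns)
D = sqrt(2/p)*cos(pi*k*(2*m+1)/(2*p));    % DCT-II basis, p-by-p
D(1,:) = D(1,:)/sqrt(2);                  % first row becomes the constant 1/sqrt(p)

c  = D*y;                                 % cepstral coefficients of this frame
c0 = sum(y)/sqrt(p);                      % what the first coefficient should equal
```

Because every log output here has the same sign, they add up in `c(1)`, whereas the higher-order basis vectors oscillate and partly cancel. This is why the first column tends to dominate the rest.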
- cepstralFeatureExtractor
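Provided by Audio Toolbox. The audio must be split into frames first, with each frame stored as a column, and the frames are then passed to cepstralFeatureExtractor one at a time. As a result, delta and deltaDelta are both 0 for the first frame that is passed in.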
test = 'D:\DataBase\TIMIT\TRAIN\DR2\MARC0\SX108.WAV';
[x, fs] = audioread(test);
n=pow2(floor(log2(0.03*fs)));
inc=floor(n/2);
f = enframe(x,hamming(n),inc);
cepFeatures = cepstralFeatureExtractor('SampleRate',fs,'LogEnergy','Replace');
[coeffs, delta, deltaDelta]= cepFeatures(f(1,:)');
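The properties include FilterBankNormalization, with the options Area, Bandwidth (default) and None, which controls how the filters in the bank are weighted.
Part of the source of the cepstralFeatureExtractor class: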
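The framing step in the snippet above relies on Voicebox's enframe. A minimal base-MATLAB sketch of the same idea follows, storing one windowed frame per column, which is the layout the extractor expects for a single frame. The synthetic signal, the sample rate and the variable names are assumptions; enframe itself returns one frame per row.

```matlab
% Sketch: split a signal into overlapping Hamming-windowed frames (one per column).
fs  = 16000;                              % assumed sample rate in Hz
x   = sin(2*pi*440*(0:fs-1)'/fs);         % one second of a 440 Hz tone (stand-in signal)
n   = pow2(floor(log2(0.03*fs)));         % frame length, as in the snippet above
inc = floor(n/2);                         % frame shift

numFrames = floor((numel(x) - n)/inc) + 1;        % complete frames only
idx    = (0:n-1)' + (0:numFrames-1)*inc + 1;      % n-by-numFrames sample indices (R2016b+)
win    = 0.54 - 0.46*cos(2*pi*(0:n-1)'/(n-1));    % symmetric Hamming window
frames = x(idx) .* win;                           % column k holds windowed frame k
```

A single frame can then be passed on as `frames(:,k)`, which plays the same role as `f(1,:)'` in the snippet above.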
classdef (StrictDefaults)cepstralFeatureExtractor < dsp.private.SampleRateEngine
%cepstralFeatureExtractor Cepstral Feature Extractor
% cepFeatures = cepstralFeatureExtractor returns a System object,
% cepFeatures, that calculates cepstral features. Columns of the input
% are treated as individual channels.
%
% cepFeatures = cepstralFeatureExtractor('Name',Value, ...) returns a
% cepstralFeatureExtractor System object, cepFeatures, with each
% specified property name set to the specified value. You can specify
% additional name-value pair arguments in any order as
% (Name1,Value1,...NameN,ValueN).
%
% step method syntax:
%
% [COEFFS,DELTA,DELTADELTA] = step(cepFeatures,X) returns the cepstral
% coefficients, the delta, and the delta-delta. The log energy is also
% returned in the COEFFS output based on the LogEnergy property. The
% DELTA and DELTADELTA are initialized as zero-vectors. X must be a
% real-valued, double-precision or single-precision matrix. Each column
% of X is treated as an independent channel.
%
% System objects may be called directly like a function instead of using
% the step method. For example, y = step(obj,x) and y = obj(x) are
% equivalent.
% (The object can be called like a function, so step(obj,x) and obj(x) do the same thing.)
%
% cepstralFeatureExtractor methods:
% step - See above description for use of this method
% release - Allow property values and input characteristics to change
% clone - Create cepstralFeatureExtractor object with same property
% values
% isLocked - Locked status (logical)
% <a href="matlab:help matlab.System/reset ">reset</a> - Reset the internal states to initial conditions
% getFilters - Get filterbank used to calculate the cepstral
% coefficients
%
% cepstralFeatureExtractor properties:
% FilterBank - Filter bank ('Mel'/'Gammatone')
% InputDomain - Domain of input signal
% NumCoeffs - Number of coefficients to return
% FFTLength - FFT length
% LogEnergy - Log energy usage ('Append'/'Replace'/'Ignore')
% SampleRate - Sample rate (Hz)
%
% Advanced properties:
% BandEdges - Band edges of mel filter bank (Hz)
% FilterBankNormalization - Normalize filter bank
% FilterBankDesignDomain - Domain for mel filter bank design
% FrequencyRange - Gammatone filter bank frequency range
%#codegen
properties
%SampleRate Input sample rate (Hz)
% Specify the sampling rate of the input in Hertz as a real, finite
% numeric scalar. The default is 16000 Hz. This property is
% tunable.
SampleRate = 16000;
end
properties (Constant, Hidden)
% SampleRateSet is used to setup the choices for SampleRate
SampleRateSet = matlab.system.SourceSet({'PropertyOrMethod', ...
'SystemBlock', 'InheritSampleRate', 'getInheritedSampleRate',true});
end
properties (Nontunable)
%BandEdges Band edges of Mel filter bank (Hz)
% Specify the band edges of the mel filter bank as a monotonically
% increasing vector in the range [0,fs/2]. The number of band edges
% must be in the range [4,160]. The default band edges are spaced
% linearly for the first ten and then logarithmically thereafter.
% This property applies when FilterBank is 'Mel'.
% (the BandEdges property only has an effect when FilterBank is 'Mel')
BandEdges = cepstralFeatureExtractor.getDefaultBandEdges();
%FFTLength FFT length (defaults to the number of input rows, so frame the signal first)
FFTLength = [];
%NumCoeffs Number of coefficients to return (default 13)
NumCoeffs = 13;
%InputDomain Domain of the input signal (default: time domain)
InputDomain = 'Time';
%FilterBankNormalization Filter bank normalization (default: weight the filters by bandwidth)
FilterBankNormalization = 'Bandwidth';
%LogEnergy Log energy usage (default: the log energy is appended)
LogEnergy = 'Append';
LogEnergy = 'Append';
end
---------------------------------------------------------(omitted)-----------------------------------------------------------
end
- mfcc
Provided by Audio Toolbox. Its lowest filter frequency is not 0, and internally it uses cepstralFeatureExtractor.
[audioIn,fs] = audioread('Counting-16-44p1-mono-15secs.wav');
[coeffs,delta,deltaDelta,loc] = mfcc(audioIn,fs);
function varargout = mfcc(x, fs, varargin)
%MFCC Extract the mfcc, log-energy, delta, and delta-delta of audio signal
% coeffs = MFCC(audioIn,fs) returns the mel-frequency cepstral
% coefficients over time for the audio input. Columns of the input are
% treated as individual channels. coeffs is returned as an L-by-M-by-N
% array.
% L - Number of frames the audio signal is partitioned into.
% This is determined by the WINDOWLENGTH and OVERLAPLENGTH
% properties.
% M - Number coefficients returned per frame.
% This is determined by the NUMCOEFFS property.
% N - Number of channels.
%
% 'WindowLength' defaults to round(0.030 * fs).
% 'OverlapLength' defaults to round(fs*0.02).
% 'NumCoeffs' If not specified, the number of coefficients is 13.
% 'FFTLength' By default, the FFT length is set to the WINDOWLENGTH.
% 'DeltaWindowLength' The default is 2.
% coeffs = MFCC(...,'LogEnergy',LOGENERGY) specifies if and how the log
% energy is used. Specify log energy as a character vector:
% 'Append' - Adds the log-energy as the first element of the
% returned coefficients vector. This is the default
% setting.
% 'Replace' - Replaces the zeroth coefficient (first element of
% coeffs) with the log-energy.
% 'Ignore' - Ignores and does not return the log-energy.
% ===================== Validate the input data =====================
validateRequiredInputs(x, fs)
params = audio.internal.MFCCValidator(fs,size(x,1),varargin{:}); % parse the name-value arguments and fill in defaults
hopLength = params.WindowLength - params.OverlapLength; % frame shift
% ================= Create the MFCC extraction object =================
mfccObject = cepstralFeatureExtractor( ...
'SampleRate', fs, ...
'FFTLength', params.FFTLength, ...
'NumCoeffs', params.NumCoeffs, ...
'LogEnergy', params.LogEnergy);
% === Check that the requested number of coefficients does not exceed the number of valid bands ===
numValidBands = sum(mfccObject.BandEdges <= floor(fs/2)) - 2;
coder.internal.errorIf(numValidBands < params.NumCoeffs, ...
'audio:mfcc:BadNumCoeffs', ...
numValidBands,fs);
% ========================= Compute the MFCCs =========================
[nRow,nChan] = size(x); % usually a single channel: audioread returns one column for mono audio
N = params.WindowLength;
numHops = floor((nRow-N)/hopLength) + 1;
y = audio.internal.buffer(x,N,hopLength);
c = mfccObject(y); % mfccObject is a cepstralFeatureExtractor, so the computation is the same as calling that class directly
c2 = reshape(c , size(c,1) , size(c,2)/nChan , nChan );
coeffs = permute(c2 , [2 1 3]); % swap dimensions 1 and 2, because cepstralFeatureExtractor returns the features as columns
varargout{1} = coeffs;
%==================== First-order differences (delta) ====================
if nargout > 1
delta = audio.internal.computeDelta(coeffs,params.DeltaWindowLength);
varargout{2} = delta;
end
% ================ Second-order differences (delta-delta) ================
if nargout > 2
deltaDelta = audio.internal.computeDelta(delta,params.DeltaWindowLength);
varargout{3} = deltaDelta;
end
% -------------------------------------------------------------------------
% Output sample stamp -----------------------------------------------------
if nargout > 3
varargout{4} = ...
cast(((0:(numHops-1))*hopLength + params.WindowLength)','like',x);
end
end
% -------------------------------------------------------------------------
% Validate required inputs
% -------------------------------------------------------------------------
function validateRequiredInputs(x,fs)
validateattributes(x,{'single','double'},...
{'nonempty','2d','real'}, ...
'mfcc','audioIn')
validateattributes(fs,{'single','double'}, ...
{'nonempty','positive','real','scalar','nonnan','finite'}, ...
'mfcc','fs');
end
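By default there are 40 filters and 14 values per frame (the log energy plus 13 coefficients), which corresponds to 'E0' in melcepst, except that the melcepst filterbank starts at 0 Hz. The first row of delta and of deltaDelta is all zeros. According to the source above, loc is the sample index at which each frame ends (a multiple of hopLength plus WindowLength), not the index at which it starts.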
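The frame count and the loc output can be reproduced from the formulas in the mfcc listing. The sketch below assumes a 15-second mono signal at 44.1 kHz (suggested by the example file name, but not verified) and the default window and overlap lengths quoted in the help text.

```matlab
% Sketch: frame count and loc as computed in the mfcc listing above.
fs   = 44100;                       % assumed sample rate in Hz
nRow = 15*fs;                       % assumed signal length: 15 seconds

winLen  = round(0.030*fs);          % default WindowLength
overlap = round(0.020*fs);          % default OverlapLength
hop     = winLen - overlap;         % frame shift in samples

numHops = floor((nRow - winLen)/hop) + 1;     % number of complete frames
loc     = ((0:numHops-1)*hop + winLen)';      % last sample index of each frame
```

The first entry of `loc` equals the window length, which is the index of the last sample of the first frame rather than of its first sample.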
How can the first row of delta and deltaDelta be made non-zero? Set the DeltaWindowLength parameter.
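For comparison, the melcepst listing earlier avoids all-zero edge rows by repeating the first and last frames before filtering. The sketch below reproduces that 'd' branch on hypothetical data, filtering each column separately instead of flattening the matrix. It illustrates the Voicebox approach only; the internals of audio.internal.computeDelta are not shown in this document and are not modelled here.

```matlab
% Sketch: delta coefficients with edge replication, after the melcepst 'd' branch.
c  = [(1:10)', 2*(1:10)'];        % hypothetical features: 10 frames, 2 coefficients
nf = size(c,1);                   % number of frames

vf  = (4:-1:-4)/60;               % 9-tap regression filter from the listing
pad = ones(4,1);                  % index vector used to repeat a row 4 times
cx  = [c(pad,:); c; c(nf*pad,:)]; % repeat the first and the last frame 4 times each

vx = filter(vf, 1, cx);           % filter along each column
vx(1:8,:) = [];                   % discard the 8 start-up rows of the filter
```

For a coefficient that grows linearly over time, the interior rows of `vx` equal its slope, and the first row is reduced by the padding but is not forced to zero.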
The filterbank design shows that each filter starts at the midpoint of the previous filter's band.
- HelperComputePitchAndMFCC
The source code shows that it uses the mfcc function.
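- melSpectrogram
The first dimension of the output is the number of bandpass filters in the filterbank (32 by default); the second dimension is the number of frames in the spectrogram.
It cannot compute differences and is only a small variant of spectrogram. With 40 filters its output is close to that of mfcc, except that it needs to be transposed.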
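The layout difference can be sketched as follows: a mel spectrogram has one row per band and one column per frame, while cepstral coefficients are usually stored with one row per frame. The stand-in matrix `S`, the natural logarithm and the orthonormal DCT-II are assumptions; mfcc may use a different log base, scaling and band edges, so the numbers need not match its output.

```matlab
% Sketch: from a bands-by-frames mel spectrogram to frames-by-coefficients cepstra.
numBands  = 40;                                   % assumed number of mel bands
numFrames = 6;                                    % assumed number of frames
S = 1 + abs(sin((1:numBands)'*(1:numFrames)));    % stand-in for a mel spectrogram

k = (0:numBands-1)';                              % DCT-II basis with orthonormal scaling
D = sqrt(2/numBands)*cos(pi*k*(2*(0:numBands-1)+1)/(2*numBands));
D(1,:) = D(1,:)/sqrt(2);

cep    = D*log(S);                                % DCT along the band dimension
coeffs = cep(1:13,:).';                           % keep 13 coefficients, one frame per row
```

The final transpose is the step the text refers to: the result has frames along the rows, matching the layout of the mfcc output.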
Comparison of the implementations
Implementation | MFCC | Spectrogram |
---|---|---|
mfcc | ||
cepstralFeatureExtractor | ||
melcepst | ||
melSpectrogram | ||
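Conclusion
cepstralFeatureExtractor and mfcc use essentially the same algorithm; the difference is that cepstralFeatureExtractor is fed one frame at a time. The values in the second dimension from melcepst differ from theirs by orders of magnitude, which for now is attributed to filterbank normalization. In mfcc the log energy is appended as an extra coefficient by default. The MATLAB defaults are assumed to be sensible choices, so they are kept for now. melSpectrogram defaults to 32 filters while mfcc defaults to 40, and melSpectrogram cannot compute differences, so overall mfcc is the better fit for future computations.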
https://www.jianshu.com/p/1c2742096382