Several Ways to Compute MFCCs in MATLAB (Repost)

Related functions

melbankm, mfcc_m, melcepst, cepstralFeatureExtractor, mfcc, HelperComputePitchAndMFCC, melSpectrogram

Comparison and notes on each function

  • melbankm
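    Provided by Voicebox. It designs a bank of filters spaced evenly on the mel frequency scale. The function never touches the audio signal; it only designs the filterbank that is needed before computing MFCCs.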
function [x,mc,mn,mx]=melbankm(p,n,fs,fl,fh,w)
%MELBANKM determine matrix for a mel/erb/bark-spaced filterbank [X,MC,MN,MX]=(P,N,FS,FL,FH,W)
%
% Inputs:
%       p   number of filters in filterbank or the filter spacing in k-mel/bark/erb [ceil(4.6*log10(fs))]
%       n   length of fft
%       fs  sample rate in Hz
%       fl  low end of the lowest filter as a fraction of fs [default = 0]
%       fh  high end of highest filter as a fraction of fs [default = 0.5]
%       w   any sensible combination of the following:
%             Alternatives to the mel frequency scale:
%             'b' = bark scale instead of mel
%             'e' = erb-rate scale
%             'l' = log10 Hz frequency scale
%             'f' = linear frequency scale
%
%             'c' = fl/fh specify centre of low and high filters
%             'h' = fl/fh are in Hz instead of fractions of fs
%             'H' = fl/fh are in mel/erb/bark/log10
%
%             Filter shape:
%             't' = triangular shaped filters in mel/erb/bark domain (default)
%             'n' = hanning shaped filters in mel/erb/bark domain
%             'm' = hamming shaped filters in mel/erb/bark domain
%
%             'z' = highest and lowest filters taper down to zero [default]
%             'y' = lowest filter remains at 1 down to 0 frequency and
%                   highest filter remains at 1 up to nyquist frequency
%
%             'u' = scale filters to sum to unity
%
%             's' = single-sided: do not double filters to account for negative frequencies
%
%             Plot the filterbank response:
%             'g' = plot idealized filters [default if no output arguments present]
%
% Note that the filter shape (triangular, hamming etc) is defined in the mel (or erb etc) domain.
% Some people instead define an asymmetric triangular filter in the frequency domain.
%
%              If 'ty' or 'ny' is specified, the total power in the fft is preserved.
%
% Outputs:  x     a sparse matrix containing the filterbank amplitudes
%                 If the mn and mx outputs are given then size(x)=[p,mx-mn+1]
%                 otherwise size(x)=[p,1+floor(n/2)]
%                 Note that the peak filter values equal 2 to account for the power
%                 in the negative FFT frequencies.
%           mc    the filterbank centre frequencies in mel/erb/bark
%           mn    the lowest fft bin with a non-zero coefficient
%           mx    the highest fft bin with a non-zero coefficient
%                 Note: you must specify both or neither of mn and mx.
%
% ===================== Usage examples (MFCC pipeline) =====================
%
% (a) Calculate the Mel-frequency Cepstral Coefficients
%
%       f=rfft(s);                  % rfft() returns only 1+floor(n/2) coefficients (the non-negative frequencies)
%       x=melbankm(p,n,fs);         % n is the fft length, p is the number of filters wanted
%       z=log(x*abs(f).^2);         % multiply x by the power spectrum
%       c=dct(z);                   % take the DCT
%
% (b) Calculate the Mel-frequency Cepstral Coefficients efficiently
%
%       f=fft(s);                        % n is the fft length, p is the number of filters wanted
%       [x,mc,na,nb]=melbankm(p,n,fs);   % na:nb gives the fft bins that are needed
%       z=log(x*(f(na:nb).*conj(f(na:nb))));   % form the power spectrum first, then apply the filterbank
%
% (c) Plot the calculated filterbanks
%
%      plot((0:floor(n/2))*fs/n,melbankm(p,n,fs)')   % fs=sample frequency
%
% (d) Plot the idealized filterbanks (without output sampling)
%
%      melbankm(p,n,fs);

This function only designs the filterbank, which is one step of the MFCC pipeline.
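To make the filterbank idea concrete, here is a minimal sketch in base MATLAB of triangular filters spaced evenly on the mel scale. It is not Voicebox's implementation: the mel formula, the parameter values, and all variable names are assumptions chosen for illustration, and the peaks are 1 rather than the 2 that melbankm uses to account for negative frequencies.

```matlab
% Sketch only: evenly spaced triangular filters in the mel domain.
fs = 16000;                                % sample rate in Hz (assumed)
n  = 512;                                  % FFT length (assumed)
p  = 20;                                   % number of filters (assumed)

hz2mel = @(f) 2595*log10(1 + f/700);       % a common mel formula (assumption)

% p filters need p+2 edge points, evenly spaced in mel from 0 to fs/2.
edgesMel = linspace(hz2mel(0), hz2mel(fs/2), p+2);
binMel   = hz2mel((0:floor(n/2))*fs/n);    % mel value of every FFT bin

bank = zeros(p, numel(binMel));            % same shape as melbankm: [p, 1+floor(n/2)]
for k = 1:p
    lo  = edgesMel(k);                     % where filter k starts
    mid = edgesMel(k+1);                   % where filter k peaks
    hi  = edgesMel(k+2);                   % where filter k ends
    rising  = (binMel - lo)/(mid - lo);    % ramp from 0 at lo to 1 at mid
    falling = (hi - binMel)/(hi - mid);    % ramp from 1 at mid to 0 at hi
    bank(k,:) = max(0, min(rising, falling));
end
```

Multiplying `bank` by a power spectrum column of length `1+floor(n/2)` gives one energy value per filter, which is the input to the log and DCT steps.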

  • mfcc_m
    Provided in Song Zhiyong's book. It normalizes the mel filterbank coefficients and applies a normalized cepstral liftering window.
bank=melbankm(p,frameSize,fs,0,0.5,'m');
% Normalize the mel filterbank coefficients
bank=full(bank);
bank=bank/max(bank( : ));

% Normalized cepstral liftering window: emphasizes some of the MFCC coefficients

w = 1 + 6 * sin(pi * [1:p2] ./ p2);
w = w/max(w);

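Points that still need fixing:
It only computes first-order delta coefficients;
After the filters are chosen, you cannot keep just the band you want;
The normalization of the mel filterbank coefficients and of the liftering window still needs to be verified.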
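The liftering window from the snippet above can be examined on its own. The sketch below assumes `p2 = 12` coefficients (the snippet does not define `p2`) and applies the window to a placeholder vector; it only shows which coefficients the window emphasizes.

```matlab
p2 = 12;                               % number of cepstral coefficients (assumed)
w  = 1 + 6*sin(pi*(1:p2)/p2);          % raised-sine lifter, as in the snippet above
w  = w/max(w);                         % normalize so the largest weight is 1

c         = ones(1, p2);               % placeholder MFCC vector (hypothetical data)
cLiftered = c .* w;                    % weight each coefficient

[~, kMax] = max(w);                    % coefficient that receives the largest weight
```

With these values the middle coefficient (index 6) keeps full weight, while the last one is scaled down to about 1/7.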

  • melcepst
    Part of the Voicebox toolbox, although the official release no longer provides it. Internally it calls melbankm.
function [c,tc]=melcepst(s,fs,w,nc,p,n,inc,fl,fh)
%MELCEPST Calculate the mel cepstrum of a signal C=(S,FS,W,NC,P,N,INC,FL,FH)
%
%
% Simple use: (1) c=melcepst(s,fs)          % calculate mel cepstrum with 12 coefs, 256 sample frames
%             (2) c=melcepst(s,fs,'E0dD')   % include log energy, 0th cepstral coef, delta and delta-delta coefs
%
% Inputs:
%     s   speech signal
%     fs  sample rate in Hz (default 11025)
%     w   mode string (see below)
%     nc  number of cepstral coefficients excluding 0'th coefficient [default 12] (MFCC dimension)
%     p   number of filters in filterbank [default: floor(3*log(fs)) =  approx 2.1 per octave]
%     n   length of frame in samples [default power of 2 < (0.03*fs)]
%     inc frame increment [default n/2] (frame shift)
%     fl  low end of the lowest filter as a fraction of fs [default = 0]
%     fh  high end of highest filter as a fraction of fs [default = 0.5] (normalized by fs)
%
%     w   any sensible combination of the following:
%               Time-domain window:
%               'R'  rectangular window in time domain
%               'N'  Hanning window in time domain
%               'M'  Hamming window in time domain (default)
%
%               Frequency-domain filter shape:
%               't'  triangular shaped filters in mel domain (default)
%               'n'  hanning shaped filters in mel domain
%               'm'  hamming shaped filters in mel domain
%
%
%               'p'  filters act in the power domain
%               'a'  filters act in the absolute magnitude domain (default)
%
%               Optional additions to the 12 basic MFCCs:
%               '0'  include 0'th order cepstral coefficient
%               'E'  include log energy
%               'd'  include delta coefficients (dc/dt)
%               'D'  include delta-delta coefficients (d^2c/dt^2)
%
%               Filter frequency range:
%               'z'  highest and lowest filters taper down to zero (default)
%               'y'  lowest filter remains at 1 down to 0 frequency and
%                    highest filter remains at 1 up to nyquist frequency
%
%              If 'ty' or 'ny' is specified, the total power in the fft is preserved.
%
% Outputs:  c     mel cepstrum output: one frame per row. Log energy, if requested, is the
%                 first element of each row followed by the delta and then the delta-delta
%                 coefficients.
%           tc    fractional time in samples at the centre of each frame
%                 with the first sample being 1.

% ========================== Set default parameters ==========================
if nargin<2 fs=11025; end% default sample rate in Hz
if nargin<3 w='M'; end% Hamming window
if nargin<4 nc=12; end% number of MFCC coefficients
if nargin<5 p=floor(3*log(fs)); end% number of filters
if nargin<6 n=pow2(floor(log2(0.03*fs))); end% n is the frame length, also used as the FFT length
if nargin<9
   fh=0.5;% highest filter frequency, normalized by fs
   if nargin<8
     fl=0;% lowest filter frequency
     if nargin<7
        inc=floor(n/2);
     end
  end
end

if isempty(w)
   w='M';
end
if any(w=='R')
   [z,tc]=enframe(s,n,inc);
elseif any (w=='N')
   [z,tc]=enframe(s,hanning(n),inc);
else
   [z,tc]=enframe(s,hamming(n),inc);
end

% ============================ Core of the algorithm ============================
f=rfft(z.');
[m,a,b]=melbankm(p,n,fs,fl,fh,w);% m is the frequency response of the filterbank
pw=f(a:b,:).*conj(f(a:b,:));% power spectrum of each frame
pth=max(pw(:))*1E-20;
if any(w=='p')
   y=log(max(m*pw,pth));
else
   ath=sqrt(pth);
   y=log(max(m*abs(f(a:b,:)),ath));
end
c=rdct(y).';% DCT of the log filterbank outputs: p coefficients per frame, trimmed to nc+1 below

nf=size(c,1);
nc=nc+1;
if p>nc
   c(:,nc+1:end)=[];% more filters than coefficients needed: drop the higher-order coefficients
elseif p<nc
   c=[c zeros(nf,nc-p)];% fewer filters than coefficients needed: pad with zeros
end
if ~any(w=='0')
   c(:,1)=[];
   nc=nc-1;
end
if any(w=='E')
   c=[log(max(sum(pw),pth)).' c];
   nc=nc+1;
end

% ================== Compute first- and second-order deltas ==================
if any(w=='D')
  vf=(4:-1:-4)/60;
  af=(1:-1:-1)/2;
  ww=ones(5,1);
  cx=[c(ww,:); c; c(nf*ww,:)];
  vx=reshape(filter(vf,1,cx(:)),nf+10,nc);
  vx(1:8,:)=[];
  ax=reshape(filter(af,1,vx(:)),nf+2,nc);
  ax(1:2,:)=[];
  vx([1 nf+2],:)=[];
  if any(w=='d')
     c=[c vx ax];
  else
     c=[c ax];
  end
elseif any(w=='d')
  vf=(4:-1:-4)/60;
  ww=ones(4,1);
  cx=[c(ww,:); c; c(nf*ww,:)];
  vx=reshape(filter(vf,1,cx(:)),nf+8,nc);
  vx(1:8,:)=[];
  c=[c vx];
end
 
% ========= With no output arguments, plot the mel cepstrum as an image =========
if nargout<1
   [nf,nc]=size(c);
%    t=((0:nf-1)*inc+(n-1)/2)/fs;
   ci=(1:nc)-any(w=='0')-any(w=='E');
   imh = imagesc(tc/fs,ci,c.');
   axis('xy');
   xlabel('Time (s)');
   ylabel('Mel-cepstrum coefficient');
    map = (0:63)'/63;
    colormap([map map map]);
    colorbar;
end
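  1. By default melcepst returns 12 MFCCs, using a Hamming window in the time domain and triangular filters in the frequency domain. The lowest frequency is 0 and the highest is half the sample rate (the Nyquist limit). The frame shift is half the frame length, and the frame length is the largest power of 2 not exceeding 0.03*fs.
    E: include the log energy
    0: include the 0th cepstral coefficient
    d: include first-order deltas
    D: include second-order deltas (delta-delta)
  2. How melcepst handles the '0' option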
if ~any(w=='0')
   c(:,1)=[];
   nc=nc-1;
end

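If the 0th coefficient is not requested, the first column is deleted, leaving 13-1=12 coefficients. In other words, 13 coefficients are kept after the DCT, and by default the first one, the 0th cepstral coefficient, is dropped. This first coefficient is much larger than the other 12 because it is the DC term of the DCT, that is, it is proportional to the sum of the log filterbank outputs.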
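The role of the 0th coefficient can be checked with a few lines of base MATLAB. The sketch below uses an unnormalized DCT-II written out explicitly (Voicebox's rdct applies its own scaling, so absolute values will differ) and hypothetical filterbank outputs.

```matlab
p = 8;                                 % number of filters (assumed)
y = log([5 4 6 3 2 4 1 2])';           % hypothetical log filterbank outputs, p-by-1

% Unnormalized DCT-II: c(k+1) = sum over m of y(m)*cos(pi*k*(m-0.5)/p), k = 0..p-1
k = (0:p-1)';                          % cepstral index, as a column
m = 1:p;                               % filter index, as a row
D = cos(pi*k*(m - 0.5)/p);             % p-by-p DCT-II basis; first row is all ones
c = D*y;

c0 = c(1);                             % k = 0 term: the plain sum of the log outputs
```

Because the `k = 0` basis row is constant, `c0` simply accumulates every log filterbank output, which is why it tends to dominate the remaining coefficients.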

[Figure: default 12 coefficients]
[Figure: 13 coefficients after the DCT ('0')]

[Figure: 13 coefficients with 'E']
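The 'd' option in the melcepst listing computes deltas with the filter `vf=(4:-1:-4)/60`, which is a linear regression over 9 frames (60 = 2*(1+4+9+16)). The sketch below repeats that step for a single coefficient track; the ramp input is hypothetical test data chosen so that the expected slope is obvious.

```matlab
c  = (1:10)';                          % hypothetical coefficient track: unit-slope ramp
nf = numel(c);                         % number of frames
vf = (4:-1:-4)/60;                     % regression weights, as in melcepst

% Repeat the first and last frame 4 times so every frame has a full window.
cx = [repmat(c(1), 4, 1); c; repmat(c(end), 4, 1)];
vx = filter(vf, 1, cx);                % causal FIR filter; output lags by 4 frames
vx(1:8) = [];                          % drop 4 padding frames plus the 4-frame lag

% vx now holds one delta per original frame.
```

For frames whose 9-frame window lies entirely inside the ramp (frames 5 and 6 here) the delta is 1; near the edges the repeated padding pulls the estimate toward 0.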

  • cepstralFeatureExtractor
    Provided by Audio Toolbox. The audio must be framed first, with each frame passed in as a column, one frame at a time. As a result, delta and deltaDelta are both 0 for the first frame.
test = 'D:\DataBase\TIMIT\TRAIN\DR2\MARC0\SX108.WAV';
[x, fs] = audioread(test);
n=pow2(floor(log2(0.03*fs)));
inc=floor(n/2);
f = enframe(x,hamming(n),inc);
cepFeatures = cepstralFeatureExtractor('SampleRate',fs,'LogEnergy','Replace');
[coeffs, delta, deltaDelta]= cepFeatures(f(1,:)');
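The snippet above relies on Voicebox's enframe and hamming. As a rough, toolbox-free sketch of the same framing step, the code below splits a signal into Hamming-windowed frames stored one per column; the random signal and the 16 kHz sample rate are stand-ins, not values from the original post.

```matlab
fs  = 16000;                                   % sample rate in Hz (assumed)
x   = randn(fs, 1);                            % one second of noise as stand-in audio
n   = pow2(floor(log2(0.03*fs)));              % frame length, same rule as above
inc = floor(n/2);                              % frame shift: half a frame

win = 0.54 - 0.46*cos(2*pi*(0:n-1)'/(n-1));    % Hamming window written out by hand
numFrames = floor((numel(x) - n)/inc) + 1;     % frames that fit completely

frames = zeros(n, numFrames);                  % one frame per column
for k = 1:numFrames
    idx = (k-1)*inc + (1:n)';                  % sample indices of frame k
    frames(:,k) = x(idx).*win;
end
```

Each column `frames(:,k)` could then be passed to the extractor in turn, in the same way as `f(1,:)'` in the snippet above.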

The properties include FilterBankNormalization, with the options Area, Bandwidth (default), and None. It controls how the individual filters in the bank are weighted.

[Figure: filter normalization]
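As an illustration of what bandwidth and area weighting can mean for triangular filters, here is a small sketch. It is one plausible reading of the option names, not Audio Toolbox's actual code: the band edges, the 1 Hz grid, and the 2/bandwidth scaling are all assumptions.

```matlab
edges = [0 100 250 450 700];           % hypothetical band edges in Hz (3 filters)
f     = 0:1:700;                       % 1 Hz frequency grid

numFilt = numel(edges) - 2;
H = zeros(numFilt, numel(f));
for k = 1:numFilt
    lo = edges(k); mid = edges(k+1); hi = edges(k+2);
    H(k,:) = max(0, min((f-lo)/(mid-lo), (hi-f)/(hi-mid)));   % unit-peak triangle
end

bw    = (edges(3:end) - edges(1:end-2))';   % width of each filter in Hz
Hbw   = H .* (2./bw);                       % bandwidth-style: peak becomes 2/bandwidth
Harea = H ./ sum(H, 2);                     % area-style: samples of each filter sum to 1
```

In both variants wider filters get lower peaks, so high-frequency bands do not dominate simply because they cover more FFT bins.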

Part of the cepstralFeatureExtractor class source:

classdef (StrictDefaults)cepstralFeatureExtractor < dsp.private.SampleRateEngine
 %cepstralFeatureExtractor Cepstral Feature Extractor
 %   cepFeatures = cepstralFeatureExtractor returns a System object,
 %   cepFeatures, that calculates cepstral features. Columns of the input
 %   are treated as individual channels.
 %
 %   cepFeatures = cepstralFeatureExtractor('Name',Value, ...) returns a
 %   cepstralFeatureExtractor System object, cepFeatures, with each
 %   specified property name set to the specified value. You can specify
 %   additional name-value pair arguments in any order as
 %   (Name1,Value1,...NameN,ValueN).
 %
 %   step method syntax (the built-in step() function):
 %
 %   [COEFFS,DELTA,DELTADELTA] = step(cepFeatures,X) returns the cepstral
 %   coefficients, the delta, and the delta-delta. The log energy is also
 %   returned in the COEFFS output based on the LogEnergy property. The
 %   DELTA and DELTADELTA are initialized as zero-vectors. X must be a
 %   real-valued, double-precision or single-precision matrix. Each column
 %   of X is treated as an independent channel.
 %
 %   System objects may be called directly like a function instead of using
 %   the step method. For example, y = step(obj,x) and y = obj(x) are
 %   equivalent.
 %   (The object can be called directly as a function, so step() and obj() do the same thing.)
 %
 %   cepstralFeatureExtractor methods:
 %   step       - See above description for use of this method
 %   release    - Allow property values and input characteristics to change
 %   clone      - Create cepstralFeatureExtractor object with same property 
 %                values
 %   isLocked   - Locked status (logical)
 %   <a href="matlab:help matlab.System/reset   ">reset</a>      - Reset the internal states to initial conditions
 %   getFilters - Get filterbank used to calculate the cepstral 
 %                coefficients
 %
 %   cepstralFeatureExtractor properties:
 %   FilterBank  - Filter bank ('Mel'/'Gammatone')
 %   InputDomain - Domain of input signal
 %   NumCoeffs   - Number of coefficients to return
 %   FFTLength   - FFT length
 %   LogEnergy   - Log energy usage ('Append'/'Replace'/'Ignore')
 %   SampleRate  - Sample rate (Hz)
 %
 %   Advanced properties:
 %   BandEdges               - Band edges of mel filter bank (Hz)
 %   FilterBankNormalization - Normalize filter bank
 %   FilterBankDesignDomain  - Domain for mel filter bank design
 %   FrequencyRange          - Gammatone filter bank frequency range
    
    %#codegen
    properties
        %SampleRate Input sample rate (Hz)
        % Specify the sampling rate of the input in Hertz as a real, finite
        % numeric scalar. The default is 16000 Hz. This property is 
        % tunable.
        SampleRate = 16000;
    end
    
    properties (Constant, Hidden)
        % SampleRateSet is used to setup the choices for SampleRate
        SampleRateSet = matlab.system.SourceSet({'PropertyOrMethod', ...
            'SystemBlock', 'InheritSampleRate', 'getInheritedSampleRate',true});
    end
    
    properties (Nontunable)
        %BandEdges Band edges of Mel filter bank (Hz)
        % Specify the band edges of the mel filter bank as a monotonically
        % increasing vector in the range [0,fs/2]. The number of band edges
        % must be in the range [4,160]. The default band edges are spaced
        % linearly for the first ten and then logarithmically thereafter.
        % This property applies when FilterBank is 'Mel'.
        % BandEdges only takes effect when FilterBank is 'Mel'
        BandEdges = cepstralFeatureExtractor.getDefaultBandEdges();
        %FFTLength FFT length. Defaults to the number of input rows, so frame the signal first!
        FFTLength = [];
        %NumCoeffs Number of coefficients to return. Default is 13.
        NumCoeffs = 13;
        %InputDomain Domain of the input signal. Default is time-domain input.
        InputDomain = 'Time';
        %FilterBankNormalization Filter bank normalization. Default weights the filters by bandwidth.
        FilterBankNormalization = 'Bandwidth';
        %LogEnergy Log energy usage. By default the log energy is appended.
        LogEnergy = 'Append';
        LogEnergy = 'Append';
    end
--------------------------------------------------------------------------------------------------------------------
end
  • mfcc
    Provided by Audio Toolbox. Its lowest frequency is not 0, and internally it uses cepstralFeatureExtractor.
[audioIn,fs] = audioread('Counting-16-44p1-mono-15secs.wav');
[coeffs,delta,deltaDelta,loc] = mfcc(audioIn,fs);
function varargout = mfcc(x, fs, varargin)
%MFCC Extract the mfcc, log-energy, delta, and delta-delta of audio signal
%   coeffs = MFCC(audioIn,fs) returns the mel-frequency cepstral
%   coefficients over time for the audio input. Columns of the input are
%   treated as individual channels. coeffs is returned as an L-by-M-by-N
%   array.
%       L - Number of frames the audio signal is partitioned into.
%           This is determined by the WINDOWLENGTH and OVERLAPLENGTH 
%           properties.
%       M - Number coefficients returned per frame.
%           This is determined by the NUMCOEFFS property.
%       N - Number of channels.
%       
%   'WindowLength' defaults to round(0.030 * fs).
%   'OverlapLength' defaults to round(fs*0.02).
%   'NumCoeffs'  If not specified, the number of coefficients is 13.
%   'FFTLength'  By default, the FFT length is set to the WINDOWLENGTH.
%   'DeltaWindowLength' The default is 2.
%   coeffs = MFCC(...,'LogEnergy',LOGENERGY) specifies if and how the log
%   energy is used. Specify log energy as a character vector:
%       'Append'  - Adds the log-energy as the first element of the
%                   returned coefficients vector. This is the default
%                   setting.
%       'Replace' - Replaces the zeroth coefficient (first element of
%                   coeffs) with the log-energy.
%       'Ignore'  - Ignores and does not return the log-energy.

% ===================== Validate the input data format =====================
validateRequiredInputs(x, fs)

params =  audio.internal.MFCCValidator(fs,size(x,1),varargin{:});% fill in the default parameters

hopLength = params.WindowLength - params.OverlapLength;% frame shift (hop)

% ==================== Create the MFCC extraction object ====================
mfccObject = cepstralFeatureExtractor( ...
    'SampleRate',              fs, ...
    'FFTLength',               params.FFTLength, ...
    'NumCoeffs',               params.NumCoeffs, ...
    'LogEnergy',               params.LogEnergy);

% ===== Check that NumCoeffs does not exceed the number of valid filters =====
numValidBands = sum(mfccObject.BandEdges <= floor(fs/2)) - 2;
coder.internal.errorIf(numValidBands < params.NumCoeffs, ...
    'audio:mfcc:BadNumCoeffs', ...
    numValidBands,fs);

% ======================= Compute the MFCC features =======================
[nRow,nChan] = size(x);% usually single-channel: audioread returns one column for mono audio
N            = params.WindowLength;
numHops      = floor((nRow-N)/hopLength) + 1;

y            = audio.internal.buffer(x,N,hopLength);
c            = mfccObject(y);% mfccObject is a cepstralFeatureExtractor, so the computation is the same as above
c2           = reshape(c , size(c,1) , size(c,2)/nChan   , nChan );
coeffs       = permute(c2 , [2 1 3]);% swap dimensions 1 and 2, because cepstralFeatureExtractor returns features column-wise

varargout{1} = coeffs;

%========================= First-order delta ==========================
if nargout > 1
    delta        = audio.internal.computeDelta(coeffs,params.DeltaWindowLength);
    varargout{2} = delta;
end

% =================== Second-order delta (delta-delta) ===================
if nargout > 2
    deltaDelta   = audio.internal.computeDelta(delta,params.DeltaWindowLength);
    varargout{3} = deltaDelta;
end

% -------------------------------------------------------------------------
% Output sample stamp -----------------------------------------------------
if nargout > 3
    varargout{4} = ...
        cast(((0:(numHops-1))*hopLength + params.WindowLength)','like',x);
end

end

% -------------------------------------------------------------------------
% Validate required inputs
% -------------------------------------------------------------------------
function validateRequiredInputs(x,fs)
validateattributes(x,{'single','double'},...
    {'nonempty','2d','real'}, ...
    'mfcc','audioIn')
validateattributes(fs,{'single','double'}, ...
    {'nonempty','positive','real','scalar','nonnan','finite'}, ...
    'mfcc','fs');
end

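By default there are 40 filters and the output has 14 coefficients, equivalent to 'E0' in melcepst, except that melcepst's lowest frequency starts at 0 Hz. The first row of delta and deltaDelta is all zeros. loc is the index of the last sample of each frame, since the code computes (0:numHops-1)*hopLength + WindowLength.
How can the first row of delta and deltaDelta be made nonzero? Set the DeltaWindowLength parameter.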
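The framing arithmetic in the mfcc listing can be reproduced by hand. The sketch below plugs in the documented defaults (WindowLength = round(0.03*fs), OverlapLength = round(0.02*fs)) and assumes a 44.1 kHz, 15 second input, as the example file name suggests; the actual file length is not verified here.

```matlab
fs   = 44100;                               % sample rate in Hz (assumed from the file name)
nRow = 15*fs;                               % 15 seconds of samples (assumed length)

windowLength  = round(0.03*fs);             % default WindowLength
overlapLength = round(0.02*fs);             % default OverlapLength
hopLength     = windowLength - overlapLength;   % frame shift

% Same expressions as in the listing above.
numHops = floor((nRow - windowLength)/hopLength) + 1;
loc     = ((0:numHops-1)*hopLength + windowLength)';
```

Here `loc(1)` equals the window length and `loc(end)` equals the signal length, which confirms that each entry marks the last sample of its frame rather than the first.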

[Figure: how the deltas are computed]
[Figure: frequency bands of the 40 filters]

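The filterbank settings show that each filter starts at the centre of the previous filter's band, so neighbouring filters overlap by half.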
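This relationship follows directly from describing the filterbank as a list of band edges, where filter k spans edges k to k+2 and peaks at edge k+1. A tiny sketch with hypothetical edge values:

```matlab
edges   = [0 100 250 450 700 1000];    % hypothetical band edges in Hz
numFilt = numel(edges) - 2;            % N edges define N-2 filters

% One row per filter: [start, centre, end]
bands = [edges(1:numFilt)', edges(2:numFilt+1)', edges(3:numFilt+2)'];

% The start of filter k+1 is the centre of filter k.
startsMatchCentres = isequal(bands(2:end,1), bands(1:end-1,2));
```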

  • HelperComputePitchAndMFCC
    Inspecting the source shows that it uses the mfcc function.
    [Figure: HelperComputePitchAndMFCC]

  • melSpectrogram
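    The first dimension of the output is the number of bandpass filters in the filterbank (32 by default); the second is the number of frames in the spectrogram.
    It cannot compute deltas and is only a small offshoot of spectrogram. With 40 filters the result is close to that of mfcc, apart from needing a transpose.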
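The note about transposing comes from the different layouts: a mel spectrogram is bands-by-frames, while mfcc returns frames-by-coefficients. The sketch below shows how a mel spectrogram could be turned into cepstral coefficients with a log and a DCT. It uses random stand-in data and an unnormalized DCT-II, so it illustrates the shapes and the order of operations only, not Audio Toolbox's exact scaling.

```matlab
numBands  = 40;                        % number of filters (assumed, matching mfcc's default)
numFrames = 5;                         % number of frames (hypothetical)
S = rand(numBands, numFrames) + 0.1;   % stand-in for a mel spectrogram, bands-by-frames

numCoeffs = 13;                        % coefficients to keep
k = (0:numCoeffs-1)';                  % cepstral index, as a column
m = 1:numBands;                        % band index, as a row
D = cos(pi*k*(m - 0.5)/numBands);      % first 13 rows of an unnormalized DCT-II

C = (D*log(S)).';                      % transpose so each row is one frame, like mfcc
```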

Comparison of the implementations

[Table of figures: MFCC and spectrogram plots for mfcc, cepstralFeatureExtractor, melcepst, and melSpectrogram]

Conclusion

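cepstralFeatureExtractor and mfcc use essentially the same algorithm; the only difference is that cepstralFeatureExtractor works frame by frame. The second coefficient from melcepst differs from theirs by orders of magnitude, which for now is attributed to filterbank normalization. In mfcc the log energy is appended as an extra coefficient by default. MATLAB's defaults usually perform well, so they are kept for now. melSpectrogram defaults to 32 filters and mfcc to 40, and melSpectrogram cannot compute deltas, so overall mfcc is the better choice for future work.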

https://www.jianshu.com/p/1c2742096382
