Reposted from: https://blog.csdn.net/qq_18124075/article/details/78867536
Speaker Recognition
Here, the author works through two speaker-recognition baseline models using the MSR Identity Toolbox for MATLAB.
1. GMM-UBM Speaker Recognition
```matlab
nSpeakers = 20;     % Number of speakers
nDims = 13;         % dimensionality of feature vectors
nMixtures = 32;     % How many mixtures used to generate data
nChannels = 10;     % Number of channels (sessions) per speaker
nFrames = 100;      % Frames per speaker (1 second at 100 frames/s)
nWorkers = 1;       % Number of parfor workers, if available
```
For convenience, instead of a standard speech corpus such as TIMIT, random multi-channel data is generated directly (10 channels per speaker). trainSpeakerData and testSpeakerData are 20×10 cell arrays: 20 speakers by 10 channels, with the training and test sets matched speaker for speaker. Each cell holds a 13×100 matrix, where 13 is the per-frame feature dimensionality and 100 is the number of frames; in a real system the framed speech would first pass through MFCC feature extraction.
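A quick sanity check of that layout (the sizes follow directly from the configuration above; this snippet is illustrative and not part of the original demo):

```matlab
% Inspect the synthetic data layout described above.
size(trainSpeakerData)        % -> [20 10]  (nSpeakers x nChannels)
size(trainSpeakerData{1,1})   % -> [13 100] (nDims x nFrames)
```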
```matlab
% Pick random centers for all the mixtures.
mixtureVariance = .10;
channelVariance = .05;   % (set in the original demo but unused below; channelCenters is scaled by 0.1 directly)
mixtureCenters = randn(nDims, nMixtures, nSpeakers);
channelCenters = randn(nDims, nMixtures, nSpeakers, nChannels)*.1;
trainSpeakerData = cell(nSpeakers, nChannels);
testSpeakerData = cell(nSpeakers, nChannels);
speakerID = zeros(nSpeakers, nChannels);

% Create the random data. Both training and testing data have the same
% layout.
disp('Create the random data');
for s=1:nSpeakers
    trainSpeechData = zeros(nDims, nFrames);   % nDims x nFrames (the original demo used nMixtures here and relied on MATLAB growing the matrix)
    testSpeechData = zeros(nDims, nFrames);
    for c=1:nChannels
        for m=1:nMixtures
            % Create data from mixture m for speaker s
            frameIndices = m:nMixtures:nFrames;
            nMixFrames = length(frameIndices);
            trainSpeechData(:,frameIndices) = ...
                randn(nDims, nMixFrames)*sqrt(mixtureVariance) + ...
                repmat(mixtureCenters(:,m,s),1,nMixFrames) + ...
                repmat(channelCenters(:,m,s,c),1,nMixFrames);
            testSpeechData(:,frameIndices) = ...
                randn(nDims, nMixFrames)*sqrt(mixtureVariance) + ...
                repmat(mixtureCenters(:,m,s),1,nMixFrames) + ...
                repmat(channelCenters(:,m,s,c),1,nMixFrames);
        end
        trainSpeakerData{s, c} = trainSpeechData;
        testSpeakerData{s, c} = testSpeechData;
        speakerID(s,c) = s; % Keep track of who this is
    end
end
```
```matlab
% Step1: Create the universal background model from all the training speaker data
disp('Create the universal background model');
nmix = nMixtures;       % In this case, we know the # of mixtures needed
final_niter = 10;
ds_factor = 1;
ubm = gmm_em(trainSpeakerData(:), nmix, final_niter, ds_factor, nWorkers);
```
Step 2 builds each speaker's acoustic model from the UBM by maximum a posteriori (MAP) adaptation. The strategy is to measure how well the target speaker's training vectors in trainSpeakerData fit the UBM obtained in Step 1, and to shift each Gaussian component of the UBM toward those vectors, producing the target speaker's model. EM-style re-estimation formulas then yield the optimal parameters of each adapted model; the standard mean update is sketched below.
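For reference, the classic MAP mean update for this kind of adaptation (the standard GMM-UBM formulation; the symbols are mine, not toolbox identifiers): for mixture $k$ with occupation count $n_k$ and posterior-weighted data mean $E_k(x)$ over the speaker's frames,

$$
\hat{\mu}_k = \alpha_k\, E_k(x) + (1-\alpha_k)\,\mu_k, \qquad \alpha_k = \frac{n_k}{n_k + \tau},
$$

where $\mu_k$ is the UBM mean and $\tau$ is the relevance factor, the `map_tau = 10.0` in the code below.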
```matlab
% Step2: Now adapt the UBM to each speaker to create GMM speaker model.
disp('Adapt the UBM to each speaker');
map_tau = 10.0;     % relevance factor
config = 'mwv';     % adapt means (m), weights (w), and variances (v)
gmm = cell(nSpeakers, 1);
for s=1:nSpeakers
    disp(['for the ',num2str(s),' speaker...']);
    gmm{s} = mapAdapt(trainSpeakerData(s, :), ubm, map_tau, config);
end
```
Step 3 computes a score for every speaker model. Unlike speaker identification, speaker verification asks whether a given test utterance (here drawn from testSpeakerData) was spoken by a claimed target speaker; this experiment enrolls 20 speakers. If the test utterance and the target model come from the same speaker, the trial is a target test; if they come from different speakers, it is a non-target test. Each trial is scored with the likelihood ratio of the adapted speaker model against the UBM.
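Concretely, for test frames $X = \{x_1,\dots,x_T\}$ the standard GMM-UBM verification score is the average log-likelihood ratio (notation mine, not from the code):

$$
\Lambda(X) = \frac{1}{T}\sum_{t=1}^{T}\Big[\log p(x_t \mid \lambda_{\text{spk}}) - \log p(x_t \mid \lambda_{\text{UBM}})\Big].
$$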
```matlab
% Step3: Now calculate the score for each model versus each speaker's data.
% Generate a list that tests each model (first column) against all the
% testSpeakerData.
trials = zeros(nSpeakers*nChannels*nSpeakers, 2);
answers = zeros(nSpeakers*nChannels*nSpeakers, 1);
for ix = 1 : nSpeakers
    b = (ix-1)*nSpeakers*nChannels + 1;
    e = b + nSpeakers*nChannels - 1;
    trials(b:e, :) = [ix * ones(nSpeakers*nChannels, 1), (1:nSpeakers*nChannels)'];
    % Mark the nChannels target trials (test utterances from speaker ix).
    answers((ix-1)*nChannels+b : (ix-1)*nChannels+b+nChannels-1) = 1;
end
disp('Calculate the score for each model vs test speaker');
gmmScores = score_gmm_trials(gmm, reshape(testSpeakerData', nSpeakers*nChannels,1), trials, ubm);
```
Finally, compute the AUC and EER metrics. In an open-set speaker identification system, the test utterance's score must be compared against a threshold to decide whether the speaker is outside the enrolled set. In a speaker verification system, the test score is likewise judged against a threshold: if it exceeds the threshold, the claimed target speaker is accepted; otherwise the utterance is attributed to an impostor. The choice of threshold therefore directly affects system performance, and in practical systems it has drawn wide attention, with many effective threshold-selection methods proposed; among the most common is the equal error rate (EER) operating point, where the false-acceptance and false-rejection rates are equal. The author also adds AUC, which makes comparison with deep-learning methods convenient.
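For intuition, here is a minimal, self-contained way to compute the EER directly from scores and labels. It is an illustrative re-implementation, not the toolbox's compute_eer, and the function name is hypothetical:

```matlab
function eer = simple_eer(scores, labels)
% SIMPLE_EER  Equal error rate from trial scores.
%   scores: N x 1 trial scores; labels: N x 1 (1 = target, 0 = non-target).
%   Sweeps a threshold over the observed scores and returns the point
%   where false-acceptance and false-rejection rates (nearly) cross.
thresholds = sort(scores);
far = zeros(size(thresholds));   % false-acceptance rate per threshold
frr = zeros(size(thresholds));   % false-rejection rate per threshold
for i = 1:numel(thresholds)
    accept = scores >= thresholds(i);
    far(i) = sum(accept  & labels==0) / sum(labels==0);
    frr(i) = sum(~accept & labels==1) / sum(labels==1);
end
[~, idx] = min(abs(far - frr));
eer = (far(idx) + frr(idx)) / 2;
end
```

Called as `simple_eer(gmmScores, answers)`, it should land close to the EER reported by `compute_eer` below.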
```matlab
% Step4: Now compute the EER and plot the DET curve and confusion matrix
imagesc(reshape(gmmScores, nSpeakers*nChannels, nSpeakers))
title('Speaker Verification Likelihood (GMM Model)');
ylabel('Test # (Channel x Speaker)'); xlabel('Model #');
colorbar; drawnow; axis xy
figure
disp('Compute the EER');
[eer, auc] = compute_eer(gmmScores, answers, true);
```
2. i-vector-based GMM-UBM Speaker Recognition

The same synthetic data and UBM recipe are reused; on top of them, this part computes Baum-Welch statistics, trains a total-variability space, extracts i-vectors, and applies LDA followed by Gaussian PLDA.
```matlab
% Step1: Create the universal background model from all the training speaker data
nmix = nMixtures;       % In this case, we know the # of mixtures needed
final_niter = 10;
ds_factor = 1;
ubm = gmm_em(trainSpeakerData(:), nmix, final_niter, ds_factor, nWorkers);
%%
% Step2.1: Calculate the statistics needed for the iVector model.
stats = cell(nSpeakers, nChannels);
for s=1:nSpeakers
    for c=1:nChannels
        [N, F] = compute_bw_stats(trainSpeakerData{s,c}, ubm);
        stats{s,c} = [N; F];   % zeroth- and first-order Baum-Welch statistics
    end
end
```
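Steps 2.2 onward rest on the total-variability model (the standard i-vector formulation, stated here for context): each utterance's GMM mean supervector is modeled as

$$
M = m + T\,w,
$$

where $m$ is the UBM mean supervector, $T$ is a low-rank total-variability matrix (`tvDim = 100` columns below), and $w$ is the utterance's i-vector, given a standard-normal prior.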
```matlab
% Step2.2: Learn the total variability subspace from all the speaker data.
tvDim = 100;
niter = 5;
T = train_tv_space(stats(:), ubm, tvDim, niter, nWorkers);
%
% Now compute the ivectors for each speaker and channel. The result is size
% tvDim x nSpeakers x nChannels.
devIVs = zeros(tvDim, nSpeakers, nChannels);
for s=1:nSpeakers
    for c=1:nChannels
        devIVs(:, s, c) = extract_ivector(stats{s, c}, ubm, T);
    end
end
```
```matlab
%%
% Step3.1: Now do LDA on the iVectors to find the dimensions that matter.
ldaDim = min(100, nSpeakers-1);   % LDA yields at most nSpeakers-1 dimensions
devIVbySpeaker = reshape(devIVs, tvDim, nSpeakers*nChannels);
[V, D] = lda(devIVbySpeaker, speakerID(:));
finalDevIVs = V(:, 1:ldaDim)' * devIVbySpeaker;
% Step3.2: Now train a Gaussian PLDA model with development i-vectors.
nphi = ldaDim;      % should be <= ldaDim
niter = 10;
pLDA = gplda_em(finalDevIVs, speakerID(:), nphi, niter);
```
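For context, the Gaussian PLDA model fitted by `gplda_em` is conventionally written (symbols mine, not toolbox identifiers) as

$$
w = \mu + \Phi\,\beta + \varepsilon, \qquad \beta \sim \mathcal{N}(0, I),\quad \varepsilon \sim \mathcal{N}(0, \Sigma),
$$

with $\Phi$ an `nphi`-column speaker subspace, $\beta$ a latent speaker factor shared by all of a speaker's i-vectors, and $\varepsilon$ a residual term covering channel variability.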
```matlab
%%
% Step4.1: OK now we have the channel and LDA models. Let's build actual speaker
% models. Normally we do that with new enrollment data, but now we'll just
% reuse the development set.
averageIVs = mean(devIVs, 3);           % Average i-vectors across channels.
modelIVs = V(:, 1:ldaDim)' * averageIVs;
% Step4.2: Now compute the ivectors for the test set,
% and score the utterances against the models.
testIVs = zeros(tvDim, nSpeakers, nChannels);
for s=1:nSpeakers
    for c=1:nChannels
        [N, F] = compute_bw_stats(testSpeakerData{s, c}, ubm);
        testIVs(:, s, c) = extract_ivector([N; F], ubm, T);
    end
end
testIVbySpeaker = reshape(permute(testIVs, [1 3 2]), ...
    tvDim, nSpeakers*nChannels);
finalTestIVs = V(:, 1:ldaDim)' * testIVbySpeaker;
```
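The excerpt ends before the scoring step. Following the structure of Step 4 in Part 1, a plausible completion scores every model against every test i-vector with the toolbox's Gaussian PLDA scorer; a sketch under that assumption:

```matlab
% Step5 (sketch): score all model/test i-vector pairs with Gaussian PLDA,
% then evaluate as in Part 1. Assumes score_gplda_trials from the same
% toolbox; 'answers' is the trial-label vector built in Part 1.
ivScores = score_gplda_trials(pLDA, modelIVs, finalTestIVs);  % nSpeakers x (nSpeakers*nChannels)
imagesc(ivScores')
title('Speaker Verification Likelihood (iVector Model)');
ylabel('Test # (Channel x Speaker)'); xlabel('Model #');
% Stack tests for model 1, then model 2, ... to match the ordering of 'answers'.
ivScores = reshape(ivScores', nSpeakers*nChannels*nSpeakers, 1);
[eer, auc] = compute_eer(ivScores, answers, true);
```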