A PCA Application Example

The detailed introduction to the PCA algorithm (http://blog.csdn.net/watkinsong/article/details/38536463) did not include a full code example for reasons of length, so this post provides one.

We look at how the PCA algorithm is used in practice by reducing the dimensionality of face images.

First, the dataset: we have 12 face images. 10 of them are used to train the PCA reduction matrix, and the remaining 2 are used for testing.

   

   

   

Important: every step of PCA training must use samples from the training set only.

All the code here is written in Octave and should behave the same way in MATLAB.


1. Load the images

%% Initialization
clear ; close all; clc

fprintf('this code will load 12 images and do PCA for each face.\n');
fprintf('10 images are used to train PCA and the other 2 images are used to test PCA.\n');

trainset = zeros(10, 32 * 32); % image size is : 32 * 32
m = 10; % number of samples

for i = 1 : m
	img = imread(strcat(int2str(i), '.bmp'));
	img = double(img);
	trainset(i, :) = img(:);
end

2. Normalize the feature vectors


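%% before training PCA, do feature normalization
mu = mean(trainset);                            % mean face, 1 x 1024
trainset_norm = bsxfun(@minus, trainset, mu);   % center every pixel

sigma = std(trainset_norm);                     % per-pixel standard deviation
sigma(sigma == 0) = 1;                          % constant pixels: avoid division by zero
trainset_norm = bsxfun(@rdivide, trainset_norm, sigma);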

3. During feature normalization, we need to save the normalization parameters so that they can be reused later.

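For example, we may need to save mu and sigma. Here mu is saved as the mean face. Because this example is small, sigma is not saved, and mu is saved only so that you can see what the mean face looks like. In a real project, PCA training and the later dimensionality reduction may run separately, in which case both normalization parameters must be saved.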

%% save the mean face mu as an image so we can take a look at it
imwrite(uint8(reshape(mu, 32, 32)), 'mf.bmp');
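
The point about keeping both normalization parameters can be sketched as follows. This is only an illustration, not the code used for this post: the file location, the random stand-in data and the variable names `param_file`, `params` and `new_sample` are all assumptions.

```octave
% Minimal sketch: persist mu and sigma, then reuse them on a new sample.
% Random data stands in for the real 32 x 32 face images (assumption).
trainset = rand(10, 32 * 32) * 255;

mu = mean(trainset);
sigma = std(trainset);
sigma(sigma == 0) = 1;                 % avoid dividing by zero for constant pixels

param_file = [tempname() '.mat'];      % hypothetical file location
save(param_file, 'mu', 'sigma');

% ... later, possibly in a different script ...
params = load(param_file);             % struct with fields mu and sigma
new_sample = rand(1, 32 * 32) * 255;   % stand-in for a new image vector
new_norm = (new_sample - params.mu) ./ params.sigma;

delete(param_file);                    % clean up the temporary file
```

Loading into a struct keeps the stored parameters from silently overwriting variables that already exist in the workspace.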

Here is the mean face generated from the 10 faces:


Rather ugly, isn't it? That is because there are too few faces. Now look at the mean face generated from 5000 face images:



4. Compute the dimensionality-reduction matrix


%% compute reduce matrix
X = trainset_norm; % just for convenience
[m, n] = size(X);

U = zeros(n);
S = zeros(n);

Cov = 1 / m * X' * X;
[U, S, V] = svd(Cov);
fprintf('compute cov done.\n');

5. View the eigenfaces


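When reducing face images, the eigenvectors in the reduction matrix U are also called eigenfaces. Each eigenvector in U corresponds to one direction of the reduced space, and U can be used to project features into that space.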


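Here we save the first few eigenvectors in U to look at the eigenfaces. The eigenvectors in U are sorted by eigenvalue in descending order, so the vectors that matter most for the reduction come first.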
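
The ordering described above can be checked directly on the output of svd. The sketch below uses random stand-in data with assumed sizes instead of the face images; `C`, `eigvals`, `is_sorted` and `orth_err` are names introduced only for this illustration.

```octave
% Sketch on stand-in data: inspect what svd returns for a covariance matrix.
X = randn(20, 6);                     % 20 samples, 6 features (assumed sizes)
X = bsxfun(@minus, X, mean(X));       % center each feature
C = (X' * X) / size(X, 1);            % covariance matrix, as in step 4
[U, S, V] = svd(C);

eigvals = diag(S);                            % eigenvalues of C
is_sorted = all(diff(eigvals) <= 0);          % true if they never increase
orth_err = norm(U' * U - eye(size(U, 2)));    % near 0 when columns are orthonormal
fprintf('sorted: %d, orthonormality error: %g\n', is_sorted, orth_err);
```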


       

The eigenfaces here look very much like the actual faces. That is because we have very few samples, only 10, so the resemblance is strong.


Addendum: here are a few eigenfaces obtained by training on 5000 face images:

     


6. Dimensionality reduction

%% dimension reduction
k = 100; % reduce to 100 dimension
test = zeros(2, 32 * 32);
for i = 1:2
	img = imread(strcat(int2str(i + 10), '.bmp'));
	img = double(img);
	test(i, :) = img(:);
end

% center the test set with the training mean face;
% only mu is subtracted here, so step 7 adds only mu back
test_norm = bsxfun(@minus, test, mu);

% reduction
Uk = U(:, 1:k);
Z = test_norm * Uk;
fprintf('reduce done.\n');
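
The value k = 100 above is chosen by hand. One common alternative, sketched here under assumptions, is to take the smallest k whose leading eigenvalues account for a target share of the total variance. The 0.99 threshold, the stand-in data and the names `eigvals` and `retained` are illustrative and not part of the original code.

```octave
% Sketch: choose the smallest k that keeps a target share of the variance.
% Stand-in data with clearly decreasing variance per feature (assumption).
X = randn(50, 8) * diag([10 5 2 1 0.5 0.2 0.1 0.05]);
X = bsxfun(@minus, X, mean(X));
[U, S, V] = svd((X' * X) / size(X, 1));

eigvals = diag(S);
retained = cumsum(eigvals) / sum(eigvals);   % share of variance kept by the first k components
k = find(retained >= 0.99, 1);               % first k reaching the assumed 99% target
fprintf('k = %d keeps %.2f%% of the variance\n', k, 100 * retained(k));
```

With the face data, the same computation would run on the S returned in step 4.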

7. Reconstruct the features (Reconstruction)


%% reconstruction
% for the test set images we only subtracted the mean face,
% so during reconstruction we need to add the mean face back
Xp = Z * Uk';
% imwrite fails if the output folder is missing, so create it first
if ~exist('./reconstruct', 'dir')
	mkdir('./reconstruct');
end
% save up to 5 reconstructed test faces
for i = 1:min(5, size(Xp, 1))
	face = Xp(i, :) + mu;
	face = reshape(face, 32, 32);
	imwrite(uint8(face), strcat('./reconstruct/', int2str(4000 + i), '.bmp'));
end

%% for the training set we subtracted the mean face and divided by the standard deviation during training,
% so during reconstruction we need to multiply by the standard deviation first
% and then add the mean face back
trainset_re = trainset_norm * Uk; % reduction
trainset_re = trainset_re * Uk'; % reconstruction
for i = 1:5
	train = trainset_re(i, :);
	train = train .* sigma;
	train = train + mu;
	train = reshape(train, 32, 32);
	imwrite(uint8(train), strcat('./reconstruct/', int2str(i), 'train.bmp'));
end



Note: 4000 training samples were used here, because with too few samples the reconstruction quality is very poor.

Here are the reconstruction results. The original image is on the left and the reconstructed image is on the right.

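For the test samples, the mean face was subtracted before the reduction, so the mean face has to be added back after reconstruction to get the actual reconstructed image.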

 

 

 

 


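For the training samples, both the mean face was subtracted and the result was divided by the standard deviation. After computing the reconstructed features, we therefore first multiply them element-wise by the standard deviation (the .* operator) and then add the mean face to obtain the actual reconstructed data.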
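
The multiply-then-add inverse applies to any sample that was standardized with both mu and sigma, including new data. The round trip below is a sketch on random stand-in data with assumed sizes, not the code that produced the images in this post. `new_x`, `new_rec`, `rel_err` and `full_rec` are names introduced for the illustration.

```octave
% Sketch: standardize unseen samples with the training mu and sigma,
% project onto k components, reconstruct, then undo the normalization.
train = rand(30, 16) * 255;           % stand-in training matrix (assumed sizes)
new_x = rand(3, 16) * 255;            % stand-in for unseen samples

mu = mean(train);
sigma = std(train);
sigma(sigma == 0) = 1;                % guard against constant features

train_norm = bsxfun(@rdivide, bsxfun(@minus, train, mu), sigma);
[U, S, V] = svd((train_norm' * train_norm) / size(train_norm, 1));

k = 8;
Uk = U(:, 1:k);
new_norm = bsxfun(@rdivide, bsxfun(@minus, new_x, mu), sigma);
Z = new_norm * Uk;                                             % reduction
new_rec = bsxfun(@plus, bsxfun(@times, Z * Uk', sigma), mu);   % .* sigma, then + mu

rel_err = norm(new_rec - new_x, 'fro') / norm(new_x, 'fro');
fprintf('relative reconstruction error with k = %d: %g\n', k, rel_err);

% keeping all components gives the input back, up to rounding
full_rec = bsxfun(@plus, bsxfun(@times, (new_norm * U) * U', sigma), mu);
```

The order of the inverse matters: multiplying by sigma has to happen before adding mu, mirroring the forward steps in reverse.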

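Here are a few reconstructions of training samples. Again, the original image is on the left and the reconstructed image is on the right.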

  

  

  

  



The complete code for the whole project:


%% Initialization
clear ; close all; clc

m = 4000; % number of training samples
fprintf('this code will load %d training images and compute PCA on them.\n', m);
fprintf('10 more images (4001 to 4010) are used to test PCA.\n');

trainset = zeros(m, 32 * 32); % image size is : 32 * 32

for i = 1 : m
	img = imread(strcat('./img/', int2str(i), '.bmp'));
	img = double(img);
	trainset(i, :) = img(:);
end


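%% before training PCA, do feature normalization
mu = mean(trainset);                            % mean face, 1 x 1024
trainset_norm = bsxfun(@minus, trainset, mu);   % center every pixel

sigma = std(trainset_norm);                     % per-pixel standard deviation
sigma(sigma == 0) = 1;                          % constant pixels: avoid division by zero
trainset_norm = bsxfun(@rdivide, trainset_norm, sigma);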

%% save the mean face mu as an image so we can take a look at it
imwrite(uint8(reshape(mu, 32, 32)), 'meanface.bmp');
fprintf('mean face saved. paused\n');
pause;

%% compute reduce matrix
X = trainset_norm; % just for convenience
[m, n] = size(X);

U = zeros(n);
S = zeros(n);

Cov = 1 / m * X' * X;
[U, S, V] = svd(Cov);
fprintf('compute cov done.\n');

%% save eigen face
for i = 1:10
	ef = U(:, i)';
	img = ef;
	minVal = min(img);
	img = img - minVal;
	max_val = max(abs(img));
	img = img / max_val;
	img = reshape(img, 32, 32);
	imwrite(img, strcat('eigenface', int2str(i), '.bmp'));
end

fprintf('eigen face saved, paused.\n');
pause;

%% dimension reduction
k = 100; % reduce to 100 dimension
test = zeros(10, 32 * 32);
for i = 4001:4010
	img = imread(strcat('./img/', int2str(i), '.bmp'));
	img = double(img);
	test(i - 4000, :) = img(:);
end

% center the test set with the training mean face (sigma is not applied to the test set here)
test = bsxfun(@minus, test, mu);

% reduction
Uk = U(:, 1:k);
Z = test * Uk;
fprintf('reduce done.\n');

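%% reconstruction
% for the test set images we only subtracted the mean face,
% so during reconstruction we need to add the mean face back
Xp = Z * Uk';
% imwrite fails if the output folder is missing, so create it first
if ~exist('./reconstruct', 'dir')
	mkdir('./reconstruct');
end
% save the first 5 reconstructed test faces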
for i = 1:5
	face = Xp(i, :) + mu;
	face = reshape((face), 32, 32);
	imwrite(uint8(face), strcat('./reconstruct/', int2str(4000 + i), '.bmp'));
end

%% for the training set we subtracted the mean face and divided by the standard deviation during training,
% so during reconstruction we need to multiply by the standard deviation first
% and then add the mean face back
trainset_re = trainset_norm * Uk; % reduction
trainset_re = trainset_re * Uk'; % reconstruction
for i = 1:5
	train = trainset_re(i, :);
	train = train .* sigma;
	train = train + mu;
	train = reshape(train, 32, 32);
	imwrite(uint8(train), strcat('./reconstruct/', int2str(i), 'train.bmp'));
end

fprintf('job done.\n');


