PCA and Whitening

1. PCA

PCA, Principal Component Analysis, is a common data-preprocessing method in statistical machine learning and data mining. It serves two main purposes: dimensionality reduction and data visualization. In real applications the dimensionality of the samples can be very large, sometimes far larger than the number of samples; the resulting models are then very complex, tend to overfit, train slowly, and consume a lot of memory. At the same time, some dimensions of real data are often linearly correlated or dominated by noise, so dimensionality reduction is worthwhile. (Note that PCA by itself is not very effective at preventing overfitting; regularization is normally used for that.) PCA is a linear dimensionality-reduction method: it finds a set of orthogonal coordinate directions along which the data vary the most, which can be regarded as the principal components of the samples, and projects the samples onto these directions, mapping them into a linear subspace. PCA keeps the principal components and discards the minor ones, which often correspond to redundant or noisy information. It can be viewed as a form of feature extraction: if the data have n dimensions, we can select the k projection directions with the largest variance and project the data onto the resulting subspace, obtaining a k-dimensional "compressed" representation of the original data. If we keep all n projection directions, we have merely chosen a different set of coordinate axes, rotated with respect to the original ones, to represent the same data.


PCA admits two interpretations: maximum variance and minimum squared error. The maximum-variance view looks for the directions along which the data have the largest variance. The minimum-squared-error view looks for projection directions such that the projected data stay as close as possible to the original data. This looks similar to linear regression, but the two are fundamentally different: linear regression is supervised, it uses x to predict y, its objective is to minimize the error in the predicted y, and that error is measured along the coordinate axis (the vertical distance from a sample point to the fitted line); PCA is unsupervised, its objective is to minimize the discrepancy between each sample and its projection, and the error is the perpendicular (Euclidean) distance from a sample point to its projected point.
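As a quick sketch of the maximum-variance view (the notation here is mine, since the original figures are not reproduced): for zero-mean samples x^{(i)}, i = 1, ..., m, the first principal direction solves

u_1 = \arg\max_{\|u\|=1} \frac{1}{m}\sum_{i=1}^{m}\left(u^{\top}x^{(i)}\right)^{2} = \arg\max_{\|u\|=1} u^{\top}\Sigma u,

where \Sigma is the covariance matrix defined below; the maximizer is the eigenvector of \Sigma with the largest eigenvalue, and the remaining principal directions are the other eigenvectors, taken in decreasing order of eigenvalue.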

[Figure: 2-D scatter of the data with the principal directions u1 and u2 drawn in.]

As shown in the figure above, u1 is the direction of largest variance in the data and u2 is the direction of second-largest variance. How, then, do we find these projection directions?


First, zero-mean the data, then compute the covariance matrix and its eigenvectors; the eigenvectors are the projection directions. Zero-meaning (also called mean normalization, or removing the DC component) means computing the mean of the samples and subtracting it from each sample, so that every dimension of the data has mean zero. The covariance matrix is a square matrix whose (i, j) entry is the covariance between dimension i and dimension j. Our goal is a representation in which the off-diagonal entries of the covariance matrix are zero, so that no two different dimensions are correlated. The covariance matrix computed from the raw data is generally not like this, so we look for projection directions such that, after projecting the data onto them, the recomputed covariance matrix has all off-diagonal entries equal to zero.

The covariance matrix is computed as

\Sigma = \frac{1}{m} \sum_{i=1}^{m} x^{(i)} \left(x^{(i)}\right)^{\top},

where m is the number of samples and the samples x^{(i)} have already been zero-meaned.

Rather than computing the eigendecomposition of the covariance matrix directly, we can use the singular value decomposition (SVD): the matrix U returned by the SVD of the covariance matrix contains its eigenvectors. U is the transformation (projection) matrix, i.e. the set of coordinate directions described above; each column of U is one projection direction, and U'x is the representation of x in the basis U. If we keep only the first k rows of U'x (equivalently, the first k columns of U), with k < n, then x is compressed to k dimensions.
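A minimal MATLAB/Octave sketch of the whole procedure (assuming the data matrix x holds one sample per column; essentially the same computations appear again in the exercise code below):

[n, m] = size(x);                    % n = dimensionality, m = number of samples
x = x - repmat(mean(x, 2), 1, m);    % zero-mean each dimension
sigma = x * x' / m;                  % covariance matrix of the zero-mean data
[U, S, V] = svd(sigma);              % columns of U: eigenvectors; diag(S): eigenvalues (descending)
xRot = U' * x;                       % data expressed in the eigenbasis (rotation only)
xTilde = U(:, 1:k)' * x;             % reduced representation: keep the top k components, for some chosen k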

In particular, for 2-D data the coordinates after the projection are

x_{\text{rot}}^{(i)} = U^{\top} x^{(i)} = \begin{bmatrix} u_1^{\top} x^{(i)} \\ u_2^{\top} x^{(i)} \end{bmatrix}.

The projected data become:

[Figure: the training data re-plotted in the rotated (u1, u2) coordinate system.]

We can also keep only the first projection direction (the eigenvector with the largest eigenvalue) and reduce the data to one dimension:

\tilde{x}^{(i)} = u_1^{\top} x^{(i)}.

[Figure: the data after keeping only the first component.]
The reduced data can also approximately reconstruct the original data: append n - k zeros to each reduced vector and multiply by the projection matrix U. This amounts to setting to zero the coordinates along the directions whose variance is small.
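In MATLAB this reconstruction can be written as follows (a sketch, using the same illustrative names as above); padding with zeros and multiplying by U is equivalent to multiplying the reduced data by the first k columns of U:

xHat = U * [xTilde; zeros(n - k, m)];   % pad the reduced data with n-k zeros, rotate back to the original basis
xHat = U(:, 1:k) * xTilde;              % equivalent, without the explicit zero-padding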

How should k, the number of principal components, be chosen? Usually k is chosen according to the percentage of variance we want to retain. For example, to retain 99% of the variance, pick the smallest k such that the sum of the top k eigenvalues divided by the sum of all eigenvalues is at least 0.99; the reduced data then capture 99% of the variance of the original data.
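In MATLAB/Octave this choice of k can be computed directly from the eigenvalues (an alternative to the explicit loop used in Step 2 of the exercise below; S is assumed to come from the SVD above):

lambda = diag(S);                                    % eigenvalues, in decreasing order
k = find(cumsum(lambda) / sum(lambda) >= 0.99, 1);   % smallest k that retains at least 99% of the variance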


2. Whitening

A preprocessing step closely related to PCA is whitening. Suppose the training data are images: because neighbouring pixels are strongly correlated, the raw input is redundant. The goal of whitening is to reduce this redundancy and to make every feature have the same variance.
After the PCA rotation the correlation between different dimensions is zero; dividing each rotated dimension by its standard deviation (the square root of the corresponding eigenvalue) then gives every dimension unit variance. In practice a small constant epsilon is added under the square root for regularisation, as in Step 4a of the exercise below.
PCA whitening can be combined with dimensionality reduction by keeping only the first k components.
With ZCA whitening, by contrast, we usually keep all components of the data; ZCA whitening rotates the PCA-whitened data back into the original basis, which keeps the result as close as possible to the original data. A short sketch of both transforms follows below.
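A minimal sketch of both whitening transforms (same illustrative names as above; the value of epsilon is illustrative, not prescribed by the exercise):

epsilon = 1e-5;                                         % small regularisation constant (illustrative value)
xPCAWhite = diag(1 ./ sqrt(diag(S) + epsilon)) * xRot;  % rescale each component to roughly unit variance
xZCAWhite = U * xPCAWhite;                              % rotate back to the original basis (ZCA whitening)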

3. PCA and Whitening on Natural Images

The exercise comes from http://ufldl.stanford.edu/wiki/index.php/Exercise:PCA_and_Whitening; the data are 10,000 12x12 patches sampled from natural images.
 
%%================================================================
%% Step 0a: Load data
%  Here we provide the code to load natural image data into x.
%  x will be a 144 * 10000 matrix, where the kth column x(:, k) corresponds to
%  the raw image data from the kth 12x12 image patch sampled.
%  You do not need to change the code below.

x = sampleIMAGESRAW();
figure('name','Raw images');
randsel = randi(size(x,2),200,1); % A random selection of samples for visualization
display_network(x(:,randsel));

%%================================================================
%% Step 0b: Zero-mean the data (by row)
%  You can make use of the mean and repmat/bsxfun functions.

% -------------------- YOUR CODE HERE -------------------- 
[n,m] = size(x); % m is the number of samples, n is the dimensionality of each sample
avg = mean(x, 2);
x = x - repmat(avg, 1, size(x,2));
display_network(x(:,randsel));
%%================================================================
%% Step 1a: Implement PCA to obtain xRot
%  Implement PCA to obtain xRot, the matrix in which the data is expressed
%  with respect to the eigenbasis of sigma, which is the matrix U.


% -------------------- YOUR CODE HERE -------------------- 
xRot = zeros(size(x)); % You need to compute this
sigma = x * x' / size(x, 2);
[U,S,V] = svd(sigma);
xRot = U'*x;


%%================================================================
%% Step 1b: Check your implementation of PCA
%  The covariance matrix for the data expressed with respect to the basis U
%  should be a diagonal matrix with non-zero entries only along the main
%  diagonal. We will verify this here.
%  Write code to compute the covariance matrix, covar. 
%  When visualised as an image, you should see a straight line across the
%  diagonal (non-zero entries) against a blue background (zero entries).

% -------------------- YOUR CODE HERE -------------------- 
covar = zeros(size(x, 1)); % You need to compute this
covar = xRot * xRot' / m;   % covariance of the rotated data; should be (close to) diagonal

% Visualise the covariance matrix. You should see a line across the
% diagonal against a blue background.
figure('name','Visualisation of covariance matrix');
imagesc(covar);

%%================================================================
%% Step 2: Find k, the number of components to retain
%  Write code to determine k, the number of components to retain in order
%  to retain at least 99% of the variance.

% -------------------- YOUR CODE HERE -------------------- 
k = 0; % Set k accordingly
var_sum = sum(diag(covar));        % total variance
curr_var_sum = 0;
for i = 1:length(covar)
    curr_var_sum = curr_var_sum + covar(i,i);   % variance captured by the first i components
    if curr_var_sum / var_sum >= 0.99
        k = i;
        break
    end
end
    

%%================================================================
%% Step 3: Implement PCA with dimension reduction
%  Now that you have found k, you can reduce the dimension of the data by
%  discarding the remaining dimensions. In this way, you can represent the
%  data in k dimensions instead of the original 144, which will save you
%  computational time when running learning algorithms on the reduced
%  representation.
% 
%  Following the dimension reduction, invert the PCA transformation to produce 
%  the matrix xHat, the dimension-reduced data with respect to the original basis.
%  Visualise the data and compare it to the raw data. You will observe that
%  there is little loss due to throwing away the principal components that
%  correspond to dimensions with low variation.

% -------------------- YOUR CODE HERE -------------------- 
xHat = zeros(size(x));  % You need to compute this
xTilde = U(:, 1:k)'*x;               % reduced k-dimensional representation
xHat = U*[xTilde; zeros(n-k,m)];     % pad with zeros and rotate back to the original basis

% Visualise the data, and compare it to the raw data
% You should observe that the raw and processed data are of comparable quality.
% For comparison, you may wish to generate a PCA reduced image which
% retains only 90% of the variance.

figure('name',['PCA processed images ',sprintf('(%d / %d dimensions)', k, size(x, 1)),'']);
display_network(xHat(:,randsel));
figure('name','Raw images');
display_network(x(:,randsel));

%%================================================================
%% Step 4a: Implement PCA with whitening and regularisation
%  Implement PCA with whitening and regularisation to produce the matrix
%  xPCAWhite. 

epsilon = 1;                                         % regularisation constant
xPCAWhite = zeros(size(x));
xPCAWhite = diag(1./sqrt(diag(S)+epsilon)) * xRot;   % rescale each component by 1/sqrt(lambda_i + epsilon)
% -------------------- YOUR CODE HERE -------------------- 

%%================================================================
%% Step 4b: Check your implementation of PCA whitening 
%  Check your implementation of PCA whitening with and without regularisation. 
%  PCA whitening without regularisation results a covariance matrix 
%  that is equal to the identity matrix. PCA whitening with regularisation
%  results in a covariance matrix with diagonal entries starting close to 
%  1 and gradually becoming smaller. We will verify these properties here.
%  Write code to compute the covariance matrix, covar. 
%
%  Without regularisation (set epsilon to 0 or close to 0), 
%  when visualised as an image, you should see a red line across the
%  diagonal (one entries) against a blue background (zero entries).
%  With regularisation, you should see a red line that slowly turns
%  blue across the diagonal, corresponding to the one entries slowly
%  becoming smaller.

% -------------------- YOUR CODE HERE -------------------- 
covar = xPCAWhite * xPCAWhite' / m;   % should be (approximately) the identity matrix
% Visualise the covariance matrix. You should see a red line across the
% diagonal against a blue background.
figure('name','Visualisation of covariance matrix');
imagesc(covar);

%%================================================================
%% Step 5: Implement ZCA whitening
%  Now implement ZCA whitening to produce the matrix xZCAWhite. 
%  Visualise the data and compare it to the raw data. You should observe
%  that whitening results in, among other things, enhanced edges.

xZCAWhite = zeros(size(x));
xZCAWhite = U * xPCAWhite;   % rotate the PCA-whitened data back into the original basis
% -------------------- YOUR CODE HERE -------------------- 

% Visualise the data, and compare it to the raw data.
% You should observe that the whitened images have enhanced edges.
figure('name','ZCA whitened images');
display_network(xZCAWhite(:,randsel));
figure('name','Raw images');
display_network(x(:,randsel));


 
References:
Andrew Ng's machine learning lecture notes on PCA (UFLDL)