【Machine Learning】Understanding Principal Component Analysis (PCA)

Introduction

The purpose of this post is to record my notes while learning PCA, but perhaps you can pick up some details about PCA here as well. To understand PCA, you should have some knowledge of eigenvectors, eigenvalues, and linear algebra. But don’t worry about this: I will write some posts about these prerequisites of PCA in the future.

Principal Component Analysis (PCA)

We usually meet complex, multi-dimensional data in the real world. As we all know, it is difficult to plot data with more than 3 dimensions. As the number of dimensions grows, the computational demand also increases. Therefore, it is important to reduce the amount of computation, and reducing the dimensionality of the data is a useful way to do so.

How to reduce the dimensions of the data?

  1. Remove the redundant dimensions.
  2. Keep the most important dimensions.

First, let us try to understand some terms.

Variance: It is a measure of how spread out the data is. Formally, it is the average squared deviation from the mean. It is denoted by $var(x)$, the variance of $x$:
$$var(x)={{\sum{(x_i-\overline x)^2}}\over{N}}$$
where $x_i$ is the $i$-th data point and $N$ is the number of data points.
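As a quick check, the variance formula above can be computed directly (a minimal sketch in plain Python; the function name `variance` is my own choice):

```python
# Population variance: average squared deviation from the mean.
def variance(x):
    n = len(x)
    mean = sum(x) / n
    return sum((xi - mean) ** 2 for xi in x) / n

print(variance([2, 4, 4, 4, 5, 5, 7, 9]))  # → 4.0
```

The mean of this sample is 5, so the squared deviations sum to 32, and 32 / 8 = 4.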

Covariance: It is a measure of the extent to which corresponding elements from two data sets move in the same direction. It is denoted by $cov(x,y)$, the covariance of $x$ and $y$:
$$cov(x, y)={{\sum(x_i-\overline x)(y_i-\overline y)}\over{N}}$$
[Figure: illustration of positive, negative, and zero covariance. This illustration is from here.]
A positive covariance means that X and Y are positively related: when X increases, Y also increases. A negative covariance means the exact opposite relation. Interestingly, a zero covariance means that X and Y are not related.
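The covariance formula can be sketched the same way (again a minimal plain-Python illustration with a name, `covariance`, that I chose):

```python
# Population covariance of two equal-length data sets.
def covariance(x, y):
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n

print(covariance([1, 2, 3, 4], [2, 4, 6, 8]))  # moves together → 2.5 (positive)
print(covariance([1, 2, 3, 4], [8, 6, 4, 2]))  # moves oppositely → -2.5 (negative)
```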

What does PCA do?

PCA wants to find a new set of dimensions that are orthogonal. These dimensions are sorted by their importance, and a dimension counts as more important when the data is more spread out along it. In other words: more variance, more importance.

The way PCA works is as follows:

  1. Calculate the covariance matrix of the data points.
  2. Calculate the eigenvectors and corresponding eigenvalues.
  3. Sort the eigenvectors by their eigenvalues in decreasing order.
  4. Choose the first $k$ eigenvectors; these will be the new $k$ dimensions.
  5. Transform the original $n$-dimensional data points into the new $k$ dimensions.
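The five steps above can be sketched with NumPy (a minimal illustration under my own naming of `pca`, `X`, and `k`, not a production implementation):

```python
import numpy as np

def pca(X, k):
    """Project the rows of X (n_samples x n_features) onto the top-k principal components."""
    # Step 1: center the data and compute the covariance matrix of the features.
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)
    # Step 2: eigenvectors and eigenvalues (eigh, since a covariance matrix is symmetric).
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Step 3: sort the eigenvectors by eigenvalue in decreasing order.
    order = np.argsort(eigvals)[::-1]
    eigvecs = eigvecs[:, order]
    # Step 4: keep the first k eigenvectors as the new dimensions.
    components = eigvecs[:, :k]
    # Step 5: transform the original n-dimensional points into k dimensions.
    return X_centered @ components

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))   # 100 points in 5 dimensions
Z = pca(X, 2)                   # reduced to 2 dimensions
print(Z.shape)                  # → (100, 2)
```

In practice you would usually call a library routine such as scikit-learn's `PCA` instead, but the steps inside are the same as listed above.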

In order to understand the detailed workings of PCA, I will write a post introducing the knowledge about eigenvectors, eigenvalues, etc.

References

Understanding Principal Component Analysis
Eigenvectors and Eigenvalues
Principal Component Analysis
