Introduction
This post is my set of notes from learning PCA, but you may pick up some details about PCA from it too. To understand PCA, you should know some linear algebra, in particular eigenvectors and eigenvalues. Don't worry if you don't: I plan to write posts covering this background in the future.
Principal Component Analysis (PCA)
Real-world data is often complex and multi-dimensional. As everyone knows, data with more than three dimensions is difficult to plot. As the number of dimensions grows, so does the computational cost. It is therefore important to reduce the computation, and reducing the dimensionality of the data is a useful way to do so.
How to reduce the dimensions of the data?
- Remove the redundant dimensions.
- Keep the most important dimensions.
Try to understand some terms.
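Variance: a measure of how spread out the data is. Formally, it is the average squared deviation from the mean. It is denoted by $\mathrm{Var}(X)$, the variance of $X$:
$$\mathrm{Var}(X) = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2$$
where $x_i$ is the $i$-th value of the data $X$, $\bar{x}$ is the mean of $X$, and $N$ is the number of values.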
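Here is a minimal sketch of that definition in plain Python, written for these notes. The function name `variance` and the sample values are my own, chosen only for illustration. It divides by the number of values (the "average" in the definition); the sample variance used in many libraries divides by one less than that instead.

```python
def variance(values):
    """Population variance: the average squared deviation from the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n

# Hypothetical sample values, chosen only for illustration.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
var_sample = variance(sample)
print(var_sample)  # 4.0 (mean is 5, squared deviations sum to 32, 32 / 8 = 4)
```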
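Covariance: a measure of the extent to which corresponding elements from two data sets move in the same direction. It is denoted by $\mathrm{Cov}(X, Y)$, the covariance of $X$ and $Y$:
$$\mathrm{Cov}(X, Y) = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})$$
where $x_i$ and $y_i$ are the $i$-th paired values of $X$ and $Y$, $\bar{x}$ and $\bar{y}$ are their means, and $N$ is the number of pairs.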
Positive covariance means X and Y are positively related: when X increases, Y tends to increase as well. Negative covariance is the opposite relation. Zero covariance means X and Y have no linear relationship, although they may still be related in a nonlinear way.
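The three cases can be checked with a small sketch in plain Python. The function name `covariance` and all the data are made up for illustration, and the function uses the population form (dividing by the number of pairs). The third data set is worth a look: Y is fully determined by X, yet the covariance comes out as zero, because covariance only picks up the linear part of a relationship.

```python
def covariance(xs, ys):
    """Population covariance: average product of paired deviations from the means."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

# Hypothetical data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_up = [2.0, 4.0, 6.0, 8.0, 10.0]     # rises as x rises
y_down = [10.0, 8.0, 6.0, 4.0, 2.0]   # falls as x rises
y_curve = [4.0, 1.0, 0.0, 1.0, 4.0]   # (x - 3)^2: depends on x, but not linearly

cov_up = covariance(x, y_up)
cov_down = covariance(x, y_down)
cov_curve = covariance(x, y_curve)
print(cov_up, cov_down, cov_curve)  # 4.0 -4.0 0.0
```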
What does PCA do?
PCA finds a new set of dimensions that are orthogonal to each other and ranks them by importance. A dimension is considered more important when the data is more spread out along it. In other words, more variance means more importance.
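To make "more variance, more importance" concrete, here is a rough sketch with made-up 2-D points that lie close to the line y = x. It measures the variance of the data along two orthogonal directions that I picked by hand (PCA would find such directions automatically). The helper names `variance` and `project` are my own.

```python
import math

def variance(values):
    """Population variance: the average squared deviation from the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n

def project(points, direction):
    """Coordinate of each 2-D point along a unit-length direction (dot product)."""
    dx, dy = direction
    return [px * dx + py * dy for px, py in points]

# Hypothetical 2-D points that lie close to the line y = x.
points = [(1.0, 1.2), (2.0, 1.9), (3.0, 3.1), (4.0, 3.8), (5.0, 5.0)]

s = 1.0 / math.sqrt(2.0)
along_diagonal = (s, s)      # unit vector along y = x
across_diagonal = (s, -s)    # unit vector orthogonal to it

# A dot product of zero confirms the two directions are orthogonal.
dot = (along_diagonal[0] * across_diagonal[0]
       + along_diagonal[1] * across_diagonal[1])

var_along = variance(project(points, along_diagonal))
var_across = variance(project(points, across_diagonal))
print(dot, var_along, var_across)
```

With these points the variance along the diagonal is roughly 3.81, while across it is roughly 0.01, so the diagonal direction carries almost all of the spread and would be the "important" dimension.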
The way PCA works is as follows:
- Calculate the covariance matrix of data points.
- Calculate the eigenvectors and corresponding eigenvalues of the covariance matrix.
- Sort the eigenvectors according to their eigenvalues in decreasing order.
- Choose the first k eigenvectors; these will be the new k dimensions.
- Transform the original n-dimensional data points into the new k dimensions.
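The steps above can be sketched with NumPy as follows. This is my own minimal version, not a reference implementation: the function name `pca`, its return values, and the random test data are all assumptions made for illustration. I also center the data before projecting, which the list above does not mention but which is the usual convention. Note that `np.cov` divides by N - 1 by default; that scales every eigenvalue by the same factor and does not change the eigenvectors or their order.

```python
import numpy as np

def pca(data, k):
    """Project `data` (n_samples x n_features) onto its top-k principal components."""
    # Center each feature so that spread is measured around the mean.
    centered = data - data.mean(axis=0)
    # Step 1: covariance matrix of the features (rowvar=False: columns are variables).
    cov = np.cov(centered, rowvar=False)
    # Step 2: eigh is intended for symmetric matrices such as a covariance matrix;
    # it returns eigenvalues in ascending order and eigenvectors as columns.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # Step 3: reorder so the largest eigenvalue comes first.
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    # Step 4: keep the first k eigenvectors as the new dimensions.
    components = eigenvectors[:, :k]
    # Step 5: express every data point in the new k-dimensional basis.
    projected = centered @ components
    return projected, components, eigenvalues

# Hypothetical data: 200 points in 3-D that mostly vary along one direction.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
data = np.column_stack([t, 2.0 * t, 0.5 * t]) + 0.05 * rng.normal(size=(200, 3))

projected, components, eigenvalues = pca(data, k=2)
print(projected.shape)  # (200, 2)
print(eigenvalues)      # sorted from largest to smallest
```

In this sketch the variance of each projected column equals the corresponding eigenvalue, which is why sorting by eigenvalue is the same as sorting the new dimensions by importance.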
To explain how PCA works in more detail, I will write a post introducing eigenvectors, eigenvalues, and the related background.