【Machine Learning】Understanding Principal Component Analysis (PCA)

Introduction

The purpose of this post is to record my notes while learning PCA, but perhaps you can pick up some details about PCA here as well. To understand PCA, you should have some knowledge of eigenvectors, eigenvalues, and linear algebra. But don’t worry about this: I will write some posts about these prerequisites of PCA in the future.

Principal Component Analysis (PCA)

We usually meet complex, multi-dimensional data in the real world. As we all know, it is difficult to plot data with more than 3 dimensions. As the number of dimensions grows, the computational demand also increases. Therefore, it is important to reduce the amount of computation, and reducing the dimensionality of the data is a useful way to do so.

How to reduce the dimensions of the data?

  1. Remove the redundant dimensions.
  2. Keep the most important dimensions.

First, let us try to understand some terms.

Variance: It is a measure of how spread out the data is. Formally, it is the average squared deviation from the mean. It is denoted by $var(x)$, the variance of $x$:
$$var(x)={{\sum{(x_i-\overline x)^2}}\over{N}}$$
where $x_i$ is the $i$-th data point and $N$ is the number of data points.
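As a quick check, the variance formula above can be computed directly (a minimal sketch in plain Python; the function name `variance` is my own choice):

```python
# Population variance: average squared deviation from the mean.
def variance(x):
    n = len(x)
    mean = sum(x) / n
    return sum((xi - mean) ** 2 for xi in x) / n

print(variance([2, 4, 4, 4, 5, 5, 7, 9]))  # → 4.0
```

The mean of this sample is 5, so the squared deviations sum to 32, and 32 / 8 = 4.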

Covariance: It is a measure of the extent to which corresponding elements from two data sets move in the same direction. It is denoted by $cov(x,y)$, the covariance of $x$ and $y$:
$$cov(x, y)={{\sum(x_i-\overline x)(y_i-\overline y)}\over{N}}$$
[Figure: illustration of positive, negative, and zero covariance. This illustration is from here.]
A positive covariance means that X and Y are positively related: when X increases, Y also increases. A negative covariance means the exact opposite relation. Interestingly, a zero covariance means that X and Y are not related.
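The covariance formula can be sketched the same way (again a minimal plain-Python illustration with a name, `covariance`, that I chose):

```python
# Population covariance of two equal-length data sets.
def covariance(x, y):
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n

print(covariance([1, 2, 3, 4], [2, 4, 6, 8]))  # moves together → 2.5 (positive)
print(covariance([1, 2, 3, 4], [8, 6, 4, 2]))  # moves oppositely → -2.5 (negative)
```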

What does PCA do?

PCA wants to find a new set of dimensions that are orthogonal. These dimensions are sorted by their importance, and a dimension counts as more important when the data is more spread out along it. In other words: more variance, more importance.

The way PCA works is as follows:

  1. Calculate the covariance matrix of the data points.
  2. Calculate the eigenvectors and corresponding eigenvalues.
  3. Sort the eigenvectors by their eigenvalues in decreasing order.
  4. Choose the first $k$ eigenvectors; these will be the new $k$ dimensions.
  5. Transform the original $n$-dimensional data points into the new $k$ dimensions.
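The five steps above can be sketched with NumPy (a minimal illustration under my own naming of `pca`, `X`, and `k`, not a production implementation):

```python
import numpy as np

def pca(X, k):
    """Project the rows of X (n_samples x n_features) onto the top-k principal components."""
    # Step 1: center the data and compute the covariance matrix of the features.
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)
    # Step 2: eigenvectors and eigenvalues (eigh, since a covariance matrix is symmetric).
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Step 3: sort the eigenvectors by eigenvalue in decreasing order.
    order = np.argsort(eigvals)[::-1]
    eigvecs = eigvecs[:, order]
    # Step 4: keep the first k eigenvectors as the new dimensions.
    components = eigvecs[:, :k]
    # Step 5: transform the original n-dimensional points into k dimensions.
    return X_centered @ components

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))   # 100 points in 5 dimensions
Z = pca(X, 2)                   # reduced to 2 dimensions
print(Z.shape)                  # → (100, 2)
```

In practice you would usually call a library routine such as scikit-learn's `PCA` instead, but the steps inside are the same as listed above.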

In order to understand the detailed workings of PCA, I will write a post introducing the knowledge about eigenvectors, eigenvalues, etc.

References

Understanding Principal Component Analysis
Eigenvectors and Eigenvalues
Principal Component Analysis
