PCA code (wine dataset)
(Note: np.linalg.eig gives NO guarantee that the eigenvalues come back sorted from largest to smallest, so you must sort them yourself; and the eigenvector matching each eigenvalue is a COLUMN of the returned matrix, not a row!!!!!)
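A minimal sketch of that behavior, using a made-up 2x2 diagonal matrix (not the wine data): the eigenvalues can come back in any order, so sort them descending yourself, and the eigenvector for vals[i] is the column vecs[:, i]:

```python
import numpy as np

# A small symmetric matrix whose eigenvalues (2 and 5) are easy to verify.
A = np.array([[2.0, 0.0],
              [0.0, 5.0]])

vals, vecs = np.linalg.eig(A)
# eig gives no ordering guarantee: here vals comes back as [2., 5.],
# i.e. NOT largest-first, so we sort descending ourselves.
order = np.argsort(vals)[::-1]
vals_sorted = vals[order]
vecs_sorted = vecs[:, order]        # eigenvectors are COLUMNS, so reorder columns

print(vals_sorted)                  # [5. 2.]
# Check the defining relation A @ v = lambda * v, column by column.
for lam, v in zip(vals_sorted, vecs_sorted.T):
    assert np.allclose(A @ v, lam * v)
```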
PCA without data standardization
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
'''***************************************************************
* @Fun_Name : def getSample(fileName)
* @Function : read the samples from the file into arrays
* @Parameter : file name
* @Return : labels and sample features
* @Creed : Talk is cheap , show me the code
***********************xieqinyu creates in 16:08 2020/5/17***'''
def getSample(fileName):
    # header=None: the file has no header row; otherwise pandas would treat
    # the first data row as the header and that row would be lost
    dataSet = pd.read_csv(fileName, header=None).values
    labels = dataSet[:, 0]        # column 0 is the class label
    feature = dataSet[:, 1:14]    # columns 1..13 are the 13 wine features
    return labels, feature
'''***************************************************************
* @Fun_Name : def reduceMean(feature):
* @Function : center the features (subtract the mean)
* @Parameter : feature matrix
* @Return : centered feature matrix
* @Creed : Talk is cheap , show me the code
***********************xieqinyu creates in 19:54 2020/5/17***'''
def reduceMean(feature):
    featureMean = np.mean(feature, axis=0)   # per-feature mean
    featureDeal = feature - featureMean      # features after removing the mean
    return featureDeal
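A quick check of what reduceMean does, on a tiny made-up 3x2 matrix: subtracting the per-column mean broadcasts across the rows and leaves every column with zero mean:

```python
import numpy as np

# Tiny 3-sample, 2-feature matrix to sanity-check reduceMean-style centering.
feature = np.array([[1.0, 10.0],
                    [2.0, 20.0],
                    [3.0, 30.0]])

featureMean = np.mean(feature, axis=0)   # per-column mean, shape (2,)
featureDeal = feature - featureMean      # broadcasts over the rows

# After centering, every column averages to zero.
print(featureDeal.mean(axis=0))          # [0. 0.]
```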
'''***************************************************************
* @Fun_Name : def getC(featureDeal):
* @Function : compute the matrix C
* @Parameter : centered feature matrix
* @Return : C
* @Creed : Talk is cheap , show me the code
***********************xieqinyu creates in 20:03 2020/5/17***'''
def getC(featureDeal):
    m, n = np.shape(featureDeal)
    featureDeal = np.mat(featureDeal)
    C = (featureDeal.T * featureDeal) / m    # (1/m) * X^T X, the covariance matrix of centered X
    return C
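getC computes (1/m)·XᵀX, which for centered data is the population covariance matrix. A sketch checking that against np.cov, whose bias=True flag selects the same 1/m normalization (the random data here is just for illustration, not the wine file):

```python
import numpy as np

rng = np.random.default_rng(0)
featureDeal = rng.standard_normal((50, 4))
featureDeal -= featureDeal.mean(axis=0)   # centered, as getC expects

m = featureDeal.shape[0]
C = (featureDeal.T @ featureDeal) / m     # same formula as getC, without np.mat

# np.cov divides by m-1 by default; bias=True switches to 1/m,
# and rowvar=False says each COLUMN is a variable.
C_np = np.cov(featureDeal, rowvar=False, bias=True)
assert np.allclose(C, C_np)
```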
'''***************************************************************
* @Fun_Name : def getFeatureValuesVector(C,n):
* @Function : get the eigenvalues and eigenvectors of C
* @Parameter : C, and n = the target dimension
* @Return : the eigenvectors for the n largest eigenvalues (here n=2)
* @Creed : Talk is cheap , show me the code
***********************xieqinyu creates in 20:14 2020/5/17***'''
def getFeatureValuesVector(C, n):
    featureValues, featureVector = np.linalg.eig(C)
    order = np.argsort(featureValues)[::-1]  # eig gives no ordering guarantee, so sort descending
    return featureVector[:, order[:n]]       # eigenvectors are the columns
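Since C is symmetric, np.linalg.eigh is an alternative worth knowing: unlike eig it guarantees real output and ASCENDING eigenvalue order, so the top-n directions are the last n columns in reverse. A sketch on random data (not the wine file):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((30, 5))
X -= X.mean(axis=0)
C = (X.T @ X) / X.shape[0]        # symmetric covariance matrix

# eigh documents an ascending eigenvalue order for symmetric input,
# so no manual argsort is needed: just take the last columns, reversed.
w, V = np.linalg.eigh(C)
top2 = V[:, ::-1][:, :2]          # eigenvectors of the 2 largest eigenvalues

assert np.all(np.diff(w) >= 0)    # eigh's ascending-order guarantee
```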
label, feature = getSample('wine.txt')
featureDeal = reduceMean(feature)
C = getC(featureDeal)
featureVector = getFeatureValuesVector(C, 2)
Coord = np.mat(featureDeal) * np.mat(featureVector)   # project onto the top-2 directions
# the three wine classes occupy rows 0-58, 59-129 and 130-177
plt.scatter(Coord[0:59, 0].tolist(), Coord[0:59, 1].tolist(), color="b")
plt.scatter(Coord[59:130, 0].tolist(), Coord[59:130, 1].tolist(), color="r")
plt.scatter(Coord[130:178, 0].tolist(), Coord[130:178, 1].tolist(), color="g")
plt.show()
# print(Coord)
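One way to sanity-check the projection step, sketched on random data rather than the wine file: the variance of the projected coordinates along each new axis should equal the corresponding eigenvalue of C:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((100, 6))
X -= X.mean(axis=0)
C = (X.T @ X) / X.shape[0]

w, V = np.linalg.eigh(C)
order = np.argsort(w)[::-1]
W = V[:, order[:2]]               # top-2 eigenvectors as columns

Coord = X @ W                     # the same projection step, in plain ndarray form

# Along each principal axis, the (population) variance of the projected
# coordinates equals that axis's eigenvalue: W.T @ C @ W is diagonal.
var = (Coord ** 2).mean(axis=0)   # columns of Coord are zero-mean
assert np.allclose(var, w[order[:2]])
```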
Result: (scatter plot of the three wine classes)
PCA with standardized data:
Why and how to standardize data:
https://www.cnblogs.com/fonttian/p/9162822.html
Replace reduceMean in the program above with this:
def reduceMean(feature):
    # standardize: center, then divide by the per-feature standard deviation
    featureMean = np.mean(feature, axis=0)
    featureStd = np.std(feature, axis=0)
    featureDeal = (feature - featureMean) / featureStd
    return featureDeal
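A property that helps interpret the standardized version: the (1/m) covariance of z-scored data is exactly the correlation matrix of the original features, so this variant is PCA on the correlation matrix. A sketch on random data with wildly different feature scales (just for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
# Three features on deliberately different scales.
feature = rng.standard_normal((80, 3)) * np.array([1.0, 5.0, 100.0])

z = (feature - feature.mean(axis=0)) / feature.std(axis=0)

# The (1/m) covariance of the z-scored data equals the correlation
# matrix of the original features (the ddof choices cancel in corrcoef).
C = (z.T @ z) / z.shape[0]
R = np.corrcoef(feature, rowvar=False)
assert np.allclose(C, R)
```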
Result: (scatter plot of the three wine classes, after standardization)
There is one theoretical point I am still fuzzy on; I hope someone passing by can explain it:
Coord = np.mat(featureDeal)*np.mat(featureVector)
This step projects the vectors onto the 2-D space. Why is the vector being projected not the original vector, but the standardized one?