PCA (Principal Component Analysis) is a dimensionality-reduction technique: it summarizes most of the information in the original variables with a small number of new variables, transforming the originally correlated variables into uncorrelated, independent components.
sklearn.decomposition.PCA(n_components=None, copy=True, whiten=False)
n_components: the number of components to keep; by default all are kept. An int keeps exactly that many. The string 'mle' selects the number automatically (Minka's MLE), so that the retained components meet the required share of explained variance.
copy: if True, the original data is left unchanged; if False, the original data may be overwritten in place.
whiten: whitening, which rescales each component so that every feature has the same (unit) variance.
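The three ways of setting n_components can be illustrated on synthetic data. Everything in this sketch is illustrative and not from the original example: the random matrix and the 95% variance threshold are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.rand(50, 4)  # 50 samples, 4 features (made-up data)

pca_int = PCA(n_components=2)       # keep exactly 2 components
pca_ratio = PCA(n_components=0.95)  # keep enough components to explain 95% of the variance
pca_mle = PCA(n_components='mle')   # let Minka's MLE choose the dimensionality

for pca in (pca_int, pca_ratio, pca_mle):
    pca.fit(X)
    print(pca.n_components_, pca.explained_variance_ratio_.sum())
```

Note that a float between 0 and 1 is also accepted for n_components: it is read as the minimum fraction of variance to preserve rather than a count.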
#-*- coding: utf-8 -*-
# Principal component analysis for dimensionality reduction
import pandas as pd
from sklearn.decomposition import PCA

# Parameter initialization
inputfile = 'E:/PythonMaterial/chapter4/demo/data/principal_component.xls'
outputfile = 'E:/PythonMaterial/chapter4/demo/data/dimention_reducted.xls'  # reduced data

data = pd.read_excel(inputfile, header=None)  # read the data

pca = PCA()
pca.fit(data)
a = pca.components_  # the model's principal axes (eigenvectors)
print(a)
b = pca.explained_variance_ratio_  # each component's variance percentage (contribution ratio)
print()
print(b)
[[-0.56788461 -0.2280431 -0.23281436 -0.22427336 -0.3358618 -0.43679539
-0.03861081 -0.46466998]
[-0.64801531 -0.24732373 0.17085432 0.2089819 0.36050922 0.55908747
-0.00186891 -0.05910423]
[-0.45139763 0.23802089 -0.17685792 -0.11843804 -0.05173347 -0.20091919
-0.00124421 0.80699041]
[-0.19404741 0.9021939 -0.00730164 -0.01424541 0.03106289 0.12563004
0.11152105 -0.3448924 ]
[ 0.06133747 0.03383817 -0.12652433 -0.64325682 0.3896425 0.10681901
-0.63233277 -0.04720838]
[-0.02579655 0.06678747 -0.12816343 0.57023937 0.52642373 -0.52280144
-0.31167833 -0.0754221 ]
[ 0.03800378 -0.09520111 -0.15593386 -0.34300352 0.56640021 -0.18985251
0.69902952 -0.04505823]
[ 0.10147399 -0.03937889 -0.91023327 0.18760016 -0.06193777 0.34598258
0.02090066 -0.02137393]]
[ 7.74011263e-01 1.56949443e-01 4.27594216e-02 2.40659228e-02
1.50278048e-03 4.10990447e-04 2.07718405e-04 9.24594471e-05]
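The contribution ratios printed above are the usual guide for choosing how many components to keep: take the smallest k whose cumulative contribution reaches a target. A small sketch using the ratios from the output (the 97% threshold is an assumed target, not from the original text):

```python
import numpy as np

# Explained-variance ratios copied from the output above
ratios = np.array([7.74011263e-01, 1.56949443e-01, 4.27594216e-02,
                   2.40659228e-02, 1.50278048e-03, 4.10990447e-04,
                   2.07718405e-04, 9.24594471e-05])

cumulative = np.cumsum(ratios)
# Smallest k whose cumulative contribution reaches 97%
k = int(np.argmax(cumulative >= 0.97)) + 1
print(cumulative)
print(k)  # → 3: the first three components already explain about 97.4%
```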
pca = PCA(n_components='mle')
newData = pca.fit_transform(data)  # reduce the dimensionality
pd.DataFrame(newData).to_excel(outputfile)  # save the result
pca.inverse_transform(newData)  # inverse_transform() can restore the data if needed
[[ -8.19133694e+00 -1.69040279e+01 3.90991029e+00 7.48106686e+00
5.16142203e-01]
[ -2.85274026e-01 6.48074989e+00 -4.62870368e+00 5.01369607e+00
-1.65278935e+00]
[ 2.37073907e+01 2.85245701e+00 -4.96523096e-01 -1.57285727e+00
-2.09522277e-01]
[ 1.44320264e+01 -2.29917325e+00 -1.50272151e+00 -1.30763061e+00
1.54047215e+00]
[ -5.43045680e+00 -1.00070408e+01 9.52086923e+00 -5.63779544e+00
-9.21974743e-01]
[ -2.41595590e+01 9.36428589e+00 7.26578565e-01 -1.98622218e+00
-9.98528392e-01]
[ 3.66134607e+00 7.60198615e+00 -2.36439873e+00 4.21318409e-02
-8.48196502e-02]
[ -1.39676121e+01 -1.38912398e+01 -6.44917778e+00 -2.92916826e+00
-1.91994563e-01]
[ -4.08809359e+01 1.32568529e+01 4.16539368e+00 1.21239981e+00
1.33543444e+00]
[ 1.74887665e+00 4.23112299e+00 -5.89809954e-01 -1.57477365e+00
4.10612180e-01]
[ 2.19432196e+01 2.36645883e+00 1.33203832e+00 4.39763606e+00
-2.61113312e-02]
[ 3.67086807e+01 6.00536554e+00 3.97183515e+00 -1.54808393e+00
3.00572729e-01]
[ -3.28750663e+00 -4.86380886e+00 1.00424688e+00 8.51193030e-01
-6.27109498e-01]
[ -5.99885871e+00 -4.19398863e+00 -8.59953736e+00 -2.44159234e+00
6.09616105e-01]]
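As noted above, inverse_transform() maps the reduced data back to the original feature space. The round trip is lossy: only the variance captured by the kept components survives. A minimal sketch on synthetic data, since the original .xls file is not available here (the shapes mirror the example: 8 original features reduced to 5):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(42)
X = rng.rand(20, 8)  # stand-in for the original 8-feature data

pca = PCA(n_components=5)
reduced = pca.fit_transform(X)        # 20 x 5
restored = pca.inverse_transform(reduced)  # back to 20 x 8

# The reconstruction error comes from the discarded components' variance.
error = np.mean((X - restored) ** 2)
print(reduced.shape, restored.shape, error)
```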