1. Introduction

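Differences in the offset and range of the features can degrade machine learning performance, so normalizing (standardizing) the data can improve it.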

2. Data standardization

from sklearn import preprocessing  # module that provides the standardization utilities
import numpy as np

data = np.array([[13, 54, 7, -5],
                 [67, 98, 11, 34],
                 [-56, 49, 22, 39]], dtype=np.float64)
print(data)
print(preprocessing.scale(data))  # preprocessing.scale standardizes each column

# Output
[[ 13.  54.   7.  -5.]
 [ 67.  98.  11.  34.]
 [-56.  49.  22.  39.]]
[[ 0.09932686 -0.59050255 -0.99861783 -1.40657764]
 [ 1.17205693  1.40812146 -0.36791183  0.57618843]
 [-1.27138379 -0.81761891  1.36652966  0.83038921]]

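After standardization, every column of the data has mean 0 and variance 1. Standardization only shifts and rescales each feature; it does not by itself make the data normally distributed.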

data_ = preprocessing.scale(data)
print(data_.mean(axis=0))  # per-column means, approximately 0 (up to floating-point error)
print(data_.std(axis=0))   # per-column standard deviations, 1
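
To make explicit what preprocessing.scale computes, here is a minimal NumPy-only sketch of the column-wise z-score, z = (x - mean) / std, applied to the same array. It assumes the default behaviour (centering and scaling per column, using the population standard deviation, ddof=0); the name `manual` is introduced here only for illustration.

```python
import numpy as np

data = np.array([[13, 54, 7, -5],
                 [67, 98, 11, 34],
                 [-56, 49, 22, 39]], dtype=np.float64)

# Column-wise z-score: subtract each column's mean, then divide by its
# standard deviation. np.std defaults to ddof=0 (population std).
manual = (data - data.mean(axis=0)) / data.std(axis=0)

print(manual)
print(manual.mean(axis=0))  # approximately 0 for every column
print(manual.std(axis=0))   # 1 for every column
```

The printed array should agree with the preprocessing.scale output shown earlier in this section, up to floating-point rounding.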

3. Comparing results before and after standardization

from sklearn import preprocessing  # module that provides the standardization utilities
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification  # generates a synthetic classification dataset
from sklearn.svm import SVC

import matplotlib.pyplot as plt


# Total number of features = n_informative + n_redundant + n_repeated
X, y = make_classification(n_samples=400, n_features=2, n_redundant=0, n_informative=2,
                           random_state=42, n_clusters_per_class=1, scale=100)
plt.scatter(X[:, 0], X[:, 1], c=y)
plt.show()

3.1. Before standardization

x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
model = SVC()
model.fit(x_train, y_train)
print("Classification accuracy:", model.score(x_test, y_test))

# Output
Classification accuracy: 0.48333333333333334

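Before standardization, the prediction accuracy is only about 0.48. Because train_test_split is called without a random_state, the exact figure varies from run to run.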

3.2. After standardization

After scaling, the units of the data change, and the features of X are compressed into roughly the same range.
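
One quick way to see this compression is to compare the per-feature spread before and after scaling. The sketch below regenerates the same synthetic dataset so that it runs on its own; the names `X_raw` and `X_scaled` are illustrative, and no expected output is shown because the exact numbers were not part of the original run.

```python
import numpy as np
from sklearn import preprocessing
from sklearn.datasets import make_classification

# Same synthetic dataset as in the article (scale=100 inflates the feature values)
X_raw, y = make_classification(n_samples=400, n_features=2, n_redundant=0,
                               n_informative=2, random_state=42,
                               n_clusters_per_class=1, scale=100)
X_scaled = preprocessing.scale(X_raw)

# Per-feature range (max - min) before and after standardization
print("range before:", np.ptp(X_raw, axis=0))
print("range after: ", np.ptp(X_scaled, axis=0))

# Per-feature standard deviation before and after standardization
print("std before:", X_raw.std(axis=0))
print("std after: ", X_scaled.std(axis=0))
```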

X = preprocessing.scale(X)  # standardize every feature of X
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
model = SVC()
model.fit(x_train, y_train)
print("Classification accuracy:", model.score(x_test, y_test))

# Output
Classification accuracy: 0.9166666666666666

After standardization, the prediction accuracy rises to about 0.92.
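
The code in this section standardizes all of X before splitting, so the test samples also contribute to the mean and standard deviation used for scaling. A common alternative, shown here as a sketch and not as the article's method, fits the scaler on the training split only by chaining StandardScaler and SVC in a pipeline. The random_state=0 passed to train_test_split is an assumption added for reproducibility, so the score will not match the figures quoted in this section.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=2, n_redundant=0,
                           n_informative=2, random_state=42,
                           n_clusters_per_class=1, scale=100)

# random_state=0 is an assumption for reproducibility; the article omits it.
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# The pipeline learns the mean and std from x_train only, then applies the
# same transformation to x_test when scoring.
pipe = make_pipeline(StandardScaler(), SVC())
pipe.fit(x_train, y_train)

accuracy = pipe.score(x_test, y_test)
print("Classification accuracy:", accuracy)
```

Wrapping both steps in one estimator also means the scaling is reapplied automatically whenever the model is used on new data.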
