[Machine Learning - Regression Algorithms] Linear Regression in Practice with Sklearn

1. Introduction

For the theory behind simple (one-variable) linear regression, please see my earlier post at this link.

2. A warm-up example

Predict the plane $y = 1x_1 + 2x_2 + 3$ (with two input features, the fitted "line" is actually a plane).
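For reference, LinearRegression performs ordinary least squares: it picks the weights and the intercept that minimize the sum of squared residuals over the $n$ training samples. Written out for the two-feature case here:

```latex
\hat{y} = w_1 x_1 + w_2 x_2 + b, \qquad
(\hat{w}_1, \hat{w}_2, \hat{b}) = \arg\min_{w_1, w_2, b} \sum_{i=1}^{n} \left( y^{(i)} - w_1 x_1^{(i)} - w_2 x_2^{(i)} - b \right)^2
```

Because the warm-up targets are generated exactly from $y = 1x_1 + 2x_2 + 3$, the minimal residual is zero and the fit should recover $w_1 = 1$, $w_2 = 2$, $b = 3$.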

Import LinearRegression from the sklearn.linear_model package:

from sklearn.linear_model import LinearRegression

Fit the data, i.e. train the model:

reg = LinearRegression().fit(X, y)

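Check the score. For a regressor, score() returns the coefficient of determination R² (1.0 is a perfect fit), not an accuracy: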

print(reg.score(X, y))
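R² is defined as 1 - SS_res / SS_tot. A minimal sketch computing it by hand and checking it against sklearn.metrics.r2_score; the arrays y_true and y_pred are made-up illustrative values, not data from this article:

```python
import numpy as np
from sklearn.metrics import r2_score

# Illustrative values only (hypothetical, not from the article)
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.0, 7.5, 9.0])

# Residual sum of squares: how far the predictions are from the targets
ss_res = np.sum((y_true - y_pred) ** 2)
# Total sum of squares: spread of the targets around their mean
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(r2_manual)                 # 0.975
print(r2_score(y_true, y_pred))  # matches up to floating-point rounding
```

Here SS_res = 0.5 and SS_tot = 20, so R² = 1 - 0.025 = 0.975. In the warm-up example the data are noise-free, so SS_res is (numerically) zero and the score is 1.0.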

The learned coefficients, i.e. the weight in front of each feature of X; here this prints [1. 2.]:

print(reg.coef_)

The intercept b of the line (the bias term); this prints 3, up to floating-point error:

print(reg.intercept_)
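Fitting with an intercept amounts to solving a least-squares problem in which an extra column of ones carries the bias. A minimal sketch using np.linalg.lstsq on the same warm-up data; this illustrates ordinary least squares in general and is not meant as Sklearn's exact internal code path:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
y = np.dot(X, np.array([1, 2])) + 3  # y = 1*x_1 + 2*x_2 + 3

# Append a column of ones so the last weight plays the role of the intercept b
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
theta, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
w, b = theta[:-1], theta[-1]

reg = LinearRegression().fit(X, y)
print(w, b)                       # approximately [1. 2.] 3.0
print(reg.coef_, reg.intercept_)  # the same solution from Sklearn
```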

Full code:

import numpy as np
from sklearn.linear_model import LinearRegression
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
# y = 1 * x_1 + 2 * x_2 + 3
y = np.dot(X, np.array([1, 2])) + 3

print(X)
print(y)
reg = LinearRegression().fit(X, y)
print(reg.score(X, y))
print(reg.coef_)
print(reg.intercept_)
print(reg.predict(np.array([[3, 5]])))
Output:

[[1 1]
 [1 2]
 [2 2]
 [2 3]]
[ 6  8  9 11]
1.0
[1. 2.]
3.0000000000000018
[16.]
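The prediction 16 for the input [3, 5] is just the fitted formula evaluated at that point: 1·3 + 2·5 + 3 = 16. A short sketch confirming that predict() computes X @ coef_ + intercept_ (the second input [0, 0] is an arbitrary extra point added for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
y = np.dot(X, np.array([1, 2])) + 3

reg = LinearRegression().fit(X, y)

X_new = np.array([[3, 5], [0, 0]])
by_hand = X_new @ reg.coef_ + reg.intercept_  # dot product plus bias
by_sklearn = reg.predict(X_new)

print(by_hand)     # approximately [16. 3.]
print(by_sklearn)  # the same values
```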

3. A simple example: a trading company

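As the data show, the company's sales tend to rise as its advertising spend rises, but the relationship is not strictly linear: the points scatter around an average trend.
Let's fit a linear model to this data.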

import numpy as np
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error, r2_score

data = np.array([[10, 19],[13,60],[22,71],[37,74],[45,69],[48,89],[59,146]
                 ,[65,130],[66,153],[68,144],[68,128],[71,123],[84,127]
                 ,[88,125],[89,154],[89,150]])
X = data[:,np.newaxis, 0]
y = data[:,1]

print(X)
print(y)
reg = LinearRegression().fit(X, y)
print(reg.score(X, y))
print(reg.coef_)
print(reg.intercept_)
y_pred = reg.predict(X)

print('Mean squared error: %.2f' % mean_squared_error(y, y_pred))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f' % r2_score(y, y_pred))

plt.scatter(data[:,0], data[:,1],  color='black')
print('y='+str(reg.coef_[0]) +'*x + ' + str(reg.intercept_) )
plt.plot(data[:,0], reg.coef_*data[:,0] + reg.intercept_, color='blue', linewidth=3)
plt.show()
Output:

[[10]
 [13]
 [22]
 [37]
 [45]
 [48]
 [59]
 [65]
 [66]
 [68]
 [68]
 [71]
 [84]
 [88]
 [89]
 [89]]
[ 19  60  71  74  69  89 146 130 153 144 128 123 127 125 154 150]
0.7861129941287246
[1.37939644]
30.637280329657003
Mean squared error: 333.71
Coefficient of determination: 0.79
y=1.37939643679554*x + 30.637280329657003

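The fitted line is y = 1.37939643679554*x + 30.637280329657003, where x is the advertising cost and y is sales; with this regression formula we can predict the company's sales. The fit is reasonably good, with R² = 0.79 (measured on the same data used for training).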
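With a single feature, the least-squares slope and intercept also have a closed form: w = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and b = ȳ - w·x̄. A minimal sketch checking this against Sklearn on the same advertising/sales data, then predicting sales for a new advertising cost (the value 50 is an arbitrary illustrative input):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

data = np.array([[10, 19], [13, 60], [22, 71], [37, 74], [45, 69], [48, 89],
                 [59, 146], [65, 130], [66, 153], [68, 144], [68, 128],
                 [71, 123], [84, 127], [88, 125], [89, 154], [89, 150]])
x = data[:, 0].astype(float)  # advertising cost
y = data[:, 1].astype(float)  # sales

# Closed-form simple linear regression
x_mean, y_mean = x.mean(), y.mean()
w = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
b = y_mean - w * x_mean

reg = LinearRegression().fit(x.reshape(-1, 1), y)
print(w, b)                          # should match the two values below
print(reg.coef_[0], reg.intercept_)

# Predict sales for an advertising cost of 50 (illustrative input)
print(reg.predict(np.array([[50.0]])))
```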

4. An example from the official Sklearn website [1]

The diabetes dataset has 442 samples, each with 10 features.
Here we pick a single feature (column index 2) and fit a straight line to it.
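To see what the columns are, the object returned by datasets.load_diabetes() exposes feature_names; column index 2 is bmi (body mass index). The features come already mean-centered and scaled (each column's sum of squares is about 1), which helps explain why the fitted coefficient later looks so large. A quick check:

```python
import numpy as np
from sklearn import datasets

diabetes = datasets.load_diabetes()
print(diabetes.data.shape)     # (442, 10)
print(diabetes.feature_names)  # ['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']

bmi = diabetes.data[:, 2]
print(diabetes.feature_names[2])  # bmi
# Columns are pre-scaled: mean close to 0, sum of squares close to 1
print(bmi.mean(), np.sum(bmi ** 2))
```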


import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score

# Load the diabetes dataset
diabetes_X, diabetes_y = datasets.load_diabetes(return_X_y=True)

print('diabetes_X', diabetes_X.shape)
print('diabetes_y', diabetes_y.shape)

# Use only one feature
diabetes_X = diabetes_X[:, np.newaxis, 2]

print('diabetes_X', diabetes_X.shape)

# Split the data into training/testing sets
diabetes_X_train = diabetes_X[:-20]
diabetes_X_test = diabetes_X[-20:]

# Split the targets into training/testing sets
diabetes_y_train = diabetes_y[:-20]
diabetes_y_test = diabetes_y[-20:]

# Create linear regression object
regr = linear_model.LinearRegression()

# Train the model using the training sets
regr.fit(diabetes_X_train, diabetes_y_train)

# Make predictions using the testing set
diabetes_y_pred = regr.predict(diabetes_X_test)

# The coefficients
print('Coefficients: \n', regr.coef_)
# The mean squared error
print('Mean squared error: %.2f' % mean_squared_error(diabetes_y_test, diabetes_y_pred))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'% r2_score(diabetes_y_test, diabetes_y_pred))

# Plot outputs
plt.scatter(diabetes_X_test, diabetes_y_test,  color='black')
plt.plot(diabetes_X_test, diabetes_y_pred, color='blue', linewidth=3)

plt.xticks(())
plt.yticks(())

plt.show()
Output:

diabetes_X (442, 10)
diabetes_y (442,)
diabetes_X (442, 1)
Coefficients: 
 [938.23786125]
Mean squared error: 2548.07
Coefficient of determination: 0.47

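The R² on the 20 held-out test samples is 0.47, which is not very good. That is fine here, since this is only an exercise; after all, we used just one of the 10 features.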
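A natural follow-up is to keep the same split (last 20 samples held out) but use all 10 features, and compare the test R² with the single-feature model. A sketch for experimentation; the all-feature score is printed rather than quoted here:

```python
from sklearn import datasets, linear_model
from sklearn.metrics import r2_score

X, y = datasets.load_diabetes(return_X_y=True)

# Same split as the example above: the last 20 samples are the test set
X_train, X_test = X[:-20], X[-20:]
y_train, y_test = y[:-20], y[-20:]

# Model using all 10 features
regr = linear_model.LinearRegression().fit(X_train, y_train)
y_pred = regr.predict(X_test)
r2_all = r2_score(y_test, y_pred)

# Single-feature model (column 2, bmi), as in the example above
regr_one = linear_model.LinearRegression().fit(X_train[:, [2]], y_train)
r2_one = r2_score(y_test, regr_one.predict(X_test[:, [2]]))

print('Coefficients:', regr.coef_)  # one weight per feature (10 in total)
print('R2, bmi only:      %.2f' % r2_one)
print('R2, all features:  %.2f' % r2_all)
```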

References

[1] https://scikit-learn.org/stable/auto_examples/linear_model/plot_ols.html#sphx-glr-auto-examples-linear-model-plot-ols-py
[2] https://www.cnblogs.com/wuliytTaotao/p/10837533.html
