Hands-On Linear Regression with Sklearn
1. Preface
For the theory behind simple (one-variable) linear regression, please see my linked post.
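As a quick refresher before the sklearn examples, here is a minimal sketch of the closed-form least-squares solution for a simple line y = w*x + b (my own illustration, not from the linked post; the function name `fit_line` is made up):

```python
import numpy as np

def fit_line(x, y):
    """Ordinary least squares for y = w*x + b.

    Closed form: w = cov(x, y) / var(x), b = mean(y) - w * mean(x).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b = y.mean() - w * x.mean()
    return w, b

# Points lying exactly on y = 2x + 1 are recovered exactly.
w, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(w, b)  # → 2.0 1.0
```

This is the same computation sklearn's LinearRegression performs for one feature, generalized to many features via linear algebra.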
2. Warm-Up Example
Fitting a straight line.
Import LinearRegression from the sklearn.linear_model package:
from sklearn.linear_model import LinearRegression
Fit the data (i.e., train the model):
reg = LinearRegression().fit(X, y)
Check the goodness of fit (score returns R², the coefficient of determination):
print(reg.score(X, y))
The learned coefficients, i.e., the weights on X; this prints [1. 2.]:
print(reg.coef_)
The line's intercept b (the bias term); this prints 3:
print(reg.intercept_)
Complete code:
import numpy as np
from sklearn.linear_model import LinearRegression
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
# y = 1 * x_0 + 2 * x_1 + 3
y = np.dot(X, np.array([1, 2])) + 3
print(X)
print(y)
reg = LinearRegression().fit(X, y)
print(reg.score(X, y))
print(reg.coef_)
print(reg.intercept_)
print(reg.predict(np.array([[3, 5]])))
Output:
[[1 1]
[1 2]
[2 2]
[2 3]]
[ 6 8 9 11]
1.0
[1. 2.]
3.0000000000000018
[16.]
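Since the score is 1.0, the model recovered the generating rule y = 1*x_0 + 2*x_1 + 3 exactly. A small sanity check (my own addition): the prediction for [3, 5] is just the dot product with coef_ plus intercept_.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
y = np.dot(X, np.array([1, 2])) + 3
reg = LinearRegression().fit(X, y)

# predict() is nothing more than coef_ . x + intercept_:
x_new = np.array([3, 5])
manual = reg.coef_ @ x_new + reg.intercept_      # 1*3 + 2*5 + 3 = 16
pred = reg.predict(x_new.reshape(1, -1))[0]
print(manual, pred)  # both ≈ 16.0
```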
3. A Simple Trading-Company Example
As the advertising spend increases, the company's sales increase as well, but the relationship is not strictly linear; the points scatter around a trend. Let's fit this data with a linear model.
import numpy as np
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error, r2_score
data = np.array([[10, 19],[13,60],[22,71],[37,74],[45,69],[48,89],[59,146]
,[65,130],[66,153],[68,144],[68,128],[71,123],[84,127]
,[88,125],[89,154],[89,150]])
X = data[:,np.newaxis, 0]
y = data[:,1]
print(X)
print(y)
reg = LinearRegression().fit(X, y)
print(reg.score(X, y))
print(reg.coef_)
print(reg.intercept_)
y_pred = reg.predict(X)
print('Mean squared error: %.2f' % mean_squared_error(y, y_pred))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f' % r2_score(y, y_pred))
plt.scatter(data[:,0], data[:,1], color='black')
print('y='+str(reg.coef_[0]) +'*x + ' + str(reg.intercept_) )
plt.plot(data[:,0], reg.coef_*data[:,0] + reg.intercept_, color='blue', linewidth=3)
plt.show()
Output:
[[10]
[13]
[22]
[37]
[45]
[48]
[59]
[65]
[66]
[68]
[68]
[71]
[84]
[88]
[89]
[89]]
[ 19 60 71 74 69 89 146 130 153 144 128 123 127 125 154 150]
0.7861129941287246
[1.37939644]
30.637280329657003
Mean squared error: 333.71
Coefficient of determination: 0.79
y=1.37939643679554*x + 30.637280329657003
The fitted line is y = 1.37939643679554*x + 30.637280329657003, where x is the advertising spend and y is the sales. With this formula we can predict the company's sales. The result is decent: R² = 0.79.
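To show the fitted line in use, here is a sketch that refits the same data and predicts the sales for a hypothetical advertising spend of 100 (the value 100 is my own example, not from the data):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

data = np.array([[10, 19], [13, 60], [22, 71], [37, 74], [45, 69], [48, 89],
                 [59, 146], [65, 130], [66, 153], [68, 144], [68, 128],
                 [71, 123], [84, 127], [88, 125], [89, 154], [89, 150]])
X, y = data[:, np.newaxis, 0], data[:, 1]
reg = LinearRegression().fit(X, y)

# Predicted sales at an ad spend of 100, i.e. 1.3794*100 + 30.637 ≈ 168.6
pred = reg.predict([[100]])[0]
print('%.1f' % pred)
```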
4. An Example from the Sklearn Website
The diabetes dataset contains 442 samples with 10 features each.
We select a single feature and fit it with a linear model.
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score
# Load the diabetes dataset
diabetes_X, diabetes_y = datasets.load_diabetes(return_X_y=True)
print('diabetes_X', diabetes_X.shape)
print('diabetes_y', diabetes_y.shape)
# Use only one feature
diabetes_X = diabetes_X[:, np.newaxis, 2]
print('diabetes_X', diabetes_X.shape)
# Split the data into training/testing sets
diabetes_X_train = diabetes_X[:-20]
diabetes_X_test = diabetes_X[-20:]
# Split the targets into training/testing sets
diabetes_y_train = diabetes_y[:-20]
diabetes_y_test = diabetes_y[-20:]
# Create linear regression object
regr = linear_model.LinearRegression()
# Train the model using the training sets
regr.fit(diabetes_X_train, diabetes_y_train)
# Make predictions using the testing set
diabetes_y_pred = regr.predict(diabetes_X_test)
# The coefficients
print('Coefficients: \n', regr.coef_)
# The mean squared error
print('Mean squared error: %.2f' % mean_squared_error(diabetes_y_test, diabetes_y_pred))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'% r2_score(diabetes_y_test, diabetes_y_pred))
# Plot outputs
plt.scatter(diabetes_X_test, diabetes_y_test, color='black')
plt.plot(diabetes_X_test, diabetes_y_pred, color='blue', linewidth=3)
plt.xticks(())
plt.yticks(())
plt.show()
Output:
diabetes_X (442, 10)
diabetes_y (442,)
diabetes_X (442, 1)
Coefficients:
[938.23786125]
Mean squared error: 2548.07
Coefficient of determination: 0.47
The R² on the test set is 0.47, which is not great, but that's fine since we are just learning; after all, we used only one of the 10 features.
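As a quick follow-up check (my own addition, not part of the official example), we can refit on the same train/test split using all 10 features; the test R² should be computable the same way and, in my understanding, improves on the single-feature fit:

```python
from sklearn import datasets, linear_model
from sklearn.metrics import r2_score

# Same split as above, but keep all 10 features.
X, y = datasets.load_diabetes(return_X_y=True)
X_train, X_test = X[:-20], X[-20:]
y_train, y_test = y[:-20], y[-20:]

regr = linear_model.LinearRegression().fit(X_train, y_train)
r2 = r2_score(y_test, regr.predict(X_test))
print('R2 with all 10 features: %.2f' % r2)
```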
References
[1] https://scikit-learn.org/stable/auto_examples/linear_model/plot_ols.html#sphx-glr-auto-examples-linear-model-plot-ols-py
[2] https://www.cnblogs.com/wuliytTaotao/p/10837533.html