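Linear Regression
I. Problem Overview
The goal of regression is to build a regression equation that predicts a target value; solving a regression problem means finding the regression coefficients of that equation. Prediction is then simple: multiply each regression coefficient by its input value and add up the products.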
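As a minimal sketch of that prediction rule, assuming made-up coefficients and one made-up sample (neither comes from the text), the prediction is the dot product of the coefficient vector and the input, with the input extended by a constant 1 for the intercept:

```python
import numpy as np

# Hypothetical coefficients W = (b, w1, w2); illustrative values only.
W = np.array([1.0, 2.0, 3.0])
# One hypothetical sample with two features, extended with x0 = 1.
x = np.array([1.0, 0.5, 2.0])

# Multiply each coefficient by its input and sum: b*1 + w1*x1 + w2*x2.
y_hat = float(W @ x)
print(y_hat)
```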
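The simplest definition of regression: given a point set D, fit a function to it so that the error between the points and the fitted function is minimized. If the fitted curve is a straight line, this is called linear regression; if it is a quadratic curve, it is called quadratic regression.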
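To make the difference concrete, here is a small sketch that fits both a straight line and a quadratic curve to the same points and compares the sum of squared errors. The point set (sampled from y = x^2) and the helper name `fit_error` are assumptions made for illustration.

```python
import numpy as np

# Hypothetical point set D sampled from y = x^2, without noise.
x = np.linspace(-2.0, 2.0, 9)
y = x ** 2


def fit_error(degree):
    """Fit a polynomial of the given degree; return its sum of squared errors."""
    coeffs = np.polyfit(x, y, degree)
    residual = np.polyval(coeffs, x) - y
    return float(residual @ residual)


linear_sse = fit_error(1)     # straight line: linear regression
quadratic_sse = fit_error(2)  # quadratic curve: quadratic regression
print(linear_sse, quadratic_sse)
```

On these points the quadratic fit should leave essentially no error, while the straight line cannot follow the curvature.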
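II. Solving Linear Regression
- Least squares
1.1 Characteristics
The normal equation is the closed-form solution of ordinary least squares (OLS). Given an input matrix X, if the inverse of X.T*X exists and can be computed, the regression coefficients follow directly. The reasoning is simple: the goal is to minimize the sum of squared errors, so set its derivative with respect to the coefficients to 0 and solve, which gives W = (X.T*X)^(-1) * X.T * y.
X is an (m, n+1) matrix (m is the number of samples, n is the number of features per sample, and the extra column holds the constant x0=1 for the intercept); y is an (m, 1) column vector.
The formula contains the inverse of X.T*X, so it applies only when that inverse exists. The inverse may not exist, for example when some features are linearly dependent or there are fewer samples than features; the "Ridge regression" section below discusses how to handle that case.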
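A minimal sketch of the normal equation on plain NumPy arrays follows. The data (generated from y = 1 + 2x), the function name `normal_equation`, and the use of a rank check to detect a singular X.T*X are assumptions of this sketch, not part of the original implementation.

```python
import numpy as np


def normal_equation(X, y):
    """Solve (X.T X) w = X.T y; X must already contain the column of ones."""
    xTx = X.T @ X
    # Assumption of this sketch: detect a singular matrix through its rank.
    if np.linalg.matrix_rank(xTx) < xTx.shape[0]:
        raise np.linalg.LinAlgError("X.T X is singular")
    return np.linalg.solve(xTx, X.T @ y)


# Hypothetical data generated from y = 1 + 2*x.
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack((np.ones_like(x), x))  # shape (m, n+1)
y = 1.0 + 2.0 * x
w = normal_equation(X, y)

# Duplicating a feature column makes X.T X singular.
X_dup = np.column_stack((X, x))
try:
    normal_equation(X_dup, y)
    singular_detected = False
except np.linalg.LinAlgError:
    singular_detected = True
```

Solving the linear system with `np.linalg.solve` avoids forming the inverse explicitly, which is usually the more stable choice.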
1.2 Derivation
(See Zhou Zhihua's textbook for details.)
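The derivation itself is not reproduced in these notes. For reference, the standard argument can be sketched as follows, using X for the (m, n+1) input matrix, y for the target vector and W for the coefficient vector; treat it as a summary rather than the textbook's exact presentation.

```latex
\begin{aligned}
E(W) &= (y - XW)^{T}(y - XW) \\
\frac{\partial E}{\partial W} &= 2\,X^{T}(XW - y) = 0 \\
X^{T}X\,W &= X^{T}y \\
W &= (X^{T}X)^{-1}X^{T}y \quad \text{(when } X^{T}X \text{ is invertible)}
\end{aligned}
```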
1.3 Implementation
# -*- coding:utf-8 -*-
"""
Linear regression via ordinary least squares
by LeYuan
"""
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets


# Ordinary least squares
def ordinary_least_square(X, Y):
    """
    Given a data set D={(X1,y1),(X2,y2),(X3,y3),...,(Xm,ym)} with m samples, where Xi=(xi1;xi2;...;xid)
    and d is the number of features, fit the model f(Xi)=w*Xi+b, w=(w1;w2;...;wd).
    For convenience, extend each sample with x0=1 so that W=(b;w1;w2;...;wd).
    :param X: m x d matrix
    :param Y: m x 1 matrix
    :return: regression coefficients W=(b;w1;w2;...;wd), or None if X.T*X is singular
    """
    X0 = np.asmatrix(np.ones((X.shape[0], 1)))
    X = np.hstack((X0, X))  # extend x0=1 for each sample
    xTx = X.T * X
    if np.linalg.det(xTx) == 0.0:  # determinant of X.T*X
        print("This matrix is singular, cannot do inverse")
        return None
    return xTx.I * (X.T * Y)  # xTx.I is the inverse of xTx


def predict(X, W):
    """
    Predict target values
    :param X: m x d matrix
    :param W: regression coefficients W=(b;w1;w2;...;wd)
    :return: the prediction matrix
    """
    X0 = np.asmatrix(np.ones((X.shape[0], 1)))
    X = np.hstack((X0, X))  # extend x0=1 for each sample
    Y = X * W
    return Y


def errors_compute(X, W, Y):
    """
    Compute the mean squared error
    :param X: m x d matrix
    :param W: regression coefficients W=(b;w1;w2;...;wd)
    :param Y: m x 1 matrix of true values
    :return: the mean squared error
    """
    y_predict = predict(X, W)
    total_error = np.mean(np.multiply(y_predict - Y, y_predict - Y))
    return total_error


def test_regulation():
    # load the diabetes dataset
    diabetes = datasets.load_diabetes()
    # use multiple features
    diabetes_X = diabetes.data[:, 0:5]
    # split the data into training/testing sets
    diabetes_X_train = np.asmatrix(diabetes_X[:-20])
    diabetes_X_test = np.asmatrix(diabetes_X[-20:])
    # split the targets into training/testing sets
    diabetes_y_train = np.asmatrix(diabetes.target[:-20]).T
    diabetes_y_test = np.asmatrix(diabetes.target[-20:]).T
    W = ordinary_least_square(diabetes_X_train, diabetes_y_train)
    # the coefficients
    print('Coefficients: \n', W)
    # the mean squared error on the training set
    yerror = predict(diabetes_X_train, W) - diabetes_y_train
    print("Mean squared error: %.2f" % np.mean(np.multiply(yerror, yerror)))
    # plot predictions against true values on the test set
    y_test_array = diabetes_y_test.A.reshape(diabetes_y_test.shape[0],)
    y_predict_martrix = predict(diabetes_X_test, W)
    y_predict_array = y_predict_martrix.A.reshape(y_predict_martrix.shape[0],)
    plt.scatter(y_test_array, y_predict_array)
    plt.plot(y_test_array, y_test_array)
    plt.xlabel('true value')
    plt.ylabel('predict value')
    plt.title('ordinary_least_square')
    plt.show()


if __name__ == "__main__":
    test_regulation()
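One caveat about the determinant check in `ordinary_least_square`: comparing a floating-point determinant with exactly 0.0 is a fragile singularity test. The sketch below illustrates why, using two made-up matrices and a hypothetical helper `is_ill_conditioned` based on the condition number; the threshold of 1e10 is an arbitrary assumption.

```python
import numpy as np

# A perfectly conditioned matrix whose determinant underflows to 0.0.
scaled_identity = 0.01 * np.eye(200)
det_scaled = np.linalg.det(scaled_identity)
cond_scaled = np.linalg.cond(scaled_identity)

# A nearly singular matrix whose determinant is tiny but not exactly 0.0.
near_singular = np.array([[1.0, 1.0], [1.0, 1.0 + 1e-12]])
det_near = np.linalg.det(near_singular)
cond_near = np.linalg.cond(near_singular)


def is_ill_conditioned(matrix, limit=1e10):
    """Hypothetical helper: flag matrices whose condition number exceeds limit."""
    return np.linalg.cond(matrix) > limit


print(det_scaled, cond_scaled)
print(det_near, cond_near)
```

The determinant test rejects the first matrix although it is easy to invert, and accepts the second although inverting it amplifies errors enormously; the condition number separates the two cases.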
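- Gradient descent
2.1 Characteristics
For details, see:
Linear regression and gradient descent: http://blog.csdn.net/xiazdong/article/details/7950084
Multiple linear regression and gradient descent: http://m.blog.csdn.net/article/details?id=51169525
2.2 Algorithm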
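The notes leave this subsection empty. As a sketch consistent with the update performed by the implementation in 2.3 (D is the m x (n+1) sample matrix with x0=1, Y the label vector, W the parameters, a the step size), batch gradient descent can be written as below. The factor 1/2 in the cost is a common convention assumed here so that the gradient matches the code.

```latex
\begin{aligned}
J(W) &= \frac{1}{2m}\,(DW - Y)^{T}(DW - Y) \\
\nabla J(W) &= \frac{1}{m}\,D^{T}(DW - Y) \\
W &\leftarrow W - a\,\nabla J(W) \quad \text{(repeated until the error is small enough or the iteration limit is reached)}
\end{aligned}
```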
2.3 Implementation
# -*- coding:utf-8 -*-
"""
Multiple linear regression via gradient descent
Idea based on: http://m.blog.csdn.net/article/details?id=51169525
by LeYuan
"""
import numpy as np
from sklearn import datasets
import matplotlib.pyplot as plt


def gradient_decent(feature, Y, W, a, tolerance, iterations_num):
    """
    Given the data matrix feature, compute the parameters w=(w0;w1;w2;w3;...;wn) of the model
    hw(x)=w0X0+w1X1+...+wnXn=w.T*X
    :param feature: feature matrix, m x n (m samples, n features)
    :param Y: label vector Y=(y1;y2;y3;...;ym), m x 1
    :param W: initial parameters, (n+1) x 1
    :param a: step size
    :param tolerance: lower bound on the error; iteration stops once the error falls below it
    :param iterations_num: maximum number of iterations
    :return: w=(w0;w1;w2;w3;...;wn), (n+1) x 1
    """
    # feature->D, extend x0=1 for each sample
    x0 = np.asmatrix(np.ones((feature.shape[0], 1)))
    D = np.hstack((x0, feature))  # m x (n+1)
    errors = None
    steps = 0
    for i in range(iterations_num):
        y_predict = D * W
        # mean squared error of the current parameters
        errors = np.mean(np.multiply(y_predict - Y, y_predict - Y))
        # stop early once the error is below the tolerance
        if errors < tolerance:
            break
        # otherwise update every weight along the negative gradient
        derive = ((y_predict - Y).T * D) / float(D.shape[0])  # 1 x (n+1)
        W = W - a * derive.T
        steps = i + 1
    print("Updates performed:", steps, " The errors:", errors)
    return W


def predict(feature, W):
    """
    Predict results for the input data
    :param feature: feature matrix
    :param W: parameter vector
    :return: vector of predictions
    """
    # feature->D, extend x0=1 for each sample
    x0 = np.asmatrix(np.ones((feature.shape[0], 1)))
    D = np.hstack((x0, feature))  # m x (n+1)
    # predict
    y_predict = D * W
    return y_predict


def test_regulation():
    # load the diabetes dataset
    diabetes = datasets.load_diabetes()
    # use multiple features
    diabetes_X = diabetes.data[:, 0:5]
    # split the data into training/testing sets
    diabetes_X_train = np.asmatrix(diabetes_X[:-20])
    diabetes_X_test = np.asmatrix(diabetes_X[-20:])
    # split the targets into training/testing sets
    diabetes_y_train = np.asmatrix(diabetes.target[:-20]).T
    diabetes_y_test = np.asmatrix(diabetes.target[-20:]).T
    # initialize W = (0;0;0;...;0)
    W = np.zeros([diabetes_X_train.shape[1] + 1, 1])
    a = 0.32
    tolerance = 2200
    iterations_num = 5000
    W = gradient_decent(diabetes_X_train, diabetes_y_train, W, a, tolerance, iterations_num)
    print("W=", W)
    # predictions on the test set
    y_predict_martrix = predict(diabetes_X_test, W)
    print("y_predict_martrix", y_predict_martrix)
    # plot predictions against true values
    y_test_array = diabetes_y_test.A.reshape(diabetes_y_test.shape[0], )
    y_predict_array = y_predict_martrix.A.reshape(y_predict_martrix.shape[0], )
    plt.scatter(y_test_array, y_predict_array)
    plt.plot(y_test_array, y_test_array)
    plt.xlabel('true value')
    plt.ylabel('predict value')
    plt.title('gradient descent')
    plt.show()


if __name__ == "__main__":
    test_regulation()
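As a quick sanity check of the same update rule, the sketch below runs batch gradient descent on synthetic, noise-free data and compares the result with NumPy's least-squares solver. The data, the function name `gd_linear_regression`, the step size and the iteration count are all assumptions chosen for illustration; they are not the settings used above.

```python
import numpy as np


def gd_linear_regression(D, y, step, iterations):
    """Batch gradient descent; D must already contain the column of ones."""
    W = np.zeros(D.shape[1])
    m = D.shape[0]
    for _ in range(iterations):
        gradient = D.T @ (D @ W - y) / m  # gradient of the halved MSE
        W = W - step * gradient
    return W


# Hypothetical noise-free data with known coefficients (b, w1, w2).
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 2))
D = np.column_stack((np.ones(200), features))
true_W = np.array([1.0, 2.0, -3.0])
y = D @ true_W

W_gd = gd_linear_regression(D, y, step=0.1, iterations=2000)
W_exact = np.linalg.lstsq(D, y, rcond=None)[0]
print(W_gd, W_exact)
```

With well-scaled features like these, the iterative solution should agree with the closed-form one to many decimal places; poorly scaled features generally need a smaller step size or more iterations.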
- Locally weighted linear regression
- Ridge regression
5.1 Characteristics
5.2 Algorithm and idea
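The notes leave this subsection empty. As a sketch consistent with the implementation in 5.3, ridge regression adds a penalty on the size of the weights to the least-squares objective, which makes the matrix to be inverted non-singular for a suitable lambda. The symbol \tilde{I} (the identity matrix with its first diagonal entry set to 0, so that the intercept is not penalized) is notation introduced here.

```latex
\begin{aligned}
J(W) &= (XW - Y)^{T}(XW - Y) + \lambda\,W^{T}\tilde{I}\,W,
\qquad \tilde{I} = \operatorname{diag}(0, 1, \dots, 1) \\
\frac{\partial J}{\partial W} &= 2\,X^{T}(XW - Y) + 2\lambda\,\tilde{I}\,W = 0 \\
W &= (X^{T}X + \lambda\,\tilde{I})^{-1}X^{T}Y
\end{aligned}
```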
5.3 Implementation
"""
Linear regression via ridge regression
by LeYuan
"""
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets


def ridge_regression(X, Y, lam):
    """
    Given a data set D={(X1,y1),(X2,y2),(X3,y3),...,(Xm,ym)} with m samples, where Xi=(xi1;xi2;...;xid)
    and d is the number of features, fit the model f(Xi)=w*Xi+b, w=(w1;w2;...;wd).
    For convenience, extend each sample with x0=1 so that W=(b;w1;w2;...;wd).
    :param X: m x d matrix
    :param Y: m x 1 matrix
    :param lam: the regularization parameter lambda
    :return: W=(b;w1;w2;...;wd), or None if the regularized matrix is singular
    """
    X, Y = featurescaling(X, Y)
    # extend x0=1 for each sample
    x0 = np.asmatrix(np.ones((X.shape[0], 1)))
    X = np.hstack((x0, X))
    xTx = X.T * X
    # identity matrix used for the penalty term
    I = np.eye(X.shape[1])
    I[0, 0] = 0  # the intercept w0 is not penalized
    denom = xTx + I * lam
    if np.linalg.det(denom) == 0:
        print("this matrix is singular, cannot do inverse")
        return None
    return denom.I * X.T * Y


def featurescaling(X, Y):
    """
    Feature scaling by mean normalization, i.e. (x-mean(x))/(max-min)
    :param X: m x d matrix
    :param Y: m x 1 matrix
    :return: X, Y
    """
    yMean = np.mean(Y, 0)
    Y = (Y - yMean) / (np.amax(Y, 0) - np.amin(Y, 0))
    xMean = np.mean(X, 0)  # compute the mean, then subtract it off
    xMax = np.amax(X, 0)
    xMin = np.amin(X, 0)
    X = (X - xMean) / (xMax - xMin)
    return X, Y


def predict(X, W):
    """
    Predict target values
    :param X: m x d matrix
    :param W: regression coefficients W=(b;w1;w2;...;wd)
    :return: the prediction matrix
    """
    X0 = np.asmatrix(np.ones((X.shape[0], 1)))
    X = np.hstack((X0, X))  # extend x0=1 for each sample
    Y = X * W
    return Y


def errors_compute(X, W, Y):
    """
    Compute the mean squared error
    :param X: m x d matrix
    :param W: regression coefficients W=(b;w1;w2;...;wd)
    :param Y: m x 1 matrix of true values
    :return: the mean squared error
    """
    y_predict = predict(X, W)
    total_error = np.mean(np.multiply(y_predict - Y, y_predict - Y))
    return total_error


def test_regulation():
    # load the diabetes dataset
    diabetes = datasets.load_diabetes()
    # use multiple features
    diabetes_X = diabetes.data[:, 0:5]
    # split the data into training/testing sets
    diabetes_X_train = np.asmatrix(diabetes_X[:-20])
    diabetes_X_test = np.asmatrix(diabetes_X[-20:])
    # split the targets into training/testing sets
    diabetes_y_train = np.asmatrix(diabetes.target[:-20]).T
    diabetes_y_test = np.asmatrix(diabetes.target[-20:]).T
    # Note: the test set is scaled with its own statistics here; reusing the
    # training-set statistics is usually the safer choice.
    diabetes_X_test, diabetes_y_test = featurescaling(diabetes_X_test, diabetes_y_test)
    # set lam
    lam = np.exp(-4)
    W = ridge_regression(diabetes_X_train, diabetes_y_train, lam)
    # the coefficients
    print('Coefficients: \n', W)
    # the mean squared error on the (scaled) training set
    diabetes_X_train, diabetes_y_train = featurescaling(diabetes_X_train, diabetes_y_train)
    yerror = predict(diabetes_X_train, W) - diabetes_y_train
    print("Mean squared error: %.2f" % np.mean(np.multiply(yerror, yerror)))
    # plot predictions against true values on the test set
    y_test_array = diabetes_y_test.A.reshape(diabetes_y_test.shape[0],)
    y_predict_martrix = predict(diabetes_X_test, W)
    y_predict_array = y_predict_martrix.A.reshape(y_predict_martrix.shape[0],)
    plt.scatter(y_test_array, y_predict_array)
    plt.plot(y_test_array, y_test_array)
    plt.xlabel('true value')
    plt.ylabel('predict value')
    plt.title('ridge_regression')
    plt.show()


if __name__ == "__main__":
    test_regulation()
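The main practical benefit of the penalty term is that the regularized matrix stays invertible even when X.T*X is singular. The sketch below illustrates this on made-up data with a duplicated feature column; the function name `ridge_solve`, the data and the value of lam are assumptions for illustration only.

```python
import numpy as np


def ridge_solve(X, y, lam):
    """Closed-form ridge solution; X must already contain the column of ones."""
    penalty = np.eye(X.shape[1])
    penalty[0, 0] = 0.0  # leave the intercept unpenalized
    return np.linalg.solve(X.T @ X + lam * penalty, X.T @ y)


# Hypothetical data from y = 1 + 2*x with the feature column duplicated,
# so X.T X has rank 2 instead of 3 and cannot be inverted.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
X = np.column_stack((np.ones_like(x), x, x))
y = 1.0 + 2.0 * x

rank = np.linalg.matrix_rank(X.T @ X)
W = ridge_solve(X, y, lam=1e-3)
fitted = X @ W
print(rank, W)
```

The penalty splits the slope evenly between the two identical columns, and the fitted values stay close to the targets because lam is small.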
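III. Applications and Model Tuning
Linear regression can be used to build a predictive model whenever a value (such as a house price or a vegetable price) must be predicted from a combination of features and the relationship between the predicted value and the features is linear. Once a model has been built with a machine learning algorithm, it needs continual tuning and correction in use. For linear regression, the best model balances prediction bias against model variance (high bias means underfitting, high variance means overfitting). Ways to tune and correct a linear regression model include:
Get more training samples - addresses high variance
Try a smaller set of features - addresses high variance
Try obtaining additional features - addresses high bias
Try adding polynomial combination features - addresses high bias
Try decreasing λ - addresses high bias
Try increasing λ - addresses high variance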
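One common way to apply the last two items is to fit the model for several values of λ and keep the one with the lowest error on held-out data. The sketch below does this for a ridge model with polynomial features; the synthetic data, the candidate values of λ, the train/validation split and the helper names `ridge_solve` and `mse` are all assumptions made for illustration.

```python
import numpy as np


def ridge_solve(X, y, lam):
    """Closed-form ridge solution; the first column of X is the intercept."""
    penalty = np.eye(X.shape[1])
    penalty[0, 0] = 0.0  # leave the intercept unpenalized
    return np.linalg.solve(X.T @ X + lam * penalty, X.T @ y)


def mse(X, y, W):
    """Mean squared error of the predictions X @ W."""
    residual = X @ W - y
    return float(residual @ residual) / len(y)


# Hypothetical noisy data with degree-9 polynomial features.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=60)
y = np.sin(3.0 * x) + rng.normal(scale=0.2, size=60)
X = np.vander(x, 10, increasing=True)  # first column is all ones
X_train, y_train = X[:40], y[:40]
X_val, y_val = X[40:], y[40:]

lams = [1e-6, 1e-3, 1e-1, 10.0, 1e3]
train_errors = [mse(X_train, y_train, ridge_solve(X_train, y_train, lam)) for lam in lams]
val_errors = [mse(X_val, y_val, ridge_solve(X_train, y_train, lam)) for lam in lams]
best_lam = lams[int(np.argmin(val_errors))]
print(train_errors, val_errors, best_lam)
```

The training error can only grow as λ increases (more bias), so it is the validation error that indicates where the balance between bias and variance lies.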