Proof of the Normal Equation

The linear regression model (the predicted value $\hat{y}_i$ for the $i$-th instance) is:

$$\hat{y}_i = \theta_0 + \theta_1 x_{i,1} + \theta_2 x_{i,2} + \cdots + \theta_n x_{i,n}$$
Written in matrix form:

$$\hat{y}_i = \begin{bmatrix} 1 & x_{i,1} & x_{i,2} & \cdots & x_{i,n} \end{bmatrix} \begin{bmatrix} \theta_0 \\ \theta_1 \\ \theta_2 \\ \vdots \\ \theta_n \end{bmatrix}$$
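This simplifies to:

$$\hat{y}_i = \mathbf{x}_i^{T} \boldsymbol{\theta}$$

where $\mathbf{x}_i = \begin{bmatrix} 1 & x_{i,1} & \cdots & x_{i,n} \end{bmatrix}^{T}$ is the feature vector of the $i$-th instance with a leading 1 for the bias term.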
The error function, over $m$ training instances, is:

$$MSE(\boldsymbol{\theta}) = \frac{1}{m} \sum_{i=1}^{m} (\hat{y}_i - y_i)^{2} = \frac{1}{m} \sum_{i=1}^{m} (\mathbf{x}_i^{T} \boldsymbol{\theta} - y_i)^{2}$$
Let:

$$\mathbf{c} = \begin{bmatrix} \mathbf{x}_1^{T} \boldsymbol{\theta} - y_1 \\ \mathbf{x}_2^{T} \boldsymbol{\theta} - y_2 \\ \vdots \\ \mathbf{x}_m^{T} \boldsymbol{\theta} - y_m \end{bmatrix} = \begin{bmatrix} \mathbf{x}_1^{T} \boldsymbol{\theta} \\ \mathbf{x}_2^{T} \boldsymbol{\theta} \\ \vdots \\ \mathbf{x}_m^{T} \boldsymbol{\theta} \end{bmatrix} - \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix} = \begin{bmatrix} \mathbf{x}_1^{T} \\ \mathbf{x}_2^{T} \\ \vdots \\ \mathbf{x}_m^{T} \end{bmatrix} \boldsymbol{\theta} - \mathbf{y} = \mathbf{X} \boldsymbol{\theta} - \mathbf{y}$$

Then:

$$MSE(\boldsymbol{\theta}) = \frac{1}{m} \left\| \mathbf{c} \right\|^{2} = \frac{1}{m} \left\| \mathbf{X} \boldsymbol{\theta} - \mathbf{y} \right\|^{2}$$
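As a quick sanity check, the sum form and the norm form of the MSE can be compared numerically. The sketch below assumes NumPy and uses small random data; the sizes, the seed, and the variable names (`features`, `mse_sum`, `mse_norm`) are illustrative choices, not part of the derivation.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 2                          # illustrative sizes: 5 instances, 2 features
features = rng.normal(size=(m, n))   # row i holds (x_{i,1}, ..., x_{i,n})
y = rng.normal(size=m)
theta = rng.normal(size=n + 1)       # (theta_0, ..., theta_n)

# Design matrix X: prepend a column of ones so that row i equals x_i^T.
X = np.column_stack([np.ones(m), features])

# Sum form: (1/m) * sum_i (x_i^T theta - y_i)^2
mse_sum = sum((X[i] @ theta - y[i]) ** 2 for i in range(m)) / m

# Norm form: (1/m) * ||X theta - y||^2
c = X @ theta - y
mse_norm = (c @ c) / m

print(np.isclose(mse_sum, mse_norm))
```

The two values should agree up to floating-point rounding, since stacking the rows $\mathbf{x}_i^{T}$ into $\mathbf{X}$ only reorganizes the same $m$ residuals.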

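We want $MSE(\boldsymbol{\theta})$ to attain its minimum. Writing $MSE(\boldsymbol{\theta}) = MSE(\theta_0, \theta_1, \cdots, \theta_n) = E$, this amounts to finding the point where the gradient of this multivariable function is zero (the function is a convex quadratic in $\boldsymbol{\theta}$, so such a point is a global minimum). The gradient vector collects the partial derivatives of $E$ with respect to the components of $\boldsymbol{\theta}$:

$$\frac{\partial E}{\partial \boldsymbol{\theta}} = \begin{bmatrix} \frac{\partial E}{\partial \theta_0} & \frac{\partial E}{\partial \theta_1} & \cdots & \frac{\partial E}{\partial \theta_n} \end{bmatrix}$$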
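Using the rules of matrix differentiation and the formulas proved in the next section, we can show the following.

Let $g(\boldsymbol{\theta}) = \mathbf{X} \boldsymbol{\theta} - \mathbf{y} = \mathbf{u}$. Then:

$$f(\mathbf{u}) = MSE(\boldsymbol{\theta}) = \frac{1}{m} \left\| g(\boldsymbol{\theta}) \right\|^{2} = \frac{1}{m} \left\| \mathbf{u} \right\|^{2}$$

By the chain rule:

$$\frac{\partial MSE(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} = \frac{\partial f(\mathbf{u})}{\partial \boldsymbol{\theta}} = \frac{\partial f(\mathbf{u})}{\partial \mathbf{u}} \frac{\partial \mathbf{u}}{\partial \boldsymbol{\theta}} = \frac{\partial \left( \frac{1}{m} \left\| \mathbf{u} \right\|^{2} \right)}{\partial \mathbf{u}} \frac{\partial \left( \mathbf{X} \boldsymbol{\theta} - \mathbf{y} \right)}{\partial \boldsymbol{\theta}} = \frac{1}{m} \frac{\partial \left( \mathbf{u}^{T} \mathbf{u} \right)}{\partial \mathbf{u}} \mathbf{X} = \frac{2}{m} \mathbf{u}^{T} \mathbf{X}$$

Here $\partial (\mathbf{u}^{T} \mathbf{u}) / \partial \mathbf{u} = 2 \mathbf{u}^{T}$ and $\partial (\mathbf{X} \boldsymbol{\theta} - \mathbf{y}) / \partial \boldsymbol{\theta} = \mathbf{X}$, with the gradient written as a row vector.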
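The closed-form gradient $\frac{2}{m} \mathbf{u}^{T} \mathbf{X}$ can be checked against a finite-difference approximation. This is a minimal sketch assuming NumPy; the helper names (`mse`, `mse_gradient`, `numerical_gradient`), the step size `eps`, and the random test data are all hypothetical choices made for illustration.

```python
import numpy as np

def mse(theta, X, y):
    """MSE(theta) = (1/m) * ||X theta - y||^2."""
    u = X @ theta - y
    return (u @ u) / len(y)

def mse_gradient(theta, X, y):
    """Closed form (2/m) * u^T X with u = X theta - y, as a 1-D array."""
    u = X @ theta - y
    return (2.0 / len(y)) * (u @ X)

def numerical_gradient(theta, X, y, eps=1e-6):
    """Central finite differences, one coordinate of theta at a time."""
    grad = np.zeros_like(theta)
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = eps
        grad[j] = (mse(theta + step, X, y) - mse(theta - step, X, y)) / (2 * eps)
    return grad

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(20), rng.normal(size=(20, 3))])  # bias column + 3 features
y = rng.normal(size=20)
theta = rng.normal(size=4)

analytic = mse_gradient(theta, X, y)
numeric = numerical_gradient(theta, X, y)
print(np.allclose(analytic, numeric, atol=1e-6))
```

NumPy does not distinguish row and column vectors for 1-D arrays, so `u @ X` yields the $n+1$ components of the row vector $\mathbf{u}^{T} \mathbf{X}$ directly.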

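Now solve for $\hat{\boldsymbol{\theta}}$, the value of $\boldsymbol{\theta}$ at which every component of the gradient is zero:

$$\frac{2}{m} \left( \mathbf{X} \hat{\boldsymbol{\theta}} - \mathbf{y} \right)^{T} \mathbf{X} = \mathbf{0}$$

Multiplying by $\frac{m}{2}$ and expanding the transpose:

$$\hat{\boldsymbol{\theta}}^{T} \mathbf{X}^{T} \mathbf{X} - \mathbf{y}^{T} \mathbf{X} = \mathbf{0}$$

Assuming $\mathbf{X}^{T} \mathbf{X}$ is invertible, right-multiply by its inverse:

$$\hat{\boldsymbol{\theta}}^{T} = \mathbf{y}^{T} \mathbf{X} \left( \mathbf{X}^{T} \mathbf{X} \right)^{-1}$$

Transposing both sides (the matrix $\mathbf{X}^{T} \mathbf{X}$ is symmetric, and so is its inverse) gives the normal equation:

$$\hat{\boldsymbol{\theta}} = \left( \mathbf{X}^{T} \mathbf{X} \right)^{-1} \mathbf{X}^{T} \mathbf{y}$$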
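The final formula translates almost literally into NumPy. The sketch below is one possible illustration, not a reference implementation: the synthetic data, the noise level, and the "true" parameters `true_theta` are assumptions made up for the example. It also solves the same system with `np.linalg.solve` and `np.linalg.lstsq`, which avoid forming the explicit inverse, and confirms that the gradient vanishes at the solution.

```python
import numpy as np

rng = np.random.default_rng(42)
m = 200
x = 2.0 * rng.random(m)                      # a single feature in [0, 2)
true_theta = np.array([4.0, 3.0])            # hypothetical (theta_0, theta_1)
X = np.column_stack([np.ones(m), x])         # design matrix with bias column
y = X @ true_theta + rng.normal(scale=0.5, size=m)

# Normal equation exactly as derived: theta = (X^T X)^{-1} X^T y
theta_inv = np.linalg.inv(X.T @ X) @ X.T @ y

# Same linear system (X^T X) theta = X^T y, solved without an explicit inverse
theta_solve = np.linalg.solve(X.T @ X, X.T @ y)

# Least-squares solver applied to X directly
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# The gradient (2/m) (X theta - y)^T X should be numerically zero at the solution
gradient = (2.0 / m) * (X @ theta_inv - y) @ X

print(theta_inv)
print(np.allclose(theta_inv, theta_solve), np.allclose(theta_inv, theta_lstsq))
print(np.allclose(gradient, 0.0, atol=1e-8))
```

All three routes should agree on well-conditioned data like this; the estimate should land near `true_theta` but not on it, because of the added noise.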

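In essence, this is an application of matrix calculus: finding the minimum of a particular polynomial (a quadratic in $\boldsymbol{\theta}$). The computation involves a matrix inversion, and inverting an $n \times n$ matrix typically has a computational complexity between $O(n^{2.4})$ and $O(n^{3})$. When the number of features is large (for example, 100000), computing the normal equation becomes extremely slow.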
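To make the scaling concrete, the sketch below shows that the matrix being inverted, $\mathbf{X}^{T} \mathbf{X}$, has a size set by the number of features rather than the number of instances, and then evaluates the two complexity bounds for 100000 features. It assumes NumPy; the sizes `m` and `n` are arbitrary, and the operation counts ignore constant factors, so they indicate orders of magnitude only, not running times.

```python
import numpy as np

# X^T X is (n+1) x (n+1): its size depends on the feature count n, not on m.
m, n = 1000, 5
X = np.column_stack([np.ones(m), np.zeros((m, n))])  # placeholder data; only the shape matters
gram = X.T @ X
print(gram.shape)  # (6, 6)

# Order-of-magnitude operation counts for inverting an n x n matrix.
n_features = 100_000
ops_low = n_features ** 2.4    # lower bound O(n^2.4)
ops_high = n_features ** 3     # upper bound O(n^3)
print(f"{ops_low:.1e} to {ops_high:.1e}")
```

For 100000 features this gives roughly $10^{12}$ to $10^{15}$ operations for the inversion step alone.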
