Deep Learning: Derivation of the Backpropagation Algorithm

Preface

I previously wrote about a single-layer feedforward neural network, but the derivation there was specific to the sigmoid activation function. In this post, the backpropagation algorithm is derived using matrix-vector calculus.


Notation

| Symbol | Meaning |
| --- | --- |
| $S_{in}^i$ | Input to the neurons of layer $i$. If the layer has $n$ neurons, $S_{in}^i$ is an $n \times 1$ vector. |
| $S_{out}^i$ | Output of the neurons of layer $i$. If the layer has $n$ neurons, $S_{out}^i$ is an $n \times 1$ vector. |
| $W^i$ | Weight matrix of layer $i$. If layer $i-1$ has $m$ neurons and layer $i$ has $n$ neurons, $W^i$ is an $n \times m$ matrix. |
| $B^i$ | Bias vector of layer $i$. If the layer has $n$ neurons, $B^i$ is an $n \times 1$ vector. |
| $cost$ | Value of the loss function. |

If $x$ denotes $\begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{bmatrix}$, then the activation vector of layer $i$, $f^i(x)$, denotes $\begin{bmatrix} f(x_1)\\ f(x_2)\\ \vdots\\ f(x_n) \end{bmatrix}$, where $f(x)$ is the activation function, and $(f^i(x))'$ denotes $\begin{bmatrix} \frac{\partial f(x_1)}{\partial x_1}\\ \frac{\partial f(x_2)}{\partial x_2}\\ \vdots\\ \frac{\partial f(x_n)}{\partial x_n} \end{bmatrix}$.


Based on the notation above, for the neurons of layer $i$ we have
$$\begin{aligned} S_{out}^{i-1}&=f^{i-1}(S_{in}^{i-1})\\ S_{in}^i&=W^iS_{out}^{i-1}+B^i \end{aligned}$$
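To make the notation concrete, here is a minimal NumPy sketch of the forward pass through one layer. The layer sizes, the sigmoid activation, and all variable names (`W_i`, `B_i`, `S_in_i`, ...) are assumptions chosen purely for illustration, not part of the original derivation.

```python
import numpy as np

def sigmoid(x):
    # example activation f(x); any differentiable activation works
    return 1.0 / (1.0 + np.exp(-x))

# assumed sizes: layer i-1 has m = 3 neurons, layer i has n = 2 neurons
m, n = 3, 2
rng = np.random.default_rng(0)

S_in_prev = rng.standard_normal((m, 1))   # S_in^{i-1}, m x 1
W_i = rng.standard_normal((n, m))         # W^i, n x m
B_i = rng.standard_normal((n, 1))         # B^i, n x 1

S_out_prev = sigmoid(S_in_prev)           # S_out^{i-1} = f^{i-1}(S_in^{i-1})
S_in_i = W_i @ S_out_prev + B_i           # S_in^i = W^i S_out^{i-1} + B^i
print(S_in_i.shape)                       # (2, 1)
```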


Chain Rule for Differentiating a Scalar with Respect to a Vector

For an $n$-layer feedforward neural network, we have
$$cost \leftarrow S_{in}^n \leftarrow S_{in}^{n-1} \leftarrow \cdots \leftarrow S_{in}^1$$
Each left arrow denotes a mapping; for a feedforward neural network, the mapping is
$$S_{in}^{i+1}=W^{i+1}f^{i}(S_{in}^{i})+B^{i+1}$$
The mapping between the loss function and the last layer depends on the type of loss function (for example, mean squared error or cross-entropy). Based on the mapping relations above, the chain rule for differentiating a scalar with respect to a vector is
$$\begin{aligned} \frac{\partial cost}{\partial S_{in}^i}&=\left(\frac{\partial S_{in}^n}{\partial S_{in}^{n-1}}\frac{\partial S_{in}^{n-1}}{\partial S_{in}^{n-2}}\cdots\frac{\partial S_{in}^{i+1}}{\partial S_{in}^{i}}\right)^T\frac{\partial cost}{\partial S_{in}^n}\\ &=\left(\frac{\partial S_{in}^{i+1}}{\partial S_{in}^{i}}\right)^T\cdots\left(\frac{\partial S_{in}^{n-1}}{\partial S_{in}^{n-2}}\right)^T\left(\frac{\partial S_{in}^n}{\partial S_{in}^{n-1}}\right)^T\frac{\partial cost}{\partial S_{in}^n}\\ &=\left(\frac{\partial S_{in}^{i+1}}{\partial S_{in}^{i}}\right)^T\cdots\left(\frac{\partial S_{in}^{n-1}}{\partial S_{in}^{n-2}}\right)^T\frac{\partial cost}{\partial S_{in}^{n-1}}\\ &=\cdots\\ &=\left(\frac{\partial S_{in}^{i+1}}{\partial S_{in}^{i}}\right)^T\frac{\partial cost}{\partial S_{in}^{i+1}} \end{aligned}$$
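As a concrete instance of the formula above: for a three-layer network ($n=3$) and $i=1$, the chain collapses to
$$\frac{\partial cost}{\partial S_{in}^1}=\left(\frac{\partial S_{in}^{2}}{\partial S_{in}^{1}}\right)^T\left(\frac{\partial S_{in}^{3}}{\partial S_{in}^{2}}\right)^T\frac{\partial cost}{\partial S_{in}^3}$$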


A Commonly Used Formula for Differentiating a Vector with Respect to a Vector

For $Y=AX+B$, where $Y$, $X$, and $B$ are vectors and $A$ is a matrix, using the numerator layout we have $\frac{\partial Y}{\partial X}=A$.
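This formula can be checked numerically with finite differences. The sketch below (arbitrary assumed sizes, hypothetical step size) builds the Jacobian column by column and compares it with $A$:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 3))        # A: 2 x 3 matrix
B = rng.standard_normal((2, 1))        # B: 2 x 1 vector
X = rng.standard_normal((3, 1))        # X: 3 x 1 vector

Y = lambda X: A @ X + B

# build the Jacobian dY/dX column by column via finite differences
eps = 1e-6
J = np.zeros((2, 3))
for j in range(3):
    dX = np.zeros((3, 1))
    dX[j, 0] = eps
    J[:, [j]] = (Y(X + dX) - Y(X)) / eps

print(np.allclose(J, A, atol=1e-5))    # True: dY/dX = A in numerator layout
```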


Derivation of the Backpropagation Algorithm

Suppose we have an $n$-layer feedforward neural network. The gradient at layer $i$ is
$$\begin{aligned} \frac{\partial cost}{\partial S_{in}^i}&=\left(\frac{\partial S_{in}^{i+1}}{\partial S_{in}^{i}}\right)^T\frac{\partial cost}{\partial S_{in}^{i+1}}\\ &=(W^{i+1})^T\frac{\partial cost}{\partial S_{in}^{i+1}} \odot (f^{i}(S_{in}^i))' \end{aligned}\tag{1}$$
Here $\odot$ is the Hadamard product, i.e., element-wise multiplication of matrices or vectors in which elements at the same position are multiplied. The last step above can be understood concretely as follows; suppose layer $i$ has $n$ neurons:
$$\begin{aligned} (W^{i+1})^T\frac{\partial cost}{\partial S_{in}^{i+1}} \odot (f^{i}(S_{in}^i))' &=\left(\frac{\partial S_{in}^{i+1}}{\partial f(S_{in}^{i})}\right)^T\frac{\partial cost}{\partial S_{in}^{i+1}} \odot (f^{i}(S_{in}^i))'\\ &=\frac{\partial cost}{\partial f(S_{in}^{i})} \odot (f^{i}(S_{in}^i))'\\ &=\begin{bmatrix} \frac{\partial cost}{\partial f((S_{in}^{i})_1)}\\ \frac{\partial cost}{\partial f((S_{in}^{i})_2)}\\ \vdots\\ \frac{\partial cost}{\partial f((S_{in}^{i})_n)} \end{bmatrix} \odot (f^{i}(S_{in}^i))'\\ &=\begin{bmatrix} \frac{\partial cost}{\partial f((S_{in}^{i})_1)}\\ \frac{\partial cost}{\partial f((S_{in}^{i})_2)}\\ \vdots\\ \frac{\partial cost}{\partial f((S_{in}^{i})_n)} \end{bmatrix} \odot \begin{bmatrix} \frac{\partial f((S_{in}^{i})_1)}{\partial (S_{in}^{i})_1}\\ \frac{\partial f((S_{in}^{i})_2)}{\partial (S_{in}^{i})_2}\\ \vdots\\ \frac{\partial f((S_{in}^{i})_n)}{\partial (S_{in}^{i})_n} \end{bmatrix}\\ &=\begin{bmatrix} \frac{\partial cost}{\partial f((S_{in}^{i})_1)}\frac{\partial f((S_{in}^{i})_1)}{\partial (S_{in}^{i})_1}\\ \frac{\partial cost}{\partial f((S_{in}^{i})_2)}\frac{\partial f((S_{in}^{i})_2)}{\partial (S_{in}^{i})_2}\\ \vdots\\ \frac{\partial cost}{\partial f((S_{in}^{i})_n)}\frac{\partial f((S_{in}^{i})_n)}{\partial (S_{in}^{i})_n} \end{bmatrix}\\ &=\begin{bmatrix} \frac{\partial cost}{\partial (S_{in}^{i})_1}\\ \frac{\partial cost}{\partial (S_{in}^{i})_2}\\ \vdots\\ \frac{\partial cost}{\partial (S_{in}^{i})_n} \end{bmatrix}\\ &=\frac{\partial cost}{\partial S_{in}^i} \end{aligned}$$
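In code, Eq. 1 amounts to one matrix product followed by one Hadamard product. A minimal sketch, assuming a sigmoid activation and arbitrary layer sizes (all names and shapes here are illustrative, not from the post itself):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime(x):
    s = sigmoid(x)
    return s * (1.0 - s)                    # (f^i(S_in^i))', element-wise

# assumed sizes: layer i has n = 4 neurons, layer i+1 has p = 3 neurons
n, p = 4, 3
rng = np.random.default_rng(2)
W_next = rng.standard_normal((p, n))        # W^{i+1}, p x n
S_in_i = rng.standard_normal((n, 1))        # S_in^i, n x 1
grad_next = rng.standard_normal((p, 1))     # dcost/dS_in^{i+1}, p x 1

# Eq. 1: dcost/dS_in^i = (W^{i+1})^T dcost/dS_in^{i+1} ⊙ (f^i(S_in^i))'
grad_i = (W_next.T @ grad_next) * sigmoid_prime(S_in_i)
print(grad_i.shape)                         # (4, 1)
```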
Next come the gradients used to update the weights. Once the gradient at layer $i$ has been derived, the gradients with respect to the weights and the bias can be obtained from the definition of the matrix derivative:
$$\frac{\partial cost}{\partial W^i}=\frac{\partial cost}{\partial S_{in}^i}(S_{out}^{i-1})^T \tag{2}$$
$$\frac{\partial cost}{\partial B^i}=\frac{\partial cost}{\partial S_{in}^i} \tag{3}$$
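Eqs. 2 and 3 are just as direct in code: the weight gradient is the outer product of the layer-$i$ gradient with the previous layer's output, and the bias gradient is the layer-$i$ gradient itself. A self-contained sketch under the same illustrative assumptions (sizes and names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 4, 5                                 # assumed: layer i has n neurons, layer i-1 has m
grad_i = rng.standard_normal((n, 1))        # dcost/dS_in^i, e.g. obtained via Eq. 1
S_out_prev = rng.standard_normal((m, 1))    # S_out^{i-1}

dW_i = grad_i @ S_out_prev.T                # Eq. 2: n x m, same shape as W^i
dB_i = grad_i                               # Eq. 3: n x 1, same shape as B^i
print(dW_i.shape, dB_i.shape)               # (4, 5) (4, 1)
```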
$\frac{\partial cost}{\partial S_{in}^n}$ has to be worked out from the definition of the matrix derivative for the particular loss function; once it is obtained, Eqs. 1, 2, and 3 give the gradients of all parameters. For the definition-based approach to matrix derivatives, see the linked post (快點我,我等不及了).
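As a final illustration, the sketch below wires Eqs. 1, 2, and 3 together for a hypothetical two-layer network with sigmoid activations and a mean-squared-error loss, for which $\frac{\partial cost}{\partial S_{in}^n}=(f^n(S_{in}^n)-y)\odot(f^n(S_{in}^n))'$ follows from the definition. A finite-difference check at the end confirms the analytic gradient. Everything here (sizes, names, loss) is an assumption for demonstration only, not the original post's setup.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime(x):
    s = sigmoid(x)
    return s * (1.0 - s)

# assumed toy network: 3 inputs -> 4 hidden -> 2 outputs, sigmoid everywhere,
# cost = 1/2 * ||f^2(S_in^2) - y||^2 (mean squared error)
rng = np.random.default_rng(4)
x  = rng.standard_normal((3, 1))            # network input (= S_out^0)
y  = rng.standard_normal((2, 1))            # target
W1 = rng.standard_normal((4, 3)); B1 = rng.standard_normal((4, 1))
W2 = rng.standard_normal((2, 4)); B2 = rng.standard_normal((2, 1))

def forward(W1, B1, W2, B2):
    S_in1  = W1 @ x + B1                    # S_in^1
    S_out1 = sigmoid(S_in1)                 # S_out^1 = f^1(S_in^1)
    S_in2  = W2 @ S_out1 + B2               # S_in^2
    S_out2 = sigmoid(S_in2)                 # S_out^2 = f^2(S_in^2)
    cost   = 0.5 * np.sum((S_out2 - y) ** 2)
    return S_in1, S_out1, S_in2, S_out2, cost

S_in1, S_out1, S_in2, S_out2, cost = forward(W1, B1, W2, B2)

# dcost/dS_in^2, derived from the definition for this MSE cost
grad2 = (S_out2 - y) * sigmoid_prime(S_in2)
dW2, dB2 = grad2 @ S_out1.T, grad2          # Eqs. 2 and 3 at the last layer

# Eq. 1 propagates the gradient back to layer 1, then Eqs. 2 and 3 again
grad1 = (W2.T @ grad2) * sigmoid_prime(S_in1)
dW1, dB1 = grad1 @ x.T, grad1

# sanity check of dW1 against a finite-difference approximation
eps = 1e-6
num_dW1 = np.zeros_like(W1)
for r in range(W1.shape[0]):
    for c in range(W1.shape[1]):
        Wp = W1.copy(); Wp[r, c] += eps
        num_dW1[r, c] = (forward(Wp, B1, W2, B2)[-1] - cost) / eps
print(np.allclose(dW1, num_dW1, atol=1e-4)) # True
```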
