Deriving the Fully Connected Layer's Backpropagation Formulas via Computational Graphs

0. Preface

This post supplements Koki Saitoh's *Deep Learning from Scratch* (《深度學習入門·基於Python的理論和實現》). In Chapter 5, Error Backpropagation (誤差反向傳播法), the author only states the formulas for the fully connected layer and the Softmax layer and omits their derivations. The question this post addresses is: how should the formulas the author gives there, reproduced below, be understood?

Assume the fully connected layer computes Y = X * W + B. Then:
1. $\frac{\partial L}{\partial B} = \frac{\partial L}{\partial Y}$
2. $\frac{\partial L}{\partial W} = X^{T} * \frac{\partial L}{\partial Y}$
3. $\frac{\partial L}{\partial X} = \frac{\partial L}{\partial Y} * W^{T}$
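
These three formulas are exactly what the book's Affine layer computes in its backward pass. Below is a minimal NumPy sketch in the spirit of that class (the variable names are mine, and the book's exact code may differ slightly). Note that summing dY over the batch axis for dB reduces to $\frac{\partial L}{\partial Y}$ itself when the batch size is 1, as in this post's example.

```python
import numpy as np

class Affine:
    """Fully connected layer: forward Y = X * W + B, backward per the three formulas."""
    def __init__(self, W, B):
        self.W = W          # weights, shape (in_dim, out_dim)
        self.B = B          # bias, shape (out_dim,)
        self.X = None       # cached input, needed by the backward pass

    def forward(self, X):
        self.X = X
        return np.dot(X, self.W) + self.B     # Y = X * W + B

    def backward(self, dY):
        dX = np.dot(dY, self.W.T)             # formula 3: dL/dX = dL/dY * W^T
        self.dW = np.dot(self.X.T, dY)        # formula 2: dL/dW = X^T * dL/dY
        self.dB = np.sum(dY, axis=0)          # formula 1: dL/dB = dL/dY (summed over the batch)
        return dX
```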

PS:

  1. This is a personal reading note and only supplements the book; for the full treatment, please consult the book itself.
  2. Before reading on, you should read the companion post 基於計算圖的Softmax層反向傳播推導 and understand its conclusion: when a value branches out during the forward pass, the backpropagated values along those branches are summed during the backward pass.

1. Computational Graph of the Fully Connected Layer

[Figure: computational graph of the Affine layer's backward pass, with input X of shape (1,2), weights W of shape (2,3), and bias B]
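
To make the figure's setup concrete without the image, here is the forward pass with the shapes the graph uses; the numeric values are arbitrary placeholders of my own.

```python
import numpy as np

X = np.array([[1.0, 2.0]])              # input, shape (1, 2)
W = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])         # weights, shape (2, 3)
B = np.array([[0.1, 0.2, 0.3]])         # bias, shape (1, 3)

Y = np.dot(X, W) + B                    # output, shape (1, 3)
print(Y)                                # [[ 9.1 12.2 15.3]]
```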

2. Explanation of the Formulas

  1. $\frac{\partial L}{\partial B} = \frac{\partial L}{\partial Y}$

    As the graph shows, B and X * W meet at an addition. For an add node, the downstream derivative equals the upstream derivative, so the gradient $\frac{\partial L}{\partial Y}$ passes through to B unchanged. (A numerical check appears after this list.)

  2. $\frac{\partial L}{\partial W} = X^{T} * \frac{\partial L}{\partial Y}$

    Take the matrix shapes from the figure: W is (2,3) and X is (1,2). Let M = X * W, and write $w_{jk}$ for the weight connecting $x_j$ to $m_k$. By the fully connected layer's formula: $m_{k} = \sum_{j} w_{jk} * x_{j}$

    Hence $w_{jk}$ appears exactly once in the computation of the output Y, namely inside $m_k$, so: $\frac{\partial m_{k}}{\partial w_{jk}} = x_{j}$

    Moreover, since the add node passes the gradient through unchanged, the upstream derivative arriving at $m_k$ is $\frac{\partial L}{\partial y_{k}}$, and therefore $\frac{\partial L}{\partial w_{jk}} = x_{j} * \frac{\partial L}{\partial y_{k}}$.

    Using the figure's example, build the derivative matrix U = ∂L/∂W that the update rule W = W - α * U needs. Its (j,k) entry is $x_{j} * \frac{\partial L}{\partial y_{k}}$, so (see the second check after this list):
    $$\frac{\partial L}{\partial W} = \left[ \begin{matrix} \frac{\partial L}{\partial y_{1}} * x_{1} & \frac{\partial L}{\partial y_{2}} * x_{1} & \frac{\partial L}{\partial y_{3}} * x_{1} \\ \frac{\partial L}{\partial y_{1}} * x_{2} & \frac{\partial L}{\partial y_{2}} * x_{2} & \frac{\partial L}{\partial y_{3}} * x_{2} \end{matrix} \right] = \left[ \begin{matrix} x_{1} \\ x_{2} \end{matrix} \right] * \left[ \begin{matrix} \frac{\partial L}{\partial y_{1}} & \frac{\partial L}{\partial y_{2}} & \frac{\partial L}{\partial y_{3}} \end{matrix} \right] = X^{T} * \frac{\partial L}{\partial Y}$$

  3. $\frac{\partial L}{\partial X} = \frac{\partial L}{\partial Y} * W^{T}$

    Write Y( x1, x2 ) = Y( u( x1, x2 ), f( x1, x2 ), φ( x1, x2 ) ), where u, f, φ produce the outputs y1, y2, y3. In the forward pass x1 branches into all three outputs, so by the branch-summing rule from the PS, its backward gradients add up. Taking x1 as the example:

    $\frac {\partial L}{\partial x_{1}} = \frac {\partial L}{\partial u} * \frac {\partial u}{\partial x_{1}} + \frac {\partial L}{\partial f} * \frac {\partial f}{\partial x_{1}} + \frac {\partial L}{\partial φ} * \frac {\partial φ}{\partial x_{1}} = w_{11} * \frac {\partial L}{\partial y_{1}} + w_{12} * \frac {\partial L}{\partial y_{2}} + w_{13} * \frac {\partial L}{\partial y_{3}}$

    In vector form:

    $\frac {\partial L}{\partial x_{1}} = \frac {\partial L}{\partial Y} * (w_{11} , w_{12} , w_{13})^{T}$

    $\frac {\partial L}{\partial x_{2}} = \frac {\partial L}{\partial Y} * (w_{21} , w_{22} , w_{23})^{T}$

    Stacking these two column vectors side by side yields exactly W^T, so (see the third check after this list):

    $$\frac {\partial L}{\partial X} = \frac {\partial L}{\partial Y} * \left[ \begin{matrix} w_{11} & w_{21} \\ w_{12} & w_{22} \\ w_{13} & w_{23} \end{matrix} \right] = \frac {\partial L}{\partial Y} * W^{T}$$
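
The three checks below are my own numerical verifications, not from the book. They use a hypothetical loss L = sum(Y²)/2, chosen only because its gradient $\frac{\partial L}{\partial Y} = Y$ is easy to write down. First, formula 1: perturbing each element of B changes L at exactly the rate $\frac{\partial L}{\partial Y}$ predicts, confirming that the add node passes the gradient through unchanged.

```python
import numpy as np

X = np.array([[1.0, 2.0]])
W = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
B = np.array([[0.1, 0.2, 0.3]])

def loss(B):
    Y = np.dot(X, W) + B
    return 0.5 * np.sum(Y ** 2)          # hypothetical loss with dL/dY = Y

dY = np.dot(X, W) + B                    # analytic dL/dY for this loss

# Finite-difference dL/dB, element by element
eps = 1e-6
dB_num = np.zeros_like(B)
for i in range(B.shape[0]):
    for j in range(B.shape[1]):
        Bp = B.copy(); Bp[i, j] += eps
        Bm = B.copy(); Bm[i, j] -= eps
        dB_num[i, j] = (loss(Bp) - loss(Bm)) / (2 * eps)

print(np.allclose(dB_num, dY))           # True: dL/dB equals dL/dY
```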
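
Next, formula 2: build the matrix U entry by entry as $u_{jk} = x_{j} * \frac{\partial L}{\partial y_{k}}$, exactly as in the derivation above, and confirm it equals $X^{T} * \frac{\partial L}{\partial Y}$ (same hypothetical setup as the previous check).

```python
import numpy as np

X = np.array([[1.0, 2.0]])
W = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
B = np.array([[0.1, 0.2, 0.3]])

dY = np.dot(X, W) + B                    # dL/dY for the loss L = sum(Y**2)/2

# Entry (j, k) of U is x_j * dL/dy_k, as derived above
U = np.zeros_like(W)
for j in range(W.shape[0]):
    for k in range(W.shape[1]):
        U[j, k] = X[0, j] * dY[0, k]

print(np.allclose(U, np.dot(X.T, dY)))   # True: element-wise U equals X^T * dL/dY
```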
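
Finally, formula 3: sum each $x_j$'s branch gradients $\sum_{k} w_{jk} * \frac{\partial L}{\partial y_{k}}$ by hand, and confirm the result equals $\frac{\partial L}{\partial Y} * W^{T}$.

```python
import numpy as np

X = np.array([[1.0, 2.0]])
W = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
B = np.array([[0.1, 0.2, 0.3]])

dY = np.dot(X, W) + B                    # dL/dY for the loss L = sum(Y**2)/2

# x_j branches into every y_k, so its gradients sum: dL/dx_j = sum_k w_jk * dL/dy_k
dX = np.zeros_like(X)
for j in range(X.shape[1]):
    dX[0, j] = sum(W[j, k] * dY[0, k] for k in range(W.shape[1]))

print(np.allclose(dX, np.dot(dY, W.T)))  # True: the branch sum equals dL/dY * W^T
```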
