Deriving the Fully Connected Layer's Backpropagation Formulas with Computational Graphs

0. Preface

This post supplements *Deep Learning from Scratch* (《深度学习入门·基于Python的理论和实现》) by Koki Saitoh (斋藤康毅). In Chapter 5, "Backpropagation" (误差反向传播法), the author states the formulas for the fully connected (Affine) layer and the Softmax layer without deriving them. The question this post answers is: how should the formulas the author gives, reproduced below, be understood?

Assume the fully connected layer computes Y = X * W + B. Then:
1. $\frac {\partial L}{\partial B} = \frac {\partial L}{\partial Y}$
2. $\frac {\partial L}{\partial W} = X^{T} * \frac {\partial L}{\partial Y}$
3. $\frac {\partial L}{\partial X} = \frac {\partial L}{\partial Y} * W^{T}$
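
The book implements exactly these gradients in its Affine layer code; as a reminder of how the three formulas look in NumPy, here is a minimal sketch (the class layout and variable names are mine, not copied from the book):

```python
import numpy as np

class Affine:
    """Fully connected layer Y = X * W + B and its backward pass."""

    def __init__(self, W, B):
        self.W = W        # weights, shape (in_dim, out_dim), e.g. (2, 3)
        self.B = B        # bias, shape (out_dim,)
        self.X = None     # input cached by forward() for use in backward()

    def forward(self, X):
        self.X = X
        return X @ self.W + self.B

    def backward(self, dY):
        # dY is dL/dY handed down by the layer above.
        self.dB = dY.sum(axis=0)   # formula 1 (summed over the batch axis)
        self.dW = self.X.T @ dY    # formula 2: X^T * dL/dY
        dX = dY @ self.W.T         # formula 3: dL/dY * W^T
        return dX
```

For a single sample, X has shape (1, 2) and the batch sum leaves dB equal to dY, which is formula 1 exactly.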

PS :

  1. This is a personal reading note and only supplements the book; please refer to the book itself for the full treatment.
  2. Before reading, you should read the companion article 基于计算图的Softmax层反向传播推导 (deriving the Softmax layer's backpropagation with computational graphs) and understand its conclusion: when a value branches out during forward propagation, the values backpropagated along those branches are summed.

1. Computational Graph of the Fully Connected Layer

[Figure: backpropagation through the Affine layer]

2. Explaining the Formulas

  1. $\frac {\partial L}{\partial B} = \frac {\partial L}{\partial Y}$

    As the computational graph shows, B joins X * W at an addition node, and an addition node passes the upstream derivative through unchanged, so the derivative reaching B equals the derivative arriving at Y.

  2. $\frac {\partial L}{\partial W} = X^{T} * \frac {\partial L}{\partial Y}$

    Take the matrix shapes from the figure: W is (2,3) and X is (1,2). Let M = X * W, and let $w_{ij}$ denote the weight of $x_{i}$ in the computation of $m_{j}$. The fully connected layer's formula then gives: $m_{j} = \sum_{i} {w_{ij} * x_{i}}$

    Since $w_{ij}$ appears once and only once in the computation of the output Y, we have: $\frac {\partial Y}{\partial w_{ij}} = \frac {\partial m_{j}}{\partial w_{ij}} = x_{i}$

    Moreover, the derivative passed down from the upper layer to $m_{j}$ is $\frac{\partial L}{\partial y_{j}}$ (the addition of B leaves it unchanged).

    Using the figure's example, build the matrix U of derivatives of L with respect to W, the matrix needed by the update rule W = W - α * U; its entry (i, j) is $\frac{\partial L}{\partial w_{ij}} = \frac{\partial L}{\partial y_{j}} * x_{i}$. Hence (both this formula and formula 3 are verified numerically in the sketch after this derivation):
    $$\frac{\partial L}{\partial W} = \frac{\partial L}{\partial Y} * \frac{\partial Y}{\partial W} = \left[ \begin{matrix} \frac{\partial L}{\partial y_{1}} * x_{1} & \frac{\partial L}{\partial y_{2}} * x_{1} & \frac{\partial L}{\partial y_{3}} * x_{1} \\ \frac{\partial L}{\partial y_{1}} * x_{2} & \frac{\partial L}{\partial y_{2}} * x_{2} & \frac{\partial L}{\partial y_{3}} * x_{2} \end{matrix} \right] = \left[ \begin{matrix} x_{1} \\ x_{2} \end{matrix} \right] * \left[ \begin{matrix} \frac{\partial L}{\partial y_{1}} & \frac{\partial L}{\partial y_{2}} & \frac{\partial L}{\partial y_{3}} \end{matrix} \right] = X^{T} * \frac{\partial L}{\partial Y}$$

  3. $\frac {\partial L}{\partial X} = \frac {\partial L}{\partial Y} * W^{T}$

    Regard Y as a function of the inputs: Y( x1, x2 ) = Y( u( x1, x2 ), f( x1, x2 ), φ( x1, x2 ) ), where u, f, φ produce the outputs y1, y2, y3 respectively. Taking x1 as an example and applying the multivariable chain rule:

    $\frac {\partial L}{\partial x_{1}} = \frac {\partial L}{\partial Y} * \frac {\partial Y}{\partial x_{1}} = \frac {\partial L}{\partial Y} * \left( \frac {\partial u}{\partial x_{1}} , \frac {\partial f}{\partial x_{1}} , \frac {\partial \varphi}{\partial x_{1}} \right)^{T} = \frac {\partial L}{\partial Y} * (w_{11} , w_{12} , w_{13})^{T} = w_{11} * \frac {\partial L}{\partial y_{1}} + w_{12} * \frac {\partial L}{\partial y_{2}} + w_{13} * \frac {\partial L}{\partial y_{3}}$

    That is:

    $\frac {\partial L}{\partial x_{1}} = \frac {\partial L}{\partial Y} * (w_{11} , w_{12} , w_{13})^{T}$

    $\frac {\partial L}{\partial x_{2}} = \frac {\partial L}{\partial Y} * (w_{21} , w_{22} , w_{23})^{T}$

    Stacking these two rows of W as the columns of a single matrix therefore gives:

    $\frac {\partial L}{\partial X} = \frac {\partial L}{\partial Y} * \left[ \begin{matrix} w_{11} & w_{21} \\ w_{12} & w_{22} \\ w_{13} & w_{23} \end{matrix} \right] = \frac {\partial L}{\partial Y} * W^{T}$
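
A quick way to convince yourself that all three formulas are correct is to compare them against finite-difference gradients, as promised above. Below is a minimal, self-contained NumPy check; the toy loss L = sum(Y) and all variable names are my own, chosen only so that dL/dY is trivial to write down (a matrix of ones):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1, 2))   # shapes match the example in the text
W = rng.standard_normal((2, 3))
B = rng.standard_normal(3)

# Toy scalar loss L = sum(Y); for this choice dL/dY is a matrix of ones.
loss = lambda: np.sum(X @ W + B)
dY = np.ones((1, 3))

# The three analytic gradients derived above.
dX = dY @ W.T          # formula 3
dW = X.T @ dY          # formula 2
dB = dY.sum(axis=0)    # formula 1

def numerical_gradient(f, a, eps=1e-5):
    """Central-difference estimate of df/da, one entry of a at a time."""
    grad = np.zeros_like(a)
    it = np.nditer(a, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        idx = it.multi_index
        orig = a[idx]
        a[idx] = orig + eps; f_plus = f()
        a[idx] = orig - eps; f_minus = f()
        a[idx] = orig
        grad[idx] = (f_plus - f_minus) / (2 * eps)
        it.iternext()
    return grad

assert np.allclose(dX, numerical_gradient(loss, X))
assert np.allclose(dW, numerical_gradient(loss, W))
assert np.allclose(dB, numerical_gradient(loss, B))
print("all three analytic gradients match the numerical check")
```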
