Derivation of the Backpropagation Algorithm for Convolutional Neural Networks (CNNs)

1. Fully Connected Layer

This is identical to the backpropagation algorithm for deep neural networks (DNNs). Define the auxiliary variables:
$$\left\{\begin{aligned} &\delta^L = \frac{\partial J}{\partial z^L} = \frac{\partial J}{\partial a^L} \odot \sigma'(z^L)\\ &\delta^l = (W^{l+1})^T\delta^{l+1}\odot \sigma'(z^l) \end{aligned}\right.$$
From these, the gradients of the parameters $W$ and $b$ follow:
$$\left\{\begin{aligned} &\frac{\partial J}{\partial W^l} = \frac{\partial J}{\partial z^l} \frac{\partial z^l}{\partial W^l} = \delta^l(a^{l-1})^T\\ &\frac{\partial J}{\partial b^l} = \frac{\partial J}{\partial z^l} \frac{\partial z^l}{\partial b^l} = \delta^l \end{aligned}\right.$$
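As a quick sanity check, here is a minimal NumPy sketch of these two formulas for a single sample. The layer sizes and the sigmoid activation are assumptions chosen for illustration, not part of the derivation above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

# Hypothetical forward-pass quantities for one hidden layer l.
a_prev = np.random.randn(4, 1)                  # a^{l-1}
W_l, b_l = np.random.randn(3, 4), np.random.randn(3, 1)
z_l = W_l @ a_prev + b_l                        # z^l
W_next = np.random.randn(2, 3)                  # W^{l+1}

# Suppose delta_next = dJ/dz^{l+1} has already been computed.
delta_next = np.random.randn(2, 1)

# delta^l = (W^{l+1})^T delta^{l+1} ⊙ sigma'(z^l)
delta_l = (W_next.T @ delta_next) * sigmoid_prime(z_l)

# dJ/dW^l = delta^l (a^{l-1})^T,  dJ/db^l = delta^l
dW_l = delta_l @ a_prev.T
db_l = delta_l
```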

2. Pooling Layer

Let the input of the pooling layer be $a^{l}$ and its output be $z^{l+1}$. Then:
$$z^{l+1} = \text{pool}(a^{l})$$

$$\delta^{l} = \frac{\partial J}{\partial z^{l}} = \frac{\partial J}{\partial z^{l+1}} \frac{\partial z^{l+1}}{\partial a^{l}}\frac{\partial a^{l}}{\partial z^{l}} = \text{upsample}(\delta^{l+1})\odot \sigma'(z^l)$$
Here, upsample means restoring the matrix $\delta^{l+1}$ to its pre-pooling size during backpropagation. There are two cases:

  1. For Max pooling, each element of $\delta^{l+1}$ is placed at the position that produced the maximum during forward propagation, so the position of the maximum element in each block must be recorded separately during the forward pass.
  2. For Average pooling, each element of $\delta^{l+1}$ is divided evenly (i.e., averaged) over its block and filled into the corresponding block positions. A code sketch after the example below implements both cases.

For example, suppose the pooling kernel size is $2\times 2$. Then:
$$\delta^{l+1} = \left( \begin{array}{cc} 2 & 8 \\ 4 & 6 \end{array} \right) \xrightarrow{\text{Max upsample}} \left( \begin{array}{cccc} 2&0&0&0 \\ 0&0&0&8 \\ 0&4&0&0 \\ 0&0&6&0 \end{array} \right)$$

$$\delta^{l+1} = \left( \begin{array}{cc} 2 & 8 \\ 4 & 6 \end{array} \right) \xrightarrow{\text{Average upsample}} \left( \begin{array}{cccc} 0.5&0.5&2&2 \\ 0.5&0.5&2&2 \\ 1&1&1.5&1.5 \\ 1&1&1.5&1.5 \end{array} \right)$$
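The following NumPy sketch is one way to implement both upsample cases, and it reproduces the two matrices above. The argmax mask for the Max case is assumed to have been recorded during forward propagation.

```python
import numpy as np

delta_next = np.array([[2.0, 8.0],
                       [4.0, 6.0]])
k = 2  # pooling kernel size

# Average pooling: spread each gradient evenly over its k*k block.
avg_up = np.kron(delta_next, np.ones((k, k))) / (k * k)

# Max pooling: place each gradient at the recorded argmax position.
# This mask encodes the positions from the example above (an assumption;
# in practice it is saved during the forward pass).
argmax_mask = np.zeros((4, 4))
argmax_mask[0, 0] = argmax_mask[1, 3] = 1
argmax_mask[2, 1] = argmax_mask[3, 2] = 1
max_up = argmax_mask * np.kron(delta_next, np.ones((k, k)))

print(max_up)   # matches the Max upsample matrix
print(avg_up)   # matches the Average upsample matrix
```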
Note that for backpropagation in the Average case, it is easy to mistakenly assume the gradient values are simply copied several times and filled directly into the corresponding block positions. A small example makes it clear why the gradient values must be averaged instead:

Suppose we take the average of four variables $a, b, c, d$ to obtain $z$, that is:
$$z=\frac{1}{4}(a+b+c+d)$$
Then the derivative of $z$ with respect to each variable is $1/4$. When backpropagation reaches $z$, let the accumulated gradient value be $\delta$. Then:
$$\left\{\begin{aligned} &\frac{\partial J}{\partial a} = \frac{\partial J}{\partial z}\frac{\partial z}{\partial a} = \frac{1}{4}\delta\\ &\frac{\partial J}{\partial b} = \frac{\partial J}{\partial z}\frac{\partial z}{\partial b} = \frac{1}{4}\delta\\ &\frac{\partial J}{\partial c} = \frac{\partial J}{\partial z}\frac{\partial z}{\partial c} = \frac{1}{4}\delta\\ &\frac{\partial J}{\partial d} = \frac{\partial J}{\partial z}\frac{\partial z}{\partial d} = \frac{1}{4}\delta \end{aligned}\right.$$
This makes the averaging easy to understand.
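A quick finite-difference check confirms the $\frac{1}{4}\delta$ result; the input values and $\delta$ below are hypothetical.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])   # a, b, c, d
delta = 2.0                           # accumulated gradient dJ/dz
J = lambda v: delta * v.mean()        # J depends on the inputs only via z

eps = 1e-6
x_pert = x.copy()
x_pert[0] += eps                      # perturb a
numeric = (J(x_pert) - J(x)) / eps
print(numeric, delta / 4)             # both print ~0.5
```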

3. Convolutional Layer

The forward propagation formula of a convolutional layer is:
$$a^{l+1} = \sigma(z^{l+1}) = \sigma(a^l*W^{l+1} + b^{l+1})$$

$$\delta^{l} = \frac{\partial J}{\partial z^{l}} = \frac{\partial J}{\partial z^{l+1}} \frac{\partial z^{l+1}}{\partial a^{l}}\frac{\partial a^{l}}{\partial z^{l}} = \delta^{l+1} * \text{Rotation180}(W^{l+1}) \odot \sigma'(z^l)$$
Here, Rotation180 means the convolution kernel $W$ is rotated by 180 degrees, i.e., flipped vertically and then flipped horizontally.
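Below is a minimal single-channel NumPy/SciPy sketch of this delta recursion. It assumes the forward "convolution" is actually a cross-correlation with valid padding, as is conventional in CNNs, and checks that correlating with the 180-degree-rotated kernel is the same as a true full convolution with the original kernel.

```python
import numpy as np
from scipy.signal import correlate2d, convolve2d

# Hypothetical single-channel shapes.
a_l = np.random.randn(5, 5)                       # a^l
W_next = np.random.randn(3, 3)                    # W^{l+1}
z_next = correlate2d(a_l, W_next, mode='valid')   # z^{l+1}, 3x3

delta_next = np.random.randn(*z_next.shape)       # dJ/dz^{l+1}

# Full cross-correlation with the 180-degree-rotated kernel...
rot180 = np.rot90(W_next, 2)                      # flip up-down, then left-right
grad_a = correlate2d(delta_next, rot180, mode='full')   # dJ/da^l, 5x5

# ...which equals a true full convolution with the original kernel,
# since convolve2d flips the kernel internally.
assert np.allclose(grad_a, convolve2d(delta_next, W_next, mode='full'))

# Finally multiply elementwise by sigma'(z^l) to obtain delta^l.
```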

For the detailed derivation, see https://www.cnblogs.com/pinard/p/6494810.html

Gradients of the parameters $W$ and $b$:
$$\left\{\begin{aligned} &\frac{\partial J}{\partial W^l} = \frac{\partial J}{\partial z^l} \frac{\partial z^l}{\partial W^l} = a^{l-1}*\delta^l\\ &\frac{\partial J}{\partial b^l} = \frac{\partial J}{\partial z^l} \frac{\partial z^l}{\partial b^l} = \sum\limits_{u,v}(\delta^l)_{u,v} \end{aligned}\right.$$
Note that the gradient with respect to $W$ involves no rotation operation. The term $\sum\limits_{u,v}(\delta^l)_{u,v}$ means summing the entries of $\delta^l$ over all spatial positions $u, v$, collapsing each channel of $\delta^l$ to a single scalar, one per entry of the bias vector $b^l$.
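The following sketch computes both gradients for a single channel under the same cross-correlation convention as above, and verifies the kernel gradient with a finite difference; all shapes and values are hypothetical.

```python
import numpy as np
from scipy.signal import correlate2d

a_prev = np.random.randn(5, 5)            # a^{l-1}
W = np.random.randn(3, 3)                 # W^l
b = 0.1
delta = np.random.randn(3, 3)             # dJ/dz^l, i.e. J = sum(delta * z^l)

# dJ/dW^l = a^{l-1} "conv" delta^l (valid cross-correlation, no rotation)
dW = correlate2d(a_prev, delta, mode='valid')

# dJ/db^l: sum delta^l over all spatial positions u, v.
db = delta.sum()

# Finite-difference check on one kernel entry.
eps = 1e-6
J = lambda W_: (delta * (correlate2d(a_prev, W_, mode='valid') + b)).sum()
W_pert = W.copy()
W_pert[0, 0] += eps
assert np.isclose((J(W_pert) - J(W)) / eps, dW[0, 0], atol=1e-4)
```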

4. References

Thanks to https://www.cnblogs.com/pinard/p/6494810.html
